Pull i2c fixes from Wolfram Sang:
"Nothing fancy. Two driver and one DT binding fix"
* tag 'i2c-for-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx-lpi2c: fix type char overflow issue when calculating the clock cycle
i2c: qup: Add missing unwind goto in qup_i2c_probe()
dt-bindings: i2c: opencores: Add missing type for "regstep"
Pull perf fixes from Borislav Petkov:
- Drop the __weak attribute from a function prototype as it otherwise
leads to the function getting replaced by a dummy stub
- Fix the umask value setup of the frontend event as former is
different on two Intel cores
* tag 'perf_urgent_for_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix the FRONTEND encoding on GNR and MTL
perf/core: Drop __weak attribute from arch_perf_update_userpage() prototype
Pull objtool fix from Borislav Petkov:
- Add a ORC format hash to vmlinux and modules in order for other tools
which use it, to detect changes to it and adapt accordingly
* tag 'objtool_urgent_for_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/unwind/orc: Add ELF section with ORC version identifier
Pull x86 fixes from Borislav Petkov:
- Do not use set_pgd() when updating the KASLR trampoline pgd entry
because that updates the user PGD too on KPTI builds, resulting in
memory corruption
- Prevent a panic in the IO-APIC setup code due to conflicting command
line parameters
* tag 'x86_urgent_for_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Fix kernel panic when booting with intremap=off and x2apic_phys
x86/mm: Avoid using set_pgd() outside of real PGD pages
Pull drm fixes from Dave Airlie:
"Very quiet last week, just two misc fixes, one dp-mst and one qaic:
qaic:
- dma-buf import fix
dp-mst:
- fix NULL ptr deref"
[ It turns out it was a quiet week because Alex Deucher hadn't sent in
his pending AMD changes. So they are coming next - Linus ]
* tag 'drm-fixes-2023-06-23' of git://anongit.freedesktop.org/drm/drm:
drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2
accel/qaic: Call DRM helper function to destroy prime GEM
Pull ARM SoC fixes from Arnd Bergmann:
"The final bug fixes for Qualcomm and Rockchips came in, all of them
for devicetree files:
- Devices on Qualcomm SC7180/SC7280 that are cache coherent are now
marked so correctly to fix a regression after a change in kernel
behavior
- Rockchips has a few minor changes for correctness of regulator and
cache properties, as well as fixes for incorrect behavior of the
RK3568 PCI controller and reset pins on two boards"
* tag 'arm-fixes-6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
arm64: dts: qcom: sc7280: Mark SCM as dma-coherent for chrome devices
arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for trogdor
arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for IDP
dt-bindings: firmware: qcom,scm: Document that SCM can be dma-coherent
arm64: dts: rockchip: Fix rk356x PCIe register and range mappings
arm64: dts: rockchip: fix button reset pin for nanopi r5c
arm64: dts: rockchip: fix nEXTRST on SOQuartz
arm64: dts: rockchip: add missing cache properties
arm64: dts: rockchip: fix USB regulator on ROCK64
Pull btrfs fix from David Sterba:
"Unfortunately the recent u32 overflow fix was not complete, there was
one conversion left, assertion not triggered by my tests but caught by
Qu's fstests case.
The "cleanup for later" has been promoted to a proper fix and wraps
all uses of the stripe left shift so the diffstat has grown but leaves
no potentially problematic uses.
We should have done it that way before, sorry"
* tag 'for-6.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix remaining u32 overflows when left shifting stripe_nr
Pull block fix from Jens Axboe:
"It's apparently the week of 'fixup something from last week', because
the same is true for this block pull request.
Fix up a lock grab that needs to be IRQ saving, rather than just IRQ
disabling, in the block cgroup code"
* tag 'block-6.4-2023-06-23' of git://git.kernel.dk/linux:
block: make sure local irq is disabled when calling __blkcg_rstat_flush
Pull iommu fix from Joerg Roedel:
- Fix potential memory leak in AMD IOMMU domain allocation path
* tag 'iommu-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix possible memory leak of 'domain'
Pull sound fixes from Takashi Iwai:
"Three oneliner fixes: one for a thinko in SOF SoundWire code and two
HD-audio quirks for ASUS laptops. All device-specific and should be
safe to apply"
* tag 'sound-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Add quirk for ASUS ROG GV601V
ALSA: hda/realtek: Add quirk for ASUS ROG G634Z
ASoC: intel: sof_sdw: Fixup typo in device link checking
Pull gpio fixes from Bartosz Golaszewski:
- fix IRQ initialization in gpiochip_irqchip_add_domain()
- add a missing return value check for platform_get_irq() in
gpio-sifive
- don't free irq_domains which GPIOLIB does not manage
* tag 'gpio-fixes-for-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: Fix irq_domain resource tracking for gpiochip_irqchip_add_domain()
gpio: sifive: add missing check for platform_get_irq
gpiolib: Fix GPIO chip IRQ initialization restriction
One last Qualcomm ARM64 DeviceTree fix for v6.4
Changes related to cache management for DMA memory caused WiFi to stop
work on SC7180 and SC7280 based products, using TF-A. These changes
marks the relevant device dma-coherent to correct the behavior.
* tag 'qcom-arm64-fixes-for-6.4-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: dts: qcom: sc7280: Mark SCM as dma-coherent for chrome devices
arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for trogdor
arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for IDP
dt-bindings: firmware: qcom,scm: Document that SCM can be dma-coherent
Link: https://lore.kernel.org/r/20230622203248.106422-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Dave Airlie reports that gcc-13.1.1 has started complaining about some
of the workqueue code in 32-bit arm builds:
kernel/workqueue.c: In function ‘get_work_pwq’:
kernel/workqueue.c:713:24: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
713 | return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
| ^
[ ... a couple of other cases ... ]
and while it's not immediately clear exactly why gcc started complaining
about it now, I suspect it's some C23-induced enum type handlign fixup in
gcc-13 is the cause.
Whatever the reason for starting to complain, the code and data types
are indeed disgusting enough that the complaint is warranted.
The wq code ends up creating various "helper constants" (like that
WORK_STRUCT_WQ_DATA_MASK) using an enum type, which is all kinds of
confused. The mask needs to be 'unsigned long', not some unspecified
enum type.
To make matters worse, the actual "mask and cast to a pointer" is
repeated a couple of times, and the cast isn't even always done to the
right pointer, but - as the error case above - to a 'void *' with then
the compiler finishing the job.
That's now how we roll in the kernel.
So create the masks using the proper types rather than some ambiguous
enumeration, and use a nice helper that actually does the type
conversion in one well-defined place.
Incidentally, this magically makes clang generate better code. That,
admittedly, is really just a sign of clang having been seriously
confused before, and cleaning up the typing unconfuses the compiler too.
Reported-by: Dave Airlie <airlied@gmail.com>
Link: https://lore.kernel.org/lkml/CAPM=9twNnV4zMCvrPkw3H-ajZOH-01JVh_kDrxdPYQErz8ZTdA@mail.gmail.com/
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Claim clkhi and clklo as integer type to avoid possible calculation
errors caused by data overflow.
Fixes: a55fa9d0e4 ("i2c: imx-lpi2c: add low power i2c bus driver")
Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
Signed-off-by: Carlos Song <carlos.song@nxp.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Smatch Warns:
drivers/i2c/busses/i2c-qup.c:1784 qup_i2c_probe()
warn: missing unwind goto?
The goto label "fail_runtime" and "fail" will disable qup->pclk,
but here qup->pclk failed to obtain, in order to be consistent,
change the direct return to goto label "fail_dma".
Fixes: 9cedf3b2f0 ("i2c: qup: Add bam dma capabilities")
Signed-off-by: Shuai Jiang <d202180596@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Cc: <stable@vger.kernel.org> # v4.6+
Pull networking fixes from Paolo Abeni:
"Including fixes from ipsec, bpf, mptcp and netfilter.
Current release - regressions:
- netfilter: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
- eth: mlx5e:
- fix scheduling of IPsec ASO query while in atomic
- free IRQ rmap and notifier on kernel shutdown
Current release - new code bugs:
- phy: manual remove LEDs to ensure correct ordering
Previous releases - regressions:
- mptcp: fix possible divide by zero in recvmsg()
- dsa: revert "net: phy: dp83867: perform soft reset and retain
established link"
Previous releases - always broken:
- sched: netem: acquire qdisc lock in netem_change()
- bpf:
- fix verifier id tracking of scalars on spill
- fix NULL dereference on exceptions
- accept function names that contain dots
- netfilter: disallow element updates of bound anonymous sets
- mptcp: ensure listener is unhashed before updating the sk status
- xfrm:
- add missed call to delete offloaded policies
- fix inbound ipv4/udp/esp packets to UDPv6 dualstack sockets
- selftests: fixes for FIPS mode
- dsa: mt7530: fix multiple CPU ports, BPDU and LLDP handling
- eth: sfc: use budget for TX completions
Misc:
- wifi: iwlwifi: add support for SO-F device with PCI id 0x7AF0"
* tag 'net-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (74 commits)
revert "net: align SO_RCVMARK required privileges with SO_MARK"
net: wwan: iosm: Convert single instance struct member to flexible array
sch_netem: acquire qdisc lock in netem_change()
selftests: forwarding: Fix race condition in mirror installation
wifi: mac80211: report all unusable beacon frames
mptcp: ensure listener is unhashed before updating the sk status
mptcp: drop legacy code around RX EOF
mptcp: consolidate fallback and non fallback state machine
mptcp: fix possible list corruption on passive MPJ
mptcp: fix possible divide by zero in recvmsg()
mptcp: handle correctly disconnect() failures
bpf: Force kprobe multi expected_attach_type for kprobe_multi link
bpf/btf: Accept function names that contain dots
Revert "net: phy: dp83867: perform soft reset and retain established link"
net: mdio: fix the wrong parameters
netfilter: nf_tables: Fix for deleting base chains with payload
netfilter: nfnetlink_osf: fix module autoload
netfilter: nf_tables: drop module reference after updating chain
netfilter: nf_tables: disallow timeout for anonymous sets
netfilter: nf_tables: disallow updates of anonymous sets
...
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Correctly save/restore PMUSERNR_EL0 when host userspace is using
PMU counters directly
- Fix GICv2 emulation on GICv3 after the locking rework
- Don't use smp_processor_id() in kvm_pmu_probe_armpmu(), and
document why
Generic:
- Avoid setting page table entries pointing to a deleted memslot if a
host page table entry is changed concurrently with the deletion"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Avoid illegal stage2 mapping on invalid memory slot
KVM: arm64: Use raw_smp_processor_id() in kvm_pmu_probe_armpmu()
KVM: arm64: Restore GICv2-on-GICv3 functionality
KVM: arm64: PMU: Don't overwrite PMUSERENR with vcpu loaded
KVM: arm64: PMU: Restore the host's PMUSERENR_EL0
Pull powerpc fix from Michael Ellerman:
- Disable IRQs when switching mm in exit_lazy_flush_tlb() called from
exit_mmap()
Thanks to Nicholas Piggin and Sachin Sant.
* tag 'powerpc-6.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s/radix: Fix exit lazy tlb mm switch with irqs enabled
Pull pci fix from Bjorn Helgaas:
- Transfer Intel LGM GW PCIe maintenance from Rahul Tanwar to Chuanhua
Lei (Zhu YiXin)
* tag 'pci-v6.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
MAINTAINERS: Add Chuanhua Lei as Intel LGM GW PCIe maintainer
Pull x86 platform driver fix from Hans de Goede:
"One small fix for an AMD PMF driver issue which is causing issues for
users of just released AMD laptop models"
* tag 'platform-drivers-x86-v6.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/amd/pmf: Register notify handler only if SPS is enabled
Pull io_uring fixes from Jens Axboe:
"A fix for a race condition with poll removal and linked timeouts, and
then a few followup fixes/tweaks for the msg_control patch from last
week.
Not super important, particularly the sparse fixup, as it was broken
before that recent commit. But let's get it sorted for real for this
release, rather than just have it broken a bit differently"
* tag 'io_uring-6.4-2023-06-21' of git://git.kernel.dk/linux:
io_uring/net: use the correct msghdr union member in io_sendmsg_copy_hdr
io_uring/net: disable partial retries for recvmsg with cmsg
io_uring/net: clear msg_controllen on partial sendmsg retry
io_uring/poll: serialize poll linked timer start with poll removal
Pull cgroup fixes from Tejun Heo:
"It's late but here are two bug fixes. Both fix problems which can be
severe but are very confined in scope. The risk to most use cases
should be minimal.
- Fix for an old bug which triggers if a cgroup subsystem is
remounted to a different hierarchy while someone is reading its
cgroup.procs/tasks file. The risk is pretty low given how seldom
cgroup subsystems are moved across hierarchies.
- We moved cpus_read_lock() outside of cgroup internal locks a while
ago but forgot to update the legacy_freezer leading to lockdep
triggers. Fixed"
* tag 'cgroup-for-6.4-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Do not corrupt task iteration when rebinding subsystem
cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex in freezer_css_{online,offline}()
KVM/arm64 fixes for 6.4, take #4
- Correctly save/restore PMUSERNR_EL0 when host userspace is using
PMU counters directly
- Fix GICv2 emulation on GICv3 after the locking rework
- Don't use smp_processor_id() in kvm_pmu_probe_armpmu(), and
document why...
Just like for sc7180 devices using the Chrome bootflow (AKA trogdor
and IDP), sc7280 devices using the Chrome bootflow also need their
firmware marked dma-coherent. On sc7280 this wasn't causing WiFi to
fail to startup, since WiFi works differently there. However, on
sc7280 devices we were still getting the message at bootup after
commit 7bd6680b47 ("Revert "Revert "arm64: dma: Drop cache
invalidation from arch_dma_prep_coherent()"""):
qcom_scm firmware:scm: Assign memory protection call failed -22
qcom_rmtfs_mem 9c900000.memory: assign memory failed
qcom_rmtfs_mem: probe of 9c900000.memory failed with error -22
We should mark SCM properly just like we did for trogdor.
Fixes: 7bd6680b47 ("Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()""")
Fixes: 7a1f4e7f74 ("arm64: dts: qcom: sc7280: Add basic dts/dtsi files for sc7280 soc")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20230616081440.v2.4.I21dc14a63327bf81c6bb58fe8ed91dbdc9849ee2@changeid
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Trogdor devices use firmware backed by TF-A instead of Qualcomm's
normal TZ. On TF-A we end up mapping memory as cacheable.
Specifically, you can see in Trogdor's TF-A code [1] in
qti_sip_mem_assign() that we call qti_mmap_add_dynamic_region() with
MT_RO_DATA. This translates down to MT_MEMORY instead of
MT_NON_CACHEABLE or MT_DEVICE. Apparently Qualcomm's normal TZ
implementation maps the memory as non-cacheable.
Let's add the "dma-coherent" attribute to the SCM for trogdor.
Adding "dma-coherent" like this fixes WiFi on sc7180-trogdor
devices. WiFi was broken as of commit 7bd6680b47 ("Revert "Revert
"arm64: dma: Drop cache invalidation from
arch_dma_prep_coherent()"""). Specifically at bootup we'd get:
qcom_scm firmware:scm: Assign memory protection call failed -22
qcom_rmtfs_mem 94600000.memory: assign memory failed
qcom_rmtfs_mem: probe of 94600000.memory failed with error -22
From discussion on the mailing lists [2] and over IRC [3], it was
determined that we should always have been tagging the SCM as
dma-coherent on trogdor but that the old "invalidate" happened to make
things work most of the time. Tagging it properly like this is a much
more robust solution.
[1] https://chromium.googlesource.com/chromiumos/third_party/arm-trusted-firmware/+/refs/heads/firmware-trogdor-13577.B/plat/qti/common/src/qti_syscall.c
[2] https://lore.kernel.org/r/20230614165904.1.I279773c37e2c1ed8fbb622ca6d1397aea0023526@changeid
[3] https://oftc.irclog.whitequark.org/linux-msm/2023-06-15
Fixes: 7bd6680b47 ("Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()""")
Fixes: 7ec3e67307 ("arm64: dts: qcom: sc7180-trogdor: add initial trogdor and lazor dt")
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20230616081440.v2.3.Ic62daa649b47b656b313551d646c4de9a7da4bd4@changeid
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
sc7180-idp is, for most intents and purposes, a trogdor device.
Specifically, sc7180-idp is designed to run the same style of firmware
as trogdor devices. This can be seen from the fact that IDP has the
same "Reserved memory changes" in its device tree that trogdor has.
Recently it was realized that we need to mark SCM as dma-coherent to
match what trogdor's style of firmware (based on TF-A) does [1]. That
means we need this dma-coherent tag on IDP as well.
Without this, on newer versions of Linux, specifically those with
commit 7bd6680b47 ("Revert "Revert "arm64: dma: Drop cache
invalidation from arch_dma_prep_coherent()"""), WiFi will fail to
work. At bootup you'll see:
qcom_scm firmware:scm: Assign memory protection call failed -22
qcom_rmtfs_mem 94600000.memory: assign memory failed
qcom_rmtfs_mem: probe of 94600000.memory failed with error -22
[1] https://lore.kernel.org/r/20230615145253.1.Ic62daa649b47b656b313551d646c4de9a7da4bd4@changeid
Fixes: 7bd6680b47 ("Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()""")
Fixes: f5ab220d16 ("arm64: dts: qcom: sc7180: Add remoteproc enablers")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20230616081440.v2.2.I3c17d546d553378aa8a0c68c3fe04bccea7cba17@changeid
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Trogdor devices use firmware backed by TF-A instead of Qualcomm's
normal TZ. On TF-A we end up mapping memory as cacheable. Specifically,
you can see in Trogdor's TF-A code [1] in qti_sip_mem_assign() that we
call qti_mmap_add_dynamic_region() with MT_RO_DATA. This translates
down to MT_MEMORY instead of MT_NON_CACHEABLE or MT_DEVICE.
Let's allow devices like trogdor to be described properly by allowing
"dma-coherent" in the SCM node.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230616081440.v2.1.Ie79b5f0ed45739695c9970df121e11d724909157@changeid
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
We run into guest hang in edk2 firmware when KSM is kept as running on
the host. The edk2 firmware is waiting for status 0x80 from QEMU's pflash
device (TYPE_PFLASH_CFI01) during the operation of sector erasing or
buffered write. The status is returned by reading the memory region of
the pflash device and the read request should have been forwarded to QEMU
and emulated by it. Unfortunately, the read request is covered by an
illegal stage2 mapping when the guest hang issue occurs. The read request
is completed with QEMU bypassed and wrong status is fetched. The edk2
firmware runs into an infinite loop with the wrong status.
The illegal stage2 mapping is populated due to same page sharing by KSM
at (C) even the associated memory slot has been marked as invalid at (B)
when the memory slot is requested to be deleted. It's notable that the
active and inactive memory slots can't be swapped when we're in the middle
of kvm_mmu_notifier_change_pte() because kvm->mn_active_invalidate_count
is elevated, and kvm_swap_active_memslots() will busy loop until it reaches
to zero again. Besides, the swapping from the active to the inactive memory
slots is also avoided by holding &kvm->srcu in __kvm_handle_hva_range(),
corresponding to synchronize_srcu_expedited() in kvm_swap_active_memslots().
CPU-A CPU-B
----- -----
ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
kvm_vm_ioctl_set_memory_region
kvm_set_memory_region
__kvm_set_memory_region
kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
kvm_invalidate_memslot
kvm_copy_memslot
kvm_replace_memslot
kvm_swap_active_memslots (A)
kvm_arch_flush_shadow_memslot (B)
same page sharing by KSM
kvm_mmu_notifier_invalidate_range_start
:
kvm_mmu_notifier_change_pte
kvm_handle_hva_range
__kvm_handle_hva_range
kvm_set_spte_gfn (C)
:
kvm_mmu_notifier_invalidate_range_end
Fix the issue by skipping the invalid memory slot at (C) to avoid the
illegal stage2 mapping so that the read request for the pflash's status
is forwarded to QEMU and emulated by it. In this way, the correct pflash's
status can be returned from QEMU to break the infinite loop in the edk2
firmware.
We tried a git-bisect and the first problematic commit is cd4c718352 ("
KVM: arm64: Convert to the gfn-based MMU notifier callbacks"). With this,
clean_dcache_guest_page() is called after the memory slots are iterated
in kvm_mmu_notifier_change_pte(). clean_dcache_guest_page() is called
before the iteration on the memory slots before this commit. This change
literally enlarges the racy window between kvm_mmu_notifier_change_pte()
and memory slot removal so that we're able to reproduce the issue in a
practical test case. However, the issue exists since commit d5d8184d35
("KVM: ARM: Memory virtualization setup").
Cc: stable@vger.kernel.org # v3.9+
Fixes: d5d8184d35 ("KVM: ARM: Memory virtualization setup")
Reported-by: Shuai Hu <hshuai@redhat.com>
Reported-by: Zhenyu Zhang <zhenyzha@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Message-Id: <20230615054259.14911-1-gshan@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There was regression caused by a97699d1d6 ("btrfs: replace
map_lookup->stripe_len by BTRFS_STRIPE_LEN") and supposedly fixed by
a7299a18a1 ("btrfs: fix u32 overflows when left shifting stripe_nr").
To avoid code churn the fix was open coding the type casts but
unfortunately missed one which was still possible to hit [1].
The missing place was assignment of bioc->full_stripe_logical inside
btrfs_map_block().
Fix it by adding a helper that does the safe calculation of the offset
and use it everywhere even though it may not be strictly necessary due
to already using u64 types. This replaces all remaining
"<< BTRFS_STRIPE_LEN_SHIFT" calls.
[1] https://lore.kernel.org/linux-btrfs/20230622065438.86402-1-wqu@suse.com/
Fixes: a7299a18a1 ("btrfs: fix u32 overflows when left shifting stripe_nr")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
This is v3, including a crash fix for patch 01/14.
The following patchset contains Netfilter/IPVS fixes for net:
1) Fix UDP segmentation with IPVS tunneled traffic, from Terin Stock.
2) Fix chain binding transaction logic, add a bound flag to rule
transactions. Remove incorrect logic in nft_data_hold() and
nft_data_release().
3) Add a NFT_TRANS_PREPARE_ERROR deactivate state to deal with releasing
the set/chain as a follow up to 1240eb93f0 ("netfilter: nf_tables:
incorrect error path handling with NFT_MSG_NEWRULE")
4) Drop map element references from preparation phase instead of
set destroy path, otherwise bogus EBUSY with transactions such as:
flush chain ip x y
delete chain ip x w
where chain ip x y contains jump/goto from set elements.
5) Pipapo set type does not regard generation mask from the walk
iteration.
6) Fix reference count underflow in set element reference to
stateful object.
7) Several patches to tighten the nf_tables API:
- disallow set element updates of bound anonymous set
- disallow unbound anonymous set/chain at the end of transaction.
- disallow updates of anonymous set.
- disallow timeout configuration for anonymous sets.
8) Fix module reference leak in chain updates.
9) Fix nfnetlink_osf module autoload.
10) Fix deletion of basechain when NFTA_CHAIN_HOOK is specified as
in iptables-nft.
This Netfilter batch is larger than usual at this stage, I am aware we
are fairly late in the -rc cycle, if you prefer to route them through
net-next, please let me know.
netfilter pull request 23-06-21
* tag 'nf-23-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: Fix for deleting base chains with payload
netfilter: nfnetlink_osf: fix module autoload
netfilter: nf_tables: drop module reference after updating chain
netfilter: nf_tables: disallow timeout for anonymous sets
netfilter: nf_tables: disallow updates of anonymous sets
netfilter: nf_tables: reject unbound chain set before commit phase
netfilter: nf_tables: reject unbound anonymous set before commit phase
netfilter: nf_tables: disallow element updates of bound anonymous sets
netfilter: nf_tables: fix underflow in object reference counter
netfilter: nft_set_pipapo: .walk does not deal with generations
netfilter: nf_tables: drop map element references from preparation phase
netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
netfilter: nf_tables: fix chain binding transaction logic
ipvs: align inner_mac_header for encapsulation
====================
Link: https://lore.kernel.org/r/20230621100731.68068-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This reverts commit 1f86123b97 ("net: align SO_RCVMARK required
privileges with SO_MARK") because the reasoning in the commit message
is not really correct:
SO_RCVMARK is used for 'reading' incoming skb mark (via cmsg), as such
it is more equivalent to 'getsockopt(SO_MARK)' which has no priv check
and retrieves the socket mark, rather than 'setsockopt(SO_MARK) which
sets the socket mark and does require privs.
Additionally incoming skb->mark may already be visible if
sysctl_fwmark_reflect and/or sysctl_tcp_fwmark_accept are enabled.
Furthermore, it is easier to block the getsockopt via bpf
(either cgroup setsockopt hook, or via syscall filters)
then to unblock it if it requires CAP_NET_RAW/ADMIN.
On Android the socket mark is (among other things) used to store
the network identifier a socket is bound to. Setting it is privileged,
but retrieving it is not. We'd like unprivileged userspace to be able
to read the network id of incoming packets (where mark is set via
iptables [to be moved to bpf])...
An alternative would be to add another sysctl to control whether
setting SO_RCVMARK is privilged or not.
(or even a MASK of which bits in the mark can be exposed)
But this seems like over-engineering...
Note: This is a non-trivial revert, due to later merged commit e42c7beee7
("bpf: net: Consider has_current_bpf_ctx() when testing capable() in sk_setsockopt()")
which changed both 'ns_capable' into 'sockopt_ns_capable' calls.
Fixes: 1f86123b97 ("net: align SO_RCVMARK required privileges with SO_MARK")
Cc: Larysa Zaremba <larysa.zaremba@intel.com>
Cc: Simon Horman <simon.horman@corigine.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Eyal Birger <eyal.birger@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Patrick Rohr <prohr@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230618103130.51628-1-maze@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When mirroring to a gretap in hardware the device expects to be
programmed with the egress port and all the encapsulating headers. This
requires the driver to resolve the path the packet will take in the
software data path and program the device accordingly.
If the path cannot be resolved (in this case because of an unresolved
neighbor), then mirror installation fails until the path is resolved.
This results in a race that causes the test to sometimes fail.
Fix this by setting the neighbor's state to permanent in a couple of
tests, so that it is always valid.
Fixes: 35c31d5c32 ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1d")
Fixes: 239e754af8 ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1q")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/268816ac729cb6028c7a34d4dda6f4ec7af55333.1687264607.git.petrm@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Matthieu Baerts says:
====================
mptcp: fixes for 6.4
Patch 1 correctly handles disconnect() failures that can happen in some
specific cases: now the socket state is set as unconnected as expected.
That fixes an issue introduced in v6.2.
Patch 2 fixes a divide by zero bug in mptcp_recvmsg() with a fix similar
to a recent one from Eric Dumazet for TCP introducing sk_wait_pending
flag. It should address an issue present in MPTCP from almost the
beginning, from v5.9.
Patch 3 fixes a possible list corruption on passive MPJ even if the race
seems very unlikely, better be safe than sorry. The possible issue is
present from v5.17.
Patch 4 consolidates fallback and non fallback state machines to avoid
leaking some MPTCP sockets. The fix is likely needed for versions from
v5.11.
Patch 5 drops code that is no longer used after the introduction of
patch 4/6. This is not really a fix but this patch can probably land in
the -net tree as well not to leave unused code.
Patch 6 ensures listeners are unhashed before updating their sk status
to avoid possible deadlocks when diag info are going to be retrieved
with a lock. Even if it should not be visible with the way we are
currently getting diag info, the issue is present from v5.17.
====================
Link: https://lore.kernel.org/r/20230620-upstream-net-20230620-misc-fixes-for-v6-4-v1-0-f36aa5eae8b9@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The MPTCP protocol access the listener subflow in a lockless
manner in a couple of places (poll, diag). That works only if
the msk itself leaves the listener status only after that the
subflow itself has been closed/disconnected. Otherwise we risk
deadlock in diag, as reported by Christoph.
Address the issue ensuring that the first subflow (the listener
one) is always disconnected before updating the msk socket status.
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/407
Fixes: b29fcfb54c ("mptcp: full disconnect implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thanks to the previous patch -- "mptcp: consolidate fallback and non
fallback state machine" -- we can finally drop the "temporary hack"
used to detect rx eof.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
An orphaned msk releases the used resources via the worker,
when the latter first see the msk in CLOSED status.
If the msk status transitions to TCP_CLOSE in the release callback
invoked by the worker's final release_sock(), such instance of the
workqueue will not take any action.
Additionally the MPTCP code prevents scheduling the worker once the
socket reaches the CLOSE status: such msk resources will be leaked.
The only code path that can trigger the above scenario is the
__mptcp_check_send_data_fin() in fallback mode.
Address the issue removing the special handling of fallback socket
in __mptcp_check_send_data_fin(), consolidating the state machine
for fallback and non fallback socket.
Since non-fallback sockets do not send and do not receive data_fin,
the mptcp code can update the msk internal status to match the next
step in the SM every time data fin (ack) should be generated or
received.
As a consequence we can remove a bunch of checks for fallback from
the fastpath.
Fixes: 6e628cd3a8 ("mptcp: use mptcp release_cb for delayed tasks")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
At passive MPJ time, if the msk socket lock is held by the user,
the new subflow is appended to the msk->join_list under the msk
data lock.
In mptcp_release_cb()/__mptcp_flush_join_list(), the subflows in
that list are moved from the join_list into the conn_list under the
msk socket lock.
Append and removal could race, possibly corrupting such list.
Address the issue splicing the join list into a temporary one while
still under the msk data lock.
Found by code inspection, the race itself should be almost impossible
to trigger in practice.
Fixes: 3e5014909b ("mptcp: cleanup MPJ subflow list handling")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently the mptcp code has assumes that disconnect() can fail only
at mptcp_sendmsg_fastopen() time - to avoid a deadlock scenario - and
don't even bother returning an error code.
Soon mptcp_disconnect() will handle more error conditions: let's track
them explicitly.
As a bonus, explicitly annotate TCP-level disconnect as not failing:
the mptcp code never blocks for event on the subflows.
Fixes: 7d803344fd ("mptcp: fix deadlock in fastopen error path")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf 2023-06-21
We've added 7 non-merge commits during the last 14 day(s) which contain
a total of 7 files changed, 181 insertions(+), 15 deletions(-).
The main changes are:
1) Fix a verifier id tracking issue with scalars upon spill,
from Maxim Mikityanskiy.
2) Fix NULL dereference if an exception is generated while a BPF
subprogram is running, from Krister Johansen.
3) Fix a BTF verification failure when compiling kernel with LLVM_IAS=0,
from Florent Revest.
4) Fix expected_attach_type enforcement for kprobe_multi link,
from Jiri Olsa.
5) Fix a bpf_jit_dump issue for x86_64 to pick the correct JITed image,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Force kprobe multi expected_attach_type for kprobe_multi link
bpf/btf: Accept function names that contain dots
selftests/bpf: add a test for subprogram extables
bpf: ensure main program has an extable
bpf: Fix a bpf_jit_dump issue for x86_64 with sysctl bpf_jit_enable.
selftests/bpf: Add test cases to assert proper ID tracking on spill
bpf: Fix verifier id tracking of scalars on spill
====================
Link: https://lore.kernel.org/r/20230621101116.16122-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull timer fix from Thomas Gleixner:
"A single regression fix for a regression fix:
For a long time the tick was aligned to clock MONOTONIC so that the
tick event happened at a multiple of nanoseconds per tick starting
from clock MONOTONIC = 0.
At some point this changed as the refined jiffies clocksource which is
used during boot before the TSC or other clocksources becomes usable,
was adjusted with a boot offset, so that time 0 is closer to the point
where the kernel starts.
This broke the assumption in the tick code that when the tick setup
happens early on ktime_get() will return a multiple of nanoseconds per
tick. As a consequence applications which aligned their periodic
execution so that it does not collide with the tick were not longer
guaranteed that the tick period starts from time 0.
The fix for this regression was to realign the tick when it is
initially set up to a multiple of tick periods. That works as long as
the underlying tick device supports periodic mode, but breaks under
certain conditions when the tick device supports only one shot mode.
Depending on the offset, the alignment delta to clock MONOTONIC can
get in a range where the minimal programming delta of the underlying
clock event device is larger than the calculated delta to the next
tick. This results in a boot hang as the tick code tries to play catch
up, but as the tick never fires jiffies are not advanced so it keeps
trying for ever.
Solve this by moving the tick alignement into the NOHZ / HIGHRES
enablement code because at that point it is guaranteed that the
underlying clocksource is high resolution capable and not longer
depending on the tick.
This is far before user space starts, so at the point where
applications try to align their timers, the old behaviour of the tick
happening at a multiple of nanoseconds per tick starting from clock
MONOTONIC = 0 is restored"
* tag 'timers-urgent-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/common: Align tick period during sched_timer setup
Pull virtio fix from Michael Tsirkin:
"A last minute revert to fix a regression"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
Revert "virtio-blk: support completion batching for the IRQ path"
Pull spi fix from Mark Brown:
"One last fix for SPI, just a simple fix for incorrect handling of
probe deferral for DMA in the Qualcomm GENI driver"
* tag 'spi-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-geni-qcom: correctly handle -EPROBE_DEFER from dma_request_chan()
Pull regulator fix from Mark Brown:
"One simple fix for v6.4, some incorrectly specified bitfield masks in
the PCA9450 driver"
* tag 'regulator-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: pca9450: Fix LDO3OUT and LDO4OUT MASK
Pull regmap fix from Mark Brown:
"One more fix for v6.4
The earlier fix to take account of the register data size when
limiting raw register writes exposed the fact that the Intel AVMM bus
was incorrectly specifying too low a limit on the maximum data
transfer, it is only capable of transmitting one register so had set a
transfer size limit that couldn't fit both the value and the the
register address into a single message"
* tag 'regmap-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: spi-avmm: Fix regmap_bus max_raw_write
Rather than assign the user pointer to msghdr->msg_control, assign it
to msghdr->msg_control_user to make sparse happy. They are in a union
so the end result is the same, but let's avoid new sparse warnings and
squash this one.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306210654.mDMcyMuB-lkp@intel.com/
Fixes: cac9e4418f ("io_uring/net: save msghdr->msg_control for retries")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We cannot sanely handle partial retries for recvmsg if we have cmsg
attached. If we don't, then we'd just be overwriting the initial cmsg
header on retries. Alternatively we could increment and handle this
appropriately, but it doesn't seem worth the complication.
Move the MSG_WAITALL check into the non-multishot case while at it,
since MSG_WAITALL is explicitly disabled for multishot anyway.
Link: https://lore.kernel.org/io-uring/0b0d4411-c8fd-4272-770b-e030af6919a0@kernel.dk/
Cc: stable@vger.kernel.org # 5.10+
Reported-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we have cmsg attached AND we transferred partial data at least, clear
msg_controllen on retry so we don't attempt to send that again.
Cc: stable@vger.kernel.org # 5.10+
Fixes: cac9e4418f ("io_uring/net: save msghdr->msg_control for retries")
Reported-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Adds the required quirk to enable the Cirrus amp and correct pins
on the ASUS ROG GV601V series.
While this works if the related _DSD properties are made available, these
aren't included in the ACPI of these laptops (yet).
Signed-off-by: Luke D. Jones <luke@ljones.dev>
Link: https://lore.kernel.org/r/20230621085715.5382-1-luke@ljones.dev
Signed-off-by: Takashi Iwai <tiwai@suse.de>
We currently allow to create perf link for program with
expected_attach_type == BPF_TRACE_KPROBE_MULTI.
This will cause crash when we call helpers like get_attach_cookie or
get_func_ip in such program, because it will call the kprobe_multi's
version (current->bpf_ctx context setup) of those helpers while it
expects perf_link's current->bpf_ctx context setup.
Making sure that we use BPF_TRACE_KPROBE_MULTI expected_attach_type
only for programs attaching through kprobe_multi link.
Fixes: ca74823c6e ("bpf: Add cookie support to programs attached with kprobe multi link")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230618131414.75649-1-jolsa@kernel.org
When building a kernel with LLVM=1, LLVM_IAS=0 and CONFIG_KASAN=y, LLVM
leaves DWARF tags for the "asan.module_ctor" & co symbols. In turn,
pahole creates BTF_KIND_FUNC entries for these and this makes the BTF
metadata validation fail because they contain a dot.
In a dramatic turn of event, this BTF verification failure can cause
the netfilter_bpf initialization to fail, causing netfilter_core to
free the netfilter_helper hashmap and netfilter_ftp to trigger a
use-after-free. The risk of u-a-f in netfilter will be addressed
separately but the existence of "asan.module_ctor" debug info under some
build conditions sounds like a good enough reason to accept functions
that contain dots in BTF.
Although using only LLVM=1 is the recommended way to compile clang-based
kernels, users can certainly do LLVM=1, LLVM_IAS=0 as well and we still
try to support that combination according to Nick. To clarify:
- > v5.10 kernel, LLVM=1 (LLVM_IAS=0 is not the default) is recommended,
but user can still have LLVM=1, LLVM_IAS=0 to trigger the issue
- <= 5.10 kernel, LLVM=1 (LLVM_IAS=0 is the default) is recommended in
which case GNU as will be used
Fixes: 1dc9285184 ("bpf: kernel side support for BTF Var and DataSec")
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Yonghong Song <yhs@meta.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/bpf/20230615145607.3469985-1-revest@chromium.org
This reverts commit 07b679f70d.
This change appears to have broken things...
We now see applications hanging during disk accesses.
e.g.
multi-port virtio-blk device running in h/w (FPGA)
Host running a simple 'fio' test.
[global]
thread=1
direct=1
ioengine=libaio
norandommap=1
group_reporting=1
bs=4K
rw=read
iodepth=128
runtime=1
numjobs=4
time_based
[job0]
filename=/dev/vda
[job1]
filename=/dev/vdb
[job2]
filename=/dev/vdc
...
[job15]
filename=/dev/vdp
i.e. 16 disks; 4 queues per disk; simple burst of 4KB reads
This is repeatedly run in a loop.
After a few, normally <10 seconds, fio hangs.
With 64 queues (16 disks), failure occurs within a few seconds; with 8 queues (2 disks) it may take ~hour before hanging.
Last message:
fio-3.19
Starting 8 threads
Jobs: 1 (f=1): [_(7),R(1)][68.3%][eta 03h:11m:06s]
I think this means at the end of the run 1 queue was left incomplete.
'diskstats' (run while fio is hung) shows no outstanding transactions.
e.g.
$ cat /proc/diskstats
...
252 0 vda 1843140071 0 14745120568 712568645 0 0 0 0 0 3117947 712568645 0 0 0 0 0 0
252 16 vdb 1816291511 0 14530332088 704905623 0 0 0 0 0 3117711 704905623 0 0 0 0 0 0
...
Other stats (in the h/w, and added to the virtio-blk driver ([a]virtio_queue_rq(), [b]virtblk_handle_req(), [c]virtblk_request_done()) all agree, and show every request had a completion, and that virtblk_request_done() never gets called.
e.g.
PF= 0 vq=0 1 2 3
[a]request_count - 839416590 813148916 105586179 84988123
[b]completion1_count - 839416590 813148916 105586179 84988123
[c]completion2_count - 0 0 0 0
PF= 1 vq=0 1 2 3
[a]request_count - 823335887 812516140 104582672 75856549
[b]completion1_count - 823335887 812516140 104582672 75856549
[c]completion2_count - 0 0 0 0
i.e. the issue is after the virtio-blk driver.
This change was introduced in kernel 6.3.0.
I am seeing this using 6.3.3.
If I run with an earlier kernel (5.15), it does not occur.
If I make a simple patch to the 6.3.3 virtio-blk driver, to skip the blk_mq_add_to_batch()call, it does not fail.
e.g.
kernel 5.15 - this is OK
virtio_blk.c,virtblk_done() [irq handler]
if (likely(!blk_should_fake_timeout(req->q))) {
blk_mq_complete_request(req);
}
kernel 6.3.3 - this fails
virtio_blk.c,virtblk_handle_req() [irq handler]
if (likely(!blk_should_fake_timeout(req->q))) {
if (!blk_mq_complete_request_remote(req)) {
if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed
}
}
}
If I do, kernel 6.3.3 - this is OK
virtio_blk.c,virtblk_handle_req() [irq handler]
if (likely(!blk_should_fake_timeout(req->q))) {
if (!blk_mq_complete_request_remote(req)) {
virtblk_request_done(req); //force this here...
if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed
}
}
}
Perhaps you might like to fix/test/revert this change...
Martin
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306090826.C1fZmdMe-lkp@intel.com/
Cc: Suwan Kim <suwan.kim027@gmail.com>
Tested-by: edliaw@google.com
Reported-by: "Roberts, Martin" <martin.roberts@intel.com>
Message-Id: <336455b4f630f329380a8f53ee8cad3868764d5c.1686295549.git.mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Pull hotfixes from Andrew Morton:
"19 hotfixes. 8 of these are cc:stable.
This includes a wholesale reversion of the post-6.4 series 'make slab
shrink lockless'. After input from Dave Chinner it has been decided
that we should go a different way [1]"
Link: https://lkml.kernel.org/r/ZH6K0McWBeCjaf16@dread.disaster.area [1]
* tag 'mm-hotfixes-stable-2023-06-20-12-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
selftests/mm: fix cross compilation with LLVM
mailmap: add entries for Ben Dooks
nilfs2: prevent general protection fault in nilfs_clear_dirty_page()
Revert "mm: vmscan: make global slab shrink lockless"
Revert "mm: vmscan: make memcg slab shrink lockless"
Revert "mm: vmscan: add shrinker_srcu_generation"
Revert "mm: shrinkers: make count and scan in shrinker debugfs lockless"
Revert "mm: vmscan: hold write lock to reparent shrinker nr_deferred"
Revert "mm: vmscan: remove shrinker_rwsem from synchronize_shrinkers()"
Revert "mm: shrinkers: convert shrinker_rwsem to mutex"
nilfs2: fix buffer corruption due to concurrent device reads
scripts/gdb: fix SB_* constants parsing
scripts: fix the gfp flags header path in gfp-translate
udmabuf: revert 'Add support for mapping hugepages (v4)'
mm/khugepaged: fix iteration in collapse_file
memfd: check for non-NULL file_seals in memfd_create() syscall
mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
mm/mprotect: fix do_mprotect_pkey() limit check
writeback: fix dereferencing NULL mapping->host on writeback_page_template
Pull ACPI fix from Rafael Wysocki:
"Fix a kernel crash during early resume from ACPI S3 that has been
present since the 5.15 cycle when might_sleep() was added to
down_timeout(), which in some configurations of the kernel caused an
implicit preemption point to trigger at a wrong time"
* tag 'acpi-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: sleep: Avoid breaking S3 wakeup due to might_sleep()
Pull thermal control fix from Rafael Wysocki:
"Fix a regression introduced during the 6.3 cycle causing
intel_soc_dts_iosf to report incorrect temperature values
due to a coding mistake (Hans de Goede)"
* tag 'thermal-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/intel/intel_soc_dts_iosf: Fix reporting wrong temperatures
Pull tracing fixes from Steven Rostedt:
- Fix MAINTAINERS file to point to proper mailing list for rtla and rv
The mailing list pointed to linux-trace-devel instead of
linux-trace-kernel. The former is for the tracing libraries and the
latter is for anything in the Linux kernel tree. The wrong mailing
list was used because linux-trace-kernel did not exist when rtla and
rv were created.
- User events:
- Fix matching of dynamic events to their user events
When user writes to dynamic_events file, a lookup of the
registered dynamic events is made, but there were some cases that
a match could be incorrectly made.
- Add auto cleanup of user events
Have the user events automatically get removed when the last
reference (file descriptor) is closed. This was asked for to
prevent leaks of user events hanging around needing admins to
clean them up.
- Add persistent logic (but not let user space use it yet)
In some cases, having a persistent user event (one that does not
get cleaned up automatically) is useful. But there's still debates
about how to expose this to user space. The infrastructure is
added, but the API is not.
- Update the selftests
Update the user event selftests to reflect the above changes"
* tag 'trace-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/user_events: Document auto-cleanup and remove dyn_event refs
selftests/user_events: Adapt dyn_test to non-persist events
selftests/user_events: Ensure auto cleanup works as expected
tracing/user_events: Add auto cleanup and future persist flag
tracing/user_events: Track refcount consistently via put/get
tracing/user_events: Store register flags on events
tracing/user_events: Remove user_ns walk for groups
selftests/user_events: Add perf self-test for empty arguments events
selftests/user_events: Clear the events after perf self-test
selftests/user_events: Add ftrace self-test for empty arguments events
tracing/user_events: Fix the incorrect trace record for empty arguments events
tracing: Modify print_fields() for fields output order
tracing/user_events: Handle matching arguments that is null from dyn_events
tracing/user_events: Prevent same name but different args event
tracing/rv/rtla: Update MAINTAINERS file to point to proper mailing list
Pull btrfs fix from David Sterba:
"One more regression fix for an assertion failure that uncovered a
nasty problem with stripe calculations. This is caused by a u32
overflow when there are enough devices. The fstests require 6 so this
hasn't been caught, I was able to hit it with 8.
The fix is minimal and only adds u64 casts, we'll clean that up later.
I did various additional tests to be sure"
* tag 'for-6.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix u32 overflows when left shifting stripe_nr
When deleting a base chain, iptables-nft simply submits the whole chain
to the kernel, including the NFTA_CHAIN_HOOK attribute. The new code
added by fixed commit then turned this into a chain update, destroying
the hook but not the chain itself. Detect the situation by checking if
the chain type is either netdev or inet/ingress.
Fixes: 7d937b1071 ("netfilter: nf_tables: support for deleting devices in an existing netdev chain")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Move the alias from xt_osf to nfnetlink_osf.
Fixes: f932495208 ("netfilter: nfnetlink_osf: extract nfnetlink_subsystem code from xt_osf.c")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Otherwise the module reference counter is leaked.
Fixes b9703ed44f ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Disallow updates of set timeout and garbage collection parameters for
anonymous sets.
Fixes: 123b99619c ("netfilter: nf_tables: honor set timeout and garbage collection updates")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use binding list to track set transaction and to check for unbound
chains before entering the commit phase.
Bail out if chain binding remain unused before entering the commit
step.
Fixes: d0e2c7de92 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a new list to track set transaction and to check for unbound
anonymous sets before entering the commit phase.
Bail out at the end of the transaction handling if an anonymous set
remains unbound.
Fixes: 96518518cc ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Anonymous sets come with NFT_SET_CONSTANT from userspace. Although API
allows to create anonymous sets without NFT_SET_CONSTANT, it makes no
sense to allow to add and to delete elements for bound anonymous sets.
Fixes: 96518518cc ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since ("netfilter: nf_tables: drop map element references from
preparation phase"), integration with commit protocol is better,
therefore drop the workaround that b91d903688 ("netfilter: nf_tables:
fix leaking object reference count") provides.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The .walk callback iterates over the current active set, but it might be
useful to iterate over the next generation set. Use the generation mask
to determine what set view (either current or next generation) is use
for the walk iteration.
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
set .destroy callback releases the references to other objects in maps.
This is very late and it results in spurious EBUSY errors. Drop refcount
from the preparation phase instead, update set backend not to drop
reference counter from set .destroy path.
Exceptions: NFT_TRANS_PREPARE_ERROR does not require to drop the
reference counter because the transaction abort path releases the map
references for each element since the set is unbound. The abort path
also deals with releasing reference counter for new elements added to
unbound sets.
Fixes: 591054469b ("netfilter: nf_tables: revisit chain/object refcounting from elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a new state to deal with rule expressions deactivation from the
newrule error path, otherwise the anonymous set remains in the list in
inactive state for the next generation. Mark the set/chain transaction
as unbound so the abort path releases this object, set it as inactive in
the next generation so it is not reachable anymore from this transaction
and reference counter is dropped.
Fixes: 1240eb93f0 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add bound flag to rule and chain transactions as in 6a0a8d10a3
("netfilter: nf_tables: use-after-free in failing rule with bound set")
to skip them in case that the chain is already bound from the abort
path.
This patch fixes an imbalance in the chain use refcnt that triggers a
WARN_ON on the table and chain destroy path.
This patch also disallows nested chain bindings, which is not
supported from userspace.
The logic to deal with chain binding in nft_data_hold() and
nft_data_release() is not correct. The NFT_TRANS_PREPARE state needs a
special handling in case a chain is bound but next expressions in the
same rule fail to initialize as described by 1240eb93f0 ("netfilter:
nf_tables: incorrect error path handling with NFT_MSG_NEWRULE").
The chain is left bound if rule construction fails, so the objects
stored in this chain (and the chain itself) are released by the
transaction records from the abort path, follow up patch ("netfilter:
nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain")
completes this error handling.
When deleting an existing rule, chain bound flag is set off so the
rule expression .destroy path releases the objects.
Fixes: d0e2c7de92 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The max_raw_write member of the regmap_spi_avmm_bus structure is defined
as:
.max_raw_write = SPI_AVMM_VAL_SIZE * MAX_WRITE_CNT
SPI_AVMM_VAL_SIZE == 4 and MAX_WRITE_CNT == 1 so this results in a
maximum write transfer size of 4 bytes which provides only enough space to
transfer the address of the target register. It provides no space for the
value to be transferred. This bug became an issue (divide-by-zero in
_regmap_raw_write()) after the following was accepted into mainline:
commit 3981514180 ("regmap: Account for register length when chunking")
Change max_raw_write to include space (4 additional bytes) for both the
register address and value:
.max_raw_write = SPI_AVMM_REG_SIZE + SPI_AVMM_VAL_SIZE * MAX_WRITE_CNT
Fixes: 7f9fb67358 ("regmap: add Intel SPI Slave to AVMM Bus Bridge support")
Reviewed-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
Signed-off-by: Russ Weight <russell.h.weight@intel.com>
Link: https://lore.kernel.org/r/20230620202824.380313-1-russell.h.weight@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
We have seen a bug where the NIC incorrectly changes the length in the
IP header of a padded packet to include the padding bytes. The driver
already has a workaround for this so do the workaround for this NIC too.
This resolves the issue.
The NIC in question identifies itself as follows:
[ 8.828494] be2net 0000:02:00.0: FW version is 10.7.110.31
[ 8.834759] be2net 0000:02:00.0: Emulex OneConnect(be3): PF FLEX10 port 1
02:00.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01)
Fixes: ca34fe38f0 ("be2net: fix wrong usage of adapter->generation")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Link: https://lore.kernel.org/r/20230616164549.2863037-1-ross.lagerwall@citrix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull smb server fixes from Steve French:
"Four smb3 server fixes, all also for stable:
- fix potential oops in parsing compounded requests
- fix various paths (mkdir, create etc) where mnt_want_write was not
checked first
- fix slab out of bounds in check_message and write"
* tag '6.4-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: validate session id and tree id in the compound request
ksmbd: fix out-of-bound read in smb2_write
ksmbd: add mnt_want_write to ksmbd vfs functions
ksmbd: validate command payload size
[BUG]
David reported an ASSERT() get triggered during fio load on 8 devices
with data/raid6 and metadata/raid1c3:
fio --rw=randrw --randrepeat=1 --size=3000m \
--bsrange=512b-64k --bs_unaligned \
--ioengine=libaio --fsync=1024 \
--name=job0 --name=job1 \
The ASSERT() is from rbio_add_bio() of raid56.c:
ASSERT(orig_logical >= full_stripe_start &&
orig_logical + orig_len <= full_stripe_start +
rbio->nr_data * BTRFS_STRIPE_LEN);
Which is checking if the target rbio is crossing the full stripe
boundary.
[100.789] assertion failed: orig_logical >= full_stripe_start && orig_logical + orig_len <= full_stripe_start + rbio->nr_data * BTRFS_STRIPE_LEN, in fs/btrfs/raid56.c:1622
[100.795] ------------[ cut here ]------------
[100.796] kernel BUG at fs/btrfs/raid56.c:1622!
[100.797] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[100.798] CPU: 1 PID: 100 Comm: kworker/u8:4 Not tainted 6.4.0-rc6-default+ #124
[100.799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
[100.802] Workqueue: writeback wb_workfn (flush-btrfs-1)
[100.803] RIP: 0010:rbio_add_bio+0x204/0x210 [btrfs]
[100.806] RSP: 0018:ffff888104a8f300 EFLAGS: 00010246
[100.808] RAX: 00000000000000a1 RBX: ffff8881075907e0 RCX: ffffed1020951e01
[100.809] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000001
[100.811] RBP: 0000000141d20000 R08: 0000000000000001 R09: ffff888104a8f04f
[100.813] R10: ffffed1020951e09 R11: 0000000000000003 R12: ffff88810e87f400
[100.815] R13: 0000000041d20000 R14: 0000000144529000 R15: ffff888101524000
[100.817] FS: 0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
[100.821] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[100.822] CR2: 000055d54e44c270 CR3: 000000010a9a1006 CR4: 00000000003706a0
[100.824] Call Trace:
[100.825] <TASK>
[100.825] ? die+0x32/0x80
[100.826] ? do_trap+0x12d/0x160
[100.827] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.827] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.829] ? do_error_trap+0x90/0x130
[100.830] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.831] ? handle_invalid_op+0x2c/0x30
[100.833] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.835] ? exc_invalid_op+0x29/0x40
[100.836] ? asm_exc_invalid_op+0x16/0x20
[100.837] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.837] raid56_parity_write+0x64/0x270 [btrfs]
[100.838] btrfs_submit_chunk+0x26e/0x800 [btrfs]
[100.840] ? btrfs_bio_init+0x80/0x80 [btrfs]
[100.841] ? release_pages+0x503/0x6d0
[100.842] ? folio_unlock+0x2f/0x60
[100.844] ? __folio_put+0x60/0x60
[100.845] ? btrfs_do_readpage+0xae0/0xae0 [btrfs]
[100.847] btrfs_submit_bio+0x21/0x60 [btrfs]
[100.847] submit_one_bio+0x6a/0xb0 [btrfs]
[100.849] extent_write_cache_pages+0x395/0x680 [btrfs]
[100.850] ? __extent_writepage+0x520/0x520 [btrfs]
[100.851] ? mark_usage+0x190/0x190
[100.852] extent_writepages+0xdb/0x130 [btrfs]
[100.853] ? extent_write_locked_range+0x480/0x480 [btrfs]
[100.854] ? mark_usage+0x190/0x190
[100.854] ? attach_extent_buffer_page+0x220/0x220 [btrfs]
[100.855] ? reacquire_held_locks+0x178/0x280
[100.856] ? writeback_sb_inodes+0x245/0x7f0
[100.857] do_writepages+0x102/0x2e0
[100.858] ? page_writeback_cpu_online+0x10/0x10
[100.859] ? __lock_release.isra.0+0x14a/0x4d0
[100.860] ? reacquire_held_locks+0x280/0x280
[100.861] ? __lock_acquired+0x1e9/0x3d0
[100.862] ? do_raw_spin_lock+0x1b0/0x1b0
[100.863] __writeback_single_inode+0x94/0x450
[100.864] writeback_sb_inodes+0x372/0x7f0
[100.864] ? lock_sync+0xd0/0xd0
[100.865] ? do_raw_spin_unlock+0x93/0xf0
[100.866] ? sync_inode_metadata+0xc0/0xc0
[100.867] ? rwsem_optimistic_spin+0x340/0x340
[100.868] __writeback_inodes_wb+0x70/0x130
[100.869] wb_writeback+0x2d1/0x530
[100.869] ? __writeback_inodes_wb+0x130/0x130
[100.870] ? lockdep_hardirqs_on_prepare.part.0+0xf1/0x1c0
[100.870] wb_do_writeback+0x3eb/0x480
[100.871] ? wb_writeback+0x530/0x530
[100.871] ? mark_lock_irq+0xcd0/0xcd0
[100.872] wb_workfn+0xe0/0x3f0<
[CAUSE]
Commit a97699d1d6 ("btrfs: replace map_lookup->stripe_len by
BTRFS_STRIPE_LEN") changes how we calculate the map length, to reduce
u64 division.
Function btrfs_max_io_len() is to get the length to the stripe boundary.
It calculates the full stripe start offset (inside the chunk) by the
following code:
*full_stripe_start =
rounddown(*stripe_nr, nr_data_stripes(map)) <<
BTRFS_STRIPE_LEN_SHIFT;
The calculation itself is fine, but the value returned by rounddown() is
dependent on both @stripe_nr (which is u32) and nr_data_stripes() (which
returned int).
Thus the result is also u32, then we do the left shift, which can
overflow u32.
If such overflow happens, @full_stripe_start will be a value way smaller
than @offset, causing later "full_stripe_len - (offset -
*full_stripe_start)" to underflow, thus make later length calculation to
have no stripe boundary limit, resulting a write bio to exceed stripe
boundary.
There are some other locations like this, with a u32 @stripe_nr got left
shift, which can lead to a similar overflow.
[FIX]
Fix all @stripe_nr with left shift with a type cast to u64 before the
left shift.
Those involved @stripe_nr or similar variables are recording the stripe
number inside the chunk, which is small enough to be contained by u32,
but their offset inside the chunk can not fit into u32.
Thus for those specific left shifts, a type cast to u64 is necessary so
this patch does not touch them and the code will be cleaned up in the
future to keep the fix minimal.
Reported-by: David Sterba <dsterba@suse.com>
Fixes: a97699d1d6 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN")
Tested-by: David Sterba <dsterba@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Arınç ÜNAL says:
====================
net: dsa: mt7530: fix multiple CPU ports, BPDU and LLDP handling
This patch series fixes all non-theoretical issues regarding multiple CPU
ports and the handling of LLDP frames and BPDUs.
I am adding me as a maintainer, I've got some code improvements on the way.
I will keep an eye on this driver and the patches submitted for it in the
future.
Arınç
v6:
- Change a small portion of the comment in the diff on "net: dsa: mt7530:
set all CPU ports in MT7531_CPU_PMAP" with Russell's suggestion.
- Change the patch log of "net: dsa: mt7530: fix trapping frames on
non-MT7621 SoC MT7530 switch" with Vladimir's suggestion.
- Group the code for trapping frames into a common function and call that.
- Add Vladimir and Russell's reviewed-by tags to where they're given.
v5:
- Change the comment in the diff on the first patch with Russell's words.
- Change the patch log of the first patch to state that the patch is just
preparatory work for change "net: dsa: introduce
preferred_default_local_cpu_port and use on MT7530" and not a fix to an
existing problem on the code base.
- Remove the "net: dsa: mt7530: fix trapping frames with multiple CPU ports
on MT7530" patch. It fixes a theoretical issue, therefore it is net-next
material.
- Remove unnecessary information from the patch logs. Remove the enum
renaming change.
- Strengthen the point of the "net: dsa: introduce
preferred_default_local_cpu_port and use on MT7530" patch.
v4: Make the patch logs and my comments in the code easier to understand.
v3: Fix the from header on the patches. Write a cover letter.
v2: Add patches to fix the handling of LLDP frames and BPDUs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the introduction of the OF bindings, DSA has always had a policy that
in case multiple CPU ports are present in the device tree, the numerically
smallest one is always chosen.
The MT7530 switch family, except the switch on the MT7988 SoC, has 2 CPU
ports, 5 and 6, where port 6 is preferable on the MT7531BE switch because
it has higher bandwidth.
The MT7530 driver developers had 3 options:
- to modify DSA when the MT7531 switch support was introduced, such as to
prefer the better port
- to declare both CPU ports in device trees as CPU ports, and live with the
sub-optimal performance resulting from not preferring the better port
- to declare just port 6 in the device tree as a CPU port
Of course they chose the path of least resistance (3rd option), kicking the
can down the road. The hardware description in the device tree is supposed
to be stable - developers are not supposed to adopt the strategy of
piecemeal hardware description, where the device tree is updated in
lockstep with the features that the kernel currently supports.
Now, as a result of the fact that they did that, any attempts to modify the
device tree and describe both CPU ports as CPU ports would make DSA change
its default selection from port 6 to 5, effectively resulting in a
performance degradation visible to users with the MT7531BE switch as can be
seen below.
Without preferring port 6:
[ ID][Role] Interval Transfer Bitrate Retr
[ 5][TX-C] 0.00-20.00 sec 374 MBytes 157 Mbits/sec 734 sender
[ 5][TX-C] 0.00-20.00 sec 373 MBytes 156 Mbits/sec receiver
[ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 778 Mbits/sec 0 sender
[ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 777 Mbits/sec receiver
With preferring port 6:
[ ID][Role] Interval Transfer Bitrate Retr
[ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 856 Mbits/sec 273 sender
[ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 855 Mbits/sec receiver
[ 7][RX-C] 0.00-20.00 sec 1.72 GBytes 737 Mbits/sec 15 sender
[ 7][RX-C] 0.00-20.00 sec 1.71 GBytes 736 Mbits/sec receiver
Using one port for WAN and the other ports for LAN is a very popular use
case which is what this test emulates.
As such, this change proposes that we retroactively modify stable kernels
(which don't support the modification of the CPU port assignments, so as to
let user space fix the problem and restore the throughput) to keep the
mt7530 driver preferring port 6 even with device trees where the hardware
is more fully described.
Fixes: c288575f78 ("net: dsa: mt7530: Add the support of MT7531 switch")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LLDP frames are link-local frames, therefore they must be trapped to the
CPU port. Currently, the MT753X switches treat LLDP frames as regular
multicast frames, therefore flooding them to user ports. To fix this, set
LLDP frames to be trapped to the CPU port(s).
Fixes: b8f126a8d5 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BPDUs are link-local frames, therefore they must be trapped to the CPU
port. Currently, the MT7530 switch treats BPDUs as regular multicast
frames, therefore flooding them to user ports. To fix this, set BPDUs to be
trapped to the CPU port. Group this on mt7530_setup() and
mt7531_setup_common() into mt753x_trap_frames() and call that.
Fixes: b8f126a8d5 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All MT7530 switch IP variants share the MT7530_MFC register, but the
current driver only writes it for the switch variant that is integrated in
the MT7621 SoC. Modify the code to include all MT7530 derivatives.
Fixes: b8f126a8d5 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MT7531_CPU_PMAP represents the destination port mask for trapped-to-CPU
frames (further restricted by PCR_MATRIX).
Currently the driver sets the first CPU port as the single port in this bit
mask, which works fine regardless of whether the device tree defines port
5, 6 or 5+6 as CPU ports. This is because the logic coincides with DSA's
logic of picking the first CPU port as the CPU port that all user ports are
affine to, by default.
An upcoming change would like to influence DSA's selection of the default
CPU port to no longer be the first one, and in that case, this logic needs
adaptation.
Since there is no observed leakage or duplication of frames if all CPU
ports are defined in this bit mask, simply include them all.
Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk>
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Layerscape MACs support 25Gbps network speed with dpmac "CAUI" mode.
Add the mappings between DPMAC_ETH_IF_* and HY_INTERFACE_MODE_*, as well
as the 25000 mac capability.
Tested on SolidRun LX2162a Clearfog, serdes 1 protocol 18.
Signed-off-by: Josua Mayer <josua@solid-run.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Schmidt says:
====================
An update from ieee802154 for your *net* tree:
Two small fixes and MAINTAINERS update this time.
Azeem Shaikh ensured consistent use of strscpy through the tree and fixed
the usage in our trace.h.
Chen Aotian fixed a potential memory leak in the hwsim simulator for
ieee802154.
Miquel Raynal updated the MAINATINERS file with the new team git tree
locations and patchwork URLs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull hyperv fixes from Wei Liu:
- Fix races in Hyper-V PCI controller (Dexuan Cui)
- Fix handling of hyperv_pcpu_input_arg (Michael Kelley)
- Fix vmbus_wait_for_unload to scan present CPUs (Michael Kelley)
- Call hv_synic_free in the failure path of hv_synic_alloc (Dexuan Cui)
- Add noop for real mode handlers for virtual trust level code (Saurabh
Sengar)
* tag 'hyperv-fixes-signed-20230619' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
PCI: hv: Add a per-bus mutex state_lock
Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
PCI: hv: Fix a race condition bug in hv_pci_query_relations()
arm64/hyperv: Use CPUHP_AP_HYPERV_ONLINE state to fix CPU online sequencing
x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline
Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs
Drivers: hv: vmbus: Call hv_synic_free() if hv_synic_alloc() fails
x86/hyperv/vtl: Add noop for realmode pointers
Currently the MM selftests attempt to work out the target architecture by
using CROSS_COMPILE or otherwise querying the host machine, storing the
target architecture in a variable called MACHINE rather than the usual
ARCH though as far as I can tell (including for x86_64) the value is the
same as we would use for architecture.
When cross compiling with LLVM we don't need a CROSS_COMPILE as LLVM can
support many target architectures in a single build so this logic does not
work, CROSS_COMPILE is not set and we end up selecting tests for the host
rather than target architecture. Fix this by using the more standard ARCH
to describe the architecture, taking it from the environment if specified.
Link: https://lkml.kernel.org/r/20230614-kselftest-mm-llvm-v1-1-180523f277d3@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Rix <trix@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In a syzbot stress test that deliberately causes file system errors on
nilfs2 with a corrupted disk image, it has been reported that
nilfs_clear_dirty_page() called from nilfs_clear_dirty_pages() can cause a
general protection fault.
In nilfs_clear_dirty_pages(), when looking up dirty pages from the page
cache and calling nilfs_clear_dirty_page() for each dirty page/folio
retrieved, the back reference from the argument page to "mapping" may have
been changed to NULL (and possibly others). It is necessary to check this
after locking the page/folio.
So, fix this issue by not calling nilfs_clear_dirty_page() on a page/folio
after locking it in nilfs_clear_dirty_pages() if the back reference
"mapping" from the page/folio is different from the "mapping" that held
the page/folio just before.
Link: https://lkml.kernel.org/r/20230612021456.3682-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+53369d11851d8f26735c@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/000000000000da4f6b05eb9bf593@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
As a result of analysis of a syzbot report, it turned out that in three
cases where nilfs2 allocates block device buffers directly via sb_getblk,
concurrent reads to the device can corrupt the allocated buffers.
Nilfs2 uses sb_getblk for segment summary blocks, that make up a log
header, and the super root block, that is the trailer, and when moving and
writing the second super block after fs resize.
In any of these, since the uptodate flag is not set when storing metadata
to be written in the allocated buffers, the stored metadata will be
overwritten if a device read of the same block occurs concurrently before
the write. This causes metadata corruption and misbehavior in the log
write itself, causing warnings in nilfs_btree_assign() as reported.
Fix these issues by setting an uptodate flag on the buffer head on the
first or before modifying each buffer obtained with sb_getblk, and
clearing the flag on failure.
When setting the uptodate flag, the lock_buffer/unlock_buffer pair is used
to perform necessary exclusive control, and the buffer is filled to ensure
that uninitialized bytes are not mixed into the data read from others. As
for buffers for segment summary blocks, they are filled incrementally, so
if the uptodate flag was unset on their allocation, set the flag and zero
fill the buffer once at that point.
Also, regarding the superblock move routine, the starting point of the
memset call to zerofill the block is incorrectly specified, which can
cause a buffer overflow on file systems with block sizes greater than
4KiB. In addition, if the superblock is moved within a large block, it is
necessary to assume the possibility that the data in the superblock will
be destroyed by zero-filling before copying. So fix these potential
issues as well.
Link: https://lkml.kernel.org/r/20230609035732.20426-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+31837fe952932efc8fb9@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/00000000000030000a05e981f475@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
--0000000000009a0c9905fd9173ad
Content-Transfer-Encoding: 8bit
After f15afbd34d ("fs: fix undefined behavior in bit shift for
SB_NOUSER") the constants were changed from plain integers which
LX_VALUE() can parse to constants using the BIT() macro which causes the
following:
Reading symbols from build/linux-custom/vmlinux...done.
Traceback (most recent call last):
File "/home/fainelli/work/buildroot/output/arm64/build/linux-custom/vmlinux-gdb.py", line 25, in <module>
import linux.constants
File "/home/fainelli/work/buildroot/output/arm64/build/linux-custom/scripts/gdb/linux/constants.py", line 5
LX_SB_RDONLY = ((((1UL))) << (0))
Use LX_GDBPARSED() which does not suffer from that issue.
f15afbd34d ("fs: fix undefined behavior in bit shift for SB_NOUSER")
Link: https://lkml.kernel.org/r/20230607221337.2781730-1-florian.fainelli@broadcom.com
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Acked-by: Christian Brauner <brauner@kernel.org>
Cc: Hao Ge <gehao@kylinos.cn>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Remove an unnecessary call to xas_set(index) when iterating over the
target range in collapse_file. The extra call to xas_set reset the xas
cursor to the top of the tree, causing the xas_next call on the next
iteration to walk the tree to index instead of advancing to index+1. This
returned the same page again, which would cause collapse_file to fail
because the page is already locked.
This bug was hidden when CONFIG_DEBUG_VM was set. When that config was
used, the xas_load in a subsequent VM_BUG_ON assert would walk xas from
the top of the tree to index, causing the xas_next call on the next loop
iteration to advance the cursor as expected.
Link: https://lkml.kernel.org/r/20230607053135.2087354-1-stevensd@google.com
Fixes: a2e17cc2ef ("mm/khugepaged: maintain page cache uptodate flag")
Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Kirill A . Shutemov <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In __vmalloc_area_node() we always warn_alloc() when an allocation
performed by vm_area_alloc_pages() fails unless it was due to a pending
fatal signal.
However, huge page allocations instigated either by vmalloc_huge() or
__vmalloc_node_range() (or a caller that invokes this like kvmalloc() or
kvmalloc_node()) always falls back to order-0 allocations if the huge page
allocation fails.
This renders the warning useless and noisy, especially as all callers
appear to be aware that this may fallback. This has already resulted in
at least one bug report from a user who was confused by this (see link).
Therefore, simply update the code to only output this warning for order-0
pages when no fatal signal is pending.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1211410
Link: https://lkml.kernel.org/r/20230605201107.83298-1-lstoakes@gmail.com
Fixes: 80b1d8fdfa ("mm: vmalloc: correct use of __GFP_NOWARN mask in __vmalloc_area_node()")
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When commit 19343b5bdd ("mm/page-writeback: introduce tracepoint for
wait_on_page_writeback()") repurposed the writeback_dirty_page trace event
as a template to create its new wait_on_page_writeback trace event, it
ended up opening a window to NULL pointer dereference crashes due to the
(infrequent) occurrence of a race where an access to a page in the
swap-cache happens concurrently with the moment this page is being written
to disk and the tracepoint is enabled:
BUG: kernel NULL pointer dereference, address: 0000000000000040
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 800000010ec0a067 P4D 800000010ec0a067 PUD 102353067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 1320 Comm: shmem-worker Kdump: loaded Not tainted 6.4.0-rc5+ #13
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230301gitf80f052277c8-1.fc37 03/01/2023
RIP: 0010:trace_event_raw_event_writeback_folio_template+0x76/0xf0
Code: 4d 85 e4 74 5c 49 8b 3c 24 e8 06 98 ee ff 48 89 c7 e8 9e 8b ee ff ba 20 00 00 00 48 89 ef 48 89 c6 e8 fe d4 1a 00 49 8b 04 24 <48> 8b 40 40 48 89 43 28 49 8b 45 20 48 89 e7 48 89 43 30 e8 a2 4d
RSP: 0000:ffffaad580b6fb60 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff90e38035c01c RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff90e38035c044
RBP: ffff90e38035c024 R08: 0000000000000002 R09: 0000000000000006
R10: ffff90e38035c02e R11: 0000000000000020 R12: ffff90e380bac000
R13: ffffe3a7456d9200 R14: 0000000000001b81 R15: ffffe3a7456d9200
FS: 00007f2e4e8a15c0(0000) GS:ffff90e3fbc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000040 CR3: 00000001150c6003 CR4: 0000000000170ee0
Call Trace:
<TASK>
? __die+0x20/0x70
? page_fault_oops+0x76/0x170
? kernelmode_fixup_or_oops+0x84/0x110
? exc_page_fault+0x65/0x150
? asm_exc_page_fault+0x22/0x30
? trace_event_raw_event_writeback_folio_template+0x76/0xf0
folio_wait_writeback+0x6b/0x80
shmem_swapin_folio+0x24a/0x500
? filemap_get_entry+0xe3/0x140
shmem_get_folio_gfp+0x36e/0x7c0
? find_busiest_group+0x43/0x1a0
shmem_fault+0x76/0x2a0
? __update_load_avg_cfs_rq+0x281/0x2f0
__do_fault+0x33/0x130
do_read_fault+0x118/0x160
do_pte_missing+0x1ed/0x2a0
__handle_mm_fault+0x566/0x630
handle_mm_fault+0x91/0x210
do_user_addr_fault+0x22c/0x740
exc_page_fault+0x65/0x150
asm_exc_page_fault+0x22/0x30
This problem arises from the fact that the repurposed writeback_dirty_page
trace event code was written assuming that every pointer to mapping
(struct address_space) would come from a file-mapped page-cache object,
thus mapping->host would always be populated, and that was a valid case
before commit 19343b5bdd. The swap-cache address space
(swapper_spaces), however, doesn't populate its ->host (struct inode)
pointer, thus leading to the crashes in the corner-case aforementioned.
commit 19343b5bdd ended up breaking the assignment of __entry->name and
__entry->ino for the wait_on_page_writeback tracepoint -- both dependent
on mapping->host carrying a pointer to a valid inode. The assignment of
__entry->name was fixed by commit 68f23b8906 ("memcg: fix a crash in
wb_workfn when a device disappears"), and this commit fixes the remaining
case, for __entry->ino.
Link: https://lkml.kernel.org/r/20230606233613.1290819-1-aquini@redhat.com
Fixes: 19343b5bdd ("mm/page-writeback: introduce tracepoint for wait_on_page_writeback()")
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Reviewed-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When booting with "intremap=off" and "x2apic_phys" on the kernel command
line, the physical x2APIC driver ends up being used even when x2APIC
mode is disabled ("intremap=off" disables x2APIC mode). This happens
because the first compound condition check in x2apic_phys_probe() is
false due to x2apic_mode == 0 and so the following one returns true
after default_acpi_madt_oem_check() having already selected the physical
x2APIC driver.
This results in the following panic:
kernel BUG at arch/x86/kernel/apic/io_apic.c:2409!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-rc2-ver4.1rc2 #2
Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.3.6 07/06/2021
RIP: 0010:setup_IO_APIC+0x9c/0xaf0
Call Trace:
<TASK>
? native_read_msr
apic_intr_mode_init
x86_late_time_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
secondary_startup_64_no_verify
</TASK>
which is:
setup_IO_APIC:
apic_printk(APIC_VERBOSE, "ENABLING IO-APIC IRQs\n");
for_each_ioapic(ioapic)
BUG_ON(mp_irqdomain_create(ioapic));
Return 0 to denote that x2APIC has not been enabled when probing the
physical x2APIC driver.
[ bp: Massage commit message heavily. ]
Fixes: 9ebd680bd0 ("x86, apic: Use probe routines to simplify apic selection")
Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230616212236.1389-1-dheerajkumar.srivastava@amd.com
Pull AFS writeback fixes from David Howells:
- release the acquired batch before returning if we got >=5 skips
- retry a page we had to wait for rather than skipping over it after
the wait
* tag 'afs-fixes-20230719' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix waiting for writeback then skipping folio
afs: Fix dangling folio ref counts in writeback
When using encapsulation the original packet's headers are copied to the
inner headers. This preserves the space for an inner mac header, which
is not used by the inner payloads for the encapsulation types supported
by IPVS. If a packet is using GUE or GRE encapsulation and needs to be
segmented, flow can be passed to __skb_udp_tunnel_segment() which
calculates a negative tunnel header length. A negative tunnel header
length causes pskb_may_pull() to fail, dropping the packet.
This can be observed by attaching probes to ip_vs_in_hook(),
__dev_queue_xmit(), and __skb_udp_tunnel_segment():
perf probe --add '__dev_queue_xmit skb->inner_mac_header \
skb->inner_network_header skb->mac_header skb->network_header'
perf probe --add '__skb_udp_tunnel_segment:7 tnl_hlen'
perf probe -m ip_vs --add 'ip_vs_in_hook skb->inner_mac_header \
skb->inner_network_header skb->mac_header skb->network_header'
These probes the headers and tunnel header length for packets which
traverse the IPVS encapsulation path. A TCP packet can be forced into
the segmentation path by being smaller than a calculated clamped MSS,
but larger than the advertised MSS.
probe:ip_vs_in_hook: inner_mac_header=0x0 inner_network_header=0x0 mac_header=0x44 network_header=0x52
probe:ip_vs_in_hook: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32
probe:dev_queue_xmit: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32
probe:__skb_udp_tunnel_segment_L7: tnl_hlen=-2
When using veth-based encapsulation, the interfaces are set to be
mac-less, which does not preserve space for an inner mac header. This
prevents this issue from occurring.
In our real-world testing of sending a 32KB file we observed operation
time increasing from ~75ms for veth-based encapsulation to over 1.5s
using IPVS encapsulation due to retries from dropped packets.
This changeset modifies the packet on the encapsulation path in
ip_vs_tunnel_xmit() and ip_vs_tunnel_xmit_v6() to remove the inner mac
header offset. This fixes UDP segmentation for both encapsulation types,
and corrects the inner headers for any IPIP flows that may use it.
Fixes: 84c0d5e96f ("ipvs: allow tunneling with gue encapsulation")
Signed-off-by: Terin Stock <terin@cloudflare.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Up until commit 6a45b0e258 ("gpiolib: Introduce
gpiochip_irqchip_add_domain()") all irq_domains were allocated
by gpiolib itself and thus gpiolib also takes care of freeing it.
With gpiochip_irqchip_add_domain() a user of gpiolib can associate an
irq_domain with the gpio_chip. This irq_domain is not managed by
gpiolib and therefore must not be freed by gpiolib.
Fixes: 6a45b0e258 ("gpiolib: Introduce gpiochip_irqchip_add_domain()")
Reported-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Michael Walle <mwalle@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Adds the required quirk to enable the Cirrus amp and correct pins
on the ASUS ROG G634Z series.
While this works if the related _DSD properties are made available, these
aren't included in the ACPI of these laptops (yet).
Signed-off-by: Luke D. Jones <luke@ljones.dev>
Link: https://lore.kernel.org/r/20230619060320.1336455-1-luke@ljones.dev
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The driver overrides the error codes returned by platform_get_irq_byname()
to -ENODEV, so if it returns -EPROBE_DEFER, the driver will fail the probe
permanently instead of the deferred probing. Switch to propagating error
codes upstream.
Fixes: 9ec36cafe4 ("of/irq: do irq resolution in platform_get_irq")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/20230617203622.6812-13-s.shtylyov@omp.ru
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The driver overrides the error codes and IRQ0 returned by platform_get_irq()
to -EINVAL, so if it returns -EPROBE_DEFER, the driver will fail the probe
permanently instead of the deferred probing. Switch to propagating the error
codes upstream. Since commit ce753ad154 ("platform: finally disallow IRQ0
in platform_get_irq() and its ilk") IRQ0 is no longer returned by those APIs,
so we now can safely ignore it...
Fixes: 2408a08583 ("mmc: sunxi-mmc: Handle return value of platform_get_irq")
Cc: stable@vger.kernel.org # v5.19+
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://lore.kernel.org/r/20230617203622.6812-12-s.shtylyov@omp.ru
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The driver overrides the error codes returned by platform_get_irq() to
-ENXIO, so if it returns -EPROBE_DEFER, the driver will fail the probe
permanently instead of the deferred probing. Switch to propagating the
error codes upstream.
Fixes: 9ec36cafe4 ("of/irq: do irq resolution in platform_get_irq")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/20230617203622.6812-11-s.shtylyov@omp.ru
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The driver overrides the error codes and IRQ0 returned by platform_get_irq()
to -EINVAL, so if it returns -EPROBE_DEFER, the driver will fail the probe
permanently instead of the deferred probing. Switch to propagating the error
codes upstream. Since commit ce753ad154 ("platform: finally disallow IRQ0
in platform_get_irq() and its ilk") IRQ0 is no longer returned by those APIs,
so we now can safely ignore it...
Fixes: 682798a596 ("mmc: sdhci-spear: Handle return value of platform_get_irq")
Cc: stable@vger.kernel.org # v5.19+
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20230617203622.6812-10-s.shtylyov@omp.ru
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The driver overrides the error codes returned by platform_get_irq() to
-EINVAL, so if it returns -EPROBE_DEFER, the driver will fail the probe
permanently instead of the deferred probing. Switch to propagating the
error codes upstream.
Fixes: ff65ffe46d ("mmc: Add Actions Semi Owl SoCs SD/MMC driver")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/20230617203622.6812-8-s.shtylyov@omp.ru
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The driver overrides the error codes returned by platform_get_irq() to
-ENXIO, so if it returns -EPROBE_DEFER, the driver will fail the probe
permanently instead of the deferred probing. Switch to propagating the
error codes upstream.
Fixes: 9ec36cafe4 ("of/irq: do irq resolution in platform_get_irq")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/20230617203622.6812-7-s.shtylyov@omp.ru
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The driver overrides the error codes returned by platform_get_irq() to
-ENXIO, so if it returns -EPROBE_DEFER, the driver will fail the probe
permanently instead of the deferred probing. Switch to propagating the
error codes upstream.
Fixes: 9ec36cafe4 ("of/irq: do irq resolution in platform_get_irq")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/20230617203622.6812-6-s.shtylyov@omp.ru
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The driver overrides the error codes returned by platform_get_irq() to
-ENXIO, so if it returns -EPROBE_DEFER, the driver will fail the probe
permanently instead of the deferred probing. Switch to propagating the
error codes upstream.
Fixes: 9ec36cafe4 ("of/irq: do irq resolution in platform_get_irq")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/20230617203622.6812-5-s.shtylyov@omp.ru
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The driver overrides the error codes returned by platform_get_irq() to
-EINVAL, so if it returns -EPROBE_DEFER, the driver will fail the probe
permanently instead of the deferred probing. Switch to propagating the
error codes upstream.
Fixes: 208489032b ("mmc: mediatek: Add Mediatek MMC driver")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/20230617203622.6812-4-s.shtylyov@omp.ru
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The driver overrides the error codes and IRQ0 returned by platform_get_irq()
to -EINVAL, so if it returns -EPROBE_DEFER, the driver will fail the probe
permanently instead of the deferred probing. Switch to propagating the error
codes upstream. Since commit ce753ad154 ("platform: finally disallow IRQ0
in platform_get_irq() and its ilk") IRQ0 is no longer returned by those APIs,
so we now can safely ignore it...
Fixes: cbcaac6d7d ("mmc: meson-gx-mmc: Fix platform_get_irq's error checking")
Cc: stable@vger.kernel.org # v5.19+
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20230617203622.6812-3-s.shtylyov@omp.ru
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The driver overrides the error codes and IRQ0 returned by platform_get_irq()
to -EINVAL, so if it returns -EPROBE_DEFER, the driver will fail the probe
permanently instead of the deferred probing. Switch to propagating the error
codes upstream. Since commit ce753ad154 ("platform: finally disallow IRQ0
in platform_get_irq() and its ilk") IRQ0 is no longer returned by those APIs,
so we now can safely ignore it...
Fixes: 660fc733bd ("mmc: bcm2835: Add new driver for the sdhost controller.")
Cc: stable@vger.kernel.org # v5.19+
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/20230617203622.6812-2-s.shtylyov@omp.ru
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Saeed Mahameed says:
====================
mlx5-fixes-2023-06-16
This series provides bug fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In case the QCA7000 is not available via SPI (e.g. in reset),
the driver will cause a high load. The reason for this is
that the synchronization is never finished and schedule()
is never called. Since the synchronization is not timing
critical, it's safe to drop this from the scheduling condition.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Fixes: 291ab06ecf ("net: qualcomm: new Ethernet over SPI driver for QCA7000")
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull SCSI fixes from James Bottomley:
"Four fixes, all in drivers: three fairly obvious small ones and a
large one in aacraid to add block queue completion mapping and fix a
CPU offline hang"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: Fix incorrect big endian type assignment in bsg loopback path
scsi: target: core: Fix error path in target_setup_session()
scsi: storvsc: Always set no_report_opcodes
scsi: aacraid: Reply queue mapping to CPUs based on IRQ affinity
Pull ata fix from Damien Le Moal:
- Avoid deadlocks on resume from sleep by delaying scsi rescan until
the scsi device is also fully resumed.
* tag 'ata-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: libata-scsi: Avoid deadlock on rescan after device resume
Pull parisc fix from Helge Deller:
- Drop redundant register definitions to fix build with latest binutils
* tag 'parisc-for-6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Delete redundant register definitions in <asm/assembly.h>
If the core is left to remove the LEDs via devm_, it is performed too
late, after the PHY driver is removed from the PHY. This results in
dereferencing a NULL pointer when the LED core tries to turn the LED
off before destroying the LED.
Manually unregister the LEDs at a safe point in phy_remove.
Cc: stable@vger.kernel.org
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Fixes: 01e5b728e9 ("net: phy: Add a binding for PHY LEDs")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The error unrolling was leaving the VMAs detached in many cases and
leaving the locked_vm statistic altered, and skipping the unrolling
entirely in the case of the vma tree write failing.
Fix the error path by re-attaching the detached VMAs and adding the
necessary goto for the failed vma tree write, and fix the locked_vm
statistic by only updating after the vma tree write succeeds.
Fixes: 763ecb0350 ("mm: remove the vma linked list")
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The module loads firmware so add a MODULE_FIRMWARE macro to provide that
information via modinfo.
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the case of fast device addition/removal, it's possible that
hv_eject_device_work() can start to run before create_root_hv_pci_bus()
starts to run; as a result, the pci_get_domain_bus_and_slot() in
hv_eject_device_work() can return a 'pdev' of NULL, and
hv_eject_device_work() can remove the 'hpdev', and immediately send a
message PCI_EJECTION_COMPLETE to the host, and the host immediately
unassigns the PCI device from the guest; meanwhile,
create_root_hv_pci_bus() and the PCI device driver can be probing the
dead PCI device and reporting timeout errors.
Fix the issue by adding a per-bus mutex 'state_lock' and grabbing the
mutex before powering on the PCI bus in hv_pci_enter_d0(): when
hv_eject_device_work() starts to run, it's able to find the 'pdev' and call
pci_stop_and_remove_bus_device(pdev): if the PCI device driver has
loaded, the PCI device driver's probe() function is already called in
create_root_hv_pci_bus() -> pci_bus_add_devices(), and now
hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able
to call the PCI device driver's remove() function and remove the device
reliably; if the PCI device driver hasn't loaded yet, the function call
hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able to
remove the PCI device reliably and the PCI device driver's probe()
function won't be called; if the PCI device driver's probe() is already
running (e.g., systemd-udev is loading the PCI device driver), it must
be holding the per-device lock, and after the probe() finishes and releases
the lock, hv_eject_device_work() -> pci_stop_and_remove_bus_device() is
able to proceed to remove the device reliably.
Fixes: 4daace0d8c ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-6-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
This reverts commit d6af2ed29c.
The statement "the hv_pci_bus_exit() call releases structures of all its
child devices" in commit d6af2ed29c is not true: in the path
hv_pci_probe() -> hv_pci_enter_d0() -> hv_pci_bus_exit(hdev, true): the
parameter "keep_devs" is true, so hv_pci_bus_exit() does *not* release the
child "struct hv_pci_dev *hpdev" that is created earlier in
pci_devices_present_work() -> new_pcichild_device().
The commit d6af2ed29c was originally made in July 2020 for RHEL 7.7,
where the old version of hv_pci_bus_exit() was used; when the commit was
rebased and merged into the upstream, people didn't notice that it's
not really necessary. The commit itself doesn't cause any issue, but it
makes hv_pci_probe() more complicated. Revert it to facilitate some
upcoming changes to hv_pci_probe().
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Wei Hu <weh@microsoft.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-5-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
When the host tries to remove a PCI device, the host first sends a
PCI_EJECT message to the guest, and the guest is supposed to gracefully
remove the PCI device and send a PCI_EJECTION_COMPLETE message to the host;
the host then sends a VMBus message CHANNELMSG_RESCIND_CHANNELOFFER to
the guest (when the guest receives this message, the device is already
unassigned from the guest) and the guest can do some final cleanup work;
if the guest fails to respond to the PCI_EJECT message within one minute,
the host sends the VMBus message CHANNELMSG_RESCIND_CHANNELOFFER and
removes the PCI device forcibly.
In the case of fast device addition/removal, it's possible that the PCI
device driver is still configuring MSI-X interrupts when the guest receives
the PCI_EJECT message; the channel callback calls hv_pci_eject_device(),
which sets hpdev->state to hv_pcichild_ejecting, and schedules a work
hv_eject_device_work(); if the PCI device driver is calling
pci_alloc_irq_vectors() -> ... -> hv_compose_msi_msg(), we can break the
while loop in hv_compose_msi_msg() due to the updated hpdev->state, and
leave data->chip_data with its default value of NULL; later, when the PCI
device driver calls request_irq() -> ... -> hv_irq_unmask(), the guest
crashes in hv_arch_irq_unmask() due to data->chip_data being NULL.
Fix the issue by not testing hpdev->state in the while loop: when the
guest receives PCI_EJECT, the device is still assigned to the guest, and
the guest has one minute to finish the device removal gracefully. We don't
really need to (and we should not) test hpdev->state in the loop.
Fixes: de0aa7b2f9 ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-3-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Since day 1 of the driver, there has been a race between
hv_pci_query_relations() and survey_child_resources(): during fast
device hotplug, hv_pci_query_relations() may error out due to
device-remove and the stack variable 'comp' is no longer valid;
however, pci_devices_present_work() -> survey_child_resources() ->
complete() may be running on another CPU and accessing the no-longer-valid
'comp'. Fix the race by flushing the workqueue before we exit from
hv_pci_query_relations().
Fixes: 4daace0d8c ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-2-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
When an ATA port is resumed from sleep, the port is reset and a power
management request issued to libata EH to reset the port and rescanning
the device(s) attached to the port. Device rescanning is done by
scheduling an ata_scsi_dev_rescan() work, which will execute
scsi_rescan_device().
However, scsi_rescan_device() takes the generic device lock, which is
also taken by dpm_resume() when the SCSI device is resumed as well. If
a device rescan execution starts before the completion of the SCSI
device resume, the rcu locking used to refresh the cached VPD pages of
the device, combined with the generic device locking from
scsi_rescan_device() and from dpm_resume() can cause a deadlock.
Avoid this situation by changing struct ata_port scsi_rescan_task to be
a delayed work instead of a simple work_struct. ata_scsi_dev_rescan() is
modified to check if the SCSI device associated with the ATA device that
must be rescanned is not suspended. If the SCSI device is still
suspended, ata_scsi_dev_rescan() returns early and reschedule itself for
execution after an arbitrary delay of 5ms.
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reported-by: Joe Breuer <linux-kernel@jmbreuer.net>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217530
Fixes: a19a93e4c6 ("scsi: core: pm: Rely on the device driver core for async power management")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Joe Breuer <linux-kernel@jmbreuer.net>
We selectively grab the ctx->uring_lock for poll update/removal, but
we really should grab it from the start to fully synchronize with
linked timeouts. Normally this is indeed the case, but if requests
are forced async by the application, we don't fully cover removal
and timer disarm within the uring_lock.
Make this simpler by having consistent locking state for poll removal.
Cc: stable@vger.kernel.org # 6.1+
Reported-by: Querijn Voet <querijnqyn@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
These commits
a494aef23d ("PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg")
2c6ba42168 ("PCI: hv: Enable PCI pass-thru devices in Confidential VMs")
update the Hyper-V virtual PCI driver to use the hyperv_pcpu_input_arg
because that memory will be correctly marked as decrypted or encrypted
for all VM types (CoCo or normal). But problems ensue when CPUs in the
VM go online or offline after virtual PCI devices have been configured.
When a CPU is brought online, the hyperv_pcpu_input_arg for that CPU is
initialized by hv_cpu_init() running under state CPUHP_AP_ONLINE_DYN.
But this state occurs after state CPUHP_AP_IRQ_AFFINITY_ONLINE, which
may call the virtual PCI driver and fault trying to use the as yet
uninitialized hyperv_pcpu_input_arg. A similar problem occurs in a CoCo
VM if the MMIO read and write hypercalls are used from state
CPUHP_AP_IRQ_AFFINITY_ONLINE.
When a CPU is taken offline, IRQs may be reassigned in state
CPUHP_TEARDOWN_CPU. Again, the virtual PCI driver may fault trying to
use the hyperv_pcpu_input_arg that has already been freed by a
higher state.
Fix the onlining problem by adding state CPUHP_AP_HYPERV_ONLINE
immediately after CPUHP_AP_ONLINE_IDLE (similar to CPUHP_AP_KVM_ONLINE)
and before CPUHP_AP_IRQ_AFFINITY_ONLINE. Use this new state for
Hyper-V initialization so that hyperv_pcpu_input_arg is allocated
early enough.
Fix the offlining problem by not freeing hyperv_pcpu_input_arg when
a CPU goes offline. Retain the allocated memory, and reuse it if
the CPU comes back online later.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/r/1684862062-51576-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Pull staging driver fix from Greg KH:
"Here is a single staging driver "fix" for 6.4-rc7. I've been sitting
on it in my tree for many weeks as it is just a simple documentation
update, with the hope that maybe some other staging driver fixes would
need to be merged for 6.4-final, but that does not seem to be the
case.
So please, pull in this one documentation update so that Aaro doesn't
get emails going forward that he can't do anything about"
* tag 'staging-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: octeon: delete my name from TODO contact
Pull USB / Thunderbolt fixes from Greg KH:
"Here are some small USB and Thunderbolt driver fixes and new device
ids for 6.4-rc7 to resolve some reported problems. Included in here
are:
- new USB serial device ids
- USB gadget core fixes for long-dissussed problems
- dwc3 bugfixes for reported issues.
- typec driver fixes
- thunderbolt driver fixes
All of these have been in linux-next this week with no reported issues"
* tag 'usb-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: gadget: udc: core: Prevent soft_connect_store() race
usb: gadget: udc: core: Offload usb_udc_vbus_handler processing
usb: typec: Fix fast_role_swap_current show function
usb: typec: ucsi: Fix command cancellation
USB: dwc3: fix use-after-free on core driver unbind
USB: dwc3: qcom: fix NULL-deref on suspend
usb: dwc3: gadget: Reset num TRBs before giving back the request
usb: gadget: udc: renesas_usb3: Fix RZ/V2M {modprobe,bind} error
USB: serial: option: add Quectel EM061KGL series
thunderbolt: Mask ring interrupt on Intel hardware as well
thunderbolt: Do not touch CL state configuration during discovery
thunderbolt: Increase DisplayPort Connection Manager handshake timeout
thunderbolt: dma_test: Use correct value for absent rings when creating paths
Pull serial driver fixes from Greg KH:
"Here are two small serial driver fixes for 6.4-rc7 that resolve some
reported problems:
- lantiq serial driver irq fix
- fsl_lpuart serial driver watermark fix
Both of these have been in linux-next this week with no reported issues"
* tag 'tty-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: serial: fsl_lpuart: reduce RX watermark to 0 on LS1028A
serial: lantiq: add missing interrupt ack
When running workloads heavy unbalanced towards TX (high TX, low RX
traffic), sfc driver can retain the CPU during too long times. Although
in many cases this is not enough to be visible, it can affect
performance and system responsiveness.
A way to reproduce it is to use a debug kernel and run some parallel
netperf TX tests. In some systems, this will lead to this message being
logged:
kernel:watchdog: BUG: soft lockup - CPU#12 stuck for 22s!
The reason is that sfc driver doesn't account any NAPI budget for the TX
completion events work. With high-TX/low-RX traffic, this makes that the
CPU is held for long time for NAPI poll.
Documentations says "drivers can process completions for any number of Tx
packets but should only process up to budget number of Rx packets".
However, many drivers do limit the amount of TX completions that they
process in a single NAPI poll.
In the same way, this patch adds a limit for the TX work in sfc. With
the patch applied, the watchdog warning never appears.
Tested with netperf in different combinations: single process / parallel
processes, TCP / UDP and different sizes of UDP messages. Repeated the
tests before and after the patch, without any noticeable difference in
network or CPU performance.
Test hardware:
Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz (4 cores, 2 threads/core)
Solarflare Communications XtremeScale X2522-25G Network Adapter
Fixes: 5227ecccea ("sfc: remove tx and MCDI handling from NAPI budget consideration")
Fixes: d19a537218 ("sfc_ef100: TX path for EF100 NICs")
Reported-by: Fei Liu <feliu@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20230615084929.10506-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We define sp and ipsw in <asm/asmregs.h> using ".reg", and when using
current binutils (snapshot 2.40.50.20230611) the definitions in
<asm/assembly.h> using "=" conflict with those:
arch/parisc/include/asm/assembly.h: Assembler messages:
arch/parisc/include/asm/assembly.h:93: Error: symbol `sp' is already defined
arch/parisc/include/asm/assembly.h:95: Error: symbol `ipsw' is already defined
Delete the duplicate definitions in <asm/assembly.h>.
Also delete the definition of gp, which isn't used anywhere.
Signed-off-by: Ben Hutchings <benh@debian.org>
Cc: stable@vger.kernel.org # v6.0+
Signed-off-by: Helge Deller <deller@gmx.de>
Pull clk fixes from Stephen Boyd:
"A handful of clk driver fixes:
- Fix an OOB issue in the Mediatek mt8365 driver where arrays of clks
are mismatched in size
- Use the proper clk_ops for a few clks in the Mediatek mt8365 driver
- Stop using abs() in clk_composite_determine_rate() because 64-bit
math goes wrong on large unsigned long numbers that are subtracted
and passed into abs()
- Zero initialize a struct clk_init_data in clk-loongson2 to avoid
stack junk confusing clk_hw_register()
- Actually use a pointer to __iomem for writel() in
pxa3xx_clk_update_accr() so we don't oops"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: pxa: fix NULL pointer dereference in pxa3xx_clk_update_accr
clk: clk-loongson2: Zero init clk_init_data
clk: mediatek: mt8365: Fix inverted topclk operations
clk: composite: Fix handling of high clock rates
clk: mediatek: mt8365: Fix index issue
This patch validate session id and tree id in compound request.
If first operation in the compound is SMB2 ECHO request, ksmbd bypass
session and tree validation. So work->sess and work->tcon could be NULL.
If secound request in the compound access work->sess or tcon, It cause
NULL pointer dereferecing error.
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21165
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
ksmbd_smb2_check_message doesn't validate hdr->NextCommand. If
->NextCommand is bigger than Offset + Length of smb2 write, It will
allow oversized smb2 write length. It will cause OOB read in smb2_write.
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21164
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
ksmbd is doing write access using vfs helpers. There are the cases that
mnt_want_write() is not called in vfs helper. This patch add missing
mnt_want_write() to ksmbd vfs functions.
Cc: stable@vger.kernel.org
Cc: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull drm fixes from Dave Airlie:
"A bunch of misc fixes across the board.
amdgpu is the usual bulk with a revert and other fixes, nouveau has a
race fix that was causing a UAF that was hard hanging systems,
otherwise some qaic, bridge and radeon.
amdgpu:
- GFX9 preemption fixes
- Add missing radeon secondary PCI ID
- vblflash fixes
- SMU 13 fix
- VCN 4.0 fix
- Re-enable TOPDOWN flag for large BAR systems to fix regression
- eDP fix
- PSR hang fix
- DPIA fix
radeon:
- fbdev client warning fix
qaic:
- leak fix
- null ptr deref fix
nouveau:
- use-after-free caused by fence race fix
- runtime pm fix
- NULL ptr checks
bridge:
- ti-sn65dsi86: Avoid possible buffer overflow"
* tag 'drm-fixes-2023-06-17' of git://anongit.freedesktop.org/drm/drm: (21 commits)
nouveau: fix client work fence deletion race
drm/amd/display: limit DPIA link rate to HBR3
drm/amd/display: fix the system hang while disable PSR
drm/amd/display: edp do not add non-edid timings
Revert "drm/amdgpu: remove TOPDOWN flags when allocating VRAM in large bar system"
drm/amdgpu: vcn_4_0 set instance 0 init sched score to 1
drm/radeon: Disable outputs when releasing fbdev client
drm/amd/pm: workaround for compute workload type on some skus
drm/amd: Tighten permissions on VBIOS flashing attributes
drm/amd: Make sure image is written to trigger VBIOS image update flow
drm/amdgpu: add missing radeon secondary PCI ID
drm/amdgpu: Implement gfx9 patch functions for resubmission
drm/amdgpu: Modify indirect buffer packages for resubmission
drm/amdgpu: Program gds backup address as zero if no gds allocated
drm/nouveau: add nv_encoder pointer check for NULL
drm/amdgpu: Reset CP_VMID_PREEMPT after trailing fence signaled
drm/nouveau/dp: check for NULL nv_connector->native_mode
drm/bridge: ti-sn65dsi86: Avoid possible buffer overflow
drm/nouveau: don't detect DSM for non-NVIDIA device
accel/qaic: Fix NULL pointer deref in qaic_destroy_drm_device()
...
In the same spirit as commit ca57f02295 ("afs: Fix fileserver probe
RTT handling"), don't rule out using a vlserver just because there
haven't been enough packets yet to calculate a real rtt. Always set the
server's probe rtt from the estimate provided by rxrpc_kernel_get_srtt,
which is capped at 1 second.
This could lead to EDESTADDRREQ errors when accessing a cell for the
first time, even though the vl servers are known and have responded to a
probe.
Fixes: 1d4adfaf65 ("rxrpc: Make rxrpc_kernel_get_srtt() indicate validity")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2023-June/006746.html
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes for the reset pin on nanopi r5c, a reset line on SOQuartz, a duplicate
usb regulator on rock64 and PCIe register mappings on rk356x.
Also some missing cache properties.
* tag 'v6.4-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: Fix rk356x PCIe register and range mappings
arm64: dts: rockchip: fix button reset pin for nanopi r5c
arm64: dts: rockchip: fix nEXTRST on SOQuartz
arm64: dts: rockchip: add missing cache properties
arm64: dts: rockchip: fix USB regulator on ROCK64
Link: https://lore.kernel.org/r/2885657.e9J7NaK4W3@phil
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
ASO query can be scheduled in atomic context as such it can't use usleep.
Use udelay as recommended in Documentation/timers/timers-howto.rst.
Fixes: 76e463f650 ("net/mlx5e: Overcome slow response for first IPsec ASO WQE")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
XFRM state which is changed to be XFRM_STATE_EXPIRED doesn't really
need to hold lock while modifying flow steering rules to drop traffic.
That state can be deleted only and as such mlx5e_ipsec_handle_tx_limit()
work will be canceled anyway and won't run in parallel.
Fixes: b2f7b01d36 ("net/mlx5e: Simulate missing IPsec TX limits hardware functionality")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When TUNNEL_L3_TO_L2 decap action was created, a pointer to a local
variable was passed as its HW action data, resulting in attempt to
free invalid address:
BUG: KASAN: invalid-free in mlx5dr_action_destroy+0x318/0x410 [mlx5_core]
Fixes: 4781df92f4 ("net/mlx5: DR, Move STEv0 modify header logic")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In some cases, steering might need to use SW-created action in
FW table, which results in wrong packet reformat being used:
mlx5_core 0000:81:00.1: mlx5_cmd_check:756:(pid 1154):
SET_FLOW_TABLE_ENTRY(0×936) op_mod(0×0) failed,
status bad resource(0×5), syndrome (0xf2ff71)
This patch adds support for usage of SW-created packet reformat (encap)
actions in FW tables, and adds clear error flow for attempt to use
SW-created modify header on FW tables.
Fixes: 6a48faeeca ("net/mlx5: Add direct rule fs_cmd implementation")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commit removes special handling of CT action. But it
removes too much. Pre ct/ct_nat tables and some other resources
are not destroyed due to the cited commit.
Fix it by adding it back.
Fixes: 08fe94ec5f ("net/mlx5e: TC, Remove special handling of CT action")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commits add hardware miss support to tc action. But if
the rules can't be offloaded, the pointers are null and system
will panic when accessing them.
Fix it by checking null pointer.
Fixes: 08fe94ec5f ("net/mlx5e: TC, Remove special handling of CT action")
Fixes: 6702782845 ("net/mlx5e: TC, Set CT miss to the specific ct action instance")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When a PCI device has just one msix vector available, we want to share
this vector between async and completion events. Current code fails to
do that assuming it will always have at least one dedicated vector for
completion events. Fix this by detecting when the pool contains just a
single vector.
Fixes: 3354822cde ("net/mlx5: Use dynamic msix vectors allocation")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commit missed setting napi_id on XSK RQs, it only affected
regular RQs. Add the missing part to support socket busy polling on XSK
RQs.
Fixes: a2740f529d ("net/mlx5e: xsk: Set napi_id to support busy polling")
Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commits missed passing frag_size to __xdp_rxq_info_reg, which
is required by bpf_xdp_adjust_tail to support growing the tail pointer
in fragmented packets. Pass the missing parameter when the current RQ
mode allows XDP multi buffer.
Fixes: ea5d49bdae ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ")
Fixes: 9cb9482ef1 ("net/mlx5e: Use fragments of the same size in non-linear legacy RQ with XDP")
Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Pull btrfs fixes from David Sterba:
"Two fixes for NOCOW files, a regression fix in scrub and an assertion
fix:
- NOCOW fixes:
- keep length of iomap direct io request in case of a failure
- properly pass mode of extent reference checking, this can break
some cases for swapfile
- fix error value confusion when scrubbing a stripe
- convert assertion to a proper error handling when loading global
roots, reported by syzbot"
* tag 'for-6.4-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: scrub: fix a return value overwrite in scrub_stripe()
btrfs: do not ASSERT() on duplicated global roots
btrfs: can_nocow_file_extent should pass down args->strict from callers
btrfs: fix iomap_begin length for nocow writes
Pull block fix from Jens Axboe:
"Just a single fix for blk-cg stats flushing"
* tag 'block-6.4-2023-06-15' of git://git.kernel.dk/linux:
blk-cgroup: Flush stats before releasing blkcg_gq
Pull io_uring fixes from Jens Axboe:
"A fix for sendmsg with CMSG, and the followup fix discussed for
avoiding touching task->worker_private after the worker has started
exiting"
* tag 'io_uring-6.4-2023-06-15' of git://git.kernel.dk/linux:
io_uring/io-wq: clear current->worker_private on exit
io_uring/net: save msghdr->msg_control for retries
Pull sound fixes from Takashi Iwai:
"Just a few small fixes. The only change to the core code is for a
minor race in ALSA OSS sequencer, and the rest are all device-specific
fixes (regression fixes and a usual quirk)"
* tag 'sound-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Add quirk flag for HEM devices to enable native DSD playback
ALSA: usb-audio: Fix broken resume due to UAC3 power state
ALSA: seq: oss: Fix racy open/close of MIDI devices
ASoC: tegra: Fix Master Volume Control
ALSA: hda/realtek: Add a quirk for Compaq N14JP6
firmware: cs_dsp: Log correct region name in bin error messages
KPTI keeps around two PGDs: one for userspace and another for the
kernel. Among other things, set_pgd() contains infrastructure to
ensure that updates to the kernel PGD are reflected in the user PGD
as well.
One side-effect of this is that set_pgd() expects to be passed whole
pages. Unfortunately, init_trampoline_kaslr() passes in a single entry:
'trampoline_pgd_entry'.
When KPTI is on, set_pgd() will update 'trampoline_pgd_entry' (an
8-Byte globally stored [.bss] variable) and will then proceed to
replicate that value into the non-existent neighboring user page
(located +4k away), leading to the corruption of other global [.bss]
stored variables.
Fix it by directly assigning 'trampoline_pgd_entry' and avoiding
set_pgd().
[ dhansen: tweak subject and changelog ]
Fixes: 0925dda596 ("x86/mm/KASLR: Use only one PUD entry for real mode trampoline")
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Lee Jones <lee@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/all/20230614163859.924309-1-lee@kernel.org/g
The tick period is aligned very early while the first clock_event_device is
registered. At that point the system runs in periodic mode and switches
later to one-shot mode if possible.
The next wake-up event is programmed based on the aligned value
(tick_next_period) but the delta value, that is used to program the
clock_event_device, is computed based on ktime_get().
With the subtracted offset, the device fires earlier than the exact time
frame. With a large enough offset the system programs the timer for the
next wake-up and the remaining time left is too small to make any boot
progress. The system hangs.
Move the alignment later to the setup of tick_sched timer. At this point
the system switches to oneshot mode and a high resolution clocksource is
available. At this point it is safe to align tick_next_period because
ktime_get() will now return accurate (not jiffies based) time.
[bigeasy: Patch description + testing].
Fixes: e9523a0d81 ("tick/common: Align tick period with the HZ tick.")
Reported-by: Mathias Krause <minipli@grsecurity.net>
Reported-by: "Bhatnagar, Rishabh" <risbhat@amazon.com>
Suggested-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Acked-by: SeongJae Park <sj@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/5a56290d-806e-b9a5-f37c-f21958b5a8c0@grsecurity.net
Link: https://lore.kernel.org/12c6f9a3-d087-b824-0d05-0d18c9bc1bf3@amazon.com
Link: https://lore.kernel.org/r/20230615091830.RxMV2xf_@linutronix.de
Pull RCU fix from Paul McKenney:
"This fixes a spinlock-initialization regression in SRCU that causes
the SRCU notifier to fail.
The fix simply adds the initialization, but introduces a #ifdef
because there is no spinlock to initialize for the Tiny SRCU used in
!SMP builds.
Yes, it would be nice to abstract this somehow in order to hide it in
SRCU, but I still don't see a good way of doing this"
* tag 'urgent-rcu.2023.06.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
notifier: Initialize new struct srcu_usage field
Pull RISC-V fix from Palmer Dabbelt:
- A documentation patch describing how we use patchwork
* tag 'riscv-for-linus-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
Documentation: RISC-V: patch-acceptance: mention patchwork's role
Commits ffb1b4a410 ("x86/unwind/orc: Add 'signal' field to ORC
metadata") and fb799447ae ("x86,objtool: Split UNWIND_HINT_EMPTY in
two") changed the ORC format. Although ORC is internal to the kernel,
it's the only way for external tools to get reliable kernel stack traces
on x86-64. In particular, the drgn debugger [1] uses ORC for stack
unwinding, and these format changes broke it [2]. As the drgn
maintainer, I don't care how often or how much the kernel changes the
ORC format as long as I have a way to detect the change.
It suffices to store a version identifier in the vmlinux and kernel
module ELF files (to use when parsing ORC sections from ELF), and in
kernel memory (to use when parsing ORC from a core dump+symbol table).
Rather than hard-coding a version number that needs to be manually
bumped, Peterz suggested hashing the definitions from orc_types.h. If
there is a format change that isn't caught by this, the hashing script
can be updated.
This patch adds an .orc_header allocated ELF section containing the
20-byte hash to vmlinux and kernel modules, along with the corresponding
__start_orc_header and __stop_orc_header symbols in vmlinux.
1: https://github.com/osandov/drgn
2: https://github.com/osandov/drgn/issues/303
Fixes: ffb1b4a410 ("x86/unwind/orc: Add 'signal' field to ORC metadata")
Fixes: fb799447ae ("x86,objtool: Split UNWIND_HINT_EMPTY in two")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lkml.kernel.org/r/aef9c8dc43915b886a8c48509a12ec1b006ca1ca.1686690801.git.osandov@osandov.com
Reiji reports that the arm64 implementation of arch_perf_update_userpage()
is now ignored and replaced by the dummy stub in core code.
This seems to happen since the PMUv3 driver was moved to driver/perf.
As it turns out, dropping the __weak attribute from the *prototype*
of the function solves the problem. You're right, this doesn't seem
to make much sense. And yet... It appears that both symbols get
flagged as weak, and that the first one to appear in the link order
wins:
$ nm drivers/perf/arm_pmuv3.o|grep arch_perf_update_userpage
0000000000001db0 W arch_perf_update_userpage
Dropping the attribute from the prototype restores the expected
behaviour, and arm64 is able to enjoy arch_perf_update_userpage()
again.
Fixes: 7755cec63a ("arm64: perf: Move PMUv3 driver to drivers/perf")
Fixes: f1ec3a517b ("kernel/events: Add a missing prototype for arch_perf_update_userpage()")
Reported-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Reiji Watanabe <reijiw@google.com>
Link: https://lkml.kernel.org/r/20230616114831.3186980-1-maz@kernel.org
With offloading enabled, esp_xmit() gets invoked very late, from within
validate_xmit_xfrm() which is after validate_xmit_skb() validates and
linearizes the skb if the underlying device does not support fragments.
esp_output_tail() may add a fragment to the skb while adding the auth
tag/ IV. Devices without the proper support will then send skb->data
points to with the correct length so the packet will have garbage at the
end. A pcap sniffer will claim that the proper data has been sent since
it parses the skb properly.
It is not affected with INET_ESP_OFFLOAD disabled.
Linearize the skb after offloading if the sending hardware requires it.
It was tested on v4, v6 has been adopted.
Fixes: 7785bba299 ("esp: Add a software GRO codepath")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
ASoC: Fixes for v6.4
A couple more fixes for v6.4, one fixing a misleading error log and
another stopping us seeing spurious failures setting the master volume
on some Tegra systems introduced by a change to how we calculate delay
times.
This commit adds new DEVICE_FLG with QUIRK_FLAG_DSD_RAW and Vendor Id for
HEM devices which supports native DSD. Prior to this change Linux kernel
was not enabling native DSD playback for HEM devices, and as a result,
DSD audio was being converted to PCM "on the fly". HEM devices,
when connected to the system, would only play audio in PCM format,
even if the source material was in DSD format. With the addition of new
VENDOR_FLG in the quircks.c file, the devices are now correctly
recognized, and raw DSD data is transmitted to the device,
allowing for native DSD playback.
Signed-off-by: Lukasz Tyl <ltyl@hem-e.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230614122524.30271-1-ltyl@hem-e.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
As reported in the bugzilla below, the PM resume of a UAC3 device may
fail due to the incomplete power state change, stuck at D1. The
reason is that the driver expects the full D0 power state change only
at hw_params, while the normal PCM resume procedure doesn't call
hw_params.
For fixing the bug, we add the same power state update to D0 at the
prepare callback, which is certainly called by the resume procedure.
Note that, with this change, the power state change in the hw_params
becomes almost redundant, since snd_usb_hw_params() doesn't touch the
parameters (at least it tires so). But dropping it is still a bit
risky (e.g. we have the media-driver binding), so I leave the D0 power
state change in snd_usb_hw_params() as is for now.
Fixes: a0a4959eb4 ("ALSA: usb-audio: Operate UAC3 Power Domains in PCM callbacks")
Cc: <stable@vger.kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217539
Link: https://lore.kernel.org/r/20230612132818.29486-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
There are some MD5 tests which fail when the kernel is in FIPS mode,
since MD5 is not FIPS compliant. Add a check and only run those tests
if FIPS mode is not enabled.
Fixes: f0bee1ebb5 ("fcnal-test: Add TCP MD5 tests")
Fixes: 5cad8bce26 ("fcnal-test: Add TCP MD5 tests for VRF")
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Magali Lemes <magali.lemes@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The vrf-xfrm-tests tests use the hmac(md5) and cbc(des3_ede)
algorithms for performing authentication and encryption, respectively.
This causes the tests to fail when fips=1 is set, since these algorithms
are not allowed in FIPS mode. Therefore, switch from hmac(md5) and
cbc(des3_ede) to hmac(sha1) and cbc(aes), which are FIPS compliant.
Fixes: 3f251d7411 ("selftests: Add tests for vrf and xfrms")
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Magali Lemes <magali.lemes@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
TLS selftests use the ChaCha20-Poly1305 and SM4 algorithms, which are not
FIPS compliant. When fips=1, this set of tests fails. Add a check and only
run these tests if not in FIPS mode.
Fixes: 4f336e88a8 ("selftests/tls: add CHACHA20-POLY1305 to tls selftests")
Fixes: e506342a03 ("selftests/tls: add SM4 GCM/CCM to tls selftests")
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Magali Lemes <magali.lemes@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Before executing each test from a fixture, FIXTURE_SETUP is run once.
When SKIP is used in FIXTURE_SETUP, the setup function returns early
but the test still proceeds to run, unless another SKIP macro is used
within the test definition, leading to some code repetition. Therefore,
allow tests to be skipped directly from the setup function.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Magali Lemes <magali.lemes@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless, and netfilter.
Selftests excluded - we have 58 patches and diff of +442/-199, which
isn't really small but perhaps with the exception of the WiFi locking
change it's old(ish) bugs.
We have no known problems with v6.4.
The selftest changes are rather large as MPTCP folks try to apply
Greg's guidance that selftest from torvalds/linux should be able to
run against stable kernels.
Last thing I should call out is the DCCP/UDP-lite deprecation notices.
We are fairly sure those are dead, but if we're wrong reverting them
back in won't be fun.
Current release - regressions:
- wifi:
- cfg80211: fix double lock bug in reg_wdev_chan_valid()
- iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
Current release - new code bugs:
- handshake: remove fput() that causes use-after-free
Previous releases - regressions:
- sched: cls_u32: fix reference counter leak leading to overflow
- sched: cls_api: fix lockup on flushing explicitly created chain
Previous releases - always broken:
- nf_tables: integrate pipapo into commit protocol
- nf_tables: incorrect error path handling with NFT_MSG_NEWRULE, fix
dangling pointer on failure
- ping6: fix send to link-local addresses with VRF
- sched: act_pedit: parse L3 header for L4 offset, the skb may not
have the offset saved
- sched: act_ct: fix promotion of offloaded unreplied tuple
- sched: refuse to destroy an ingress and clsact Qdiscs if there are
lockless change operations in flight
- wifi: mac80211: fix handful of bugs in multi-link operation
- ipvlan: fix bound dev checking for IPv6 l3s mode
- eth: enetc: correct the indexes of highest and 2nd highest TCs
- eth: ice: fix XDP memory leak when NIC is brought up and down
Misc:
- add deprecation notices for UDP-lite and DCCP
- selftests: mptcp: skip tests not supported by old kernels
- sctp: handle invalid error codes without calling BUG()"
* tag 'net-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
dccp: Print deprecation notice.
udplite: Print deprecation notice.
octeon_ep: Add missing check for ioremap
selftests/ptp: Fix timestamp printf format for PTP_SYS_OFFSET
net: ethernet: stmicro: stmmac: fix possible memory leak in __stmmac_open
net: tipc: resize nlattr array to correct size
sfc: fix XDP queues mode with legacy IRQ
net: macsec: fix double free of percpu stats
net: lapbether: only support ethernet devices
MAINTAINERS: add reviewers for SMC Sockets
s390/ism: Fix trying to free already-freed IRQ by repeated ism_dev_exit()
net: dsa: felix: fix taprio guard band overflow at 10Mbps with jumbo frames
net/sched: cls_api: Fix lockup on flushing explicitly created chain
ice: Fix ice module unload
net/handshake: remove fput() that causes use-after-free
selftests: forwarding: hw_stats_l3: Set addrgenmode in a separate step
net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
net/sched: Refactor qdisc_graft() for ingress and clsact Qdiscs
net/sched: act_ct: Fix promotion of offloaded unreplied tuple
wifi: iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
...
Pull device mapper fixes from Mike Snitzer:
- Fix DM thinp discard performance regression introduced during this
merge window where DM core was splitting large discards every 128K
(max_sectors_kb) rather than every 64M (discard_max_bytes).
- Extend DM core LOCKFS fix, made during 6.4 merge, to also fix race
between do_mount and dm's do_suspend (in addition to the earlier
fix's do_mount race with dm's do_resume).
- Fix DM thin metadata operations to first check if the thin-pool is in
"fail_io" mode; otherwise UAF can occur.
- Fix DM thinp's call to __blkdev_issue_discard to use GFP_NOIO rather
than GFP_NOWAIT (__blkdev_issue_discard cannot handle NULL return
from bio_alloc).
* tag 'for-6.4/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: use op specific max_sectors when splitting abnormal io
dm thin: fix issue_discard to pass GFP_NOIO to __blkdev_issue_discard
dm thin metadata: check fail_io before using data_sm
dm: don't lock fs when the map is NULL during suspend or resume
Pull rdma fixes from Jason Gunthorpe:
"This is an unusually large bunch of bug fixes for the later rc cycle,
rxe and mlx5 both dumped a lot of things at once. rxe continues to fix
itself, and mlx5 is fixing a bunch of "queue counters" related bugs.
There is one highly notable bug fix regarding the qkey. This small
security check was missed in the original 2005 implementation and it
allows some significant issues.
Summary:
- Two rtrs bug fixes for error unwind bugs
- Several rxe bug fixes:
* Incorrect Rx packet validation
* Using memory without a refcount
* Syzkaller found use before initialization
* Regression fix for missing locking with the tasklet conversion
from this merge window
- Have bnxt report the correct link properties to userspace, this was
a regression in v6.3
- Several mlx5 bug fixes:
* Kernel crash triggerable by userspace for the RAW ethernet
profile
* Defend against steering refcounting issues created by userspace
* Incorrect change of QP port affinity parameters in some LAG
configurations
- Fix mlx5 Q counters:
* Do not over allocate Q counters to allow userspace to use the
full port capacity
* Kernel crash triggered by eswitch due to mis-use of Q counters
* Incorrect mlx5_device for Q counters in some LAG configurations
- Properly implement the IBA spec restricting privileged qkeys to
root
- Always an error when reading from a disassociated device's event
queue
- isert bug fixes:
* Avoid a deadlock with the CM handler and CM ID destruction
* Correct list corruption due to incorrect locking
* Fix a use after free around connection tear down"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/rxe: Fix rxe_cq_post
IB/isert: Fix incorrect release of isert connection
IB/isert: Fix possible list corruption in CMA handler
IB/isert: Fix dead lock in ib_isert
RDMA/mlx5: Fix affinity assignment
IB/uverbs: Fix to consider event queue closing also upon non-blocking mode
RDMA/uverbs: Restrict usage of privileged QKEYs
RDMA/cma: Always set static rate to 0 for RoCE
RDMA/mlx5: Fix Q-counters query in LAG mode
RDMA/mlx5: Remove vport Q-counters dependency on normal Q-counters
RDMA/mlx5: Fix Q-counters per vport allocation
RDMA/mlx5: Create an indirect flow table for steering anchor
RDMA/mlx5: Initiate dropless RQ for RAW Ethernet functions
RDMA/rxe: Fix the use-before-initialization error of resp_pkts
RDMA/bnxt_re: Fix reporting active_{speed,width} attributes
RDMA/rxe: Fix ref count error in check_rkey()
RDMA/rxe: Fix packet length checks
RDMA/rtrs: Fix rxe_dealloc_pd warning
RDMA/rtrs: Fix the last iu->buf leak in err path
Pull spi fixes from Mark Brown:
"A few more driver specific fixes.
The DesignWare fix is for an issue introduced by conversion to the
chip select accessor functions and is pretty important but the other
two are less severe"
* tag 'spi-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: dw: Replace incorrect spi_get_chipselect with set
spi: fsl-dspi: avoid SCK glitches with continuous transfers
spi: cadence-quadspi: Add missing check for dma_set_mask
Pull regulator fix from Mark Brown:
"The set of regulators described for the Qualcomm PM8550 just seems to
have been completely wrong and would likely not have worked at all if
anything tried to actually configure anything except for enabling and
disabling at runtime"
* tag 'regulator-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: qcom-rpmh: Fix regulators for PM8550
Pull regmap fix from Mark Brown:
"Another fix for the maple tree cache, Takashi noticed that unlike
other caches the maple tree cache didn't check for read only registers
before trying to sync which would result in spurious syncs for read
only registers where we don't have a default.
This was due to the check being open coded in the caches, we now check
in the shared 'does this register need sync' function so that is fixed
for this and future caches"
* tag 'regmap-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: regcache: Don't sync read-only registers
Pull media fixes from Mauro Carvalho Chehab:
"A fix for dvb-core to avoid a race condition during DVB board
registration"
* tag 'media/v6.4-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
Revert "media: dvb-core: Fix use-after-free on race condition at dvb_frontend"
This seems to have existed for ever but is now more apparant after
commit 9bff18d134 ("drm/ttm: use per BO cleanup workers")
My analysis: two threads are running, one in the irq signalling the
fence, in dma_fence_signal_timestamp_locked, it has done the
DMA_FENCE_FLAG_SIGNALLED_BIT setting, but hasn't yet reached the
callbacks.
The second thread in nouveau_cli_work_ready, where it sees the fence is
signalled, so then puts the fence, cleanups the object and frees the
work item, which contains the callback.
Thread one goes again and tries to call the callback and causes the
use-after-free.
Proposed fix: lock the fence signalled check in nouveau_cli_work_ready,
so either the callbacks are done or the memory is freed.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Fixes: 11e451e740 ("drm/nouveau: remove fence wait code from deferred client work handler")
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://lore.kernel.org/dri-devel/20230615024008.1600281-1-airlied@gmail.com/
Pull smb client fixes from Steve French:
"Eight, mostly small, smb3 client fixes:
- important fix for deferred close oops (race with unmount) found
with xfstest generic/098 to some servers
- important reconnect fix
- fix problem with max_credits mount option
- two multichannel (interface related) fixes
- one trivial removal of confusing comment
- two small debugging improvements (to better spot crediting
problems)"
* tag '6.4-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: add a warning when the in-flight count goes negative
cifs: fix lease break oops in xfstest generic/098
cifs: fix max_credits implementation
cifs: fix sockaddr comparison in iface_cmp
smb/client: print "Unknown" instead of bogus link speed value
cifs: print all credit counters in DebugData
cifs: fix status checks in cifs_tree_connect
smb: remove obsolete comment
Kuniyuki Iwashima says:
====================
udplite/dccp: Print deprecation notice.
UDP-Lite is assumed to have no users for 7 years, and DCCP is
orphaned for 7 years too.
Let's add deprecation notice and see if anyone responds to it.
====================
Link: https://lore.kernel.org/r/20230614194705.90673-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
DCCP was marked as Orphan in the MAINTAINERS entry 2 years ago in commit
054c4610bd ("MAINTAINERS: dccp: move Gerrit Renker to CREDITS"). It says
we haven't heard from the maintainer for five years, so DCCP is not well
maintained for 7 years now.
Recently DCCP only receives updates for bugs, and major distros disable it
by default.
Removing DCCP would allow for better organisation of TCP fields to reduce
the number of cache lines hit in the fast path.
Let's add a deprecation notice when DCCP socket is created and schedule its
removal to 2025.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Recently syzkaller reported a 7-year-old null-ptr-deref [0] that occurs
when a UDP-Lite socket tries to allocate a buffer under memory pressure.
Someone should have stumbled on the bug much earlier if UDP-Lite had been
used in a real app. Also, we do not always need a large UDP-Lite workload
to hit the bug since UDP and UDP-Lite share the same memory accounting
limit.
Removing UDP-Lite would simplify UDP code removing a bunch of conditionals
in fast path.
Let's add a deprecation notice when UDP-Lite socket is created and schedule
its removal to 2025.
Link: https://lore.kernel.org/netdev/20230523163305.66466-1-kuniyu@amazon.com/ [0]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Previously, timestamps were printed using "%lld.%u" which is incorrect
for nanosecond values lower than 100,000,000 as they're fractional
digits, therefore leading zeros are meaningful.
This patch changes the format strings to "%lld.%09u" in order to add
leading zeros to the nanosecond value.
Fixes: 568ebc5985 ("ptp: add the PTP_SYS_OFFSET ioctl to the testptp program")
Fixes: 4ec54f9573 ("ptp: Fix compiler warnings in the testptp utility")
Fixes: 6ab0e475f1 ("Documentation: fix misc. warnings")
Signed-off-by: Alex Maftei <alex.maftei@amd.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20230615083404.57112-1-alex.maftei@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
According to nla_parse_nested_deprecated(), the tb[] is supposed to the
destination array with maxtype+1 elements. In current
tipc_nl_media_get() and __tipc_nl_media_set(), a larger array is used
which is unnecessary. This patch resize them to a proper size.
Fixes: 1e55417d8f ("tipc: add media set to new netlink api")
Fixes: 46f15c6794 ("tipc: add media get/dump to new netlink api")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/20230614120604.1196377-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Split abnormal IO in terms of the corresponding operation specific
max_sectors (max_discard_sectors, max_secure_erase_sectors or
max_write_zeroes_sectors).
This fixes a significant dm-thinp discard performance regression that
was introduced with commit e2dd8aca2d ("dm bio prison v1: improve
concurrent IO performance"). Relative to discard: max_discard_sectors
is used instead of max_sectors; which fixes excessive discard splitting
(e.g. max_sectors=128K vs max_discard_sectors=64M).
Tested by discarding an 1 Petabyte dm-thin device:
lvcreate -V 1125899906842624B -T test/pool -n thin
time blkdiscard /dev/test/thin
Before this fix (splitting discards every 128K): ~116m
After this fix (splitting discards every 64M) : 0m33.460s
Reported-by: Zorro Lang <zlang@redhat.com>
Fixes: 06961c487a ("dm: split discards further if target sets max_discard_granularity")
Requires: 13f6facf3f ("dm: allow targets to require splitting WRITE_ZEROES and SECURE_ERASE")
Fixes: e2dd8aca2d ("dm bio prison v1: improve concurrent IO performance")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
issue_discard() passes GFP_NOWAIT to __blkdev_issue_discard() despite
its code assuming bio_alloc() always succeeds.
Commit 3dba53a958 ("dm thin: use __blkdev_issue_discard for async
discard support") clearly shows where things went bad:
Before commit 3dba53a958, dm-thin.c's open-coded
__blkdev_issue_discard_async() properly handled using GFP_NOWAIT.
Unfortunately __blkdev_issue_discard() doesn't and it was missed
during review.
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Must check pmd->fail_io before using pmd->data_sm since
pmd->data_sm may be destroyed by other processes.
P1(kworker) P2(message)
do_worker
process_prepared
process_prepared_discard_passdown_pt2
dm_pool_dec_data_range
pool_message
commit
dm_pool_commit_metadata
↓
// commit failed
metadata_operation_failed
abort_transaction
dm_pool_abort_metadata
__open_or_format_metadata
↓
dm_sm_disk_open
↓
// open failed
// pmd->data_sm is NULL
dm_sm_dec_blocks
↓
// try to access pmd->data_sm --> UAF
As shown above, if dm_pool_commit_metadata() and
dm_pool_abort_metadata() fail in pool_message process, kworker may
trigger UAF.
Fixes: be500ed721 ("dm space maps: improve performance with inc/dec on ranges of blocks")
Cc: stable@vger.kernel.org
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
As described in commit 38d11da522 ("dm: don't lock fs when the map is
NULL in process of resume"), a deadlock may be triggered between
do_resume() and do_mount().
This commit preserves the fix from commit 38d11da522 but moves it to
where it also serves to fix a similar deadlock between do_suspend()
and do_mount(). It does so, if the active map is NULL, by clearing
DM_SUSPEND_LOCKFS_FLAG in dm_suspend() which is called by both
do_suspend() and do_resume().
Fixes: 38d11da522 ("dm: don't lock fs when the map is NULL in process of resume")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Since commit 955fb8719e ("thermal/intel/intel_soc_dts_iosf: Use Intel
TCC library") intel_soc_dts_iosf is reporting the wrong temperature.
The driver expects tj_max to be in milli-degrees-celcius but after
the switch to the TCC library this is now in degrees celcius so
instead of e.g. 90000 it is set to 90 causing a temperature 45
degrees below tj_max to be reported as -44910 milli-degrees
instead of as 45000 milli-degrees.
Fix this by adding back the lost factor of 1000.
Fixes: 955fb8719e ("thermal/intel/intel_soc_dts_iosf: Use Intel TCC library")
Reported-by: Bernhard Krug <b.krug@elektronenpumpe.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Cc: 6.3+ <stable@vger.kernel.org> # 6.3+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The addition of might_sleep() to down_timeout() caused the latter to
enable interrupts unconditionally in some cases, which in turn broke
the ACPI S3 wakeup path in acpi_suspend_enter(), where down_timeout()
is called by acpi_disable_all_gpes() via acpi_ut_acquire_mutex().
Namely, if CONFIG_DEBUG_ATOMIC_SLEEP is set, might_sleep() causes
might_resched() to be used and if CONFIG_PREEMPT_VOLUNTARY is set,
this triggers __cond_resched() which may call preempt_schedule_common(),
so __schedule() gets invoked and it ends up with enabled interrupts (in
the prev == next case).
Now, enabling interrupts early in the S3 wakeup path causes the kernel
to crash.
Address this by modifying acpi_suspend_enter() to disable GPEs without
attempting to acquire the sleeping lock which is not needed in that code
path anyway.
Fixes: 99409b935c ("locking/semaphore: Add might_sleep() to down_*() family")
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: 5.15+ <stable@vger.kernel.org> # 5.15+
In systems without MSI-X capabilities, xdp_txq_queues_mode is calculated
in efx_allocate_msix_channels, but when enabling MSI-X fails, it was not
changed to a proper default value. This was leading to the driver
thinking that it has dedicated XDP queues, when it didn't.
Fix it by setting xdp_txq_queues_mode to the correct value if the driver
fallbacks to MSI or legacy IRQ mode. The correct value is
EFX_XDP_TX_QUEUES_BORROWED because there are no XDP dedicated queues.
The issue can be easily visible if the kernel is started with pci=nomsi,
then a call trace is shown. It is not shown only with sfc's modparam
interrupt_mode=2. Call trace example:
WARNING: CPU: 2 PID: 663 at drivers/net/ethernet/sfc/efx_channels.c:828 efx_set_xdp_channels+0x124/0x260 [sfc]
[...skip...]
Call Trace:
<TASK>
efx_set_channels+0x5c/0xc0 [sfc]
efx_probe_nic+0x9b/0x15a [sfc]
efx_probe_all+0x10/0x1a2 [sfc]
efx_pci_probe_main+0x12/0x156 [sfc]
efx_pci_probe_post_io+0x18/0x103 [sfc]
efx_pci_probe.cold+0x154/0x257 [sfc]
local_pci_probe+0x42/0x80
Fixes: 6215b608a8 ("sfc: last resort fallback for lack of xdp tx queues")
Reported-by: Yanghang Liu <yanghliu@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Inside macsec_add_dev() we free percpu macsec->secy.tx_sc.stats and
macsec->stats on some of the memory allocation failure paths. However, the
net_device is already registered to that moment: in macsec_newlink(), just
before calling macsec_add_dev(). This means that during unregister process
its priv_destructor - macsec_free_netdev() - will be called and will free
the stats again.
Remove freeing percpu stats inside macsec_add_dev() because
macsec_free_netdev() will correctly free the already allocated ones. The
pointers to unallocated stats stay NULL, and free_percpu() treats that
correctly.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 0a28bfd497 ("net/macsec: Add MACsec skb_metadata_dst Tx Data path support")
Fixes: c09440f7dc ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
adding three people from Alibaba as reviewers for SMC.
They are currently working on improving SMC on other architectures than
s390 and help with reviewing patches on top.
Thank you D. Wythe, Tony Lu and Wen Gu for your contributions and
collaboration and welcome on board as reviewers!
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Acked-by: Tony Lu <tonylu@linux.alibaba.com>
Acked-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch prevents the system from crashing when unloading the ISM module.
How to reproduce: Attach an ISM device and execute 'rmmod ism'.
Error-Log:
- Trying to free already-free IRQ 0
- WARNING: CPU: 1 PID: 966 at kernel/irq/manage.c:1890 free_irq+0x140/0x540
After calling ism_dev_exit() for each ISM device in the exit routine,
pci_unregister_driver() will execute ism_remove() for each ISM device.
Because ism_remove() also calls ism_dev_exit(),
free_irq(pci_irq_vector(pdev, 0), ism) is called twice for each ISM
device. This results in a crash with the error
'Trying to free already-free IRQ'.
In the exit routine, it is enough to call pci_unregister_driver()
because it ensures that ism_dev_exit() is called once per
ISM device.
Cc: <stable@vger.kernel.org> # 6.3+
Fixes: 89e7d2ba61 ("net/ism: Add new API for client registration")
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Julian Ruess <julianr@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The debugfs_create_dir() returns ERR_PTR in case of an error and the
correct way of checking it is using the IS_ERR_OR_NULL inline function
rather than the simple null comparision. This patch fixes the issue.
Cc: stable@vger.kernel.org
Suggested-By: Ivan Orlov <ivan.orlov0322@gmail.com>
Signed-off-by: Immad Mir <mirimmad17@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The hardware monitoring points for instruction fetching and load/store
operations need to align 4 bytes and 1/2/4/8 bytes respectively.
Reported-by: Colin King <colin.i.king@gmail.com>
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
When we split a pmd into ptes, pmd_present() and pmd_trans_huge() should
return true, otherwise it would be treated as a swap pmd.
This is the same as arm64 does in commit b65399f611 ("arm64/mm: Change
THP helpers to comply with generic MM semantics"), we also add a new bit
named _PAGE_PRESENT_INVALID for LoongArch.
Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The DEV_MAC_MAXLEN_CFG register contains a 16-bit value - up to 65535.
Plus 2 * VLAN_HLEN (4), that is up to 65543.
The picos_per_byte variable is the largest when "speed" is lowest -
SPEED_10 = 10. In that case it is (1000000L * 8) / 10 = 800000.
Their product - 52434400000 - exceeds 32 bits, which is a problem,
because apparently, a multiplication between two 32-bit factors is
evaluated as 32-bit before being assigned to a 64-bit variable.
In fact it's a problem for any MTU value larger than 5368.
Cast one of the factors of the multiplication to u64 to force the
multiplication to take place on 64 bits.
Issue found by Coverity.
Fixes: 55a515b1f5 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230613170907.2413559-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mingshuai Ren reports:
When a new chain is added by using tc, one soft lockup alarm will be
generated after delete the prio 0 filter of the chain. To reproduce
the problem, perform the following steps:
(1) tc qdisc add dev eth0 root handle 1: htb default 1
(2) tc chain add dev eth0
(3) tc filter del dev eth0 chain 0 parent 1: prio 0
(4) tc filter add dev eth0 chain 0 parent 1:
Fix the issue by accounting for additional reference to chains that are
explicitly created by RTM_NEWCHAIN message as opposed to implicitly by
RTM_NEWTFILTER message.
Fixes: 726d061286 ("net: sched: prevent insertion of new classifiers during chain flush")
Reported-by: Mingshuai Ren <renmingshuai@huawei.com>
Closes: https://lore.kernel.org/lkml/87legswvi3.fsf@nvidia.com/T/
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Link: https://lore.kernel.org/r/20230612093426.2867183-1-vladbu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-06-12 (igc, igb)
This series contains updates to igc and igb drivers.
Husaini clears Tx rings when interface is brought down for igc.
Vinicius disables PTM and PCI busmaster when removing igc driver.
Alex adds error check and path for NVM read error on igb.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
igb: fix nvm.ops.read() error handling
igc: Fix possible system crash when loading module
igc: Clean the TX buffer and TX descriptor ring
====================
Link: https://lore.kernel.org/r/20230612205208.115292-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A reference underflow is found in TLS handshake subsystem that causes a
direct use-after-free. Part of the crash log is like below:
[ 2.022114] ------------[ cut here ]------------
[ 2.022193] refcount_t: underflow; use-after-free.
[ 2.022288] WARNING: CPU: 0 PID: 60 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110
[ 2.022432] Modules linked in:
[ 2.022848] RIP: 0010:refcount_warn_saturate+0xbe/0x110
[ 2.023231] RSP: 0018:ffffc900001bfe18 EFLAGS: 00000286
[ 2.023325] RAX: 0000000000000000 RBX: 0000000000000007 RCX: 00000000ffffdfff
[ 2.023438] RDX: 0000000000000000 RSI: 00000000ffffffea RDI: 0000000000000001
[ 2.023555] RBP: ffff888004c20098 R08: ffffffff82b392c8 R09: 00000000ffffdfff
[ 2.023693] R10: ffffffff82a592e0 R11: ffffffff82b092e0 R12: ffff888004c200d8
[ 2.023813] R13: 0000000000000000 R14: ffff888004c20000 R15: ffffc90000013ca8
[ 2.023930] FS: 0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[ 2.024062] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.024161] CR2: ffff888003601000 CR3: 0000000002a2e000 CR4: 00000000000006f0
[ 2.024275] Call Trace:
[ 2.024322] <TASK>
[ 2.024367] ? __warn+0x7f/0x130
[ 2.024430] ? refcount_warn_saturate+0xbe/0x110
[ 2.024513] ? report_bug+0x199/0x1b0
[ 2.024585] ? handle_bug+0x3c/0x70
[ 2.024676] ? exc_invalid_op+0x18/0x70
[ 2.024750] ? asm_exc_invalid_op+0x1a/0x20
[ 2.024830] ? refcount_warn_saturate+0xbe/0x110
[ 2.024916] ? refcount_warn_saturate+0xbe/0x110
[ 2.024998] __tcp_close+0x2f4/0x3d0
[ 2.025065] ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10
[ 2.025168] tcp_close+0x1f/0x70
[ 2.025231] inet_release+0x33/0x60
[ 2.025297] sock_release+0x1f/0x80
[ 2.025361] handshake_req_cancel_test2+0x100/0x2d0
[ 2.025457] kunit_try_run_case+0x4c/0xa0
[ 2.025532] kunit_generic_run_threadfn_adapter+0x15/0x20
[ 2.025644] kthread+0xe1/0x110
[ 2.025708] ? __pfx_kthread+0x10/0x10
[ 2.025780] ret_from_fork+0x2c/0x50
One can enable CONFIG_NET_HANDSHAKE_KUNIT_TEST config to reproduce above
crash.
The root cause of this bug is that the commit 1ce77c998f
("net/handshake: Unpin sock->file if a handshake is cancelled") adds one
additional fput() function. That patch claims that the fput() is used to
enable sock->file to be freed even when user space never calls DONE.
However, it seems that the intended DONE routine will never give an
additional fput() of ths sock->file. The existing two of them are just
used to balance the reference added in sockfd_lookup().
This patch revert the mentioned commit to avoid the use-after-free. The
patched kernel could successfully pass the KUNIT test and boot to shell.
[ 0.733613] # Subtest: Handshake API tests
[ 0.734029] 1..11
[ 0.734255] KTAP version 1
[ 0.734542] # Subtest: req_alloc API fuzzing
[ 0.736104] ok 1 handshake_req_alloc NULL proto
[ 0.736114] ok 2 handshake_req_alloc CLASS_NONE
[ 0.736559] ok 3 handshake_req_alloc CLASS_MAX
[ 0.737020] ok 4 handshake_req_alloc no callbacks
[ 0.737488] ok 5 handshake_req_alloc no done callback
[ 0.737988] ok 6 handshake_req_alloc excessive privsize
[ 0.738529] ok 7 handshake_req_alloc all good
[ 0.739036] # req_alloc API fuzzing: pass:7 fail:0 skip:0 total:7
[ 0.739444] ok 1 req_alloc API fuzzing
[ 0.740065] ok 2 req_submit NULL req arg
[ 0.740436] ok 3 req_submit NULL sock arg
[ 0.740834] ok 4 req_submit NULL sock->file
[ 0.741236] ok 5 req_lookup works
[ 0.741621] ok 6 req_submit max pending
[ 0.741974] ok 7 req_submit multiple
[ 0.742382] ok 8 req_cancel before accept
[ 0.742764] ok 9 req_cancel after accept
[ 0.743151] ok 10 req_cancel after done
[ 0.743510] ok 11 req_destroy works
[ 0.743882] # Handshake API tests: pass:11 fail:0 skip:0 total:11
[ 0.744205] # Totals: pass:17 fail:0 skip:0 total:17
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: 1ce77c998f ("net/handshake: Unpin sock->file if a handshake is cancelled")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20230613083204.633896-1-linma@zju.edu.cn
Link: https://lore.kernel.org/r/20230614015249.987448-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Johannes Berg says:
====================
A couple of straggler fixes, mostly in the stack:
- fix fragmentation for multi-link related elements
- fix callback copy/paste error
- fix multi-link locking
- remove double-locking of wiphy mutex
- transmit only on active links, not all
- activate links in the correct order
- don't remove links that weren't added
- disable soft-IRQs for LQ lock in iwlwifi
* tag 'wireless-2023-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
wifi: mac80211: fragment per STA profile correctly
wifi: mac80211: Use active_links instead of valid_links in Tx
wifi: cfg80211: remove links only on AP
wifi: mac80211: take lock before setting vif links
wifi: cfg80211: fix link del callback to call correct handler
wifi: mac80211: fix link activation settings order
wifi: cfg80211: fix double lock bug in reg_wdev_chan_valid()
====================
Link: https://lore.kernel.org/r/20230614075502.11765-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit ad3f09be6c.
The reverted commit was intended to simpfy the code to get group
descriptor block number in non-meta block group by assuming
s_gdb_count is block number used for all non-meta block group descriptors.
However s_gdb_count is block number used for all meta *and* non-meta
group descriptors. So s_gdb_group will be > actual group descriptor block
number used for all non-meta block group which should be "total non-meta
block group" / "group descriptors per block", e.g. s_first_meta_bg.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/20230613225025.3859522-1-shikemeng@huaweicloud.com
Fixes: ad3f09be6c ("ext4: remove unnecessary check in ext4_bg_num_gdb_nometa")
Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
In the error exits in target_setup_session(), if a branch is taken to
free_sess: transport_free_session() may call to target_free_cmd_counter()
and then fall through to call target_free_cmd_counter() a second time.
This can, and does, sometimes cause seg faults since the data field in
cmd_cnt->refcnt has been freed in the first call.
Fix this problem by simply returning after the call to
transport_free_session(). The second call is redundant for those cases.
Fixes: 4edba7e4a8 ("scsi: target: Move cmd counter allocation")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Link: https://lore.kernel.org/r/20230613144259.12890-1-rpearsonhpe@gmail.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Hyper-V synthetic SCSI devices do not support the MAINTENANCE_IN SCSI
command, so scsi_report_opcode() always fails, resulting in messages like
this:
hv_storvsc <guid>: tag#205 cmd 0xa3 status: scsi 0x2 srb 0x86 hv 0xc0000001
The recently added support for command duration limits calls
scsi_report_opcode() four times as each device comes online, which
significantly increases the number of messages logged in a system with many
disks.
Fix the problem by always marking Hyper-V synthetic SCSI devices as not
supporting scsi_report_opcode(). With this setting, the MAINTENANCE_IN SCSI
command is not issued and no messages are logged.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1686343101-18930-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A recent fix stopped clearing PF_IO_WORKER from current->flags on exit,
which meant that we can now call inc/dec running on the worker after it
has been removed if it ends up scheduling in/out as part of exit.
If this happens after an RCU grace period has passed, then the struct
pointed to by current->worker_private may have been freed, and we can
now be accessing memory that is freed.
Ensure this doesn't happen by clearing the task worker_private field.
Both io_wq_worker_running() and io_wq_worker_sleeping() check this
field before going any further, and we don't need any accounting etc
done after this worker has exited.
Fixes: fd37b88400 ("io_uring/io-wq: don't clear PF_IO_WORKER on exit")
Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now user_events auto-cleanup upon the last reference by default. This
makes it not possible to use the dynamics event file via tracefs.
Document that auto-cleanup is enabled by default and remove the refernce
to /sys/kernel/tracing/dynamic_events file to make this clear.
Link: https://lkml.kernel.org/r/20230614163336.5797-7-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Now that user_events does not honor persist events the dynamic_events
file cannot be easily used to test parsing and matching cases.
Update dyn_test to use the direct ABI file instead of dynamic_events so
that we still have testing coverage until persist events and
dynamic_events file integration has been decided.
Link: https://lkml.kernel.org/r/20230614163336.5797-6-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Currently user events need to be manually deleted via the delete IOCTL
call or via the dynamic_events file. Most operators and processes wish
to have these events auto cleanup when they are no longer used by
anything to prevent them piling without manual maintenance. However,
some operators may not want this, such as pre-registering events via the
dynamic_events tracefs file.
Update user_event_put() to attempt an auto delete of the event if it's
the last reference. The auto delete must run in a work queue to ensure
proper behavior of class->reg() invocations that don't expect the call
to go away from underneath them during the unregister. Add work_struct
to user_event struct to ensure we can do this reliably.
Add a persist flag, that is not yet exposed, to ensure we can toggle
between auto-cleanup and leaving the events existing in the future. When
a non-zero flag is seen during register, return -EINVAL to ensure ABI
is clear for the user processes while we work out the best approach for
persistent events.
Link: https://lkml.kernel.org/r/20230614163336.5797-4-beaub@linux.microsoft.com
Link: https://lore.kernel.org/linux-trace-kernel/20230518093600.3f119d68@rorschach.local.home/
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Various parts of the code today track user_event's refcnt field directly
via a refcount_add/dec. This makes it hard to modify the behavior of the
last reference decrement in all code paths consistently. For example, in
the future we will auto-delete events upon the last reference going
away. This last reference could happen in many places, but we want it to
be consistently handled.
Add user_event_get() and user_event_put() for the add/dec. Update all
places where direct refcounts are being used to utilize these new
functions. In each location pass if event_mutex is locked or not. This
allows us to drop events automatically in future patches clearly. Ensure
when caller states the lock is held, it really is (or is not) held.
Link: https://lkml.kernel.org/r/20230614163336.5797-3-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Currently we don't have any available flags for user processes to use to
indicate options for user_events. We will soon have a flag to indicate
the event should or should not auto-delete once it's not being used by
anyone.
Add a reg_flags field to user_events and parameters to existing
functions to allow for this in future patches.
Link: https://lkml.kernel.org/r/20230614163336.5797-2-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
A recent patch replaced a tasklet execution of cq->comp_handler by a
direct call. While this made sense it let changes to cq->notify state be
unprotected and assumed that the cq completion machinery and the ulp done
callbacks were reentrant. The result is that in some cases completion
events can be lost. This patch moves the cq->comp_handler call inside of
the spinlock in rxe_cq_post which solves both issues. This is compatible
with the matching code in the request notify verb.
Fixes: 78b26a3353 ("RDMA/rxe: Remove tasklet call from rxe_cq.c")
Link: https://lore.kernel.org/r/20230612155032.17036-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[RETURN VALUE OVERWRITE]
Inside scrub_stripe(), we would submit all the remaining stripes after
iterating all extents.
But since flush_scrub_stripes() can return error, we need to avoid
overwriting the existing @ret if there is any error.
However the existing check is doing the wrong check:
ret2 = flush_scrub_stripes();
if (!ret2)
ret = ret2;
This would overwrite the existing @ret to 0 as long as the final flush
detects no critical errors.
[FIX]
We should check @ret other than @ret2 in that case.
Fixes: 8eb3dd17ea ("btrfs: dev-replace: error out if we have unrepaired metadata error during")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We've seen the in-flight count go into negative with some
internal stress testing in Microsoft.
Adding a WARN when this happens, in hope of understanding
why this happens when it happens.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
umount can race with lease break so need to check if
tcon->ses->server is still valid to send the lease
break response.
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Fixes: 59a556aebc ("SMB3: drop reference to cfile before sending oplock break")
Signed-off-by: Steve French <stfrench@microsoft.com>
Palmer suggested at some point, not sure if it was in one of the
weekly linux-riscv syncs, or a conversation at FOSDEM, that we
should document the role of the automation running on our patchwork
instance plays in patch acceptance.
Add a short note to the patch-acceptance document to that end.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230606-rehab-monsoon-12c17bbe08e3@wendy
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Setting the IPv6 address generation mode of a net device during its
creation never worked, but after commit b0ad3c1790 ("rtnetlink: call
validate_linkmsg in rtnl_create_link") it explicitly fails [1]. The
failure is caused by the fact that validate_linkmsg() is called before
the net device is registered, when it still does not have an 'inet6_dev'.
Likewise, raising the net device before setting the address generation
mode is meaningless, because by the time the mode is set, the address
has already been generated.
Therefore, fix the test to first create the net device, then set its
IPv6 address generation mode and finally bring it up.
[1]
# ip link add name mydev addrgenmode eui64 type dummy
RTNETLINK answers: Address family not supported by protocol
Fixes: ba95e79309 ("selftests: forwarding: hw_stats_l3: Add a new test")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/f3b05d85b2bc0c3d6168fe8f7207c6c8365703db.1686580046.git.petrm@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
mini_Qdisc_pair::p_miniq is a double pointer to mini_Qdisc, initialized
in ingress_init() to point to net_device::miniq_ingress. ingress Qdiscs
access this per-net_device pointer in mini_qdisc_pair_swap(). Similar
for clsact Qdiscs and miniq_egress.
Unfortunately, after introducing RTNL-unlocked RTM_{NEW,DEL,GET}TFILTER
requests (thanks Hillf Danton for the hint), when replacing ingress or
clsact Qdiscs, for example, the old Qdisc ("@old") could access the same
miniq_{in,e}gress pointer(s) concurrently with the new Qdisc ("@new"),
causing race conditions [1] including a use-after-free bug in
mini_qdisc_pair_swap() reported by syzbot:
BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901
...
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319
print_report mm/kasan/report.c:430 [inline]
kasan_report+0x11c/0x130 mm/kasan/report.c:536
mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
tcf_chain_head_change_item net/sched/cls_api.c:495 [inline]
tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509
tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline]
tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline]
tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266
...
@old and @new should not affect each other. In other words, @old should
never modify miniq_{in,e}gress after @new, and @new should not update
@old's RCU state.
Fixing without changing sch_api.c turned out to be difficult (please
refer to Closes: for discussions). Instead, make sure @new's first call
always happen after @old's last call (in {ingress,clsact}_destroy()) has
finished:
In qdisc_graft(), return -EBUSY if @old has any ongoing filter requests,
and call qdisc_destroy() for @old before grafting @new.
Introduce qdisc_refcount_dec_if_one() as the counterpart of
qdisc_refcount_inc_nz() used for filter requests. Introduce a
non-static version of qdisc_destroy() that does a TCQ_F_BUILTIN check,
just like qdisc_put() etc.
Depends on patch "net/sched: Refactor qdisc_graft() for ingress and
clsact Qdiscs".
[1] To illustrate, the syzkaller reproducer adds ingress Qdiscs under
TC_H_ROOT (no longer possible after commit c7cfbd1150 ("net/sched:
sch_ingress: Only create under TC_H_INGRESS")) on eth0 that has 8
transmission queues:
Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2),
then adds a flower filter X to A.
Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and
b2) to replace A, then adds a flower filter Y to B.
Thread 1 A's refcnt Thread 2
RTM_NEWQDISC (A, RTNL-locked)
qdisc_create(A) 1
qdisc_graft(A) 9
RTM_NEWTFILTER (X, RTNL-unlocked)
__tcf_qdisc_find(A) 10
tcf_chain0_head_change(A)
mini_qdisc_pair_swap(A) (1st)
|
| RTM_NEWQDISC (B, RTNL-locked)
RCU sync 2 qdisc_graft(B)
| 1 notify_and_destroy(A)
|
tcf_block_release(A) 0 RTM_NEWTFILTER (Y, RTNL-unlocked)
qdisc_destroy(A) tcf_chain0_head_change(B)
tcf_chain0_head_change_cb_del(A) mini_qdisc_pair_swap(B) (2nd)
mini_qdisc_pair_swap(A) (3rd) |
... ...
Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to
its mini Qdisc, b1. Then, A calls mini_qdisc_pair_swap() again during
ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress
packets on eth0 will not find filter Y in sch_handle_ingress().
This is just one of the possible consequences of concurrently accessing
miniq_{in,e}gress pointers.
Fixes: 7a096d579e ("net: sched: ingress: set 'unlocked' flag for Qdisc ops")
Fixes: 87f373921c ("net: sched: ingress: set 'unlocked' flag for clsact Qdisc ops")
Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/0000000000006cf87705f79acf1a@google.com/
Cc: Hillf Danton <hdanton@sina.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently UNREPLIED and UNASSURED connections are added to the nf flow
table. This causes the following connection packets to be processed
by the flow table which then skips conntrack_in(), and thus such the
connections will remain UNREPLIED and UNASSURED even if reply traffic
is then seen. Even still, the unoffloaded reply packets are the ones
triggering hardware update from new to established state, and if
there aren't any to triger an update and/or previous update was
missed, hardware can get out of sync with sw and still mark
packets as new.
Fix the above by:
1) Not skipping conntrack_in() for UNASSURED packets, but still
refresh for hardware, as before the cited patch.
2) Try and force a refresh by reply-direction packets that update
the hardware rules from new to established state.
3) Remove any bidirectional flows that didn't failed to update in
hardware for re-insertion as bidrectional once any new packet
arrives.
Fixes: 6a9bad0069 ("net/sched: act_ct: offload UDP NEW connections")
Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/1686313379-117663-1-git-send-email-paulb@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lockdep on 6.4-rc on ThinkPad X1 Carbon 5th says
=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
6.4.0-rc5 #1 Not tainted
-----------------------------------------------------
kworker/3:1/49 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire:
ffff8881066fa368 (&mvm_sta->deflink.lq_sta.rs_drv.pers.lock){+.+.}-{2:2}, at: rs_drv_get_rate+0x46/0xe7
and this task is already holding:
ffff8881066f80a8 (&sta->rate_ctrl_lock){+.-.}-{2:2}, at: rate_control_get_rate+0xbd/0x126
which would create a new lock dependency:
(&sta->rate_ctrl_lock){+.-.}-{2:2} -> (&mvm_sta->deflink.lq_sta.rs_drv.pers.lock){+.+.}-{2:2}
but this new dependency connects a SOFTIRQ-irq-safe lock:
(&sta->rate_ctrl_lock){+.-.}-{2:2}
etc. etc. etc.
Changing the spin_lock() in rs_drv_get_rate() to spin_lock_bh() was not
enough to pacify lockdep, but changing them all on pers.lock has worked.
Fixes: a8938bc881 ("wifi: iwlwifi: mvm: Add locking to the rate read flow")
Signed-off-by: Hugh Dickins <hughd@google.com>
Link: https://lore.kernel.org/r/79ffcc22-9775-cb6d-3ffd-1a517c40beef@google.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some qdiscs and classifiers have recently been retired from kernel.
However, tc-testing config is still cluttered with them which causes noise
when using merge_config.sh script to update existing config for tc-testing
compatibility. Remove the config settings for affected qdiscs and
classifiers.
Fixes: fb38306ceb ("net/sched: Retire ATM qdisc")
Fixes: 051d442098 ("net/sched: Retire CBQ qdisc")
Fixes: bbe77c14ee ("net/sched: Retire dsmark qdisc")
Fixes: 265b4da82d ("net/sched: Retire rsvp classifier")
Fixes: 8c710f7525 ("net/sched: Retire tcindex classifier")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Setting very small value of db like 10ms introduces rounding errors when
converting to/from jiffies on some kernel configs. For example, on 250hz
the actual value will be set to 12ms which causes the test to fail:
# $ sudo ./tdc.py -d eth2 -e 3410
# -- ns/SubPlugin.__init__
# Test 3410: Create SFB with db setting
#
# All test results:
#
# 1..1
# not ok 1 3410 - Create SFB with db setting
# Could not match regex pattern. Verify command output:
# qdisc sfb 1: root refcnt 2 rehash 600s db 12ms limit 1000p max 25p target 20p increment 0.000503548 decrement 4.57771e-05 penalty_rate 10pps penalty_burst 20p
Set the value to 100ms instead which currently seem to work on 100hz,
250hz, 300hz and 1000hz kernel configs.
Fixes: 6ad92dc56f ("selftests/tc-testing: add selftests for sfb qdisc")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the application sets ->msg_control and we have to later retry this
command, or if it got queued with IOSQE_ASYNC to begin with, then we
need to retain the original msg_control value. This is due to the net
stack overwriting this field with an in-kernel pointer, to copy it
in. Hitting that path for the second time will now fail the copy from
user, as it's attempting to copy from a non-user address.
Cc: stable@vger.kernel.org # 5.10+
Link: https://github.com/axboe/liburing/issues/880
Reported-and-tested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Krister Johansen says:
====================
Hi,
Enclosed are a pair of patches for an oops that can occur if an exception is
generated while a bpf subprogram is running. One of the bpf_prog_aux entries
for the subprograms are missing an extable. This can lead to an exception that
would otherwise be handled turning into a NULL pointer bug.
These changes were tested via the verifier and progs selftests and no
regressions were observed.
Changes from v4:
- Ensure that num_exentries is copied to prog->aux from func[0] (Feedback from
Ilya Leoshkevich)
Changes from v3:
- Selftest style fixups (Feedback from Yonghong Song)
- Selftest needs to assert that test bpf program executed (Feedback from
Yonghong Song)
- Selftest should combine open and load using open_and_load (Feedback from
Yonghong Song)
Changes from v2:
- Insert only the main program's kallsyms (Feedback from Yonghong Song and
Alexei Starovoitov)
- Selftest should use ASSERT instead of CHECK (Feedback from Yonghong Song)
- Selftest needs some cleanup (Feedback from Yonghong Song)
- Switch patch order (Feedback from Alexei Starovoitov)
Changes from v1:
- Add a selftest (Feedback From Alexei Starovoitov)
- Move to a 1-line verifier change instead of searching multiple extables
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When subprograms are in use, the main program is not jit'd after the
subprograms because jit_subprogs sets a value for prog->bpf_func upon
success. Subsequent calls to the JIT are bypassed when this value is
non-NULL. This leads to a situation where the main program and its
func[0] counterpart are both in the bpf kallsyms tree, but only func[0]
has an extable. Extables are only created during JIT. Now there are
two nearly identical program ksym entries in the tree, but only one has
an extable. Depending upon how the entries are placed, there's a chance
that a fault will call search_extable on the aux with the NULL entry.
Since jit_subprogs already copies state from func[0] to the main
program, include the extable pointer in this state duplication.
Additionally, ensure that the copy of the main program in func[0] is not
added to the bpf_prog_kallsyms table. Instead, let the main program get
added later in bpf_prog_load(). This ensures there is only a single
copy of the main program in the kallsyms table, and that its tag matches
the tag observed by tooling like bpftool.
Cc: stable@vger.kernel.org
Fixes: 1c2a088a66 ("bpf: x64: add JIT support for multi-function programs")
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/6de9b2f4b4724ef56efbb0339daaa66c8b68b1e7.1686616663.git.kjlx@templeofstupid.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The given value of 1518 seems to refer to the layer 2 ethernet frame
size without 802.1Q tag. Actual use of the "max-frame-size" including in
the consumer of the "altr,tse-1.0" compatible is the MTU.
Fixes: 95acd4c7b6 ("nios2: Device tree support")
Fixes: 61c610ec61 ("nios2: Add Max10 device tree")
Cc: <stable@vger.kernel.org>
Signed-off-by: Janne Grunau <j@jannau.net>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
This reverts commit c105518679.
This patch disables the TOPDOWN flag for APU and few dGPU cards
which has the VRAM size equal to the BAR size.
When we enable the TOPDOWN flag, we get the free blocks at
the highest available memory region and we don't split the
lower order blocks. This change is required to keep off
the fragmentation related issues particularly in ASIC
which has VRAM space <= 500MiB
Hence, we are reverting this patch.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2270
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Only vcn0 can process AV1 codecx. In order to use both vcn0 and
vcn1 in h264/265 transcode to AV1 cases, set vcn0 sched score to 1
at initialization time.
Signed-off-by: Sonny Jiang <sonjiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
On smu 13.0.0, the compute workload type cannot be set on all the skus
due to some other problems. This workaround is to make sure compute workload type
can also run on some specific skus.
v2: keep the variable consistent
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
Non-root users shouldn't be able to try to trigger a VBIOS flash
or query the flashing status. This should be reserved for users with the
appropriate permissions.
Cc: stable@vger.kernel.org
Fixes: 8424f2ccb3 ("drm/amdgpu/psp: Add vbflash sysfs interface support")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The VBIOS image update flow requires userspace to:
1) Write the image to `psp_vbflash`
2) Read `psp_vbflash`
3) Poll `psp_vbflash_status` to check for completion
If userspace reads `psp_vbflash` before writing an image, it's
possible that it causes problems that can put the dGPU into an invalid
state.
Explicitly check that an image has been written before letting a read
succeed.
Cc: stable@vger.kernel.org
Fixes: 8424f2ccb3 ("drm/amdgpu/psp: Add vbflash sysfs interface support")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When the preempted IB frame resubmitted to cp, we need to modify the frame
data including:
1. set PRE_RESUME 1 in CONTEXT_CONTROL.
2. use meta data(DE and CE) read from CSA in WRITE_DATA.
Add functions to save the location the first time IBs emitted and callback
to patch the package when resubmission happens.
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.3.x
When MEC executes unmap_queue for mid command buffer preemption, it will
kick the write pointer of the gfx ring, set CP_VMID_PREEMPT to trigger the
preemption and wait for CP_VMID_PREEMPT becomes zero after the preemption
done. There is a race condition that PFP may excute the resetting command
before MEC set CP_VMID_PREEMPT. As a result, hang happens as
CP_VMID_PREEMPT is always 0xffff.
To avoid this, we send resetting CP_VMID_PREEMPT command after the trailing
fence is siganled and update gfx write pointer explicitly.
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.3.x
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2535
Add checking for NULL before calling nouveau_connector_detect_depth() in
nouveau_connector_get_modes() function because nv_connector->native_mode
could be dereferenced there since connector pointer passed to
nouveau_connector_detect_depth() and the same value of
nv_connector->native_mode is used there.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: d4c2c99bdc ("drm/nouveau/dp: remove broken display depth function, use the improved one")
Signed-off-by: Natalia Petrova <n.petrova@fintech.ru>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230512111526.82408-1-n.petrova@fintech.ru
Add the missing check for platform_get_irq() and return error code
if it fails.
The returned error code will be dealed with in
builtin_platform_driver(sifive_gpio_driver) and the driver will not
be registered.
Fixes: f52d6d8b43 ("gpio: sifive: To get gpio irq offset from device tree data")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
The "qcom,paired" schema is all wrong. First, it's a list rather than an
object(dictionary). Second, it is missing a required type. The meta-schema
normally catches this, but schemas under "$defs" was not getting checked.
A fix for that is pending.
Fixes: f9a06b8109 ("dt-bindings: pinctrl: qcom,pmic-mpp: Convert qcom pmic mpp bindings to YAML")
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230418150606.1528107-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
In case of gpio-regmap, IRQ chip is added by regmap-irq and associated with
GPIO chip by gpiochip_irqchip_add_domain(). The initialization flag was not
added in gpiochip_irqchip_add_domain(), causing gpiochip_to_irq() to return
-EPROBE_DEFER.
Fixes: 5467801f1f ("gpio: Restrict usage of GPIO chip irq members before initialization")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
regcache_maple_sync() tries to sync all cached values no matter
whether it's writable or not. OTOH, regache_sync_val() does care the
wrtability and returns -EIO for a read-only register. This results in
an error message like:
snd_hda_codec_realtek hdaudioC0D0: Unable to sync register 0x2f0009. -5
and the sync loop is aborted incompletely.
This patch adds the writable register check to regcache_sync_val() for
addressing the bug above.
Note that, although we may add the check in the caller side
(regcache_maple_sync()), here we put in regcache_sync_val(), so that a
similar case like this can be avoided in future.
Fixes: f033c26de5 ("regmap: Add maple tree based register cache")
Link: https://lore.kernel.org/r/877cs7g6f1.wl-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20230613112240.3361-1-tiwai@suse.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Commit 3ed2b549b3 ("ALSA: pcm: fix wait_time calculations") corrected
the PCM wait_time calculations and in doing so reduced the calculated
wait_time. This exposed an issue with the Tegra Master Volume Control
(MVC) device where the reduced wait_time caused the MVC to fail. For now
fix this by setting the default wait_time for Tegra to be 500ms.
Fixes: 3ed2b549b3 ("ALSA: pcm: fix wait_time calculations")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://lore.kernel.org/r/20230613093453.13927-1-jonathanh@nvidia.com
Signed-off-by: Mark Brown <broonie@kernel.org>
LS1028A is using DMA with LPUART. Having RX watermark set to 1, means
DMA transactions are started only after receiving the second character.
On other platforms with newer LPUART IP, Receiver Idle Empty function
initiates the DMA request after the receiver is idling for 4 characters.
But this feature is missing on LS1028A, which is causing a 1-character
delay in the RX direction on this platform.
Set RX watermark to 0 to initiate RX DMA after each character.
Link: https://lore.kernel.org/linux-serial/20230607103459.1222426-1-robert.hodaszi@digi.com/
Fixes: 9ad9df8447 ("tty: serial: fsl_lpuart: Fix the wrong RXWATER setting for rx dma case")
Cc: stable <stable@kernel.org>
Signed-off-by: Robert Hodaszi <robert.hodaszi@digi.com>
Message-ID: <20230609121334.1878626-1-robert.hodaszi@digi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The call site of nouveau_dsm_pci_probe() uses single set of output
variables for all invocations. So, we must not write anything to them
unless it's an NVIDIA device. Otherwise, if we are called with another
device after the NVIDIA device, we'll clober the result of the NVIDIA
device.
For example, if the other device doesn't have _PR3 resources, the
detection later would miss the presence of power resource support, and
the rest of the code will keep using Optimus DSM, breaking power
management for that machine.
Also, because we're detecting NVIDIA's DSM, it doesn't make sense to run
this detection on a non-NVIDIA device anyway. Thus, check at the
beginning of the detection code if this is an NVIDIA card, and just
return if it isn't.
This, together with commit d22915d22d ("drm/nouveau/devinit/tu102-:
wait for GFW_BOOT_PROGRESS == COMPLETED") developed independently and
landed earlier, fixes runtime power management of the NVIDIA card in
Lenovo Legion 5-15ARH05. Without this patch, the GPU resumption code
will "timeout", sometimes hanging userspace.
As a bonus, we'll also stop preventing _PR3 usage from the bridge for
unrelated devices, which is always nice, I guess.
Fixes: ccfc2d5cdb ("drm/nouveau: Use generic helper to check _PR3 presence")
Signed-off-by: Ratchanan Srirattanamet <peathot@hotmail.com>
Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/79
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/DM6PR19MB2780805D4BE1E3F9B3AC96D0BC409@DM6PR19MB2780.namprd19.prod.outlook.com
usb_udc_connect_control(), soft_connect_store() and
usb_gadget_deactivate() can potentially race against each other to invoke
usb_gadget_connect()/usb_gadget_disconnect(). To prevent this, guard
udc->started, gadget->allow_connect, gadget->deactivate and
gadget->connect with connect_lock so that ->pullup() is only invoked when
the gadget is bound, started and not deactivated. The routines
usb_gadget_connect_locked(), usb_gadget_disconnect_locked(),
usb_udc_connect_control_locked(), usb_gadget_udc_start_locked(),
usb_gadget_udc_stop_locked() are called with this lock held.
An earlier version of this commit was reverted due to the crash reported in
https://lore.kernel.org/all/ZF4BvgsOyoKxdPFF@francesco-nb.int.toradex.com/.
commit 16737e78d190 ("usb: gadget: udc: core: Offload usb_udc_vbus_handler processing")
addresses the crash reported.
Cc: stable@vger.kernel.org
Fixes: 628ef0d273 ("usb: udc: add usb_udc_vbus_handler")
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
Message-ID: <20230609010227.978661-2-badhri@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
usb_udc_vbus_handler() can be invoked from interrupt context by irq
handlers of the gadget drivers, however, usb_udc_connect_control() has
to run in non-atomic context due to the following:
a. Some of the gadget driver implementations expect the ->pullup
callback to be invoked in non-atomic context.
b. usb_gadget_disconnect() acquires udc_lock which is a mutex.
Hence offload invocation of usb_udc_connect_control()
to workqueue.
UDC should not be pulled up unless gadget driver is bound. The new flag
"allow_connect" is now set by gadget_bind_driver() and cleared by
gadget_unbind_driver(). This prevents work item to pull up the gadget
even if queued when the gadget driver is already unbound.
Cc: stable@vger.kernel.org
Fixes: 1016fc0c09 ("USB: gadget: Fix obscure lockdep violation for udc_mutex")
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
Message-ID: <20230609010227.978661-1-badhri@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some dwc3 glue drivers are currently accessing the driver data of the
child core device directly, which is clearly a bad idea as the child may
not have probed yet or may have been unbound from its driver.
As a workaround until the glue drivers have been fixed, clear the driver
data pointer before allowing the glue parent device to runtime suspend
to prevent its driver from accessing data that has been freed during
unbind.
Fixes: 6dd2565989 ("usb: dwc3: add imx8mp dwc3 glue layer driver")
Fixes: 6895ea55c3 ("usb: dwc3: qcom: Configure wakeup interrupts during suspend")
Cc: stable@vger.kernel.org # 5.12
Cc: Li Jun <jun.li@nxp.com>
Cc: Sandeep Maheswaram <quic_c_sanm@quicinc.com>
Cc: Krishna Kurapati <quic_kriskura@quicinc.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Message-ID: <20230607100540.31045-3-johan+linaro@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Qualcomm dwc3 glue driver is currently accessing the driver data of
the child core device during suspend and on wakeup interrupts. This is
clearly a bad idea as the child may not have probed yet or could have
been unbound from its driver.
The first such layering violation was part of the initial version of the
driver, but this was later made worse when the hack that accesses the
driver data of the grand child xhci device to configure the wakeup
interrupts was added.
Fixing this properly is not that easily done, so add a sanity check to
make sure that the child driver data is non-NULL before dereferencing it
for now.
Note that this relies on subtleties like the fact that driver core is
making sure that the parent is not suspended while the child is probing.
Reported-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/all/20230325165217.31069-4-manivannan.sadhasivam@linaro.org/
Fixes: d9152161b4 ("usb: dwc3: Add Qualcomm DWC3 glue layer driver")
Fixes: 6895ea55c3 ("usb: dwc3: qcom: Configure wakeup interrupts during suspend")
Cc: stable@vger.kernel.org # 3.18: a872ab303d: "usb: dwc3: qcom: fix use-after-free on runtime-PM wakeup"
Cc: Sandeep Maheswaram <quic_c_sanm@quicinc.com>
Cc: Krishna Kurapati <quic_kriskura@quicinc.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Message-ID: <20230607100540.31045-2-johan+linaro@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Consider a scenario where cable disconnect happens when there is an active
usb reqest queued to the UDC. As part of the disconnect we would issue an
end transfer with no interrupt-on-completion before giving back this
request. Since we are giving back the request without skipping TRBs the
num_trbs field of dwc3_request still holds the stale value previously used.
Function drivers re-use same request for a given bind-unbind session and
hence their dwc3_request context gets preserved across cable
disconnect/connect. When such a request gets re-queued after cable connect,
we would increase the num_trbs field on top of the previous stale value
thus incorrectly representing the number of TRBs used. Fix this by
resetting num_trbs field before giving back the request.
Fixes: 09fe1f8d7e ("usb: dwc3: gadget: track number of TRBs per request")
Cc: stable <stable@kernel.org>
Signed-off-by: Elson Roy Serrao <quic_eserrao@quicinc.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Message-ID: <1685654850-8468-1-git-send-email-quic_eserrao@quicinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently {modprobe, bind} after {rmmod, unbind} results in probe failure.
genirq: Flags mismatch irq 22. 00000004 (85070400.usb3drd) vs. 00000004 (85070400.usb3drd)
renesas_usb3: probe of 85070000.usb3peri failed with error -16
The reason is, it is trying to register an interrupt handler for the same
IRQ twice. The devm_request_irq() was called with the parent device.
So the interrupt handler won't be unregistered when the usb3-peri device
is unbound.
Fix this issue by replacing "parent dev"->"dev" as the irq resource
is managed by this driver.
Fixes: 9cad72dfc5 ("usb: gadget: Add support for RZ/V2M USB3DRD driver")
Cc: stable <stable@kernel.org>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Message-ID: <20230530161720.179927-1-biju.das.jz@bp.renesas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ULONG_MAX is used by a few drivers to figure out the highest available
clock rate via clk_round_rate(clk, ULONG_MAX). Since abs() takes a
signed value as input, the current logic effectively calculates with
ULONG_MAX = -1, which results in the worst parent clock being chosen
instead of the best one.
For example on Rockchip RK3588 the eMMC driver tries to figure out
the highest available clock rate. There are three parent clocks
available resulting in the following rate diffs with the existing
logic:
GPLL: abs(18446744073709551615 - 1188000000) = 1188000001
CPLL: abs(18446744073709551615 - 1500000000) = 1500000001
XIN24M: abs(18446744073709551615 - 24000000) = 24000001
As a result the clock framework will promote a maximum supported
clock rate of 24 MHz, even though 1.5GHz are possible. With the
updated logic any casting between signed and unsigned is avoided
and the numbers look like this instead:
GPLL: 18446744073709551615 - 1188000000 = 18446744072521551615
CPLL: 18446744073709551615 - 1500000000 = 18446744072209551615
XIN24M: 18446744073709551615 - 24000000 = 18446744073685551615
As a result the parent with the highest acceptable rate is chosen
instead of the parent clock with the lowest one.
Cc: stable@vger.kernel.org
Fixes: 4950240800 ("mmc: sdhci-of-dwcmshc: properly determine max clock on Rockchip")
Tested-by: Christopher Obbard <chris.obbard@collabora.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Link: https://lore.kernel.org/r/20230526171057.66876-2-sebastian.reichel@collabora.com
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Matthieu Baerts says:
====================
selftests: mptcp: skip tests not supported by old kernels (part 3)
After a few years of increasing test coverage in the MPTCP selftests, we
realised [1] the last version of the selftests is supposed to run on old
kernels without issues.
Supporting older versions is not that easy for this MPTCP case: these
selftests are often validating the internals by checking packets that
are exchanged, when some MIB counters are incremented after some
actions, how connections are getting opened and closed in some cases,
etc. In other words, it is not limited to the socket interface between
the userspace and the kernelspace.
In addition to that, the current MPTCP selftests run a lot of different
sub-tests but the TAP13 protocol used in the selftests don't support
sub-tests: one failure in sub-tests implies that the whole selftest is
seen as failed at the end because sub-tests are not tracked. It is then
important to skip sub-tests not supported by old kernels.
To minimise the modifications and reduce the complexity to support old
versions, the idea is to look at external signs and skip the whole
selftest or just some sub-tests before starting them. This cannot be
applied in all cases.
Similar to the second part, this third one focuses on marking different
sub-tests as skipped if some MPTCP features are not supported. This
time, only in "mptcp_join.sh" selftest, the remaining one, is modified.
Several techniques are used here to achieve this task:
- Before starting some tests:
- Check if a file (sysctl knob) is present: that's what patch 12/17 is
doing for the userspace PM feature.
- Check if a required kernel symbol is present in /proc/kallsyms:
patches 9, 10, 14 and 15/17 are using this technique.
- Check if it is possible to setup a particular network environment
requiring Netfilter or TC: if the preparation step fail, the linked
sub-test is marked as skipped. Patch 5/17 is doing that.
- Check if a MIB counter is available: patches 7 and 13/17 do that.
- Check if the kernel version is newer than a specific one: patch 1/17
adds some helpers in mptcp_lib.sh to ease its use. That's not ideal
and it is only used as last resort but as mentioned above, it is
important to skip tests if they are not supported not to have the
whole selftest always being marked as failed on old kernels. Patches
11 and 17/17 are checking the kernel version. An alternative would
be to ignore the results for some sub-tests but that's not ideal
too. Note that SELFTESTS_MPTCP_LIB_NO_KVERSION_CHECK env var can be
set to 1 not to skip these tests if the running kernel doesn't have
a supported version.
- After having launched the tests:
- Adapt the expectations depending on the presence of a kernel symbol
(patch 6/17) or a kernel version (patch 8/17).
- Check is a MIB counter is available and skip the verification if
not. Patch 4/17 is using this technique.
Before skipping tests, SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var
value is checked: if it is set to 1, the test is marked as "failed"
instead of "skipped". MPTCP public CI expects to have all features
supported and it sets this env var to 1 to catch regressions in these
new checks.
Patch 2/17 uses 'iptables-legacy' if available because it might be
needed when using an older kernel not supporting iptables-nft.
Patch 3/17 adds some helpers used in the other patches mentioned to
easily mark sub-tests as skipped.
Patch 16/17 uniforms MPTCP Join "listener" tests: it was imported code
from userspace_pm.sh but without using the "code style" and ways of
using tools and printing messages from MPTCP Join selftest.
Link: https://lore.kernel.org/stable/CA+G9fYtDGpgT4dckXD-y-N92nqUxuvue_7AtDdBcHrbOMsDZLg@mail.gmail.com/ [1]
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
====================
Link: https://lore.kernel.org/r/20230609-upstream-net-20230610-mptcp-selftests-support-old-kernels-part-3-v1-0-2896fe2ee8a3@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the support of a mix of subflows in v4 and v6 by the
in-kernel PM introduced by commit b9d69db87f ("mptcp: let the
in-kernel PM use mixed IPv4 and IPv6 addresses").
It looks like there is no external sign we can use to predict the
expected behaviour. Instead of accepting different behaviours and thus
not really checking for the expected behaviour, we are looking here for
a specific kernel version. That's not ideal but it looks better than
removing the test because it cannot support older kernel versions.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: ad3493746e ("selftests: mptcp: add test-cases for mixed v4/v6 subflows")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The alignment was different from the other tests because tabs were used
instead of spaces.
While at it, also use 'echo' instead of 'printf' to print the result to
keep the same style as done in the other sub-tests. And, even if it
should be better with, also remove 'stdbuf' and sed's '--unbuffered'
option because they are not used in the other subtests and they are not
available when using a minimal environment with busybox.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 178d023208 ("selftests: mptcp: listener test for in-kernel PM")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the support of PM listener events introduced by commit
f8c9dfbd87 ("mptcp: add pm listener events").
It is possible to look for "mptcp_event_pm_listener" in kallsyms to know
in advance if the kernel supports this feature.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 178d023208 ("selftests: mptcp: listener test for in-kernel PM")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the support of sending an MP_PRIO signal for the initial
subflow, introduced by commit c157bbe776 ("mptcp: allow the in kernel
PM to set MPC subflow priority").
It is possible to look for "mptcp_subflow_send_ack" in kallsyms because
it was needed to introduce the mentioned feature. So we can know in
advance if the feature is supported instead of trying and accepting any
results.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 914f6a59b1 ("selftests: mptcp: add MPC backup tests")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the support of the MP_FAIL / infinite mapping introduced
by commit 1e39e5a32a ("mptcp: infinite mapping sending") and the
following ones.
It is possible to look for one of the infinite mapping counters to know
in advance if the this feature is available.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: b6e074e171 ("selftests: mptcp: add infinite map testcase")
Cc: stable@vger.kernel.org
Fixes: 2ba18161d4 ("selftests: mptcp: add MP_FAIL reset testcase")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the support of the userspace PM introduced by commit
4638de5aef ("mptcp: handle local addrs announced by userspace PMs")
and the following ones.
It is possible to look for the MPTCP pm_type's sysctl knob to know in
advance if the userspace PM is available.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 5ac1d2d634 ("selftests: mptcp: Add tests for userspace PM type")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the support of the fullmesh flag for the in-kernel PM
introduced by commit 2843ff6f36 ("mptcp: remote addresses fullmesh")
and commit 1a0d6136c5 ("mptcp: local addresses fullmesh").
It looks like there is no easy external sign we can use to predict the
expected behaviour. We could add the flag and then check if it has been
added but for that, and for each fullmesh test, we would need to setup a
new environment, do the checks, clean it and then only start the test
from yet another clean environment. To keep it simple and avoid
introducing new issues, we look for a specific kernel version. That's
not ideal but an acceptable solution for this case.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 6a0653b96f ("selftests: mptcp: add fullmesh setting tests")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
Commit bccefb7624 ("selftests: mptcp: simplify pm_nl_change_endpoint")
has simplified the way the backup flag is set on an endpoint. Instead of
doing:
./pm_nl_ctl set 10.0.2.1 flags backup
Now we do:
./pm_nl_ctl set id 1 flags backup
The new way is easier to maintain but it is also incompatible with older
kernels not supporting the implicit endpoints putting in place the
infrastructure to set flags per ID, hence the second Fixes tag.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: bccefb7624 ("selftests: mptcp: simplify pm_nl_change_endpoint")
Cc: stable@vger.kernel.org
Fixes: 4cf86ae84c ("mptcp: strict local address ID selection")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the support of the implicit endpoints introduced by
commit d045b9eb95 ("mptcp: introduce implicit endpoints").
It is possible to look for "mptcp_subflow_send_ack" in kallsyms because
it was needed to introduce the mentioned feature. So we can know in
advance if the feature is supported instead of trying and accepting any
results.
Note that here and in the following commits, we re-do the same check for
each sub-test of the same function for a few reasons. The main one is
not to break the ID assign to each test in order to be able to easily
compare results between different kernel versions. Also, we can still
run a specific test even if it is skipped. Another reason is that it
makes it clear during the review that a specific subtest will be skipped
or not under certain conditions. At the end, it looks OK to call the
exact same helper multiple times: it is not a critical path and it is
the same code that is executed, not really more cases to maintain.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 69c6ce7b6e ("selftests: mptcp: add implicit endpoint test case")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
At some points, a new feature caused internal behaviour changes we are
verifying in the selftests, see the Fixes tag below. It was not a UAPI
change but because in these selftests, we check some internal
behaviours, it is normal we have to adapt them from time to time after
having added some features.
It looks like there is no external sign we can use to predict the
expected behaviour. Instead of accepting different behaviours and thus
not really checking for the expected behaviour, we are looking here for
a specific kernel version. That's not ideal but it looks better than
removing the test because it cannot support older kernel versions.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 6fa0174a7c ("mptcp: more careful RM_ADDR generation")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the support of MP_FASTCLOSE introduced in commit
f284c0c773 ("mptcp: implement fastclose xmit path").
If the MIB counter is not available, the test cannot be verified and the
behaviour will not be the expected one. So we can skip the test if the
counter is missing.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 01542c9bf9 ("selftests: mptcp: add fastclose testcase")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
At some points, a new feature caused internal behaviour changes we are
verifying in the selftests, see the Fixes tag below. It was not a uAPI
change but because in these selftests, we check some internal
behaviours, it is normal we have to adapt them from time to time after
having added some features.
It is possible to look for "mptcp_pm_subflow_check_next" in kallsyms
because it was needed to introduce the mentioned feature. So we can know
in advance what the behaviour we are expecting here instead of
supporting the two behaviours.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 86e39e0448 ("mptcp: keep track of local endpoint still available for each msk")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
Some tests are using IPTables and/or TC commands to force some
behaviours. If one of these commands fails -- likely because some
features are not available due to missing kernel config -- we should
intercept the error and skip the tests requiring these features.
Note that if we expect to have these features available and if
SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var is set to 1, the tests
will be marked as failed instead of skipped.
This patch also replaces the 'exit 1' by 'return 1' not to stop the
selftest in the middle without the conclusion if there is an issue with
NF or TC.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 8d014eaa92 ("selftests: mptcp: add ADD_ADDR timeout test case")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the MPTCP MIB counters introduced in commit fc518953bc
("mptcp: add and use MIB counter infrastructure") and more later. The
MPTCP Join selftest heavily relies on these counters.
If a counter is not supported by the kernel, it is not displayed when
using 'nstat -z'. We can then detect that and skip the verification. A
new helper (get_counter()) has been added to do the required checks and
return an error if the counter is not available.
Note that if we expect to have these features available and if
SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var is set to 1, the tests
will be marked as failed instead of skipped.
This new helper also makes sure we get the exact counter we want to
avoid issues we had in the past, e.g. with MPTcpExtRmAddr and
MPTcpExtRmAddrDrop sharing the same prefix. While at it, we uniform the
way we fetch a MIB counter.
Note for the backports: we rarely change these modified blocks so if
there is are conflicts, it is very likely because a counter is not used
in the older kernels and we don't need that chunk.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: b08fbf2410 ("selftests: add test-cases for MPTCP MP_JOIN")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
Here are some helpers that will be used to mark subtests as skipped if a
feature is not supported. Marking as a fix for the commit introducing
this selftest to help with the backports.
While at it, also check if kallsyms feature is available as it will also
be used in the following commits to check if MPTCP features are
available before starting a test.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: b08fbf2410 ("selftests: add test-cases for MPTCP MP_JOIN")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
IPTables commands using 'iptables-nft' fail on old kernels, at least
5.15 because it doesn't see the default IPTables chains:
$ iptables -L
iptables/1.8.2 Failed to initialize nft: Protocol not supported
As a first step before switching to NFTables, we can use iptables-legacy
if available.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 8d014eaa92 ("selftests: mptcp: add ADD_ADDR timeout test case")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
A new function is now available to easily detect if a feature is
missing by looking at the kernel version. That's clearly not ideal and
this kind of check should be avoided as soon as possible. But sometimes,
there are no external sign that a "feature" is available or not:
internal behaviours can change without modifying the uAPI and these
selftests are verifying the internal behaviours. Sometimes, the only
(easy) way to verify if the feature is present is to run the test but
then the validation cannot determine if there is a failure with the
feature or if the feature is missing. Then it looks better to check the
kernel version instead of having tests that can never fail. In any case,
we need a solution not to have a whole selftest being marked as failed
just because one sub-test has failed.
Note that this env var car be set to 1 not to do such check and run the
linked sub-test: SELFTESTS_MPTCP_LIB_NO_KVERSION_CHECK.
This new helper is going to be used in the following commits. In order
to ease the backport of such future patches, it would be good if this
patch is backported up to the introduction of MPTCP selftests, hence the
Fixes tag below: this type of check was supposed to be done from the
beginning.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 048d19d444 ("mptcp: add basic kselftest for mptcp")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxime Chevallier says:
====================
fixes for Q-USGMII speeds and autoneg
This is the second version of a small changeset for QUSGMII support,
fixing inconsistencies in reported max speed and control word parsing.
As reported here [1], there are some inconsistencies for the Q-USGMII
mode speeds and configuration. The first patch in this fixup series
makes so that we correctly report the max speed of 1Gbps for this mode.
The second patch uses a dedicated helper to decode the control word.
This is necessary as although USGMII control words are close to USXGMII,
they don't support the same speeds.
[1] : https://lore.kernel.org/netdev/ZHnd+6FUO77XFJvQ@shell.armlinux.org.uk/
====================
Link: https://lore.kernel.org/r/20230609080305.546028-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Q-USGMII is a derivative of USGMII, that uses a specific formatting for
the control word. The layout is close to the USXGMII control word, but
doesn't support speeds over 1Gbps. Use a dedicated decoding logic for
the USGMII control word, re-using USXGMII definitions but only considering
10/100/1000Mbps speeds
Fixes: 5e61fe157a ("net: phy: Introduce QUSGMII PHY mode")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Q-USGMII is the quad port version of USGMII, and supports a max speed of
1Gbps on each line. Make so that phylink_interface_max_speed() reports
this information correctly.
Fixes: ae0e4bb2a0 ("net: phylink: Adjust link settings based on rate matching")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[BUG]
Syzbot reports a reproducible ASSERT() when using rescue=usebackuproot
mount option on a corrupted fs.
The full report can be found here:
https://syzkaller.appspot.com/bug?extid=c4614eae20a166c25bf0
BTRFS error (device loop0: state C): failed to load root csum
assertion failed: !tmp, in fs/btrfs/disk-io.c:1103
------------[ cut here ]------------
kernel BUG at fs/btrfs/ctree.h:3664!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 3608 Comm: syz-executor356 Not tainted 6.0.0-rc7-syzkaller-00029-g3800a713b607 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
RIP: 0010:assertfail+0x1a/0x1c fs/btrfs/ctree.h:3663
RSP: 0018:ffffc90003aaf250 EFLAGS: 00010246
RAX: 0000000000000032 RBX: 0000000000000000 RCX: f21c13f886638400
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffff888021c640a0 R08: ffffffff816bd38d R09: ffffed10173667f1
R10: ffffed10173667f1 R11: 1ffff110173667f0 R12: dffffc0000000000
R13: ffff8880229c21f7 R14: ffff888021c64060 R15: ffff8880226c0000
FS: 0000555556a73300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055a2637d7a00 CR3: 00000000709c4000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
btrfs_global_root_insert+0x1a7/0x1b0 fs/btrfs/disk-io.c:1103
load_global_roots_objectid+0x482/0x8c0 fs/btrfs/disk-io.c:2467
load_global_roots fs/btrfs/disk-io.c:2501 [inline]
btrfs_read_roots fs/btrfs/disk-io.c:2528 [inline]
init_tree_roots+0xccb/0x203c fs/btrfs/disk-io.c:2939
open_ctree+0x1e53/0x33df fs/btrfs/disk-io.c:3574
btrfs_fill_super+0x1c6/0x2d0 fs/btrfs/super.c:1456
btrfs_mount_root+0x885/0x9a0 fs/btrfs/super.c:1824
legacy_get_tree+0xea/0x180 fs/fs_context.c:610
vfs_get_tree+0x88/0x270 fs/super.c:1530
fc_mount fs/namespace.c:1043 [inline]
vfs_kern_mount+0xc9/0x160 fs/namespace.c:1073
btrfs_mount+0x3d3/0xbb0 fs/btrfs/super.c:1884
[CAUSE]
Since the introduction of global roots, we handle
csum/extent/free-space-tree roots as global roots, even if no
extent-tree-v2 feature is enabled.
So for regular csum/extent/fst roots, we load them into
fs_info::global_root_tree rb tree.
And we should not expect any conflicts in that rb tree, thus we have an
ASSERT() inside btrfs_global_root_insert().
But rescue=usebackuproot can break the assumption, as we will try to
load those trees again and again as long as we have bad roots and have
backup roots slot remaining.
So in that case we can have conflicting roots in the rb tree, and
triggering the ASSERT() crash.
[FIX]
We can safely remove that ASSERT(), as the caller will properly put the
offending root.
To make further debugging easier, also add two explicit error messages:
- Error message for conflicting global roots
- Error message when using backup roots slot
Reported-by: syzbot+a694851c6ab28cbcfb9c@syzkaller.appspotmail.com
Fixes: abed4aaae4 ("btrfs: track the csum, extent, and free space trees in a rb tree")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull misc fixes from Andrew Morton:
"19 hotfixes. 14 are cc:stable and the remainder address issues which
were introduced during this development cycle or which were considered
inappropriate for a backport"
* tag 'mm-hotfixes-stable-2023-06-12-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
zswap: do not shrink if cgroup may not zswap
page cache: fix page_cache_next/prev_miss off by one
ocfs2: check new file size on fallocate call
mailmap: add entry for John Keeping
mm/damon/core: fix divide error in damon_nr_accesses_to_accesses_bp()
epoll: ep_autoremove_wake_function should use list_del_init_careful
mm/gup_test: fix ioctl fail for compat task
nilfs2: reject devices with insufficient block count
ocfs2: fix use-after-free when unmounting read-only filesystem
lib/test_vmalloc.c: avoid garbage in page array
nilfs2: fix possible out-of-bounds segment allocation in resize ioctl
riscv/purgatory: remove PGO flags
powerpc/purgatory: remove PGO flags
x86/purgatory: remove PGO flags
kexec: support purgatories with .text.hot sections
mm/uffd: allow vma to merge as much as possible
mm/uffd: fix vma operation where start addr cuts part of vma
radix-tree: move declarations to header
nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key()
Commit 619104ba45 ("btrfs: move common NOCOW checks against a file
extent into a helper") changed our call to btrfs_cross_ref_exist() to
always pass false for the 'strict' parameter. We're passing this down
through the stack so that we can do a full check for cross references
during swapfile activation.
With strict always false, this test fails:
btrfs subvol create swappy
chattr +C swappy
fallocate -l1G swappy/swapfile
chmod 600 swappy/swapfile
mkswap swappy/swapfile
btrfs subvol snap swappy swapsnap
btrfs subvol del -C swapsnap
btrfs fi sync /
sync;sync;sync
swapon swappy/swapfile
The fix is to just use args->strict, and everyone except swapfile
activation is passing false.
Fixes: 619104ba45 ("btrfs: move common NOCOW checks against a file extent into a helper")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
can_nocow_extent can reduce the len passed in, which needs to be
propagated to btrfs_dio_iomap_begin so that iomap does not submit
more data then is mapped.
This problems exists since the btrfs_get_blocks_direct helper was added
in commit c5794e5178 ("btrfs: Factor out write portion of
btrfs_get_blocks_direct"), but the ordered_extent splitting added in
commit b73a6fd1b1 ("btrfs: split partial dio bios before submit")
added a WARN_ON that made a syzkaller test fail.
Reported-by: syzbot+ee90502d5c8fd1d0dd93@syzkaller.appspotmail.com
Fixes: c5794e5178 ("btrfs: Factor out write portion of btrfs_get_blocks_direct")
CC: stable@vger.kernel.org # 6.1+
Tested-by: syzbot+ee90502d5c8fd1d0dd93@syzkaller.appspotmail.com
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Add error handling into igb_set_eeprom() function, in case
nvm.ops.read() fails just quit with error code asap.
Fixes: 9d5c824399 ("igb: PCI-Express 82575 Gigabit Ethernet driver")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Guarantee that when probe() is run again, PTM and PCI busmaster will be
in the same state as it was if the driver was never loaded.
Avoid an i225/i226 hardware issue that PTM requests can be made even
though PCI bus mastering is not enabled. These unexpected PTM requests
can crash some systems.
So, "force" disable PTM and busmastering before removing the driver,
so they can be re-enabled in the right order during probe(). This is
more like a workaround and should be applicable for i225 and i226, in
any platform.
Fixes: 1b5d73fb86 ("igc: Enable PCIe PTM")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Before the patch [1], the clock probe was done directly in the
clk-mt8365 driver. In this probe function, the array which stores the
data clocks is sized using the higher defined numbers (*_NR_CLOCK) in
the clock lists [2]. Currently, with the patch [1], the specific
clk-mt8365 probe function is replaced by the mtk generic one [3], which
size the clock data array by adding all the clock descriptor array size
provided by the clk-mt8365 driver.
Actually, all clock indexes come from the header file [2], that mean, if
there are more clock (then more index) in the header file [2] than the
number of clock declared in the clock descriptor arrays (which is the
case currently), the clock data array will be undersized and then the
generic probe function will overflow when it will try to write in
"clk_data[CLK_INDEX]". Actually, instead of crashing at boot, the probe
function returns an error in the log which looks like:
"of_clk_hw_onecell_get: invalid index 135", then this clock isn't
enabled.
Solve this issue by adding in the driver the missing clocks declared in
the header clock file [2].
[1]: Commit ffe91cb28f ("clk: mediatek: mt8365: Convert to
mtk_clk_simple_{probe,remove}()")
[2]: include/dt-bindings/clock/mediatek,mt8365-clk.h
[3]: drivers/clk/mediatek/clk-mtk.c
Fixes: ffe91cb28f ("clk: mediatek: mt8365: Convert to mtk_clk_simple_{probe,remove}()")
Signed-off-by: Alexandre Mergnat <amergnat@baylibre.com>
Link: https://lore.kernel.org/r/20230517-fix-clk-index-v3-1-be4df46065c4@baylibre.com
Tested-by: Markus Schneider-Pargmann <msp@baylibre.com>
Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Before storing a page, zswap first checks if the number of stored pages
exceeds the limit specified by memory.zswap.max, for each cgroup in the
hierarchy. If this limit is reached or exceeded, then zswap shrinking is
triggered and short-circuits the store attempt.
However, since the zswap's LRU is not memcg-aware, this can create the
following pathological behavior: the cgroup whose zswap limit is 0 will
evict pages from other cgroups continually, without lowering its own zswap
usage. This means the shrinking will continue until the need for swap
ceases or the pool becomes empty.
As a result of this, we observe a disproportionate amount of zswap
writeback and a perpetually small zswap pool in our experiments, even
though the pool limit is never hit.
More generally, a cgroup might unnecessarily evict pages from other
cgroups before we drive the memcg back below its limit.
This patch fixes the issue by rejecting zswap store attempt without
shrinking the pool when obj_cgroup_may_zswap() returns false.
[akpm@linux-foundation.org: fix return of unintialized value]
[akpm@linux-foundation.org: s/ENOSPC/ENOMEM/]
Link: https://lkml.kernel.org/r/20230530222440.2777700-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20230530232435.3097106-1-nphamcs@gmail.com
Fixes: f4840ccfca ("zswap: memcg accounting")
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ackerley Tng reported an issue with hugetlbfs fallocate here[1]. The
issue showed up after the conversion of hugetlb page cache lookup code to
use page_cache_next_miss. Code in hugetlb fallocate, userfaultfd and GUP
is now using page_cache_next_miss to determine if a page is present the
page cache. The following statement is used.
present = page_cache_next_miss(mapping, index, 1) != index;
There are two issues with page_cache_next_miss when used in this way.
1) If the passed value for index is equal to the 'wrap-around' value,
the same index will always be returned. This wrap-around value is 0,
so 0 will be returned even if page is present at index 0.
2) If there is no gap in the range passed, the last index in the range
will be returned. When passed a range of 1 as above, the passed
index value will be returned even if the page is present.
The end result is the statement above will NEVER indicate a page is
present in the cache, even if it is.
As noted by Ackerley in [1], users can see this by hugetlb fallocate
incorrectly returning EEXIST if pages are already present in the file. In
addition, hugetlb pages will not be included in core dumps if they need to
be brought in via GUP. userfaultfd UFFDIO_COPY also uses this code and
will not notice pages already present in the cache. It may try to
allocate a new page and potentially return ENOMEM as opposed to EEXIST.
Both page_cache_next_miss and page_cache_prev_miss have similar issues.
Fix by:
- Check for index equal to 'wrap-around' value and do not exit early.
- If no gap is found in range, return index outside range.
- Update function description to say 'wrap-around' value could be
returned if passed as index.
[1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com/
Link: https://lkml.kernel.org/r/20230602225747.103865-2-mike.kravetz@oracle.com
Fixes: d0ce0e47b3 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Vishal Annapurve <vannapurve@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The current sanity check for nilfs2 geometry information lacks checks for
the number of segments stored in superblocks, so even for device images
that have been destructively truncated or have an unusually high number of
segments, the mount operation may succeed.
This causes out-of-bounds block I/O on file system block reads or log
writes to the segments, the latter in particular causing
"a_ops->writepages" to repeatedly fail, resulting in sync_inodes_sb() to
hang.
Fix this issue by checking the number of segments stored in the superblock
and avoiding mounting devices that can cause out-of-bounds accesses. To
eliminate the possibility of overflow when calculating the number of
blocks required for the device from the number of segments, this also adds
a helper function to calculate the upper bound on the number of segments
and inserts a check using it.
Link: https://lkml.kernel.org/r/20230526021332.3431-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+7d50f1e54a12ba3aeae2@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=7d50f1e54a12ba3aeae2
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
It's trivial to trigger a use-after-free bug in the ocfs2 quotas code using
fstest generic/452. After a read-only remount, quotas are suspended and
ocfs2_mem_dqinfo is freed through ->ocfs2_local_free_info(). When unmounting
the filesystem, an UAF access to the oinfo will eventually cause a crash.
BUG: KASAN: slab-use-after-free in timer_delete+0x54/0xc0
Read of size 8 at addr ffff8880389a8208 by task umount/669
...
Call Trace:
<TASK>
...
timer_delete+0x54/0xc0
try_to_grab_pending+0x31/0x230
__cancel_work_timer+0x6c/0x270
ocfs2_disable_quotas.isra.0+0x3e/0xf0 [ocfs2]
ocfs2_dismount_volume+0xdd/0x450 [ocfs2]
generic_shutdown_super+0xaa/0x280
kill_block_super+0x46/0x70
deactivate_locked_super+0x4d/0xb0
cleanup_mnt+0x135/0x1f0
...
</TASK>
Allocated by task 632:
kasan_save_stack+0x1c/0x40
kasan_set_track+0x21/0x30
__kasan_kmalloc+0x8b/0x90
ocfs2_local_read_info+0xe3/0x9a0 [ocfs2]
dquot_load_quota_sb+0x34b/0x680
dquot_load_quota_inode+0xfe/0x1a0
ocfs2_enable_quotas+0x190/0x2f0 [ocfs2]
ocfs2_fill_super+0x14ef/0x2120 [ocfs2]
mount_bdev+0x1be/0x200
legacy_get_tree+0x6c/0xb0
vfs_get_tree+0x3e/0x110
path_mount+0xa90/0xe10
__x64_sys_mount+0x16f/0x1a0
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
Freed by task 650:
kasan_save_stack+0x1c/0x40
kasan_set_track+0x21/0x30
kasan_save_free_info+0x2a/0x50
__kasan_slab_free+0xf9/0x150
__kmem_cache_free+0x89/0x180
ocfs2_local_free_info+0x2ba/0x3f0 [ocfs2]
dquot_disable+0x35f/0xa70
ocfs2_susp_quotas.isra.0+0x159/0x1a0 [ocfs2]
ocfs2_remount+0x150/0x580 [ocfs2]
reconfigure_super+0x1a5/0x3a0
path_mount+0xc8a/0xe10
__x64_sys_mount+0x16f/0x1a0
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
Link: https://lkml.kernel.org/r/20230522102112.9031-1-lhenriques@suse.de
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
It turns out that alloc_pages_bulk_array() does not treat the page_array
parameter as an output parameter, but rather reads the array and skips any
entries that have already been allocated.
This is somewhat unexpected and breaks this test, as we allocate the pages
array uninitialised on the assumption it will be overwritten.
As a result, the test was referencing uninitialised data and causing the
PFN to not be valid and thus a WARN_ON() followed by a null pointer deref
and panic.
In addition, this is an array of pointers not of struct page objects, so we
need only allocate an array with elements of pointer size.
We solve both problems by simply using kcalloc() and referencing
sizeof(struct page *) rather than sizeof(struct page).
Link: https://lkml.kernel.org/r/20230524082424.10022-1-lstoakes@gmail.com
Fixes: 869cb29a61 ("lib/test_vmalloc.c: add vm_map_ram()/vm_unmap_ram() test case")
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Syzbot reports that in its stress test for resize ioctl, the log writing
function nilfs_segctor_do_construct hits a WARN_ON in
nilfs_segctor_truncate_segments().
It turned out that there is a problem with the current implementation of
the resize ioctl, which changes the writable range on the device (the
range of allocatable segments) at the end of the resize process.
This order is necessary for file system expansion to avoid corrupting the
superblock at trailing edge. However, in the case of a file system
shrink, if log writes occur after truncating out-of-bounds trailing
segments and before the resize is complete, segments may be allocated from
the truncated space.
The userspace resize tool was fine as it limits the range of allocatable
segments before performing the resize, but it can run into this issue if
the resize ioctl is called alone.
Fix this issue by changing nilfs_sufile_resize() to update the range of
allocatable segments immediately after successful truncation of segment
space in case of file system shrink.
Link: https://lkml.kernel.org/r/20230524094348.3784-1-konishi.ryusuke@gmail.com
Fixes: 4e33f9eab0 ("nilfs2: implement resize ioctl")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+33494cd0df2ec2931851@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/0000000000005434c405fbbafdc5@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We used to not pass in the pgoff correctly when register/unregister uffd
regions, it caused incorrect behavior on vma merging and can cause
mergeable vmas being separate after ioctls return.
For example, when we have:
vma1(range 0-9, with uffd), vma2(range 10-19, no uffd)
Then someone unregisters uffd on range (5-9), it should logically become:
vma1(range 0-4, with uffd), vma2(range 5-19, no uffd)
But with current code we'll have:
vma1(range 0-4, with uffd), vma3(range 5-9, no uffd), vma2(range 10-19, no uffd)
This patch allows such merge to happen correctly before ioctl returns.
This behavior seems to have existed since the 1st day of uffd. Since
pgoff for vma_merge() is only used to identify the possibility of vma
merging, meanwhile here what we did was always passing in a pgoff smaller
than what we should, so there should have no other side effect besides not
merging it. Let's still tentatively copy stable for this, even though I
don't see anything will go wrong besides vma being split (which is mostly
not user visible).
Link: https://lkml.kernel.org/r/20230517190916.3429499-3-peterx@redhat.com
Fixes: 86039bd3b4 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm/uffd: Fix vma merge/split", v2.
This series contains two patches that fix vma merge/split for userfaultfd
on two separate issues.
Patch 1 fixes a regression since 6.1+ due to something we overlooked when
converting to maple tree apis. The plan is we use patch 1 to replace the
commit "2f628010799e (mm: userfaultfd: avoid passing an invalid range to
vma_merge())" in mm-hostfixes-unstable tree if possible, so as to bring
uffd vma operations back aligned with the rest code again.
Patch 2 fixes a long standing issue that vma can be left unmerged even if
we can for either uffd register or unregister.
Many thanks to Lorenzo on either noticing this issue from the assert
movement patch, looking at this problem, and also provided a reproducer on
the unmerged vma issue [1].
[1] https://gist.github.com/lorenzo-stoakes/a11a10f5f479e7a977fc456331266e0e
This patch (of 2):
It seems vma merging with uffd paths is broken with either
register/unregister, where right now we can feed wrong parameters to
vma_merge() and it's found by recent patch which moved asserts upwards in
vma_merge() by Lorenzo Stoakes:
https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/
It's possible that "start" is contained within vma but not clamped to its
start. We need to convert this into either "cannot merge" case or "can
merge" case 4 which permits subdivision of prev by assigning vma to prev.
As we loop, each subsequent VMA will be clamped to the start.
This patch will eliminate the report and make sure vma_merge() calls will
become legal again.
One thing to mention is that the "Fixes: 29417d292bd0" below is there only
to help explain where the warning can start to trigger, the real commit to
fix should be 69dbe6daf1. Commit 29417d292b helps us to identify the
issue, but unfortunately we may want to keep it in Fixes too just to ease
kernel backporters for easier tracking.
Link: https://lkml.kernel.org/r/20230517190916.3429499-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20230517190916.3429499-2-peterx@redhat.com
Fixes: 69dbe6daf1 ("userfaultfd: use maple tree iterator to iterate VMAs")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Closes: https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The xarray.c file contains the only call to radix_tree_node_rcu_free(),
and it comes with its own extern declaration for it. This means the
function definition causes a missing-prototype warning:
lib/radix-tree.c:288:6: error: no previous prototype for 'radix_tree_node_rcu_free' [-Werror=missing-prototypes]
Instead, move the declaration for this function to a new header that can
be included by both, and do the same for the radix_tree_node_cachep
variable that has the same underlying problem but does not cause a warning
with gcc.
[zhangpeng.00@bytedance.com: fix building radix tree test suite]
Link: https://lkml.kernel.org/r/20230521095450.21332-1-zhangpeng.00@bytedance.com
Link: https://lkml.kernel.org/r/20230516194212.548910-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
A recent commit gated the core dumping task exit logic on current->flags
remaining consistent in terms of PF_{IO,USER}_WORKER at task exit time.
This exposed a problem with the io-wq handling of that, which explicitly
clears PF_IO_WORKER before calling do_exit().
The reasons for this manual clear of PF_IO_WORKER is historical, where
io-wq used to potentially trigger a sleep on exit. As the io-wq thread
is exiting, it should not participate any further accounting. But these
days we don't need to rely on current->flags anymore, so we can safely
remove the PF_IO_WORKER clearing.
Reported-by: Zorro Lang <zlang@redhat.com>
Reported-by: Dave Chinner <david@fromorbit.com>
Link: https://lore.kernel.org/all/ZIZSPyzReZkGBEFy@dread.disaster.area/
Fixes: f9010dbdce ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull btrfs fixes from David Sterba:
"A more fixes and regression fixes:
- in subpage mode, fix crash when repairing metadata at the end of
a stripe
- properly enable async discard when remounting from read-only to
read-write
- scrub regression fixes:
- respect read-only scrub when attempting to do a repair
- fix reporting of found errors, the stats don't get properly
accounted after a stripe repair"
* tag 'for-6.4-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: scrub: also report errors hit during the initial read
btrfs: scrub: respect the read-only flag during repair
btrfs: properly enable async discard when switching from RO->RW
btrfs: subpage: fix a crash in metadata repair path
We found a refcount UAF bug as follows:
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 1 PID: 342 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x148
Workqueue: events cpuset_hotplug_workfn
Call trace:
refcount_warn_saturate+0xa0/0x148
__refcount_add.constprop.0+0x5c/0x80
css_task_iter_advance_css_set+0xd8/0x210
css_task_iter_advance+0xa8/0x120
css_task_iter_next+0x94/0x158
update_tasks_root_domain+0x58/0x98
rebuild_root_domains+0xa0/0x1b0
rebuild_sched_domains_locked+0x144/0x188
cpuset_hotplug_workfn+0x138/0x5a0
process_one_work+0x1e8/0x448
worker_thread+0x228/0x3e0
kthread+0xe0/0xf0
ret_from_fork+0x10/0x20
then a kernel panic will be triggered as below:
Unable to handle kernel paging request at virtual address 00000000c0000010
Call trace:
cgroup_apply_control_disable+0xa4/0x16c
rebind_subsystems+0x224/0x590
cgroup_destroy_root+0x64/0x2e0
css_free_rwork_fn+0x198/0x2a0
process_one_work+0x1d4/0x4bc
worker_thread+0x158/0x410
kthread+0x108/0x13c
ret_from_fork+0x10/0x18
The race that cause this bug can be shown as below:
(hotplug cpu) | (umount cpuset)
mutex_lock(&cpuset_mutex) | mutex_lock(&cgroup_mutex)
cpuset_hotplug_workfn |
rebuild_root_domains | rebind_subsystems
update_tasks_root_domain | spin_lock_irq(&css_set_lock)
css_task_iter_start | list_move_tail(&cset->e_cset_node[ss->id]
while(css_task_iter_next) | &dcgrp->e_csets[ss->id]);
css_task_iter_end | spin_unlock_irq(&css_set_lock)
mutex_unlock(&cpuset_mutex) | mutex_unlock(&cgroup_mutex)
Inside css_task_iter_start/next/end, css_set_lock is hold and then
released, so when iterating task(left side), the css_set may be moved to
another list(right side), then it->cset_head points to the old list head
and it->cset_pos->next points to the head node of new list, which can't
be used as struct css_set.
To fix this issue, switch from all css_sets to only scgrp's css_sets to
patch in-flight iterators to preserve correct iteration, and then
update it->cset_head as well.
Reported-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Link: https://www.spinics.net/lists/cgroups/msg37935.html
Suggested-by: Michal Koutný <mkoutny@suse.com>
Link: https://lore.kernel.org/all/20230526114139.70274-1-xiujianfeng@huaweicloud.com/
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Fixes: 2d8f243a5e ("cgroup: implement cgroup->e_csets[]")
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Tejun Heo <tj@kernel.org>
The sysctl net/core/bpf_jit_enable does not work now due to commit
1022a5498f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc"). The
commit saved the jitted insns into 'rw_image' instead of 'image'
which caused bpf_jit_dump not dumping proper content.
With 'echo 2 > /proc/sys/net/core/bpf_jit_enable', run
'./test_progs -t fentry_test'. Without this patch, one of jitted
image for one particular prog is:
flen=17 proglen=92 pass=4 image=0000000014c64883 from=test_progs pid=1807
00000000: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
00000010: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
00000020: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
00000030: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
00000040: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
00000050: cc cc cc cc cc cc cc cc cc cc cc cc
With this patch, the jitte image for the same prog is:
flen=17 proglen=92 pass=4 image=00000000b90254b7 from=test_progs pid=1809
00000000: f3 0f 1e fa 0f 1f 44 00 00 66 90 55 48 89 e5 f3
00000010: 0f 1e fa 31 f6 48 8b 57 00 48 83 fa 07 75 2b 48
00000020: 8b 57 10 83 fa 09 75 22 48 8b 57 08 48 81 e2 ff
00000030: 00 00 00 48 83 fa 08 75 11 48 8b 7f 18 be 01 00
00000040: 00 00 48 83 ff 0a 74 02 31 f6 48 bf 18 d0 14 00
00000050: 00 c9 ff ff 48 89 77 00 31 c0 c9 c3
Fixes: 1022a5498f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20230609005439.3173569-1-yhs@fb.com
While SDHCI claims to support 64-bit DMA on MSM8916 it does not seem to
be properly functional. It is not immediately obvious because SDHCI is
usually used with IOMMU bypassed on this SoC, and all physical memory
has 32-bit addresses. But when trying to enable the IOMMU it quickly
fails with an error such as the following:
arm-smmu 1e00000.iommu: Unhandled context fault:
fsr=0x402, iova=0xfffff200, fsynr=0xe0000, cbfrsynra=0x140, cb=3
mmc1: ADMA error: 0x02000000
mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
mmc1: sdhci: Sys addr: 0x00000000 | Version: 0x00002e02
mmc1: sdhci: Blk size: 0x00000008 | Blk cnt: 0x00000000
mmc1: sdhci: Argument: 0x00000000 | Trn mode: 0x00000013
mmc1: sdhci: Present: 0x03f80206 | Host ctl: 0x00000019
mmc1: sdhci: Power: 0x0000000f | Blk gap: 0x00000000
mmc1: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
mmc1: sdhci: Timeout: 0x0000000a | Int stat: 0x00000001
mmc1: sdhci: Int enab: 0x03ff900b | Sig enab: 0x03ff100b
mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
mmc1: sdhci: Caps: 0x322dc8b2 | Caps_1: 0x00008007
mmc1: sdhci: Cmd: 0x0000333a | Max curr: 0x00000000
mmc1: sdhci: Resp[0]: 0x00000920 | Resp[1]: 0x5b590000
mmc1: sdhci: Resp[2]: 0xe6487f80 | Resp[3]: 0x0a404094
mmc1: sdhci: Host ctl2: 0x00000008
mmc1: sdhci: ADMA Err: 0x00000001 | ADMA Ptr: 0x0000000ffffff224
mmc1: sdhci_msm: ----------- VENDOR REGISTER DUMP -----------
mmc1: sdhci_msm: DLL sts: 0x00000000 | DLL cfg: 0x60006400 | DLL cfg2: 0x00000000
mmc1: sdhci_msm: DLL cfg3: 0x00000000 | DLL usr ctl: 0x00000000 | DDR cfg: 0x00000000
mmc1: sdhci_msm: Vndr func: 0x00018a9c | Vndr func2 : 0xf88018a8 Vndr func3: 0x00000000
mmc1: sdhci: ============================================
mmc1: sdhci: fffffffff200: DMA 0x0000ffffffffe100, LEN 0x0008, Attr=0x21
mmc1: sdhci: fffffffff20c: DMA 0x0000000000000000, LEN 0x0000, Attr=0x03
Looking closely it's obvious that only the 32-bit part of the address
(0xfffff200) arrives at the SMMU, the higher 16-bit (0xffff...) get
lost somewhere. This might not be a limitation of the SDHCI itself but
perhaps the bus/interconnect it is connected to, or even the connection
to the SMMU.
Work around this by setting SDHCI_QUIRK2_BROKEN_64_BIT_DMA to avoid
using 64-bit addresses.
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230518-msm8916-64bit-v1-1-5694b0f35211@gerhold.net
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
For BEET the inner address and therefore family is stored in the
xfrm_state selector. Use that when decapsulating an input packet
instead of incorrectly relying on a non-existent tunnel protocol.
Fixes: 5f24f41e8e ("xfrm: Remove inner/outer modes from input path")
Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The sctp_sf_eat_auth() function is supposed to enum sctp_disposition
values and returning a kernel error code will cause issues in the
caller. Change -ENOMEM to SCTP_DISPOSITION_NOMEM.
Fixes: 65b07e5d0d ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sctp_sf_eat_auth() function is supposed to return enum sctp_disposition
values but if the call to sctp_ulpevent_make_authkey() fails, it returns
-ENOMEM.
This results in calling BUG() inside the sctp_side_effects() function.
Calling BUG() is an over reaction and not helpful. Call WARN_ON_ONCE()
instead.
This code predates git.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 59a0b022aa ("ipvlan: Make skb->skb_iif track skb->dev for l3s
mode") fixed ipvlan bonded dev checking by updating skb skb_iif. This fix
works for IPv4, as in raw_v4_input() the dif is from inet_iif(skb), which
is skb->skb_iif when there is no route.
But for IPv6, the fix is not enough, because in ipv6_raw_deliver() ->
raw_v6_match(), the dif is inet6_iif(skb), which is returns IP6CB(skb)->iif
instead of skb->skb_iif if it's not a l3_slave. To fix the IPv6 part
issue. Let's set IP6CB(skb)->iif to correct ifindex.
BTW, ipvlan handles NS/NA specifically. Since it works fine, I will not
reset IP6CB(skb)->iif when addr->atype is IPVL_ICMPV6.
Fixes: c675e06a98 ("ipvlan: decouple l3s mode dependencies from other modes")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2196710
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When compiling YNL generated code compiler complains about
array-initializer-out-of-bounds. Turns out the MAX value
for STATS_GRP uses the value for STATS.
This may lead to random corruptions in user space (kernel
itself doesn't use this value as it never parses stats).
Fixes: f09ea6fb12 ("ethtool: add a new command for reading standard stats")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current implementation of max_credits on the client does
not work because the CreditRequest logic for several commands
does not take max_credits into account.
Still, we can end up asking the server for more credits, depending
on the number of credits in flight. For this, we need to
limit the credits while parsing the responses too.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
iface_cmp used to simply do a memcmp of the two
provided struct sockaddrs. The comparison needs to do more
based on the address family. Similar logic was already
present in cifs_match_ipaddr. Doing something similar now.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The virtio driver for Linux guests will not set a link speed to its
paravirtualized NICs. This will be seen as -1 in the ethernet layer, and
when some servers (e.g. samba) fetches it, it's converted to an unsigned
value (and multiplied by 1000 * 1000), so in client side we end up with:
1) Speed: 4294967295000000 bps
in DebugData.
This patch introduces a helper that returns a speed string (in Mbps or
Gbps) if interface speed is valid (>= SPEED_10 and <= SPEED_800000), or
"Unknown" otherwise.
The reason to not change the value in iface->speed is because we don't
know the real speed of the HW backing the server NIC, so let's keep
considering these as the fastest NICs available.
Also print "Capabilities: None" when the interface doesn't support any.
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Output of /proc/fs/cifs/DebugData shows only the per-connection
counter for the number of credits of regular type. i.e. the
credits reserved for echo and oplocks are not displayed.
There have been situations recently where having this info
would have been useful. This change prints the credit counters
of all three types: regular, echo, oplocks.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The ordering of status checks at the beginning of
cifs_tree_connect is wrong. As a result, a tcon
which is good may stay marked as needing reconnect
infinitely.
Fixes: 2f0e4f0342 ("cifs: check only tcon status on tcon related functions")
Cc: stable@vger.kernel.org # 6.3
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Because do_gettimeofday has been removed and replaced by ktime_get_real_ts64,
So just remove the comment as it's not needed now.
Signed-off-by: 鑫华 <jixianghua@xfusion.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
As noted by Michal, the blkg_iostat_set's in the lockless list hold
reference to blkg's to protect against their removal. Those blkg's
hold reference to blkcg. When a cgroup is being destroyed,
cgroup_rstat_flush() is only called at css_release_work_fn() which
is called when the blkcg reference count reaches 0. This circular
dependency will prevent blkcg and some blkgs from being freed after
they are made offline.
It is less a problem if the cgroup to be destroyed also has other
controllers like memory that will call cgroup_rstat_flush() which will
clean up the reference count. If block is the only controller that uses
rstat, these offline blkcg and blkgs may never be freed leaking more
and more memory over time.
To prevent this potential memory leak:
- flush blkcg per-cpu stats list in __blkg_release(), when no new stat
can be added
- add global blkg_stat_lock for covering concurrent parent blkg stat
update
- don't grab bio->bi_blkg reference when adding the stats into blkcg's
per-cpu stat list since all stats are guaranteed to be consumed before
releasing blkg instance, and grabbing blkg reference for stats was the
most fragile part of original patch
Based on Waiman's patch:
https://lore.kernel.org/linux-block/20221215033132.230023-3-longman@redhat.com/
Fixes: 3b8cc62987 ("blk-cgroup: Optimize blkcg_rstat_flush()")
Cc: stable@vger.kernel.org
Reported-by: Jay Shin <jaeshin@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: mkoutny@suse.com
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230609234249.1412858-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The ib_isert module is releasing the isert connection both in
isert_wait_conn() handler as well as isert_free_conn() handler.
In isert_wait_conn() handler, it is expected to wait for iSCSI
session logout operation to complete. It should free the isert
connection only in isert_free_conn() handler.
When a bunch of iSER target is cleared, this issue can lead to
use-after-free memory issue as isert conn is twice released
Fixes: b02efbfc9a ("iser-target: Fix implicit termination of connections")
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/20230606102531.162967-4-saravanan.vajravel@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Pull x86 fix from Borislav Petkov:
- Set up the kernel CS earlier in the boot process in case EFI boots
the kernel after bypassing the decompressor and the CS descriptor
used ends up being the EFI one which is not mapped in the identity
page table, leading to early SEV/SNP guest communication exceptions
resulting in the guest crashing
* tag 'x86_urgent_for_v6.4_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/head/64: Switch to KERNEL_CS as soon as new GDT is installed
Pull smb server fixes from Steve French:
"Five smb3 server fixes, all also for stable:
- Fix four slab out of bounds warnings: improve checks for protocol
id, and for small packet length, and for create context parsing,
and for negotiate context parsing
- Fix for incorrect dereferencing POSIX ACLs"
* tag '6.4-rc5-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: validate smb request protocol id
ksmbd: check the validation of pdu_size in ksmbd_conn_handler_loop
ksmbd: fix posix_acls and acls dereferencing possible ERR_PTR()
ksmbd: fix out-of-bound read in parse_lease_state()
ksmbd: fix out-of-bound read in deassemble_neg_contexts()
The cited commit aimed to ensure that Virtual Functions (VFs) assign a
queue affinity to a Queue Pair (QP) to distribute traffic when
the LAG master creates a hardware LAG. If the affinity was set while
the hardware was not in LAG, the firmware would ignore the affinity value.
However, this commit unintentionally assigned an affinity to QPs on the LAG
master's VPORT even if the RDMA device was not marked as LAG-enabled.
In most cases, this was not an issue because when the hardware entered
hardware LAG configuration, the RDMA device of the LAG master would be
destroyed and a new one would be created, marked as LAG-enabled.
The problem arises when a user configures Equal-Cost Multipath (ECMP).
In ECMP mode, traffic can be directed to different physical ports based on
the queue affinity, which is intended for use by VPORTS other than the
E-Switch manager. ECMP mode is supported only if both E-Switch managers are
in switchdev mode and the appropriate route is configured via IP. In this
configuration, the RDMA device is not destroyed, and we retain the RDMA
device that is not marked as LAG-enabled.
To ensure correct behavior, Send Queues (SQs) opened by the E-Switch
manager through verbs should be assigned strict affinity. This means they
will only be able to communicate through the native physical port
associated with the E-Switch manager. This will prevent the firmware from
assigning affinity and will not allow the SQs to be remapped in case of
failover.
Fixes: 802dcc7fc5 ("RDMA/mlx5: Support TX port affinity for VF drivers in LAG mode")
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/425b05f4da840bc684b0f7e8ebf61aeb5cef09b0.1685960567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Fix ib_uverbs_event_read() to consider event queue closing also upon
non-blocking mode.
Once the queue is closed (e.g. hot-plug flow) all the existing events
are cleaned-up as part of ib_uverbs_free_event_queue().
An application that uses the non-blocking FD mode should get -EIO in
that case to let it knows that the device was removed already.
Otherwise, it can loose the indication that the device was removed and
won't recover.
As part of that, refactor the code to have a single flow with regards to
'is_closed' for both blocking and non-blocking modes.
Fixes: 14e23bd6d2 ("RDMA/core: Fix locking in ib_uverbs_event_read")
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/97b00116a1e1e13f8dc4ec38a5ea81cf8c030210.1685960567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Previously we used the core device associated to the IB device in order
to do the Q-counters query to the FW, but in LAG mode it is possible
that the core device isn't the one that created this VF.
Hence instead of using the core device to query the Q-counters
we use the ESW core device which is guaranteed to be that of the VF.
Fixes: d22467a71e ("RDMA/mlx5: Expand switchdev Q-counters to expose representor statistics")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/778d7d7a24892348d0bdef17d2e5f9e044717e86.1685960567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Previously Q-counters data was being allocated over the PF for all of
the available vports, however that isn't necessary.
Since each VF or SF has a Q-counter allocated for itself.
So we only need to allocate two counters data structures, one for the
device counters, and one for all the other vports to expose the
representors, since they only need to read from it in order to
determine mainly counters numbers and names, so they can all share.
This in turn also solves a bug we previously had where we couldn't
switch the device to switchdev mode when there were more than 128 SF/VFs
configured, since that is the maximum amount of Q-counters available for
a single port
Fixes: d22467a71e ("RDMA/mlx5: Expand switchdev Q-counters to expose representor statistics")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/f54671df16e2227a069b229b33b62cd9ee24c475.1685960567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
A misbehaved user can create a steering anchor that points to a kernel
flow table and then destroy the anchor without freeing the associated
STC. This creates a problem as the kernel can't destroy the flow
table since there is still a reference to it. As a result, this can
exhaust all available flow table resources, preventing other users from
using the RDMA device.
To prevent this problem, a solution is implemented where a special flow
table with two steering rules is created when a user creates a steering
anchor for the first time. The rules include one that drops all traffic
and another that points to the kernel flow table. If the steering anchor
is destroyed, only the rule pointing to the kernel's flow table is removed.
Any traffic reaching the special flow table after that is dropped.
Since the special flow table is not destroyed when the steering anchor is
destroyed, any issues are prevented from occurring. The remaining resources
are only destroyed when the RDMA device is destroyed, which happens after
all DEVX objects are freed, including the STCs, thus mitigating the issue.
Fixes: 0c6ab0ca9a ("RDMA/mlx5: Expose steering anchor to userspace")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Link: https://lore.kernel.org/r/b4a88a871d651fa4e8f98d552553c1cfe9ba2cd6.1685960567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Delay drop data is initiated for PFs that have the capability of
rq_delay_drop and are in roce profile.
However, PFs with RAW ethernet profile do not initiate delay drop data
on function load, causing kernel panic if delay drop struct members are
accessed later on in case a dropless RQ is created.
Thus, stage the delay drop initialization as part of RAW ethernet
PF loading process.
Fixes: b5ca15ad7e ("IB/mlx5: Add proper representors support")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Link: https://lore.kernel.org/r/2e9d386785043d48c38711826eb910315c1de141.1685960567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Pull i2c fixes from Wolfram Sang:
"Biggest news is that Andi Shyti steps in for maintaining the
controller drivers. Thank you very much!
Other than that, one new driver maintainer and the rest is usual
driver bugfixes. at24 has a Kconfig dependecy fix"
* tag 'i2c-for-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
MAINTAINERS: Add entries for Renesas RZ/V2M I2C driver
eeprom: at24: also select REGMAP
i2c: sprd: Delete i2c adapter in .remove's error path
i2c: mv64xxx: Fix reading invalid status value in atomic mode
i2c: designware: fix idx_write_cnt in read loop
i2c: mchp-pci1xxxx: Avoid cast to incompatible function type
i2c: img-scb: Fix spelling mistake "innacurate" -> "inaccurate"
MAINTAINERS: Add myself as I2C host drivers maintainer
Pull soundwire fixes from Vinod Koul:
"Core fix for missing flag clear, error patch handling in qcom driver
and BIOS quirk for HP Spectre x360:
- HP Spectre x360 soundwire DMI quirk
- Error path handling for qcom driver
- Core fix for missing clear of alloc_slave_rt"
* tag 'soundwire-6.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: stream: Add missing clear of alloc_slave_rt
soundwire: qcom: add proper error paths in qcom_swrm_startup()
soundwire: dmi-quirks: add new mapping for HP Spectre x360
Pull ARM SoC fixes from Arnd Bergmann:
"Most of the changes this time are for the Qualcomm Snapdragon
platforms.
There are bug fixes for error handling in Qualcomm icc-bwmon,
rpmh-rsc, ramp_controller and rmtfs driver as well as the AMD tee
firmware driver and a missing initialization in the Arm ff-a firmware
driver. The Qualcomm RPMh and EDAC drivers need some rework to work
correctly on all supported chips.
The DT fixes include:
- i.MX8 fixes for gpio, pinmux and clock settings
- ADS touchscreen gpio polarity settings in several machines
- Address dtb warnings for caches, panel and input-enable properties
on Qualcomm platforms
- Incorrect data on qualcomm platforms fir SA8155P power domains,
SM8550 LLCC, SC7180-lite SDRAM frequencies and SM8550 soundwire
- Remoteproc firmware paths are corrected for Sony Xperia 10 IV"
* tag 'arm-fixes-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (36 commits)
firmware: arm_ffa: Set handle field to zero in memory descriptor
ARM: dts: Fix erroneous ADS touchscreen polarities
arm64: dts: imx8mn-beacon: Fix SPI CS pinmux
arm64: dts: imx8-ss-dma: assign default clock rate for lpuarts
arm64: dts: imx8qm-mek: correct GPIOs for USDHC2 CD and WP signals
EDAC/qcom: Get rid of hardcoded register offsets
EDAC/qcom: Remove superfluous return variable assignment in qcom_llcc_core_setup()
arm64: dts: qcom: sm8550: Use the correct LLCC register scheme
dt-bindings: cache: qcom,llcc: Fix SM8550 description
arm64: dts: qcom: sc7180-lite: Fix SDRAM freq for misidentified sc7180-lite boards
arm64: dts: qcom: sm8550: use uint16 for Soundwire interval
soc: qcom: rpmhpd: Add SA8155P power domains
arm64: dts: qcom: Split out SA8155P and use correct RPMh power domains
dt-bindings: power: qcom,rpmpd: Add SA8155P
soc: qcom: Rename ice to qcom_ice to avoid module name conflict
soc: qcom: rmtfs: Fix error code in probe()
soc: qcom: ramp_controller: Fix an error handling path in qcom_ramp_controller_probe()
ARM: dts: at91: sama7g5ek: fix debounce delay property for shdwc
ARM: at91: pm: fix imbalanced reference counter for ethernet devices
arm64: dts: qcom: sm6375-pdx225: Fix remoteproc firmware paths
...
In the last step of the EEH recovery process, the EEH driver calls into
bnx2x_io_resume() to re-initialize the NIC hardware via the function
bnx2x_nic_load(). If an error occurs during bnx2x_nic_load(), OS and
hardware resources are released and an error code is returned to the
caller. When called from bnx2x_io_resume(), the return code is ignored
and the network interface is brought up unconditionally. Later attempts
to send a packet via this interface result in a page fault due to a null
pointer reference.
This patch checks the return code of bnx2x_nic_load(), prints an error
message if necessary, and does not enable the interface.
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netfilter pull request 23-06-08
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter fixes for net:
1) Add commit and abort set operation to pipapo set abort path.
2) Bail out immediately in case of ENOMEM in nfnetlink batch.
3) Incorrect error path handling when creating a new rule leads to
dangling pointer in set transaction list.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a shift wrapping bug in this code on 32-bit architectures.
NETLBL_CATMAP_MAPTYPE is u64, bitmap is unsigned long.
Every second 32-bit word of catmap becomes corrupted.
Signed-off-by: Dmitry Mastykin <dmastykin@astralinux.ru>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Naveen Mamindlapalli says:
====================
RVU NIX AF driver fixes
This patch series contains few fixes to the RVU NIX AF driver.
The first patch modifies the driver check to validate whether the req count
for contiguous and non-contiguous arrays is less than the maximum limit.
The second patch fixes HW lbk interface link credit programming.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix LBK link credits on CN10K to be same as CN9K i.e
16 * MAX_LBK_DATA_RATE instead of current scheme of
calculation based on LBK buf length / FIFO size.
Fixes: 6e54e1c539 ("octeontx2-af: cn10K: Add MTU configuration")
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
txschq_alloc response have two different arrays to store continuous
and non-continuous schedulers of each level. Requested count should
be checked for each array separately.
Fixes: 5d9b976d44 ("octeontx2-af: Support fixed transmit scheduler topology")
Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-06-08 (ice)
This series contains updates to ice driver only.
Simon Horman stops null pointer dereference for GNSS error path.
Kamil fixes memory leak when downing interface when XDP is enabled.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Fix XDP memory leak when NIC is brought up and down
ice: Don't dereference NULL in ice_gnss_read error path
====================
Link: https://lore.kernel.org/r/20230608200051.451752-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts says:
====================
selftests: mptcp: skip tests not supported by old kernels (part 2)
After a few years of increasing test coverage in the MPTCP selftests, we
realised [1] the last version of the selftests is supposed to run on old
kernels without issues.
Supporting older versions is not that easy for this MPTCP case: these
selftests are often validating the internals by checking packets that
are exchanged, when some MIB counters are incremented after some
actions, how connections are getting opened and closed in some cases,
etc. In other words, it is not limited to the socket interface between
the userspace and the kernelspace.
In addition to that, the current MPTCP selftests run a lot of different
sub-tests but the TAP13 protocol used in the selftests don't support
sub-tests: one failure in sub-tests implies that the whole selftest is
seen as failed at the end because sub-tests are not tracked. It is then
important to skip sub-tests not supported by old kernels.
To minimise the modifications and reduce the complexity to support old
versions, the idea is to look at external signs and skip the whole
selftests or just some sub-tests before starting them. This cannot be
applied in all cases.
This second part focuses on marking different sub-tests as skipped if
some MPTCP features are not supported. A few techniques are used here:
- Before starting some tests:
- Check if a file (sysctl knob) is present: that's what patch 13/14 is
doing for the userspace PM feature.
- Check if a symbol is present in /proc/kallsyms: patch 1/14 adds some
helpers in mptcp_lib.sh to ease its use. Then these helpers are used
in patches 2, 3, 4, 10, 11 and 14/14.
- Set a flag and get the status to check if a feature is supported:
patch 8/14 is doing that with the 'fullmesh' flag.
- After having launched the tests:
- Retrieve the counters after a test and check if they are different
than 0. Similar to the check with the flag, that's not ideal but in
this case, the counters were already present before the introduction
of MPTCP but they have been supported by MPTCP sockets only later.
Patches 5 and 6/14 are using this technique.
Before skipping tests, SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var
value is checked: if it is set to 1, the test is marked as "failed"
instead of "skipped". MPTCP public CI expects to have all features
supported and it sets this env var to 1 to catch regressions in these
new checks.
Patches 7/14 and 9/14 are a bit different because they don't skip tests:
- Patch 7/14 retrieves the default values instead of using hardcoded
ones because these default values have been modified at some points.
Then the comparisons are done with the default values.
- patch 9/14 relaxes the expected returned size from MPTCP's getsockopt
because the different structures gathering various info can get new
fields and get bigger over time. We cannot expect that the userspace
is using the same structure as the kernel.
Patch 12/14 marks the test as "skipped" instead of "failed" if the "ip"
tool is not available.
In this second part, the "mptcp_join" selftest is not modified yet. This
will come soon after in the third part with quite a few patches.
Link: https://lore.kernel.org/stable/CA+G9fYtDGpgT4dckXD-y-N92nqUxuvue_7AtDdBcHrbOMsDZLg@mail.gmail.com/ [1]
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
====================
Link: https://lore.kernel.org/r/20230608-upstream-net-20230608-mptcp-selftests-support-old-kernels-part-2-v1-0-20997a6fd841@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the new listener events linked to the path-manager
introduced by commit f8c9dfbd87 ("mptcp: add pm listener events").
It is possible to look for "mptcp_event_pm_listener" in kallsyms to know
in advance if the kernel supports this feature and skip these sub-tests
if the feature is not supported.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 6c73008aa3 ("selftests: mptcp: listener test for userspace PM")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the MPTCP Userspace PM introduced by commit 4638de5aef
("mptcp: handle local addrs announced by userspace PMs").
We can skip all these tests if the feature is not supported simply by
looking for the MPTCP pm_type's sysctl knob.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 259a834fad ("selftests: mptcp: functional tests for the userspace PM type")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is TCP_INQ cmsg support introduced in commit 2c9e77659a
("mptcp: add TCP_INQ cmsg support").
It is possible to look for "mptcp_ioctl" in kallsyms because it was
needed to introduce the mentioned feature. We can skip these tests and
not set TCPINQ option if the feature is not supported.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 5cbd886ce2 ("selftests: mptcp: add TCP_INQ support")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the getsockopt(SOL_MPTCP) to get info about the MPTCP
connections introduced by commit 55c42fa7fa ("mptcp: add MPTCP_INFO
getsockopt") and the following ones.
It is possible to look for "mptcp_diag_fill_info" in kallsyms because
it is introduced by the mentioned feature. So we can know in advance if
the feature is supported and skip the sub-test if not.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: ce9979129a ("selftests: mptcp: add mptcp getsockopt test cases")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the getsockopt(SOL_MPTCP) to get info about the MPTCP
connections introduced by commit 55c42fa7fa ("mptcp: add MPTCP_INFO
getsockopt") and the following ones.
We cannot guess in advance which sizes the kernel will returned: older
kernel can returned smaller sizes, e.g. recently the tcp_info structure
has been modified in commit 71fc704768 ("tcp: add rcv_wnd and
plb_rehash to TCP_INFO") where a new field has been added.
The userspace can also expect a smaller size if it is compiled with old
uAPI kernel headers.
So for these sizes, we can only check if they are above a certain
threshold, 0 for the moment. We can also only compared sizes with the
ones set by the kernel.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: ce9979129a ("selftests: mptcp: add mptcp getsockopt test cases")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the fullmesh flag that can be given to the MPTCP
in-kernel path-manager and introduced in commit 2843ff6f36 ("mptcp:
remote addresses fullmesh").
If the flag is not visible in the dump after having set it, we don't
check the content. Note that if we expect to have this feature and
SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var is set to 1, we always
check the content to avoid regressions.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 6da1dfdd03 ("selftests: mptcp: add set_flags tests in pm_netlink.sh")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the checks of the default limits returned by the MPTCP
in-kernel path-manager. The default values have been modified by commit
72bcbc46a5 ("mptcp: increase default max additional subflows to 2").
Instead of comparing with hardcoded values, we can get the default one
and compare with them.
Note that if we expect to have the latest version, we continue to check
the hardcoded values to avoid unexpected behaviour changes.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: eedbc68532 ("selftests: add PM netlink functional tests")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the reporting of the MPTCP sockets being used, introduced
by commit c558246ee7 ("mptcp: add statistics for mptcp socket in use").
Similar to the parent commit, it looks like there is no good pre-check
to do here, i.e. dedicated function available in kallsyms. Instead, we
try to get info and if nothing is returned, the test is marked as
skipped.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: e04a30f788 ("selftest: mptcp: add test for mptcp socket in use")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the listen diag dump support introduced by
commit 4fa39b701c ("mptcp: listen diag dump support").
It looks like there is no good pre-check to do here, i.e. dedicated
function available in kallsyms. Instead, we try to get info if nothing
is returned, the test is marked as skipped.
That's not ideal because something could be wrong with the feature and
instead of reporting an error, the test could be marked as skipped. If
we know in advanced that the feature is supposed to be supported, the
tester can set SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var to 1: in
this case the test will report an error instead of marking the test as
skipped if nothing is returned.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: f2ae0fa68e ("selftests/mptcp: add diag listen tests")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the support of TCP_FASTOPEN socket option with MPTCP
connections introduced by commit 4ffb0a0234 ("mptcp: add TCP_FASTOPEN
sock option").
It is possible to look for "mptcp_fastopen_" in kallsyms to know if the
feature is supported or not.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: ca7ae89160 ("selftests: mptcp: mptfo Initiator/Listener")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the full support of disconnections from the userspace
introduced by commit b29fcfb54c ("mptcp: full disconnect
implementation").
It is possible to look for "mptcp_pm_data_reset" in kallsyms because a
preparation patch added it to ease the introduction of the mentioned
feature.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 05be5e273c ("selftests: mptcp: add disconnect tests")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
One of them is the support of IP(V6)_TRANSPARENT socket option with
MPTCP connections introduced by commit c9406a23c1 ("mptcp: sockopt:
add SOL_IP freebind & transparent options").
It is possible to look for "__ip_sock_set_tos" in kallsyms because
IP(V6)_TRANSPARENT socket option support has been added after TOS
support which came with the required infrastructure in MPTCP sockopt
code. To support TOS, the following function has been exported (T). Not
great but better than checking for a specific kernel version.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 5fb62e9cd3 ("selftests: mptcp: add tproxy test case")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Selftests are supposed to run on any kernels, including the old ones not
supporting all MPTCP features.
New functions are now available to easily detect if a certain feature is
missing by looking at kallsyms.
These new helpers are going to be used in the following commits. In
order to ease the backport of such future patches, it would be good if
this patch is backported up to the introduction of MPTCP selftests,
hence the Fixes tag below: this type of check was supposed to be done
from the beginning.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 048d19d444 ("mptcp: add basic kselftest for mptcp")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull block fixes from Jens Axboe:
- Fix an issue with the hardware queue nr_active, causing it to become
imbalanced (Tian)
- Fix an issue with null_blk not releasing pages if configured as
memory backed (Nitesh)
- Fix a locking issue in dasd (Jan)
* tag 'block-6.4-2023-06-09' of git://git.kernel.dk/linux:
s390/dasd: Use correct lock while counting channel queue length
null_blk: Fix: memory release when memory_backed=1
blk-mq: fix blk_mq_hw_ctx active request accounting
Pull virtio bug fixes from Michael Tsirkin:
"A bunch of fixes all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
tools/virtio: use canonical ftrace path
vhost_vdpa: support PACKED when setting-getting vring_base
vhost: support PACKED when setting-getting vring_base
vhost: Fix worker hangs due to missed wake up calls
vhost: Fix crash during early vhost_transport_send_pkt calls
vhost_net: revert upend_idx only on retriable error
vhost_vdpa: tell vqs about the negotiated
vdpa/mlx5: Fix hang when cvq commands are triggered during device unregister
tools/virtio: Add .gitignore for ringtest
tools/virtio: Fix arm64 ringtest compilation error
vduse: avoid empty string for dev name
vhost: use kzalloc() instead of kmalloc() followed by memset()
Pull ceph fixes from Ilya Dryomov:
"A fix for a potential data corruption in differential backup and
snapshot-based mirroring scenarios in RBD and a reference counting
fixup to avoid use-after-free in CephFS, all marked for stable"
* tag 'ceph-for-6.4-rc6' of https://github.com/ceph/ceph-client:
ceph: fix use-after-free bug for inodes when flushing capsnaps
rbd: get snapshot context after exclusive lock is ensured to be held
rbd: move RBD_OBJ_FLAG_COPYUP_ENABLED flag setting
The lock around counting the channel queue length in the BIODASDINFO
ioctl was incorrectly changed to the dasd_block->queue_lock with commit
583d6535cb ("dasd: remove dead code"). This can lead to endless list
iterations and a subsequent crash.
The queue_lock is supposed to be used only for queue lists belonging to
dasd_block. For dasd_device related queue lists the ccwdev lock must be
used.
Fix the mentioned issues by correctly using the ccwdev lock instead of
the queue lock.
Fixes: 583d6535cb ("dasd: remove dead code")
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20230609153750.1258763-2-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull RISC-V fixes from Palmer Dabbelt:
- A fix to avoid ISA-disallowed privilege mappings that can result from
WRITE+EXEC mmap requests from userspace.
- A fix for kfence to handle the huge pages.
- A fix to avoid converting misaligned VAs to huge pages.
- ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE has been selected so kprobe
can understand user pointers.
* tag 'riscv-for-linus-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: fix kprobe __user string arg print fault issue
riscv: Check the virtual alignment before choosing a map size
riscv: Fix kfence now that the linear mapping can be backed by PUD/P4D/PGD
riscv: mm: Ensure prot of VM_WRITE and VM_EXEC must be readable
Pull s390 fixes from Alexander Gordeev:
- Avoid linker error for randomly generated config file that has
CONFIG_BRANCH_PROFILE_NONE enabled and make it similar to riscv, x86
and also to commit 4bf3ec384e ("s390: disable branch profiling for
vdso").
- Currently, if the device is offline and all the channel paths are
either configured or varied offline, the associated subchannel gets
unregistered. Don't unregister the subchannel, instead unregister
offline device.
* tag 's390-6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/purgatory: disable branch profiling
s390/cio: unregister device when the only path is gone
In the following:
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
assign_lock_key kernel/locking/lockdep.c:982 [inline]
register_lock_class+0xdb6/0x1120 kernel/locking/lockdep.c:1295
__lock_acquire+0x10a/0x5df0 kernel/locking/lockdep.c:4951
lock_acquire kernel/locking/lockdep.c:5691 [inline]
lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
skb_dequeue+0x20/0x180 net/core/skbuff.c:3639
drain_resp_pkts drivers/infiniband/sw/rxe/rxe_comp.c:555 [inline]
rxe_completer+0x250d/0x3cc0 drivers/infiniband/sw/rxe/rxe_comp.c:652
rxe_qp_do_cleanup+0x1be/0x820 drivers/infiniband/sw/rxe/rxe_qp.c:761
execute_in_process_context+0x3b/0x150 kernel/workqueue.c:3473
__rxe_cleanup+0x21e/0x370 drivers/infiniband/sw/rxe/rxe_pool.c:233
rxe_create_qp+0x3f6/0x5f0 drivers/infiniband/sw/rxe/rxe_verbs.c:583
This is a use-before-initialization problem.
It happens because rxe_qp_do_cleanup is called during error unwind before
the struct has been fully initialized.
Move the initialization of the skb earlier.
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20230602035408.741534-1-yanjun.zhu@intel.com
Reported-by: syzbot+eba589d8f49c73d356da@syzkaller.appspotmail.com
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Pull gpio fixes from Bartosz Golaszewski:
"Two fixes for the GPIO testing module and one commit making Andy a
reviewer for the GPIO subsystem:
- fix a memory corruption bug in gpio-sim
- fix inconsistencies in user-space configuration of gpio-sim
- make Andy Shevchenko a reviewer for the GPIO subsystem"
* tag 'gpio-fixes-for-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
MAINTAINERS: add Andy Shevchenko as reviewer for the GPIO subsystem
gpio: sim: quietly ignore configured lines outside the bank
gpio: sim: fix memory corruption when adding named lines and unnamed hogs
The canonical location for the tracefs filesystem is at /sys/kernel/tracing.
But, from Documentation/trace/ftrace.rst:
Before 4.1, all ftrace tracing control files were within the debugfs
file system, which is typically located at /sys/kernel/debug/tracing.
For backward compatibility, when mounting the debugfs file system,
the tracefs file system will be automatically mounted at:
/sys/kernel/debug/tracing
A few spots in tools/virtio still refer to this older debugfs
path, so let's update them to avoid confusion.
Signed-off-by: Ross Zwisler <zwisler@google.com>
Message-Id: <20230215223350.2658616-6-zwisler@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Pull pin control fix from Linus Walleij:
"A single fix for the Meson driver, nothing else has surfaced so far
this cycle"
* tag 'pinctrl-v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: meson-axg: add missing GPIOA_18 gpio group
Pull sound fixes from Takashi Iwai:
"Lots of small fixes, and almost all are device-specific.
A few of them are the fixes for the old regressions by the fast kctl
lookups (introduced around 5.19). Others are ASoC simple-card fixes,
selftest compile warning fixes, ASoC AMD quirks, various ASoC codec
fixes as well as usual HD-audio quirks"
* tag 'sound-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (26 commits)
ALSA: hda/realtek: Enable 4 amplifiers instead of 2 on a HP platform
ALSA: hda: Fix kctl->id initialization
ALSA: gus: Fix kctl->id initialization
ALSA: cmipci: Fix kctl->id initialization
ALSA: ymfpci: Fix kctl->id initialization
ALSA: ice1712,ice1724: fix the kcontrol->id initialization
ALSA: hda/realtek: Add quirk for Clevo NS50AU
ALSA: hda/realtek: Add quirks for Asus ROG 2024 laptops using CS35L41
ALSA: hda/realtek: Add "Intel Reference board" and "NUC 13" SSID in the ALC256
ALSA: hda/realtek: Add Lenovo P3 Tower platform
ALSA: hda/realtek: Add a quirk for HP Slim Desktop S01
selftests: alsa: pcm-test: Fix compiler warnings about the format
ASoC: fsl_sai: Enable BCI bit if SAI works on synchronous mode with BYP asserted
ASoC: simple-card-utils: fix PCM constraint error check
ASoC: cs35l56: Remove NULL check from cs35l56_sdw_dai_set_stream()
ASoC: max98363: limit the number of channel to 1
ASoC: max98363: Removed 32bit support
ASoC: mediatek: mt8195: fix use-after-free in driver remove path
ASoC: mediatek: mt8188: fix use-after-free in driver remove path
ASoC: amd: yc: Add Thinkpad Neo14 to quirks list for acp6x
...
Pull ext4 fix from Ted Ts'o:
"Fix an ext4 regression which breaks remounting r/w file systems that
have the quota feature enabled"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: only check dquot_initialize_needed() when debugging
Revert "ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled"
i.MX fixes for 6.4, round 2:
- Fix SPI CS pinmux for the final production version of imx8mn-beacon
board.
- Fix GPIOs for USDHC2 CD and WP signals on imx8qm-mek board.
- Assign default clock rate for i.MX8 LPUARTs to fix UART failure.
* tag 'imx-fixes-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
arm64: dts: imx8mn-beacon: Fix SPI CS pinmux
arm64: dts: imx8-ss-dma: assign default clock rate for lpuarts
arm64: dts: imx8qm-mek: correct GPIOs for USDHC2 CD and WP signals
Link: https://lore.kernel.org/r/20230607141312.GU4199@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
In the event of a failure in tcf_change_indev(), u32_set_parms() will
immediately return without decrementing the recently incremented
reference counter. If this happens enough times, the counter will
rollover and the reference freed, leading to a double free which can be
used to do 'bad things'.
In order to prevent this, move the point of possible failure above the
point where the reference counter is incremented. Also save any
meaningful return values to be applied to the return data at the
appropriate point in time.
This issue was caught with KASAN.
Fixes: 705c709126 ("net: sched: cls_u32: no need to call tcf_exts_change for newly allocated struct")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Lee Jones <lee@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As shown in [1], out-of-bounds access occurs in two cases:
1)when the qdisc of the taprio type is used to replace the previously
configured taprio, count and offset in tc_to_txq can be set to 0. In this
case, the value of *txq in taprio_next_tc_txq() will increases
continuously. When the number of accessed queues exceeds the number of
queues on the device, out-of-bounds access occurs.
2)When packets are dequeued, taprio can be deleted. In this case, the tc
rule of dev is cleared. The count and offset values are also set to 0. In
this case, out-of-bounds access is also caused.
Now the restriction on the queue number is added.
[1] https://groups.google.com/g/syzkaller-bugs/c/_lYOKgkBVMg
Fixes: 2f530df76c ("net/sched: taprio: give higher priority to higher TCs in software dequeue mode")
Reported-by: syzbot+04afcb3d2c840447559a@syzkaller.appspotmail.com
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Tested-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CN10KB silicon introduced a new exact match feature,
which is used for DMAC filtering. The state of installed
DMAC filters in this exact match table is getting corrupted
when promiscuous mode is toggled. Fix this by not touching
Exact match related config when promiscuous mode is toggled.
Fixes: 2dba9459d2 ("octeontx2-af: Wrapper functions for MAC addr add/del/update/reset")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The timestamp descriptors were intended to act cyclically. Descriptors
from index 0 through gq->ring_size - 1 contain actual information, and
the last index (gq->ring_size) should have LINKFIX to indicate
the first index 0 descriptor. However, the LINKFIX value is missing,
causing the timestamp feature to stop after all descriptors are used.
To resolve this issue, set the LINKFIX to the timestamp descritors.
Reported-by: Phong Hoang <phong.hoang.wz@renesas.com>
Fixes: 33f5d733b5 ("net: renesas: rswitch: Improve TX timestamp accuracy")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of relying on skb->transport_header being set correctly, opt
instead to parse the L3 header length out of the L3 headers for both
IPv4/IPv6 when the Extended Layer Op for tcp/udp is used. This fixes a
bug if GRO is disabled, when GRO is disabled skb->transport_header is
set by __netif_receive_skb_core() to point to the L3 header, it's later
fixed by the upper protocol layers, but act_pedit will receive the SKB
before the fixups are completed. The existing behavior causes the
following to edit the L3 header if GRO is disabled instead of the UDP
header:
tc filter add dev eth0 ingress protocol ip flower ip_proto udp \
dst_ip 192.168.1.3 action pedit ex munge udp set dport 18053
Also re-introduce a rate-limited warning if we were unable to extract
the header offset when using the 'ex' interface.
Fixes: 71d0ed7079 ("net/act_pedit: Support using offset relative to
the conventional network headers")
Signed-off-by: Max Tottenham <mtottenh@akamai.com>
Reviewed-by: Josh Hunt <johunt@akamai.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202305261541.N165u9TZ-lkp@intel.com/
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linux 6.4-rc5
* tag 'v6.4-rc5': (303 commits)
Linux 6.4-rc5
leds: qcom-lpg: Fix PWM period limits
selftests/ftrace: Choose target function for filter test from samples
KVM: selftests: Add test for race in kvm_recalculate_apic_map()
KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds
KVM: x86: Account fastpath-only VM-Exits in vCPU stats
KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK
KVM: x86/mmu: Grab memslot for correct address space in NX recovery worker
tpm, tpm_tis: correct tpm_tis_flags enumeration values
Revert "ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits"
riscv: Implement missing huge_ptep_get
riscv: Fix huge_ptep_set_wrprotect when PTE is a NAPOT
module/decompress: Fix error checking on zstd decompression
fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
dt-bindings: serial: 8250_omap: add rs485-rts-active-high
selinux: don't use make's grouped targets feature yet
riscv: perf: Fix callchain parse error with kernel tracepoint events
mptcp: fix active subflow finalization
mptcp: add annotations around sk->sk_shutdown accesses
mptcp: fix data race around msk->first access
...
Pull drm fixes from Dave Airlie:
"Bit busier and a bit more scattered than usual. amdgpu is the main
one, with ivpu and msm having a few fixes, then i915, exynos, ast,
lima, radeon with some misc bits, but overall nothing standing out.
fb-helper:
- Fill in fb-helper vars more correctly
amdgpu:
- S0ix fixes
- GPU reset fixes
- SMU13 fixes
- SMU11 fixes
- Misc Display fixes
- Revert RV/RV2/PCO clock counter changes
- Fix Stoney xclk value
- Fix reserved vram debug info
radeon:
- Fix a potential use after free
i915:
- CDCLK voltage fix for ADL-P
- eDP wake sync pulse fix
- Two error handling fixes to selftests
exynos:
- Fix wrong return in Exynos vidi driver
- Fix use-after-free issue to Exynos g2d driver
ast:
- resume and modeset fixes for ast
ivpu:
- Assorted ivpu fixes
lima:
- lima context destroy fix
msm:
- Fix max segment size to address splat on newer a6xx
- Disable PSR by default w/ modparam to re-enable, since there still
seems to be a lingering issue
- Fix HPD issue
- Fix issue with unitialized GMU mutex"
* tag 'drm-fixes-2023-06-09' of git://anongit.freedesktop.org/drm/drm: (32 commits)
drm/msm/a6xx: initialize GMU mutex earlier
drm/msm/dp: enable HDP plugin/unplugged interrupts at hpd_enable/disable
accel/ivpu: Fix sporadic VPU boot failure
accel/ivpu: Do not use mutex_lock_interruptible
accel/ivpu: Do not trigger extra VPU reset if the VPU is idle
drm/amd/display: Reduce sdp bw after urgent to 90%
drm/amdgpu: change reserved vram info print
drm/amdgpu: fix xclk freq on CHIP_STONEY
drm/radeon: fix race condition UAF in radeon_gem_set_domain_ioctl
Revert "drm/amdgpu: switch to golden tsc registers for raven/raven2"
Revert "drm/amdgpu: Differentiate between Raven2 and Raven/Picasso according to revision id"
Revert "drm/amdgpu: change the reference clock for raven/raven2"
drm/amd/display: add ODM case when looking for first split pipe
drm/amd: Make lack of `ACPI_FADT_LOW_POWER_S0` or `CONFIG_AMD_PMC` louder during suspend path
drm/amd/pm: conditionally disable pcie lane switching for some sienna_cichlid SKUs
drm/amd/pm: Fix power context allocation in SMU13
drm/amdgpu: fix Null pointer dereference error in amdgpu_device_recover_vram
drm/amd: Disallow s0ix without BIOS support again
drm/i915/selftests: Add some missing error propagation
drm/exynos: fix race condition UAF in exynos_g2d_exec_ioctl
...
82580/i354/i350 features circle-counter-like timestamp registers
that are different with newer i210. The EXTTS capture value in
AUXTSMPx should be converted from raw circle counter value to
timestamp value in resolution of 1 nanosec by the driver.
This issue can be reproduced on i350 nics, connecting an 1PPS
signal to a SDP pin, and run 'ts2phc' command to read external
1PPS timestamp value. On i210 this works fine, but on i350 the
extts is not correctly converted.
The i350/i354/82580's SYSTIM and other timestamp registers are
40bit counters, presenting time range of 2^40 ns, that means these
registers overflows every about 1099s. This causes all these regs
can't be used directly in contrast to the newer i210/i211s.
The igb driver needs to convert these raw register values to
valid time stamp format by using kernel timecounter apis for i350s
families. Here the igb_extts() just forgot to do the convert.
Fixes: 38970eac41 ("igb: support EXTTS on 82580/i354/i350")
Signed-off-by: Yuezhen Luan <eggcar.luan@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230607164116.3768175-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ping sockets can't send packets when they're bound to a VRF master
device and the output interface is set to a slave device.
For example, when net.ipv4.ping_group_range is properly set, so that
ping6 can use ping sockets, the following kind of commands fails:
$ ip vrf exec red ping6 fe80::854:e7ff:fe88:4bf1%eth1
What happens is that sk->sk_bound_dev_if is set to the VRF master
device, but 'oif' is set to the real output device. Since both are set
but different, ping_v6_sendmsg() sees their value as inconsistent and
fails.
Fix this by allowing 'oif' to be a slave device of ->sk_bound_dev_if.
This fixes the following kselftest failure:
$ ./fcnal-test.sh -t ipv6_ping
[...]
TEST: ping out, vrf device+address bind - ns-B IPv6 LLA [FAIL]
Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Closes: https://lore.kernel.org/netdev/b6191f90-ffca-dbca-7d06-88a9788def9c@alu.unizg.hr/
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Fixes: 5e45789698 ("net: ipv6: Fix ping to link-local addresses.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/6c8b53108816a8d0d5705ae37bdc5a8322b5e3d9.1686153846.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull cgroup fixes from Tejun Heo:
- Fix css_set reference leaks on fork failures
- Fix CPU hotplug locking in cgroup_transfer_tasks() which is used by
cgroup1 cpuset
- Doc update
* tag 'cgroup-for-6.4-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Documentation: Clarify usage of memory limits
cgroup: always put cset in cgroup_css_set_put_fork
cgroup: fix missing cpus_read_{lock,unlock}() in cgroup_transfer_tasks()
The internal_hpd flag is set to true by dp_bridge_hpd_enable() and set to
false by dp_bridge_hpd_disable() to handle GPIO pinmuxed into DP controller
case. HDP related interrupts can not be enabled until internal_hpd is set
to true. At current implementation dp_display_config_hpd() will initialize
DP host controller first followed by enabling HDP related interrupts if
internal_hpd was true at that time. Enable HDP related interrupts depends on
internal_hpd status may leave system with DP driver host is in running state
but without HDP related interrupts being enabled. This will prevent external
display from being detected. Eliminated this dependency by moving HDP related
interrupts enable/disable be done at dp_bridge_hpd_enable/disable() directly
regardless of internal_hpd status.
Changes in V3:
-- dp_catalog_ctrl_hpd_enable() and dp_catalog_ctrl_hpd_disable()
-- rewording ocmmit text
Changes in V4:
-- replace dp_display_config_hpd() with dp_display_host_start()
-- move enable_irq() at dp_display_host_start();
Changes in V5:
-- replace dp_display_host_start() with dp_display_host_init()
Changes in V6:
-- squash remove enable_irq() and disable_irq()
Fixes: cd198cadde ("drm/msm/dp: Rely on hpd_enable/disable callbacks")
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Tested-by: Leonard Lausen <leonard@lausen.nl> # on sc7180 lazor
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Tested-by: Bjorn Andersson <andersson@kernel.org>
Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Link: https://lore.kernel.org/r/1684878756-17830-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
Pull arm64 fixes from Will Deacon:
"Two tiny arm64 fixes for -rc6.
One fixes a build breakage when MAX_ORDER can be nonsensical if
CONFIG_EXPERT=y and the other fixes the address masking for perf's
page fault software events so that it is consistent amongst them:
- Fix build breakage due to bogus MAX_ORDER definitions on !4k pages
- Avoid masking fault address for perf software events"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: pass original fault address to handle_mm_fault() in PER_VMA_LOCK block
arm64: Remove the ARCH_FORCE_MAX_ORDER config input prompt
For ENETC hardware, the TCs are numbered from 0 to N-1, where N
is the number of TCs. Numerically higher TC has higher priority.
It's obvious that the highest priority TC index should be N-1 and
the 2nd highest priority TC index should be N-2.
However, the previous logic uses netdev_get_prio_tc_map() to get
the indexes of highest priority and 2nd highest priority TCs, it
does not make sense and is incorrect to give a "tc" argument to
netdev_get_prio_tc_map(). So the driver may get the wrong indexes
of the two highest priotiry TCs which would lead to failed to set
the CBS for the two highest priotiry TCs.
e.g.
$ tc qdisc add dev eno0 parent root handle 100: mqprio num_tc 6 \
map 0 0 1 1 2 3 4 5 queues 1@0 1@1 1@2 1@3 2@4 2@6 hw 1
$ tc qdisc replace dev eno0 parent 100:6 cbs idleslope 100000 \
sendslope -900000 hicredit 12 locredit -113 offload 1
$ Error: Specified device failed to setup cbs hardware offload.
^^^^^
In this example, the previous logic deems the indexes of the two
highest priotiry TCs should be 3 and 2. Actually, the indexes are
5 and 4, because the number of TCs is 6. So it would be failed to
configure the CBS for the two highest priority TCs.
Fixes: c431047c4e ("enetc: add support Credit Based Shaper(CBS) for hardware offload")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of error when adding a new rule that refers to an anonymous set,
deactivate expressions via NFT_TRANS_PREPARE state, not NFT_TRANS_RELEASE.
Thus, the lookup expression marks anonymous sets as inactive in the next
generation to ensure it is not reachable in this transaction anymore and
decrement the set refcount as introduced by c1592a8994 ("netfilter:
nf_tables: deactivate anonymous set from preparation phase"). The abort
step takes care of undoing the anonymous set.
This is also consistent with rule deletion, where NFT_TRANS_PREPARE is
used. Note that this error path is exercised in the preparation step of
the commit protocol. This patch replaces nf_tables_rule_release() by the
deactivate and destroy calls, this time with NFT_TRANS_PREPARE.
Due to this incorrect error handling, it is possible to access a
dangling pointer to the anonymous set that remains in the transaction
list.
[1009.379054] BUG: KASAN: use-after-free in nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379106] Read of size 8 at addr ffff88816c4c8020 by task nft-rule-add/137110
[1009.379116] CPU: 7 PID: 137110 Comm: nft-rule-add Not tainted 6.4.0-rc4+ #256
[1009.379128] Call Trace:
[1009.379132] <TASK>
[1009.379135] dump_stack_lvl+0x33/0x50
[1009.379146] ? nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379191] print_address_description.constprop.0+0x27/0x300
[1009.379201] kasan_report+0x107/0x120
[1009.379210] ? nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379255] nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379302] nft_lookup_init+0xa5/0x270 [nf_tables]
[1009.379350] nf_tables_newrule+0x698/0xe50 [nf_tables]
[1009.379397] ? nf_tables_rule_release+0xe0/0xe0 [nf_tables]
[1009.379441] ? kasan_unpoison+0x23/0x50
[1009.379450] nfnetlink_rcv_batch+0x97c/0xd90 [nfnetlink]
[1009.379470] ? nfnetlink_rcv_msg+0x480/0x480 [nfnetlink]
[1009.379485] ? __alloc_skb+0xb8/0x1e0
[1009.379493] ? __alloc_skb+0xb8/0x1e0
[1009.379502] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
[1009.379509] ? unwind_get_return_address+0x2a/0x40
[1009.379517] ? write_profile+0xc0/0xc0
[1009.379524] ? avc_lookup+0x8f/0xc0
[1009.379532] ? __rcu_read_unlock+0x43/0x60
Fixes: 958bee14d0 ("netfilter: nf_tables: use new transaction infrastructure to handle sets")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We can race where we have added work to the work_list, but
vhost_task_fn has passed that check but not yet set us into
TASK_INTERRUPTIBLE. wake_up_process will see us in TASK_RUNNING and
just return.
This bug was intoduced in commit f9010dbdce ("fork, vhost: Use
CLONE_THREAD to fix freezer/ps regression") when I moved the setting
of TASK_INTERRUPTIBLE to simplfy the code and avoid get_signal from
logging warnings about being in the wrong state. This moves the setting
of TASK_INTERRUPTIBLE back to before we test if we need to stop the
task to avoid a possible race there as well. We then have vhost_worker
set TASK_RUNNING if it finds work similar to before.
Fixes: f9010dbdce ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230607192338.6041-3-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If userspace does VHOST_VSOCK_SET_GUEST_CID before VHOST_SET_OWNER we
can race where:
1. thread0 calls vhost_transport_send_pkt -> vhost_work_queue
2. thread1 does VHOST_SET_OWNER which calls vhost_worker_create.
3. vhost_worker_create will set the dev->worker pointer before setting
the worker->vtsk pointer.
4. thread0's vhost_work_queue will see the dev->worker pointer is
set and try to call vhost_task_wake using not yet set worker->vtsk
pointer.
5. We then crash since vtsk is NULL.
Before commit 6e890c5d50 ("vhost: use vhost_tasks for worker
threads"), we only had the worker pointer so we could just check it to
see if VHOST_SET_OWNER has been done. After that commit we have the
vhost_worker and vhost_task pointer, so we can now hit the bug above.
This patch embeds the vhost_worker in the vhost_dev and moves the work
list initialization back to vhost_dev_init, so we can just check the
worker.vtsk pointer to check if VHOST_SET_OWNER has been done like
before.
Fixes: 6e890c5d50 ("vhost: use vhost_tasks for worker threads")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230607192338.6041-2-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: syzbot+d0d442c22fa8db45ff0e@syzkaller.appspotmail.com
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Fix possible virtqueue used buffers leak and corresponding stuck
in case of temporary -EIO from sendmsg() which is produced by
tun driver while backend device is not up.
In case of no-retriable error and zcopy do not revert upend_idx
to pass packet data (that is update used_idx in corresponding
vhost_zerocopy_signal_used()) as if packet data has been
transferred successfully.
v2: set vq->heads[ubuf->desc].len equal to VHOST_DMA_DONE_LEN
in case of fake successful transmit.
Signed-off-by: Andrey Smetanin <asmetanin@yandex-team.ru>
Message-Id: <20230424204411.24888-1-asmetanin@yandex-team.ru>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Andrey Smetanin <asmetanin@yandex-team.ru>
Acked-by: Jason Wang <jasowang@redhat.com>
As is done in the net, iscsi, and vsock vhost support, let the vdpa vqs
know about the features that have been negotiated. This allows vhost
to more safely make decisions based on the features, such as when using
PACKED vs split queues.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230424225031.18947-2-shannon.nelson@amd.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fix the buffer leak that occurs while switching
the port up and down with traffic and XDP by
checking for an active XDP program and freeing all empty TX buffers.
Fixes: efc2214b60 ("ice: Add support for XDP")
Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
On riscv qemu platform, when add kprobe event on do_sys_open() to show
filename string arg, it just print fault as follow:
echo 'p:myprobe do_sys_open dfd=$arg1 filename=+0($arg2):string flags=$arg3
mode=$arg4' > kprobe_events
bash-166 [000] ...1. 360.195367: myprobe: (do_sys_open+0x0/0x84)
dfd=0xffffffffffffff9c filename=(fault) flags=0x8241 mode=0x1b6
bash-166 [000] ...1. 360.219369: myprobe: (do_sys_open+0x0/0x84)
dfd=0xffffffffffffff9c filename=(fault) flags=0x8241 mode=0x1b6
bash-191 [000] ...1. 360.378827: myprobe: (do_sys_open+0x0/0x84)
dfd=0xffffffffffffff9c filename=(fault) flags=0x98800 mode=0x0
As riscv do not select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE,
the +0($arg2) addr is processed as a kernel address though it is a
userspace address, cause the above filename=(fault) print. So select
ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE to avoid the issue, after that the
kprobe trace is ok as below:
bash-166 [000] ...1. 96.767641: myprobe: (do_sys_open+0x0/0x84)
dfd=0xffffffffffffff9c filename="/dev/null" flags=0x8241 mode=0x1b6
bash-166 [000] ...1. 96.793751: myprobe: (do_sys_open+0x0/0x84)
dfd=0xffffffffffffff9c filename="/dev/null" flags=0x8241 mode=0x1b6
bash-177 [000] ...1. 96.962354: myprobe: (do_sys_open+0x0/0x84)
dfd=0xffffffffffffff9c filename="/sys/kernel/debug/tracing/events/kprobes/"
flags=0x98800 mode=0x0
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Fixes: 0ebeea8ca8 ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work")
Link: https://lore.kernel.org/r/20230504072910.3742842-1-ruanjinjie@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull xfs fixes from Dave Chinner:
"These are a set of regression fixes discovered on recent kernels. I
was hoping to send this to you a week and half ago, but events out of
my control delayed finalising the changes until early this week.
Whilst the diffstat looks large for this stage of the merge window, a
large chunk of it comes from moving the guts of one function from one
file to another i.e. it's the same code, it is just run in a different
context where it is safe to hold a specific lock. Otherwise the
individual changes are relatively small and straigtht forward.
Summary:
- Propagate unlinked inode list corruption back up to log recovery
(regression fix)
- improve corruption detection for AGFL entries, AGFL indexes and
XEFI extents (syzkaller fuzzer oops report)
- Avoid double perag reference release (regression fix)
- Improve extent merging detection in scrub (regression fix)
- Fix a new undefined high bit shift (regression fix)
- Fix for AGF vs inode cluster buffer deadlock (regression fix)"
* tag 'xfs-6.4-rc5-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: collect errors from inodegc for unlinked inode recovery
xfs: validate block number being freed before adding to xefi
xfs: validity check agbnos on the AGFL
xfs: fix agf/agfl verification on v4 filesystems
xfs: fix double xfs_perag_rele() in xfs_filestream_pick_ag()
xfs: fix broken logic when detecting mergeable bmap records
xfs: Fix undefined behavior of shift into sign bit
xfs: fix AGF vs inode cluster buffer deadlock
xfs: defered work could create precommits
xfs: restore allocation trylock iteration
xfs: buffer pins need to hold a buffer reference
If pf is NULL in ice_gnss_read() then it will be dereferenced
in the error path by a call to dev_dbg(ice_pf_to_dev(pf), ...).
Avoid this by simply returning in this case.
If logging is desired an alternate approach might be to
use pr_err() before returning.
Flagged by Smatch as:
.../ice_gnss.c:196 ice_gnss_read() error: we previously assumed 'pf' could be null (see line 131)
Fixes: 43113ff734 ("ice: add TTY for GNSS module for E810T device")
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
ext4_xattr_block_set() relies on its caller to call dquot_initialize()
on the inode. To assure that this has happened there are WARN_ON
checks. Unfortunately, this is subject to false positives if there is
an antagonist thread which is flipping the file system at high rates
between r/o and rw. So only do the check if EXT4_XATTR_DEBUG is
enabled.
Link: https://lore.kernel.org/r/20230608044056.GA1418535@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[BUG]
After the recent scrub rework introduced in commit e02ee89baa ("btrfs:
scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure"),
btrfs scrub no longer reports repaired errors any more:
# mkfs.btrfs -f $dev -d DUP
# mount $dev $mnt
# xfs_io -f -d -c "pwrite -b 64K -S 0xaa 0 64" $mnt/file
# umount $dev
# xfs_io -f -c "pwrite -S 0xff $phy1 64K" $dev # Corrupt the first mirror
# mount $dev $mnt
# btrfs scrub start -BR $mnt
scrub done for 725e7cb7-8a4a-4c77-9f2a-86943619e218
Scrub started: Tue Jun 6 14:56:50 2023
Status: finished
Duration: 0:00:00
data_extents_scrubbed: 2
tree_extents_scrubbed: 18
data_bytes_scrubbed: 131072
tree_bytes_scrubbed: 294912
read_errors: 0
csum_errors: 0 <<< No errors here
verify_errors: 0
[...]
uncorrectable_errors: 0
unverified_errors: 0
corrected_errors: 16 <<< Only corrected errors
last_physical: 2723151872
This can confuse btrfs-progs, as it relies on the csum_errors to
determine if there is anything wrong.
While on v6.3.x kernels, the report is different:
csum_errors: 16 <<<
verify_errors: 0
[...]
uncorrectable_errors: 0
unverified_errors: 0
corrected_errors: 16 <<<
[CAUSE]
In the reworked scrub, we update the scrub progress inside
scrub_stripe_report_errors(), using various bitmaps to update the
result.
For example for csum_errors, we use bitmap_weight() of
stripe->csum_error_bitmap.
Unfortunately at that stage, all error bitmaps (except
init_error_bitmap) are the result of the latest repair attempt, thus if
the stripe is fully repaired, those error bitmaps will all be empty,
resulting the above output mismatch.
To fix this, record the number of errors into stripe->init_nr_*_errors.
Since we don't really care about where those errors are, we only need to
record the number of errors.
Then in scrub_stripe_report_errors(), use those initial numbers to
update the progress other than using the latest error bitmaps.
Fixes: e02ee89baa ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
With recent scrub rework, the scrub operation no longer respects the
read-only flag passed by "-r" option of "btrfs scrub start" command.
# mkfs.btrfs -f -d raid1 $dev1 $dev2
# mount $dev1 $mnt
# xfs_io -f -d -c "pwrite -b 128K -S 0xaa 0 128k" $mnt/file
# sync
# xfs_io -c "pwrite -S 0xff $phy1 64k" $dev1
# xfs_io -c "pwrite -S 0xff $((phy2 + 65536)) 64k" $dev2
# mount $dev1 $mnt -o ro
# btrfs scrub start -BrRd $mnt
Scrub device $dev1 (id 1) done
Scrub started: Tue Jun 6 09:59:14 2023
Status: finished
Duration: 0:00:00
[...]
corrected_errors: 16 <<< Still has corrupted sectors
last_physical: 1372585984
Scrub device $dev2 (id 2) done
Scrub started: Tue Jun 6 09:59:14 2023
Status: finished
Duration: 0:00:00
[...]
corrected_errors: 16 <<< Still has corrupted sectors
last_physical: 1351614464
# btrfs scrub start -BrRd $mnt
Scrub device $dev1 (id 1) done
Scrub started: Tue Jun 6 10:00:17 2023
Status: finished
Duration: 0:00:00
[...]
corrected_errors: 0 <<< No more errors
last_physical: 1372585984
Scrub device $dev2 (id 2) done
[...]
corrected_errors: 0 <<< No more errors
last_physical: 1372585984
[CAUSE]
In the newly reworked scrub code, repair is always submitted no matter
if we're doing a read-only scrub.
[FIX]
Fix it by skipping the write submission if the scrub is a read-only one.
Unfortunately for the report part, even for a read-only scrub we will
still report it as corrected errors, as we know it's repairable, even we
won't really submit the write.
Fixes: e02ee89baa ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Johan writes:
USB-serial device ids for 6.4-rc6
Here are some new modem device ids.
Everything has been in linux-next with no reported issues.
* tag 'usb-serial-6.4-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add Quectel EM061KGL series
Michael Chan says:
====================
bnxt_en: Bug fixes
This patchset has the following fixes for bnxt_en:
1. Add missing VNIC ID parameter in the FW message when getting an
updated RSS configuration from the FW.
2. Fix a warning when doing ethtool reset on newer chips.
3. Fix VLAN issue on a VF when a default VLAN is assigned.
4. Fix a problem during DPC (Downstream Port containment) scenario.
5. Fix a NULL pointer dereference when receiving a PTP event from FW.
6. Fix VXLAN/Geneve UDP port delete/add with newer FW.
====================
Link: https://lore.kernel.org/r/20230607075409.228450-1-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
As per the new udp tunnel framework, drivers which need to know the
details of a port entry (i.e. port type) when it gets deleted should
use the .set_port / .unset_port callbacks.
Implementing the current .udp_tunnel_sync callback would mean that the
deleted tunnel port entry would be all zeros. This used to work on
older firmware because it would not check the input when deleting a
tunnel port. With newer firmware, the delete will now fail and
subsequent tunnel port allocation will fail as a result.
Fixes: 442a35a5a7 ("bnxt: convert to new udp_tunnel_nic infra")
Reviewed-by: Kalesh Anakkur Purayil <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The firmware can send PHC_RTC_UPDATE async event on a PF that may not
have PTP registered. In such a case, there will be a null pointer
deference for bp->ptp_cfg when we try to handle the event.
Fix it by not registering for this event with the firmware if !bp->ptp_cfg.
Also, check that bp->ptp_cfg is valid before proceeding when we receive
the event.
Fixes: 8bcf6f04d4 ("bnxt_en: Handle async event when the PHC is updated in RTC mode")
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Driver starts firmware fatal error recovery by detecting
heartbeat failure or fw reset count register changing. But
these checks are not reliable if the device is not accessible.
This can happen while DPC (Downstream Port containment) is in
progress. Skip firmware fatal recovery if pci_device_is_present()
returns false.
Fixes: acfb50e4e7 ("bnxt_en: Add FW fatal devlink_health_reporter.")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We need to call bnxt_hwrm_func_qcfg() on a VF to query the default
VLAN that may be setup by the PF. If a default VLAN is enabled,
the VF cannot support VLAN acceleration on the receive side and
the VNIC must be setup to strip out the default VLAN tag. If a
default VLAN is not enabled, the VF can support VLAN acceleration
on the receive side. The VNIC should be set up to strip or not
strip the VLAN based on the RX VLAN acceleration setting.
Without this call to determine the default VLAN before calling
bnxt_setup_vnic(), the VNIC may not be set up correctly. For
example, bnxt_setup_vnic() may set up to strip the VLAN tag based
on stale default VLAN information. If RX VLAN acceleration is
not enabled, the VLAN tag will be incorrectly stripped and the
RX data path will not work correctly.
Fixes: cf6645f8eb ("bnxt_en: Add function for VF driver to query default VLAN.")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Only older NIC controller's firmware uses the PROC AP reset type.
Firmware on 5731X/5741X and newer chips does not support this reset
type. When bnxt_reset() issues a series of resets, this PROC AP
reset may actually fail on these newer chips because the firmware
is not ready to accept this unsupported command yet. Avoid this
unnecessary error by skipping this reset type on chips that don't
support it.
Fixes: 7a13240e37 ("bnxt_en: fix ethtool_reset_flags ABI violations")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The following scenario describes a bug in the verifier where it
incorrectly concludes about equivalent scalar IDs which could lead to
verifier bypass in privileged mode:
1. Prepare a 32-bit rogue number.
2. Put the rogue number into the upper half of a 64-bit register, and
roll a random (unknown to the verifier) bit in the lower half. The
rest of the bits should be zero (although variations are possible).
3. Assign an ID to the register by MOVing it to another arbitrary
register.
4. Perform a 32-bit spill of the register, then perform a 32-bit fill to
another register. Due to a bug in the verifier, the ID will be
preserved, although the new register will contain only the lower 32
bits, i.e. all zeros except one random bit.
At this point there are two registers with different values but the same
ID, which means the integrity of the verifier state has been corrupted.
5. Compare the new 32-bit register with 0. In the branch where it's
equal to 0, the verifier will believe that the original 64-bit
register is also 0, because it has the same ID, but its actual value
still contains the rogue number in the upper half.
Some optimizations of the verifier prevent the actual bypass, so
extra care is needed: the comparison must be between two registers,
and both branches must be reachable (this is why one random bit is
needed). Both branches are still suitable for the bypass.
6. Right shift the original register by 32 bits to pop the rogue number.
7. Use the rogue number as an offset with any pointer. The verifier will
believe that the offset is 0, while in reality it's the given number.
The fix is similar to the 32-bit BPF_MOV handling in check_alu_op for
SCALAR_VALUE. If the spill is narrowing the actual register value, don't
keep the ID, make sure it's reset to 0.
Fixes: 354e8f1970 ("bpf: Support <8-byte scalar spill and refill")
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Andrii Nakryiko <andrii@kernel.org> # Checked veristat delta
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230607123951.558971-2-maxtram95@gmail.com
Selecting only REGMAP_I2C can leave REGMAP unset, causing build errors,
so also select REGMAP to prevent the build errors.
../drivers/misc/eeprom/at24.c:540:42: warning: 'struct regmap_config' declared inside parameter list will not be visible outside of this definition or declaration
540 | struct regmap_config *regmap_config)
../drivers/misc/eeprom/at24.c: In function 'at24_make_dummy_client':
../drivers/misc/eeprom/at24.c:552:18: error: implicit declaration of function 'devm_regmap_init_i2c' [-Werror=implicit-function-declaration]
552 | regmap = devm_regmap_init_i2c(dummy_client, regmap_config);
../drivers/misc/eeprom/at24.c:552:16: warning: assignment to 'struct regmap *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
552 | regmap = devm_regmap_init_i2c(dummy_client, regmap_config);
../drivers/misc/eeprom/at24.c: In function 'at24_probe':
../drivers/misc/eeprom/at24.c:586:16: error: variable 'regmap_config' has initializer but incomplete type
586 | struct regmap_config regmap_config = { };
../drivers/misc/eeprom/at24.c:586:30: error: storage size of 'regmap_config' isn't known
586 | struct regmap_config regmap_config = { };
../drivers/misc/eeprom/at24.c:586:30: warning: unused variable 'regmap_config' [-Wunused-variable]
Fixes: 5c01525847 ("eeprom: at24: add basic regmap_i2c support")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
There is a race between capsnaps flush and removing the inode from
'mdsc->snap_flush_list' list:
== Thread A == == Thread B ==
ceph_queue_cap_snap()
-> allocate 'capsnapA'
->ihold('&ci->vfs_inode')
->add 'capsnapA' to 'ci->i_cap_snaps'
->add 'ci' to 'mdsc->snap_flush_list'
...
== Thread C ==
ceph_flush_snaps()
->__ceph_flush_snaps()
->__send_flush_snap()
handle_cap_flushsnap_ack()
->iput('&ci->vfs_inode')
this also will release 'ci'
...
== Thread D ==
ceph_handle_snap()
->flush_snaps()
->iterate 'mdsc->snap_flush_list'
->get the stale 'ci'
->remove 'ci' from ->ihold(&ci->vfs_inode) this
'mdsc->snap_flush_list' will WARNING
To fix this we will increase the inode's i_count ref when adding 'ci'
to the 'mdsc->snap_flush_list' list.
[ idryomov: need_put int -> bool ]
Cc: stable@vger.kernel.org
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2209299
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Simon Wunderlich says:
====================
Here is a batman-adv bugfix:
- fix a broken sync while rescheduling delayed work,
by Vladislav Efanov
* tag 'batadv-net-pullrequest-20230607' of git://git.open-mesh.org/linux-merge:
batman-adv: Broken sync while rescheduling delayed work
====================
Link: https://lore.kernel.org/r/20230607155515.548120-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We had a number of short comings:
- EEE must be re-evaluated whenever the state machine detects a link
change as wight be switching from a link partner with EEE
enabled/disabled
- tx_lpi_enabled controls whether EEE should be enabled/disabled for the
transmit path, which applies to the TBUF block
- We do not need to forcibly enable EEE upon system resume, as the PHY
state machine will trigger a link event that will do that, too
Fixes: 6ef398ea60 ("net: bcmgenet: add EEE support")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230606214348.2408018-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Flip the netif_carrier_ok() condition in queue wake logic.
When I moved it to inside __netif_txq_completed_wake()
I missed negating it.
This made the condition ineffective and could probably
lead to crashes.
Fixes: 301f227fc8 ("net: piggy back on the memory barrier in bql when waking queues")
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230607010826.960226-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The down condition should be the negation of the wake condition,
IOW when I moved it from:
if (cond && wake())
to
if (__netif_txq_completed_wake(cond))
Cond should have been negated. Flip it now.
This bug leads to occasional crashes with netconsole.
It may also lead to queue never waking up in case BQL is not enabled.
Reported-by: David Wei <davidhwei@meta.com>
Fixes: 08a096780d ("bnxt: use new queue try_stop/try_wake macros")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230607010826.960226-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf 2023-06-07
We've added 7 non-merge commits during the last 7 day(s) which contain
a total of 12 files changed, 112 insertions(+), 7 deletions(-).
The main changes are:
1) Fix a use-after-free in BPF's task local storage, from KP Singh.
2) Make struct path handling more robust in bpf_d_path, from Jiri Olsa.
3) Fix a syzbot NULL-pointer dereference in sockmap, from Eric Dumazet.
4) UAPI fix for BPF_NETFILTER before final kernel ships,
from Florian Westphal.
5) Fix map-in-map array_map_gen_lookup code generation where elem_size was
not being set for inner maps, from Rhys Rustad-Elliott.
6) Fix sockopt_sk selftest's NETLINK_LIST_MEMBERSHIPS assertion,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Add extra path pointer check to d_path helper
selftests/bpf: Fix sockopt_sk selftest
bpf: netfilter: Add BPF_NETFILTER bpf_attach_type
selftests/bpf: Add access_inner_map selftest
bpf: Fix elem_size not being set for inner maps
bpf: Fix UAF in task local storage
bpf, sockmap: Avoid potential NULL dereference in sk_psock_verdict_data_ready()
====================
Link: https://lore.kernel.org/r/20230607220514.29698-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If caller reports ENOMEM, then stop iterating over the batch and send a
single netlink message to userspace to report OOM.
Fixes: cbb8125eb4 ("netfilter: nfnetlink: deliver netlink errors on batch completion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The pipapo set backend follows copy-on-update approach, maintaining one
clone of the existing datastructure that is being updated. The clone
and current datastructures are swapped via rcu from the commit step.
The existing integration with the commit protocol is flawed because
there is no operation to clean up the clone if the transaction is
aborted. Moreover, the datastructure swap happens on set element
activation.
This patch adds two new operations for sets: commit and abort, these new
operations are invoked from the commit and abort steps, after the
transactions have been digested, and it updates the pipapo set backend
to use it.
This patch adds a new ->pending_update field to sets to maintain a list
of sets that require this new commit and abort operations.
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
According to Alex, most APUs from that time seem to have the same issue
(vbios says 48Mhz, actual is 100Mhz). I only have a CHIP_STONEY so I
limit the fixup to CHIP_STONEY
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Userspace can race to free the gobj(robj converted from), robj should not
be accessed again after drm_gem_object_put, otherwith it will result in
use-after-free.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Min Li <lm0963hack@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
When going from ODM 2:1 single display case to max displays, second
odm pipe needs to be repurposed for one of the new single displays.
However, acquire_first_split_pipe() only handles MPC case and not
ODM case
[How]
Add ODM conditions in acquire_first_split_pipe()
Add commit_minimal_transition_state() in commit_streams() to handle
odm 2:1 exit first, and then process new streams
Handle ODM condition in commit_minimal_transition_state()
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Acked-by: Stylon Wang <stylon.wang@amd.com>
Signed-off-by: Samson Tam <samson.tam@amd.com>
Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use the function of amdgpu_bo_vm_destroy to handle the resource release
of shadow bo. During the amdgpu_mes_self_test, shadow bo released, but
vmbo->shadow_list was not, which caused a null pointer reference error
in amdgpu_device_recover_vram when GPU reset.
Fixes: 6c032c37ac ("drm/amdgpu: Fix vram recover doesn't work after whole GPU reset (v2)")
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Acked-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
commit cf488dcd0a ("drm/amd: Allow s0ix without BIOS support") showed
improvements to power consumption over suspend when s0ix wasn't enabled in
BIOS and the system didn't support S3.
This patch however was misguided because the reason the system didn't
support S3 was because SMT was disabled in OEM BIOS setup.
This prevented the BIOS from allowing S3.
Also allowing GPUs to use the s2idle path actually causes problems if
they're invoked on systems that may not support s2idle in the platform
firmware. `systemd` has a tendency to try to use `s2idle` if `deep` fails
for any reason, which could lead to unexpected flows.
The original commit also fixed a problem during resume from suspend to idle
without hardware support, but this is no longer necessary with commit
ca47518663 ("drm/amd: Don't allow s0ix on APUs older than Raven")
Revert commit cf488dcd0a ("drm/amd: Allow s0ix without BIOS support")
to make it match the expected behavior again.
Cc: Rafael Ávila de Espíndola <rafael@espindo.la>
Link: https://github.com/torvalds/linux/blob/v6.1/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c#L1060
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2599
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Pull input fixes from Dmitry Torokhov:
- a fix for unbalanced open count for inhibited input devices
- fixups in Elantech PS/2 and Cyppress TTSP v5 drivers
- a quirk to soc_button_array driver to make it work with Lenovo
Yoga Book X90F / X90L
- a removal of erroneous entry from xpad driver
* tag 'input-for-v6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: xpad - delete a Razer DeathAdder mouse VID/PID entry
Input: psmouse - fix OOB access in Elantech protocol
Input: soc_button_array - add invalid acpi_index DMI quirk handling
Input: fix open count when closing inhibited device
Input: cyttsp5 - fix array length
In commit 95433f7263 ("srcu: Begin offloading srcu_struct fields to
srcu_update"), a new struct srcu_usage field was added, but was not
properly initialized. This led to a "spinlock bad magic" BUG when the
SRCU notifier was ever used. This was observed in the MediaTek CCI
devfreq driver on next-20230525. The trimmed stack trace is as follows:
BUG: spinlock bad magic on CPU#4, swapper/0/1
lock: 0xffffff80ff529ac0, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
Call trace:
spin_bug+0xa4/0xe8
do_raw_spin_lock+0xec/0x120
_raw_spin_lock_irqsave+0x78/0xb8
synchronize_srcu+0x3c/0x168
srcu_notifier_chain_unregister+0x5c/0xa0
cpufreq_unregister_notifier+0x94/0xe0
devfreq_passive_event_handler+0x7c/0x3e0
devfreq_remove_device+0x48/0xe8
Add __SRCU_USAGE_INIT() to SRCU_NOTIFIER_INIT() so that srcu_usage gets
initialized properly.
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Fixes: 95433f7263 ("srcu: Begin offloading srcu_struct fields to srcu_update")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Cc: Sachin Sant <sachinp@linux.ibm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This is overdue and an oversight.
Add myself to this file deespite the fact that I'm trying to reduce the
number of entries in this file which have my name attached, but in the
hope that patches wont get picked up elsewhere completely unreviewed and
unnoticed.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kafs incorrectly passes a zero mtime (ie. 1st Jan 1970) to the server when
creating a file, dir or symlink because the mtime recorded in the
afs_operation struct gets passed to the server by the marshalling routines,
but the afs_mkdir(), afs_create() and afs_symlink() functions don't set it.
This gets masked if a file or directory is subsequently modified.
Fix this by filling in op->mtime before calling the create op.
Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sebastian reports that commit 1c913a1c35 ("KVM: arm64: Iterate
arm_pmus list to probe for default PMU") introduced the following splat
with CONFIG_DEBUG_PREEMPT enabled:
[70506.110187] BUG: using smp_processor_id() in preemptible [00000000] code: qemu-system-aar/3078242
[70506.119077] caller is debug_smp_processor_id+0x20/0x30
[70506.124229] CPU: 129 PID: 3078242 Comm: qemu-system-aar Tainted: G W 6.4.0-rc5 #25
[70506.133176] Hardware name: GIGABYTE R181-T92-00/MT91-FS4-00, BIOS F34 08/13/2020
[70506.140559] Call trace:
[70506.142993] dump_backtrace+0xa4/0x130
[70506.146737] show_stack+0x20/0x38
[70506.150040] dump_stack_lvl+0x48/0x60
[70506.153704] dump_stack+0x18/0x28
[70506.157007] check_preemption_disabled+0xe4/0x108
[70506.161701] debug_smp_processor_id+0x20/0x30
[70506.166046] kvm_arm_pmu_v3_set_attr+0x460/0x628
[70506.170662] kvm_arm_vcpu_arch_set_attr+0x88/0xd8
[70506.175363] kvm_arch_vcpu_ioctl+0x258/0x4a8
[70506.179632] kvm_vcpu_ioctl+0x32c/0x6b8
[70506.183465] __arm64_sys_ioctl+0xb4/0x100
[70506.187467] invoke_syscall+0x78/0x108
[70506.191205] el0_svc_common.constprop.0+0x4c/0x100
[70506.195984] do_el0_svc+0x34/0x50
[70506.199287] el0_svc+0x34/0x108
[70506.202416] el0t_64_sync_handler+0xf4/0x120
[70506.206674] el0t_64_sync+0x194/0x198
Fix the issue by using the raw variant that bypasses the debug
assertion. While at it, stick all of the nuance and UAPI baggage into a
comment for posterity.
Fixes: 1c913a1c35 ("KVM: arm64: Iterate arm_pmus list to probe for default PMU")
Reported-by: Sebastian Ott <sebott@redhat.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230606184814.456743-1-oliver.upton@linux.dev
When reworking the vgic locking, the vgic distributor registration
got simplified, which was a very good cleanup. But just a tad too
radical, as we now register the *native* vgic only, ignoring the
GICv2-on-GICv3 that allows pre-historic VMs (or so I thought)
to run.
As it turns out, QEMU still defaults to GICv2 in some cases, and
this breaks Nathan's setup!
Fix it by propagating the *requested* vgic type rather than the
host's version.
Fixes: 59112e9c39 ("KVM: arm64: vgic: Fix a circular locking issue")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
link: https://lore.kernel.org/r/20230606221525.GA2269598@dev-arch.thelio-3990X
Commit 8aeb7b17f0 ("RISC-V: Make mmap() with PROT_WRITE imply PROT_READ")
allows riscv to use mmap with PROT_WRITE only, and meanwhile mmap with w+x
is also permitted. However, when userspace tries to access this page with
PROT_WRITE|PROT_EXEC, which causes infinite loop at load page fault as
well as it triggers soft lockup. According to riscv privileged spec,
"Writable pages must also be marked readable". The fix to drop the
`PAGE_COPY_READ_EXEC` and then `PAGE_COPY_EXEC` would be just used instead.
This aligns the other arches (i.e arm64) for protection_map.
Fixes: 8aeb7b17f0 ("RISC-V: Make mmap() with PROT_WRITE imply PROT_READ")
Signed-off-by: Hsieh-Tseng Shen <woodrow.shen@sifive.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230425102828.1616812-1-woodrow.shen@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Anastasios reported crash on stable 5.15 kernel with following
BPF attached to lsm hook:
SEC("lsm.s/bprm_creds_for_exec")
int BPF_PROG(bprm_creds_for_exec, struct linux_binprm *bprm)
{
struct path *path = &bprm->executable->f_path;
char p[128] = { 0 };
bpf_d_path(path, p, 128);
return 0;
}
But bprm->executable can be NULL, so bpf_d_path call will crash:
BUG: kernel NULL pointer dereference, address: 0000000000000018
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
...
RIP: 0010:d_path+0x22/0x280
...
Call Trace:
<TASK>
bpf_d_path+0x21/0x60
bpf_prog_db9cf176e84498d9_bprm_creds_for_exec+0x94/0x99
bpf_trampoline_6442506293_0+0x55/0x1000
bpf_lsm_bprm_creds_for_exec+0x5/0x10
security_bprm_creds_for_exec+0x29/0x40
bprm_execve+0x1c1/0x900
do_execveat_common.isra.0+0x1af/0x260
__x64_sys_execve+0x32/0x40
It's problem for all stable trees with bpf_d_path helper, which was
added in 5.9.
This issue is fixed in current bpf code, where we identify and mark
trusted pointers, so the above code would fail even to load.
For the sake of the stable trees and to workaround potentially broken
verifier in the future, adding the code that reads the path object from
the passed pointer and verifies it's valid in kernel space.
Fixes: 6e22ab9da7 ("bpf: Add d_path helper")
Reported-by: Anastasios Papagiannis <tasos.papagiannnis@gmail.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230606181714.532998-1-jolsa@kernel.org
The user-space policy of the gpio-sim is that configuration for lines
with offsets outside the bounds of the corresponding bank is ignored,
but gpio-sim is still using that configuration when constructing the
sim. In the case of named lines this results in temporarily allocating
space for names that are not used, and for hogs results in errors being
logged when the gpio-sim attempts to register the out of range hog with
gpiolib:
gpiochip_machine_hog: unable to get GPIO desc: -22
Add checks to filter out any line configuration outside the bounds
of the bank when constructing the sim.
Signed-off-by: Kent Gibson <warthog618@gmail.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
try_module_get will be called in tcf_proto_lookup_ops. So module_put needs
to be called to drop the refcount if ops don't implement the required
function.
Fixes: 9f407f1768 ("net: sched: introduce chain templates")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes following sparse errors:
net/sched/act_police.c:360:28: warning: dereference of noderef expression
net/sched/act_police.c:362:45: warning: dereference of noderef expression
net/sched/act_police.c:362:45: warning: dereference of noderef expression
net/sched/act_police.c:368:28: warning: dereference of noderef expression
net/sched/act_police.c:370:45: warning: dereference of noderef expression
net/sched/act_police.c:370:45: warning: dereference of noderef expression
net/sched/act_police.c:376:45: warning: dereference of noderef expression
net/sched/act_police.c:376:45: warning: dereference of noderef expression
Fixes: d1967e495a ("net_sched: act_police: add 2 new attributes to support police 64bit rate and peakrate")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the per cpu upcall counters are allocated after the vport is
created and inserted into the system. This could lead to the datapath
accessing the counters before they are allocated resulting in a kernel
Oops.
Here is an example:
PID: 59693 TASK: ffff0005f4f51500 CPU: 0 COMMAND: "ovs-vswitchd"
#0 [ffff80000a39b5b0] __switch_to at ffffb70f0629f2f4
#1 [ffff80000a39b5d0] __schedule at ffffb70f0629f5cc
#2 [ffff80000a39b650] preempt_schedule_common at ffffb70f0629fa60
#3 [ffff80000a39b670] dynamic_might_resched at ffffb70f0629fb58
#4 [ffff80000a39b680] mutex_lock_killable at ffffb70f062a1388
#5 [ffff80000a39b6a0] pcpu_alloc at ffffb70f0594460c
#6 [ffff80000a39b750] __alloc_percpu_gfp at ffffb70f05944e68
#7 [ffff80000a39b760] ovs_vport_cmd_new at ffffb70ee6961b90 [openvswitch]
...
PID: 58682 TASK: ffff0005b2f0bf00 CPU: 0 COMMAND: "kworker/0:3"
#0 [ffff80000a5d2f40] machine_kexec at ffffb70f056a0758
#1 [ffff80000a5d2f70] __crash_kexec at ffffb70f057e2994
#2 [ffff80000a5d3100] crash_kexec at ffffb70f057e2ad8
#3 [ffff80000a5d3120] die at ffffb70f0628234c
#4 [ffff80000a5d31e0] die_kernel_fault at ffffb70f062828a8
#5 [ffff80000a5d3210] __do_kernel_fault at ffffb70f056a31f4
#6 [ffff80000a5d3240] do_bad_area at ffffb70f056a32a4
#7 [ffff80000a5d3260] do_translation_fault at ffffb70f062a9710
#8 [ffff80000a5d3270] do_mem_abort at ffffb70f056a2f74
#9 [ffff80000a5d32a0] el1_abort at ffffb70f06297dac
#10 [ffff80000a5d32d0] el1h_64_sync_handler at ffffb70f06299b24
#11 [ffff80000a5d3410] el1h_64_sync at ffffb70f056812dc
#12 [ffff80000a5d3430] ovs_dp_upcall at ffffb70ee6963c84 [openvswitch]
#13 [ffff80000a5d3470] ovs_dp_process_packet at ffffb70ee6963fdc [openvswitch]
#14 [ffff80000a5d34f0] ovs_vport_receive at ffffb70ee6972c78 [openvswitch]
#15 [ffff80000a5d36f0] netdev_port_receive at ffffb70ee6973948 [openvswitch]
#16 [ffff80000a5d3720] netdev_frame_hook at ffffb70ee6973a28 [openvswitch]
#17 [ffff80000a5d3730] __netif_receive_skb_core.constprop.0 at ffffb70f06079f90
We moved the per cpu upcall counter allocation to the existing vport
alloc and free functions to solve this.
Fixes: 95637d91fe ("net: openvswitch: release vport resources on failure")
Fixes: 1933ea365a ("net: openvswitch: Add support to count upcall packets")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtm_tca_policy is used from net/sched/sch_api.c and net/sched/cls_api.c,
thus should be declared in an include file.
This fixes the following sparse warning:
net/sched/sch_api.c:1434:25: warning: symbol 'rtm_tca_policy' was not declared. Should it be static?
Fixes: e331473fee ("net/sched: cls_api: add missing validation of netlink attributes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If pm runtime resume fails the .remove callback used to exit early. This
resulted in an error message by the driver core but the device gets
removed anyhow. This lets the registered i2c adapter stay around with an
unbound parent device.
So only skip clk disabling if resume failed, but do delete the adapter.
Fixes: 8b9ec07198 ("i2c: Add Spreadtrum I2C controller driver")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
The current ice driver's GNSS write implementation buffers writes and
works through them asynchronously in a kthread. That's bad because:
- The GNSS write_raw operation is supposed to be synchronous[1][2].
- There is no upper bound on the number of pending writes.
Userspace can submit writes much faster than the driver can process,
consuming unlimited amounts of kernel memory.
A patch that's currently on review[3] ("[v3,net] ice: Write all GNSS
buffers instead of first one") would add one more problem:
- The possibility of waiting for a very long time to flush the write
work when doing rmmod, softlockups.
To fix these issues, simplify the implementation: Drop the buffering,
the write_work, and make the writes synchronous.
I tested this with gpsd and ubxtool.
[1] https://events19.linuxfoundation.org/wp-content/uploads/2017/12/The-GNSS-Subsystem-Johan-Hovold-Hovold-Consulting-AB.pdf
"User interface" slide.
[2] A comment in drivers/gnss/core.c:gnss_write():
/* Ignoring O_NONBLOCK, write_raw() is synchronous. */
[3] https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20230217120541.16745-1-karol.kolacinski@intel.com/
Fixes: d6b98c8d24 ("ice: add write functionality for GNSS TTY")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
rfs: annotate lockless accesses
rfs runs without locks held, so we should annotate
read and writes to shared variables.
It should prevent compilers forcing writes
in the following situation:
if (var != val)
var = val;
A compiler could indeed simply avoid the conditional:
var = val;
This matters if var is shared between many cpus.
v2: aligns one closing bracket (Simon)
adds Fixes: tags (Jakub)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add READ_ONCE()/WRITE_ONCE() on accesses to the sock flow table.
This also prevents a (smart ?) compiler to remove the condition in:
if (table->ents[index] != newval)
table->ents[index] = newval;
We need the condition to avoid dirtying a shared cache line.
Fixes: fec5e652e5 ("rfs: Receive Flow Steering")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add READ_ONCE()/WRITE_ONCE() on accesses to sk->sk_rxhash.
This also prevents a (smart ?) compiler to remove the condition in:
if (sk->sk_rxhash != newval)
sk->sk_rxhash = newval;
We need the condition to avoid dirtying a shared cache line.
Fixes: fec5e652e5 ("rfs: Receive Flow Steering")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Offloaded policies are deleted through two flows: netdev is going
down and policy flush.
In both cases, the code lacks relevant call to delete offloaded policy.
Fixes: 919e43fad5 ("xfrm: add an interface to offload policy")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fixes to debugfs registration
- Fix use-after-free in hci_remove_ltk/hci_remove_irk
- Fixes to ISO channel support
- Fix missing checks for invalid L2CAP DCID
- Fix l2cap_disconnect_req deadlock
- Add lock to protect HCI_UNREGISTER
* tag 'for-net-2023-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: L2CAP: Add missing checks for invalid DCID
Bluetooth: ISO: use correct CIS order in Set CIG Parameters event
Bluetooth: ISO: don't try to remove CIG if there are bound CIS left
Bluetooth: Fix l2cap_disconnect_req deadlock
Bluetooth: hci_qca: fix debugfs registration
Bluetooth: fix debugfs registration
Bluetooth: hci_sync: add lock to protect HCI_UNREGISTER
Bluetooth: Fix use-after-free in hci_remove_ltk/hci_remove_irk
Bluetooth: ISO: Fix CIG auto-allocation to select configurable CIG
Bluetooth: ISO: consider right CIS when removing CIG at cleanup
====================
Link: https://lore.kernel.org/r/20230606003454.2392552-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Missing nul-check in basechain hook netlink dump path, from Gavrilov Ilia.
2) Fix bitwise register tracking, from Jeremy Sowden.
3) Null pointer dereference when accessing conntrack helper,
from Tijs Van Buggenhout.
4) Add schedule point to ipset's call_ad, from Kuniyuki Iwashima.
5) Incorrect boundary check when building chain blob.
* tag 'nf-23-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: out-of-bound check in chain blob
netfilter: ipset: Add schedule point in call_ad().
netfilter: conntrack: fix NULL pointer dereference in nf_confirm_cthelper
netfilter: nft_bitwise: fix register tracking
netfilter: nf_tables: Add null check for nla_nest_start_noflag() in nft_dump_basechain_hook()
====================
Link: https://lore.kernel.org/r/20230606225851.67394-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kalle Valo says:
====================
wireless fixes for v6.4
Both rtw88 and rtw89 have a 802.11 powersave fix for a regression
introduced in v6.0. mt76 fixes a race and a null pointer dereference.
iwlwifi fixes an issue where not enough memory was allocated for a
firmware event. And finally the stack has several smaller fixes all
over.
* tag 'wireless-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: cfg80211: fix locking in regulatory disconnect
wifi: cfg80211: fix locking in sched scan stop work
wifi: iwlwifi: mvm: Fix -Warray-bounds bug in iwl_mvm_wait_d3_notif()
wifi: mac80211: fix switch count in EMA beacons
wifi: mac80211: don't translate beacon/presp addrs
wifi: mac80211: mlme: fix non-inheritence element
wifi: cfg80211: reject bad AP MLD address
wifi: mac80211: use correct iftype HE cap
wifi: mt76: mt7996: fix possible NULL pointer dereference in mt7996_mac_write_txwi()
wifi: rtw89: remove redundant check of entering LPS
wifi: rtw89: correct PS calculation for SUPPORTS_DYNAMIC_PS
wifi: rtw88: correct PS calculation for SUPPORTS_DYNAMIC_PS
wifi: mt76: mt7615: fix possible race in mt7615_mac_sta_poll
====================
Link: https://lore.kernel.org/r/20230606150817.EC133C433D2@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 523847df1b ("pds_core: add devcmd device interfaces") included
initial support for FW recovery detection. Unfortunately, the ordering
in pdsc_is_fw_good() was incorrect, which was causing FW recovery to be
undetected by the driver. Fix this by making sure to update the cached
fw_status by calling pdsc_is_fw_running() before setting the local FW
gen.
Fixes: 523847df1b ("pds_core: add devcmd device interfaces")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230605195116.49653-1-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We missed that tcp_gso_segment() was assuming skb->len was smaller than 65535 :
oldlen = (u16)~skb->len;
This part came with commit 0718bcc09b ("[NET]: Fix CHECKSUM_HW GSO problems.")
This leads to wrong TCP checksum.
Adapt the code to accept arbitrary packet length.
v2:
- use two csum_add() instead of csum_fold() (Alexander Duyck)
- Change delta type to __wsum to reduce casts (Alexander Duyck)
Fixes: 09f3d1a3a5 ("ipv6/gso: remove temporary HBH/jumbo header")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230605161647.3624428-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If it is async, runqueue_node is freed in g2d_runqueue_worker on another
worker thread. So in extreme cases, if g2d_runqueue_worker runs first, and
then executes the following if statement, there will be use-after-free.
Signed-off-by: Min Li <lm0963hack@gmail.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Fix a wrong error return by dropping an error return.
When vidi driver is remvoed, if ctx->raw_edid isn't same as fake_edid_info
then only what we have to is to free ctx->raw_edid so that driver removing
can work correctly - it's not an error case.
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
A remote DoS vulnerability of RPL Source Routing is assigned CVE-2023-2156.
The Source Routing Header (SRH) has the following format:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Next Header | Hdr Ext Len | Routing Type | Segments Left |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CmprI | CmprE | Pad | Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
. .
. Addresses[1..n] .
. .
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The originator of an SRH places the first hop's IPv6 address in the IPv6
header's IPv6 Destination Address and the second hop's IPv6 address as
the first address in Addresses[1..n].
The CmprI and CmprE fields indicate the number of prefix octets that are
shared with the IPv6 Destination Address. When CmprI or CmprE is not 0,
Addresses[1..n] are compressed as follows:
1..n-1 : (16 - CmprI) bytes
n : (16 - CmprE) bytes
Segments Left indicates the number of route segments remaining. When the
value is not zero, the SRH is forwarded to the next hop. Its address
is extracted from Addresses[n - Segment Left + 1] and swapped with IPv6
Destination Address.
When Segment Left is greater than or equal to 2, the size of SRH is not
changed because Addresses[1..n-1] are decompressed and recompressed with
CmprI.
OTOH, when Segment Left changes from 1 to 0, the new SRH could have a
different size because Addresses[1..n-1] are decompressed with CmprI and
recompressed with CmprE.
Let's say CmprI is 15 and CmprE is 0. When we receive SRH with Segment
Left >= 2, Addresses[1..n-1] have 1 byte for each, and Addresses[n] has
16 bytes. When Segment Left is 1, Addresses[1..n-1] is decompressed to
16 bytes and not recompressed. Finally, the new SRH will need more room
in the header, and the size is (16 - 1) * (n - 1) bytes.
Here the max value of n is 255 as Segment Left is u8, so in the worst case,
we have to allocate 3825 bytes in the skb headroom. However, now we only
allocate a small fixed buffer that is IPV6_RPL_SRH_WORST_SWAP_SIZE (16 + 7
bytes). If the decompressed size overflows the room, skb_push() hits BUG()
below [0].
Instead of allocating the fixed buffer for every packet, let's allocate
enough headroom only when we receive SRH with Segment Left 1.
[0]:
skbuff: skb_under_panic: text:ffffffff81c9f6e2 len:576 put:576 head:ffff8880070b5180 data:ffff8880070b4fb0 tail:0x70 end:0x140 dev:lo
kernel BUG at net/core/skbuff.c:200!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 154 Comm: python3 Not tainted 6.4.0-rc4-00190-gc308e9ec0047 #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:skb_panic (net/core/skbuff.c:200)
Code: 4f 70 50 8b 87 bc 00 00 00 50 8b 87 b8 00 00 00 50 ff b7 c8 00 00 00 4c 8b 8f c0 00 00 00 48 c7 c7 80 6e 77 82 e8 ad 8b 60 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffc90000003da0 EFLAGS: 00000246
RAX: 0000000000000085 RBX: ffff8880058a6600 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88807dc1c540 RDI: ffff88807dc1c540
RBP: ffffc90000003e48 R08: ffffffff82b392c8 R09: 00000000ffffdfff
R10: ffffffff82a592e0 R11: ffffffff82b092e0 R12: ffff888005b1c800
R13: ffff8880070b51b8 R14: ffff888005b1ca18 R15: ffff8880070b5190
FS: 00007f4539f0b740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055670baf3000 CR3: 0000000005b0e000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
<IRQ>
skb_push (net/core/skbuff.c:210)
ipv6_rthdr_rcv (./include/linux/skbuff.h:2880 net/ipv6/exthdrs.c:634 net/ipv6/exthdrs.c:718)
ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:437 (discriminator 5))
ip6_input_finish (./include/linux/rcupdate.h:805 net/ipv6/ip6_input.c:483)
__netif_receive_skb_one_core (net/core/dev.c:5494)
process_backlog (./include/linux/rcupdate.h:805 net/core/dev.c:5934)
__napi_poll (net/core/dev.c:6496)
net_rx_action (net/core/dev.c:6565 net/core/dev.c:6696)
__do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:572)
do_softirq (kernel/softirq.c:472 kernel/softirq.c:459)
</IRQ>
<TASK>
__local_bh_enable_ip (kernel/softirq.c:396)
__dev_queue_xmit (net/core/dev.c:4272)
ip6_finish_output2 (./include/net/neighbour.h:544 net/ipv6/ip6_output.c:134)
rawv6_sendmsg (./include/net/dst.h:458 ./include/linux/netfilter.h:303 net/ipv6/raw.c:656 net/ipv6/raw.c:914)
sock_sendmsg (net/socket.c:724 net/socket.c:747)
__sys_sendto (net/socket.c:2144)
__x64_sys_sendto (net/socket.c:2156 net/socket.c:2152 net/socket.c:2152)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
RIP: 0033:0x7f453a138aea
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
RSP: 002b:00007ffcc212a1c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007ffcc212a288 RCX: 00007f453a138aea
RDX: 0000000000000060 RSI: 00007f4539084c20 RDI: 0000000000000003
RBP: 00007f4538308e80 R08: 00007ffcc212a300 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f4539712d1b
</TASK>
Modules linked in:
Fixes: 8610c7c6e3 ("net: ipv6: add support for rpl sr exthdr")
Reported-by: Max VA
Closes: https://www.interruptlabs.co.uk/articles/linux-ipv6-route-of-death
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230605180617.67284-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The drm sched entity must be flushed before finishing, to account for
jobs potentially still in flight at that time.
Lima did not do this flush until now, so switch the destroy call to the
drm_sched_entity_destroy() wrapper which will take care of that.
This fixes a regression on lima which started since the rework in
commit 2fdb8a8f07 ("drm/scheduler: rework entity flush, kill and fini")
where some specific types of applications may hang indefinitely.
Fixes: 2fdb8a8f07 ("drm/scheduler: rework entity flush, kill and fini")
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230606143247.433018-1-nunes.erico@gmail.com
Add current size of rule expressions to the boundary check.
Fixes: 2c865a8a28 ("netfilter: nf_tables: add rule blob layout")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
syzkaller found a repro that causes Hung Task [0] with ipset. The repro
first creates an ipset and then tries to delete a large number of IPs
from the ipset concurrently:
IPSET_ATTR_IPADDR_IPV4 : 172.20.20.187
IPSET_ATTR_CIDR : 2
The first deleting thread hogs a CPU with nfnl_lock(NFNL_SUBSYS_IPSET)
held, and other threads wait for it to be released.
Previously, the same issue existed in set->variant->uadt() that could run
so long under ip_set_lock(set). Commit 5e29dc36bd ("netfilter: ipset:
Rework long task execution when adding/deleting entries") tried to fix it,
but the issue still exists in the caller with another mutex.
While adding/deleting many IPs, we should release the CPU periodically to
prevent someone from abusing ipset to hang the system.
Note we need to increment the ipset's refcnt to prevent the ipset from
being destroyed while rescheduling.
[0]:
INFO: task syz-executor174:268 blocked for more than 143 seconds.
Not tainted 6.4.0-rc1-00145-gba79e9a73284 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor174 state:D stack:0 pid:268 ppid:260 flags:0x0000000d
Call trace:
__switch_to+0x308/0x714 arch/arm64/kernel/process.c:556
context_switch kernel/sched/core.c:5343 [inline]
__schedule+0xd84/0x1648 kernel/sched/core.c:6669
schedule+0xf0/0x214 kernel/sched/core.c:6745
schedule_preempt_disabled+0x58/0xf0 kernel/sched/core.c:6804
__mutex_lock_common kernel/locking/mutex.c:679 [inline]
__mutex_lock+0x6fc/0xdb0 kernel/locking/mutex.c:747
__mutex_lock_slowpath+0x14/0x20 kernel/locking/mutex.c:1035
mutex_lock+0x98/0xf0 kernel/locking/mutex.c:286
nfnl_lock net/netfilter/nfnetlink.c:98 [inline]
nfnetlink_rcv_msg+0x480/0x70c net/netfilter/nfnetlink.c:295
netlink_rcv_skb+0x1c0/0x350 net/netlink/af_netlink.c:2546
nfnetlink_rcv+0x18c/0x199c net/netfilter/nfnetlink.c:658
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x664/0x8cc net/netlink/af_netlink.c:1365
netlink_sendmsg+0x6d0/0xa4c net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x4b8/0x810 net/socket.c:2503
___sys_sendmsg net/socket.c:2557 [inline]
__sys_sendmsg+0x1f8/0x2a4 net/socket.c:2586
__do_sys_sendmsg net/socket.c:2595 [inline]
__se_sys_sendmsg net/socket.c:2593 [inline]
__arm64_sys_sendmsg+0x80/0x94 net/socket.c:2593
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x84/0x270 arch/arm64/kernel/syscall.c:52
el0_svc_common+0x134/0x24c arch/arm64/kernel/syscall.c:142
do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193
el0_svc+0x2c/0x7c arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
Reported-by: syzkaller <syzkaller@googlegroups.com>
Fixes: a7b4f989a6 ("netfilter: ipset: IP set core support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
An nf_conntrack_helper from nf_conn_help may become NULL after DNAT.
Observed when TCP port 1720 (Q931_PORT), associated with h323 conntrack
helper, is DNAT'ed to another destination port (e.g. 1730), while
nfqueue is being used for final acceptance (e.g. snort).
This happenned after transition from kernel 4.14 to 5.10.161.
Workarounds:
* keep the same port (1720) in DNAT
* disable nfqueue
* disable/unload h323 NAT helper
$ linux-5.10/scripts/decode_stacktrace.sh vmlinux < /tmp/kernel.log
BUG: kernel NULL pointer dereference, address: 0000000000000084
[..]
RIP: 0010:nf_conntrack_update (net/netfilter/nf_conntrack_core.c:2080 net/netfilter/nf_conntrack_core.c:2134) nf_conntrack
[..]
nfqnl_reinject (net/netfilter/nfnetlink_queue.c:237) nfnetlink_queue
nfqnl_recv_verdict (net/netfilter/nfnetlink_queue.c:1230) nfnetlink_queue
nfnetlink_rcv_msg (net/netfilter/nfnetlink.c:241) nfnetlink
[..]
Fixes: ee04805ff5 ("netfilter: conntrack: make conntrack userspace helpers work again")
Signed-off-by: Tijs Van Buggenhout <tijs.van.buggenhout@axsguard.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
At the end of `nft_bitwise_reduce`, there is a loop which is intended to
update the bitwise expression associated with each tracked destination
register. However, currently, it just updates the first register
repeatedly. Fix it.
Fixes: 34cc9e5288 ("netfilter: nf_tables: cancel tracking for clobbered destination registers")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The nla_nest_start_noflag() function may fail and return NULL;
the return value needs to be checked.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: d54725cd11 ("netfilter: nf_tables: support for multiple devices per netdev hook")
Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit f4e4534850 ("net/netlink: fix NETLINK_LIST_MEMBERSHIPS length report")
fixed NETLINK_LIST_MEMBERSHIPS length report which caused
selftest sockopt_sk failure. The failure log looks like
test_sockopt_sk:PASS:join_cgroup /sockopt_sk 0 nsec
run_test:PASS:skel_load 0 nsec
run_test:PASS:setsockopt_link 0 nsec
run_test:PASS:getsockopt_link 0 nsec
getsetsockopt:FAIL:Unexpected NETLINK_LIST_MEMBERSHIPS value unexpected Unexpected NETLINK_LIST_MEMBERSHIPS value: actual 8 != expected 4
run_test:PASS:getsetsockopt 0 nsec
#201 sockopt_sk:FAIL
In net/netlink/af_netlink.c, function netlink_getsockopt(), for NETLINK_LIST_MEMBERSHIPS,
nlk->ngroups equals to 36. Before Commit f4e4534850, the optlen is calculated as
ALIGN(nlk->ngroups / 8, sizeof(u32)) = 4
After that commit, the optlen is
ALIGN(BITS_TO_BYTES(nlk->ngroups), sizeof(u32)) = 8
Fix the test by setting the expected optlen to be 8.
Fixes: f4e4534850 ("net/netlink: fix NETLINK_LIST_MEMBERSHIPS length report")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230606172202.1606249-1-yhs@fb.com
The async discard uses the BTRFS_FS_DISCARD_RUNNING bit in the fs_info
to force discards off when the filesystem has aborted or we're generally
not able to run discards. This gets flipped on when we're mounted rw,
and also when we go from ro->rw.
Commit 63a7cb1307 ("btrfs: auto enable discard=async when possible")
enabled async discard by default, and this meant
"mount -o ro /dev/xxx /yyy" had async discards turned on.
Unfortunately, this meant our check in btrfs_remount_cleanup() would see
that discards are already on:
/* If we toggled discard async */
if (!btrfs_raw_test_opt(old_opts, DISCARD_ASYNC) &&
btrfs_test_opt(fs_info, DISCARD_ASYNC))
btrfs_discard_resume(fs_info);
So, we'd never call btrfs_discard_resume() when remounting the root
filesystem from ro->rw.
drgn shows this really nicely:
import os
import sys
from drgn.helpers.linux.fs import path_lookup
from drgn import NULL, Object, Type, cast
def btrfs_sb(sb):
return cast("struct btrfs_fs_info *", sb.s_fs_info)
if len(sys.argv) == 1:
path = "/"
else:
path = sys.argv[1]
fs_info = cast("struct btrfs_fs_info *", path_lookup(prog, path).mnt.mnt_sb.s_fs_info)
BTRFS_FS_DISCARD_RUNNING = 1 << prog['BTRFS_FS_DISCARD_RUNNING']
if fs_info.flags & BTRFS_FS_DISCARD_RUNNING:
print("discard running flag is on")
else:
print("discard running flag is off")
[root]# mount | grep nvme
/dev/nvme0n1p3 on / type btrfs
(rw,relatime,compress-force=zstd:3,ssd,discard=async,space_cache=v2,subvolid=5,subvol=/)
[root]# ./discard_running.drgn
discard running flag is off
[root]# mount -o remount,discard=sync /
[root]# mount -o remount,discard=async /
[root]# ./discard_running.drgn
discard running flag is on
The fix is to call btrfs_discard_resume() when we're going from ro->rw.
It already checks to make sure the async discard flag is on, so it'll do
the right thing.
Fixes: 63a7cb1307 ("btrfs: auto enable discard=async when possible")
CC: stable@vger.kernel.org # 6.3+
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The DSPI controller has configurable timing for
(a) tCSC: the interval between the assertion of the chip select and the
first clock edge
(b) tASC: the interval between the last clock edge and the deassertion
of the chip select
What is a bit surprising, but is documented in the figure "Example of
continuous transfer (CPHA=1, CONT=1)" in the datasheet, is that when the
chip select stays asserted between multiple TX FIFO writes, the tCSC and
tASC times still apply. With CONT=1, chip select remains asserted, but
SCK takes a break and goes to the idle state for tASC + tCSC ns.
In other words, the default values (of 0 and 0 ns) result in SCK
glitches where the SCK transition to the idle state, as well as the SCK
transition from the idle state, will have no delay in between, and it
may appear that a SCK cycle has simply gone missing. The resulting
timing violation might cause data corruption in many peripherals, as
their chip select is asserted.
The driver has device tree bindings for tCSC ("fsl,spi-cs-sck-delay")
and tASC ("fsl,spi-sck-cs-delay"), but these are only specified to apply
when the chip select toggles in the first place, and this timing
characteristic depends on each peripheral. Many peripherals do not have
explicit timing requirements, so many device trees do not have these
properties present at all.
Nonetheless, the lack of SCK glitches is a common sense requirement, and
since the SCK stays in the idle state during transfers for tCSC+tASC ns,
and that in itself should look like half a cycle, then let's ensure that
tCSC and tASC are at least a quarter of a SCK period, such that their
sum is at least half of one.
Fixes: 95bf15f386 ("spi: fsl-dspi: Add ~50ns delay between cs and sck")
Reported-by: Lisa Chen (陈敏捷) <minjie.chen@geekplus.com>
Debugged-by: Lisa Chen (陈敏捷) <minjie.chen@geekplus.com>
Tested-by: Lisa Chen (陈敏捷) <minjie.chen@geekplus.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230529223402.1199503-1-vladimir.oltean@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
When constructing the sim, gpio-sim constructs an array of named lines,
sized based on the largest offset of any named line, and then initializes
that array with the names of all lines, including unnamed hogs with higher
offsets. In doing so it writes NULLs beyond the extent of the array.
Add a check that only named lines are used to initialize the array.
Fixes: cb8c474e79 ("gpio: sim: new testing module")
Signed-off-by: Kent Gibson<warthog618@gmail.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Pull spi fixes from Mark Brown:
"A small collection of driver specific fixes, none of them particularly
remarkable or severe"
* tag 'spi-fix-v6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: qup: Request DMA before enabling clocks
spi: mt65xx: make sure operations completed before unloading
spi: lpspi: disable lpspi module irq in DMA mode
ASoC: Fixes for v6.4
A lot of routine driver specific fixes here, nothing in the core though
there are a couple of fixes for the generic cards. There's also a few
new quirks for x86 platforms.
Arm FF-A fix for v6.4
A single fix addressing another MBZ field being non-zero and non-compliant
resulting in the rejection of certain memory interface transmissions by
the receivers(secure partitions). The issue was addressed before but missed
to address this one field which is part of this change.
* tag 'ffa-fix-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
firmware: arm_ffa: Set handle field to zero in memory descriptor
Link: https://lore.kernel.org/r/20230606125720.2816923-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This should use wiphy_lock() now instead of requiring the
RTNL, since __cfg80211_leave() via cfg80211_leave() is now
requiring that lock to be held.
Fixes: a05829a722 ("cfg80211: avoid holding the RTNL when calling the driver")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This should use wiphy_lock() now instead of acquiring the
RTNL, since cfg80211_stop_sched_scan_req() now needs that.
Fixes: a05829a722 ("cfg80211: avoid holding the RTNL when calling the driver")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pull gfs2 fix from Andreas Gruenbacher:
- Don't get stuck writing page onto itself under direct I/O
* tag 'gfs2-v6.4-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Don't get stuck writing page onto itself under direct I/O
Pull x86 platform driver fixes from Hans de Goede:
- various Microsoft Surface support fixes
- one fix for the INT3472 driver
* tag 'platform-drivers-x86-v6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: int3472: Avoid crash in unregistering regulator gpio
platform/surface: aggregator_tabletsw: Add support for book mode in POS subsystem
platform/surface: aggregator_tabletsw: Add support for book mode in KIP subsystem
platform/surface: aggregator: Allow completion work-items to be executed in parallel
platform/surface: aggregator: Make to_ssam_device_driver() respect constness
HD-audio core code replaces the kctl->id.index of SPDIF-related
controls after assigning via snd_ctl_add(). This doesn't work any
longer with the new Xarray lookup change. The change of the kctl->id
content has to be done via snd_ctl_rename_id() helper, instead.
Fixes: c27e1efb61 ("ALSA: control: Use xarray for faster lookups")
Cc: <stable@vger.kernel.org>
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20230606093855.14685-5-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull HID fix from Jiri Kosina:
- Final, confirmed fix for regression causing some devices connected
via Logitech HID++ Unifying receiver take too long to initialize
(Benjamin Tissoires)
* tag 'for-linus-2023060501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: hidpp: terminate retry loop on success
kmemdup() at line 2735 is not duplicating enough memory for
notif->tid_tear_down and notif->station_id. As it only duplicates
612 bytes: up to offsetofend(struct iwl_wowlan_info_notif,
received_beacons), this is the range of [0, 612) bytes.
2735 notif = kmemdup(notif_v1,
2736 offsetofend(struct iwl_wowlan_info_notif,
2737 received_beacons),
2738 GFP_ATOMIC);
which evidently does not cover bytes 612 and 613 for members
tid_tear_down and station_id in struct iwl_wowlan_info_notif.
See below:
$ pahole -C iwl_wowlan_info_notif drivers/net/wireless/intel/iwlwifi/mvm/d3.o
struct iwl_wowlan_info_notif {
struct iwl_wowlan_gtk_status_v3 gtk[2]; /* 0 488 */
/* --- cacheline 7 boundary (448 bytes) was 40 bytes ago --- */
struct iwl_wowlan_igtk_status igtk[2]; /* 488 80 */
/* --- cacheline 8 boundary (512 bytes) was 56 bytes ago --- */
__le64 replay_ctr; /* 568 8 */
/* --- cacheline 9 boundary (576 bytes) --- */
__le16 pattern_number; /* 576 2 */
__le16 reserved1; /* 578 2 */
__le16 qos_seq_ctr[8]; /* 580 16 */
__le32 wakeup_reasons; /* 596 4 */
__le32 num_of_gtk_rekeys; /* 600 4 */
__le32 transmitted_ndps; /* 604 4 */
__le32 received_beacons; /* 608 4 */
u8 tid_tear_down; /* 612 1 */
u8 station_id; /* 613 1 */
u8 reserved2[2]; /* 614 2 */
/* size: 616, cachelines: 10, members: 13 */
/* last cacheline: 40 bytes */
};
Therefore, when the following assignments take place, actually no memory
has been allocated for those objects:
2743 notif->tid_tear_down = notif_v1->tid_tear_down;
2744 notif->station_id = notif_v1->station_id;
Fix this by allocating space for the whole notif object and zero out the
remaining space in memory after member station_id.
This also fixes the following -Warray-bounds issues:
CC drivers/net/wireless/intel/iwlwifi/mvm/d3.o
drivers/net/wireless/intel/iwlwifi/mvm/d3.c: In function ‘iwl_mvm_wait_d3_notif’:
drivers/net/wireless/intel/iwlwifi/mvm/d3.c:2743:30: warning: array subscript ‘struct iwl_wowlan_info_notif[0]’ is partly outside array bounds of ‘unsigned char[612]’ [-Warray-bounds=]
2743 | notif->tid_tear_down = notif_v1->tid_tear_down;
|
from drivers/net/wireless/intel/iwlwifi/mvm/d3.c:7:
In function ‘kmemdup’,
inlined from ‘iwl_mvm_wait_d3_notif’ at drivers/net/wireless/intel/iwlwifi/mvm/d3.c:2735:12:
include/linux/fortify-string.h:765:16: note: object of size 612 allocated by ‘__real_kmemdup’
765 | return __real_kmemdup(p, size, gfp);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/wireless/intel/iwlwifi/mvm/d3.c: In function ‘iwl_mvm_wait_d3_notif’:
drivers/net/wireless/intel/iwlwifi/mvm/d3.c:2744:30: warning: array subscript ‘struct iwl_wowlan_info_notif[0]’ is partly outside array bounds of ‘unsigned char[612]’ [-Warray-bounds=]
2744 | notif->station_id = notif_v1->station_id;
| ^~
In function ‘kmemdup’,
inlined from ‘iwl_mvm_wait_d3_notif’ at drivers/net/wireless/intel/iwlwifi/mvm/d3.c:2735:12:
include/linux/fortify-string.h:765:16: note: object of size 612 allocated by ‘__real_kmemdup’
765 | return __real_kmemdup(p, size, gfp);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
Link: https://github.com/KSPP/linux/issues/306
Fixes: 905d50ddbc ("wifi: iwlwifi: mvm: support wowlan info notification version 2")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/ZHpGN555FwAKGduH@work
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently, whenever an EMA beacon is formed, due to is_template
argument being false from the caller, the switch count is always
decremented once which is wrong.
Also if switch count is equal to profile periodicity, this makes
the switch count to reach till zero which triggers a WARN_ON_ONCE.
[ 261.593915] CPU: 1 PID: 800 Comm: kworker/u8:3 Not tainted 5.4.213 #0
[ 261.616143] Hardware name: Qualcomm Technologies, Inc. IPQ9574
[ 261.622666] Workqueue: phy0 ath12k_get_link_bss_conf [ath12k]
[ 261.629771] pstate: 60400005 (nZCv daif +PAN -UAO)
[ 261.635595] pc : ieee80211_next_txq+0x1ac/0x1b8 [mac80211]
[ 261.640282] lr : ieee80211_beacon_update_cntdwn+0x64/0xb4 [mac80211]
[...]
[ 261.729683] Call trace:
[ 261.734986] ieee80211_next_txq+0x1ac/0x1b8 [mac80211]
[ 261.737156] ieee80211_beacon_cntdwn_is_complete+0xa28/0x1194 [mac80211]
[ 261.742365] ieee80211_beacon_cntdwn_is_complete+0xef4/0x1194 [mac80211]
[ 261.749224] ieee80211_beacon_get_template_ema_list+0x38/0x5c [mac80211]
[ 261.755908] ath12k_get_link_bss_conf+0xf8/0x33b4 [ath12k]
[ 261.762590] ath12k_get_link_bss_conf+0x390/0x33b4 [ath12k]
[ 261.767881] process_one_work+0x194/0x270
[ 261.773346] worker_thread+0x200/0x314
[ 261.777514] kthread+0x140/0x150
[ 261.781158] ret_from_fork+0x10/0x18
Fix this issue by making the is_template argument as true when fetching
the EMA beacons.
Fixes: bd54f3c290 ("wifi: mac80211: generate EMA beacons in AP mode")
Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Link: https://lore.kernel.org/r/20230531062012.4537-1-quic_adisi@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There were two bugs when creating the non-inheritence
element:
1) 'at_extension' needs to be declared outside the loop,
otherwise the value resets every iteration and we
can never really switch properly
2) 'added' never got set to true, so we always cut off
the extension element again at the end of the function
This shows another issue that we might add a list but no
extension list, but we need to make the extension list a
zero-length one in that case.
Fix all these issues. While at it, add a comment explaining
the trim.
Fixes: 81151ce462 ("wifi: mac80211: support MLO authentication/association with one link")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230604120651.3addaa5c4782.If3a78f9305997ad7ef4ba7ffc17a8234c956f613@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Move capturing the snapshot context into the image request state
machine, after exclusive lock is ensured to be held for the duration of
dealing with the image request. This is needed to ensure correctness
of fast-diff states (OBJECT_EXISTS vs OBJECT_EXISTS_CLEAN) and object
deltas computed based off of them. Otherwise the object map that is
forked for the snapshot isn't guaranteed to accurately reflect the
contents of the snapshot when the snapshot is taken under I/O. This
breaks differential backup and snapshot-based mirroring use cases with
fast-diff enabled: since some object deltas may be incomplete, the
destination image may get corrupted.
Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/61472
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Move RBD_OBJ_FLAG_COPYUP_ENABLED flag setting into the object request
state machine to allow for the snapshot context to be captured in the
image request state machine rather than in rbd_queue_workfn().
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
When receiving a connect response we should make sure that the DCID is
within the valid range and that we don't already have another channel
allocated for the same DCID.
Missing checks may violate the specification (BLUETOOTH CORE SPECIFICATION
Version 5.4 | Vol 3, Part A, Page 1046).
Fixes: 40624183c2 ("Bluetooth: L2CAP: Add missing checks for invalid LE DCID")
Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The order of CIS handle array in Set CIG Parameters response shall match
the order of the CIS_ID array in the command (Core v5.3 Vol 4 Part E Sec
7.8.97). We send CIS_IDs mainly in the order of increasing CIS_ID (but
with "last" CIS first if it has fixed CIG_ID). In handling of the
reply, we currently assume this is also the same as the order of
hci_conn in hdev->conn_hash, but that is not true.
Match the correct hci_conn to the correct handle by matching them based
on the CIG+CIS combination. The CIG+CIS combination shall be unique for
ISO_LINK hci_conn at state >= BT_BOUND, which we maintain in
hci_le_set_cig_params.
Fixes: 26afbd826e ("Bluetooth: Add initial implementation of CIS connections")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Consider existing BOUND & CONNECT state CIS to block CIG removal.
Otherwise, under suitable timing conditions we may attempt to remove CIG
while Create CIS is pending, which fails.
Fixes: 26afbd826e ("Bluetooth: Add initial implementation of CIS connections")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
L2CAP assumes that the locks conn->chan_lock and chan->lock are
acquired in the order conn->chan_lock, chan->lock to avoid
potential deadlock.
For example, l2sock_shutdown acquires these locks in the order:
mutex_lock(&conn->chan_lock)
l2cap_chan_lock(chan)
However, l2cap_disconnect_req acquires chan->lock in
l2cap_get_chan_by_scid first and then acquires conn->chan_lock
before calling l2cap_chan_del. This means that these locks are
acquired in unexpected order, which leads to potential deadlock:
l2cap_chan_lock(c)
mutex_lock(&conn->chan_lock)
This patch releases chan->lock before acquiring the conn_chan_lock
to avoid the potential deadlock.
Fixes: a2a9339e1c ("Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}")
Signed-off-by: Ying Hsu <yinghsu@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Since commit 3e4be65eb8 ("Bluetooth: hci_qca: Add poweroff support
during hci down for wcn3990"), the setup callback which registers the
debugfs interface can be called multiple times.
This specifically leads to the following error when powering on the
controller:
debugfs: Directory 'ibs' with parent 'hci0' already present!
Add a driver flag to avoid trying to register the debugfs interface more
than once.
Fixes: 3e4be65eb8 ("Bluetooth: hci_qca: Add poweroff support during hci down for wcn3990")
Cc: stable@vger.kernel.org # 4.20
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Since commit ec6cef9cd9 ("Bluetooth: Fix SMP channel registration for
unconfigured controllers") the debugfs interface for unconfigured
controllers will be created when the controller is configured.
There is however currently nothing preventing a controller from being
configured multiple time (e.g. setting the device address using btmgmt)
which results in failed attempts to register the already registered
debugfs entries:
debugfs: File 'features' in directory 'hci0' already present!
debugfs: File 'manufacturer' in directory 'hci0' already present!
debugfs: File 'hci_version' in directory 'hci0' already present!
...
debugfs: File 'quirk_simultaneous_discovery' in directory 'hci0' already present!
Add a controller flag to avoid trying to register the debugfs interface
more than once.
Fixes: ec6cef9cd9 ("Bluetooth: Fix SMP channel registration for unconfigured controllers")
Cc: stable@vger.kernel.org # 4.0
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
When the HCI_UNREGISTER flag is set, no jobs should be scheduled. Fix
potential race when HCI_UNREGISTER is set after the flag is tested in
hci_cmd_sync_queue.
Fixes: 0b94f2651f ("Bluetooth: hci_sync: Fix queuing commands when HCI_UNREGISTER is set")
Signed-off-by: Zhengping Jiang <jiangzp@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Make CIG auto-allocation to select the first CIG_ID that is still
configurable. Also use correct CIG_ID range (see Core v5.3 Vol 4 Part E
Sec 7.8.97 p.2553).
Previously, it would always select CIG_ID 0 regardless of anything,
because cis_list with data.cis == 0xff (BT_ISO_QOS_CIS_UNSET) would not
count any CIS. Since we are not adding CIS here, use find_cis instead.
Fixes: 26afbd826e ("Bluetooth: Add initial implementation of CIS connections")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
When looking for CIS blocking CIG removal, consider only the CIS with
the right CIG ID. Don't try to remove CIG with unset CIG ID.
Fixes: 26afbd826e ("Bluetooth: Add initial implementation of CIS connections")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The existing documentation refers to memory.high as the "main mechanism
to control memory usage." This seems incorrect to me - memory.high can
result in reclaim pressure which simply leads to stalls unless some
external component observes and actions on it (e.g. systemd-oomd can be
used for this purpose). While this is feasible, users are unaware of
this interaction and are led to believe that memory.high alone is an
effective mechanism for limiting memory.
The documentation should recommend the use of memory.max as the
effective way to enforce memory limits - it triggers reclaim and results
in OOM kills by itself.
Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Tejun Heo <tj@kernel.org>
Andrii Nakryiko writes:
And we currently don't have an attach type for NETLINK BPF link.
Thankfully it's not too late to add it. I see that link_create() in
kernel/bpf/syscall.c just bypasses attach_type check. We shouldn't
have done that. Instead we need to add BPF_NETLINK attach type to enum
bpf_attach_type. And wire all that properly throughout the kernel and
libbpf itself.
This adds BPF_NETFILTER and uses it. This breaks uabi but this
wasn't in any non-rc release yet, so it should be fine.
v2: check link_attack prog type in link_create too
Fixes: 84601d6ee6 ("bpf: add bpf_link support for BPF_NETFILTER programs")
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/CAEf4BzZ69YgrQW7DHCJUT_X+GqMq_ZQQPBwopaJJVGFD5=d5Vg@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20230605131445.32016-1-fw@strlen.de
It seems we forgot the normal case to terminate the retry loop,
making us asking 3 times each command, which is probably a little bit
too much.
And remove the ugly "goto exit" that can be replaced by a simpler "break"
Fixes: 586e8fede7 ("HID: logitech-hidpp: Retry commands when device is busy")
Suggested-by: Mark Lord <mlord@pobox.com>
Tested-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
[BUG]
Test case btrfs/027 would crash with subpage (64K page size, 4K
sectorsize) with the following dying messages:
debug: map_length=16384 length=65536 type=metadata|raid6(0x104)
assertion failed: map_length >= length, in fs/btrfs/volumes.c:8093
------------[ cut here ]------------
kernel BUG at fs/btrfs/messages.c:259!
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
Call trace:
btrfs_assertfail+0x28/0x2c [btrfs]
btrfs_map_repair_block+0x150/0x2b8 [btrfs]
btrfs_repair_io_failure+0xd4/0x31c [btrfs]
btrfs_read_extent_buffer+0x150/0x16c [btrfs]
read_tree_block+0x38/0xbc [btrfs]
read_tree_root_path+0xfc/0x1bc [btrfs]
btrfs_get_root_ref.part.0+0xd4/0x3a8 [btrfs]
open_ctree+0xa30/0x172c [btrfs]
btrfs_mount_root+0x3c4/0x4a4 [btrfs]
legacy_get_tree+0x30/0x60
vfs_get_tree+0x28/0xec
vfs_kern_mount.part.0+0x90/0xd4
vfs_kern_mount+0x14/0x28
btrfs_mount+0x114/0x418 [btrfs]
legacy_get_tree+0x30/0x60
vfs_get_tree+0x28/0xec
path_mount+0x3e0/0xb64
__arm64_sys_mount+0x200/0x2d8
invoke_syscall+0x48/0x114
el0_svc_common.constprop.0+0x60/0x11c
do_el0_svc+0x38/0x98
el0_svc+0x40/0xa8
el0t_64_sync_handler+0xf4/0x120
el0t_64_sync+0x190/0x194
Code: aa0403e2 b0fff060 91010000 959c2024 (d4210000)
[CAUSE]
In btrfs/027 we test RAID6 with missing devices, in this particular
case, we're repairing a metadata at the end of a data stripe.
But at btrfs_repair_io_failure(), we always pass a full PAGE for repair,
and for subpage case this can cross stripe boundary and lead to the
above BUG_ON().
This metadata repair code is always there, since the introduction of
subpage support, but this can trigger BUG_ON() after the bio split
ability at btrfs_map_bio().
[FIX]
Instead of passing the old PAGE_SIZE, we calculate the correct length
based on the eb size and page size for both regular and subpage cases.
CC: stable@vger.kernel.org # 6.3+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
AT91 fixes for 6.4
It contains:
- fix imbalanced reference counter for ethernet devices; without it
system hangs after consecutive suspend/resume cycles;
- fix debounce delay property for shutdown controller; the initial DT
property is not what the driver expects.
* tag 'at91-fixes-6.4' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
ARM: dts: at91: sama7g5ek: fix debounce delay property for shdwc
ARM: at91: pm: fix imbalanced reference counter for ethernet devices
Link: https://lore.kernel.org/r/20230530105930.11621-1-claudiu.beznea@microchip.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Qualcomm ARM64 DeviceTree fixes for 6.4
Register scheme for SM8550 LLCC is corrected to avoid using the wrong
register offsets. SDRAM frequency for misidentified SC7180-lite boards
is handled. The datatype for Soundwire interval on SM8550 is corrected.
The resource controller on SC8280XP is added to the CPU cluster
power-domain to get notified to send cached sleep and wake votes before
going entering the lower power states.
SA8155P power-domains that differ from what's inherited from the SM8150
DeviceTree are adjusted to make the platform boot again.
Remoteproc firmware paths are corrected for Sony Xperia 10 IV.
Cache properties are adjusted across a range of platforms, to meet
changes in the binding.
Panel compatibles are corrected for Xiaomi Mi Pad 5 Pro, to match
binding. Invalid dai-cells are dropped from SC7280 devices, to match
binding.
The incorrect removal of "input-enable" from the LPASS pinctrl node of
SC8280XP was reverted, to get dmic pins in the correct state again.
The incorrect input-enable property is dropped from a msm8974, mdm9615
and apq8026 to resolve a range of DT validation warnings, incorrectly
picked up through the ARM64 tree.
* tag 'qcom-arm64-fixes-for-6.4' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: dts: qcom: sm8550: Use the correct LLCC register scheme
arm64: dts: qcom: sc7180-lite: Fix SDRAM freq for misidentified sc7180-lite boards
arm64: dts: qcom: sm8550: use uint16 for Soundwire interval
arm64: dts: qcom: Split out SA8155P and use correct RPMh power domains
arm64: dts: qcom: sm6375-pdx225: Fix remoteproc firmware paths
arm64: dts: qcom: add missing cache properties
arm64: dts: qcom: use decimal for cache level
arm64: dts: qcom: fix indentation
ARM: dts: qcom: msm8974: remove superfluous "input-enable"
ARM: dts: qcom: mdm9615: remove superfluous "input-enable"
ARM: dts: qcom: apq8026: remove superfluous "input-enable"
arm64: dts: qcom: sm8250-xiaomi-elish-csot: fix panel compatible
arm64: dts: qcom: sm8250-xiaomi-elish-boe: fix panel compatible
arm64: dts: qcom: sc7280-qcard: drop incorrect dai-cells from WCD938x SDW
arm64: dts: qcom: sc7280-idp: drop incorrect dai-cells from WCD938x SDW
arm64: dts: qcom: sc8280xp: Flush RSC sleep & wake votes
arm64: dts: qcom: sc8280xp: Revert "arm64: dts: qcom: sc8280xp: remove superfluous "input-enable""
Link: https://lore.kernel.org/r/20230601142659.2246348-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Qualcomm driver fixes for 6.4
Error paths is corrected across icc-bwmon, rpmh-rsc, ramp_controller and
rmtfs. The ice module is renamed qcom_ice, to avoid clashing with
existing "ice" driver.
SA8155P-specific RPMh power-domains are introduced to avoid the code
trying to access resources that exists on SM8150, but not on SA8155P.
Lastly, changes to the EDAC driver to fix an issue where the driver
performs mmio based on the wrong register map.
* tag 'qcom-driver-fixes-for-6.4' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
EDAC/qcom: Get rid of hardcoded register offsets
EDAC/qcom: Remove superfluous return variable assignment in qcom_llcc_core_setup()
dt-bindings: cache: qcom,llcc: Fix SM8550 description
soc: qcom: rpmhpd: Add SA8155P power domains
dt-bindings: power: qcom,rpmpd: Add SA8155P
soc: qcom: Rename ice to qcom_ice to avoid module name conflict
soc: qcom: rmtfs: Fix error code in probe()
soc: qcom: ramp_controller: Fix an error handling path in qcom_ramp_controller_probe()
soc: qcom: rpmh-rsc: drop redundant unsigned >=0 comparision
soc: qcom: icc-bwmon: fix incorrect error code passed to dev_err_probe()
Link: https://lore.kernel.org/r/20230601141058.2246039-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull asymmetric keys fix from Roberto Sassu:
"Here is a small fix to make an unconditional copy of the buffer passed
to crypto operations, to take into account the case of the stack not
in the linear mapping area.
It has been tested and verified to fix the bug"
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: David Howells <dhowells@redhat.com>
* tag 'asym-keys-fix-for-linus-v6.4-rc5' of https://github.com/robertosassu/linux:
KEYS: asymmetric: Copy sig and digest in public_key_verify_signature()
In cs_dsp_load_coeff() region_name should be set in the XM/YM/ZM
cases otherwise any errors will log the region as "Unknown".
While doing this also change one error message that logged the
region type ID to log the region_name instead. This makes it
consistent with other messages in the same function.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20230605143238.4001982-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Mat Martineau says:
====================
mptcp: Fixes for address advertisement
Patches 1 and 2 allow address advertisements to be removed without
affecting current connected subflows, and updates associated self tests.
Patches 3 and 4 correctly track (and allow removal of) addresses that
were implicitly announced as part of subflow creation. Also updates
associated self tests.
Patch 5 makes subflow and address announcement counters work consistently
between the userspace and in-kernel path managers.
====================
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Increase pm subflows counter on both server side and client side when
userspace pm creates a new subflow, and decrease the counter when it
closes a subflow.
Increase add_addr_signaled counter in mptcp_nl_cmd_announce() when the
address is announced by userspace PM.
This modification is similar to how the in-kernel PM is updating the
counter: when additional subflows are created/removed.
Fixes: 9ab4807c84 ("mptcp: netlink: Add MPTCP_PM_CMD_ANNOUNCE")
Fixes: 702c2f646d ("mptcp: netlink: allow userspace-driven subflow establishment")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/329
Cc: stable@vger.kernel.org
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the address into userspace_pm_local_addr_list when the subflow is
created. Make sure it can be found in mptcp_nl_cmd_remove(). And delete
it in the new helper mptcp_userspace_pm_delete_local_addr().
By doing this, the "REMOVE" command also works with subflows that have
been created via the "SUB_CREATE" command instead of restricting to
the addresses that have been announced via the "ANNOUNCE" command.
Fixes: d9a4594eda ("mptcp: netlink: Add MPTCP_PM_CMD_REMOVE")
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/379
Cc: stable@vger.kernel.org
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is linked to the previous commit ("mptcp: only send RM_ADDR in
nl_cmd_remove").
To align with what is done by the in-kernel PM, update userspace pm addr
selftests, by sending a remove_subflows command together after the
remove_addrs command.
Fixes: d9a4594eda ("mptcp: netlink: Add MPTCP_PM_CMD_REMOVE")
Fixes: 97040cf980 ("selftests: mptcp: userspace pm address tests")
Cc: stable@vger.kernel.org
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The specifications from [1] about the "REMOVE" command say:
Announce that an address has been lost to the peer
It was then only supposed to send a RM_ADDR and not trying to delete
associated subflows.
A new helper mptcp_pm_remove_addrs() is then introduced to do just
that, compared to mptcp_pm_remove_addrs_and_subflows() also removing
subflows.
To delete a subflow, the userspace daemon can use the "SUB_DESTROY"
command, see mptcp_nl_cmd_sf_destroy().
Fixes: d9a4594eda ("mptcp: netlink: Add MPTCP_PM_CMD_REMOVE")
Link: https://github.com/multipath-tcp/mptcp/blob/mptcp_v0.96/include/uapi/linux/mptcp.h [1]
Cc: stable@vger.kernel.org
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mika writes:
thunderbolt: Fixes for v6.4-rc6
This includes following fixes for v6.4-rc6:
- Fix DMA test driver to pass correct values when only RX or TX ring
is used
- Increase timeout when DisplayPort tunnel is established to cope with
a VGA/DVI dongle connected to a dock
- Do not enable CL states when BIOS connnection manager already
created the tunnels
- Correct the ring interrupt masking to work again in Intel hardware
on resume from system sleep states.
All these have been in linux-next with no reported issues.
* tag 'thunderbolt-for-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
thunderbolt: Mask ring interrupt on Intel hardware as well
thunderbolt: Do not touch CL state configuration during discovery
thunderbolt: Increase DisplayPort Connection Manager handshake timeout
thunderbolt: dma_test: Use correct value for absent rings when creating paths
We must not assign plat_dat->dwmac4_addrs unconditionally as for
structures which don't set them, this will result in the core driver
using zeroes everywhere and breaking the driver for older HW. On EMAC < 2
the address should remain NULL.
Fixes: b68376191c ("net: stmmac: dwmac-qcom-ethqos: Add EMAC3 support")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There seems to be a bug within the mv64xxx I2C controller, wherein the
status register may not necessarily contain valid value immediately
after the IFLG flag is set in the control register.
My theory is that the controller:
- first sets the IFLG in control register
- then updates the status register
- then raises an interrupt
This may sometime cause weird bugs when in atomic mode, since in this
mode we do not wait for an interrupt, but instead we poll the control
register for IFLG and read status register immediately after.
I encountered -ENXIO from mv64xxx_i2c_fsm() due to this issue when using
this driver in atomic mode.
Note that I've only seen this issue on Armada 385, I don't know whether
other SOCs with this controller are also affected. Also note that this
fix has been in U-Boot for over 4 years [1] without anybody complaining,
so it should not cause regressions.
[1] https://source.denx.de/u-boot/u-boot/-/commit/d50e29662f78
Fixes: 544a8d75f3 ("i2c: mv64xxx: Add atomic_xfer method to driver")
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Marc Kleine-Budde says:
====================
this is a pull request of 3 patches for net/master.
All 3 patches target the j1939 stack.
The 1st patch is by Oleksij Rempel and fixes the error queue handling
for (E)TP sessions that run into timeouts.
The last 2 patches are by Fedor Pchelkin and fix a potential
use-after-free in j1939_netdev_start() if j1939_can_rx_register()
fails.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
With IC_INTR_RX_FULL slave interrupt handler reads data in a loop until
RX FIFO is empty. When testing with the slave-eeprom, each transaction
has 2 bytes for address/index and 1 byte for value, the address byte
can be written as data byte due to dropping STOP condition.
In the test below, the master continuously writes to the slave, first 2
bytes are index, 3rd byte is value and follow by a STOP condition.
i2c_write: i2c-3 #0 a=04b f=0000 l=3 [00-D1-D1]
i2c_write: i2c-3 #0 a=04b f=0000 l=3 [00-D2-D2]
i2c_write: i2c-3 #0 a=04b f=0000 l=3 [00-D3-D3]
Upon receiving STOP condition slave eeprom would reset `idx_write_cnt` so
next 2 bytes can be treated as buffer index for upcoming transaction.
Supposedly the slave eeprom buffer would be written as
EEPROM[0x00D1] = 0xD1
EEPROM[0x00D2] = 0xD2
EEPROM[0x00D3] = 0xD3
When CPU load is high the slave irq handler may not read fast enough,
the interrupt status can be seen as 0x204 with both DW_IC_INTR_STOP_DET
(0x200) and DW_IC_INTR_RX_FULL (0x4) bits. The slave device may see
the transactions below.
0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1594 : INTR_STAT=0x4
0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1594 : INTR_STAT=0x4
0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1594 : INTR_STAT=0x4
0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1794 : INTR_STAT=0x204
0x1 STATUS SLAVE_ACTIVITY=0x0 : RAW_INTR_STAT=0x1790 : INTR_STAT=0x200
0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1594 : INTR_STAT=0x4
0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1594 : INTR_STAT=0x4
0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1594 : INTR_STAT=0x4
After `D1` is received, read loop continues to read `00` which is the
first bype of next index. Since STOP condition is ignored by the loop,
eeprom buffer index increased to `D2` and `00` is written as value.
So the slave eeprom buffer becomes
EEPROM[0x00D1] = 0xD1
EEPROM[0x00D2] = 0x00
EEPROM[0x00D3] = 0xD3
The fix is to use `FIRST_DATA_BYTE` (bit 11) in `IC_DATA_CMD` to split
the transactions. The first index byte in this case would have bit 11
set. Check this indication to inject I2C_SLAVE_WRITE_REQUESTED event
which will reset `idx_write_cnt` in slave eeprom.
Signed-off-by: David Zheng <david.zheng@intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Rather than casting pci1xxxx_i2c_shutdown to an incompatible function type,
update the type to match that expected by __devm_add_action.
Reported by clang-16 with W-1:
.../i2c-mchp-pci1xxxx.c:1159:29: error: cast from 'void (*)(struct pci1xxxx_i2c *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict]
ret = devm_add_action(dev, (void (*)(void *))pci1xxxx_i2c_shutdown, i2c);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/linux/device.h:251:29: note: expanded from macro 'devm_add_action'
__devm_add_action(release, action, data, #action)
^~~~~~
No functional change intended.
Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Reviewed-by: Tharun Kumar P<tharunkumar.pasumarthi@microchip.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
GCC 11.3.0 issues warnings in this module about wrong sizes of format
specifiers:
pcm-test.c: In function ‘test_pcm_time’:
pcm-test.c:384:68: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 5 \
has type ‘unsigned int’ [-Wformat=]
384 | snprintf(msg, sizeof(msg), "rate mismatch %ld != %ld", rate, rrate);
pcm-test.c:455:53: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has \
type ‘long int’ [-Wformat=]
455 | "expected %d, wrote %li", rate, frames);
pcm-test.c:462:53: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has \
type ‘long int’ [-Wformat=]
462 | "expected %d, wrote %li", rate, frames);
pcm-test.c:467:53: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has \
type ‘long int’ [-Wformat=]
467 | "expected %d, wrote %li", rate, frames);
Simple fix according to compiler's suggestion removed the warnings.
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230524191528.13203-1-mirsad.todorovac@alu.unizg.hr
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Fedor Pchelkin <pchelkin@ispras.ru> says:
The patch series fixes a possible racy use-after-free scenario
described in 2/2: if j1939_can_rx_register() fails then the concurrent
thread may have already read the invalid priv structure.
The 1/2 makes j1939_netdev_lock a mutex so that access to
j1939_can_rx_register() can be serialized without changing GFP_KERNEL
to GFP_ATOMIC inside can_rx_register(). This seems to be safe.
Note that the patch series has been tested only via Syzkaller and not
with a real device.
Link: https://lore.kernel.org/r/20230526171910.227615-1-pchelkin@ispras.ru
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
It turns out access to j1939_can_rx_register() needs to be serialized,
otherwise j1939_priv can be corrupted when parallel threads call
j1939_netdev_start() and j1939_can_rx_register() fails. This issue is
thoroughly covered in other commit which serializes access to
j1939_can_rx_register().
Change j1939_netdev_lock type to mutex so that we do not need to remove
GFP_KERNEL from can_rx_register().
j1939_netdev_lock seems to be used in normal contexts where mutex usage
is not prohibited.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 9d71dd0c70 ("can: add support of SAE J1939 protocol")
Suggested-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20230526171910.227615-2-pchelkin@ispras.ru
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch addresses an issue within the j1939_sk_send_loop_abort()
function in the j1939/socket.c file, specifically in the context of
Transport Protocol (TP) sessions.
Without this patch, when a TP session is initiated and a Clear To Send
(CTS) frame is received from the remote side requesting one data packet,
the kernel dispatches the first Data Transport (DT) frame and then waits
for the next CTS. If the remote side doesn't respond with another CTS,
the kernel aborts due to a timeout. This leads to the user-space
receiving an EPOLLERR on the socket, and the socket becomes active.
However, when trying to read the error queue from the socket with
sock.recvmsg(, , socket.MSG_ERRQUEUE), it returns -EAGAIN,
given that the socket is non-blocking. This situation results in an
infinite loop: the user-space repeatedly calls epoll(), epoll() returns
the socket file descriptor with EPOLLERR, but the socket then blocks on
the recv() of ERRQUEUE.
This patch introduces an additional check for the J1939_SOCK_ERRQUEUE
flag within the j1939_sk_send_loop_abort() function. If the flag is set,
it indicates that the application has subscribed to receive error queue
messages. In such cases, the kernel can communicate the current transfer
state via the error queue. This allows for the function to return early,
preventing the unnecessary setting of the socket into an error state,
and breaking the infinite loop. It is crucial to note that a socket
error is only needed if the application isn't using the error queue, as,
without it, the application wouldn't be aware of transfer issues.
Fixes: 9d71dd0c70 ("can: add support of SAE J1939 protocol")
Reported-by: David Jander <david@protonic.nl>
Tested-by: David Jander <david@protonic.nl>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20230526081946.715190-1-o.rempel@pengutronix.de
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Unlinked list recovery requires errors removing the inode the from
the unlinked list get fed back to the main recovery loop. Now that
we offload the unlinking to the inodegc work, we don't get errors
being fed back when we trip over a corruption that prevents the
inode from being removed from the unlinked list.
This means we never clear the corrupt unlinked list bucket,
resulting in runtime operations eventually tripping over it and
shutting down.
Fix this by collecting inodegc worker errors and feed them
back to the flush caller. This is largely best effort - the only
context that really cares is log recovery, and it only flushes a
single inode at a time so we don't need complex synchronised
handling. Essentially the inodegc workers will capture the first
error that occurs and the next flush will gather them and clear
them. The flush itself will only report the first gathered error.
In the cases where callers can return errors, propagate the
collected inodegc flush error up the error handling chain.
In the case of inode unlinked list recovery, there are several
superfluous calls to flush queued unlinked inodes -
xlog_recover_iunlink_bucket() guarantees that it has flushed the
inodegc and collected errors before it returns. Hence nothing in the
calling path needs to run a flush, even when an error is returned.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Bad things happen in defered extent freeing operations if it is
passed a bad block number in the xefi. This can come from a bogus
agno/agbno pair from deferred agfl freeing, or just a bad fsbno
being passed to __xfs_free_extent_later(). Either way, it's very
difficult to diagnose where a null perag oops in EFI creation
is coming from when the operation that queued the xefi has already
been completed and there's no longer any trace of it around....
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
If the agfl or the indexing in the AGF has been corrupted, getting a
block form the AGFL could return an invalid block number. If this
happens, bad things happen. Check the agbno we pull off the AGFL
and return -EFSCORRUPTED if we find somethign bad.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
When a v4 filesystem has fl_last - fl_first != fl_count, we do not
not detect the corruption and allow the AGF to be used as it if was
fully valid. On V5 filesystems, we reset the AGFL to empty in these
cases and avoid the corruption at a small cost of leaked blocks.
If we don't catch the corruption on V4 filesystems, bad things
happen later when an allocation attempts to trim the free list
and either double-frees stale entries in the AGFl or tries to free
NULLAGBNO entries.
Either way, this is bad. Prevent this from happening by using the
AGFL_NEED_RESET logic for v4 filesysetms, too.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
xfs_bmap_longest_free_extent() can return an error when accessing
the AGF fails. In this case, the behaviour of
xfs_filestream_pick_ag() is conditional on the error. We may
continue the loop, or break out of it. The error handling after the
loop cleans up the perag reference held when the break occurs. If we
continue, the next loop iteration handles cleaning up the perag
reference.
EIther way, we don't need to release the active perag reference when
xfs_bmap_longest_free_extent() fails. Doing so means we do a double
decrement on the active reference count, and this causes tha active
reference count to fall to zero. At this point, new active
references will fail.
This leads to unmount hanging because it tries to grab active
references to that perag, only for it to fail. This happens inside a
loop that retries until a inode tree radix tree tag is cleared,
which cannot happen because we can't get an active reference to the
perag.
The unmount livelocks in this path:
xfs_reclaim_inodes+0x80/0xc0
xfs_unmount_flush_inodes+0x5b/0x70
xfs_unmountfs+0x5b/0x1a0
xfs_fs_put_super+0x49/0x110
generic_shutdown_super+0x7c/0x1a0
kill_block_super+0x27/0x50
deactivate_locked_super+0x30/0x90
deactivate_super+0x3c/0x50
cleanup_mnt+0xc2/0x160
__cleanup_mnt+0x12/0x20
task_work_run+0x5e/0xa0
exit_to_user_mode_prepare+0x1bc/0x1c0
syscall_exit_to_user_mode+0x16/0x40
do_syscall_64+0x40/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Fixes: eb70aa2d8e ("xfs: use for_each_perag_wrap in xfs_filestream_pick_ag")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Commit 6bc6c99a944c was a well-intentioned effort to initiate
consolidation of adjacent bmbt mapping records by setting the PREEN
flag. Consolidation can only happen if the length of the combined
record doesn't overflow the 21-bit blockcount field of the bmbt
recordset. Unfortunately, the length test is inverted, leading to it
triggering on data forks like these:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..16777207]: 76110848..92888055 0 (76110848..92888055) 16777208
1: [16777208..20639743]: 92888056..96750591 0 (92888056..96750591) 3862536
Note that record 0 has a length of 16777208 512b blocks. This
corresponds to 2097151 4k fsblocks, which is the maximum. Hence the two
records cannot be merged.
However, the logic is still wrong even if we change the in-loop
comparison, because the scope of our examination isn't broad enough
inside the loop to detect mappings like this:
0: [0..9]: 76110838..76110847 0 (76110838..76110847) 10
1: [10..16777217]: 76110848..92888055 0 (76110848..92888055) 16777208
2: [16777218..20639753]: 92888056..96750591 0 (92888056..96750591) 3862536
These three records could be merged into two, but one cannot determine
this purely from looking at records 0-1 or 1-2 in isolation.
Hoist the mergability detection outside the loop, and base its decision
making on whether or not a merged mapping could be expressed in fewer
bmbt records. While we're at it, fix the incorrect return type of the
iter function.
Fixes: 336642f792 ("xfs: alert the user about data/attr fork mappings that could be merged")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
With gcc-5:
In file included from ./include/trace/define_trace.h:102:0,
from ./fs/xfs/scrub/trace.h:988,
from fs/xfs/scrub/trace.c:40:
./fs/xfs/./scrub/trace.h: In function ‘trace_raw_output_xchk_fsgate_class’:
./fs/xfs/scrub/scrub.h:111:28: error: initializer element is not constant
#define XREP_ALREADY_FIXED (1 << 31) /* checking our repair work */
^
Shifting the (signed) value 1 into the sign bit is undefined behavior.
Fix this for all definitions in the file by shifting "1U" instead of
"1".
This was exposed by the first user added in commit 466c525d6d
("xfs: minimize overhead of drain wakeups by using jump labels").
Fixes: 160b5a7845 ("xfs: hoist the already_fixed variable to the scrub context")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Lock order in XFS is AGI -> AGF, hence for operations involving
inode unlinked list operations we always lock the AGI first. Inode
unlinked list operations operate on the inode cluster buffer,
so the lock order there is AGI -> inode cluster buffer.
For O_TMPFILE operations, this now means the lock order set down in
xfs_rename and xfs_link is AGI -> inode cluster buffer -> AGF as the
unlinked ops are done before the directory modifications that may
allocate space and lock the AGF.
Unfortunately, we also now lock the inode cluster buffer when
logging an inode so that we can attach the inode to the cluster
buffer and pin it in memory. This creates a lock order of AGF ->
inode cluster buffer in directory operations as we have to log the
inode after we've allocated new space for it.
This creates a lock inversion between the AGF and the inode cluster
buffer. Because the inode cluster buffer is shared across multiple
inodes, the inversion is not specific to individual inodes but can
occur when inodes in the same cluster buffer are accessed in
different orders.
To fix this we need move all the inode log item cluster buffer
interactions to the end of the current transaction. Unfortunately,
xfs_trans_log_inode() calls are littered throughout the transactions
with no thought to ordering against other items or locking. This
makes it difficult to do anything that involves changing the call
sites of xfs_trans_log_inode() to change locking orders.
However, we do now have a mechanism that allows is to postpone dirty
item processing to just before we commit the transaction: the
->iop_precommit method. This will be called after all the
modifications are done and high level objects like AGI and AGF
buffers have been locked and modified, thereby providing a mechanism
that guarantees we don't lock the inode cluster buffer before those
high level objects are locked.
This change is largely moving the guts of xfs_trans_log_inode() to
xfs_inode_item_precommit() and providing an extra flag context in
the inode log item to track the dirty state of the inode in the
current transaction. This also means we do a lot less repeated work
in xfs_trans_log_inode() by only doing it once per transaction when
all the work is done.
Fixes: 298f7bec50 ("xfs: pin inode backing buffer to the inode log item")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
To fix a AGI-AGF-inode cluster buffer deadlock, we need to move
inode cluster buffer operations to the ->iop_precommit() method.
However, this means that deferred operations can require precommits
to be run on the final transaction that the deferred ops pass back
to xfs_trans_commit() context. This will be exposed by attribute
handling, in that the last changes to the inode in the attr set
state machine "disappear" because the precommit operation is not run.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
It was accidentally dropped when refactoring the allocation code,
resulting in the AG iteration always doing blocking AG iteration.
This results in a small performance regression for a specific fsmark
test that runs more user data writer threads than there are AGs.
Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes: 2edf06a50f ("xfs: factor xfs_alloc_vextent_this_ag() for _iterate_ags()")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
When a buffer is unpinned by xfs_buf_item_unpin(), we need to access
the buffer after we've dropped the buffer log item reference count.
This opens a window where we can have two racing unpins for the
buffer item (e.g. shutdown checkpoint context callback processing
racing with journal IO iclog completion processing) and both attempt
to access the buffer after dropping the BLI reference count. If we
are unlucky, the "BLI freed" context wins the race and frees the
buffer before the "BLI still active" case checks the buffer pin
count.
This results in a use after free that can only be triggered
in active filesystem shutdown situations.
To fix this, we need to ensure that buffer existence extends beyond
the BLI reference count checks and until the unpin processing is
complete. This implies that a buffer pin operation must also take a
buffer reference to ensure that the buffer cannot be freed until the
buffer unpin processing is complete.
Reported-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Currently, with VHE, KVM sets ER, CR, SW and EN bits of
PMUSERENR_EL0 to 1 on vcpu_load(), and saves and restores
the register value for the host on vcpu_load() and vcpu_put().
If the value of those bits are cleared on a pCPU with a vCPU
loaded (armv8pmu_start() would do that when PMU counters are
programmed for the guest), PMU access from the guest EL0 might
be trapped to the guest EL1 directly regardless of the current
PMUSERENR_EL0 value of the vCPU.
Fix this by not letting armv8pmu_start() overwrite PMUSERENR_EL0
on the pCPU where PMUSERENR_EL0 for the guest is loaded, and
instead updating the saved shadow register value for the host
so that the value can be restored on vcpu_put() later.
While vcpu_{put,load}() are manipulating PMUSERENR_EL0, disable
IRQs to prevent a race condition between these processes and IPIs
that attempt to update PMUSERENR_EL0 for the host EL0.
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Suggested-by: Marc Zyngier <maz@kernel.org>
Fixes: 83a7a4d643 ("arm64: perf: Enable PMU counter userspace access for perf event")
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230603025035.3781797-3-reijiw@google.com
Restore the host's PMUSERENR_EL0 value instead of clearing it,
before returning back to userspace, as the host's EL0 might have
a direct access to PMU registers (some bits of PMUSERENR_EL0 for
might not be zero for the host EL0).
Fixes: 83a7a4d643 ("arm64: perf: Enable PMU counter userspace access for perf event")
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230603025035.3781797-2-reijiw@google.com
Pull irq fix from Borislav Petkov:
- Fix open firmware quirks validation so that they don't get applied
wrongly
* tag 'irq_urgent_for_v6.4_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic: Correctly validate OF quirk descriptors
This patch fixes the following sparse warning:
net/sched/sch_api.c:2305:1: sparse: warning: symbol 'tc_skip_wrapper' was not declared. Should it be static?
No functional change intended.
Signed-off-by: Min-Hua Chen <minhuadotchen@gmail.com>
Acked-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Fang says:
====================
net: enetc: correct the statistics of rx bytes
The purpose of this patch set is to fix the issue of rx bytes
statistics. The first patch corrects the rx bytes statistics
of normal kernel protocol stack path, and the second patch is
used to correct the rx bytes statistics of XDP.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The rx_bytes statistics of XDP are always zero, because rx_byte_cnt
is not updated after it is initialized to 0. So fix it.
Fixes: d1b15102dd ("net: enetc: add support for XDP_DROP and XDP_PASS")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rx_bytes of struct net_device_stats should count the length of
ethernet frames excluding the FCS. However, there are two problems
with the rx_bytes statistics of the current enetc driver. one is
that the length of VLAN header is not counted if the VLAN extraction
feature is enabled. The other is that the length of L2 header is not
counted, because eth_type_trans() is invoked before updating rx_bytes
which will subtract the length of L2 header from skb->len.
BTW, the rx_bytes statistics of XDP path also have similar problem,
I will fix it in another patch.
Fixes: a800abd3ec ("net: enetc: move skb creation into enetc_build_skb")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull media fixes from Mauro Carvalho Chehab:
"Some driver fixes:
- a regression fix for the verisilicon driver
- uvcvideo: don't expose unsupported video formats to userspace
- camss-video: don't zero subdev format after init
- mediatek: some fixes for 4K decoder formats
- fix a Sphinx build warning (missing doc for client_caps)
- some fixes for imx and atomisp staging drivers
And two CEC core fixes:
- don't set last_initiator if TX in progress
- disable adapter in cec_devnode_unregister"
* tag 'media/v6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: uvcvideo: Don't expose unsupported formats to userspace
media: v4l2-subdev: Fix missing kerneldoc for client_caps
media: staging: media: imx: initialize hs_settle to avoid warning
media: v4l2-mc: Drop subdev check in v4l2_create_fwnode_links_to_pad()
media: staging: media: atomisp: init high & low vars
media: cec: core: don't set last_initiator if tx in progress
media: cec: core: disable adapter in cec_devnode_unregister
media: mediatek: vcodec: Only apply 4K frame sizes on decoder formats
media: camss: camss-video: Don't zero subdev format again after initialization
media: verisilicon: Additional fix for the crash when opening the driver
Linux 6.4-rc4
* tag 'v6.4-rc4': (606 commits)
Linux 6.4-rc4
cxl: Explicitly initialize resources when media is not ready
x86: re-introduce support for ERMS copies for user space accesses
NVMe: Add MAXIO 1602 to bogus nid list.
module: error out early on concurrent load of the same module file
x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platforms
cpufreq: amd-pstate: Update policy->cur in amd_pstate_adjust_perf()
io_uring: unlock sqd->lock before sq thread release CPU
MAINTAINERS: update arm64 Microchip entries
udplite: Fix NULL pointer dereference in __sk_mem_raise_allocated().
net: phy: mscc: enable VSC8501/2 RGMII RX clock
net: phy: mscc: remove unnecessary phydev locking
net: phy: mscc: add support for VSC8501
net: phy: mscc: add VSC8502 to MODULE_DEVICE_TABLE
net/handshake: Enable the SNI extension to work properly
net/handshake: Unpin sock->file if a handshake is cancelled
net/handshake: handshake_genl_notify() shouldn't ignore @flags
net/handshake: Fix uninitialized local variable
net/handshake: Fix handshake_dup() ref counting
net/handshake: Remove unneeded check from handshake_dup()
...
Pull char/misc driver fixes from Greg KH:
"Here are a bunch of tiny char/misc/other driver fixes for 6.4-rc5 that
resolve a number of reported issues. Included in here are:
- iio driver fixes
- fpga driver fixes
- test_firmware bugfixes
- fastrpc driver tiny bugfixes
- MAINTAINERS file updates for some subsystems
All of these have been in linux-next this past week with no reported
issues"
* tag 'char-misc-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (34 commits)
test_firmware: fix the memory leak of the allocated firmware buffer
test_firmware: fix a memory leak with reqs buffer
test_firmware: prevent race conditions by a correct implementation of locking
firmware_loader: Fix a NULL vs IS_ERR() check
MAINTAINERS: Vaibhav Gupta is the new ipack maintainer
dt-bindings: fpga: replace Ivan Bornyakov maintainership
MAINTAINERS: update Microchip MPF FPGA reviewers
misc: fastrpc: reject new invocations during device removal
misc: fastrpc: return -EPIPE to invocations on device removal
misc: fastrpc: Reassign memory ownership only for remote heap
misc: fastrpc: Pass proper scm arguments for secure map request
iio: imu: inv_icm42600: fix timestamp reset
iio: adc: ad_sigma_delta: Fix IRQ issue by setting IRQ_DISABLE_UNLAZY flag
dt-bindings: iio: adc: renesas,rcar-gyroadc: Fix adi,ad7476 compatible value
iio: dac: mcp4725: Fix i2c_master_send() return value handling
iio: accel: kx022a fix irq getting
iio: bu27034: Ensure reset is written
iio: dac: build ad5758 driver when AD5758 is selected
iio: addac: ad74413: fix resistance input processing
iio: light: vcnl4035: fixed chip ID check
...
The final production baseboard had a different chip select than
earlier prototype boards. When the newer board was released,
the SPI stopped working because the wrong pin was used in the device
tree and conflicted with the UART RTS. Fix the pinmux for
production boards.
Fixes: 36ca3c8ccb ("arm64: dts: imx: Add Beacon i.MX8M Nano development kit")
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Pull driver core fixes from Greg KH:
"Here are two small driver core cacheinfo fixes for 6.4-rc5 that
resolve a number of reported issues with that file. These changes have
been in linux-next this past week with no reported problems"
* tag 'driver-core-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
drivers: base: cacheinfo: Update cpu_map_populated during CPU Hotplug
drivers: base: cacheinfo: Fix shared_cpu_map changes in event of CPU hotplug
Pull tty/serial driver fixes from Greg KH:
"Here are some small tty/serial driver fixes for 6.4-rc5 that have all
been in linux-next this past week with no reported problems. Included
in here are:
- 8250_tegra driver bugfix
- fsl uart driver bugfixes
- Kconfig fix for dependancy issue
- dt-bindings fix for the 8250_omap driver"
* tag 'tty-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
dt-bindings: serial: 8250_omap: add rs485-rts-active-high
serial: cpm_uart: Fix a COMPILE_TEST dependency
soc: fsl: cpm1: Fix TSA and QMC dependencies in case of COMPILE_TEST
tty: serial: fsl_lpuart: use UARTCTRL_TXINV to send break instead of UARTCTRL_SBK
serial: 8250_tegra: Fix an error handling path in tegra_uart_probe()
Pull USB fixes from Greg KH:
"Here are some USB driver and core fixes for 6.4-rc5. Most of these are
tiny driver fixes, including:
- udc driver bugfix
- f_fs gadget driver bugfix
- cdns3 driver bugfix
- typec bugfixes
But the "big" thing in here is a fix yet-again for how the USB buffers
are handled from userspace when dealing with DMA issues. The changes
were discussed a lot, and tested a lot, on the list, and acked by the
relevant mm maintainers and have been in linux-next all this past week
with no reported problems"
* tag 'usb-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: tps6598x: Fix broken polling mode after system suspend/resume
mm: page_table_check: Ensure user pages are not slab pages
mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM
usb: usbfs: Use consistent mmap functions
usb: usbfs: Enforce page requirements for mmap
dt-bindings: usb: snps,dwc3: Fix "snps,hsphy_interface" type
usb: gadget: udc: fix NULL dereference in remove()
usb: gadget: f_fs: Add unbind event before functionfs_unbind
usb: cdns3: fix NCM gadget RX speed 20x slow than expection at iMX8QM
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Address some fallout of the locking rework, this time affecting the
way the vgic is configured
- Fix an issue where the page table walker frees a subtree and then
proceeds with walking what it has just freed...
- Check that a given PA donated to the guest is actually memory (only
affecting pKVM)
- Correctly handle MTE CMOs by Set/Way
- Fix the reported address of a watchpoint forwarded to userspace
- Fix the freeing of the root of stage-2 page tables
- Stop creating spurious PMU events to perform detection of the
default PMU and use the existing PMU list instead
x86:
- Fix a memslot lookup bug in the NX recovery thread that could
theoretically let userspace bypass the NX hugepage mitigation
- Fix a s/BLOCKING/PENDING bug in SVM's vNMI support
- Account exit stats for fastpath VM-Exits that never leave the super
tight run-loop
- Fix an out-of-bounds bug in the optimized APIC map code, and add a
regression test for the race"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: Add test for race in kvm_recalculate_apic_map()
KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds
KVM: x86: Account fastpath-only VM-Exits in vCPU stats
KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK
KVM: x86/mmu: Grab memslot for correct address space in NX recovery worker
KVM: arm64: Document default vPMU behavior on heterogeneous systems
KVM: arm64: Iterate arm_pmus list to probe for default PMU
KVM: arm64: Drop last page ref in kvm_pgtable_stage2_free_removed()
KVM: arm64: Populate fault info for watchpoint
KVM: arm64: Reload PTE after invoking walker callback on preorder traversal
KVM: arm64: Handle trap of tagged Set/Way CMOs
arm64: Add missing Set/Way CMO encodings
KVM: arm64: Prevent unconditional donation of unmapped regions from the host
KVM: arm64: vgic: Fix a comment
KVM: arm64: vgic: Fix locking comment
KVM: arm64: vgic: Wrap vgic_its_create() with config_lock
KVM: arm64: vgic: Fix a circular locking issue
Pull powerpc fixes from Michael Ellerman:
- Fix link errors in new aes-gcm-p10 code when built-in with other
drivers
- Limit number of TCEs passed to H_STUFF_TCE hcall as per spec
- Use KSYM_NAME_LEN in xmon array size to avoid possible OOB write
Thanks to Gaurav Batra and Maninder Singh Vishal Chourasia.
* tag 'powerpc-6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/xmon: Use KSYM_NAME_LEN in array size
powerpc/iommu: Limit number of TCEs to 512 for H_STUFF_TCE hcall
powerpc/crypto: Fix aes-gcm-p10 link errors
The nr_active counter continues to increase over time which causes the
blk_mq_get_tag to hang until the thread is rescheduled to a different
core despite there are still tags available.
kernel-stack
INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:inboundIOReacto state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000
Call Trace:
<TASK>
__schedule+0x351/0xa20
scheduler+0x5d/0xe0
io_schedule+0x42/0x70
blk_mq_get_tag+0x11a/0x2a0
? dequeue_task_stop+0x70/0x70
__blk_mq_alloc_requests+0x191/0x2e0
kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
__blk_mq_free_request being called.
320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
b'__blk_mq_free_request+0x1 [kernel]'
b'bt_iter+0x50 [kernel]'
b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
b'blk_mq_timeout_work+0x7c [kernel]'
b'process_one_work+0x1c4 [kernel]'
b'worker_thread+0x4d [kernel]'
b'kthread+0xe6 [kernel]'
b'ret_from_fork+0x1f [kernel]'
Signed-off-by: Tian Lan <tian.lan@twosigma.com>
Fixes: 2e315dc07d ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230513221227.497327-1-tilan7663@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
SMCRv1 has a similar issue to SMCRv2 (see link below) that may access
invalid MRs of RMBs when construct LLC ADD LINK CONT messages.
BUG: kernel NULL pointer dereference, address: 0000000000000014
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 5 PID: 48 Comm: kworker/5:0 Kdump: loaded Tainted: G W E 6.4.0-rc3+ #49
Workqueue: events smc_llc_add_link_work [smc]
RIP: 0010:smc_llc_add_link_cont+0x160/0x270 [smc]
RSP: 0018:ffffa737801d3d50 EFLAGS: 00010286
RAX: ffff964f82144000 RBX: ffffa737801d3dd8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff964f81370c30
RBP: ffffa737801d3dd4 R08: ffff964f81370000 R09: ffffa737801d3db0
R10: 0000000000000001 R11: 0000000000000060 R12: ffff964f82e70000
R13: ffff964f81370c38 R14: ffffa737801d3dd3 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff9652bfd40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000014 CR3: 000000008fa20004 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
smc_llc_srv_rkey_exchange+0xa7/0x190 [smc]
smc_llc_srv_add_link+0x3ae/0x5a0 [smc]
smc_llc_add_link_work+0xb8/0x140 [smc]
process_one_work+0x1e5/0x3f0
worker_thread+0x4d/0x2f0
? __pfx_worker_thread+0x10/0x10
kthread+0xe5/0x120
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2c/0x50
</TASK>
When an alernate RNIC is available in system, SMC will try to add a new
link based on the RNIC for resilience. All the RMBs in use will be mapped
to the new link. Then the RMBs' MRs corresponding to the new link will
be filled into LLC messages. For SMCRv1, they are ADD LINK CONT messages.
However smc_llc_add_link_cont() may mistakenly access to unused RMBs which
haven't been mapped to the new link and have no valid MRs, thus causing a
crash. So this patch fixes it.
Fixes: 87f88cda21 ("net/smc: rkey processing for a new link as SMC client")
Link: https://lore.kernel.org/r/1685101741-74826-3-git-send-email-guwen@linux.alibaba.com
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This resolves the issue that generated binary is showing up as an untracked git file after every build on the kernel.
Signed-off-by: Weihao Gao <weihaogao@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
KVM x86 fixes for 6.4
- Fix a memslot lookup bug in the NX recovery thread that could
theoretically let userspace bypass the NX hugepage mitigation
- Fix a s/BLOCKING/PENDING bug in SVM's vNMI support
- Account exit stats for fastpath VM-Exits that never leave the super
tight run-loop
- Fix an out-of-bounds bug in the optimized APIC map code, and add a
regression test for the race.
KVM/arm64 fixes for 6.4, take #3
- Fix the reported address of a watchpoint forwarded to userspace
- Fix the freeing of the root of stage-2 page tables
- Stop creating spurious PMU events to perform detection of the
default PMU and use the existing PMU list instead.
KVM/arm64 fixes for 6.4, take #2
- Address some fallout of the locking rework, this time affecting
the way the vgic is configured
- Fix an issue where the page table walker frees a subtree and
then proceeds with walking what it has just freed...
- Check that a given PA donated to the gues is actually memory
(only affecting pKVM)
- Correctly handle MTE CMOs by Set/Way
Pull SCSI fixes from James Bottomley:
"Five fixes, all in drivers.
The most extensive is the target change to fix the hang in the login
code, which involves changing timers from per login to per connection"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: stex: Fix gcc 13 warnings
scsi: qla2xxx: Fix NULL pointer dereference in target mode
scsi: target: iscsi: Prevent login threads from racing between each other
scsi: target: iscsi: Remove unused transport_timer
scsi: target: iscsi: Fix hang in the iSCSI login code
Pull LED fix from Johan Hovold:
"Here's a fix for a regression in 6.4-rc1 which broke the backlight on
machines such as the Lenovo ThinkPad X13s"
Acked-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/lkml/20230602091928.GR449117@google.com/
* tag 'leds-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/johan/linux:
leds: qcom-lpg: Fix PWM period limits
The introduction of high resolution PWM support changed the order of the
operations in the calculation of min and max period. The result in both
divisions is in most cases a truncation to 0, which limits the period to
the range of [0, 0].
Both numerators (and denominators) are within 64 bits, so the whole
expression can be put directly into the div64_u64, instead of doing it
partially.
Fixes: b00d2ed376 ("leds: rgb: leds-qcom-lpg: Add support for high resolution PWM")
Reviewed-by: Caleb Connolly <caleb.connolly@linaro.org>
Tested-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Acked-by: Lee Jones <lee@kernel.org>
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
Link: https://lore.kernel.org/r/20230515162604.649203-1-quic_bjorande@quicinc.com
Signed-off-by: Johan Hovold <johan@kernel.org>
Pull probes fixes from Masami Hiramatsu:
- Return NULL if the trace_probe list on trace_probe_event is empty
- selftests/ftrace: Choose testing symbol name for filtering feature
from sample data instead of fixed symbol
* tag 'probes-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
selftests/ftrace: Choose target function for filter test from samples
tracing/probe: trace_probe_primary_from_call(): checked list_first_entry
Raju Lakkaraju reported that the below commit caused a regression
with Lan743x drivers and a 2.5G SFP. Sadly, this is because the commit
was utterly wrong. Let's fix this properly by not moving the
linkmode_and(), but instead copying the link ksettings and then
modifying the advertising mask before passing the modified link
ksettings to phylib.
Fixes: df0acdc59b ("net: phylink: fix ksettings_set() ethtool call")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1q4eLm-00Ayxk-GZ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
skip_notify_on_dev_down ctl table expects this field
to be an int (4 bytes), not a bool (1 byte).
Because proc_dou8vec_minmax() was added in 5.13,
this patch converts skip_notify_on_dev_down to an int.
Following patch then converts the field to u8 and use proc_dou8vec_minmax().
Fixes: 7c6bb7d2fa ("net/ipv6: Add knob to skip DELROUTE message on device down")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Keep switching between LAPIC_MODE_X2APIC and LAPIC_MODE_DISABLED during
APIC map construction to hunt for TOCTOU bugs in KVM. KVM's optimized map
recalc makes multiple passes over the list of vCPUs, and the calculations
ignore vCPU's whose APIC is hardware-disabled, i.e. there's a window where
toggling LAPIC_MODE_DISABLED is quite interesting.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230602233250.1014316-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Bail from kvm_recalculate_phys_map() and disable the optimized map if the
target vCPU's x2APIC ID is out-of-bounds, i.e. if the vCPU was added
and/or enabled its local APIC after the map was allocated. This fixes an
out-of-bounds access bug in the !x2apic_format path where KVM would write
beyond the end of phys_map.
Check the x2APIC ID regardless of whether or not x2APIC is enabled,
as KVM's hardcodes x2APIC ID to be the vCPU ID, i.e. it can't change, and
the map allocation in kvm_recalculate_apic_map() doesn't check for x2APIC
being enabled, i.e. the check won't get false postivies.
Note, this also affects the x2apic_format path, which previously just
ignored the "x2apic_id > new->max_apic_id" case. That too is arguably a
bug fix, as ignoring the vCPU meant that KVM would not send interrupts to
the vCPU until the next map recalculation. In practice, that "bug" is
likely benign as a newly present vCPU/APIC would immediately trigger a
recalc. But, there's no functional downside to disabling the map, and
a future patch will gracefully handle the -E2BIG case by retrying instead
of simply disabling the optimized map.
Opportunistically add a sanity check on the xAPIC ID size, along with a
comment explaining why the xAPIC ID is guaranteed to be "good".
Reported-by: Michal Luczaj <mhal@rbox.co>
Fixes: 5b84b02917 ("KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230602233250.1014316-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Rhys Rustad-Elliott says:
====================
Commit d937bc3449 ("bpf: make uniform use of array->elem_size
everywhere in arraymap.c") changed array_map_gen_lookup to use
array->elem_size instead of round_up(map->value_size, 8) as the element
size when generating code to access a value in an array map.
array->elem_size, however, is not set by bpf_map_meta_alloc when
initializing an BPF_MAP_TYPE_ARRAY_OF_MAPS or BPF_MAP_TYPE_HASH_OF_MAPS.
This results in array_map_gen_lookup incorrectly outputting code that
always accesses index 0 in the array (as the index will be calculated
via a multiplication with the element size, which is incorrectly set to
0).
This patchset sets elem_size on the bpf_array object when allocating an
array or hash of maps to fix this and adds a selftest that accesses an
array map nested within a hash of maps at a nonzero index to prevent
regressions.
v1: https://lore.kernel.org/bpf/95b5da7c-ee52-3ecb-0a4e-f6a7a114f269@linux.dev/
Changelog:
v1 -> v2:
Address comments by Martin KaFai Lau:
- Directly use inner_array->elem_size instead of using round_up
- Move selftests to a new patch
- Use ASSERT_* macros instead of CHECK and remove duration
- Remove unnecessary usleep
- Shorten selftest name
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The call to startup_64_setup_env() will install a new GDT but does not
actually switch to using the KERNEL_CS entry until returning from the
function call.
Commit bcce829083 ("x86/sev: Detect/setup SEV/SME features earlier in
boot") moved the call to sme_enable() earlier in the boot process and in
between the call to startup_64_setup_env() and the switch to KERNEL_CS.
An SEV-ES or an SEV-SNP guest will trigger #VC exceptions during the call
to sme_enable() and if the CS pushed on the stack as part of the exception
and used by IRETQ is not mapped by the new GDT, then problems occur.
Today, the current CS when entering startup_64 is the kernel CS value
because it was set up by the decompressor code, so no issue is seen.
However, a recent patchset that looked to avoid using the legacy
decompressor during an EFI boot exposed this bug. At entry to startup_64,
the CS value is that of EFI and is not mapped in the new kernel GDT. So
when a #VC exception occurs, the CS value used by IRETQ is not valid and
the guest boot crashes.
Fix this issue by moving the block that switches to the KERNEL_CS value to
be done immediately after returning from startup_64_setup_env().
Fixes: bcce829083 ("x86/sev: Detect/setup SEV/SME features earlier in boot")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/all/6ff1f28af2829cc9aea357ebee285825f90a431f.1684340801.git.thomas.lendacky%40amd.com
Increment vcpu->stat.exits when handling a fastpath VM-Exit without
going through any part of the "slow" path. Not bumping the exits stat
can result in wildly misleading exit counts, e.g. if the primary reason
the guest is exiting is to program the TSC deadline timer.
Fixes: 404d5d7bff ("KVM: X86: Introduce more exit_fastpath_completion enum values")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230602011920.787844-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware
I noticed that with vCPU count large enough (> 16) they sometimes froze at
boot.
With vCPU count of 64 they never booted successfully - suggesting some kind
of a race condition.
Since adding "vnmi=0" module parameter made these guests boot successfully
it was clear that the problem is most likely (v)NMI-related.
Running kvm-unit-tests quickly showed failing NMI-related tests cases, like
"multiple nmi" and "pending nmi" from apic-split, x2apic and xapic tests
and the NMI parts of eventinj test.
The issue was that once one NMI was being serviced no other NMI was allowed
to be set pending (NMI limit = 0), which was traced to
svm_is_vnmi_pending() wrongly testing for the "NMI blocked" flag rather
than for the "NMI pending" flag.
Fix this by testing for the right flag in svm_is_vnmi_pending().
Once this is done, the NMI-related kvm-unit-tests pass successfully and
the Windows guest no longer freezes at boot.
Fixes: fa4c027a79 ("KVM: x86: Add support for SVM's Virtual NMI")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/be4ca192eb0c1e69a210db3009ca984e6a54ae69.1684495380.git.maciej.szmigiero@oracle.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Factor in the address space (non-SMM vs. SMM) of the target shadow page
when recovering potential NX huge pages, otherwise KVM will retrieve the
wrong memslot when zapping shadow pages that were created for SMM. The
bug most visibly manifests as a WARN on the memslot being non-NULL, but
the worst case scenario is that KVM could unaccount the shadow page
without ensuring KVM won't install a huge page, i.e. if the non-SMM slot
is being dirty logged, but the SMM slot is not.
------------[ cut here ]------------
WARNING: CPU: 1 PID: 3911 at arch/x86/kvm/mmu/mmu.c:7015
kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm]
CPU: 1 PID: 3911 Comm: kvm-nx-lpage-re
RIP: 0010:kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm]
RSP: 0018:ffff99b284f0be68 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff99b284edd000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff9271397024e0 R08: 0000000000000000 R09: ffff927139702450
R10: 0000000000000000 R11: 0000000000000001 R12: ffff99b284f0be98
R13: 0000000000000000 R14: ffff9270991fcd80 R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff927f9f640000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0aacad3ae0 CR3: 000000088fc2c005 CR4: 00000000003726e0
Call Trace:
<TASK>
__pfx_kvm_nx_huge_page_recovery_worker+0x10/0x10 [kvm]
kvm_vm_worker_thread+0x106/0x1c0 [kvm]
kthread+0xd9/0x100
ret_from_fork+0x2c/0x50
</TASK>
---[ end trace 0000000000000000 ]---
This bug was exposed by commit edbdb43fc9 ("KVM: x86: Preserve TDP MMU
roots until they are explicitly invalidated"), which allowed KVM to retain
SMM TDP MMU roots effectively indefinitely. Before commit edbdb43fc9,
KVM would zap all SMM TDP MMU roots and thus all SMM TDP MMU shadow pages
once all vCPUs exited SMM, which made the window where this bug (recovering
an SMM NX huge page) could be encountered quite tiny. To hit the bug, the
NX recovery thread would have to run while at least one vCPU was in SMM.
Most VMs typically only use SMM during boot, and so the problematic shadow
pages were gone by the time the NX recovery thread ran.
Now that KVM preserves TDP MMU roots until they are explicitly invalidated
(e.g. by a memslot deletion), the window to trigger the bug is effectively
never closed because most VMMs don't delete memslots after boot (except
for a handful of special scenarios).
Fixes: eb29860570 ("KVM: x86/mmu: Do not recover dirty-tracked NX Huge Pages")
Reported-by: Fabio Coatti <fabio.coatti@gmail.com>
Closes: https://lore.kernel.org/all/CADpTngX9LESCdHVu_2mQkNGena_Ng2CphWNwsRGSMxzDsTjU2A@mail.gmail.com
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230602010137.784664-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Commit d937bc3449 ("bpf: make uniform use of array->elem_size
everywhere in arraymap.c") changed array_map_gen_lookup to use
array->elem_size instead of round_up(map->value_size, 8) as the element
size when generating code to access a value in an array map.
array->elem_size, however, is not set by bpf_map_meta_alloc when
initializing an BPF_MAP_TYPE_ARRAY_OF_MAPS or BPF_MAP_TYPE_HASH_OF_MAPS.
This results in array_map_gen_lookup incorrectly outputting code that
always accesses index 0 in the array (as the index will be calculated
via a multiplication with the element size, which is incorrectly set to
0).
Set elem_size on the bpf_array object when allocating an array or hash
of maps to fix this.
Fixes: d937bc3449 ("bpf: make uniform use of array->elem_size everywhere in arraymap.c")
Signed-off-by: Rhys Rustad-Elliott <me@rhysre.net>
Link: https://lore.kernel.org/r/20230602190110.47068-2-me@rhysre.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
With commit 858e8b792d ("tpm, tpm_tis: Avoid cache incoherency in test
for interrupts") bit accessor functions are used to access flags in
tpm_tis_data->flags.
However these functions expect bit numbers, while the flags are defined
as bit masks in enum tpm_tis_flag.
Fix this inconsistency by using numbers instead of masks also for the
flags in the enum.
Reported-by: Pavel Machek <pavel@denx.de>
Fixes: 858e8b792d ("tpm, tpm_tis: Avoid cache incoherency in test for interrupts")
Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Cc: stable@vger.kernel.org
Reviewed-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull ext4 fix from Ted Ts'o:
"Fix an ext4 regression which landed during the 6.4 merge window"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
Revert "ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits"
Pull btrfs fix from David Sterba:
"One regression fix.
The rewrite of scrub code in 6.4 broke device replace in zoned mode,
some of the writes could happen out of order so this had to be
adjusted for all cases"
* tag 'for-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zoned: fix dev-replace after the scrub rework
This reverts commit 32c0869370.
The reverted commit was intended to remove a dead check however it was observed
that this check was actually being used to exit early instead of looping
sbi->s_mb_max_to_scan times when we are able to find a free extent bigger than
the goal extent. Due to this, a my performance tests (fsmark, parallel file
writes in a highly fragmented FS) were seeing a 2x-3x regression.
Example, the default value of the following variables is:
sbi->s_mb_max_to_scan = 200
sbi->s_mb_min_to_scan = 10
In ext4_mb_check_limits() if we find an extent smaller than goal, then we return
early and try again. This loop will go on until we have processed
sbi->s_mb_max_to_scan(=200) number of free extents at which point we exit and
just use whatever we have even if it is smaller than goal extent.
Now, the regression comes when we find an extent bigger than goal. Earlier, in
this case we would loop only sbi->s_mb_min_to_scan(=10) times and then just use
the bigger extent. However with commit 32c08693 that check was removed and hence
we would loop sbi->s_mb_max_to_scan(=200) times even though we have a big enough
free extent to satisfy the request. The only time we would exit early would be
when the free extent is *exactly* the size of our goal, which is pretty uncommon
occurrence and so we would almost always end up looping 200 times.
Hence, revert the commit by adding the check back to fix the regression. Also
add a comment to outline this policy.
Fixes: 32c0869370 ("ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits")
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/ddcae9658e46880dfec2fb0aa61d01fb3353d202.1685449706.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When the uvcvideo driver encounters a format descriptor with an unknown
format GUID, it creates a corresponding struct uvc_format instance with
the fcc field set to 0. Since commit 50459f103e ("media: uvcvideo:
Remove format descriptions"), the driver relies on the V4L2 core to
provide the format description string, which the V4L2 core can't do
without a valid 4CC. This triggers a WARN_ON.
As a format with a zero 4CC can't be selected, it is unusable for
applications. Ignore the format completely without creating a uvc_format
instance, which fixes the warning.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217252
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2180107
Fixes: 50459f103e ("media: uvcvideo: Remove format descriptions")
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Pull RISC-V fixes from Palmer Dabbelt:
- A build warning fix for BUILTIN_DTB=y
- Hibernation support is hidden behind NONPORTABLE, as it depends on
some undocumented early boot behavior and breaks on most platforms
- A fix for relocatable kernels on systems with early boot errata
- A fix to properly handle perf callchains for kernel tracepoints
- A pair of fixes for NAPOT to avoid inconsistencies between PTEs and
handle hardware that sets arbitrary A/D bits
* tag 'riscv-for-linus-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Implement missing huge_ptep_get
riscv: Fix huge_ptep_set_wrprotect when PTE is a NAPOT
riscv: perf: Fix callchain parse error with kernel tracepoint events
riscv: Fix relocatable kernels with early alternatives using -fno-pie
RISC-V: mark hibernation as nonportable
riscv: Fix unused variable warning when BUILTIN_DTB is set
Initialize hs_settle to 0 to avoid this compiler warning:
imx8mq-mipi-csi2.c: In function 'imx8mq_mipi_csi_start_stream.part.0':
imx8mq-mipi-csi2.c:91:55: warning: 'hs_settle' may be used uninitialized [-Wmaybe-uninitialized]
91 | #define GPR_CSI2_1_S_PRG_RXHS_SETTLE(x) (((x) & 0x3f) << 2)
| ^~
imx8mq-mipi-csi2.c:357:13: note: 'hs_settle' was declared here
357 | u32 hs_settle;
| ^~~~~~~~~
It's a false positive, but it is too complicated for the compiler to detect that.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: Martin Kepplinger <martink@posteo.de>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
While updating v4l2_create_fwnode_links_to_pad() to accept non-subdev
sinks, the check is_media_entity_v4l2_subdev() was not removed which
prevented the function from being used with non-subdev sinks, Drop the
unnecessary check.
Fixes: bd5a03bc5b ("media: Accept non-subdev sinks in v4l2_create_fwnode_links_to_pad()")
Signed-off-by: Vaishnav Achath <vaishnav.a@ti.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Pull nfsd fixes from Chuck Lever:
- Two minor bug fixes
* tag 'nfsd-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix double fget() bug in __write_ports_addfd()
nfsd: make a copy of struct iattr before calling notify_change
This patch add the validation for smb request protocol id.
If it is not one of the four ids(SMB1_PROTO_NUMBER, SMB2_PROTO_NUMBER,
SMB2_TRANSFORM_PROTO_NUM, SMB2_COMPRESSION_TRANSFORM_ID), don't allow
processing the request. And this will fix the following KASAN warning
also.
[ 13.905265] BUG: KASAN: slab-out-of-bounds in init_smb2_rsp_hdr+0x1b9/0x1f0
[ 13.905900] Read of size 16 at addr ffff888005fd2f34 by task kworker/0:2/44
...
[ 13.908553] Call Trace:
[ 13.908793] <TASK>
[ 13.908995] dump_stack_lvl+0x33/0x50
[ 13.909369] print_report+0xcc/0x620
[ 13.910870] kasan_report+0xae/0xe0
[ 13.911519] kasan_check_range+0x35/0x1b0
[ 13.911796] init_smb2_rsp_hdr+0x1b9/0x1f0
[ 13.912492] handle_ksmbd_work+0xe5/0x820
Cc: stable@vger.kernel.org
Reported-by: Chih-Yen Chang <cc85nod@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
The length field of netbios header must be greater than the SMB header
sizes(smb1 or smb2 header), otherwise the packet is an invalid SMB packet.
If `pdu_size` is 0, ksmbd allocates a 4 bytes chunk to `conn->request_buf`.
In the function `get_smb2_cmd_val` ksmbd will read cmd from
`rcv_hdr->Command`, which is `conn->request_buf + 12`, causing the KASAN
detector to print the following error message:
[ 7.205018] BUG: KASAN: slab-out-of-bounds in get_smb2_cmd_val+0x45/0x60
[ 7.205423] Read of size 2 at addr ffff8880062d8b50 by task ksmbd:42632/248
...
[ 7.207125] <TASK>
[ 7.209191] get_smb2_cmd_val+0x45/0x60
[ 7.209426] ksmbd_conn_enqueue_request+0x3a/0x100
[ 7.209712] ksmbd_server_process_request+0x72/0x160
[ 7.210295] ksmbd_conn_handler_loop+0x30c/0x550
[ 7.212280] kthread+0x160/0x190
[ 7.212762] ret_from_fork+0x1f/0x30
[ 7.212981] </TASK>
Cc: stable@vger.kernel.org
Reported-by: Chih-Yen Chang <cc85nod@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Dan reported the following error message:
fs/smb/server/smbacl.c:1296 smb_check_perm_dacl()
error: 'posix_acls' dereferencing possible ERR_PTR()
fs/smb/server/vfs.c:1323 ksmbd_vfs_make_xattr_posix_acl()
error: 'posix_acls' dereferencing possible ERR_PTR()
fs/smb/server/vfs.c:1830 ksmbd_vfs_inherit_posix_acl()
error: 'acls' dereferencing possible ERR_PTR()
__get_acl() returns a mix of error pointers and NULL. This change it
with IS_ERR_OR_NULL().
Fixes: e2f34481b2 ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
This bug is in parse_lease_state, and it is caused by the missing check
of `struct create_context`. When the ksmbd traverses the create_contexts,
it doesn't check if the field of `NameOffset` and `Next` is valid,
The KASAN message is following:
[ 6.664323] BUG: KASAN: slab-out-of-bounds in parse_lease_state+0x7d/0x280
[ 6.664738] Read of size 2 at addr ffff888005c08988 by task kworker/0:3/103
...
[ 6.666644] Call Trace:
[ 6.666796] <TASK>
[ 6.666933] dump_stack_lvl+0x33/0x50
[ 6.667167] print_report+0xcc/0x620
[ 6.667903] kasan_report+0xae/0xe0
[ 6.668374] kasan_check_range+0x35/0x1b0
[ 6.668621] parse_lease_state+0x7d/0x280
[ 6.668868] smb2_open+0xbe8/0x4420
[ 6.675137] handle_ksmbd_work+0x282/0x820
Use smb2_find_context_vals() to find smb2 create request lease context.
smb2_find_context_vals validate create context fields.
Cc: stable@vger.kernel.org
Reported-by: Chih-Yen Chang <cc85nod@gmail.com>
Tested-by: Chih-Yen Chang <cc85nod@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
The check in the beginning is
`clen + sizeof(struct smb2_neg_context) <= len_of_ctxts`,
but in the end of loop, `len_of_ctxts` will subtract
`((clen + 7) & ~0x7) + sizeof(struct smb2_neg_context)`, which causes
integer underflow when clen does the 8 alignment. We should use
`(clen + 7) & ~0x7` in the check to avoid underflow from happening.
Then there are some variables that need to be declared unsigned
instead of signed.
[ 11.671070] BUG: KASAN: slab-out-of-bounds in smb2_handle_negotiate+0x799/0x1610
[ 11.671533] Read of size 2 at addr ffff888005e86cf2 by task kworker/0:0/7
...
[ 11.673383] Call Trace:
[ 11.673541] <TASK>
[ 11.673679] dump_stack_lvl+0x33/0x50
[ 11.673913] print_report+0xcc/0x620
[ 11.674671] kasan_report+0xae/0xe0
[ 11.675171] kasan_check_range+0x35/0x1b0
[ 11.675412] smb2_handle_negotiate+0x799/0x1610
[ 11.676217] ksmbd_smb_negotiate_common+0x526/0x770
[ 11.676795] handle_ksmbd_work+0x274/0x810
...
Cc: stable@vger.kernel.org
Signed-off-by: Chih-Yen Chang <cc85nod@gmail.com>
Tested-by: Chih-Yen Chang <cc85nod@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
When task local storage was generalized for tracing programs, the
bpf_task_local_storage callback was moved from a BPF LSM hook
callback for security_task_free LSM hook to it's own callback. But a
failure case in bad_fork_cleanup_security was missed which, when
triggered, led to a dangling task owner pointer and a subsequent
use-after-free. Move the bpf_task_storage_free to the very end of
free_task to handle all failure cases.
This issue was noticed when a BPF LSM program was attached to the
task_alloc hook on a kernel with KASAN enabled. The program used
bpf_task_storage_get to copy the task local storage from the current
task to the new task being created.
Fixes: a10787e6d5 ("bpf: Enable task local storage for tracing programs")
Reported-by: Kuba Piecuch <jpiecuch@google.com>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230602002612.1117381-1-kpsingh@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Pull block fixes from Jens Axboe:
"Just an NVMe pull request with (mostly) KATO fixes, a regression fix
for zoned device revalidation, and a fix for an md raid5 regression"
* tag 'block-6.4-2023-06-02' of git://git.kernel.dk/linux:
nvme: fix the name of Zone Append for verbose logging
nvme: improve handling of long keep alives
nvme: check IO start time when deciding to defer KA
nvme: double KA polling frequency to avoid KATO with TBKAS on
nvme: fix miss command type check
block: fix revalidate performance regression
md/raid5: fix miscalculation of 'end_sector' in raid5_read_one_chunk()
Pull io_uring fix from Jens Axboe:
"Just a single revert in here, removing the warning on the epoll ctl
opcode.
We originally deprecated this a few releases ago, but I've since had
two people report that it's being used. Which isn't the biggest deal,
obviously this is why we out in the deprecation notice in the first
place, but it also means that we should just kill this warning again
and abandon the deprecation plans.
Since it's only a few handfuls of code to support epoll ctl, not worth
going any further with this imho"
* tag 'io_uring-6.4-2023-06-02' of git://git.kernel.dk/linux:
io_uring: undeprecate epoll_ctl support
Commit ac4e97abce ("scatterlist: sg_set_buf() argument must be in linear
mapping") checks that both the signature and the digest reside in the
linear mapping area.
However, more recently commit ba14a194a4 ("fork: Add generic vmalloced
stack support") made it possible to move the stack in the vmalloc area,
which is not contiguous, and thus not suitable for sg_set_buf() which needs
adjacent pages.
Always make a copy of the signature and digest in the same buffer used to
store the key and its parameters, and pass them to sg_init_one(). Prefer it
to conditionally doing the copy if necessary, to keep the code simple. The
buffer allocated with kmalloc() is in the linear mapping area.
Cc: stable@vger.kernel.org # 4.9.x
Fixes: ba14a194a4 ("fork: Add generic vmalloced stack support")
Link: https://lore.kernel.org/linux-integrity/Y4pIpxbjBdajymBJ@sol.localdomain/
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Tested-by: Stefan Berger <stefanb@linux.ibm.com>
When reading the arm64's PER_VMA_LOCK support code, I found a bit
difference between arm64 and other arch when calling handle_mm_fault()
during VMA lock-based page fault handling: the fault address is masked
before passing to handle_mm_fault(). This is also different from the
usage in mmap_lock-based handling. I think we need to pass the
original fault address to handle_mm_fault() as we did in
commit 84c5e23ede ("arm64: mm: Pass original fault address to
handle_mm_fault()").
If we go through the code path further, we can find that the "masked"
fault address can cause mismatched fault address between perf sw
major/minor page fault sw event and perf page fault sw event:
do_page_fault
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, ..., addr) // orig addr
handle_mm_fault
mm_account_fault
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, ...) // masked addr
Fixes: cd7f176aea ("arm64/mm: try VMA lock-based page fault handling first")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230524131305.2808-1-jszhang@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Commit 34affcd757 ("arm64: drop ranges in definition of
ARCH_FORCE_MAX_ORDER") dropped the ranges from the config entry and
introduced an EXPERT condition on the input prompt instead.
However, starting with defconfig (ARCH_FORCE_MAX_ORDER of 10) and
setting ARM64_64K_PAGES together with EXPERT leaves MAX_ORDER 10 which
fails to build in this configuration.
Drop the input prompt for ARCH_FORCE_MAX_ORDER completely so that it's
no longer configurable. People requiring a higher MAX_ORDER should send
a patch changing the default, together with proper justification.
Fixes: 34affcd757 ("arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDER")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Justin M. Forbes <jforbes@fedoraproject.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Link: https://lore.kernel.org/r/20230519171440.1941213-1-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
There's an issue on SAI synchronous mode that TX/RX side can't get BCLK
from RX/TX it sync with if BYP bit is asserted. It's a workaround to
fix it that enable SION of IOMUX pad control and assert BCI.
For example if TX sync with RX which means both TX and RX are using clk
form RX and BYP=1. TX can get BCLK only if the following two conditions
are valid:
1. SION of RX BCLK IOMUX pad is set to 1
2. BCI of TX is set to 1
Signed-off-by: Chancel Liu <chancel.liu@nxp.com>
Acked-by: Shengjiu Wang <shengjiu.wang@gmail.com>
Link: https://lore.kernel.org/r/20230530103012.3448838-1-chancel.liu@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The code in asoc_simple_startup was treating any non-zero return from
snd_pcm_hw_constraint_minmax as an error, when this can return 1 in some
normal cases and only negative values indicate an error.
When this happened, it caused asoc_simple_startup to disable the clocks
it just enabled and return 1, which was not treated as an error by the
calling code which only checks for negative return values. Then when the
PCM is eventually shut down, it causes the clock framework to complain
about disabling clocks that were not enabled.
Fix the check for snd_pcm_hw_constraint_minmax return value to only
treat negative values as an error.
Fixes: 5ca2ab4598 ("ASoC: simple-card-utils: Add new system-clock-fixed flag")
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Link: https://lore.kernel.org/r/20230602011936.231931-1-robert.hancock@calian.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull drm fixes from Dave Airlie:
"Quiet enough week, though the misc fixes tree didn't get to me when I
was sending this, so maybe it'll be a bit bigger next week, just one
i915 fix and some scattered amdgpu fixes:
amdgpu:
- Fix mclk and fclk output ordering on some APUs
- Fix display regression with 5K VRR
- VCN, JPEG spurious interrupt warning fixes
- Fix SI DPM on some ARM64 platforms
- Fix missing TMZ enablement on GC 11.0.1
i915:
- Fix for OA reporting to allow detecting non-power-of-two reports"
* tag 'drm-fixes-2023-06-02' of git://anongit.freedesktop.org/drm/drm:
drm/i915/perf: Clear out entire reports after reading if not power of 2 size
drm/amdgpu: enable tmz by default for GC 11.0.1
drm/amd/pm: resolve reboot exception for si oland
drm/amdgpu: add RAS POISON interrupt funcs for jpeg_v4_0
drm/amdgpu: add RAS POISON interrupt funcs for jpeg_v2_6
drm/amdgpu: separate ras irq from jpeg instance irq for UVD_POISON
drm/amdgpu: add RAS POISON interrupt funcs for vcn_v4_0
drm/amdgpu: add RAS POISON interrupt funcs for vcn_v2_6
drm/amdgpu: separate ras irq from vcn instance irq for UVD_POISON
Revert "drm/amd/display: Do not set drr on pipe commit"
Revert "drm/amd/display: Block optimize on consecutive FAMS enables"
drm/amd/pm: reverse mclk and fclk clocks levels for renoir
drm/amd/pm: reverse mclk and fclk clocks levels for vangogh
drm/amd/pm: reverse mclk and fclk clocks levels for yellow carp
drm/amd/pm: reverse mclk clocks levels for SMU v13.0.5
drm/amd/pm: reverse mclk and fclk clocks levels for SMU v13.0.4
Pull selinux fix from Paul Moore:
"A small SELinux Makefile fix to resolve a problem seen when building
the kernel with older versions of make.
The fix is pretty trivial and effectively reverts a patch that was
merged during the last merge window"
* tag 'selinux-pr-20230601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: don't use make's grouped targets feature yet
Without LED triggers, the driver now fails to build:
drivers/net/dsa/qca/qca8k-leds.c: In function 'qca8k_parse_port_leds':
drivers/net/dsa/qca/qca8k-leds.c:403:31: error: 'struct led_classdev' has no member named 'hw_control_is_supported'
403 | port_led->cdev.hw_control_is_supported = qca8k_cled_hw_control_is_supported;
| ^
There is a mix of 'depends on' and 'select' for LEDS_TRIGGERS, so it's
not clear what we should use here, but in general using 'depends on'
causes fewer problems, so use that.
Fixes: e0256648c8 ("net: dsa: qca8k: implement hw_control ops")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this commit, all the GIDs ("0 4294967294") can be written to the
"net.ipv4.ping_group_range" sysctl.
Note that 4294967295 (0xffffffff) is an invalid GID (see gid_valid() in
include/linux/uidgid.h), and an attempt to register this number will cause
-EINVAL.
Prior to this commit, only up to GID 2147483647 could be covered.
Documentation/networking/ip-sysctl.rst had "0 4294967295" as an example
value, but this example was wrong and causing -EINVAL.
Fixes: c319b4d76b ("net: ipv4: add IPPROTO_ICMP socket kind")
Co-developed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the blamed commit, the member key is longer 4-byte aligned. On
platforms that do not support unaligned access, e.g., MIPS32R2 with
unaligned_action set to 1, this will trigger a crash when accessing
an IPv6 pneigh_entry, as the key is cast to an in6_addr pointer.
Change the type of the key to u32 to make it aligned.
Fixes: 62dd93181a ("[IPV6] NDISC: Set per-entry is_router flag in Proxy NA.")
Signed-off-by: Qingfang DENG <qingfang.deng@siflower.com.cn>
Link: https://lore.kernel.org/r/20230601015432.159066-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull modules fix from Luis Chamberlain:
"A zstd fix by lucas as he tested zstd decompression support"
* tag 'modules-6.4-rc5-second-pull' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
module/decompress: Fix error checking on zstd decompression
Pull EFI fixes from Ard Biesheuvel:
"A few minor fixes for EFI, one of which fixes the reported boot
regression when booting x86 kernels using the BIOS based loader built
into the hypervisor framework on macOS.
- fix harmless warning in zboot code on 'make clean'
- add some missing prototypes
- fix boot regressions triggered by PE/COFF header image minor
version bump"
* tag 'efi-fixes-for-v6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: Bump stub image version for macOS HVF compatibility
efi: fix missing prototype warnings
efi/libstub: zboot: Avoid eager evaluation of objcopy flags
After commit 6d758147c7 ("RDMA/bnxt_re: Use auxiliary driver interface")
the active_{speed, width} attributes are reported incorrectly, This is
happening because ib_get_eth_speed() is called only once from
bnxt_re_ib_init() - Fix this issue by calling ib_get_eth_speed() from
bnxt_re_query_port().
Fixes: 6d758147c7 ("RDMA/bnxt_re: Use auxiliary driver interface")
Link: https://lore.kernel.org/r/20230529153525.87254-1-kheib@redhat.com
Signed-off-by: Kamal Heib <kheib@redhat.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Pull fbdev fixes from Helge Deller:
"Most notable is a fix for a null-ptr-deref in fbcon's soft_cursor
function which was found by syzbot.
- Fix null-ptr-deref in soft_cursor
- various remove callback conversions
- error path fixes in imsttfb"
* tag 'fbdev-for-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: bw2: Convert to platform remove callback returning void
fbdev: broadsheetfb: Convert to platform remove callback returning void
fbdev: au1200fb: Convert to platform remove callback returning void
fbdev: au1100fb: Convert to platform remove callback returning void
fbdev: arcfb: Convert to platform remove callback returning void
fbdev: au1100fb: Drop if with an always false condition
fbcon: Fix null-ptr-deref in soft_cursor
fbdev: imsttfb: Fix error path of imsttfb_probe()
fbdev: imsttfb: Release framebuffer and dealloc cmap on error path
fbdev: matroxfb ssd1307fb: Switch i2c drivers back to use .probe()
While implementing support for in-kernel decompression in kmod,
finit_module() was returning a very suspicious value:
finit_module(3, "", MODULE_INIT_COMPRESSED_FILE) = 18446744072717407296
It turns out the check for module_get_next_page() failing is wrong,
and hence the decompression was not really taking place. Invert
the condition to fix it.
Fixes: 169a58ad82 ("module/decompress: Support zstd in-kernel decompression")
Cc: stable@kernel.org
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Pull mtd fixes from Miquel Raynal:
"MTD core:
- MAINTAINERS: Add Michal as reviewer instead of Naga
- mtdchar: Mark bits of ioctl handler noinline
NAND controller drivers:
- marvell:
- Don't set the NAND frequency select
- Ensure timing values are written
- ingenic: Fix empty stub helper definitions
SPI-NOR core:
- Fix divide by zero for spi-nor-generic flashes
SPI-NOR manufacturer driver:
- spansion: make sure local struct does not contain garbage"
* tag 'mtd/fixes-for-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: marvell: don't set the NAND frequency select
mtd: rawnand: marvell: ensure timing values are written
mtdchar: mark bits of ioctl handler noinline
MAINTAINERS: Add myself as reviewer instead of Naga
mtd: spi-nor: Fix divide by zero for spi-nor-generic flashes
mtd: rawnand: ingenic: fix empty stub helper definitions
mtd: spi-nor: spansion: make sure local struct does not contain garbage
Pull networking fixes from Jakub Kicinski:
"Happy Wear a Dress Day.
Fairly standard-sized batch of fixes, accounting for the lack of
sub-tree submissions this week. The mlx5 IRQ fixes are notable, people
were complaining about that. No fires burning.
Current release - regressions:
- eth: mlx5e:
- multiple fixes for dynamic IRQ allocation
- prevent encap offload when neigh update is running
- eth: mana: fix perf regression: remove rx_cqes, tx_cqes counters
Current release - new code bugs:
- eth: mlx5e: DR, add missing mutex init/destroy in pattern manager
Previous releases - always broken:
- tcp: deny tcp_disconnect() when threads are waiting
- sched: prevent ingress Qdiscs from getting installed in random
locations in the hierarchy and moving around
- sched: flower: fix possible OOB write in fl_set_geneve_opt()
- netlink: fix NETLINK_LIST_MEMBERSHIPS length report
- udp6: fix race condition in udp6_sendmsg & connect
- tcp: fix mishandling when the sack compression is deferred
- rtnetlink: validate link attributes set at creation time
- mptcp: fix connect timeout handling
- eth: stmmac: fix call trace when stmmac_xdp_xmit() is invoked
- eth: amd-xgbe: fix the false linkup in xgbe_phy_status
- eth: mlx5e:
- fix corner cases in internal buffer configuration
- drain health before unregistering devlink
- usb: qmi_wwan: set DTR quirk for BroadMobi BM818
Misc:
- tcp: return user_mss for TCP_MAXSEG in CLOSE/LISTEN state if
user_mss set"
* tag 'net-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits)
mptcp: fix active subflow finalization
mptcp: add annotations around sk->sk_shutdown accesses
mptcp: fix data race around msk->first access
mptcp: consolidate passive msk socket initialization
mptcp: add annotations around msk->subflow accesses
mptcp: fix connect timeout handling
rtnetlink: add the missing IFLA_GRO_ tb check in validate_linkmsg
rtnetlink: move IFLA_GSO_ tb check to validate_linkmsg
rtnetlink: call validate_linkmsg in rtnl_create_link
ice: recycle/free all of the fragments from multi-buffer frame
net: phy: mxl-gpy: extend interrupt fix to all impacted variants
net: renesas: rswitch: Fix return value in error path of xmit
net: dsa: mv88e6xxx: Increase wait after reset deactivation
net: ipa: Use correct value for IPA_STATUS_SIZE
tcp: fix mishandling when the sack compression is deferred.
net/sched: flower: fix possible OOB write in fl_set_geneve_opt()
sfc: fix error unwinds in TC offload
net/mlx5: Read embedded cpu after init bit cleared
net/mlx5e: Fix error handling in mlx5e_refresh_tirs
net/mlx5: Ensure af_desc.mask is properly initialized
...
When switching from kthreads to vhost_tasks two bugs were added:
1. The vhost worker tasks's now show up as processes so scripts doing
ps or ps a would not incorrectly detect the vhost task as another
process. 2. kthreads disabled freeze by setting PF_NOFREEZE, but
vhost tasks's didn't disable or add support for them.
To fix both bugs, this switches the vhost task to be thread in the
process that does the VHOST_SET_OWNER ioctl, and has vhost_worker call
get_signal to support SIGKILL/SIGSTOP and freeze signals. Note that
SIGKILL/STOP support is required because CLONE_THREAD requires
CLONE_SIGHAND which requires those 2 signals to be supported.
This is a modified version of the patch written by Mike Christie
<michael.christie@oracle.com> which was a modified version of patch
originally written by Linus.
Much of what depended upon PF_IO_WORKER now depends on PF_USER_WORKER.
Including ignoring signals, setting up the register state, and having
get_signal return instead of calling do_group_exit.
Tidied up the vhost_task abstraction so that the definition of
vhost_task only needs to be visible inside of vhost_task.c. Making
it easier to review the code and tell what needs to be done where.
As part of this the main loop has been moved from vhost_worker into
vhost_task_fn. vhost_worker now returns true if work was done.
The main loop has been updated to call get_signal which handles
SIGSTOP, freezing, and collects the message that tells the thread to
exit as part of process exit. This collection clears
__fatal_signal_pending. This collection is not guaranteed to
clear signal_pending() so clear that explicitly so the schedule()
sleeps.
For now the vhost thread continues to exist and run work until the
last file descriptor is closed and the release function is called as
part of freeing struct file. To avoid hangs in the coredump
rendezvous and when killing threads in a multi-threaded exec. The
coredump code and de_thread have been modified to ignore vhost threads.
Remvoing the special case for exec appears to require teaching
vhost_dev_flush how to directly complete transactions in case
the vhost thread is no longer running.
Removing the special case for coredump rendezvous requires either the
above fix needed for exec or moving the coredump rendezvous into
get_signal.
Fixes: 6e890c5d50 ("vhost: use vhost_tasks for worker threads")
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Co-developed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Linux Kernel currently only requires make v3.82 while the grouped
target functionality requires make v4.3. Removed the grouped target
introduced in 4ce1f694eb ("selinux: ensure av_permissions.h is
built when needed") as well as the multiple header file targets in
the make rule. This effectively reverts the problem commit.
We will revisit this change when make >= 4.3 is required by the rest
of the kernel.
Cc: stable@vger.kernel.org
Fixes: 4ce1f694eb ("selinux: ensure av_permissions.h is built when needed")
Reported-by: Erwan Velu <e.velu@criteo.com>
Reported-by: Luiz Capitulino <luizcap@amazon.com>
Tested-by: Luiz Capitulino <luizcap@amazon.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
There is a reference count error in error path code and a potential race
in check_rkey() in rxe_resp.c. When looking up the rkey for a memory
window the reference to the mw from rxe_lookup_mw() is dropped before a
reference is taken on the mr referenced by the mw. If the mr is destroyed
immediately after the call to rxe_put(mw) the mr pointer is unprotected
and may end up pointing at freed memory. The rxe_get(mr) call should take
place before the rxe_put(mw) call.
All errors in check_rkey() call rxe_put(mw) if mw is not NULL but it was
already called after the above. The mw pointer should be set to NULL after
the rxe_put(mw) call to prevent this from happening.
Fixes: cdd0b85675 ("RDMA/rxe: Implement memory access through MWs")
Link: https://lore.kernel.org/r/20230517211509.1819998-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In rxe_net.c a received packet, from udp or loopback, is passed to
rxe_rcv() in rxe_recv.c as a udp packet. I.e. skb->data is pointing at the
udp header. But rxe_rcv() makes length checks to verify the packet is long
enough to hold the roce headers as if it were a roce
packet. I.e. skb->data pointing at the bth header. A runt packet would
appear to have 8 more bytes than it actually does which may lead to
incorrect behavior.
This patch calls skb_pull() to adjust the skb to point at the bth header
before calling rxe_rcv() which fixes this error.
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20230517172242.1806340-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Saeed Mahameed says:
====================
mlx5 fixes 2023-05-31
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2023-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Read embedded cpu after init bit cleared
net/mlx5e: Fix error handling in mlx5e_refresh_tirs
net/mlx5: Ensure af_desc.mask is properly initialized
net/mlx5: Fix setting of irq->map.index for static IRQ case
net/mlx5: Remove rmap also in case dynamic MSIX not supported
====================
Link: https://lore.kernel.org/r/20230601031051.131529-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull NVMe fixes from Keith:
"nvme fixes for Linux 6.4
- Fixes for spurious Keep Alive timeouts (Uday)
- Fix for command type check on passthrough actions (Min)
- Fix for nvme command name for error logging (Christoph)"
* tag 'nvme-6.4-2023-06-01' of git://git.infradead.org/nvme:
nvme: fix the name of Zone Append for verbose logging
nvme: improve handling of long keep alives
nvme: check IO start time when deciding to defer KA
nvme: double KA polling frequency to avoid KATO with TBKAS on
nvme: fix miss command type check
For RISC-V, when tracing with tracepoint events, the IP and status are
set to 0, preventing the perf code parsing the callchain and resolving
the symbols correctly.
./ply 'tracepoint:kmem/kmem_cache_alloc { @[stack]=count(); }'
@:
{ <STACKID4294967282> }: 1
The fix is to implement perf_arch_fetch_caller_regs for riscv, which
fills several necessary registers used for callchain unwinding,
including epc, sp, s0 and status. It's similar to commit b3eac0265b
("arm: perf: Fix callchain parse error with kernel tracepoint events")
and commit 5b09a094f2 ("arm64: perf: Fix callchain parse error with
kernel tracepoint events").
With this patch, callchain can be parsed correctly as:
./ply 'tracepoint:kmem/kmem_cache_alloc { @[stack]=count(); }'
@:
{
__traceiter_kmem_cache_alloc+68
__traceiter_kmem_cache_alloc+68
kmem_cache_alloc+354
__sigqueue_alloc+94
__send_signal_locked+646
send_signal_locked+154
do_send_sig_info+84
__kill_pgrp_info+130
kill_pgrp+60
isig+150
n_tty_receive_signal_char+36
n_tty_receive_buf_standard+2214
n_tty_receive_buf_common+280
n_tty_receive_buf2+26
tty_ldisc_receive_buf+34
tty_port_default_receive_buf+62
flush_to_ldisc+158
process_one_work+458
worker_thread+138
kthread+178
riscv_cpufeature_patch_func+832
}: 1
Signed-off-by: Ism Hong <ism.hong@gmail.com>
Link: https://lore.kernel.org/r/20230601095355.1168910-1-ism.hong@gmail.com
Fixes: 178e9fc47a ("perf: riscv: preliminary RISC-V support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Mat Martineau says:
====================
mptcp: Fixes for connect timeout, access annotations, and subflow init
Patch 1 allows the SO_SNDTIMEO sockopt to correctly change the connect
timeout on MPTCP sockets.
Patches 2-5 add READ_ONCE()/WRITE_ONCE() annotations to fix KCSAN issues.
Patch 6 correctly initializes some subflow fields on outgoing connections.
====================
Link: https://lore.kernel.org/r/20230531-send-net-20230531-v1-0-47750c420571@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Active subflow are inserted into the connection list at creation time.
When the MPJ handshake completes successfully, a new subflow creation
netlink event is generated correctly, but the current code wrongly
avoid initializing a couple of subflow data.
The above will cause misbehavior on a few exceptional events: unneeded
mptcp-level retransmission on msk-level sequence wrap-around and infinite
mapping fallback even when a MPJ socket is present.
Address the issue factoring out the needed initialization in a new helper
and invoking the latter from __mptcp_finish_join() time for passive
subflow and from mptcp_finish_join() for active ones.
Fixes: 0530020a7c ("mptcp: track and update contiguous data status")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The first subflow socket is accessed outside the msk socket lock
by mptcp_subflow_fail(), we need to annotate each write access
with WRITE_ONCE, but a few spots still lacks it.
Fixes: 76a13b3157 ("mptcp: invoke MP_FAIL response when needed")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the msk socket is cloned at MPC handshake time, a few
fields are initialized in a racy way outside mptcp_sk_clone()
and the msk socket lock.
The above is due historical reasons: before commit a88d0092b2
("mptcp: simplify subflow_syn_recv_sock()") as the first subflow socket
carrying all the needed date was not available yet at msk creation
time
We can now refactor the code moving the missing initialization bit
under the socket lock, removing the init race and avoiding some
code duplication.
This will also simplify the next patch, as all msk->first write
access are now under the msk socket lock.
Fixes: 0397c6d85f ("mptcp: keep unaccepted MPC subflow into join list")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The MPTCP can access the first subflow socket in a few spots
outside the socket lock scope. That is actually safe, as MPTCP
will delete the socket itself only after the msk sock close().
Still the such accesses causes a few KCSAN splats, as reported
by Christoph. Silence the harmless warning adding a few annotation
around the relevant accesses.
Fixes: 71ba088ce0 ("mptcp: cleanup accept and poll")
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/402
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ondrej reported a functional issue WRT timeout handling on connect
with a nice reproducer.
The problem is that the current mptcp connect waits for both the
MPTCP socket level timeout, and the first subflow socket timeout.
The latter is not influenced/touched by the exposed setsockopt().
Overall the above makes the SO_SNDTIMEO a no-op on connect.
Since mptcp_connect is invoked via inet_stream_connect and the
latter properly handle the MPTCP level timeout, we can address the
issue making the nested subflow level connect always unblocking.
This also allow simplifying a bit the code, dropping an ugly hack
to handle the fastopen and custom proto_ops connect.
The issues predates the blamed commit below, but the current resolution
requires the infrastructure introduced there.
Fixes: 54f1944ed6 ("mptcp: factor out mptcp_connect()")
Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/399
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long says:
====================
rtnetlink: a couple of fixes in linkmsg validation
validate_linkmsg() was introduced to do linkmsg validation for existing
links. However, the new created links also need this linkmsg validation.
Add validate_linkmsg() check for link creating in Patch 1, and add more
tb checks into validate_linkmsg() in Patch 2 and 3.
====================
Link: https://lore.kernel.org/r/cover.1685548598.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This fixes the issue that dev gro_max_size and gso_ipv4_max_size
can be set to a huge value:
# ip link add dummy1 type dummy
# ip link set dummy1 gro_max_size 4294967295
# ip -d link show dummy1
dummy addrgenmode eui64 ... gro_max_size 4294967295
Fixes: 0fe79f28bf ("net: allow gro_max_size to exceed 65536")
Fixes: 9eefedd58a ("net: add gso_ipv4_max_size and gro_ipv4_max_size per device")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These IFLA_GSO_* tb check should also be done for the new created link,
otherwise, they can be set to a huge value when creating links:
# ip link add dummy1 gso_max_size 4294967295 type dummy
# ip -d link show dummy1
dummy addrgenmode eui64 ... gso_max_size 4294967295
Fixes: 46e6b992c2 ("rtnetlink: allow GSO maximums to be set on device creation")
Fixes: 9eefedd58a ("net: add gso_ipv4_max_size and gro_ipv4_max_size per device")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
validate_linkmsg() was introduced by commit 1840bb13c2 ("[RTNL]:
Validate hardware and broadcast address attribute for RTM_NEWLINK")
to validate tb[IFLA_ADDRESS/BROADCAST] for existing links. The same
check should also be done for newly created links.
This patch adds validate_linkmsg() call in rtnl_create_link(), to
avoid the invalid address set when creating some devices like:
# ip link add dummy0 type dummy
# ip link add link dummy0 name mac0 address 01:02 type macsec
Fixes: 0e06877c6f ("[RTNETLINK]: rtnl_link: allow specifying initial device address")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In current design:
1. PD and clt_path->s.dev are shared among connections.
2. every con[n]'s cleanup phase will call destroy_con_cq_qp()
3. clt_path->s.dev will be always decreased in destroy_con_cq_qp(), and
when clt_path->s.dev become zero, it will destroy PD.
4. when con[1] failed to create, con[1] will not take clt_path->s.dev,
but it try to decreased clt_path->s.dev
So, in case create_cm(con[0]) succeeds but create_cm(con[1]) fails,
destroy_con_cq_qp(con[1]) will be called first which will destroy the PD
while this PD is still taken by con[0].
Here, we refactor the error path of create_cm() and init_conns(), so that
we do the cleanup in the order they are created.
The warning occurs when destroying RXE PD whose reference count is not
zero.
rnbd_client L597: Mapping device /dev/nvme0n1 on session client, (access_mode: rw, nr_poll_queues: 0)
------------[ cut here ]------------
WARNING: CPU: 0 PID: 26407 at drivers/infiniband/sw/rxe/rxe_pool.c:256 __rxe_cleanup+0x13a/0x170 [rdma_rxe]
Modules linked in: rpcrdma rdma_ucm ib_iser rnbd_client libiscsi rtrs_client scsi_transport_iscsi rtrs_core rdma_cm iw_cm ib_cm crc32_generic rdma_rxe udp_tunnel ib_uverbs ib_core kmem device_dax nd_pmem dax_pmem nd_vme crc32c_intel fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_mirror dm_region_hash dm_log dm_mod
CPU: 0 PID: 26407 Comm: rnbd-client.sh Kdump: loaded Not tainted 6.2.0-rc6-roce-flush+ #53
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:__rxe_cleanup+0x13a/0x170 [rdma_rxe]
Code: 45 84 e4 0f 84 5a ff ff ff 48 89 ef e8 5f 18 71 f9 84 c0 75 90 be c8 00 00 00 48 89 ef e8 be 89 1f fa 85 c0 0f 85 7b ff ff ff <0f> 0b 41 bc ea ff ff ff e9 71 ff ff ff e8 84 7f 1f fa e9 d0 fe ff
RSP: 0018:ffffb09880b6f5f0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff99401f15d6a8 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffffbac8234b RDI: 00000000ffffffff
RBP: ffff99401f15d6d0 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000002d82 R11: 0000000000000000 R12: 0000000000000001
R13: ffff994101eff208 R14: ffffb09880b6f6a0 R15: 00000000fffffe00
FS: 00007fe113904740(0000) GS:ffff99413bc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff6cde656c8 CR3: 000000001f108004 CR4: 00000000001706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
rxe_dealloc_pd+0x16/0x20 [rdma_rxe]
ib_dealloc_pd_user+0x4b/0x80 [ib_core]
rtrs_ib_dev_put+0x79/0xd0 [rtrs_core]
destroy_con_cq_qp+0x8a/0xa0 [rtrs_client]
init_path+0x1e7/0x9a0 [rtrs_client]
? __pfx_autoremove_wake_function+0x10/0x10
? lock_is_held_type+0xd7/0x130
? rcu_read_lock_sched_held+0x43/0x80
? pcpu_alloc+0x3dd/0x7d0
? rtrs_clt_init_stats+0x18/0x40 [rtrs_client]
rtrs_clt_open+0x24f/0x5a0 [rtrs_client]
? __pfx_rnbd_clt_link_ev+0x10/0x10 [rnbd_client]
rnbd_clt_map_device+0x6a5/0xe10 [rnbd_client]
Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/1682384563-2-4-git-send-email-lizhijian@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Tested-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
marvell_nfc_setup_interface() uses the frequency retrieved from the
clock associated with the nand interface to determine the timings that
will be used. By changing the NAND frequency select without reflecting
this in the clock configuration this means that the timings calculated
don't correctly meet the requirements of the NAND chip. This hasn't been
an issue up to now because of a different bug that was stopping the
timings being updated after they were initially set.
Fixes: b25251414f ("mtd: rawnand: marvell: Stop implementing ->select_chip()")
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230525003154.2303012-2-chris.packham@alliedtelesis.co.nz
The addition of the mtdchar_read_ioctl() function caused the stack usage
of mtdchar_ioctl() to grow beyond the warning limit on 32-bit architectures
with gcc-13:
drivers/mtd/mtdchar.c: In function 'mtdchar_ioctl':
drivers/mtd/mtdchar.c:1229:1: error: the frame size of 1488 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
Mark both the read and write portions as noinline_for_stack to ensure
they don't get inlined and use separate stack slots to reduce the
maximum usage, both in the mtdchar_ioctl() and combined with any
of its callees.
Fixes: 095bb6e44e ("mtdchar: add MEMREAD ioctl")
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230417205654.1982368-1-arnd@kernel.org
Pull firewire fix from Takashi Sakamoto:
"A single patch to use a flexible array rather than a zero-length one"
* tag 'firewire-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: Replace zero-length array with flexible-array member
Pull mailbox fix from Jassi Brar:
"Fix missing mutex unlock in mailbox-test"
* tag 'mailbox-fixes-6.4-rc5' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
mailbox: mailbox-test: fix a locking issue in mbox_test_message_write()
Avoid linker error for randomly generated config file that
has CONFIG_BRANCH_PROFILE_NONE enabled and make it similar
to riscv, x86 and also to commit 4bf3ec384e ("s390: disable
branch profiling for vdso").
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
A switch held in reset by default needs to wait longer until we can
reliably detect it.
An issue was observed when testing on the Marvell 88E6393X (Link Street).
The driver failed to detect the switch on some upstarts. Increasing the
wait time after reset deactivation solves this issue.
The updated wait time is now also the same as the wait time in the
mv88e6xxx_hardware_reset function.
Fixes: 7b75e49de4 ("net: dsa: mv88e6xxx: wait after reset deactivation")
Signed-off-by: Andreas Svensson <andreas.svensson@axis.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230530145223.1223993-1-andreas.svensson@axis.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Zero-length and one-element arrays are deprecated, and we are moving
towards adopting C99 flexible-array members, instead.
Address the following warnings found with GCC-13 and
-fstrict-flex-arrays=3 enabled:
sound/firewire/amdtp-stream.c: In function ‘build_it_pkt_header’:
sound/firewire/amdtp-stream.c:694:17: warning: ‘generate_cip_header’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=]
694 | generate_cip_header(s, cip_header, data_block_counter, syt);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sound/firewire/amdtp-stream.c:694:17: note: referencing argument 2 of type ‘__be32[2]’ {aka ‘unsigned int[2]’}
sound/firewire/amdtp-stream.c:667:13: note: in a call to function ‘generate_cip_header’
667 | static void generate_cip_header(struct amdtp_stream *s, __be32 cip_header[2],
| ^~~~~~~~~~~~~~~~~~~
This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [1].
Link: https://github.com/KSPP/linux/issues/21
Link: https://github.com/KSPP/linux/issues/303
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [1]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/ZHT0V3SpvHyxCv5W@work
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
This driver relies on IEEE80211_CONF_PS of hw->conf.flags to turn off PS or
turn on dynamic PS controlled by driver and firmware. Though this would be
incorrect, it did work before because the flag is always recalculated until
the commit 28977e790b ("wifi: mac80211: skip powersave recalc if driver SUPPORTS_DYNAMIC_PS")
is introduced by kernel 5.20 to skip to recalculate IEEE80211_CONF_PS
of hw->conf.flags if driver sets SUPPORTS_DYNAMIC_PS.
Correct this by doing recalculation while BSS_CHANGED_PS is changed and
interface is added or removed. For now, it is allowed to enter PS only if
single one station vif is working, and it could possible to have PS per
vif after firmware can support it. Without this fix, driver doesn't
enter PS anymore that causes higher power consumption.
Fixes: e3ec7017f6 ("rtw89: add Realtek 802.11ax driver")
Cc: stable@vger.kernel.org # 6.1+
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230527082939.11206-3-pkshih@realtek.com
This driver relies on IEEE80211_CONF_PS of hw->conf.flags to turn off PS or
turn on dynamic PS controlled by driver and firmware. Though this would be
incorrect, it did work before because the flag is always recalculated until
the commit 28977e790b ("wifi: mac80211: skip powersave recalc if driver SUPPORTS_DYNAMIC_PS")
is introduced by kernel 5.20 to skip to recalculate IEEE80211_CONF_PS
of hw->conf.flags if driver sets SUPPORTS_DYNAMIC_PS.
Correct this by doing recalculation while BSS_CHANGED_PS is changed and
interface is added or removed. It is allowed to enter PS only if single
one station vif is working. Without this fix, driver doesn't enter PS
anymore that causes higher power consumption.
Fixes: bcde60e599 ("rtw88: remove misleading module parameter rtw_fw_support_lps")
Cc: stable@vger.kernel.org # 6.1+
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230527082939.11206-2-pkshih@realtek.com
[BUG]
After commit e02ee89baa ("btrfs: scrub: switch scrub_simple_mirror()
to scrub_stripe infrastructure"), scrub no longer works for zoned device
at all.
Even an empty zoned btrfs cannot be replaced:
# mkfs.btrfs -f /dev/nvme0n1
# mount /dev/nvme0n1 /mnt/btrfs
# btrfs replace start -Bf 1 /dev/nvme0n2 /mnt/btrfs
Resetting device zones /dev/nvme1n1 (160 zones) ...
ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt/btrfs/": Input/output error
And we can hit kernel crash related to that:
BTRFS info (device nvme1n1): host-managed zoned block device /dev/nvme3n1, 160 zones of 134217728 bytes
BTRFS info (device nvme1n1): dev_replace from /dev/nvme2n1 (devid 2) to /dev/nvme3n1 started
nvme3n1: Zone Management Append(0x7d) @ LBA 65536, 4 blocks, Zone Is Full (sct 0x1 / sc 0xb9) DNR
I/O error, dev nvme3n1, sector 786432 op 0xd:(ZONE_APPEND) flags 0x4000 phys_seg 3 prio class 2
BTRFS error (device nvme1n1): bdev /dev/nvme3n1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
BUG: kernel NULL pointer dereference, address: 00000000000000a8
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:_raw_spin_lock_irqsave+0x1e/0x40
Call Trace:
<IRQ>
btrfs_lookup_ordered_extent+0x31/0x190
btrfs_record_physical_zoned+0x18/0x40
btrfs_simple_end_io+0xaf/0xc0
blk_update_request+0x153/0x4c0
blk_mq_end_request+0x15/0xd0
nvme_poll_cq+0x1d3/0x360
nvme_irq+0x39/0x80
__handle_irq_event_percpu+0x3b/0x190
handle_irq_event+0x2f/0x70
handle_edge_irq+0x7c/0x210
__common_interrupt+0x34/0xa0
common_interrupt+0x7d/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x22/0x40
[CAUSE]
Dev-replace reuses scrub code to iterate all extents and write the
existing content back to the new device.
And for zoned devices, we call fill_writer_pointer_gap() to make sure
all the writes into the zoned device is sequential, even if there may be
some gaps between the writes.
However we have several different bugs all related to zoned dev-replace:
- We are using ZONE_APPEND operation for metadata style write back
For zoned devices, btrfs has two ways to write data:
* ZONE_APPEND for data
This allows higher queue depth, but will not be able to know where
the write would land.
Thus needs to grab the real on-disk physical location in it's endio.
* WRITE for metadata
This requires single queue depth (new writes can only be submitted
after previous one finished), and all writes must be sequential.
For scrub, we go single queue depth, but still goes with ZONE_APPEND,
which requires btrfs_bio::inode being populated.
This is the cause of that crash.
- No correct tracing of write_pointer
After a write finished, we should forward sctx->write_pointer, or
fill_writer_pointer_gap() would not work properly and cause more
than necessary zero out, and fill the whole zone prematurely.
- Incorrect physical bytenr passed to fill_writer_pointer_gap()
In scrub_write_sectors(), one call site passes logical address, which
is completely wrong.
The other call site passes physical address of current sector, but
we should pass the physical address of the btrfs_bio we're submitting.
This is the cause of the -EIO errors.
[FIX]
- Do not use ZONE_APPEND for btrfs_submit_repair_write().
- Manually forward sctx->write_pointer after successful writeback
- Use the physical address of the to-be-submitted btrfs_bio for
fill_writer_pointer_gap()
Now zoned device replace would work as expected.
Reported-by: Christoph Hellwig <hch@lst.de>
Fixes: e02ee89baa ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull HID fixes from Jiri Kosina:
- Regression fix for overlong long timeouts during initialization on
some Logitech Unifying devices (Bastien Nocera)
- error handling and overflow fixes for Wacom driver (Denis Arefev,
Jason Gerecke, Nikita Zhandarovich)
* tag 'for-linus-2023060101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: logitech-hidpp: Handle timeout differently from busy
HID: wacom: Add error check to wacom_parse_and_register()
HID: google: add jewel USB id
HID: wacom: avoid integer overflow in wacom_intuos_inout()
HID: wacom: Check for string overflow from strscpy calls
When a direct I/O write is performed, iomap_dio_rw() invalidates the
part of the page cache which the write is going to before carrying out
the write. In the odd case, the direct I/O write will be reading from
the same page it is writing to. gfs2 carries out writes with page
faults disabled, so it should have been obvious that this page
invalidation can cause iomap_dio_rw() to never make any progress.
Currently, gfs2 will end up in an endless retry loop in
gfs2_file_direct_write() instead, though.
Break this endless loop by limiting the number of retries and falling
back to buffered I/O after that.
Also simplify should_fault_in_pages() sightly and add a comment to make
the above case easier to understand.
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Pull ata fix from Damien Le Moal:
- Fix ata_find_dev() use of the device number to find a struct
ata_device for a port. This addresses issues with some passthrough
commands with libsas managed devices.
* tag 'ata-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: libata-scsi: Use correct device no in ata_find_dev()
Pull smb server fixes from Steve French:
"Eight server fixes (most also for stable):
- Two fixes for uninitialized pointer reads (rename and link)
- Fix potential UAF in oplock break
- Two fixes for potential out of bound reads in negotiate
- Fix crediting bug
- Two fixes for xfstests (allocation size fix for test 694 and lookup
issue shown by test 464)"
* tag '6.4-rc4-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: call putname after using the last component
ksmbd: fix incorrect AllocationSize set in smb2_get_info
ksmbd: fix UAF issue from opinfo->conn
ksmbd: fix multiple out-of-bounds read during context decoding
ksmbd: fix slab-out-of-bounds read in smb2_handle_negotiate
ksmbd: fix credit count leakage
ksmbd: fix uninitialized pointer read in smb2_create_link()
ksmbd: fix uninitialized pointer read in ksmbd_vfs_rename()
During mt8195_afe_init_clock(), mt8195_audsys_clk_register() was called
followed by several other devm functions. At mt8195_afe_deinit_clock()
located at mt8195_afe_pcm_dev_remove(), mt8195_audsys_clk_unregister()
was called.
However, there was an issue with the order in which these functions were
called. Specifically, the remove callback of platform_driver was called
before devres released the resource, resulting in a use-after-free issue
during remove time.
At probe time, the order of calls was:
1. mt8195_audsys_clk_register
2. afe_priv->clk = devm_kcalloc
3. afe_priv->clk[i] = devm_clk_get
At remove time, the order of calls was:
1. mt8195_audsys_clk_unregister
3. free afe_priv->clk[i]
2. free afe_priv->clk
To resolve the problem, we can utilize devm_add_action_or_reset() in
mt8195_audsys_clk_register() so that the remove order can be changed to
3->2->1.
Fixes: 6746cc8582 ("ASoC: mediatek: mt8195: add platform driver")
Signed-off-by: Trevor Wu <trevor.wu@mediatek.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230601033318.10408-3-trevor.wu@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>
During mt8188_afe_init_clock(), mt8188_audsys_clk_register() was called
followed by several other devm functions. The caller of
mt8188_afe_init_clock() utilized devm_add_action_or_reset() to call
mt8188_afe_deinit_clock(). However, the order was incorrect, causing a
use-after-free issue during remove time.
At probe time, the order of calls was:
1. mt8188_audsys_clk_register
2. afe_priv->clk = devm_kcalloc
3. afe_priv->clk[i] = devm_clk_get
At remove time, the order of calls was:
1. mt8188_audsys_clk_unregister
3. free afe_priv->clk[i]
2. free afe_priv->clk
To resolve the problem, it's necessary to move devm_add_action_or_reset()
to the appropriate position so that the remove order can be 3->2->1.
Fixes: f6b026479b ("ASoC: mediatek: mt8188: support audio clock control")
Signed-off-by: Trevor Wu <trevor.wu@mediatek.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230601033318.10408-2-trevor.wu@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>
If we send two TCA_FLOWER_KEY_ENC_OPTS_GENEVE packets and their total
size is 252 bytes(key->enc_opts.len = 252) then
key->enc_opts.len = opt->length = data_len / 4 = 0 when the third
TCA_FLOWER_KEY_ENC_OPTS_GENEVE packet enters fl_set_geneve_opt. This
bypasses the next bounds check and results in an out-of-bounds.
Fixes: 0a6e77784f ("net/sched: allow flower to match tunnel options")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Link: https://lore.kernel.org/r/20230531102805.27090-1-hbh25y@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
If an IOMMU domain was never attached, it lacks any linkage to the
actual IOMMU hardware. Attempting to do flush_iotlb_all() on it will
result in a NULL pointer dereference. This seems to happen after the
recent IOMMU core rework in v6.4-rc1.
Unable to handle kernel read from unreadable memory at virtual address 0000000000000018
Call trace:
mtk_iommu_flush_iotlb_all+0x20/0x80
iommu_create_device_direct_mappings.part.0+0x13c/0x230
iommu_setup_default_domain+0x29c/0x4d0
iommu_probe_device+0x12c/0x190
of_iommu_configure+0x140/0x208
of_dma_configure_id+0x19c/0x3c0
platform_dma_configure+0x38/0x88
really_probe+0x78/0x2c0
Check if the "bank" field has been filled in before actually attempting
the IOTLB flush to avoid it. The IOTLB is also flushed when the device
comes out of runtime suspend, so it should have a clean initial state.
Fixes: 08500c43d4 ("iommu/mediatek: Adjust the structure")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230526085402.394239-1-wenst@chromium.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
During driver load it reads embedded_cpu bit from initialization
segment, but the initialization segment is readable only after
initialization bit is cleared.
Move the call to mlx5_read_embedded_cpu() right after initialization bit
cleared.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Fixes: 591905ba96 ("net/mlx5: Introduce Mellanox SmartNIC and modify page management logic")
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Allocation failure is outside the critical lock section and should
return immediately rather than jumping to the unlock section.
Also unlock as soon as required and remove the now redundant jump label.
Fixes: 80a2a9026b ("net/mlx5e: Add a lock on tir list")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When dynamic IRQ allocation is not supported all IRQs are allocated up
front in mlx5_irq_table_create() instead of dynamically as part of
mlx5_irq_alloc(). In the latter dynamic case irq->map.index is set
via the mapping returned by pci_msix_alloc_irq_at(). In the static case
and prior to commit 1da438c0ae ("net/mlx5: Fix indexing of mlx5_irq")
irq->map.index was set in mlx5_irq_alloc() twice once initially to 0 and
then to the requested index before storing in the xarray. After this
commit it is only set to 0 which breaks all other IRQ mappings.
Fix this by setting irq->map.index to the requested index together with
irq->map.virq and improve the related comment to make it clearer which
cases it deals with.
Cc: Chuck Lever III <chuck.lever@oracle.com>
Tested-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Fixes: 1da438c0ae ("net/mlx5: Fix indexing of mlx5_irq")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Tested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Pull smb client fixes from Steve French:
"Four small smb3 client fixes:
- two small fixes suggested by kernel test robot
- small cleanup fix
- update Paulo's email address in the maintainer file"
* tag '6.4-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: address unused variable warning
smb: delete an unnecessary statement
smb3: missing null check in SMB2_change_notify
smb3: update a reviewer email in MAINTAINERS file
Separate jpegbRAS poison consumption handling from the instance irq, and
register dedicated ras_poison_irq src and funcs for UVD_POISON.
v2:
- Separate ras irq from jpeg instance irq
- Improve the subject and code comments
v3:
- Split the patch into three parts
- Improve the code comments
Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Separate vcn RAS poison consumption handling from the instance irq, and
register dedicated ras_poison_irq src and funcs for UVD_POISON.
v2:
- Separate ras irq from vcn instance irq
- Improve the subject and code comments
v3:
- Split the patch into three parts
- Improve the code comments
Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch reverses the DPM clocks levels output of pp_dpm_mclk
and pp_dpm_fclk for renoir.
On dGPUs and older APUs we expose the levels from lowest clocks
to highest clocks. But for some APUs, the clocks levels are
given the reversed orders by PMFW. Like the memory DPM clocks
that are exposed by pp_dpm_mclk.
It's not intuitive that they are reversed on these APUs. All tools
and software that talks to the driver then has to know different ways
to interpret the data depending on the asic.
So we need to reverse them to expose the clocks levels from the
driver consistently.
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
This patch reverses the DPM clocks levels output of pp_dpm_mclk
and pp_dpm_fclk.
On dGPUs and older APUs we expose the levels from lowest clocks
to highest clocks. But for some APUs, the clocks levels that from
the DFPstateTable are given the reversed orders by PMFW. Like the
memory DPM clocks that are exposed by pp_dpm_mclk.
It's not intuitive that they are reversed on these APUs. All tools
and software that talks to the driver then has to know different ways
to interpret the data depending on the asic.
So we need to reverse them to expose the clocks levels from the
driver consistently.
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
This patch reverses the DPM clocks levels output of pp_dpm_mclk
and pp_dpm_fclk.
On dGPUs and older APUs we expose the levels from lowest clocks
to highest clocks. But for some APUs, the clocks levels that from
the DFPstateTable are given the reversed orders by PMFW. Like the
memory DPM clocks that are exposed by pp_dpm_mclk.
It's not intuitive that they are reversed on these APUs. All tools
and software that talks to the driver then has to know different ways
to interpret the data depending on the asic.
So we need to reverse them to expose the clocks levels from the
driver consistently.
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
This patch reverses the DPM clocks levels output of pp_dpm_mclk.
On dGPUs and older APUs we expose the levels from lowest clocks
to highest clocks. But for some APUs, the clocks levels that from
the DFPstateTable are given the reversed orders by PMFW. Like the
memory DPM clocks that are exposed by pp_dpm_mclk.
It's not intuitive that they are reversed on these APUs. All tools
and software that talks to the driver then has to know different ways
to interpret the data depending on the asic.
So we need to reverse them to expose the clocks levels from the
driver consistently.
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
This patch reverses the DPM clocks levels output of pp_dpm_mclk
and pp_dpm_fclk.
On dGPUs and older APUs we expose the levels from lowest clocks
to highest clocks. But for some APUs, the clocks levels that from
the DFPstateTable are given the reversed orders by PMFW. Like the
memory DPM clocks that are exposed by pp_dpm_mclk.
It's not intuitive that they are reversed on these APUs. All tools
and software that talks to the driver then has to know different ways
to interpret the data depending on the asic.
So we need to reverse them to expose the clocks levels from the
driver consistently.
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Until commit 5c2712387d ("cacheinfo: Fix LLC is not exported through
sysfs"), cacheinfo called populate_cache_leaves() for CPU coming online
which let the arch specific functions handle (at least on x86)
populating the shared_cpu_map. However, with the changes in the
aforementioned commit, populate_cache_leaves() is not called when a CPU
comes online as a result of hotplug since last_level_cache_is_valid()
returns true as the cacheinfo data is not discarded. The CPU coming
online is not present in shared_cpu_map, however, it will not be added
since the cpu_cacheinfo->cpu_map_populated flag is set (it is set in
populate_cache_leaves() when cacheinfo is first populated for x86)
This can lead to inconsistencies in the shared_cpu_map when an offlined
CPU comes online again. Example below depicts the inconsistency in the
shared_cpu_list in cacheinfo when CPU8 is offlined and onlined again on
a 3rd Generation EPYC processor:
# for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143
# echo 0 > /sys/devices/system/cpu/cpu8/online
# echo 1 > /sys/devices/system/cpu/cpu8/online
# for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8
# cat /sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list
136
# cat /sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list
9-15,136-143
Clear the flag when the CPU is removed from shared_cpu_map when
cache_shared_cpu_map_remove() is called during CPU hotplug. This will
allow cache_shared_cpu_map_setup() to add the CPU coming back online in
the shared_cpu_map. Set the flag again when the shared_cpu_map is setup.
Following are results of performing the same test as described above with
the changes:
# for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143
# echo 0 > /sys/devices/system/cpu/cpu8/online
# echo 1 > /sys/devices/system/cpu/cpu8/online
# for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143
# cat /sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list
8,136
# cat /sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list
8-15,136-143
Fixes: 5c2712387d ("cacheinfo: Fix LLC is not exported through sysfs")
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20230508084115.1157-3-kprateek.nayak@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While building the shared_cpu_map, check if the cache level and cache
type matches. On certain systems that build the cache topology based on
the instance ID, there are cases where the same ID may repeat across
multiple cache levels, leading inaccurate topology.
In event of CPU offlining, the cache_shared_cpu_map_remove() does not
consider if IDs at same level are being compared. As a result, when same
IDs repeat across different cache levels, the CPU going offline is not
removed from all the shared_cpu_map.
Below is the output of cache topology of CPU8 and it's SMT sibling after
CPU8 is offlined on a dual socket 3rd Generation AMD EPYC processor
(2 x 64C/128T) running kernel release v6.3:
# for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143
# echo 0 > /sys/devices/system/cpu/cpu8/online
# for i in /sys/devices/system/cpu/cpu136/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list: 136
/sys/devices/system/cpu/cpu136/cache/index1/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu136/cache/index2/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list: 9-15,136-143
CPU8 is removed from index0 (L1i) but remains in the shared_cpu_list of
index1 (L1d) and index2 (L2). Since L1i, L1d, and L2 are shared by the
SMT siblings, and they have the same cache instance ID, CPU 2 is only
removed from the first index with matching ID which is index1 (L1i) in
this case. With this fix, the results are as expected when performing
the same experiment on the same system:
# for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143
# echo 0 > /sys/devices/system/cpu/cpu8/online
# for i in /sys/devices/system/cpu/cpu136/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list: 136
/sys/devices/system/cpu/cpu136/cache/index1/shared_cpu_list: 136
/sys/devices/system/cpu/cpu136/cache/index2/shared_cpu_list: 136
/sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list: 9-15,136-143
When rebuilding topology, the same problem appears as
cache_shared_cpu_map_setup() implements a similar logic. Consider the
same 3rd Generation EPYC processor: CPUs in Core 1, that share the L1
and L2 caches, have L1 and L2 instance ID as 1. For all the CPUs on
the second chiplet, the L3 ID is also 1 leading to grouping on CPUs from
Core 1 (1, 17) and the entire second chiplet (8-15, 24-31) as CPUs
sharing one cache domain. This went undetected since x86 processors
depended on arch specific populate_cache_leaves() method to repopulate
the shared_cpus_map when CPU came back online until kernel release
v6.3-rc5.
Fixes: 198102c910 ("cacheinfo: Fix shared_cpu_map to handle shared caches at different levels")
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20230508084115.1157-2-kprateek.nayak@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan Carpenter spotted a race condition in a couple of situations like
these in the test_firmware driver:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
ret = kstrtou8(buf, 10, &val);
if (ret)
return ret;
mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
static ssize_t config_num_requests_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
int rc;
mutex_lock(&test_fw_mutex);
if (test_fw_config->reqs) {
pr_err("Must call release_all_firmware prior to changing config\n");
rc = -EINVAL;
mutex_unlock(&test_fw_mutex);
goto out;
}
mutex_unlock(&test_fw_mutex);
rc = test_dev_config_update_u8(buf, count,
&test_fw_config->num_requests);
out:
return rc;
}
static ssize_t config_read_fw_idx_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
return test_dev_config_update_u8(buf, count,
&test_fw_config->read_fw_idx);
}
The function test_dev_config_update_u8() is called from both the locked
and the unlocked context, function config_num_requests_store() and
config_read_fw_idx_store() which can both be called asynchronously as
they are driver's methods, while test_dev_config_update_u8() and siblings
change their argument pointed to by u8 *cfg or similar pointer.
To avoid deadlock on test_fw_mutex, the lock is dropped before calling
test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8()
itself, but alas this creates a race condition.
Having two locks wouldn't assure a race-proof mutual exclusion.
This situation is best avoided by the introduction of a new, unlocked
function __test_dev_config_update_u8() which can be called from the locked
context and reducing test_dev_config_update_u8() to:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
mutex_lock(&test_fw_mutex);
ret = __test_dev_config_update_u8(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
}
doing the locking and calling the unlocked primitive, which enables both
locked and unlocked versions without duplication of code.
The similar approach was applied to all functions called from the locked
and the unlocked context, which safely mitigates both deadlocks and race
conditions in the driver.
__test_dev_config_update_bool(), __test_dev_config_update_u8() and
__test_dev_config_update_size_t() unlocked versions of the functions
were introduced to be called from the locked contexts as a workaround
without releasing the main driver's lock and thereof causing a race
condition.
The test_dev_config_update_bool(), test_dev_config_update_u8() and
test_dev_config_update_size_t() locked versions of the functions
are being called from driver methods without the unnecessary multiplying
of the locking and unlocking code for each method, and complicating
the code with saving of the return value across lock.
Fixes: 7feebfa487 ("test_firmware: add support for request_firmware_into_buf")
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Russ Weight <russell.h.weight@intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tianfei Zhang <tianfei.zhang@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-kselftest@vger.kernel.org
Cc: stable@vger.kernel.org # v5.4
Suggested-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Link: https://lore.kernel.org/r/20230509084746.48259-1-mirsad.todorovac@alu.unizg.hr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There was a bug where this code forgot to unlock the tdev->mutex if the
kzalloc() failed. Fix this issue, by moving the allocation outside the
lock.
Fixes: 2d1e952a2b ("mailbox: mailbox-test: Fix potential double-free in mbox_test_message_write()")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Lee Jones <lee@kernel.org>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Pull rdma fixes from Jason Gunthorpe:
- Fix 64K ARM page size support in bnxt_re and efa
- bnxt_re fixes for a memory leak, incorrect error handling and a
remove a bogus FW failure when running on a VF
- Update MAINTAINERS for hns and efa
- Fix two rxe regressions added this merge window in error unwind and
incorrect spinlock primitives
- hns gets a better algorithm for allocating page tables to avoid
running out of resources, and a timeout adjustment
- Fix a text case failure in hns
- Use after free in irdma and fix incorrect construction of a WQE
causing mis-execution
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/irdma: Fix Local Invalidate fencing
RDMA/irdma: Prevent QP use after free
MAINTAINERS: Update maintainer of Amazon EFA driver
RDMA/bnxt_re: Do not enable congestion control on VFs
RDMA/bnxt_re: Fix return value of bnxt_re_process_raw_qp_pkt_rx
RDMA/bnxt_re: Fix a possible memory leak
RDMA/hns: Modify the value of long message loopback slice
RDMA/hns: Fix base address table allocation
RDMA/hns: Fix timeout attr in query qp for HIP08
RDMA/efa: Fix unsupported page sizes in device
RDMA/rxe: Convert spin_{lock_bh,unlock_bh} to spin_{lock_irqsave,unlock_irqrestore}
RDMA/rxe: Fix double unlock in rxe_qp.c
MAINTAINERS: Update maintainers of HiSilicon RoCE
RDMA/bnxt_re: Fix the page_size used during the MR creation
Pull ext4 fixes from Ted Ts'o:
"Fix two regressions in ext4 and a number of issues reported by syzbot"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: enable the lazy init thread when remounting read/write
ext4: fix fsync for non-directories
ext4: add lockdep annotations for i_data_sem for ea_inode's
ext4: disallow ea_inodes with extended attributes
ext4: set lockdep subclass for the ea_inode in ext4_xattr_inode_cache_find()
ext4: add EA_INODE checking to ext4_iget()
gcc 13 may assign another type to enumeration constants than gcc 12. Split
the large enum at the top of source file stex.c such that the type of the
constants used in time expressions is changed back to the same type chosen
by gcc 12. This patch suppresses compiler warnings like this one:
In file included from ./include/linux/bitops.h:7,
from ./include/linux/kernel.h:22,
from drivers/scsi/stex.c:13:
drivers/scsi/stex.c: In function ‘stex_common_handshake’:
./include/linux/typecheck.h:12:25: error: comparison of distinct pointer types lacks a cast [-Werror]
12 | (void)(&__dummy == &__dummy2); \
| ^~
./include/linux/jiffies.h:106:10: note: in expansion of macro ‘typecheck’
106 | typecheck(unsigned long, b) && \
| ^~~~~~~~~
drivers/scsi/stex.c:1035:29: note: in expansion of macro ‘time_after’
1035 | if (time_after(jiffies, before + MU_MAX_DELAY * HZ)) {
| ^~~~~~~~~~
See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107405.
Cc: stable@vger.kernel.org
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230529195034.3077-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If an attempt at contacting a receiver or a device fails because the
receiver or device never responds, don't restart the communication, only
restart it if the receiver or device answers that it's busy, as originally
intended.
This was the behaviour on communication timeout before commit 586e8fede7
("HID: logitech-hidpp: Retry commands when device is busy").
This fixes some overly long waits in a critical path on boot, when
checking whether the device is connected by getting its HID++ version.
Signed-off-by: Bastien Nocera <hadess@hadess.net>
Suggested-by: Mark Lord <mlord@pobox.com>
Fixes: 586e8fede7 ("HID: logitech-hidpp: Retry commands when device is busy")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217412
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The bug here is that you cannot rely on getting the same socket
from multiple calls to fget() because userspace can influence
that. This is a kind of double fetch bug.
The fix is to delete the svc_alien_sock() function and instead do
the checking inside the svc_addsock() function.
Fixes: 3064639423 ("nfsd: check passed socket's net matches NFSd superblock's one")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Syzkaller got the following report:
BUG: KASAN: use-after-free in sk_setup_caps+0x621/0x690 net/core/sock.c:2018
Read of size 8 at addr ffff888027f82780 by task syz-executor276/3255
The function sk_setup_caps (called by ip6_sk_dst_store_flow->
ip6_dst_store) referenced already freed memory as this memory was
freed by parallel task in udpv6_sendmsg->ip6_sk_dst_lookup_flow->
sk_dst_check.
task1 (connect) task2 (udp6_sendmsg)
sk_setup_caps->sk_dst_set |
| sk_dst_check->
| sk_dst_set
| dst_release
sk_setup_caps references |
to already freed dst_entry|
The reason for this race condition is: sk_setup_caps() keeps using
the dst after transferring the ownership to the dst cache.
Found by Linux Verification Center (linuxtesting.org) with syzkaller.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Vladislav Efanov <VEfanov@ispras.ru>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
KVM maintains a mask of supported CPUs when a vPMU type is explicitly
selected by userspace and is used to reject any attempt to run the vCPU
on an unsupported CPU. This is great, but we're still beholden to the
default behavior where vCPUs can be scheduled anywhere and guest
counters may silently stop working.
Avoid confusing the next poor sod to look at this code and document the
intended behavior.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230525212723.3361524-3-oliver.upton@linux.dev
To date KVM has relied on using a perf event to probe the core PMU at
the time of vPMU initialization. Behind the scenes perf_event_init()
would iteratively walk the PMUs of the system and return the PMU that
could handle the event. However, an upcoming change in perf core will
drop the iterative walk, thereby breaking the fragile dance we do on the
KVM side.
Avoid the problem altogether by iterating over the list of supported
PMUs maintained in KVM, returning the core PMU that matches the CPU
we were called on.
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230525212723.3361524-2-oliver.upton@linux.dev
The reference count on page table allocations is increased for every
'counted' PTE (valid or donated) in the table in addition to the initial
reference from ->zalloc_page(). kvm_pgtable_stage2_free_removed() fails
to drop the last reference on the root of the table walk, meaning we
leak memory.
Fix it by dropping the last reference after the free walker returns,
at which point all references for 'counted' PTEs have been released.
Cc: stable@vger.kernel.org
Fixes: 5c359cca1f ("KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make")
Reported-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Tested-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230530193213.1663411-1-oliver.upton@linux.dev
When use the following command to test:
1)ip link add bond0 type bond
2)ip link set bond0 up
3)tc qdisc add dev bond0 root handle ffff: mq
4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq
The kernel reports NULL pointer dereference issue. The stack information
is as follows:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Internal error: Oops: 0000000096000006 [#1] SMP
Modules linked in:
pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : mq_attach+0x44/0xa0
lr : qdisc_graft+0x20c/0x5cc
sp : ffff80000e2236a0
x29: ffff80000e2236a0 x28: ffff0000c0e59d80 x27: ffff0000c0be19c0
x26: ffff0000cae3e800 x25: 0000000000000010 x24: 00000000fffffff1
x23: 0000000000000000 x22: ffff0000cae3e800 x21: ffff0000c9df4000
x20: ffff0000c9df4000 x19: 0000000000000000 x18: ffff80000a934000
x17: ffff8000f5b56000 x16: ffff80000bb08000 x15: 0000000000000000
x14: 0000000000000000 x13: 6b6b6b6b6b6b6b6b x12: 6b6b6b6b00000001
x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
x8 : ffff0000c0be0730 x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000008
x5 : ffff0000cae3e864 x4 : 0000000000000000 x3 : 0000000000000001
x2 : 0000000000000001 x1 : ffff8000090bc23c x0 : 0000000000000000
Call trace:
mq_attach+0x44/0xa0
qdisc_graft+0x20c/0x5cc
tc_modify_qdisc+0x1c4/0x664
rtnetlink_rcv_msg+0x354/0x440
netlink_rcv_skb+0x64/0x144
rtnetlink_rcv+0x28/0x34
netlink_unicast+0x1e8/0x2a4
netlink_sendmsg+0x308/0x4a0
sock_sendmsg+0x64/0xac
____sys_sendmsg+0x29c/0x358
___sys_sendmsg+0x90/0xd0
__sys_sendmsg+0x7c/0xd0
__arm64_sys_sendmsg+0x2c/0x38
invoke_syscall+0x54/0x114
el0_svc_common.constprop.1+0x90/0x174
do_el0_svc+0x3c/0xb0
el0_svc+0x24/0xec
el0t_64_sync_handler+0x90/0xb4
el0t_64_sync+0x174/0x178
This is because when mq is added for the first time, qdiscs in mq is set
to NULL in mq_attach(). Therefore, when replacing mq after adding mq, we
need to initialize qdiscs in the mq before continuing to graft. Otherwise,
it will couse NULL pointer dereference issue in mq_attach(). And the same
issue will occur in the attach functions of mqprio, taprio and htb.
ffff:fff1 means that the repalce qdisc is ingress. Ingress does not allow
any qdisc to be attached. Therefore, ffff:fff1 is incorrectly used, and
the command should be dropped.
Fixes: 6ec1c69a8f ("net_sched: add classful multiqueue dummy scheduler")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Tested-by: Peilin Ye <peilin.ye@bytedance.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20230527093747.3583502-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, after creating an ingress (or clsact) Qdisc and grafting it
under TC_H_INGRESS (TC_H_CLSACT), it is possible to graft it again under
e.g. a TBF Qdisc:
$ ip link add ifb0 type ifb
$ tc qdisc add dev ifb0 handle 1: root tbf rate 20kbit buffer 1600 limit 3000
$ tc qdisc add dev ifb0 clsact
$ tc qdisc link dev ifb0 handle ffff: parent 1:1
$ tc qdisc show dev ifb0
qdisc tbf 1: root refcnt 2 rate 20Kbit burst 1600b lat 560.0ms
qdisc clsact ffff: parent ffff:fff1 refcnt 2
^^^^^^^^
clsact's refcount has increased: it is now grafted under both
TC_H_CLSACT and 1:1.
ingress and clsact Qdiscs should only be used under TC_H_INGRESS
(TC_H_CLSACT). Prohibit regrafting them.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Fixes: 1f211a1b92 ("net, sched: add clsact qdisc")
Tested-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently it is possible to add e.g. an HTB Qdisc under ffff:fff1
(TC_H_INGRESS, TC_H_CLSACT):
$ ip link add name ifb0 type ifb
$ tc qdisc add dev ifb0 parent ffff:fff1 htb
$ tc qdisc add dev ifb0 clsact
Error: Exclusivity flag on, cannot modify.
$ drgn
...
>>> ifb0 = netdev_get_by_name(prog, "ifb0")
>>> qdisc = ifb0.ingress_queue.qdisc_sleeping
>>> print(qdisc.ops.id.string_().decode())
htb
>>> qdisc.flags.value_() # TCQ_F_INGRESS
2
Only allow ingress and clsact Qdiscs under ffff:fff1. Return -EINVAL
for everything else. Make TCQ_F_INGRESS a static flag of ingress and
clsact Qdiscs.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Fixes: 1f211a1b92 ("net, sched: add clsact qdisc")
Tested-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
clsact Qdiscs are only supposed to be created under TC_H_CLSACT (which
equals TC_H_INGRESS). Return -EOPNOTSUPP if 'parent' is not
TC_H_CLSACT.
Fixes: 1f211a1b92 ("net, sched: add clsact qdisc")
Tested-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Upon keep alive completion, nvme_keep_alive_work is scheduled with the
same delay every time. If keep alive commands are completing slowly,
this may cause a keep alive timeout. The following trace illustrates the
issue, taking KATO = 8 and TBKAS off for simplicity:
1. t = 0: run nvme_keep_alive_work, send keep alive
2. t = ε: keep alive reaches controller, controller restarts its keep
alive timer
3. t = 4: host receives keep alive completion, schedules
nvme_keep_alive_work with delay 4
4. t = 8: run nvme_keep_alive_work, send keep alive
Here, a keep alive having RTT of 4 causes a delay of at least 8 - ε
between the controller receiving successive keep alives. With ε small,
the controller is likely to detect a keep alive timeout.
Fix this by calculating the RTT of the keep alive command, and adjusting
the scheduling delay of the next keep alive work accordingly.
Reported-by: Costa Sapuntzakis <costa@purestorage.com>
Reported-by: Randy Jennings <randyj@purestorage.com>
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Pull btrfs fixes from David Sterba:
"One bug fix and two build warning fixes:
- call proper end bio callback for metadata RAID0 in a rare case of
an unaligned block
- fix uninitialized variable (reported by gcc 10.2)
- fix warning about potential access beyond array bounds on mips64
with 64k pages (runtime check would not allow that)"
* tag 'for-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix csum_tree_block page iteration to avoid tripping on -Werror=array-bounds
btrfs: fix an uninitialized variable warning in btrfs_log_inode
btrfs: call btrfs_orig_bbio_end_io in btrfs_end_bio_work
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix BPF CO-RE naming convention for checking the availability of
fields on 'union perf_mem_data_src' on the running kernel
- Remove the use of llvm-strip on BPF skel object files, not needed,
fixes a build breakage when the llvm package, that contains it in
most distros, isn't installed
- Fix tools that use both evsel->{bpf_counter_list,bpf_filters},
removing them from a union
- Remove extra "--" from the 'perf ftrace latency' --use-nsec option,
previously it was working only when using the '-n' alternative
- Don't stop building when both binutils-devel and a C++ compiler isn't
available to compile the alternative C++ demangle support code,
disable that feature instead
- Sync the linux/in.h and coresight-pmu.h header copies with the kernel
sources
- Fix relative include path to cs-etm.h
* tag 'perf-tools-fixes-for-v6.4-2-2023-05-30' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf evsel: Separate bpf_counter_list and bpf_filters, can be used at the same time
tools headers UAPI: Sync the linux/in.h with the kernel sources
perf cs-etm: Copy kernel coresight-pmu.h header
perf bpf: Do not use llvm-strip on BPF binary
perf build: Don't compile demangle-cxx.cpp if not necessary
perf arm: Fix include path to cs-etm.h
perf bpf filter: Fix a broken perf sample data naming for BPF CO-RE
perf ftrace latency: Remove unnecessary "--" from --use-nsec option
Pull regmap fixes from Mark Brown:
"The most important fix here is for missing dropping of the RCU read
lock when syncing maple tree register caches, the physical devices I
have that use the code don't do any syncing so I'd only ever tested
this with virtual devices and missed the fact that we need to drop the
lock in order to write to buses that need to sleep.
Otherwise there's a fix for an edge case when splitting up large batch
writes which has been lurking for a long time, a check to make sure
nobody writes new drivers with a bug that was found in several
SoundWire drivers and a tweak to the way the new kunit tests are
enabled to ensure they don't cause regmap to be enabled when it
wouldn't otherwise be"
* tag 'regmap-fix-v6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: maple: Drop the RCU read lock while syncing registers
regmap: sdw: check for invalid multi-register writes config
regmap: Account for register length when chunking
regmap: REGMAP_KUNIT should not select REGMAP
Pull modules fix from Luis Chamberlain:
"A fix is provided for ia64. Even though ia64 is on life support it
helps to fix issues if we can. Thanks to Linus for doing tons of the
ia64 debugging"
* tag 'modules-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
module: fix module load for ia64
In commit a44be64bbe ("ext4: don't clear SB_RDONLY when remounting
r/w until quota is re-enabled") we defer clearing tyhe SB_RDONLY flag
in struct super. However, we didn't defer when we checked sb_rdonly()
to determine the lazy itable init thread should be enabled, with the
next result that the lazy inode table initialization would not be
properly started. This can cause generic/231 to fail in ext4's
nojournal mode.
Fix this by moving when we decide to start or stop the lazy itable
init thread to after we clear the SB_RDONLY flag when we are
remounting the file system read/write.
Fixes a44be64bbe ("ext4: don't clear SB_RDONLY when remounting r/w until...")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230527035729.1001605-1-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Commit e360c6ed72 ("ext4: Drop special handling of journalled data
from ext4_sync_file()") simplified ext4_sync_file() by dropping special
handling of journalled data mode as it was not needed anymore. However
that branch was also used for directories and symlinks and since the
fastcommit code does not track metadata changes to non-regular files, the
change has caused e.g. fsync(2) on directories to not commit transaction
as it should. Fix the problem by adding handling for non-regular files.
Fixes: e360c6ed72 ("ext4: Drop special handling of journalled data from ext4_sync_file()")
Reported-by: Eric Whitney <enwlinux@gmail.com>
Link: https://lore.kernel.org/all/ZFqO3xVnmhL7zv1x@debian-BULLSEYE-live-builder-AMD64
Signed-off-by: Jan Kara <jack@suse.cz>
Tested-by: Eric Whitney <enwlinux@gmail.com>
Link: https://lore.kernel.org/r/20230524104453.8734-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Helge Deller <deller@gmx.de>
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Helge Deller <deller@gmx.de>
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Helge Deller <deller@gmx.de>
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Helge Deller <deller@gmx.de>
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Helge Deller <deller@gmx.de>
The driver core never calls a remove callback with the platform_device
pointer being NULL. So the check for this condition can just be dropped.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Helge Deller <deller@gmx.de>
When a command completes, we set a flag which will skip sending a
keep alive at the next run of nvme_keep_alive_work when TBKAS is on.
However, if the command was submitted long ago, it's possible that
the controller may have also restarted its keep alive timer (as a
result of receiving the command) long ago. The following trace
demonstrates the issue, assuming TBKAS is on and KATO = 8 for
simplicity:
1. t = 0: submit I/O commands A, B, C, D, E
2. t = 0.5: commands A, B, C, D, E reach controller, restart its keep
alive timer
3. t = 1: A completes
4. t = 2: run nvme_keep_alive_work, see recent completion, do nothing
5. t = 3: B completes
6. t = 4: run nvme_keep_alive_work, see recent completion, do nothing
7. t = 5: C completes
8. t = 6: run nvme_keep_alive_work, see recent completion, do nothing
9. t = 7: D completes
10. t = 8: run nvme_keep_alive_work, see recent completion, do nothing
11. t = 9: E completes
At this point, 8.5 seconds have passed without restarting the
controller's keep alive timer, so the controller will detect a keep
alive timeout.
Fix this by checking the IO start time when deciding to defer sending a
keep alive command. Only set comp_seen if the command started after the
most recent run of nvme_keep_alive_work. With this change, the
completions of B, C, and D will not set comp_seen and the run of
nvme_keep_alive_work at t = 4 will send a keep alive.
Reported-by: Costa Sapuntzakis <costa@purestorage.com>
Reported-by: Randy Jennings <randyj@purestorage.com>
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
With TBKAS on, the completion of one command can defer sending a
keep alive for up to twice the delay between successive runs of
nvme_keep_alive_work. The current delay of KATO / 2 thus makes it
possible for one command to defer sending a keep alive for up to
KATO, which can result in the controller detecting a KATO. The following
trace demonstrates the issue, taking KATO = 8 for simplicity:
1. t = 0: run nvme_keep_alive_work, no keep-alive sent
2. t = ε: I/O completion seen, set comp_seen = true
3. t = 4: run nvme_keep_alive_work, see comp_seen == true,
skip sending keep-alive, set comp_seen = false
4. t = 8: run nvme_keep_alive_work, see comp_seen == false,
send a keep-alive command.
Here, there is a delay of 8 - ε between receiving a command completion
and sending the next command. With ε small, the controller is likely to
detect a keep alive timeout.
Fix this by running nvme_keep_alive_work with a delay of KATO / 4
whenever TBKAS is on. Going through the above trace now gives us a
worst-case delay of 4 - ε, which is in line with the recommendation of
sending a command every KATO / 2 in the NVMe specification.
Reported-by: Costa Sapuntzakis <costa@purestorage.com>
Reported-by: Randy Jennings <randyj@purestorage.com>
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
In the function nvme_passthru_end(), only the value of the command
opcode is checked, without checking the command type (IO command or
Admin command). When we send a Dataset Management command (The opcode
of the Dataset Management command is the same as the Set Feature
command), kernel thinks it is a set feature command, then sets the
controller's keep alive interval, and calls nvme_keep_alive_work().
Signed-off-by: min15.li <min15.li@samsung.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
It is usually better to request all necessary resources (clocks,
regulators, ...) before starting to make use of them. That way they do
not change state in case one of the resources is not available yet and
probe deferral (-EPROBE_DEFER) is necessary. This is particularly
important for DMA channels and IOMMUs which are not enforced by
fw_devlink yet (unless you use fw_devlink.strict=1).
spi-qup does this in the wrong order, the clocks are enabled and
disabled again when the DMA channels are not available yet.
This causes issues in some cases: On most SoCs one of the SPI QUP
clocks is shared with the UART controller. When using earlycon UART is
actively used during boot but might not have probed yet, usually for
the same reason (waiting for the DMA controller). In this case, the
brief enable/disable cycle ends up gating the clock and further UART
console output will halt the system completely.
Avoid this by requesting the DMA channels before changing the clock
state.
Fixes: 612762e82a ("spi: qup: Add DMA capabilities")
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Link: https://lore.kernel.org/r/20230518-spi-qup-clk-defer-v1-1-f49fc9ca4e02@gerhold.net
Signed-off-by: Mark Brown <broonie@kernel.org>
Sending the mutex address(acp_lock) as platform
data during ACP PDM platform driver register sequence,
its creating copy of the platform data.
Referencing this platform data in ACP PDM driver results
incorrect reference to the common lock usage.
Instead of directly passing the lock address as platform
data, retrieve it from parent driver data structure
and use the same lock reference in ACP PDM driver.
Fixes: 45aa83cb93 ("ASoC: amd: ps: use acp_lock to protect common registers in pdm driver")
Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Link: https://lore.kernel.org/r/20230525113000.1290758-1-Vijendar.Mukunda@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
LPUART IP now has two known bugs, one is that CTS has higher priority
than the break signal, which causes the break signal sending through
UARTCTRL_SBK may impacted by the CTS input if the HW flow control is
enabled. It exists on all platforms we support in this driver.
So we add a workaround patch for this issue: commit c4c81db5cf
("tty: serial: fsl_lpuart: disable the CTS when send break signal").
Another IP bug is i.MX8QM LPUART may have an additional break character
being sent after SBK was cleared. It may need to add some delay between
clearing SBK and re-enabling CTS to ensure that the SBK latch are
completely cleared.
But we found that during the delay period before CTS is enabled, there
is still a risk that Bluetooth data in TX FIFO may be sent out during
this period because of break off and CTS disabled(even if BT sets CTS
line deasserted, data is still sent to BT).
Due to this risk, we have to drop the CTS-disabling workaround for SBK
bugs, use TXINV seems to be a better way to replace SBK feature and
avoid above risk. Also need to disable the transmitter to prevent any
data from being sent out during break, then invert the TX line to send
break. Then disable the TXINV when turn off break and re-enable
transmitter.
Fixes: c4c81db5cf ("tty: serial: fsl_lpuart: disable the CTS when send break signal")
Cc: stable <stable@kernel.org>
Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Link: https://lore.kernel.org/r/20230519094751.28948-1-sherry.sun@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Matthieu Baerts says:
====================
selftests: mptcp: skip tests not supported by old kernels (part 1)
After a few years of increasing test coverage in the MPTCP selftests, we
realised [1] the last version of the selftests is supposed to run on old
kernels without issues.
Supporting older versions is not that easy for this MPTCP case: these
selftests are often validating the internals by checking packets that
are exchanged, when some MIB counters are incremented after some
actions, how connections are getting opened and closed in some cases,
etc. In other words, it is not limited to the socket interface between
the userspace and the kernelspace. In addition, the current selftests
run a lot of different sub-tests but the TAP13 protocol used in the
selftests don't support sub-tests: in other words, one failure in
sub-tests implies that the whole selftest is seen as failed at the end
because sub-tests are not tracked. It is then important to skip
sub-tests not supported by old kernels.
To minimise the modifications and reduce the complexity to support old
versions, the idea is to look at external signs and skip the whole
selftests or just some sub-tests before starting them.
This first part focuses on marking the different selftests as skipped
if MPTCP is not even supported. That's what is done in patches 2 to 8.
Patch 2/8 introduces a new file (mptcp_lib.sh) to be able to re-use some
helpers in the different selftests. The first MPTCP selftest has been
introduced in v5.6.
Patch 1/8 is a bit different but still linked: it modifies mptcp_join.sh
selftest not to use 'cmp --bytes' which is not supported by the BusyBox
implementation. It is apparently quite common to use BusyBox in CI
environments. This tool is needed for a subtest introduced in v6.1.
Link: https://lore.kernel.org/stable/CA+G9fYtDGpgT4dckXD-y-N92nqUxuvue_7AtDdBcHrbOMsDZLg@mail.gmail.com/ [1]
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
====================
Link: https://lore.kernel.org/r/20230528-upstream-net-20230528-mptcp-selftests-support-old-kernels-part-1-v1-0-a32d85577fc6@tessares.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Selftests are supposed to run on any kernels, including the old ones not
supporting MPTCP.
A new check is then added to make sure MPTCP is supported. If not, the
test stops and is marked as "skipped". Note that this check can also
mark the test as failed if 'SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES' env
var is set to 1: by doing that, we can make sure a test is not being
skipped by mistake.
A new shared file is added here to be able to re-used the same check in
the different selftests we have.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 048d19d444 ("mptcp: add basic kselftest for mptcp")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The apc->eth_stats.rx_cqes is one per NIC (vport), and it's on the
frequent and parallel code path of all queues. So, r/w into this
single shared variable by many threads on different CPUs creates a
lot caching and memory overhead, hence perf regression. And, it's
not accurate due to the high volume concurrent r/w.
For example, a workload is iperf with 128 threads, and with RPS
enabled. We saw perf regression of 25% with the previous patch
adding the counters. And this patch eliminates the regression.
Since the error path of mana_poll_rx_cq() already has warnings, so
keeping the counter and convert it to a per-queue variable is not
necessary. So, just remove this counter from this high frequency
code path.
Also, remove the tx_cqes counter for the same reason. We have
warnings & other counters for errors on that path, and don't need
to count every normal cqe processing.
Cc: stable@vger.kernel.org
Fixes: bd7fc6e195 ("net: mana: Add new MANA VF performance counters for easier troubleshooting")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/1685115537-31675-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When checking for OF quirks, make sure either 'compatible' or 'property'
is set, and give up otherwise.
This avoids non-OF quirks being randomly applied as they don't have any
of the OF data that need checking.
Cc: Douglas Anderson <dianders@chromium.org>
Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Fixes: 44bd78dd2b ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues")
Signed-off-by: Marc Zyngier <maz@kernel.org>
We encountered a crash when using SMCRv2. It is caused by a logical
error in smc_llc_fill_ext_v2().
BUG: kernel NULL pointer dereference, address: 0000000000000014
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 7 PID: 453 Comm: kworker/7:4 Kdump: loaded Tainted: G W E 6.4.0-rc3+ #44
Workqueue: events smc_llc_add_link_work [smc]
RIP: 0010:smc_llc_fill_ext_v2+0x117/0x280 [smc]
RSP: 0018:ffffacb5c064bd88 EFLAGS: 00010282
RAX: ffff9a6bc1c3c02c RBX: ffff9a6be3558000 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 0000000000000002 RDI: 000000000000000a
RBP: ffffacb5c064bdb8 R08: 0000000000000040 R09: 000000000000000c
R10: ffff9a6bc0910300 R11: 0000000000000002 R12: 0000000000000000
R13: 0000000000000002 R14: ffff9a6bc1c3c02c R15: ffff9a6be3558250
FS: 0000000000000000(0000) GS:ffff9a6eefdc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000014 CR3: 000000010b078003 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
smc_llc_send_add_link+0x1ae/0x2f0 [smc]
smc_llc_srv_add_link+0x2c9/0x5a0 [smc]
? cc_mkenc+0x40/0x60
smc_llc_add_link_work+0xb8/0x140 [smc]
process_one_work+0x1e5/0x3f0
worker_thread+0x4d/0x2f0
? __pfx_worker_thread+0x10/0x10
kthread+0xe5/0x120
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2c/0x50
</TASK>
When an alernate RNIC is available in system, SMC will try to add a new
link based on the RNIC for resilience. All the RMBs in use will be mapped
to the new link. Then the RMBs' MRs corresponding to the new link will be
filled into SMCRv2 LLC ADD LINK messages.
However, smc_llc_fill_ext_v2() mistakenly accesses to unused RMBs which
haven't been mapped to the new link and have no valid MRs, thus causing
a crash. So this patch fixes the logic.
Fixes: b4ba4652b3 ("net/smc: extend LLC layer for SMC-Rv2")
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When finding the first RMB of link group, it should start from the
current RMB list whose index is 0. So fix it.
Fixes: b4ba4652b3 ("net/smc: extend LLC layer for SMC-Rv2")
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Devices with a type-cover have an additional "book" mode, deactivating
type-cover input and turning off its backlight. This is currently
unsupported, leading to the warning
surface_aggregator_tablet_mode_switch 01:26:01:00:01: unknown device posture for type-cover: 6
Therefore, add support for this state and map it to enable tablet-mode.
Fixes: 37ff64cd81 ("platform/surface: aggregator_tabletsw: Add support for Type-Cover posture source")
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Link: https://lore.kernel.org/r/20230525213218.2797480-3-luzmaximilian@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Devices with a type-cover have an additional "book" mode, deactivating
type-cover input and turning off its backlight. This is currently
unsupported, leading to the warning
surface_aggregator_tablet_mode_switch 01:0e:01:00:01: unknown KIP cover state: 6
Therefore, add support for this state and map it to enable tablet-mode.
Fixes: 9f794056db ("platform/surface: Add KIP/POS tablet-mode switch driver")
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Link: https://lore.kernel.org/r/20230525213218.2797480-2-luzmaximilian@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Currently, event completion work-items are restricted to be run strictly
in non-parallel fashion by the respective workqueue. However, this has
lead to some problems:
In some instances, the event notifier function called inside this
completion workqueue takes a non-negligible amount of time to execute.
One such example is the battery event handling code (surface_battery.c),
which can result in a full battery information refresh, involving
further synchronous communication with the EC inside the event handler.
This is made worse if the communication fails spuriously, generally
incurring a multi-second timeout.
Since the event completions are run strictly non-parallel, this blocks
other events from being propagated to the respective subsystems. This
becomes especially noticeable for keyboard and touchpad input, which
also funnel their events through this system. Here, users have reported
occasional multi-second "freezes".
Note, however, that the event handling system was never intended to run
purely sequentially. Instead, we have one work struct per EC/SAM
subsystem, processing the event queue for that subsystem. These work
structs were intended to run in parallel, allowing sequential processing
of work items for each subsystem but parallel processing of work items
across subsystems.
The only restriction to this is the way the workqueue is created.
Therefore, replace create_workqueue() with alloc_workqueue() and do not
restrict the maximum number of parallel work items to be executed on
that queue, resolving any cross-subsystem blockage.
Fixes: c167b9c7e3 ("platform/surface: Add Surface Aggregator subsystem")
Link: https://github.com/linux-surface/linux-surface/issues/1026
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Link: https://lore.kernel.org/r/20230525210110.2785470-1-luzmaximilian@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Make to_ssam_device_driver() a bit safer by replacing container_of()
with container_of_const() to respect the constness of the passed in
pointer, instead of silently discarding any const specifications. This
change also makes it more similar to to_ssam_device(), which already
uses container_of_const().
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Link: https://lore.kernel.org/r/20230525205041.2774947-1-luzmaximilian@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
The mailing list that goes to linux-trace-devel is for the tracing
libraries, and the patchwork associated to the tracing libraries keys
off of that mailing list.
For anything that lives in the Linux kernel proper (including the tools
directory) must go through linux-trace-kernel, as the patchwork to that
list keys off of the Linux kernel proper.
Update the MAINTAINERS file to reflect the proper mailing lists.
Link: https://lore.kernel.org/linux-trace-kernel/20230529044002.0481452b@rorschach.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
kallsyms_lookup() which in turn calls kallsyms_lookup_buildid() writes
to index "KSYM_NAME_LEN - 1".
Thus the array passed as namebuf to kallsyms_lookup() should be
KSYM_NAME_LEN in size.
In xmon.c the array was defined to be "128" bytes directly, without
using KSYM_NAME_LEN. Commit b8a94bfb33 ("kallsyms: increase maximum
kernel symbol length to 512") changed the value to 512, but missed
updating the xmon code.
Fixes: b8a94bfb33 ("kallsyms: increase maximum kernel symbol length to 512")
Cc: stable@vger.kernel.org # v6.1+
Co-developed-by: Onkarnath <onkarnath.1@samsung.com>
Signed-off-by: Onkarnath <onkarnath.1@samsung.com>
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
[mpe: Tweak change log wording and fix commit reference]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230529111337.352990-2-maninder1.s@samsung.com
Currently in tce_freemulti_pSeriesLP() there is no limit on how many
TCEs are passed to the H_STUFF_TCE hcall. This has not caused an issue
until now, but newer firmware releases have started enforcing a limit of
512 TCEs per call.
The limit is correct per the specification (PAPR v2.12 § 14.5.4.2.3).
The code has been in it's current form since it was initially merged.
Cc: stable@vger.kernel.org
Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
[mpe: Tweak change log wording & add PAPR reference]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230525143454.56878-1-gbatra@linux.vnet.ibm.com
The recently added P10 AES/GCM code added some files containing
CRYPTOGAMS perl-asm code which are near duplicates of the p8 files
found in drivers/crypto/vmx.
In particular the newly added files produce functions with identical
names to the existing code.
When the kernel is built with CONFIG_CRYPTO_AES_GCM_P10=y and
CONFIG_CRYPTO_DEV_VMX_ENCRYPT=y that leads to link errors, eg:
ld: drivers/crypto/vmx/aesp8-ppc.o: in function `aes_p8_set_encrypt_key':
(.text+0xa0): multiple definition of `aes_p8_set_encrypt_key'; arch/powerpc/crypto/aesp8-ppc.o:(.text+0xa0): first defined here
...
ld: drivers/crypto/vmx/ghashp8-ppc.o: in function `gcm_ghash_p8':
(.text+0x140): multiple definition of `gcm_ghash_p8'; arch/powerpc/crypto/ghashp8-ppc.o:(.text+0x2e4): first defined here
Fix it for now by renaming the newly added files and functions to use
"p10" instead of "p8" in the names.
Fixes: 45a4672b9a ("crypto: p10-aes-gcm - Update Kconfig and Makefile")
Tested-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230525150501.37081-1-mpe@ellerman.id.au
For devices not attached to a port multiplier and managed directly by
libata, the device number passed to ata_find_dev() must always be lower
than the maximum number of devices returned by ata_link_max_devices().
That is 1 for SATA devices or 2 for an IDE link with master+slave
devices. This device number is the SCSI device ID which matches these
constraints as the IDs are generated per port and so never exceed the
maximum number of devices for the link being used.
However, for libsas managed devices, SCSI device IDs are assigned per
struct scsi_host, leading to device IDs for SATA devices that can be
well in excess of libata per-link maximum number of devices. This
results in ata_find_dev() to always return NULL for libsas managed
devices except for the first device of the target scsi_host with ID
(device number) equal to 0. This issue is visible by executing the
hdparm utility, which fails. E.g.:
hdparm -i /dev/sdX
/dev/sdX:
HDIO_GET_IDENTITY failed: No message of desired type
Fix this by rewriting ata_find_dev() to ignore the device number for
non-PMP attached devices with a link with at most 1 device, that is SATA
devices. For these, the device number 0 is always used to
return the correct pointer to the struct ata_device of the port link.
This change excludes IDE master/slave setups (maximum number of devices
per link is 2) and port-multiplier attached devices. Also, to be
consistant with the fact that SCSI device IDs and channel numbers used
as device numbers are both unsigned int, change the devno argument of
ata_find_dev() to unsigned int.
Reported-by: Xingui Yang <yangxingui@huawei.com>
Fixes: 41bda9c980 ("libata-link: update hotplug to handle PMP links")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
There is a window where the poll cq may use a QP that has been freed.
This can happen if a CQE is polled before irdma_clean_cqes() can clear the
CQE's related to the QP and the destroy QP races to free the QP memory.
then the QP structures are used in irdma_poll_cq. Fix this by moving the
clearing of CQE's before the reference is removed and the QP is destroyed.
Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Link: https://lore.kernel.org/r/20230522155654.1309-3-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Without EXCLUSIVE_SYSTEM_RAM, users are allowed to map arbitrary
physical memory regions into the userspace via /dev/mem. At the same
time, pages may change their properties (e.g., from anonymous pages to
named pages) while they are still being mapped in the userspace, leading
to "corruption" detected by the page table check.
To avoid these false positives, this patch makes PAGE_TABLE_CHECK
depends on EXCLUSIVE_SYSTEM_RAM. This dependency is understandable
because PAGE_TABLE_CHECK is a hardening technique but /dev/mem without
STRICT_DEVMEM (i.e., !EXCLUSIVE_SYSTEM_RAM) is itself a security
problem.
Even with EXCLUSIVE_SYSTEM_RAM, I/O pages may be still allowed to be
mapped via /dev/mem. However, these pages are always considered as named
pages, so they won't break the logic used in the page table check.
Cc: <stable@vger.kernel.org> # 5.17
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/r/20230515130958.32471-4-lrh2000@pku.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When hcd->localmem_pool is non-null, localmem_pool is used to allocate
DMA memory. In this case, the dma address will be properly returned (in
dma_handle), and dma_mmap_coherent should be used to map this memory
into the user space. However, the current implementation uses
pfn_remap_range, which is supposed to map normal pages.
Instead of repeating the logic in the memory allocation function, this
patch introduces a more robust solution. Here, the type of allocated
memory is checked by testing whether dma_handle is properly set. If
dma_handle is properly returned, it means some DMA pages are allocated
and dma_mmap_coherent should be used to map them. Otherwise, normal
pages are allocated and pfn_remap_range should be called. This ensures
that the correct mmap functions are used consistently, independently
with logic details that determine which type of memory gets allocated.
Fixes: a0e710a7de ("USB: usbfs: fix mmap dma mismatch")
Cc: stable@vger.kernel.org
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Link: https://lore.kernel.org/r/20230515130958.32471-3-lrh2000@pku.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current implementation of usbdev_mmap uses usb_alloc_coherent to
allocate memory pages that will later be mapped into the user space.
Meanwhile, usb_alloc_coherent employs three different methods to
allocate memory, as outlined below:
* If hcd->localmem_pool is non-null, it uses gen_pool_dma_alloc to
allocate memory;
* If DMA is not available, it uses kmalloc to allocate memory;
* Otherwise, it uses dma_alloc_coherent.
However, it should be noted that gen_pool_dma_alloc does not guarantee
that the resulting memory will be page-aligned. Furthermore, trying to
map slab pages (i.e., memory allocated by kmalloc) into the user space
is not resonable and can lead to problems, such as a type confusion bug
when PAGE_TABLE_CHECK=y [1].
To address these issues, this patch introduces hcd_alloc_coherent_pages,
which addresses the above two problems. Specifically,
hcd_alloc_coherent_pages uses gen_pool_dma_alloc_align instead of
gen_pool_dma_alloc to ensure that the memory is page-aligned. To replace
kmalloc, hcd_alloc_coherent_pages directly allocates pages by calling
__get_free_pages.
Reported-by: syzbot+fcf1a817ceb50935ce99@syzkaller.appspotmail.comm
Closes: https://lore.kernel.org/lkml/000000000000258e5e05fae79fc1@google.com/ [1]
Fixes: f7d34b445a ("USB: Add support for usbfs zerocopy.")
Fixes: ff2437befd ("usb: host: Fix excessive alignment restriction for local memory allocations")
Cc: stable@vger.kernel.org
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20230515130958.32471-2-lrh2000@pku.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The scsi driver function sd_read_block_characteristics() always calls
disk_set_zoned() to a disk zoned model correctly, in case the device
model changed. This is done even for regular disks to set the zoned
model to BLK_ZONED_NONE and free any zone related resources if the drive
previously was zoned.
This behavior significantly impact the time it takes to revalidate disks
on a large system as the call to disk_clear_zone_settings() done from
disk_set_zoned() for the BLK_ZONED_NONE case results in the device
request queued to be frozen, even if there are no zone resources to
free.
Avoid this overhead for non-zoned devices by not calling
disk_clear_zone_settings() in disk_set_zoned() if the device model
was already set to BLK_ZONED_NONE, which is always the case for regular
devices.
Reported by: Brian Bunker <brian@purestorage.com>
Fixes: 508aebb805 ("block: introduce blk_queue_clear_zone_settings()")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230529073237.1339862-1-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
While exercising the unbind path, with the current implementation
the functionfs_unbind would be calling which waits for the ffs->mutex
to be available, however within the same time ffs_ep0_read is invoked
& if no setup packets are pending, it will invoke function
wait_event_interruptible_exclusive_locked_irq which by definition waits
for the ev.count to be increased inside the same mutex for which
functionfs_unbind is waiting.
This creates deadlock situation because the functionfs_unbind won't
get the lock until ev.count is increased which can only happen if
the caller ffs_func_unbind can proceed further.
Following is the illustration:
CPU1 CPU2
ffs_func_unbind() ffs_ep0_read()
mutex_lock(ffs->mutex)
wait_event(ffs->ev.count)
functionfs_unbind()
mutex_lock(ffs->mutex)
mutex_unlock(ffs->mutex)
ffs_event_add()
<deadlock>
Fix this by moving the event unbind before functionfs_unbind
to ensure the ev.count is incrased properly.
Fixes: 6a19da1110 ("usb: gadget: f_fs: Prevent race during ffs_ep0_queue_wait")
Cc: stable <stable@kernel.org>
Signed-off-by: Uttkarsh Aggarwal <quic_uaggarwa@quicinc.com>
Link: https://lore.kernel.org/r/20230525092854.7992-1-quic_uaggarwa@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
At iMX8QM platform, enable NCM gadget and run 'iperf3 -s'.
At host, run 'iperf3 -V -c fe80::6863:98ff:feef:3e0%enxc6e147509498'
[ 5] 0.00-1.00 sec 1.55 MBytes 13.0 Mbits/sec 90 4.18 KBytes
[ 5] 1.00-2.00 sec 1.44 MBytes 12.0 Mbits/sec 75 4.18 KBytes
[ 5] 2.00-3.00 sec 1.48 MBytes 12.4 Mbits/sec 75 4.18 KBytes
Expected speed should be bigger than 300Mbits/sec.
The root cause of this performance drop was found to be data corruption
happening at 4K borders in some Ethernet packets, leading to TCP
checksum errors. This corruption occurs from the position
(4K - (address & 0x7F)) to 4K. The u_ether function's allocation of
skb_buff reserves 64B, meaning all RX addresses resemble 0xXXXX0040.
Force trb_burst_size to 16 can fix this problem.
Cc: stable@vger.kernel.org
Fixes: 7733f6c32e ("usb: cdns3: Add Cadence USB3 DRD Driver")
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20230518154946.3666662-1-Frank.Li@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The channel's rpmsg object allows new invocations to be made. After old
invocations are already interrupted, the driver shouldn't try to invoke
anymore. Invalidating the rpmsg at the end of the driver removal
function makes it easy to cause a race condition in userspace. Even
closing a file descriptor before the driver finishes its cleanup can
cause an invocation via fastrpc_release_current_dsp_process() and
subsequent timeout.
Invalidate the channel before the invocations are interrupted to make
sure that no invocations can be created to hang after the device closes.
Fixes: c68cfb718c ("misc: fastrpc: Add support for context Invoke method")
Cc: stable <stable@kernel.org>
Signed-off-by: Richard Acayan <mailingradian@gmail.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230523152550.438363-5-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The userspace map request for remote heap allocates CMA memory.
The ownership of this memory needs to be reassigned to proper
owners to allow access from the protection domain running on
DSP. This reassigning of ownership is not correct if done for
any other supported flags.
When any other flag is requested from userspace, fastrpc is
trying to reassign the ownership of memory and this reassignment
is getting skipped for remote heap request which is incorrect.
Add proper flag check to reassign the memory only if remote heap
is requested.
Fixes: 532ad70c6d ("misc: fastrpc: Add mmap request assigning for static PD pool")
Cc: stable <stable@kernel.org>
Tested-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230523152550.438363-3-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a map request is made with securemap attribute, the memory
ownership needs to be reassigned to new VMID to allow access
from protection domain. Currently only DSP VMID is passed to
the reassign call which is incorrect as only a combination of
HLOS and DSP VMID is allowed for memory ownership reassignment
and passing only DSP VMID will cause assign call failure.
Also pass proper restoring permissions to HLOS as the source
permission will now carry both HLOS and DSP VMID permission.
Change is also made to get valid physical address from
scatter/gather for this allocation request.
Fixes: e90d911906 ("misc: fastrpc: Add support to secure memory map")
Cc: stable <stable@kernel.org>
Tested-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230523152550.438363-2-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hibernation support depends on firmware marking its reserved/PMP
protected regions as not accessible from Linux.
The latest versions of the de-facto SBI implementation (OpenSBI) do
not do this, having dropped the no-map property to enable 1 GiB huge
page mappings by the kernel.
This was exposed by commit 3335068f87 ("riscv: Use PUD/P4D/PGD pages
for the linear mapping"), which made the first 2 MiB of DRAM (where SBI
typically resides) accessible by the kernel.
Attempting to hibernate with either OpenSBI, or other implementations
following its lead, will lead to a kernel panic ([1], [2]) as the
hibernation process will attempt to save/restore any mapped regions,
including the PMP protected regions in use by the SBI implementation.
Mark hibernation as depending on "NONPORTABLE", as only a small subset
of systems are capable of supporting it, until such time that an SBI
implementation independent way to communicate what regions are in use
has been agreed on.
As hibernation support landed in v6.4-rc1, disabling it for most
platforms does not constitute a regression. The alternative would have
been reverting commit 3335068f87 ("riscv: Use PUD/P4D/PGD pages for
the linear mapping").
Doing so would permit hibernation on platforms with these SBI
implementations, but would limit the options we have to solve the
protection of the region without causing a regression in hibernation
support.
Reported-by: Song Shuai <suagrfillet@gmail.com>
Link: https://lore.kernel.org/all/CAAYs2=gQvkhTeioMmqRDVGjdtNF_vhB+vm_1dHJxPNi75YDQ_Q@mail.gmail.com/ [1]
Reported-by: JeeHeng Sia <jeeheng.sia@starfivetech.com>
Link: https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/ITXwaKfA6z8 [2]
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230526-astride-detonator-9ae120051159@wendy
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull tracing fixes from Steven Rostedt:
"User events:
- Use long instead of int for storing the enable set/clear bit, as it
was found that big endian machines could end up using the wrong
bits.
- Split allocating mm and attaching it. This keeps the allocation
separate from the registration and avoids various races.
- Remove RCU locking around pin_user_pages_remote() as that can
schedule. The RCU protection is no longer needed with the above
split of mm allocation and attaching.
- Rename the "link" fields of the various structs to something more
meaningful.
- Add comments around user_event_mm struct usage and locking
requirements.
Timerlat tracer:
- Fix missed wakeup of timerlat thread caused by the timerlat
interrupt triggering when tracing is off. The timer interrupt
handler needs to always wake up the timerlat thread regardless if
tracing is enabled or not, otherwise, it will never wake up.
Histograms:
- Fix regression of breaking the "stacktrace" modifier for variables.
That modifier cannot be used for values, but can be used for
variables that are passed from one histogram to the next. This was
broken when adding the restriction to values as the variable logic
used the same code.
- Rename the special field "stacktrace" to "common_stacktrace".
Special fields (that are not actually part of the event, but can
act just like event fields, like 'comm' and 'timestamp') should be
prefixed with 'common_' for consistency. To keep backward
compatibility, 'stacktrace' can still be used (as with the special
field 'cpu'), but can be overridden if the event has a field called
'stacktrace'.
- Update the synthetic event selftests to use the new name (synthetic
events are created by histograms)
Tracing bootup selftests:
- Reorganize the code to keep artifacts of the selftests not compiled
in when selftests are not configured.
- Add various cond_resched() around the selftest code, as the
softlock watchdog was triggering much more often. It appears that
the kernel runs slower now with full debugging enabled.
- While debugging ftrace with ftrace (using an instance ring buffer
instead of the top level one), I found that the selftests were
disabling prints to the debug instance.
This should not happen, as the selftests only disable printing to
the main buffer as the selftests examine the main buffer to see if
it has what it expects, and prints can make the tests fail.
Make the selftests only disable printing to the toplevel buffer,
and leave the instance buffers alone"
* tag 'trace-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Have function_graph selftest call cond_resched()
tracing: Only make selftest conditionals affect the global_trace
tracing: Make tracing_selftest_running/delete nops when not used
tracing: Have tracer selftests call cond_resched() before running
tracing: Move setting of tracing_selftest_running out of register_tracer()
tracing/selftests: Update synthetic event selftest to use common_stacktrace
tracing: Rename stacktrace field to common_stacktrace
tracing/histograms: Allow variables to have some modifiers
tracing/user_events: Document user_event_mm one-shot list usage
tracing/user_events: Rename link fields for clarity
tracing/user_events: Remove RCU lock while pinning pages
tracing/user_events: Split up mm alloc and attach
tracing/timerlat: Always wakeup the timerlat thread
tracing/user_events: Use long vs int for atomic bit ops
Pull crypto fix from Herbert Xu:
"Fix an alignment crash in x86/aria"
* tag 'v6.4-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: x86/aria - Use 16 byte alignment for GFNI constant vectors
This reverts commit 9828ed3f69.
Sadly, it does seem to cause failures to load modules. Johan Hovold reports:
"This change breaks module loading during boot on the Lenovo Thinkpad
X13s (aarch64).
Specifically it results in indefinite probe deferral of the display
and USB (ethernet) which makes it a pain to debug. Typing in the dark
to acquire some logs reveals that other modules are missing as well"
Since this was applied late as a "let's try this", I'm reverting it
asap, and we can try to figure out what goes wrong later. The excessive
parallel module loading problem is annoying, but not noticeable in
normal situations, and this was only meant as an optimistic workaround
for a user-space bug.
One possible solution may be to do the optimistic exclusive open first,
and then use a lock to serialize loading if that fails.
Reported-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/lkml/ZHRpH-JXAxA6DnzR@hovoldconsulting.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the boot firmware has already established tunnels, especially ones
that have special requirements from the link such as DisplayPort, we
should not blindly enable CL states (nor change the TMU configuration).
Otherwise the existing tunnels may not work as expected.
For this reason, skip the CL state enabling when we go over the existing
topology. This will also keep the TMU settings untouched because we do
not change the TMU configuration when CL states are not enabled.
Reported-by: Koba Ko <koba.ko@canonical.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7831
Cc: stable@vger.kernel.org # v6.0+
Acked-By: Yehezkel Bernat <YehezkelShB@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
It turns out that when plugging in VGA cable through USB-C to VGA/DVI
dongle the Connection Manager handshake can take longer time, at least
on Intel Titan Ridge based docks such as Dell WD91TB. This leads to
following error in the dmesg:
thunderbolt 0000:00:0d.3: 3:10: DP tunnel activation failed, aborting
and the display stays blank (because we failed to establish the tunnel).
For this reason increase the timeout to 3s.
Reported-by: Koba Ko <koba.ko@canonical.com>
Cc: stable@vger.kernel.org
Acked-By: Yehezkel Bernat <YehezkelShB@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
When all kernel debugging is enabled (lockdep, KSAN, etc), the function
graph enabling and disabling can take several seconds to complete. The
function_graph selftest enables and disables function graph tracing
several times. With full debugging enabled, the soft lockup watchdog was
triggering because the selftest was running without ever scheduling.
Add cond_resched() throughout the test to make sure it does not trigger
the soft lockup detector.
Link: https://lkml.kernel.org/r/20230528051742.1325503-6-rostedt@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The tracing_selftest_running and tracing_selftest_disabled variables were
to keep trace_printk() and other writes from affecting the tracing
selftests, as the tracing selftests would examine the ring buffer to see
if it contained what it expected or not. trace_printk() and friends could
add to the ring buffer and cause the selftests to fail (and then disable
the tracer that was being tested). To keep that from happening, these
variables were added and would keep trace_printk() and friends from
writing to the ring buffer while the tests were going on.
But this was only the top level ring buffer (owned by the global_trace
instance). There is no reason to prevent writing into ring buffers of
other instances via the trace_array_printk() and friends. For the
functions that could be used by other instances, check if the global_trace
is the tracer instance that is being written to before deciding to not
allow the write.
Link: https://lkml.kernel.org/r/20230528051742.1325503-5-rostedt@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
As there are more and more internal selftests being added to the Linux
kernel (KSAN, lockdep, etc) the selftests are taking longer to run when
these are enabled. Add a cond_resched() to the calling of
do_run_tracer_selftest() to force a schedule if NEED_RESCHED is set,
otherwise the soft lockup watchdog may trigger on boot up.
Link: https://lkml.kernel.org/r/20230528051742.1325503-3-rostedt@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The variables tracing_selftest_running and tracing_selftest_disabled are
only used for when CONFIG_FTRACE_STARTUP_TEST is enabled. Make them only
visible within the selftest code. The setting of those variables are in
the register_tracer() call, and set in a location where they do not need
to be. Create a wrapper around run_tracer_selftest() called
do_run_tracer_selftest() which sets those variables, and have
register_tracer() call that instead.
Having those variables only set within the CONFIG_FTRACE_STARTUP_TEST
scope gets rid of them (and also the ability to remove testing against
them) when the startup tests are not enabled (most cases).
Link: https://lkml.kernel.org/r/20230528051742.1325503-2-rostedt@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull dmaengine fixes from Vinod Koul:
"Driver fixes for the at-hdmac, pl330, TI and IDXD drivers:
- AT HDMAC driver fixes for Flow Controller bitfield, peripheral ID
handling and potential NULL dereference check
- PL330 function rename to avoid conflicts
- build warning fix for pm function in TI driver
- IDXD driver fix for passing freed memory"
* tag 'dmaengine-fix-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: at_hdmac: Extend the Flow Controller bitfield to three bits
dmaengine: at_hdmac: Repair bitfield macros for peripheral ID handling
dmaengine: pl330: rename _start to prevent build error
dmaengine: at_xdmac: fix potential Oops in at_xdmac_prep_interleaved()
dmaengine: ti: k3-udma: annotate pm function with __maybe_unused
dmaengine: idxd: Fix passing freed memory in idxd_cdev_open()
Jonathan writes:
1st set of IIO fixes for the 6.4 cycle.
Usual mixed bag of issues in new code for this cycle and old issues
that have surfaced in the last few weeks.
- adi,ad_sigma_delta
* Ensure irq lazy disable handing is not used as it breaks completion
detection.
- adi,ad4130
* Fix failure to remove clock provider.
- adi,ad5758
* Wrong CONFIG variable used to control driver build.
- adi,ad7192
* Fix repeated channel index by just expressing shorted channels
as differential channel between a channel and itself.
- adi,ad74413
* Fix error handling for resistance input processing to not fail
in case of success.
- rohm,bu27034
* Fix integration time in wrong units (should be seconds not usecs)
* Ensure reset is actually written not detected as already set from
regcache.
- gts helper
* Fix wrong parameter docs.
* Fix integration time in wrong units (should be seconds not usecs)
- fsl,imx8qxp-adc
* Add missing vref-supply to binding doc (already used by driver)
- fsl,imx93
* Fix sign bug in read_raw() so that error check didn't work.
- inv,icm42600
* Fix reset of timestamp to work even if a particular sensor is off when
the chip is first enabled.
- kionix,kx022a
* Fix irq get form fw node to not include the 0 value.
- microchip,mcp4725
* Fix return value from i2c_master_send() handling to nto assume 0 on
success.
- mediatek,mt6370
* Fix incorrect scaling of a few currents on devices with particular
vendor IDs.
- fsl,mxs-lradc
* Cleanup ordering issue fix.
- renesas,rcar-adc bindings
* Fix missing vendor prefix for adi,ad7476
- st,st_accel
* Fix handling when no ACPI _ONT method present.
- st,stm32-adc
* Handle no adc-diff-channel present case (all single ended)
* Handle no adc-channels present case (all differential)
- ti,palmas
* Fix off by one bug that could allow out of bounds read if callers
provided wrong value.
- ti,tmag5273
* Fix a runtime PM leak on measurement error
- vishay,vcnl4035
* Correctly mask chip ID so devices with different addresses
don't fail the test.
* tag 'iio-fixes-for-6.4a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (23 commits)
iio: imu: inv_icm42600: fix timestamp reset
iio: adc: ad_sigma_delta: Fix IRQ issue by setting IRQ_DISABLE_UNLAZY flag
dt-bindings: iio: adc: renesas,rcar-gyroadc: Fix adi,ad7476 compatible value
iio: dac: mcp4725: Fix i2c_master_send() return value handling
iio: accel: kx022a fix irq getting
iio: bu27034: Ensure reset is written
iio: dac: build ad5758 driver when AD5758 is selected
iio: addac: ad74413: fix resistance input processing
iio: light: vcnl4035: fixed chip ID check
dt-bindings: iio: imx8qxp-adc: add missing vref-supply
iio: adc: stm32-adc: skip adc-channels setup if none is present
iio: adc: stm32-adc: skip adc-diff-channels setup if none is present
iio: adc: ad7192: Change "shorted" channels to differential
iio: accel: st_accel: Fix invalid mount_matrix on devices without ACPI _ONT method
iio: gts-helpers: fix integration time units
iio: bu27034: Fix integration time
iio: fix doc for iio_gts_find_sel_by_int_time
iio: adc: palmas: fix off by one bugs
iio: adc: mxs-lradc: fix the order of two cleanup operations
iio: ad4130: Make sure clock provider gets removed
...
The macOS hypervisor framework includes a host-side VMM called
VZLinuxBootLoader [1] which implements native support for booting the
Linux kernel inside a guest directly (instead of, e.g., via GRUB
installed inside the guest). On x86, it incorporates a BIOS style loader
that does not implement or expose EFI to the loaded kernel. However,
this loader appears to fail when the 'image minor version' field in the
kernel image's PE/COFF header (which is generally only used by EFI based
bootloaders) is set to any value other than 0x0. [2]
Commit e346bebbd3 ("efi: libstub: Always enable initrd command
line loader and bump version") incremented the EFI stub image minor
version to convey that all EFI stub kernels now implement support for
the initrd= command line option, and do so in a way where it can load
initrd images from any filesystem known to the EFI firmware (as opposed
to prior implementations that could only load initrds from the same
volume that the kernel image was loaded from).
Unfortunately, bumping the version to v1.1 triggers this issue in
VZLinuxBootLoader, breaking the boot on x86. So let's keep the image
minor version at 0x0, and bump the image major version instead.
While at it, convert this field to a bit field, so that individual
features are discoverable from it, as suggested by Linus. So let's bump
the major version to v3, and document the initrd= command line loading
feature as being represented by bit 1 in the mask.
Note that, due to the prior interpretation as a monotonically increasing
version field, loaders are still permitted to assume that the LoadFile2
initrd loading feature is supported for any major version value >= 1,
even if bit 0 is not set.
[1] https://developer.apple.com/documentation/virtualization/vzlinuxbootloader
[2] https://lore.kernel.org/linux-efi/CAG8fp8Teu4G9JuenQrqGndFt2Gy+V4YgJ=hN1xX7AD940YKf3A@mail.gmail.com/
Fixes: e346bebbd3 ("efi: libstub: Always enable initrd command ...")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217485
Signed-off-by: Akihiro Suda <suda.kyoto@gmail.com>
[ardb: rewrite comment and commit log]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Pull x86 cpu fix from Thomas Gleixner:
"A single fix for x86:
- Prevent a bogus setting for the number of HT siblings, which is
caused by the CPUID evaluation trainwreck of X86. That recomputes
the value for each CPU, so the last CPU "wins". That can cause
completely bogus sibling values"
* tag 'x86-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platforms
Pull perf fixes from Thomas Gleixner:
"A small set of perf fixes:
- Make the MSR-readout based CHA discovery work around broken
discovery tables in some SPR firmwares.
- Prevent saving PEBS configuration which has software bits set that
cause a crash when restored into the relevant MSR"
* tag 'perf-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/uncore: Correct the number of CHAs on SPR
perf/x86/intel: Save/restore cpuc->active_pebs_data_cfg when using guest PEBS
Pull unwinder fixes from Thomas Gleixner:
"A set of unwinder and tooling fixes:
- Ensure that the stack pointer on x86 is aligned again so that the
unwinder does not read past the end of the stack
- Discard .note.gnu.property section which has a pointlessly
different alignment than the other note sections. That confuses
tooling of all sorts including readelf, libbpf and pahole"
* tag 'objtool-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/show_trace_log_lvl: Ensure stack pointer is aligned, again
vmlinux.lds.h: Discard .note.gnu.property section
Pull debugobjects fixes from Thomas Gleixner:
"Two fixes for debugobjects:
- Prevent the allocation path from waking up kswapd.
That's a long standing issue due to the GFP_ATOMIC allocation flag.
As debug objects can be invoked from pretty much any context waking
kswapd can end up in arbitrary lock chains versus the waitqueue
lock
- Correct the explicit lockdep wait-type violation in
debug_object_fill_pool()"
* tag 'core-debugobjects-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Don't wake up kswapd from fill_pool()
debugobjects,locking: Annotate debug_object_fill_pool() wait type violation
Pull irq fixes from Thomas Gleixner:
"A set of fixes for interrupt chip drivers:
- Prevent loss of state in the MIPS GIC interrupt controller
- Disable pseudo NMIs on Mediatek based Chromebooks as they have
firmware issues which cause instantenous chrashes and freezes wen
pseudo NMIs are used
- Fix the error handling path in the MBIGEN driver and a defined but
not used warning in the meson-gpio interrupt chip driver"
* tag 'irq-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mbigen: Unify the error handling in mbigen_of_create_domain()
irqchip/meson-gpio: Mark OF related data as maybe unused
irqchip/mips-gic: Use raw spinlock for gic_lock
irqchip/mips-gic: Don't touch vl_map if a local interrupt is not routable
irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues
dt-bindings: interrupt-controller: arm,gic-v3: Add quirk for Mediatek SoCs w/ broken FW
Pull MIPS fixes from Thomas Bogendoerfer:
- fixes to get alchemy platform back in shape
- fix for initrd detection
* tag 'mips-fixes_6.4_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
mips: Move initrd_start check after initrd address sanitisation.
MIPS: Alchemy: fix dbdma2
MIPS: Restore Au1300 support
MIPS: unhide PATA_PLATFORM
Pull powerpc fix from Michael Ellerman:
- Reinstate ARCH_FORCE_MAX_ORDER ranges to fix various breakage
* tag 'powerpc-6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Reinstate ARCH_FORCE_MAX_ORDER ranges
Pull xen fixes from Juergen Gross:
- a double free fix in the Xen pvcalls backend driver
- a fix for a regression causing the MSI related sysfs entries to not
being created in Xen PV guests
- a fix in the Xen blkfront driver for handling insane input data
better
* tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/pci/xen: populate MSI sysfs entries
xen/pvcalls-back: fix double frees with pvcalls_new_active_socket()
xen/blkfront: Only check REQ_FUA for writes
Pull char/misc fixes from Greg KH:
"Here are some small driver fixes for 6.4-rc4. They are just two
different types:
- binder fixes and reverts for reported problems and regressions in
the binder "driver".
- coresight driver fixes for reported problems.
All of these have been in linux-next for over a week with no reported
problems"
* tag 'char-misc-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
binder: fix UAF of alloc->vma in race with munmap()
binder: add lockless binder_alloc_(set|get)_vma()
Revert "android: binder: stop saving a pointer to the VMA"
Revert "binder_alloc: add missing mmap_lock calls when using the VMA"
binder: fix UAF caused by faulty buffer cleanup
coresight: perf: Release Coresight path when alloc trace id failed
coresight: Fix signedness bug in tmc_etr_buf_insert_barrier_packet()
Add the assigned-clocks and assigned-clock-rates properties for the
LPUARTx nodes. Without these properties, the default clock rate
used would be 0, which can cause the UART ports to fail when open.
Fixes: 35f4e9d753 ("arm64: dts: imx8: split adma ss into dma and audio ss")
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
After commit b8a1a4cd5a ("i2c: Provide a temporary .probe_new()
call-back type"), all drivers being converted to .probe_new() and then
03c835f498 ("i2c: Switch .probe() to not take an id parameter")
convert back to (the new) .probe() to be able to eventually drop
.probe_new() from struct i2c_driver.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Helge Deller <deller@gmx.de>
We don't need to set the list iterators to NULL before a
list_for_each_entry() loop because they are assigned inside the
macro.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
last component point filename struct. Currently putname is called after
vfs_path_parent_lookup(). And then last component is used for
lookup_one_qstr_excl(). name in last component is freed by previous
calling putname(). And It cause file lookup failure when testing
generic/464 test of xfstest.
Fixes: 74d7970feb ("ksmbd: fix racy issue from using ->d_parent and ->d_name")
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
If filesystem support sparse file, ksmbd should return allocated size
using ->i_blocks instead of stat->size. This fix generic/694 xfstests.
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
If opinfo->conn is another connection and while ksmbd send oplock break
request to cient on current connection, The connection for opinfo->conn
can be disconnect and conn could be freed. When sending oplock break
request, this ksmbd_conn can be used and cause user-after-free issue.
When getting opinfo from the list, ksmbd check connection is being
released. If it is not released, Increase ->r_count to wait that connection
is freed.
Cc: stable@vger.kernel.org
Reported-by: Per Forlin <per.forlin@axis.com>
Tested-by: Per Forlin <per.forlin@axis.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Check the remaining data length before accessing the context structure
to ensure that the entire structure is contained within the packet.
Additionally, since the context data length `ctxt_len` has already been
checked against the total packet length `len_of_ctxts`, update the
comparison to use `ctxt_len`.
Cc: stable@vger.kernel.org
Signed-off-by: Kuan-Ting Chen <h3xrabbit@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
This patch fix the failure from smb2.credits.single_req_credits_granted
test. When client send 8192 credit request, ksmbd return 8191 credit
granted. ksmbd should give maximum possible credits that must be granted
within the range of not exceeding the max credit to client.
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
There is a case that file_present is true and path is uninitialized.
This patch change file_present is set to false by default and set to
true when patch is initialized.
Fixes: 74d7970feb ("ksmbd: fix racy issue from using ->d_parent and ->d_name")
Reported-by: Coverity Scan <scan-admin@coverity.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Uninitialized rd.delegated_inode can be used in vfs_rename().
Fix this by setting rd.delegated_inode to NULL to avoid the uninitialized
read.
Fixes: 74d7970feb ("ksmbd: fix racy issue from using ->d_parent and ->d_name")
Reported-by: Coverity Scan <scan-admin@coverity.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull compute express link fixes from Dan Williams:
"The 'media ready' series prevents the driver from acting on bad
capacity information, and it moves some checks earlier in the init
sequence which impacts topics in the queue for 6.5.
Additional hotplug testing uncovered a missing enable for memory
decode. A debug crash fix is also included.
Summary:
- Stop trusting capacity data before the "media ready" indication
- Add missing HDM decoder capability enable for the cold-plug case
- Fix a debug message induced crash"
* tag 'cxl-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl: Explicitly initialize resources when media is not ready
cxl/port: Fix NULL pointer access in devm_cxl_add_port()
cxl: Move cxl_await_media_ready() to before capacity info retrieval
cxl: Wait Memory_Info_Valid before access memory related info
cxl/port: Enable the HDM decoder capability for switch ports
Pull ARM SoC fixes from Arnd Bergmann:
"There have not been a lot of fixes for for the soc tree in 6.4, but
these have been sitting here for too long.
For the devicetree side, there is one minor warning fix for vexpress,
the rest all all for the the NXP i.MX platforms: SoC specific bugfixes
for the iMX8 clocks and its USB-3.0 gadget device, as well as board
specific fixes for regulators and the phy on some of the i.MX boards.
The microchip risc-v and arm32 maintainers now also add a shared
maintainer file entry for the arm64 parts.
The remaining fixes are all for firmware drivers, addressing mistakes
in the optee, scmi and ff-a firmware driver implementation, mostly in
the error handling code, incorrect use of the alloc_workqueue()
interface in SCMI, and compatibility with corner cases of the firmware
implementation"
* tag 'arm-fixes-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
MAINTAINERS: update arm64 Microchip entries
arm64: dts: imx8: fix USB 3.0 Gadget Failure in QM & QXPB0 at super speed
dt-binding: cdns,usb3: Fix cdns,on-chip-buff-size type
arm64: dts: colibri-imx8x: delete adc1 and dsp
arm64: dts: colibri-imx8x: fix iris pinctrl configuration
arm64: dts: colibri-imx8x: move pinctrl property from SoM to eval board
arm64: dts: colibri-imx8x: fix eval board pin configuration
arm64: dts: imx8mp: Fix video clock parents
ARM: dts: imx6qdl-mba6: Add missing pvcie-supply regulator
ARM: dts: imx6ull-dhcor: Set and limit the mode for PMIC buck 1, 2 and 3
arm64: dts: imx8mn-var-som: fix PHY detection bug by adding deassert delay
arm64: dts: imx8mn: Fix video clock parents
firmware: arm_ffa: Set reserved/MBZ fields to zero in the memory descriptors
firmware: arm_ffa: Fix FFA device names for logical partitions
firmware: arm_ffa: Fix usage of partition info get count flag
firmware: arm_ffa: Check if ffa_driver remove is present before executing
arm64: dts: arm: add missing cache properties
ARM: dts: vexpress: add missing cache properties
firmware: arm_scmi: Fix incorrect alloc_workqueue() invocation
optee: fix uninited async notif value
Pull PCI fix from Bjorn Helgaas:
- Quirk Ice Lake Root Ports to work around DPC log size issue (Mika
Westerberg)
* tag 'pci-v6.4-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI/DPC: Quirk PIO log size for Intel Ice Lake Root Ports
Pull VFIO fix from Alex Williamson:
- Test for and return error for invalid pfns through the pin pages
interface (Yan Zhao)
* tag 'vfio-v6.4-rc4' of https://github.com/awilliam/linux-vfio:
vfio/type1: check pfn valid before converting to struct page
Pull MD fix from Song:
"This change fixes a raid5 regression since 5.12 kernels."
* tag 'md-fixes-2023-05-24' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md/raid5: fix miscalculation of 'end_sector' in raid5_read_one_chunk()
Pull block fixes from Jens Axboe:
"A few fixes for the storage side of things:
- Fix bio caching condition for passthrough IO (Anuj)
- end-of-device check fix for zero sized devices (Christoph)
- Update Paolo's email address
- NVMe pull request via Keith with a single quirk addition
- Fix regression in how wbt enablement is done (Yu)
- Fix race in active queue accounting (Tian)"
* tag 'block-6.4-2023-05-26' of git://git.kernel.dk/linux:
NVMe: Add MAXIO 1602 to bogus nid list.
block: make bio_check_eod work for zero sized devices
block: fix bio-cache for passthru IO
block, bfq: update Paolo's address in maintainer list
blk-mq: fix race condition in active queue accounting
blk-wbt: fix that wbt can't be disabled by default
Pull io_uring fix from Jens Axboe:
"Just a single fix for the conditional schedule with the SQPOLL thread,
dropping the uring_lock if we do need to reschedule"
* tag 'io_uring-6.4-2023-05-26' of git://git.kernel.dk/linux:
io_uring: unlock sqd->lock before sq thread release CPU
When compiling on a MIPS 64-bit machine we get these warnings:
In file included from ./arch/mips/include/asm/cacheflush.h:13,
from ./include/linux/cacheflush.h:5,
from ./include/linux/highmem.h:8,
from ./include/linux/bvec.h:10,
from ./include/linux/blk_types.h:10,
from ./include/linux/blkdev.h:9,
from fs/btrfs/disk-io.c:7:
fs/btrfs/disk-io.c: In function ‘csum_tree_block’:
fs/btrfs/disk-io.c:100:34: error: array subscript 1 is above array bounds of ‘struct page *[1]’ [-Werror=array-bounds]
100 | kaddr = page_address(buf->pages[i]);
| ~~~~~~~~~~^~~
./include/linux/mm.h:2135:48: note: in definition of macro ‘page_address’
2135 | #define page_address(page) lowmem_page_address(page)
| ^~~~
cc1: all warnings being treated as errors
We can check if i overflows to solve the problem. However, this doesn't make
much sense, since i == 1 and num_pages == 1 doesn't execute the body of the loop.
In addition, i < num_pages can also ensure that buf->pages[i] will not cross
the boundary. Unfortunately, this doesn't help with the problem observed here:
gcc still complains.
To fix this add a compile-time condition for the extent buffer page
array size limit, which would eventually lead to eliminating the whole
for loop.
CC: stable@vger.kernel.org # 5.10+
Signed-off-by: pengfuyuan <pengfuyuan@kylinos.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This fixes the following warning reported by gcc 10.2.1 under x86_64:
../fs/btrfs/tree-log.c: In function ‘btrfs_log_inode’:
../fs/btrfs/tree-log.c:6211:9: error: ‘last_range_start’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
6211 | ret = insert_dir_log_key(trans, log, path, key.objectid,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
6212 | first_dir_index, last_dir_index);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../fs/btrfs/tree-log.c:6161:6: note: ‘last_range_start’ was declared here
6161 | u64 last_range_start;
| ^~~~~~~~~~~~~~~~
This might be a false positive fixed in later compiler versions but we
want to have it fixed.
Reported-by: k2ci <kernel-bot@kylinos.cn>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When I implemented the storage layer bio splitting, I was under the
assumption that we'll never split metadata bios. But Qu reminded me that
this can actually happen with very old file systems with unaligned
metadata chunks and RAID0.
I still haven't seen such a case in practice, but we better handled this
case, especially as it is fairly easily to do not calling the ->end_іo
method directly in btrfs_end_io_work, and using the proper
btrfs_orig_bbio_end_io helper instead.
In addition to the old file system with unaligned metadata chunks case
documented in the commit log, the combination of the new scrub code
with Johannes pending raid-stripe-tree also triggers this case. We
spent some time debugging it and found that this patch solves
the problem.
Fixes: 103c19723c ("btrfs: split the bio submission path into a separate file")
CC: stable@vger.kernel.org # 6.3+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Syzkaller got a lot of crashes like:
KASAN: use-after-free Write in *_timers*
All of these crashes point to the same memory area:
The buggy address belongs to the object at ffff88801f870000
which belongs to the cache kmalloc-8k of size 8192
The buggy address is located 5320 bytes inside of
8192-byte region [ffff88801f870000, ffff88801f872000)
This area belongs to :
batadv_priv->batadv_priv_dat->delayed_work->timer_list
The reason for these issues is the lack of synchronization. Delayed
work (batadv_dat_purge) schedules new timer/work while the device
is being deleted. As the result new timer/delayed work is set after
cancel_delayed_work_sync() was called. So after the device is freed
the timer list contains pointer to already freed memory.
Found by Linux Verification Center (linuxtesting.org) with syzkaller.
Cc: stable@kernel.org
Fixes: 2f1dfbe185 ("batman-adv: Distributed ARP Table - implement local storage")
Signed-off-by: Vladislav Efanov <VEfanov@ispras.ru>
Acked-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Pull thermal control fix from Rafael Wysocki:
"Fix a regression introduced inadvertently during the 6.3 cycle by a
commit making the Intel int340x thermal driver use sysfs_emit_at()
instead of scnprintf() (Srinivas Pandruvada)"
* tag 'thermal-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel: int340x: Add new line for UUID display
Pull power management fixes from Rafael Wysocki:
"Fix three issues related to the ->fast_switch callback in the AMD
P-state cpufreq driver (Gautham R. Shenoy and Wyes Karny)"
* tag 'pm-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: amd-pstate: Update policy->cur in amd_pstate_adjust_perf()
cpufreq: amd-pstate: Remove fast_switch_possible flag from active driver
cpufreq: amd-pstate: Add ->fast_switch() callback
When media is not ready do not assume that the capacity information from
the identify command is valid, i.e. ->total_bytes
->partition_align_bytes ->{volatile,persistent}_only_bytes. Explicitly
zero out the capacity resources and exit early.
Given zero-init of those fields this patch is functionally equivalent to
the prior state, but it improves readability and robustness going
forward.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168506118166.3004974.13523455340007852589.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull gpio fixes from Bartosz Golaszewski:
- fix incorrect output in in-tree gpio tools
- fix a shell coding issue in gpio-sim selftests
- correctly set the permissions for debugfs attributes exposed by
gpio-mockup
- fix chip name and pin count in gpio-f7188x for one of the supported
models
- fix numberspace pollution when using dynamically and statically
allocated GPIOs together
* tag 'gpio-fixes-for-v6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio-f7188x: fix chip name and pin count on Nuvoton chip
gpiolib: fix allocation of mixed dynamic/static GPIOs
gpio: mockup: Fix mode of debugfs files
selftests: gpio: gpio-sim: Fix BUG: test FAILED due to recent change
tools: gpio: fix debounce_period_us output of lsgpio
Pull btrfs fixes from David Sterba:
- handle memory allocation error in checksumming helper (reported by
syzbot)
- fix lockdep splat when aborting a transaction, add NOFS protection
around invalidate_inode_pages2 that could allocate with GFP_KERNEL
- reduce chances to hit an ENOSPC during scrub with RAID56 profiles
* tag 'for-6.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: use nofs when cleaning up aborted transactions
btrfs: handle memory allocation failure in btrfs_csum_one_bio
btrfs: scrub: try harder to mark RAID56 block groups read-only
Pull drm fixes from Dave Airlie:
"This week's collection is pretty spread out, accel/qaic has a bunch of
fixes, amdgpu, then lots of single fixes across a bunch of places.
core:
- fix drmm_mutex_init lock class
mgag200:
- fix gamma lut initialisation
pl111:
- fix FB depth on IMPD-1 framebuffer
amdgpu:
- Fix missing BO unlocking in KIQ error path
- Avoid spurious secure display error messages
- SMU13 fix
- Fix an OD regression
- GPU reset display IRQ warning fix
- MST fix
radeon:
- Fix a DP regression
i915:
- PIPEDMC disabling fix for bigjoiner config
panel:
- fix aya neo air plus quirk
sched:
- remove redundant NULL check
qaic:
- fix NNC message corruption
- Grab ch_lock during QAIC_ATTACH_SLICE_BO
- Flush the transfer list again
- Validate if BO is sliced before slicing
- Validate user data before grabbing any lock
- initialize ret variable to 0
- silence some uninitialized variable warnings"
* tag 'drm-fixes-2023-05-26' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Have Payload Properly Created After Resume
drm/amd/display: Fix warning in disabling vblank irq
drm/amd/pm: Fix output of pp_od_clk_voltage
drm/amd/pm: add missing NotifyPowerSource message mapping for SMU13.0.7
drm/radeon: reintroduce radeon_dp_work_func content
drm/amdgpu: don't enable secure display on incompatible platforms
drm:amd:amdgpu: Fix missing buffer object unlock in failure path
accel/qaic: Fix NNC message corruption
accel/qaic: Grab ch_lock during QAIC_ATTACH_SLICE_BO
accel/qaic: Flush the transfer list again
accel/qaic: Validate if BO is sliced before slicing
accel/qaic: Validate user data before grabbing any lock
accel/qaic: initialize ret variable to 0
drm/i915: Fix PIPEDMC disabling for a bigjoiner configuration
drm: fix drmm_mutex_init()
drm/sched: Remove redundant check
drm: panel-orientation-quirks: Change Air's quirk to support Air Plus
accel/qaic: silence some uninitialized variable warnings
drm/pl111: Fix FB depth on IMPD-1 framebuffer
drm/mgag200: Fix gamma lut not initialized.
I tried to streamline our user memory copy code fairly aggressively in
commit adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory
copies"), in order to then be able to clean up the code and inline the
modern FSRM case in commit 577e6a7fd5 ("x86: inline the 'rep movs' in
user copies for the FSRM case").
We had reports [1] of that causing regressions earlier with blogbench,
but that turned out to be a horrible benchmark for that case, and not a
sufficient reason for re-instating "rep movsb" on older machines.
However, now Eric Dumazet reported [2] a regression in performance that
seems to be a rather more real benchmark, where due to the removal of
"rep movs" a TCP stream over a 100Gbps network no longer reaches line
speed.
And it turns out that with the simplified the calling convention for the
non-FSRM case in commit 427fda2c8a ("x86: improve on the non-rep
'copy_user' function"), re-introducing the ERMS case is actually fairly
simple.
Of course, that "fairly simple" is glossing over several missteps due to
having to fight our assembler alternative code. This code really wanted
to rewrite a conditional branch to have two different targets, but that
made objtool sufficiently unhappy that this instead just ended up doing
a choice between "jump to the unrolled loop, or use 'rep movsb'
directly".
Let's see if somebody finds a case where the kernel memory copies also
care (see commit 68674f94ff: "x86: don't use REP_GOOD or ERMS for
small memory copies"). But Eric does argue that the user copies are
special because networking tries to copy up to 32KB at a time, if
order-3 pages allocations are possible.
In-kernel memory copies are typically small, unless they are the special
"copy pages at a time" kind that still use "rep movs".
Link: https://lore.kernel.org/lkml/202305041446.71d46724-yujie.liu@intel.com/ [1]
Link: https://lore.kernel.org/lkml/CANn89iKUbyrJ=r2+_kK+sb2ZSSHifFZ7QkPLDpAtkJ8v4WUumA@mail.gmail.com/ [2]
Reported-and-tested-by: Eric Dumazet <edumazet@google.com>
Fixes: adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory copies")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
'struct evsel' uses a union for the two lists. This turned out to be
error prone.
For example:
[root@quaco ~]# perf stat --bpf-prog 5246
Error: cpu-clock event does not have sample flags 66c660
failed to set filter "(null)" on event cpu-clock with 2 (No such file or directory)
[root@quaco ~]# perf stat --bpf-prog 5246
Fix this issue by separating the two lists.
Fixes: 56ec9457a4 ("perf bpf filter: Implement event sample filtering")
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: kernel-team@meta.com
Link: https://lore.kernel.org/r/20230519235757.3613719-1-song@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
llvm-strip is not really required. Remove this dependency to make it
easier to build perf with BUILD_BPF_SKEL=1.
Committer notes:
This removes the need for the 'llvm' package just to get llvm-strip.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
A few functions provide an empty interface definition when
CONFIG_MTD_NAND_INGENIC_ECC is disabled, but they are accidentally
defined as global functions in the header:
drivers/mtd/nand/raw/ingenic/ingenic_ecc.h:39:5: error: no previous prototype for 'ingenic_ecc_calculate'
drivers/mtd/nand/raw/ingenic/ingenic_ecc.h:46:5: error: no previous prototype for 'ingenic_ecc_correct'
drivers/mtd/nand/raw/ingenic/ingenic_ecc.h:53:6: error: no previous prototype for 'ingenic_ecc_release'
drivers/mtd/nand/raw/ingenic/ingenic_ecc.h:57:21: error: no previous prototype for 'of_ingenic_ecc_get'
Turn them into 'static inline' definitions instead.
Fixes: 15de8c6efd ("mtd: rawnand: ingenic: Separate top-level and SoC specific code")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230516202133.559488-1-arnd@kernel.org
Following errors were seen with um-x86_64-gcc12/um-allyesconfig:
+ /kisskb/src/drivers/mtd/spi-nor/spansion.c: error: 'op' is used uninitialized [-Werror=uninitialized]: => 495:27, 364:27
Initialise local struct spi_mem_op with all zeros at declaration in
order to avoid using garbage data for fields that are not explicitly
set afterwards.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: c87c9b11c5 ("mtd: spi-nor: spansion: Determine current address mode")
Fixes: 6afcc84080 ("mtd: spi-nor: spansion: Add support for Infineon S25FS256T")
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230509193900.948753-1-tudor.ambarus@linaro.org
Pull NVMe fix from Keith:
"nvme fixes for 6.4
One nvme quirk (Tatsuki)"
* tag 'nvme-6.4-2023-05-26' of git://git.infradead.org/nvme:
NVMe: Add MAXIO 1602 to bogus nid list.
Arm FF-A fixes for v6.4
Quite a few fixes to address set of assorted issues:
1. NULL pointer dereference if the ffa driver doesn't provide remove()
callback as it is currently executed unconditionally
2. FF-A core probe failure on systems with v1.0 firmware as the new
partition info get count flag is used unconditionally
3. Failure to register more than one logical partition or service within
the same physical partition as the device name contains only VM ID
which will be same for all but each will have unique UUID.
4. Rejection of certain memory interface transmissions by the receivers
(secure partitions) as few MBZ fields are non-zero due to lack of
explicit re-initialization of those fields
* tag 'ffa-fixes-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
firmware: arm_ffa: Set reserved/MBZ fields to zero in the memory descriptors
firmware: arm_ffa: Fix FFA device names for logical partitions
firmware: arm_ffa: Fix usage of partition info get count flag
firmware: arm_ffa: Check if ffa_driver remove is present before executing
Link: https://lore.kernel.org/r/20230509143453.1188753-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The option name should not have the dashes. Current version shows four
dashes for the option.
$ perf ftrace latency -h
Usage: perf ftrace [<options>] [<command>]
or: perf ftrace [<options>] -- [<command>] [<options>]
or: perf ftrace {trace|latency} [<options>] [<command>]
or: perf ftrace {trace|latency} [<options>] -- [<command>] [<options>]
-b, --use-bpf Use BPF to measure function latency
-n, ----use-nsec Use nano-second histogram
-T, --trace-funcs <func>
Show latency of given function
Fixes: 84005bb614 ("perf ftrace latency: Add -n/--use-nsec option")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230525212038.3535851-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch fixes the error checking in nfcsim.c.
The DebugFS kernel API is developed in
a way that the caller can safely ignore the errors that
occur during the creation of DebugFS nodes.
Signed-off-by: Osama Muhammad <osmtendev@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a compiler warning:
include/linux/dev_printk.h: In function 'ov2680_probe':
include/linux/dev_printk.h:144:31: warning: 'high' may be used uninitialized [-Wmaybe-uninitialized]
144 | dev_printk_index_wrap(_dev_err, KERN_ERR, dev, dev_fmt(fmt), ##__VA_ARGS__)
| ^~~~~~~~
In function 'ov2680_detect',
inlined from 'ov2680_s_config' at drivers/staging/media/atomisp/i2c/atomisp-ov2680.c:468:8,
inlined from 'ov2680_probe' at drivers/staging/media/atomisp/i2c/atomisp-ov2680.c:647:8:
drivers/staging/media/atomisp/i2c/atomisp-ov2680.c:376:13: note: 'high' was declared here
376 | u32 high, low;
| ^~~~
'high' is indeed uninitialized after the ov_read_reg8() call failed, so there is no
point showing the value. Just say that the read failed. But low can also be used
uninitialized later, so just make it more robust and properly zero the high and low
variables.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
When a message was received the last_initiator is set to 0xff.
This will force the signal free time for the next transmit
to that for a new initiator. However, if a new transmit is
already in progress, then don't set last_initiator, since
that's the initiator of the current transmit. Overwriting
this would cause the signal free time of a following transmit
to be that of the new initiator instead of a next transmit.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Explicitly disable the CEC adapter in cec_devnode_unregister()
Usually this does not really do anything important, but for drivers
that use the CEC pin framework this is needed to properly stop the
hrtimer. Without this a crash would happen when such a driver is
unloaded with rmmod.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
When VCODEC_CAPABILITY_4K_DISABLED is not set in dec_capability, skip
formats that are not MTK_FMT_DEC so only decoder formats is updated in
mtk_init_vdec_params.
Fixes: e25528e1db ("media: mediatek: vcodec: Use 4K frame size when supported by stateful decoder")
Signed-off-by: Pin-yen Lin <treapking@chromium.org>
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Yunfei Dong <yunfei.dong@mediatek.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
In an earlier commit, setting the which field of the subdev format struct
in video_get_subdev_format was moved to a designated initializer that also
zeroes all other fields. However, the memset call that was zeroing the
fields earlier was left in place, causing the which field to be cleared
after being set in the initializer.
Remove the memset call from video_get_subdev_format to avoid clearing the
initialized which field.
Fixes: ecefa105cc ("media: Zero-initialize all structures passed to subdev pad operations")
Signed-off-by: Yassine Oudjana <y.oudjana@protonmail.com>
Acked-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Tested-by: Andrey Konovalov <andrey.konovalov@linaro.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
In the event of a change in XGBE mode, the current auto-negotiation
needs to be reset and the AN cycle needs to be re-triggerred. However,
the current code ignores the return value of xgbe_set_mode(), leading to
false information as the link is declared without checking the status
register.
Fix this by propagating the mode switch status information to
xgbe_phy_status().
Fixes: e57f7a3fea ("amd-xgbe: Prepare for working with more than one type of phy")
Co-developed-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Signed-off-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most protos' poll() methods insert a memory barrier between
writes to sk_err and sk_error_report(). This dates back to
commit a4d258036e ("tcp: Fix race in tcp_poll").
I guess we should do the same thing in TLS, tcp_poll() does
not hold the socket lock.
Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
mlx5 fixes 2023-05-24
This series includes bug fixes for the mlx5 driver.
* tag 'mlx5-fixes-2023-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
Documentation: net/mlx5: Wrap notes in admonition blocks
Documentation: net/mlx5: Add blank line separator before numbered lists
Documentation: net/mlx5: Use bullet and definition lists for vnic counters description
Documentation: net/mlx5: Wrap vnic reporter devlink commands in code blocks
net/mlx5: Fix check for allocation failure in comp_irqs_request_pci()
net/mlx5: DR, Add missing mutex init/destroy in pattern manager
net/mlx5e: Move Ethernet driver debugfs to profile init callback
net/mlx5e: Don't attach netdev profile while handling internal error
net/mlx5: Fix post parse infra to only parse every action once
net/mlx5e: Use query_special_contexts cmd only once per mdev
net/mlx5: fw_tracer, Fix event handling
net/mlx5: SF, Drain health before removing device
net/mlx5: Drain health before unregistering devlink
net/mlx5e: Do not update SBCM when prio2buffer command is invalid
net/mlx5e: Consider internal buffers size in port buffer calculations
net/mlx5e: Prevent encap offload when neigh update is running
net/mlx5e: Extract remaining tunnel encap code to dedicated file
====================
Link: https://lore.kernel.org/r/20230525034847.99268-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzkaller found a data race of pkt_sk(sk)->num.
The value is changed under lock_sock() and po->bind_lock, so we
need READ_ONCE() to access pkt_sk(sk)->num without these locks in
packet_bind_spkt(), packet_bind(), and sk_diag_fill().
Note that WRITE_ONCE() is already added by commit c7d2ef5dd4
("net/packet: annotate accesses to po->bind").
BUG: KCSAN: data-race in packet_bind / packet_do_bind
write (marked) to 0xffff88802ffd1cee of 2 bytes by task 7322 on cpu 0:
packet_do_bind+0x446/0x640 net/packet/af_packet.c:3236
packet_bind+0x99/0xe0 net/packet/af_packet.c:3321
__sys_bind+0x19b/0x1e0 net/socket.c:1803
__do_sys_bind net/socket.c:1814 [inline]
__se_sys_bind net/socket.c:1812 [inline]
__x64_sys_bind+0x40/0x50 net/socket.c:1812
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
read to 0xffff88802ffd1cee of 2 bytes by task 7318 on cpu 1:
packet_bind+0xbf/0xe0 net/packet/af_packet.c:3322
__sys_bind+0x19b/0x1e0 net/socket.c:1803
__do_sys_bind net/socket.c:1814 [inline]
__se_sys_bind net/socket.c:1812 [inline]
__x64_sys_bind+0x40/0x50 net/socket.c:1812
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
value changed: 0x0300 -> 0x0000
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 7318 Comm: syz-executor.4 Not tainted 6.3.0-13380-g7fddb5b5300c #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes: 96ec632714 ("packet: Diag core and basic socket info dumping")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230524232934.50950-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Python 3.9.0 or newer supports combining dicts() with |,
but older versions of Python are still used in the wild
(e.g. on CentOS 8, which goes EoL May 31, 2024).
With Python 3.6.8 we get:
TypeError: unsupported operand type(s) for |: 'dict' and 'dict'
Use older syntax. Tested with non-legacy families only.
Fixes: f036d936ca ("tools: ynl: Add fixed-header support to ynl")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Tested-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20230524170712.2036128-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Kapadia reported the following issue:
<quote>
The Online Amateur Radio Community (OARC) has recently been experimenting
with building a nationwide packet network in the UK.
As part of our experimentation, we have been testing out packet on 300bps HF,
and playing with net/rom. For HF packet at this baud rate you really need
to make sure that your MTU is relatively low; AX.25 suggests a PACLEN of 60,
and a net/rom PACLEN of 40 to go with that.
However the Linux net/rom support didn't work with a low PACLEN;
the mkiss module would truncate packets if you set the PACLEN below about 200 or so, e.g.:
Apr 19 14:00:51 radio kernel: [12985.747310] mkiss: ax1: truncating oversized transmit packet!
This didn't make any sense to me (if the packets are smaller why would they
be truncated?) so I started investigating.
I looked at the packets using ethereal, and found that many were just huge
compared to what I would expect.
A simple net/rom connection request packet had the request and then a bunch
of what appeared to be random data following it:
</quote>
Simon provided a patch that I slightly revised:
Not only we must not use skb_tailroom(), we also do
not want to count NR_NETWORK_LEN twice.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Co-Developed-by: Simon Kapadia <szymon@kapadia.pl>
Signed-off-by: Simon Kapadia <szymon@kapadia.pl>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Simon Kapadia <szymon@kapadia.pl>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230524141456.1045467-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We encountered a kernel call trace issue which was related to
ndo_xdp_xmit callback on our i.MX8MP platform. The reproduce
steps show as follows.
1. The FEC port (eth0) connects to a PC port, and the PC uses
pktgen_sample03_burst_single_flow.sh to generate packets and
send these packets to the FEC port. Notice that the script must
be executed before step 2.
2. Run the "./xdp_redirect eth0 eth1" command on i.MX8MP, the
eth1 interface is the dwmac. Then there will be a call trace
issue soon. Please see the log for more details.
The root cause is that the NETDEV_XDP_ACT_NDO_XMIT feature is
enabled by default, so when the step 2 command is exexcuted
and packets have already been sent to eth0, the stmmac_xdp_xmit()
starts running before the stmmac_xdp_set_prog() finishes. To
resolve this issue, we disable the NETDEV_XDP_ACT_NDO_XMIT
feature by default and turn on/off this feature when the bpf
program is installed/uninstalled which just like the other
ethernet drivers.
Call Trace log:
[ 306.311271] ------------[ cut here ]------------
[ 306.315910] WARNING: CPU: 0 PID: 15 at lib/timerqueue.c:55 timerqueue_del+0x68/0x70
[ 306.323590] Modules linked in:
[ 306.326654] CPU: 0 PID: 15 Comm: ksoftirqd/0 Not tainted 6.4.0-rc1+ #37
[ 306.333277] Hardware name: NXP i.MX8MPlus EVK board (DT)
[ 306.338591] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 306.345561] pc : timerqueue_del+0x68/0x70
[ 306.349577] lr : __remove_hrtimer+0x5c/0xa0
[ 306.353777] sp : ffff80000b7c3920
[ 306.357094] x29: ffff80000b7c3920 x28: 0000000000000000 x27: 0000000000000001
[ 306.364244] x26: ffff80000a763a40 x25: ffff0000d0285a00 x24: 0000000000000001
[ 306.371390] x23: 0000000000000001 x22: ffff000179389a40 x21: 0000000000000000
[ 306.378537] x20: ffff000179389aa0 x19: ffff0000d2951308 x18: 0000000000001000
[ 306.385686] x17: f1d3000000000000 x16: 00000000c39c1000 x15: 55e99bbe00001a00
[ 306.392835] x14: 09000900120aa8c0 x13: e49af1d300000000 x12: 000000000000c39c
[ 306.399987] x11: 100055e99bbe0000 x10: ffff8000090b1048 x9 : ffff8000081603fc
[ 306.407133] x8 : 000000000000003c x7 : 000000000000003c x6 : 0000000000000001
[ 306.414284] x5 : ffff0000d2950980 x4 : 0000000000000000 x3 : 0000000000000000
[ 306.421432] x2 : 0000000000000001 x1 : ffff0000d2951308 x0 : ffff0000d2951308
[ 306.428585] Call trace:
[ 306.431035] timerqueue_del+0x68/0x70
[ 306.434706] __remove_hrtimer+0x5c/0xa0
[ 306.438549] hrtimer_start_range_ns+0x2bc/0x370
[ 306.443089] stmmac_xdp_xmit+0x174/0x1b0
[ 306.447021] bq_xmit_all+0x194/0x4b0
[ 306.450612] __dev_flush+0x4c/0x98
[ 306.454024] xdp_do_flush+0x18/0x38
[ 306.457522] fec_enet_rx_napi+0x6c8/0xc68
[ 306.461539] __napi_poll+0x40/0x220
[ 306.465038] net_rx_action+0xf8/0x240
[ 306.468707] __do_softirq+0x128/0x3a8
[ 306.472378] run_ksoftirqd+0x40/0x58
[ 306.475961] smpboot_thread_fn+0x1c4/0x288
[ 306.480068] kthread+0x124/0x138
[ 306.483305] ret_from_fork+0x10/0x20
[ 306.486889] ---[ end trace 0000000000000000 ]---
Fixes: 66c0e13ad2 ("drivers: net: turn on XDP features")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230524125714.357337-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull smb directory moves and client fixes from Steve French:
"Four smb3 client fixes (three of which marked for stable) and three
patches to move of fs/cifs and fs/ksmbd to a new common "fs/smb"
parent directory
- Move the client and server source directories to a common parent
directory:
fs/cifs -> fs/smb/client
fs/ksmbd -> fs/smb/server
fs/smbfs_common -> fs/smb/common
- important readahead fix
- important fix for SMB1 regression
- fix for missing mount option ("mapchars") in mount API conversion
- minor debugging improvement"
* tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3: move Documentation/filesystems/cifs to Documentation/filesystems/smb
cifs: correct references in Documentation to old fs/cifs path
smb: move client and server files to common directory fs/smb
cifs: mapchars mount option ignored
smb3: display debug information better for encryption
cifs: fix smb1 mount regression
cifs: Fix cifs_limit_bvec_subset() to correctly check the maxmimum size
Pull parisc architecture fixes from Helge Deller:
"Quite a bunch of real bugfixes in here and most of them are tagged for
backporting: A fix for cache flushing from irq context, a kprobes &
kgdb breakpoint handling fix, and a fix in the alternative code
patching function to take care of CPU hotplugging.
parisc now provides LOCKDEP support and comes with a lightweight
spinlock check. Both features helped me to find the cache flush bug.
Additionally writing the AGP gatt has been fixed, the machine allows
the user to reboot after a system halt and arch_sync_dma_for_cpu() has
been optimized for PCXL PCUs.
Summary:
- Fix flush_dcache_page() for usage from irq context
- Handle kprobes breakpoints only in kernel context
- Handle kgdb breakpoints only in kernel context
- Use num_present_cpus() in alternative patching code
- Enable LOCKDEP support
- Add lightweight spinlock checks
- Flush AGP gatt writes and adjust gatt mask in parisc_agp_mask_memory()
- Allow to reboot machine after system halt
- Improve cache flushing for PCXL in arch_sync_dma_for_cpu()"
* tag 'parisc-for-6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix flush_dcache_page() for usage from irq context
parisc: Handle kgdb breakpoints only in kernel context
parisc: Handle kprobes breakpoints only in kernel context
parisc: Allow to reboot machine after system halt
parisc: Enable LOCKDEP support
parisc: Add lightweight spinlock checks
parisc: Use num_present_cpus() in alternative patching code
parisc: Flush gatt writes and adjust gatt mask in parisc_agp_mask_memory()
parisc: Improve cache flushing for PCXL in arch_sync_dma_for_cpu()
It turns out that udev under certain circumstances will concurrently try
to load the same modules over-and-over excessively. This isn't a kernel
bug, but it ends up affecting the kernel, to the point that under
certain circumstances we can fail to boot, because the kernel uses a lot
of memory to read all the module data all at once.
Note that it isn't a memory leak, it's just basically a thundering herd
problem happening at bootup with a lot of CPUs, with the worst cases
then being pretty bad.
Admittedly the worst situations are somewhat contrived: lots and lots of
CPUs, not a lot of memory, and KASAN enabled to make it all slower and
as such (unintentionally) exacerbate the problem.
Luis explains: [1]
"My best assessment of the situation is that each CPU in udev ends up
triggering a load of duplicate set of modules, not just one, but *a
lot*. Not sure what heuristics udev uses to load a set of modules per
CPU."
Petr Pavlu chimes in: [2]
"My understanding is that udev workers are forked. An initial kmod
context is created by the main udevd process but no sharing happens
after the fork. It means that the mentioned memory pool logic doesn't
really kick in.
Multiple parallel load requests come from multiple udev workers, for
instance, each handling an udev event for one CPU device and making
the exactly same requests as all others are doing at the same time.
The optimization idea would be to recognize these duplicate requests
at the udevd/kmod level and converge them"
Note that module loading has tried to mitigate this issue before, see
for example commit 064f4536d1 ("module: avoid allocation if module is
already present and ready"), which has a few ASCII graphs on memory use
due to this same issue.
However, while that noticed that the module was already loaded, and
exited with an error early before spending any more time on setting up
the module, it didn't handle the case of multiple concurrent module
loads all being active - but not complete - at the same time.
Yes, one of them will eventually win the race and finalize its copy, and
the others will then notice that the module already exists and error
out, but while this all happens, we have tons of unnecessary concurrent
work being done.
Again, the real fix is for udev to not do that (maybe it should use
threads instead of fork, and have actual shared data structures and not
cause duplicate work). That real fix is apparently not trivial.
But it turns out that the kernel already has a pretty good model for
dealing with concurrent access to the same file: the i_writecount of the
inode.
In fact, the module loading already indirectly uses 'i_writecount' ,
because 'kernel_file_read()' will in fact do
ret = deny_write_access(file);
if (ret)
return ret;
...
allow_write_access(file);
around the read of the file data. We do not allow concurrent writes to
the file, and return -ETXTBUSY if the file was open for writing at the
same time as the module data is loaded from it.
And the solution to the reader concurrency problem is to simply extend
this "no concurrent writers" logic to simply be "exclusive access".
Note that "exclusive" in this context isn't really some absolute thing:
it's only exclusion from writers and from other "special readers" that
do this writer denial. So we simply introduce a variation of that
"deny_write_access()" logic that not only denies write access, but also
requires that this is the _only_ such access that denies write access.
Which means that you can't start loading a module that is already being
loaded as a module by somebody else, or you will get the same -ETXTBSY
error that you would get if there were writers around.
[ It also means that you can't try to load a currently executing
executable as a module, for the same reason: executables do that same
"deny_write_access()" thing, and that's obviously where the whole
ETXTBSY logic traditionally came from.
This is not a problem for kernel modules, since the set of normal
executable files and kernel module files is entirely disjoint. ]
This new function is called "exclusive_deny_write_access()", and the
implementation is trivial, in that it's just an atomic decrement of
i_writecount if it was 0 before.
To use that new exclusivity check, all we then do is wrap the module
loading with that exclusive_deny_write_access()() / allow_write_access()
pair. The actual patch is a bit bigger than that, because we want to
surround not just the "load file data" part, but the whole module setup,
to get maximum exclusion.
So this ends up splitting up "finit_module()" into a few helper
functions to make it all very clear and legible.
In Luis' test-case (bringing up 255 vcpu's in a virtual machine [3]),
the "wasted vmalloc" space (ie module data read into a vmalloc'ed area
in order to be loaded as a module, but then discarded because somebody
else loaded the same module instead) dropped from 1.8GiB to 474kB. Yes,
that's gigabytes to kilobytes.
It doesn't drop completely to zero, because even with this change, you
can still end up having completely serial pointless module loads, where
one udev process has loaded a module fully (and thus the kernel has
released that exclusive lock on the module file), and then another udev
process tries to load the same module again.
So while we cannot fully get rid of the fundamental bug in user space,
we _can_ get rid of the excessive concurrent thundering herd effect.
A couple of final side notes on this all:
- This tweak only affects the "finit_module()" system call, which gives
the kernel a file descriptor with the module data.
You can also just feed the module data as raw data from user space
with "init_module()" (note the lack of 'f' at the beginning), and
obviously for that case we do _not_ have any "exclusive read" logic.
So if you absolutely want to do things wrong in user space, and try
to load the same module multiple times, and error out only later when
the kernel ends up saying "you can't load the same module name
twice", you can still do that.
And in fact, some distros will do exactly that, because they will
uncompress the kernel module data in user space before feeding it to
the kernel (mainly because they haven't started using the new kernel
side decompression yet).
So this is not some absolute "you can't do concurrent loads of the
same module". It's literally just a very simple heuristic that will
catch it early in case you try to load the exact same module file at
the same time, and in that case avoid a potentially nasty situation.
- There is another user of "deny_write_access()": the verity code that
enables fs-verity on a file (the FS_IOC_ENABLE_VERITY ioctl).
If you use fs-verity and you care about verifying the kernel modules
(which does make sense), you should do it *before* loading said
kernel module. That may sound obvious, but now the implementation
basically requires it. Because if you try to do it concurrently, the
kernel may refuse to load the module file that is being set up by the
fs-verity code.
- This all will obviously mean that if you insist on loading the same
module in parallel, only one module load will succeed, and the others
will return with an error.
That was true before too, but what is different is that the -ETXTBSY
error can be returned *before* the success case of another process
fully loading and instantiating the module.
Again, that might sound obvious, and it is indeed the whole point of
the whole change: we are much quicker to notice the whole "you're
already in the process of loading this module".
So it's very much intentional, but it does mean that if you just
spray the kernel with "finit_module()", and expect that the module is
immediately loaded afterwards without checking the return value, you
are doing something horribly horribly wrong.
I'd like to say that that would never happen, but the whole _reason_
for this commit is that udev is currently doing something horribly
horribly wrong, so ...
Link: https://lore.kernel.org/all/ZEGopJ8VAYnE7LQ2@bombadil.infradead.org/ [1]
Link: https://lore.kernel.org/all/23bd0ce6-ef78-1cd8-1f21-0e706a00424a@suse.com/ [2]
Link: https://lore.kernel.org/lkml/ZG%2Fa+nrt4%2FAAUi5z@bombadil.infradead.org/ [3]
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs fixes from Christian Brauner:
- During the acl rework we merged this cycle the generic_listxattr()
helper had to be modified in a way that in principle it would allow
for POSIX ACLs to be reported. At least that was the impression we
had initially. Because before the acl rework POSIX ACLs would be
reported if the filesystem did have POSIX ACL xattr handlers in
sb->s_xattr. That logic changed and now we can simply check whether
the superblock has SB_POSIXACL set and if the inode has
inode->i_{default_}acl set report the appropriate POSIX ACL name.
However, we didn't realize that generic_listxattr() was only ever
used by two filesystems. Both of them don't support POSIX ACLs via
sb->s_xattr handlers and so never reported POSIX ACLs via
generic_listxattr() even if they raised SB_POSIXACL and did contain
inodes which had acls set. The example here is nfs4.
As a result, generic_listxattr() suddenly started reporting POSIX
ACLs when it wouldn't have before. Since SB_POSIXACL implies that the
umask isn't stripped in the VFS nfs4 can't just drop SB_POSIXACL from
the superblock as it would also alter umask handling for them.
So just have generic_listxattr() not report POSIX ACLs as it never
did anyway. It's documented as such.
- Our SB_* flags currently use a signed integer and we shift the last
bit causing UBSAN to complain about undefined behavior. Switch to
using unsigned. While the original patch used an explicit unsigned
bitshift it's now pretty common to rely on the BIT() macro in a lot
of headers nowadays. So the patch has been adjusted to use that.
- Add Namjae as ntfs reviewer. They're already active this cycle so
let's make it explicit right now.
* tag 'vfs/v6.4-rc3/misc.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
ntfs: Add myself as a reviewer
fs: don't call posix_acl_listxattr in generic_listxattr
fs: fix undefined behavior in bit shift for SB_NOUSER
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth and bpf.
Current release - regressions:
- net: fix skb leak in __skb_tstamp_tx()
- eth: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs
Current release - new code bugs:
- handshake:
- fix sock->file allocation
- fix handshake_dup() ref counting
- bluetooth:
- fix potential double free caused by hci_conn_unlink
- fix UAF in hci_conn_hash_flush
Previous releases - regressions:
- core: fix stack overflow when LRO is disabled for virtual
interfaces
- tls: fix strparser rx issues
- bpf:
- fix many sockmap/TCP related issues
- fix a memory leak in the LRU and LRU_PERCPU hash maps
- init the offload table earlier
- eth: mlx5e:
- do as little as possible in napi poll when budget is 0
- fix using eswitch mapping in nic mode
- fix deadlock in tc route query code
Previous releases - always broken:
- udplite: fix NULL pointer dereference in __sk_mem_raise_allocated()
- raw: fix output xfrm lookup wrt protocol
- smc: reset connection when trying to use SMCRv2 fails
- phy: mscc: enable VSC8501/2 RGMII RX clock
- eth: octeontx2-pf: fix TSOv6 offload
- eth: cdc_ncm: deal with too low values of dwNtbOutMaxSize"
* tag 'net-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
udplite: Fix NULL pointer dereference in __sk_mem_raise_allocated().
net: phy: mscc: enable VSC8501/2 RGMII RX clock
net: phy: mscc: remove unnecessary phydev locking
net: phy: mscc: add support for VSC8501
net: phy: mscc: add VSC8502 to MODULE_DEVICE_TABLE
net/handshake: Enable the SNI extension to work properly
net/handshake: Unpin sock->file if a handshake is cancelled
net/handshake: handshake_genl_notify() shouldn't ignore @flags
net/handshake: Fix uninitialized local variable
net/handshake: Fix handshake_dup() ref counting
net/handshake: Remove unneeded check from handshake_dup()
ipv6: Fix out-of-bounds access in ipv6_find_tlv()
net: ethernet: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs
docs: netdev: document the existence of the mail bot
net: fix skb leak in __skb_tstamp_tx()
r8169: Use a raw_spinlock_t for the register locks.
page_pool: fix inconsistency for page_pool_ring_[un]lock()
bpf, sockmap: Test progs verifier error with latest clang
bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops
bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer
...
Traditionally, all CPUs in a system have identical numbers of SMT
siblings. That changes with hybrid processors where some logical CPUs
have a sibling and others have none.
Today, the CPU boot code sets the global variable smp_num_siblings when
every CPU thread is brought up. The last thread to boot will overwrite
it with the number of siblings of *that* thread. That last thread to
boot will "win". If the thread is a Pcore, smp_num_siblings == 2. If it
is an Ecore, smp_num_siblings == 1.
smp_num_siblings describes if the *system* supports SMT. It should
specify the maximum number of SMT threads among all cores.
Ensure that smp_num_siblings represents the system-wide maximum number
of siblings by always increasing its value. Never allow it to decrease.
On MeteorLake-P platform, this fixes a problem that the Ecore CPUs are
not updated in any cpu sibling map because the system is treated as an
UP system when probing Ecore CPUs.
Below shows part of the CPU topology information before and after the
fix, for both Pcore and Ecore CPU (cpu0 is Pcore, cpu 12 is Ecore).
...
-/sys/devices/system/cpu/cpu0/topology/package_cpus:000fff
-/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-11
+/sys/devices/system/cpu/cpu0/topology/package_cpus:3fffff
+/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-21
...
-/sys/devices/system/cpu/cpu12/topology/package_cpus:001000
-/sys/devices/system/cpu/cpu12/topology/package_cpus_list:12
+/sys/devices/system/cpu/cpu12/topology/package_cpus:3fffff
+/sys/devices/system/cpu/cpu12/topology/package_cpus_list:0-21
Notice that the "before" 'package_cpus_list' has only one CPU. This
means that userspace tools like lscpu will see a little laptop like
an 11-socket system:
-Core(s) per socket: 1
-Socket(s): 11
+Core(s) per socket: 16
+Socket(s): 1
This is also expected to make the scheduler do rather wonky things
too.
[ dhansen: remove CPUID detail from changelog, add end user effects ]
CC: stable@kernel.org
Fixes: bbb65d2d36 ("x86: use cpuid vector 0xb when available for detecting cpu topology")
Fixes: 95f3d39ccf ("x86/cpu/topology: Provide detect_extended_topology_early()")
Suggested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20230323015640.27906-1-rui.zhang%40intel.com
Driver should update policy->cur after updating the frequency.
Currently amd_pstate doesn't update policy->cur when `adjust_perf`
is used. Which causes /proc/cpuinfo to show wrong cpu frequency.
Fix this by updating policy->cur with correct frequency value in
adjust_perf function callback.
- Before the fix: (setting min freq to 1.5 MHz)
[root@amd]# cat /proc/cpuinfo | grep "cpu MHz" | sort | uniq --count
1 cpu MHz : 1777.016
1 cpu MHz : 1797.160
1 cpu MHz : 1797.270
189 cpu MHz : 400.000
- After the fix: (setting min freq to 1.5 MHz)
[root@amd]# cat /proc/cpuinfo | grep "cpu MHz" | sort | uniq --count
1 cpu MHz : 1753.353
1 cpu MHz : 1756.838
1 cpu MHz : 1776.466
1 cpu MHz : 1776.873
1 cpu MHz : 1777.308
1 cpu MHz : 1779.900
183 cpu MHz : 1805.231
1 cpu MHz : 1956.815
1 cpu MHz : 2246.203
1 cpu MHz : 2259.984
Fixes: 1d215f0319 ("cpufreq: amd-pstate: Add fast switch function for AMD P-State")
Signed-off-by: Wyes Karny <wyes.karny@amd.com>
[ rjw: Subject edits ]
Cc: 5.17+ <stable@vger.kernel.org> # 5.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
On sc7280 where eDP is the primary display, PSR is causing
IGT breakage even for basic test cases like kms_atomic and
kms_atomic_transition. Most often the issue starts with below
stack so providing that as reference
Call trace:
dpu_encoder_assign_crtc+0x64/0x6c
dpu_crtc_enable+0x188/0x204
drm_atomic_helper_commit_modeset_enables+0xc0/0x274
msm_atomic_commit_tail+0x1a8/0x68c
commit_tail+0xb0/0x160
drm_atomic_helper_commit+0x11c/0x124
drm_atomic_commit+0xb0/0xdc
drm_atomic_connector_commit_dpms+0xf4/0x110
drm_mode_obj_set_property_ioctl+0x16c/0x3b0
drm_connector_property_set_ioctl+0x4c/0x74
drm_ioctl_kernel+0xec/0x15c
drm_ioctl+0x264/0x408
__arm64_sys_ioctl+0x9c/0xd4
invoke_syscall+0x4c/0x110
el0_svc_common+0x94/0xfc
do_el0_svc+0x3c/0xb0
el0_svc+0x2c/0x7c
el0t_64_sync_handler+0x48/0x114
el0t_64_sync+0x190/0x194
---[ end trace 0000000000000000 ]---
[drm-dp] dp_ctrl_push_idle: PUSH_IDLE pattern timedout
Other basic use-cases still seem to work fine hence add a
a module parameter to allow toggling psr enable/disable till
PSR related issues are hashed out with IGT.
Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Acked-by: Rob Clark <robdclark@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/534420/
Link: https://lore.kernel.org/r/20230427232848.5200-1-quic_abhinavk@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
Pull power supply fixes from Sebastian Reichel:
- Fix power_supply_get_battery_info for devices without parent devices
resulting in NULL pointer dereference
- Fix desktop systems reporting to run on battery once a power-supply
device with device scope appears (e.g. a HID keyboard with a battery)
- Ratelimit debug print about driver not providing data
- Fix race condition related to external_power_changed in multiple
drivers (ab8500, axp288, bq25890, sc27xx, bq27xxx)
- Fix LED trigger switching from blinking to solid-on when charging
finishes
- Fix multiple races in bq27xxx battery driver
- mt6360: handle potential ENOMEM from devm_work_autocancel
- sbs-charger: Fix SBS_CHARGER_STATUS_CHARGE_INHIBITED bit
- rt9467: avoid passing 0 to dev_err_probe
* tag 'for-v6.4-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (21 commits)
power: supply: Fix logic checking if system is running from battery
power: supply: mt6360: add a check of devm_work_autocancel in mt6360_charger_probe
power: supply: sbs-charger: Fix INHIBITED bit for Status reg
power: supply: rt9467: Fix passing zero to 'dev_err_probe'
power: supply: Ratelimit no data debug output
power: supply: Fix power_supply_get_battery_info() if parent is NULL
power: supply: bq24190: Call power_supply_changed() after updating input current
power: supply: bq25890: Call power_supply_changed() after updating input current or voltage
power: supply: bq27xxx: Use mod_delayed_work() instead of cancel() + schedule()
power: supply: bq27xxx: After charger plug in/out wait 0.5s for things to stabilize
power: supply: bq27xxx: Ensure power_supply_changed() is called on current sign changes
power: supply: bq27xxx: Move bq27xxx_battery_update() down
power: supply: bq27xxx: Add cache parameter to bq27xxx_battery_current_and_status()
power: supply: bq27xxx: Fix poll_interval handling and races on remove
power: supply: bq27xxx: Fix I2C IRQ race on remove
power: supply: bq27xxx: Fix bq27xxx_battery_update() race condition
power: supply: leds: Fix blink to LED on transition
power: supply: sc27xx: Fix external_power_changed race
power: supply: bq25890: Fix external_power_changed race
power: supply: axp288_fuel_gauge: Fix external_power_changed race
...
Pull sound fixes from Takashi Iwai:
"A collection of small fixes:
- HD-audio runtime PM bug fix
- A couple of HD-audio quirks
- Fix series of ASoC Intel AVS drivers
- ASoC DPCM fix for a bug found on new Intel systems
- A few other ASoC device-specific small fixes"
* tag 'sound-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Enable headset onLenovo M70/M90
ASoC: dwc: move DMA init to snd_soc_dai_driver probe()
ASoC: cs35l41: Fix default regmap values for some registers
ALSA: hda: Fix unhandled register update during auto-suspend period
ASoC: dt-bindings: tlv320aic32x4: Fix supply names
ASoC: Intel: avs: Add missing checks on FE startup
ASoC: Intel: avs: Fix avs_path_module::instance_id size
ASoC: Intel: avs: Account for UID of ACPI device
ASoC: Intel: avs: Fix declaration of enum avs_channel_config
ASoC: Intel: Skylake: Fix declaration of enum skl_ch_cfg
ASoC: Intel: avs: Access path components under lock
ASoC: Intel: avs: Fix module lookup
ALSA: hda/ca0132: add quirk for EVGA X299 DARK
ASoC: soc-pcm: test if a BE can be prepared
ASoC: rt5682: Disable jack detection interrupt during suspend
ASoC: lpass: Fix for KASAN use_after_free out of bounds
Pull x86 platform driver fixes from Hans de Goede:
"Nothing special to report just a few small fixes"
* tag 'platform-drivers-x86-v6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/intel/ifs: Annotate work queue on stack so object debug does not complain
platform/x86: ISST: Remove 8 socket limit
platform/mellanox: mlxbf-pmc: fix sscanf() error checking
platform/x86/amd/pmf: Fix CnQF and auto-mode after resume
platform/x86: asus-wmi: Ignore WMI events with codes 0x7B, 0xC0
Pull m68k fix from Geert Uytterhoeven:
- Fix signal frame issue causing user-space crashes on 68020/68030
* tag 'm68k-for-v6.4-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: Move signal frame following exception on 68020/030
The sq thread actively releases CPU resources by calling the
cond_resched() and schedule() interfaces when it is idle. Therefore,
more resources are available for other threads to run.
There exists a problem in sq thread: it does not unlock sqd->lock before
releasing CPU resources every time. This makes other threads pending on
sqd->lock for a long time. For example, the following interfaces all
require sqd->lock: io_sq_offload_create(), io_register_iowq_max_workers()
and io_ring_exit_work().
Before the sq thread releases CPU resources, unlocking sqd->lock will
provide the user a better experience because it can respond quickly to
user requests.
Signed-off-by: Kanchan Joshi<joshi.k@samsung.com>
Signed-off-by: Wenwen Chen<wenwen.chen@samsung.com>
Link: https://lore.kernel.org/r/20230525082626.577862-1-wenwen.chen@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
i.MX fixes for 6.4:
- A couple of i.MX8MN/P video clock changes from Adam Ford to fix issue
with clock re-parenting.
- Add missing pvcie-supply regulator for imx6qdl-mba6 board.
- A series of colibri-imx8x board fixes on pin configuration.
- Set and limit the mode for PMIC bucks for imx6ull-dhcor board to fix
stability problems.
- A couple of changes from Frank Li to correct cdns,usb3 bindings
cdns,on-chip-buff-size property and fix USB 3.0 gadget failure on
i.MX8QM & QXPB0.
- Add a required PHY deassert delay for imx8mn-var-som board to fix PHY
detection failure.
* tag 'imx-fixes-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
arm64: dts: imx8: fix USB 3.0 Gadget Failure in QM & QXPB0 at super speed
dt-binding: cdns,usb3: Fix cdns,on-chip-buff-size type
arm64: dts: colibri-imx8x: delete adc1 and dsp
arm64: dts: colibri-imx8x: fix iris pinctrl configuration
arm64: dts: colibri-imx8x: move pinctrl property from SoM to eval board
arm64: dts: colibri-imx8x: fix eval board pin configuration
arm64: dts: imx8mp: Fix video clock parents
ARM: dts: imx6qdl-mba6: Add missing pvcie-supply regulator
ARM: dts: imx6ull-dhcor: Set and limit the mode for PMIC buck 1, 2 and 3
arm64: dts: imx8mn-var-som: fix PHY detection bug by adding deassert delay
arm64: dts: imx8mn: Fix video clock parents
Link: https://lore.kernel.org/r/20230516133625.GI767028@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arm SCMI fix for v6.4
Single fix for incorrect invocation of alloc_workqueue() where WQ_SYSFS
flag is passed as @max_active parameter instead of OR'ing with the other
flags in the @flags parameter.
* tag 'scmi-fix-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
firmware: arm_scmi: Fix incorrect alloc_workqueue() invocation
Link: https://lore.kernel.org/r/20230509143529.1188812-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arm FVP/Vexpress fixes for v6.4
Couple of fixes to address the missing required 'cache-unified' property
in the level 2 and 3 caches on some of the FVP/vexpress platforms.
* tag 'juno-fixes-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
arm64: dts: arm: add missing cache properties
ARM: dts: vexpress: add missing cache properties
Link: https://lore.kernel.org/r/20230509143508.1188786-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The cper.c file needs to include an extra header, and efi_zboot_entry
needs an extern declaration to avoid these 'make W=1' warnings:
drivers/firmware/efi/libstub/zboot.c:65:1: error: no previous prototype for 'efi_zboot_entry' [-Werror=missing-prototypes]
drivers/firmware/efi/efi.c:176:16: error: no previous prototype for 'efi_attr_is_visible' [-Werror=missing-prototypes]
drivers/firmware/efi/cper.c:626:6: error: no previous prototype for 'cper_estatus_print' [-Werror=missing-prototypes]
drivers/firmware/efi/cper.c:649:5: error: no previous prototype for 'cper_estatus_check_header' [-Werror=missing-prototypes]
drivers/firmware/efi/cper.c:662:5: error: no previous prototype for 'cper_estatus_check' [-Werror=missing-prototypes]
To make this easier, move the cper specific declarations to
include/linux/cper.h.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The Make variable containing the objcopy flags may be constructed from
the output of build tools operating on build artifacts, and these may
not exist when doing a make clean.
So avoid evaluating them eagerly, to prevent spurious build warnings.
Suggested-by: Pedro Falcato <pedro.falcato@gmail.com>
Tested-by: Alan Bartlett <ajb@elrepo.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
David Epping says:
====================
net: phy: mscc: support VSC8501
this updated series of patches adds support for the VSC8501 Ethernet
PHY and fixes support for the VSC8502 PHY in cases where no other
software (like U-Boot) has initialized the PHY after power up.
The first patch simply adds the VSC8502 to the MODULE_DEVICE_TABLE,
where I guess it was unintentionally missing. I have no hardware to
test my change.
The second patch adds the VSC8501 PHY with exactly the same driver
implementation as the existing VSC8502.
The (new) third patch removes phydev locking from
vsc85xx_rgmii_set_skews(), as discussed for v2 of the patch set.
The (now) fourth patch fixes the initialization for VSC8501 and VSC8502.
I have tested this patch with VSC8501 on hardware in RGMII mode only.
https://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ProductDocuments/DataSheets/VSC8501-03_Datasheet_60001741A.PDFhttps://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ProductDocuments/DataSheets/VSC8502-03_Datasheet_60001742B.pdf
Table 4-42 "RGMII CONTROL, ADDRESS 20E2 (0X14)" Bit 11 for each of
them.
By default the RX_CLK is disabled for these PHYs. In cases where no
other software, like U-Boot, enabled the clock, this results in no
received packets being handed to the MAC.
The patch enables this clock output.
According to Microchip support (case number 01268776) this applies
to all modes (RGMII, GMII, and MII).
Other PHYs sharing the same register map and code, like
VSC8530/31/40/41 have the clock enabled and the relevant bit 11 is
reserved and read-only for them. As per previous discussion the
patch still clears the bit on these PHYs, too, possibly more easily
supporting other future PHYs implementing this functionality.
For the VSC8572 family of PHYs, having a different register map,
no such changes are applied.
====================
Link: https://lore.kernel.org/r/20230523153108.18548-1-david.epping@missinglinkelectronics.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
By default the VSC8501 and VSC8502 RGMII/GMII/MII RX_CLK output is
disabled. To allow packet forwarding towards the MAC it needs to be
enabled.
For other PHYs supported by this driver the clock output is enabled
by default.
Fixes: d316986331 ("net: phy: mscc: add support for VSC8502")
Signed-off-by: David Epping <david.epping@missinglinkelectronics.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Holding the struct phy_device (phydev) lock is unnecessary when
accessing phydev->interface in the PHY driver .config_init method,
which is the only place that vsc85xx_rgmii_set_skews() is called from.
The phy_modify_paged() function implements required MDIO bus level
locking, which can not be achieved by a phydev lock.
Signed-off-by: David Epping <david.epping@missinglinkelectronics.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The VSC8501 PHY can use the same driver implementation as the VSC8502.
Adding the PHY ID and copying the handler functions of VSC8502 is
sufficient to operate it.
Signed-off-by: David Epping <david.epping@missinglinkelectronics.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The mscc driver implements support for VSC8502, so its ID should be in
the MODULE_DEVICE_TABLE for automatic loading.
Signed-off-by: David Epping <david.epping@missinglinkelectronics.com>
Fixes: d316986331 ("net: phy: mscc: add support for VSC8502")
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Chuck Lever says:
====================
Bug fixes for net/handshake
Paolo observed that there is a possible leak of sock->file. I
haven't looked into that yet, but it seems to be separate from
the fixes in this series, so no need to hold these up.
====================
The submissions mentions net-next but it means netdev (perhaps
merge window left over when trees are converged). In any case,
it should have gone into net, but was instead applied to net-next
as commit deb2e484ba ("Merge branch 'net-handshake-fixes'").
These are fixes tho, and Chuck needs them to make progress with
the client so double-merging them into net... it is what it is :(
Link: https://lore.kernel.org/r/168381978252.84244.1933636428135211300.stgit@91.116.238.104.host.secureserver.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Enable the upper layer protocol to specify the SNI peername. This
avoids the need for tlshd to use a DNS lookup, which can return a
hostname that doesn't match the incoming certificate's SubjectName.
Fixes: 2fd5532044 ("net/handshake: Add a kernel API for requesting a TLSv1.3 handshake")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If user space never calls DONE, sock->file's reference count remains
elevated. Enable sock->file to be freed eventually in this case.
Reported-by: Jakub Kacinski <kuba@kernel.org>
Fixes: 3b3009ea8a ("net/handshake: Create a NETLINK service for handling handshake requests")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
trace_handshake_cmd_done_err() simply records the pointer in @req,
so initializing it to NULL is sufficient and safe.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: 3b3009ea8a ("net/handshake: Create a NETLINK service for handling handshake requests")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If get_unused_fd_flags() fails, we ended up calling fput(sock->file)
twice.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Fixes: 3b3009ea8a ("net/handshake: Create a NETLINK service for handling handshake requests")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
handshake_req_submit() now verifies that the socket has a file.
Fixes: 3b3009ea8a ("net/handshake: Create a NETLINK service for handling handshake requests")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf 2023-05-24
We've added 19 non-merge commits during the last 10 day(s) which contain
a total of 20 files changed, 738 insertions(+), 448 deletions(-).
The main changes are:
1) Batch of BPF sockmap fixes found when running against NGINX TCP tests,
from John Fastabend.
2) Fix a memleak in the LRU{,_PERCPU} hash map when bucket locking fails,
from Anton Protopopov.
3) Init the BPF offload table earlier than just late_initcall,
from Jakub Kicinski.
4) Fix ctx access mask generation for 32-bit narrow loads of 64-bit fields,
from Will Deacon.
5) Remove a now unsupported __fallthrough in BPF samples,
from Andrii Nakryiko.
6) Fix a typo in pkg-config call for building sign-file,
from Jeremy Sowden.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, sockmap: Test progs verifier error with latest clang
bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops
bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer
bpf, sockmap: Test shutdown() correctly exits epoll and recv()=0
bpf, sockmap: Build helper to create connected socket pair
bpf, sockmap: Pull socket helpers out of listen test for general use
bpf, sockmap: Incorrectly handling copied_seq
bpf, sockmap: Wake up polling after data copy
bpf, sockmap: TCP data stall on recv before accept
bpf, sockmap: Handle fin correctly
bpf, sockmap: Improved check for empty queue
bpf, sockmap: Reschedule is now done through backlog
bpf, sockmap: Convert schedule_work into delayed_work
bpf, sockmap: Pass skb ownership through read_skb
bpf: fix a memory leak in the LRU and LRU_PERCPU hash maps
bpf: Fix mask generation for 32-bit narrow loads of 64-bit fields
samples/bpf: Drop unnecessary fallthrough
bpf: netdev: init the offload table earlier
selftests/bpf: Fix pkg-config call building sign-file
====================
Link: https://lore.kernel.org/r/20230524170839.13905-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In general, the three SKUs of sc7180 (lite, normal, and pro) are
handled dynamically.
The cpufreq table in sc7180.dtsi includes the superset of all CPU
frequencies. The "qcom-cpufreq-hw" driver in Linux shows that we can
dynamically detect which frequencies are actually available on the
currently running CPU and then we can just enable those ones.
The GPU is similarly dynamic. The nvmem has a fuse in it (see
"gpu_speed_bin" in sc7180.dtsi) that the GPU driver can use to figure
out which frequencies to enable.
There is one part, however, that is not so dynamic. The way SDRAM
frequency works in sc7180 is that it's tied to cpufreq. At the busiest
cpufreq operating points we'll pick the top supported SDRAM frequency.
They ramp down together.
For the "pro" SKU of sc7180, we only enable one extra cpufreq step.
That extra cpufreq step runs SDRAM at the same speed as the step
below. Thus, for normal and pro things are OK. There is no sc7180-pro
device tree snippet.
For the "lite" SKU if sc7180, however, things aren't so easy. The
"lite" SKU drops 3 cpufreq entries but can still run SDRAM at max
frequency. That messed things up with the whole scheme. This is why we
added the "sc7180-lite" fragment in commit 8fd01e01fd ("arm64: dts:
qcom: sc7180-lite: Tweak DDR/L3 scaling on SC7180-lite").
When the lite scheme came about, it was agreed that the WiFi SKUs of
lazor would _always_ be "lite" and would, in fact, be the only "lite"
devices. Unfortunately, this decision changed and folks didn't realize
that it would be a problem. Specifically, some later lazor WiFi-only
devices were built with "pro" CPUs.
Building WiFi-only lazor with "pro" CPUs isn't the end of the world.
The SDRAM will ramp up a little sooner than it otherwise would, but
aside from a small power hit things work OK. One problem, though, is
that the SDRAM scaling becomes a bit quirky. Specifically, with the
current tables we'll max out SDRAM frequency at 2.1GHz but then
_lower_ it at 2.2GHz / 2.3GHz only to raise it back to max for 2.4GHz
and 2.55GHz.
Let's at least fix this so that the SDRAM frequency doesn't go down in
that quirky way. On true "lite" SKUs this change will be a no-op
because the operating points we're touching are disabled. This change
is only useful when a board that thinks it has a "lite" CPU actually
has a "normal" or "pro" one stuffed.
Fixes: 8fd01e01fd ("arm64: dts: qcom: sc7180-lite: Tweak DDR/L3 scaling on SC7180-lite")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230515171929.1.Ic8dee2cb79ce39ffc04eab2a344dde47b2f9459f@changeid
Wrap note paragraphs in note:: directive as it better fit for the
purpose of noting devlink commands.
Fixes: f2d51e5793 ("net/mlx5: Separate mlx5 driver documentation into multiple pages")
Fixes: cf14af140a ("net/mlx5e: Add vnic devlink health reporter to representors")
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The doc forgets to add separator before numbered lists, which causes the
lists to be appended to previous paragraph inline instead.
Add the missing separator.
Fixes: f2d51e5793 ("net/mlx5: Separate mlx5 driver documentation into multiple pages")
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
"vnic reporter" section contains unformatted description for vnic
counters, which is rendered as one long paragraph instead of list.
Use bullet and definition lists to match other lists.
Fixes: b0bc615df4 ("net/mlx5: Add vnic devlink health reporter to PFs/VFs")
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
As priv->dfs_root is cleared, and therefore missed, when change
eswitch mode, move the creation of the root debugfs to the init
callback of mlx5e_nic_profile and mlx5e_uplink_rep_profile, and
the destruction to the cleanup callback for symmeter.
Fixes: 288eca60cc ("net/mlx5e: Add Ethernet driver debugfs")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
As part of switchdev mode disablement, driver changes port netdevice
profile from uplink to nic. If this process is triggered by health
recovery flow (PCI reset, for ex.) profile attach would fail because all
fw commands aborted when internal error flag is set. As a result, nic
netdevice profile is not attached and driver fails to rollback to uplink
profile, which leave driver in broken state and cause crash later.
To handle broken state do netdevice profile initialization only instead
of full attachment and release mdev resources on driver suspend as
expected. Actual netdevice attachment is done during driver load.
Fixes: c4d7eb5768 ("net/mxl5e: Add change profile method")
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Caller of mlx5e_tc_act_post_parse() needs it to parse only the subset of
actions starting after previous split and ending at the current action.
However, that range is not provided as arguments and
mlx5e_tc_act_post_parse() uses generic flow_action_for_each() that iterates
over all flow actions. Not only this is redundant, it also causes a bug
when mlx5e_tc_act->post_parse() callback is not idempotent since it will be
called for every split. For example, ct action tc_act_post_parse_ct()
callback obtains a reference to mlx5_ct_ft instance and calling it several
times during parsing stage will cause reference counter imbalance.
Fix the issue by providing a proper action range of the current split
subset to mlx5e_tc_act_post_parse() and only calling
mlx5e_tc_act->post_parse() for actions inside the subset range.
Fixes: 8300f22526 ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Don't query the firmware so many times (num rqs * num wqes * wqe frags)
because it slows down linearly the interface creation time when the
product is larger. Do it only once per mdev and store the result in
mlx5e_param.
Due to helper function being called from different files, move it to
an appropriate location. Rename the function with a proper prefix and
add a small cleanup.
This fix applies only for legacy rq.
Fixes: 1b1e486883 ("net/mlx5e: Use query_special_contexts for mkeys")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5 driver needs to parse traces with event_id inside the range of
first_string_trace and num_string_trace. However, mlx5 is parsing all
events with event_id >= first_string_trace.
Fix it by checking for the correct range.
Fixes: c71ad41ccb ("net/mlx5: FW tracer, events handling")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
There is no point in recovery during device removal. Also, if health
work started need to wait for it to avoid races and NULL pointer
access.
Hence, drain health WQ before removing device.
Fixes: 1958fc2f07 ("net/mlx5: SF, Add auxiliary device driver")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5 health mechanism is using devlink APIs, which are using devlink
notify APIs. After the cited patch, using devlink notify APIs after
devlink is unregistered triggers a WARN_ON().
Hence, drain health WQ before devlink is unregistered.
Fixes: cf53021740 ("devlink: Notify users when objects are accessible")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The shared buffer pools configuration which are stored in the SBCM
register are updated when the user changes the prio2buffer mapping.
However, in case the user desired prio2buffer change is invalid,
which can occur due to mapping a lossless priority to a not large enough
buffer, the SBCM update should not be performed, as the user command is
failed.
Thus, Perform the SBCM update only after xoff threshold calculation is
performed and the user prio2buffer mapping is validated.
Fixes: a440030d89 ("net/mlx5e: Update shared buffer along with device buffer changes")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, when a user triggers a change in port buffer headroom
(buffers 0-7), the driver checks that the requested headroom does
not exceed the total port buffer size. However, this check does not
take into account the internal buffers (buffers 8-9), which are also
part of the total port buffer. This can result in treating invalid port
buffer change requests as valid, causing unintended changes to the shared
buffer.
To address this, include the internal buffers size in the calculation of
available port buffer space which ensures that port buffer requests do not
exceed the correct limit.
Furthermore, remove internal buffers (8-9) size from the total_size
calculation as these buffers are reserved for internal use and are not
exposed to the user.
While at it, add verbosity to the debug prints in
mlx5e_port_query_buffer() function to ease future debugging.
Fixes: ecdf2dadee ("net/mlx5e: Receive buffer support for DCBX")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commit adds a compeletion to remove dependency on rtnl
lock. But it causes a deadlock for multiple encapsulations:
crash> bt ffff8aece8a64000
PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc"
#0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45
#1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418
#2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898
#3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8
#4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb
#5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core]
#6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core]
#7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core]
#8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core]
#9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core]
#10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core]
#11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core]
#12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8
#13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower]
#14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower]
#15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047
#16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31
#17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853
#18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835
#19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27
#20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245
#21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482
#22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a
#23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2
#24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2
#25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f
#26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8
#27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c
crash> bt 0xffff8aeb07544000
PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9"
#0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45
#1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418
#2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88
#3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b
#4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core]
#5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core]
#6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core]
#7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c
#8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012
#9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d
#10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f
After the first encap is attached, flow will be added to encap
entry's flows list. If neigh update is running at this time, the
following encaps of the flow can't hold the encap_tbl_lock and
sleep. If neigh update thread is waiting for that flow's init_done,
deadlock happens.
Fix it by holding lock outside of the for loop. If neigh update is
running, prevent encap flows from offloading. Since the lock is held
outside of the for loop, concurrent creation of encap entries is not
allowed. So remove unnecessary wait_for_completion call for res_ready.
Fixes: 95435ad799 ("net/mlx5e: Only access fully initialized flows in neigh update")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Move set_encap_dests() and clean_encap_dests() to the tunnel encap
dedicated file. And rename them to mlx5e_tc_tun_encap_dests_set()
and mlx5e_tc_tun_encap_dests_unset().
No functional change in this patch. It is needed in the next patch.
Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Merge the RPMPD DeviceTree binding update through a topic branch, so
make the constants available to both driver and DeviceTree source
branches. Keep the two parts on separate branches to allow merge-back if
necessary towards v6.5.
Merge the RPMPD DeviceTree binding update through a topic branch, so
make the constants available to both driver and DeviceTree source
branches. Keep the two parts on separate branches to allow merge-back if
necessary towards v6.5.
The following error was reported when building x86_64 allmodconfig:
error: the following would cause module name conflict:
drivers/soc/qcom/ice.ko
drivers/net/ethernet/intel/ice/ice.ko
Seems the 'ice' module name is already used by some Intel ethernet
driver, so lets rename the Qualcomm Inline Crypto Engine (ICE) module
from 'ice' to 'qcom_ice' to avoid any kind of errors/confusions.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 2afbf43a4a ("soc: qcom: Make the Qualcomm UFS/SDCC ICE a dedicated driver")
Acked-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230516082856.150214-1-abel.vesa@linaro.org
Documentation/filesystems/cifs contains both server and client information
so its pathname is misleading. In addition, the directory fs/smb
now contains both server and client, so move Documentation/filesystems/cifs
to Documentation/filesystems/smb
Suggested-by: Namjae Jeon <linkinjeon@kernel.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
The fs/cifs directory has moved to fs/smb/client, correct mentions
of this in Documentation and comments.
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Move CIFS/SMB3 related client and server files (cifs.ko and ksmbd.ko
and helper modules) to new fs/smb subdirectory:
fs/cifs --> fs/smb/client
fs/ksmbd --> fs/smb/server
fs/smbfs_common --> fs/smb/common
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
There are two ways that special characters (not allowed in some
other operating systems like Windows, but allowed in POSIX) have
been mapped in the past ("SFU" and "SFM" mappings) to allow them
to be stored in a range reserved for special chars. The default
for Linux has been to use "mapposix" (ie the SFM mapping) but
the conversion to the new mount API in the 5.11 kernel broke
the ability to override the default mapping of the reserved
characters (like '?' and '*' and '\') via "mapchars" mount option.
This patch fixes that - so can now mount with "mapchars"
mount option to override the default ("mapposix" ie SFM) mapping.
Reported-by: Tyler Spivey <tspivey8@gmail.com>
Fixes: 24e0a1eff9 ("cifs: switch to new mount api")
Signed-off-by: Steve French <stfrench@microsoft.com>
Fix /proc/fs/cifs/DebugData to use the same case for "encryption"
(ie "Encryption" with init capital letter was used in one place).
In addition, if gcm256 encryption (intead of gcm128) is used on
a connection to a server, note that in the DebugData as well.
It now displays (when gcm256 negotiated):
Security type: RawNTLMSSP SessionId: 0x86125800bc000b0d encrypted(gcm256)
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
cifs.ko maps NT_STATUS_NOT_FOUND to -EIO when SMB1 servers couldn't
resolve referral paths. Proceed to tree connect when we get -EIO from
dfs_get_referral() as well.
Reported-by: Kris Karas (Bug Reporting) <bugs-a21@moonlit-rail.com>
Tested-by: Woody Suwalski <terraluna977@gmail.com>
Fixes: 8e3554150d ("cifs: fix sharing of DFS connections")
Cc: stable@vger.kernel.org # v6.2+
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
At drm suspend sequence, MST dc_sink is removed. When commit cached
MST stream back in drm resume sequence, the MST stream payload is not
properly created and added into the payload table. After resume, topology
change is reprobed by removing existing streams first. That leads to
no payload is found in the existing payload table as below error
"[drm] ERROR No payload for [MST PORT:] found in mst state"
1. In encoder .atomic_check routine, remove check existance of dc_sink
2. Bypass MST by checking existence of MST root port. dc_link_type cannot
differentiate MST port before topology is rediscovered.
Reviewed-by: Wayne Lin <wayne.lin@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
[Why]
During gpu-reset, we toggle vblank irq by calling dc_interrupt_set()
instead of amdgpu_irq_get/put() because we don't want to change the irq
source's refcount. However, we see the warning when vblank irq is enabled
by dc_interrupt_set() during gpu-reset but disabled by amdgpu_irq_put()
after gpu-reset.
[How]
Only in dm_gpureset_toggle_interrupts() we toggle vblank interrupts by
calling dc_interrupt_set(). Apart from this we call dm_set_vblank()
which uses amdgpu_irq_get/put() to operate vblank irq.
Reviewed-by: Bhawanpreet Lakha <bhawanpreet.lakha@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Alan Liu <haoping.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Printing the other clock types should not be conditioned on being able
to print OD_SCLK. Some GPUs currently have limited capability of only
printing a subset of these.
Since this condition was introduced in v5.18-rc1, reading from
`pp_od_clk_voltage` has been returning empty on the Asus ROG Strix G15
(2021).
Fixes: 79c65f3fcb ("drm/amd/pm: do not expose power implementation details to amdgpu_pm.c")
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Jonatas Esteves <jntesteves@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Pull spi fixes from Mark Brown:
"A collection of fixes that came in since the merge window, plus an
update to MAINTAINERS.
The Cadence fixes are coming from the addition of device mode support,
they required a couple of incremental updates in order to get
something that works robustly for both device and controller modes"
* tag 'spi-fix-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-cadence: Interleave write of TX and read of RX FIFO
spi: dw: Replace spi->chip_select references with function calls
spi: MAINTAINERS: drop Krzysztof Kozlowski from Samsung SPI
spi: spi-cadence: Only overlap FIFO transactions in slave mode
spi: spi-cadence: Avoid read of RX FIFO before its ready
spi: spi-geni-qcom: Select FIFO mode for chip select
Pull regulator fixes from Mark Brown:
"Some fixes that came in since the merge window, nothing terribly
exciting - a couple of driver specific fixes and a fix for the error
handling when setting up the debugfs for the devices"
* tag 'regulator-fix-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: mt6359: add read check for PMIC MT6359
regulator: Fix error checking for debugfs_create_dir
regulator: pca9450: Fix BUCK2 enable_mask
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Fix error propagation for the non-block-device I/O paths
MMC host:
- sdhci-cadence: Fix an error path during probe
- sdhci-esdhc-imx: Fix support for the 'no-mmc-hs400' DT property"
* tag 'mmc-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-esdhc-imx: make "no-mmc-hs400" works
mmc: sdhci-cadence: Fix an error handling path in sdhci_cdns_probe()
mmc: block: ensure error propagation for non-blk
Prior to the commit "763bd29fd3d1 ("thermal: int340x_thermal: Use
sysfs_emit_at() instead of scnprintf()", there was a new line after each
UUID string.
With the newline removed, existing user space like "thermald" fails to
compare each supported UUID as it is using getline() to read UUID and
apply correct thermal table.
To avoid breaking existing user space, add newline after each UUID string.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Fixes: 763bd29fd3 ("thermal: int340x_thermal: Use sysfs_emit_at() instead of scnprintf()")
Cc: 6.3+ <stable@vger.kernel.org> # 6.3+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
amd_pstate active mode driver is only compatible with static governors.
Therefore it doesn't need fast_switch functionality. Remove
fast_switch_possible flag from amd_pstate active mode driver.
Fixes: ffa5096a7c ("cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors")
Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Schedutil normally calls the adjust_perf callback for drivers with
adjust_perf callback available and fast_switch_possible flag set.
However, when frequency invariance is disabled and schedutil tries to
invoke fast_switch. So, there is a chance of kernel crash if this
function pointer is not set. To protect against this scenario add
fast_switch callback to amd_pstate driver.
Fixes: 1d215f0319 ("cpufreq: amd-pstate: Add fast switch function for AMD P-State")
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since at least kernel 6.1, flush_dcache_page() is called with IRQs
disabled, e.g. from aio_complete().
But the current implementation for flush_dcache_page() on parisc
unintentionally re-enables IRQs, which may lead to deadlocks.
Fix it by using xa_lock_irqsave() and xa_unlock_irqrestore()
for the flush_dcache_mmap_*lock() macros instead.
Cc: linux-parisc@vger.kernel.org
Cc: stable@kernel.org # 5.18+
Signed-off-by: Helge Deller <deller@gmx.de>
Commit bf5e758f02 ("genirq/msi: Simplify sysfs handling") reworked the
creation of sysfs entries for MSI IRQs. The creation used to be in
msi_domain_alloc_irqs_descs_locked after calling ops->domain_alloc_irqs.
Then it moved into __msi_domain_alloc_irqs which is an implementation of
domain_alloc_irqs. However, Xen comes with the only other implementation
of domain_alloc_irqs and hence doesn't run the sysfs population code
anymore.
Commit 6c796996ee ("x86/pci/xen: Fixup fallout from the PCI/MSI
overhaul") set the flag MSI_FLAG_DEV_SYSFS for the xen msi_domain_info
but that doesn't actually have an effect because Xen uses it's own
domain_alloc_irqs implementation.
Fix this by making use of the fallback functions for sysfs population.
Fixes: bf5e758f02 ("genirq/msi: Simplify sysfs handling")
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230503131656.15928-1-mheyne@amazon.de
Signed-off-by: Juergen Gross <jgross@suse.com>
In the pvcalls_new_active_socket() function, most error paths call
pvcalls_back_release_active(fedata->dev, fedata, map) which calls
sock_release() on "sock". The bug is that the caller also frees sock.
Fix this by making every error path in pvcalls_new_active_socket()
release the sock, and don't free it in the caller.
Fixes: 5db4d286a8 ("xen/pvcalls: implement connect command")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/e5f98dc2-0305-491f-a860-71bbd1398a2f@kili.mountain
Signed-off-by: Juergen Gross <jgross@suse.com>
The existing code silently converts read operations with the
REQ_FUA bit set into write-barrier operations. This results in data
loss as the backend scribbles zeroes over the data instead of returning
it.
While the REQ_FUA bit doesn't make sense on a read operation, at least
one well-known out-of-tree kernel module does set it and since it
results in data loss, let's be safe here and only look at REQ_FUA for
writes.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230426164005.2213139-1-ross.lagerwall@citrix.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Since the dawn of time bio_check_eod has a check for a non-zero size of
the device. This doesn't really make any sense as we never want to send
I/O to a device that's been set to zero size, or never moved out of that.
I am a bit surprised we haven't caught this for a long time, but the
removal of the extra validation inside of zram caused syzbot to trip
over this issue recently. I've added a Fixes tag for that commit, but
the issue really goes back way before git history.
Fixes: 9fe95babc7 ("zram: remove valid_io_request")
Reported-by: syzbot+b8d61a58b7c7ebd2c8e0@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230524060538.1593686-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
commit ef69d2559f ("riscv: Move early dtb mapping into the fixmap
region") wrongly moved the #ifndef CONFIG_BUILTIN_DTB surrounding the pa
variable definition in create_fdt_early_page_table(), so move it back to
its right place to quiet the following warning:
../arch/riscv/mm/init.c: In function ‘create_fdt_early_page_table’:
../arch/riscv/mm/init.c:925:12: warning: unused variable ‘pa’ [-Wunused-variable]
925 | uintptr_t pa = dtb_pa & ~(PMD_SIZE - 1);
Fixes: ef69d2559f ("riscv: Move early dtb mapping into the fixmap region")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230519131311.391960-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The kernel kgdb break instructions should only be handled when running
in kernel context.
Cc: <stable@vger.kernel.org> # v5.4+
Signed-off-by: Helge Deller <deller@gmx.de>
The kernel kprobes break instructions should only be handled when running
in kernel context.
Cc: <stable@vger.kernel.org> # v5.18+
Signed-off-by: Helge Deller <deller@gmx.de>
In case a machine can't power-off itself on system shutdown,
allow the user to reboot it by pressing the RETURN key.
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Helge Deller <deller@gmx.de>
The preorder callback on the kvm_pgtable_stage2_map() path can replace
a table with a block, then recursively free the detached table. The
higher-level walking logic stashes the old page table entry and
then walks the freed table, invoking the leaf callback and
potentially freeing pgtable pages prematurely.
In normal operation, the call to tear down the detached stage-2
is indirected and uses an RCU callback to trigger the freeing.
RCU is not available to pKVM, which is where this bug is
triggered.
Change the behavior of the walker to reload the page table entry
after invoking the walker callback on preorder traversal, as it
does for leaf entries.
Tested on Pixel 6.
Fixes: 5c359cca1f ("KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make")
Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230522103258.402272-1-tabba@google.com
We appear to have missed the Set/Way CMOs when adding MTE support.
Not that we really expect anyone to use them, but you never know
what stupidity some people can come up with...
Treat these mostly like we deal with the classic S/W CMOs, only
with an additional check that MTE really is enabled.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20230515204601.1270428-3-maz@kernel.org
We may get an empty response with zero length at the beginning of
the driver start and get following UBSAN error. Since there is no
content(SDRT_NONE) for the response, just return and skip the response
handling to avoid this problem.
Test pass : SDIO wifi throughput test with this patch
[ 126.980684] UBSAN: array-index-out-of-bounds in drivers/mmc/host/vub300.c:1719:12
[ 126.980709] index -1 is out of range for type 'u32 [4]'
[ 126.980729] CPU: 4 PID: 9 Comm: kworker/u16:0 Tainted: G E 6.3.0-rc4-mtk-local-202304272142 #1
[ 126.980754] Hardware name: Intel(R) Client Systems NUC8i7BEH/NUC8BEB, BIOS BECFL357.86A.0081.2020.0504.1834 05/04/2020
[ 126.980770] Workqueue: kvub300c vub300_cmndwork_thread [vub300]
[ 126.980833] Call Trace:
[ 126.980845] <TASK>
[ 126.980860] dump_stack_lvl+0x48/0x70
[ 126.980895] dump_stack+0x10/0x20
[ 126.980916] ubsan_epilogue+0x9/0x40
[ 126.980944] __ubsan_handle_out_of_bounds+0x70/0x90
[ 126.980979] vub300_cmndwork_thread+0x58e7/0x5e10 [vub300]
[ 126.981018] ? _raw_spin_unlock+0x18/0x40
[ 126.981042] ? finish_task_switch+0x175/0x6f0
[ 126.981070] ? __switch_to+0x42e/0xda0
[ 126.981089] ? __switch_to_asm+0x3a/0x80
[ 126.981129] ? __pfx_vub300_cmndwork_thread+0x10/0x10 [vub300]
[ 126.981174] ? __kasan_check_read+0x11/0x20
[ 126.981204] process_one_work+0x7ee/0x13d0
[ 126.981246] worker_thread+0x53c/0x1240
[ 126.981291] kthread+0x2b8/0x370
[ 126.981312] ? __pfx_worker_thread+0x10/0x10
[ 126.981336] ? __pfx_kthread+0x10/0x10
[ 126.981359] ret_from_fork+0x29/0x50
[ 126.981400] </TASK>
Fixes: 88095e7b47 ("mmc: Add new VUB300 USB-to-SD/SDIO/MMC driver")
Signed-off-by: Deren Wu <deren.wu@mediatek.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/048cd6972c50c33c2e8f81d5228fed928519918b.1683987673.git.deren.wu@mediatek.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Add a lightweight spinlock check which uses only two instructions
per spinlock call. It detects if a spinlock has been trashed by
some memory corruption and then halts the kernel. It will not detect
uninitialized spinlocks, for which CONFIG_DEBUG_SPINLOCK needs to
be enabled.
This lightweight spinlock check shouldn't influence runtime, so it's
safe to enable it by default.
The __ARCH_SPIN_LOCK_UNLOCKED_VAL constant has been choosen small enough
to be able to be loaded by one LDI assembler statement.
Signed-off-by: Helge Deller <deller@gmx.de>
Fbtest contains some very simple validation of the fbdev userspace API
contract. When used with shmob-drm, it reports the following warnings
and errors:
height changed from 68 to 0
height was rounded down
width changed from 111 to 0
width was rounded down
accel_flags changed from 0 to 1
The first part happens because __fill_var() resets the physical
dimensions of the first connector, as filled in by drm_setup_crtcs_fb().
Fix this by retaining the original values.
The last part happens because __fill_var() forces the FB_ACCELF_TEXT
flag on, while fbtest disables all acceleration on purpose, so it can
draw safely to the frame buffer. Fix this by setting accel_flags to
zero, as DRM does not implement any text console acceleration.
Note that this issue can also be seen in the output of fbset, which
reports "accel true".
Fixes: ee4cce0a8f ("drm/fb-helper: fix input validation gaps in check_var")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Sui Jingfeng <suijingfeng@loongson.cn>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/57e6b334dae8148b1b8ae6ef308ce9a83810a850.1684854344.git.geert+renesas@glider.be
SoundWire code as it is only supports Bulk register writes and
it does not support multi-register writes.
Any drivers that set can_multi_write and use regmap_multi_reg_write() will
easily endup with programming the hardware incorrectly without any errors.
So, add this check in bus code to be able to validate the drivers config.
Fixes: 522272047d ("regmap: sdw: Remove 8-bit value size restriction")
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230523154747.5429-1-srinivas.kandagatla@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
ASoC: Fixes for v6.4
A collection of fixes for v6.4, mostly driver specific but there's also
one fix for DPCM to avoid incorrectly repeated calls to prepare() which
can trigger issues on some systems.
The GFNI routines in the AVX version of the ARIA implementation now use
explicit VMOVDQA instructions to load the constant input vectors, which
means they must be 16 byte aligned. So ensure that this is the case, by
dropping the section split and the incorrect .align 8 directive, and
emitting the constants into the 16-byte aligned section instead.
Note that the AVX2 version of this code deviates from this pattern, and
does not require a similar fix, given that it loads these contants as
8-byte memory operands, for which AVX2 permits any alignment.
Cc: Taehee Yoo <ap420073@gmail.com>
Fixes: 8b84475318 ("crypto: x86/aria-avx - Do not use avx2 instructions")
Reported-by: syzbot+a6abcf08bad8b18fd198@syzkaller.appspotmail.com
Tested-by: syzbot+a6abcf08bad8b18fd198@syzkaller.appspotmail.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The of_find_device_by_node() function is returning a struct platform_device
object with the embedded struct device member's reference counter
incremented. This needs to be dropped when done with the platform device
returned by of_find_device_by_node().
at91_pm_eth_quirk_is_valid() calls of_find_device_by_node() on
suspend and resume path. On suspend it calls of_find_device_by_node() and
on resume and failure paths it drops the counter of
struct platform_device::dev.
In case ethernet device may not wakeup there is a put_device() on
at91_pm_eth_quirk_is_valid() which is wrong as it colides with
put_device() on resume path leading to the reference counter of struct
device embedded in struct platform_device to be messed, stack trace to be
displayed (after 5 consecutive suspend/resume cycles) and execution to
hang.
Along with this the error path of at91_pm_config_quirks() had been also
adapted to decrement propertly the reference counter of struct device
embedded in struct platform_device.
Fixes: b7fc72c633 ("ARM: at91: pm: add quirks for pm")
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Link: https://lore.kernel.org/r/20230518062511.2988500-1-claudiu.beznea@microchip.com
optlen is fetched without checking whether there is more than one byte to parse.
It can lead to out-of-bounds access.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: c61a404325 ("[IPV6]: Find option offset by type.")
Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
mlx5-fixes-2023-05-22
This series provides bug fixes for the mlx5 driver.
Please pull and let me know if there is any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit c6d96df9fa ("net: ethernet: mtk_eth_soc: drop generic vlan rx
offload, only use DSA untagging") makes VLAN RX offloading to be only used
on the SoCs without the MTK_NETSYS_V2 ability (which are not just MT7621
and MT7622). The commit disables the proper handling of special tagged
(DSA) frames, added with commit 87e3df4961 ("net-next: ethernet:
mediatek: add CDM able to recognize the tag for DSA"), for non
MTK_NETSYS_V2 SoCs when it finds a MAC that does not use DSA. So if the
other MAC uses DSA, the CDMQ component transmits DSA tagged frames to the
CPU improperly. This issue can be observed on frames with TCP, for example,
a TCP speed test using iperf3 won't work.
The commit disables the proper handling of special tagged (DSA) frames
because it assumes that these SoCs don't use more than one MAC, which is
wrong. Although I made Frank address this false assumption on the patch log
when they sent the patch on behalf of Felix, the code still made changes
with this assumption.
Therefore, the proper handling of special tagged (DSA) frames must be kept
enabled in all circumstances as it doesn't affect non DSA tagged frames.
Hardware DSA untagging, introduced with the commit 2d7605a729 ("net:
ethernet: mtk_eth_soc: enable hardware DSA untagging"), and VLAN RX
offloading are operations on the two CDM components of the frame engine,
CDMP and CDMQ, which connect to Packet DMA (PDMA) and QoS DMA (QDMA) and
are between the MACs and the CPU. These operations apply to all MACs of the
SoC so if one MAC uses DSA and the other doesn't, the hardware DSA
untagging operation will cause the CDMP component to transmit non DSA
tagged frames to the CPU improperly.
Since the VLAN RX offloading feature configuration was dropped, VLAN RX
offloading can only be used along with hardware DSA untagging. So, for the
case above, we need to disable both features and leave it to the CPU,
therefore software, to untag the DSA and VLAN tags.
So the correct way to handle this is:
For all SoCs:
Enable the proper handling of special tagged (DSA) frames
(MTK_CDMQ_IG_CTRL).
For non MTK_NETSYS_V2 SoCs:
Enable hardware DSA untagging (MTK_CDMP_IG_CTRL).
Enable VLAN RX offloading (MTK_CDMP_EG_CTRL).
When a non MTK_NETSYS_V2 SoC MAC does not use DSA:
Disable hardware DSA untagging (MTK_CDMP_IG_CTRL).
Disable VLAN RX offloading (MTK_CDMP_EG_CTRL).
Fixes: c6d96df9fa ("net: ethernet: mtk_eth_soc: drop generic vlan rx offload, only use DSA untagging")
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"_start" is used in several arches and proably should be reserved
for ARCH usage. Using it in a driver for a private symbol can cause
a build error when it conflicts with ARCH usage of the same symbol.
Therefore rename pl330's "_start" to "pl330_start_thread" so that there
is no conflict and no build error.
drivers/dma/pl330.c:1053:13: error: '_start' redeclared as different kind of symbol
1053 | static bool _start(struct pl330_thread *thrd)
| ^~~~~~
In file included from ../include/linux/interrupt.h:21,
from ../drivers/dma/pl330.c:18:
arch/riscv/include/asm/sections.h:11:13: note: previous declaration of '_start' with type 'char[]'
11 | extern char _start[];
| ^~~~~~
Fixes: b7d861d939 ("DMA: PL330: Merge PL330 driver into drivers/dma/")
Fixes: ae43b32891 ("ARM: 8202/1: dmaengine: pl330: Add runtime Power Management support v12")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jaswinder Singh <jassisinghbrar@gmail.com>
Cc: Boojin Kim <boojin.kim@samsung.com>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Vinod Koul <vkoul@kernel.org>
Cc: dmaengine@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20230524045310.27923-1-rdunlap@infradead.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Commit 50749f2dd6 ("tcp/udp: Fix memleaks of sk and zerocopy skbs with
TX timestamp.") added a call to skb_orphan_frags_rx() to fix leaks with
zerocopy skbs. But it ended up adding a leak of its own. When
skb_orphan_frags_rx() fails, the function just returns, leaking the skb
it just cloned. Free it before returning.
This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.
Fixes: 50749f2dd6 ("tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.")
Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230522153020.32422-1-ptyadav@amazon.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The histogram and synthetic events can use a pseudo event called
"stacktrace" that will create a stacktrace at the time of the event and
use it just like it was a normal field. We have other pseudo events such
as "common_cpu" and "common_timestamp". To stay consistent with that,
convert "stacktrace" to "common_stacktrace". As this was used in older
kernels, to keep backward compatibility, this will act just like
"common_cpu" did with "cpu". That is, "cpu" will be the same as
"common_cpu" unless the event has a "cpu" field. In which case, the
event's field is used. The same is true with "stacktrace".
Also update the documentation to reflect this change.
Link: https://lore.kernel.org/linux-trace-kernel/20230523230913.6860e28d@rorschach.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The driver's interrupt service routine is requested with the
IRQF_NO_THREAD if MSI is available. This means that the routine is
invoked in hardirq context even on PREEMPT_RT. The routine itself is
relatively short and schedules a worker, performs register access and
schedules NAPI. On PREEMPT_RT, scheduling NAPI from hardirq results in
waking ksoftirqd for further processing so using NAPI threads with this
driver is highly recommended since it NULL routes the threaded-IRQ
efforts.
Adding rtl_hw_aspm_clkreq_enable() to the ISR is problematic on
PREEMPT_RT because the function uses spinlock_t locks which become
sleeping locks on PREEMPT_RT. The locks are only used to protect
register access and don't nest into other functions or locks. They are
also not used for unbounded period of time. Therefore it looks okay to
convert them to raw_spinlock_t.
Convert the three locks which are used from the interrupt service
routine to raw_spinlock_t.
Fixes: e1ed3e4d91 ("r8169: disable ASPM during NAPI poll")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/20230522134121.uxjax0F5@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
page_pool_ring_[un]lock() use in_softirq() to decide which
spin lock variant to use, and when they are called in the
context with in_softirq() being false, spin_lock_bh() is
called in page_pool_ring_lock() while spin_unlock() is
called in page_pool_ring_unlock(), because spin_lock_bh()
has disabled the softirq in page_pool_ring_lock(), which
causes inconsistency for spin lock pair calling.
This patch fixes it by returning in_softirq state from
page_pool_producer_lock(), and use it to decide which
spin lock variant to use in page_pool_producer_unlock().
As pool->ring has both producer and consumer lock, so
rename it to page_pool_producer_[un]lock() to reflect
the actual usage. Also move them to page_pool.c as they
are only used there, and remove the 'inline' as the
compiler may have better idea to do inlining or not.
Fixes: 7886244736 ("net: page_pool: Add bulk support for ptr_ring")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/r/20230522031714.5089-1-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Modifiers are used to change the behavior of keys. For instance, they
can grouped into buckets, converted to syscall names (from the syscall
identifier), show task->comm of the current pid, be an array of longs
that represent a stacktrace, and more.
It was found that nothing stopped a value from taking a modifier. As
values are simple counters. If this happened, it would call code that
was not expecting a modifier and crash the kernel. This was fixed by
having the ___create_val_field() function test if a modifier was present
and fail if one was. This fixed the crash.
Now there's a problem with variables. Variables are used to pass fields
from one event to another. Variables are allowed to have some modifiers,
as the processing may need to happen at the time of the event (like
stacktraces and comm names of the current pid). The issue is that it too
uses __create_val_field(). Now that fails on modifiers, variables can no
longer use them (this is a regression).
As not all modifiers are for variables, have them use a separate check.
Link: https://lore.kernel.org/linux-trace-kernel/20230523221108.064a5d82@rorschach.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: e0213434fe ("tracing: Do not let histogram values have some modifiers")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull tpm fix from Jarkko Sakkinen:
"A fix to add a new entry to the deny for list for tpm_tis interrupts"
* tag 'tpmdd-v6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: tpm_tis: Disable interrupts for AEON UPX-i11
Interrupts got recently enabled for tpm_tis.
The interrupts initially works on the device but they will stop arriving
after circa ~200 interrupts. On system reboot/shutdown this will cause a
long wait (120000 jiffies).
[jarkko@kernel.org: fix a merge conflict and adjust the commit message]
Fixes: e644b2f498 ("tpm, tpm_tis: Enable interrupt test")
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Pull Xtensa fixes from Max Filippov:
- fix signal delivery to FDPIC process
- add __bswap{si,di}2 helpers
* tag 'xtensa-20230523' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: add __bswap{si,di}2 helpers
xtensa: fix signal delivery to FDPIC process
While doing my research to improve the xpad device names I noticed
that the 1532:0037 VID/PID seems to be used by the DeathAdder 2013,
so that Razer Sabertooth instance looked wrong and very suspect to
me. I didn't see any mention in the official drivers, either.
After doing more research, it turns out that the xpad list
is used by many other projects (like Steam) as-is [1], this
issue was reported [2] and Valve/Sam Lantinga fixed it [3]:
[1]: dcc5eef0e2/src/joystick/controller_type.h (L251)
[2]: https://steamcommunity.com/app/353380/discussions/0/1743392486228754770/
[3]: https://hg.libsdl.org/SDL/rev/29809f6f0271
(With multiple Internet users reporting similar issues, not linked here)
After not being able to find the correct VID/PID combination anywhere
on the Internet and not receiving any reply from Razer support I did
some additional detective work, it seems like it presents itself as
"Razer Sabertooth Gaming Controller (XBOX360)", code 1689:FE00.
Leaving us with this:
* Razer Sabertooth (1689:fe00)
* Razer Sabertooth Elite (24c6:5d04)
* Razer DeathAdder 2013 (1532:0037) [note: not a gamepad]
So, to sum things up; remove this conflicting/duplicate entry:
{ 0x1532, 0x0037, "Razer Sabertooth", 0, XTYPE_XBOX360 },
As the real/correct one is already present there, even if
the Internet as a whole insists on presenting it as the
Razer Sabertooth Elite, which (by all accounts) is not:
{ 0x1689, 0xfe00, "Razer Sabertooth", 0, XTYPE_XBOX360 },
Actual change in SDL2 referencing this kernel issue:
e5e5416975
For more information of the device, take a look here:
https://github.com/xboxdrv/xboxdrv/pull/59
You can see a lsusb dump here: https://github.com/xboxdrv/xboxdrv/files/76581/Qa6dBcrv.txt
Fixes: f554f619b7 ("Input: xpad - sync device IDs with xboxdrv")
Signed-off-by: Ismael Ferreras Morezuelas <swyterzone@gmail.com>
Reviewed-by: Cameron Gutman <aicommander@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/5c12dbdb-5774-fc68-5c58-ca596383663e@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Check physical PFN is valid before converting the PFN to a struct page
pointer to be returned to caller of vfio_pin_pages().
vfio_pin_pages() pins user pages with contiguous IOVA.
If the IOVA of a user page to be pinned belongs to vma of vm_flags
VM_PFNMAP, pin_user_pages_remote() will return -EFAULT without returning
struct page address for this PFN. This is because usually this kind of PFN
(e.g. MMIO PFN) has no valid struct page address associated.
Upon this error, vaddr_get_pfns() will obtain the physical PFN directly.
While previously vfio_pin_pages() returns to caller PFN arrays directly,
after commit
34a255e676 ("vfio: Replace phys_pfn with pages for vfio_pin_pages()"),
PFNs will be converted to "struct page *" unconditionally and therefore
the returned "struct page *" array may contain invalid struct page
addresses.
Given current in-tree users of vfio_pin_pages() only expect "struct page *
returned, check PFN validity and return -EINVAL to let the caller be
aware of IOVAs to be pinned containing PFN not able to be returned in
"struct page *" array. So that, the caller will not consume the returned
pointer (e.g. test PageReserved()) and avoid error like "supervisor read
access in kernel mode".
Fixes: 34a255e676 ("vfio: Replace phys_pfn with pages for vfio_pin_pages()")
Cc: Sean Christopherson <seanjc@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230519065843.10653-1-yan.y.zhao@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
vmbus_wait_for_unload() may be called in the panic path after other
CPUs are stopped. vmbus_wait_for_unload() currently loops through
online CPUs looking for the UNLOAD response message. But the values of
CONFIG_KEXEC_CORE and crash_kexec_post_notifiers affect the path used
to stop the other CPUs, and in one of the paths the stopped CPUs
are removed from cpu_online_mask. This removal happens in both
x86/x64 and arm64 architectures. In such a case, vmbus_wait_for_unload()
only checks the panic'ing CPU, and misses the UNLOAD response message
except when the panic'ing CPU is CPU 0. vmbus_wait_for_unload()
eventually times out, but only after waiting 100 seconds.
Fix this by looping through *present* CPUs in vmbus_wait_for_unload().
The cpu_present_mask is not modified by stopping the other CPUs in the
panic path, nor should it be.
Also, in a CoCo VM the synic_message_page is not allocated in
hv_synic_alloc(), but is set and cleared in hv_synic_enable_regs()
and hv_synic_disable_regs() such that it is set only when the CPU is
online. If not all present CPUs are online when vmbus_wait_for_unload()
is called, the synic_message_page might be NULL. Add a check for this.
Fixes: cd95aad557 ("Drivers: hv: vmbus: handle various crash scenarios")
Cc: stable@vger.kernel.org
Reported-by: John Starks <jostarks@microsoft.com>
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/1684422832-38476-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Pull erofs fixes from Gao Xiang:
"One patch addresses a null-ptr-deref issue reported by syzbot weeks
ago, which is caused by the new long xattr name prefix feature and
needs to be fixed.
The remaining two patches are minor cleanups to avoid unnecessary
compilation and adjust per-cpu kworker configuration.
Summary:
- Fix null-ptr-deref related to long xattr name prefixes
- Avoid pcpubuf compilation if CONFIG_EROFS_FS_ZIP is off
- Use high priority kthreads by default if per-cpu kthread workers
are enabled"
* tag 'erofs-for-6.4-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: use HIPRI by default if per-cpu kthreads are enabled
erofs: avoid pcpubuf.c inclusion if CONFIG_EROFS_FS_ZIP is off
erofs: fix null-ptr-deref caused by erofs_xattr_prefixes_init
commit <8af870aa5b847> ("block: enable bio caching use for passthru IO")
introduced bio-cache for passthru IO. In case when nr_vecs are greater
than BIO_INLINE_VECS, bio and bvecs are allocated from mempool (instead
of percpu cache) and REQ_ALLOC_CACHE is cleared. This causes the side
effect of not freeing bio/bvecs into mempool on completion.
This patch lets the passthru IO fallback to allocation using bio_kmalloc
when nr_vecs are greater than BIO_INLINE_VECS. The corresponding bio
is freed during call to blk_mq_map_bio_put during completion.
Cc: stable@vger.kernel.org # 6.1
fixes <8af870aa5b847> ("block: enable bio caching use for passthru IO")
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20230523111709.145676-1-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When patching the kernel code some alternatives depend on SMP vs. !SMP.
Use the value of num_present_cpus() instead of num_online_cpus() to
decide, otherwise we may run into issues if and additional CPU is
enabled after having loaded a module while only one CPU was enabled.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v6.1+
While testing rtla timerlat auto analysis, I reach a condition where
the interface was not receiving tracing data. I was able to manually
reproduce the problem with these steps:
# echo 0 > tracing_on # disable trace
# echo 1 > osnoise/stop_tracing_us # stop trace if timerlat irq > 1 us
# echo timerlat > current_tracer # enable timerlat tracer
# sleep 1 # wait... that is the time when rtla
# apply configs like prio or cgroup
# echo 1 > tracing_on # start tracing
# cat trace
# tracer: timerlat
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# ||||| ACTIVATION
# TASK-PID CPU# ||||| TIMESTAMP ID CONTEXT LATENCY
# | | | ||||| | | | |
NOTHING!
Then, trying to enable tracing again with echo 1 > tracing_on resulted
in no change: the trace was still not tracing.
This problem happens because the timerlat IRQ hits the stop tracing
condition while tracing is off, and do not wake up the timerlat thread,
so the timerlat threads are kept sleeping forever, resulting in no
trace, even after re-enabling the tracer.
Avoid this condition by always waking up the threads, even after stopping
tracing, allowing the tracer to return to its normal operating after
a new tracing on.
Link: https://lore.kernel.org/linux-trace-kernel/1ed8f830638b20a39d535d27d908e319a9a3c4e2.1683822622.git.bristot@kernel.org
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: stable@vger.kernel.org
Fixes: a955d7eac1 ("trace: Add timerlat tracer")
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
If msg_xfer() is unable to queue part of a NNC message because the MHI ring
is full, it will attempt to give the QSM some time to drain the queue.
However, if QSM fails to make any room, msg_xfer() will fail and tell the
caller to try again. This is problematic because part of the message may
have been committed to the ring and there is no mechanism to revoke that
content. This will cause QSM to receive a corrupt message.
The better way to do this is to check if the ring has enough space for the
entire message before committing any of the message. Since msg_xfer() is
under the cntl_mutex no one else can come in and consume the space.
Fixes: 129776ac2e ("accel/qaic: Add control path")
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230517193540.14323-6-quic_jhugo@quicinc.com
clang static analysis reports
drivers/accel/qaic/qaic_data.c:610:2: warning: Undefined or garbage
value returned to caller [core.uninitialized.UndefReturn]
return ret;
^~~~~~~~~~
From a code analysis of the function, the ret variable is only set some
of the time but is always returned. This suggests ret can return
uninitialized garbage. However BO allocation will ensure ret is always
set in reality.
Initialize ret to 0 to silence the warning.
Fixes: ff13be8303 ("accel/qaic: Add datapath")
Signed-off-by: Tom Rix <trix@redhat.com>
[jhugo: Reword commit text]
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230517165605.16770-1-quic_jhugo@quicinc.com
When BPF program drops pkts the sockmap logic 'eats' the packet and
updates copied_seq. In the PASS case where the sk_buff is accepted
we update copied_seq from recvmsg path so we need a new test to
handle the drop case.
Original patch series broke this resulting in
test_sockmap_skb_verdict_fionread:PASS:ioctl(FIONREAD) error 0 nsec
test_sockmap_skb_verdict_fionread:FAIL:ioctl(FIONREAD) unexpected ioctl(FIONREAD): actual 1503041772 != expected 256
After updated patch with fix.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-14-john.fastabend@gmail.com
A bug was reported where ioctl(FIONREAD) returned zero even though the
socket with a SK_SKB verdict program attached had bytes in the msg
queue. The result is programs may hang or more likely try to recover,
but use suboptimal buffer sizes.
Add a test to check that ioctl(FIONREAD) returns the correct number of
bytes.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-13-john.fastabend@gmail.com
The read_skb() logic is incrementing the tcp->copied_seq which is used for
among other things calculating how many outstanding bytes can be read by
the application. This results in application errors, if the application
does an ioctl(FIONREAD) we return zero because this is calculated from
the copied_seq value.
To fix this we move tcp->copied_seq accounting into the recv handler so
that we update these when the recvmsg() hook is called and data is in
fact copied into user buffers. This gives an accurate FIONREAD value
as expected and improves ACK handling. Before we were calling the
tcp_rcv_space_adjust() which would update 'number of bytes copied to
user in last RTT' which is wrong for programs returning SK_PASS. The
bytes are only copied to the user when recvmsg is handled.
Doing the fix for recvmsg is straightforward, but fixing redirect and
SK_DROP pkts is a bit tricker. Build a tcp_psock_eat() helper and then
call this from skmsg handlers. This fixes another issue where a broken
socket with a BPF program doing a resubmit could hang the receiver. This
happened because although read_skb() consumed the skb through sock_drop()
it did not update the copied_seq. Now if a single reccv socket is
redirecting to many sockets (for example for lb) the receiver sk will be
hung even though we might expect it to continue. The hang comes from
not updating the copied_seq numbers and memory pressure resulting from
that.
We have a slight layer problem of calling tcp_eat_skb even if its not
a TCP socket. To fix we could refactor and create per type receiver
handlers. I decided this is more work than we want in the fix and we
already have some small tweaks depending on caller that use the
helper skb_bpf_strparser(). So we extend that a bit and always set
the strparser bit when it is in use and then we can gate the
seq_copied updates on this.
Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-9-john.fastabend@gmail.com
When TCP stack has data ready to read sk_data_ready() is called. Sockmap
overwrites this with its own handler to call into BPF verdict program.
But, the original TCP socket had sock_def_readable that would additionally
wake up any user space waiters with sk_wake_async().
Sockmap saved the callback when the socket was created so call the saved
data ready callback and then we can wake up any epoll() logic waiting
on the read.
Note we call on 'copied >= 0' to account for returning 0 when a FIN is
received because we need to wake up user for this as well so they
can do the recvmsg() -> 0 and detect the shutdown.
Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-8-john.fastabend@gmail.com
A common mechanism to put a TCP socket into the sockmap is to hook the
BPF_SOCK_OPS_{ACTIVE_PASSIVE}_ESTABLISHED_CB event with a BPF program
that can map the socket info to the correct BPF verdict parser. When
the user adds the socket to the map the psock is created and the new
ops are assigned to ensure the verdict program will 'see' the sk_buffs
as they arrive.
Part of this process hooks the sk_data_ready op with a BPF specific
handler to wake up the BPF verdict program when data is ready to read.
The logic is simple enough (posted here for easy reading)
static void sk_psock_verdict_data_ready(struct sock *sk)
{
struct socket *sock = sk->sk_socket;
if (unlikely(!sock || !sock->ops || !sock->ops->read_skb))
return;
sock->ops->read_skb(sk, sk_psock_verdict_recv);
}
The oversight here is sk->sk_socket is not assigned until the application
accepts() the new socket. However, its entirely ok for the peer application
to do a connect() followed immediately by sends. The socket on the receiver
is sitting on the backlog queue of the listening socket until its accepted
and the data is queued up. If the peer never accepts the socket or is slow
it will eventually hit data limits and rate limit the session. But,
important for BPF sockmap hooks when this data is received TCP stack does
the sk_data_ready() call but the read_skb() for this data is never called
because sk_socket is missing. The data sits on the sk_receive_queue.
Then once the socket is accepted if we never receive more data from the
peer there will be no further sk_data_ready calls and all the data
is still on the sk_receive_queue(). Then user calls recvmsg after accept()
and for TCP sockets in sockmap we use the tcp_bpf_recvmsg_parser() handler.
The handler checks for data in the sk_msg ingress queue expecting that
the BPF program has already run from the sk_data_ready hook and enqueued
the data as needed. So we are stuck.
To fix do an unlikely check in recvmsg handler for data on the
sk_receive_queue and if it exists wake up data_ready. We have the sock
locked in both read_skb and recvmsg so should avoid having multiple
runners.
Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-7-john.fastabend@gmail.com
The sockmap code is returning EAGAIN after a FIN packet is received and no
more data is on the receive queue. Correct behavior is to return 0 to the
user and the user can then close the socket. The EAGAIN causes many apps
to retry which masks the problem. Eventually the socket is evicted from
the sockmap because its released from sockmap sock free handling. The
issue creates a delay and can cause some errors on application side.
To fix this check on sk_msg_recvmsg side if length is zero and FIN flag
is set then set return to zero. A selftest will be added to check this
condition.
Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: William Findlay <will@isovalent.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-6-john.fastabend@gmail.com
We noticed some rare sk_buffs were stepping past the queue when system was
under memory pressure. The general theory is to skip enqueueing
sk_buffs when its not necessary which is the normal case with a system
that is properly provisioned for the task, no memory pressure and enough
cpu assigned.
But, if we can't allocate memory due to an ENOMEM error when enqueueing
the sk_buff into the sockmap receive queue we push it onto a delayed
workqueue to retry later. When a new sk_buff is received we then check
if that queue is empty. However, there is a problem with simply checking
the queue length. When a sk_buff is being processed from the ingress queue
but not yet on the sockmap msg receive queue its possible to also recv
a sk_buff through normal path. It will check the ingress queue which is
zero and then skip ahead of the pkt being processed.
Previously we used sock lock from both contexts which made the problem
harder to hit, but not impossible.
To fix instead of popping the skb from the queue entirely we peek the
skb from the queue and do the copy there. This ensures checks to the
queue length are non-zero while skb is being processed. Then finally
when the entire skb has been copied to user space queue or another
socket we pop it off the queue. This way the queue length check allows
bypassing the queue only after the list has been completely processed.
To reproduce issue we run NGINX compliance test with sockmap running and
observe some flakes in our testing that we attributed to this issue.
Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: William Findlay <will@isovalent.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-5-john.fastabend@gmail.com
Now that the backlog manages the reschedule() logic correctly we can drop
the partial fix to reschedule from recvmsg hook.
Rescheduling on recvmsg hook was added to address a corner case where we
still had data in the backlog state but had nothing to kick it and
reschedule the backlog worker to run and finish copying data out of the
state. This had a couple limitations, first it required user space to
kick it introducing an unnecessary EBUSY and retry. Second it only
handled the ingress case and egress redirects would still be hung.
With the correct fix, pushing the reschedule logic down to where the
enomem error occurs we can drop this fix.
Fixes: bec217197b ("skmsg: Schedule psock work if the cached skb exists on the psock")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-4-john.fastabend@gmail.com
Sk_buffs are fed into sockmap verdict programs either from a strparser
(when the user might want to decide how framing of skb is done by attaching
another parser program) or directly through tcp_read_sock. The
tcp_read_sock is the preferred method for performance when the BPF logic is
a stream parser.
The flow for Cilium's common use case with a stream parser is,
tcp_read_sock()
sk_psock_verdict_recv
ret = bpf_prog_run_pin_on_cpu()
sk_psock_verdict_apply(sock, skb, ret)
// if system is under memory pressure or app is slow we may
// need to queue skb. Do this queuing through ingress_skb and
// then kick timer to wake up handler
skb_queue_tail(ingress_skb, skb)
schedule_work(work);
The work queue is wired up to sk_psock_backlog(). This will then walk the
ingress_skb skb list that holds our sk_buffs that could not be handled,
but should be OK to run at some later point. However, its possible that
the workqueue doing this work still hits an error when sending the skb.
When this happens the skbuff is requeued on a temporary 'state' struct
kept with the workqueue. This is necessary because its possible to
partially send an skbuff before hitting an error and we need to know how
and where to restart when the workqueue runs next.
Now for the trouble, we don't rekick the workqueue. This can cause a
stall where the skbuff we just cached on the state variable might never
be sent. This happens when its the last packet in a flow and no further
packets come along that would cause the system to kick the workqueue from
that side.
To fix we could do simple schedule_work(), but while under memory pressure
it makes sense to back off some instead of continue to retry repeatedly. So
instead to fix convert schedule_work to schedule_delayed_work and add
backoff logic to reschedule from backlog queue on errors. Its not obvious
though what a good backoff is so use '1'.
To test we observed some flakes whil running NGINX compliance test with
sockmap we attributed these failed test to this bug and subsequent issue.
>From on list discussion. This commit
bec217197b41("skmsg: Schedule psock work if the cached skb exists on the psock")
was intended to address similar race, but had a couple cases it missed.
Most obvious it only accounted for receiving traffic on the local socket
so if redirecting into another socket we could still get an sk_buff stuck
here. Next it missed the case where copied=0 in the recv() handler and
then we wouldn't kick the scheduler. Also its sub-optimal to require
userspace to kick the internal mechanisms of sockmap to wake it up and
copy data to user. It results in an extra syscall and requires the app
to actual handle the EAGAIN correctly.
Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: William Findlay <will@isovalent.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-3-john.fastabend@gmail.com
The read_skb hook calls consume_skb() now, but this means that if the
recv_actor program wants to use the skb it needs to inc the ref cnt
so that the consume_skb() doesn't kfree the sk_buff.
This is problematic because in some error cases under memory pressure
we may need to linearize the sk_buff from sk_psock_skb_ingress_enqueue().
Then we get this,
skb_linearize()
__pskb_pull_tail()
pskb_expand_head()
BUG_ON(skb_shared(skb))
Because we incremented users refcnt from sk_psock_verdict_recv() we
hit the bug on with refcnt > 1 and trip it.
To fix lets simply pass ownership of the sk_buff through the skb_read
call. Then we can drop the consume from read_skb handlers and assume
the verdict recv does any required kfree.
Bug found while testing in our CI which runs in VMs that hit memory
constraints rather regularly. William tested TCP read_skb handlers.
[ 106.536188] ------------[ cut here ]------------
[ 106.536197] kernel BUG at net/core/skbuff.c:1693!
[ 106.536479] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 106.536726] CPU: 3 PID: 1495 Comm: curl Not tainted 5.19.0-rc5 #1
[ 106.537023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.16.0-1 04/01/2014
[ 106.537467] RIP: 0010:pskb_expand_head+0x269/0x330
[ 106.538585] RSP: 0018:ffffc90000138b68 EFLAGS: 00010202
[ 106.538839] RAX: 000000000000003f RBX: ffff8881048940e8 RCX: 0000000000000a20
[ 106.539186] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff8881048940e8
[ 106.539529] RBP: ffffc90000138be8 R08: 00000000e161fd1a R09: 0000000000000000
[ 106.539877] R10: 0000000000000018 R11: 0000000000000000 R12: ffff8881048940e8
[ 106.540222] R13: 0000000000000003 R14: 0000000000000000 R15: ffff8881048940e8
[ 106.540568] FS: 00007f277dde9f00(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
[ 106.540954] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 106.541227] CR2: 00007f277eeede64 CR3: 000000000ad3e000 CR4: 00000000000006e0
[ 106.541569] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 106.541915] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 106.542255] Call Trace:
[ 106.542383] <IRQ>
[ 106.542487] __pskb_pull_tail+0x4b/0x3e0
[ 106.542681] skb_ensure_writable+0x85/0xa0
[ 106.542882] sk_skb_pull_data+0x18/0x20
[ 106.543084] bpf_prog_b517a65a242018b0_bpf_skskb_http_verdict+0x3a9/0x4aa9
[ 106.543536] ? migrate_disable+0x66/0x80
[ 106.543871] sk_psock_verdict_recv+0xe2/0x310
[ 106.544258] ? sk_psock_write_space+0x1f0/0x1f0
[ 106.544561] tcp_read_skb+0x7b/0x120
[ 106.544740] tcp_data_queue+0x904/0xee0
[ 106.544931] tcp_rcv_established+0x212/0x7c0
[ 106.545142] tcp_v4_do_rcv+0x174/0x2a0
[ 106.545326] tcp_v4_rcv+0xe70/0xf60
[ 106.545500] ip_protocol_deliver_rcu+0x48/0x290
[ 106.545744] ip_local_deliver_finish+0xa7/0x150
Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Reported-by: William Findlay <will@isovalent.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: William Findlay <will@isovalent.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-2-john.fastabend@gmail.com
notify_change can modify the iattr structure. In particular it can
end up setting ATTR_MODE when ATTR_KILL_SUID is already set, causing
a BUG() if the same iattr is passed to notify_change more than once.
Make a copy of the struct iattr before calling notify_change.
Reported-by: Zhi Li <yieli@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2207969
Tested-by: Zhi Li <yieli@redhat.com>
Fixes: 34b91dda71 ("NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
With a raw socket bound to IPPROTO_RAW (ie with hdrincl enabled), the
protocol field of the flow structure, build by raw_sendmsg() /
rawv6_sendmsg()), is set to IPPROTO_RAW. This breaks the ipsec policy
lookup when some policies are defined with a protocol in the selector.
For ipv6, the sin6_port field from 'struct sockaddr_in6' could be used to
specify the protocol. Just accept all values for IPPROTO_RAW socket.
For ipv4, the sin_port field of 'struct sockaddr_in' could not be used
without breaking backward compatibility (the value of this field was never
checked). Let's add a new kind of control message, so that the userland
could specify which protocol is used.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
CC: stable@vger.kernel.org
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20230522120820.1319391-1-nicolas.dichtel@6wind.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Added a variable check and
transition in case of an error
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Denis Arefev <arefev@swemel.ru>
Reviewed-by: Ping Cheng <ping.cheng@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
If high bit is set to 1 in ((data[3] & 0x0f << 28), after all arithmetic
operations and integer promotions are done, high bits in
wacom->serial[idx] will be filled with 1s as well.
Avoid this, albeit unlikely, issue by specifying left operand's __u64
type for the right operand.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 3bea733ab2 ("USB: wacom tablet driver reorganization")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Reviewed-by: Ping Cheng <ping.cheng@wacom.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The strscpy function is able to return an error code when a copy would
overflow the size of the destination. The copy is stopped and the buffer
terminated before overflow actually occurs so it is safe to continue
execution, but we should still produce a warning should this occur.
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Ping Cheng <ping.cheng@wacom.com>
Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
It was noticing that after a while when unloading/loading the driver and
sending traffic through the switch, it would stop working. It would stop
forwarding any traffic and the only way to get out of this was to do a
power cycle of the board. The root cause seems to be that the switch
core is initialized twice. Apparently initializing twice the switch core
disturbs the pointers in the queue systems in the HW, so after a while
it would stop sending the traffic.
Unfortunetly, it is not possible to use a reset of the switch here,
because the reset line is connected to multiple devices like MDIO,
SGPIO, FAN, etc. So then all the devices will get reseted when the
network driver will be loaded.
So the fix is to check if the core is initialized already and if that is
the case don't initialize it again.
Fixes: db8bcaad53 ("net: lan966x: add the basic lan966x driver")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230522120038.3749026-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stop restricting the PCI search to a range of PCI domains fed to
pci_get_domain_bus_and_slot(). Instead, use for_each_pci_dev() and
look at all PCI domains in one pass.
On systems with more than 8 sockets, this avoids error messages like
"Information: Invalid level, Can't get TDP control information at
specified levels on cpu 480" from the intel speed select utility.
Fixes: aa2ddd2425 ("platform/x86: ISST: Use numa node id for cpu pci dev mapping")
Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20230519160420.2588475-1-steve.wahl@hpe.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
PAGE_OFFSET is technically a virtual address so when checking the value of
initrd_start against it we should make sure that it has been sanitised from
the values passed by the bootloader. Without this change, even with a bootloader
that passes correct addresses for an initrd, we are failing to load it on MT7621
boards, for example.
Signed-off-by: Liviu Dudau <liviu@dudau.co.uk>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Various fixes for the Au1200/Au1550/Au1300 DBDMA2 code:
- skip cache invalidation if chip has working coherency circuitry.
- invalidate KSEG0-portion of the (physical) data address.
- force the dma channel doorbell write out to bus immediately with
a sync.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The Au1300, at least the one I have to test, uses the NetLogic vendor
ID, but commit 95b8a5e011 ("MIPS: Remove NETLOGIC support") also
dropped Au1300 detection. Restore Au1300 detection.
Tested on DB1300 with Au1380 chip.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Alchemy DB1200/DB1300 boards can use the pata_platform driver.
Unhide the config entry in all of MIPS.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
In fact the device with chip id 0xD283 is called NCT6126D, and that is
the chip id the Nuvoton code was written for. Correct that name to avoid
confusion, because a NCT6116D in fact exists as well but has another
chip id, and is currently not supported.
The look at the spec also revealed that GPIO group7 in fact has 8 pins,
so correct the pin count in that group as well.
Fixes: d0918a84af ("gpio-f7188x: Add GPIO support for Nuvoton NCT6116")
Reported-by: Xing Tong Wu <xingtong.wu@siemens.com>
Signed-off-by: Henning Schild <henning.schild@siemens.com>
Acked-by: Simon Guinot <simon.guinot@sequanux.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
After commit b752ea0c28 ("perf/x86/intel/ds: Flush PEBS DS when changing
PEBS_DATA_CFG"), the cpuc->pebs_data_cfg may save some bits that are not
supported by real hardware, such as PEBS_UPDATE_DS_SW. This would cause
the VMX hardware MSR switching mechanism to save/restore invalid values
for PEBS_DATA_CFG MSR, thus crashing the host when PEBS is used for guest.
Fix it by using the active host value from cpuc->active_pebs_data_cfg.
Fixes: b752ea0c28 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG")
Signed-off-by: Like Xu <likexu@tencent.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230517133808.67885-1-likexu@tencent.com
IOMMU v2 page table supports 4 level (47 bit) or 5 level (56 bit) virtual
address space. Current code assumes it can support 64bit IOVA address
space. If IOVA allocator allocates virtual address > 47/56 bit (depending
on page table level) then it will do wrong mapping and cause invalid
translation.
Hence adjust aperture size to use max address supported by the page table.
Reported-by: Jerry Snitselaar <jsnitsel@redhat.com>
Fixes: aaac38f614 ("iommu/amd: Initial support for AMD IOMMU v2 page table")
Cc: <Stable@vger.kernel.org> # v6.0+
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Link: https://lore.kernel.org/r/20230518054351.9626-1-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
After the cited patch, mlx5_irq xarray index can be different then
mlx5_irq MSIX table index.
Fix it by storing both mlx5_irq xarray index and MSIX table index.
Fixes: 3354822cde ("net/mlx5: Use dynamic msix vectors allocation")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited patch deny the user of changing the affinity of mlx5 irqs,
which break backward compatibility.
Hence, allow the user to change the affinity of mlx5 irqs.
Fixes: bbac70c741 ("net/mlx5: Use newer affinity descriptor")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
From one hand, mlx5 driver is allowing to probe PFs in parallel.
From the other hand, devcom, which is a share resource between PFs, is
registered without any lock. This might resulted in memory problems.
Hence, use the global mlx5_dev_list_lock in order to serialize devcom
registration.
Fixes: fadd59fc50 ("net/mlx5: Introduce inter-device communication mechanism")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In case devcom allocation is failed, mlx5 is always freeing the priv.
However, this priv might have been allocated by a different thread,
and freeing it might lead to use-after-free bugs.
Fix it by freeing the priv only in case it was allocated by the
running thread.
Fixes: fadd59fc50 ("net/mlx5: Introduce inter-device communication mechanism")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
devcom events are sent to all registered component. Following the
cited patch, it is possible for two components, e.g.: two eswitches,
to send devcom events, while both components are registered. This
means eswitch layer will do double un/pairing, which is double
allocation and free of resources, even though only one un/pairing is
needed. flow example:
cpu0 cpu1
---- ----
mlx5_devlink_eswitch_mode_set(dev0)
esw_offloads_devcom_init()
mlx5_devcom_register_component(esw0)
mlx5_devlink_eswitch_mode_set(dev1)
esw_offloads_devcom_init()
mlx5_devcom_register_component(esw1)
mlx5_devcom_send_event()
mlx5_devcom_send_event()
Hence, check whether the eswitches are already un/paired before
free/allocation of resources.
Fixes: 09b278462f ("net: devlink: enable parallel ops on netlink interface")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Check in the mlx5e_ptp_poll_ts_cq context if the ptp tx sq should be woken
up. Before change, the ptp tx sq may never wake up if the ptp tx ts skb
fifo is full when mlx5e_poll_tx_cq checks if the queue should be woken up.
Fixes: 1880bc4e4a ("net/mlx5e: Add TX port timestamp support")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Fix spacing for the error and also the correct error code pointer.
Fixes: c9b9dcb430 ("net/mlx5: Move device memory management to mlx5_core")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
With introduction of post action infrastructure most of the users of encap
attribute had been modified in order to obtain the correct attribute by
calling mlx5e_tc_get_encap_attr() helper instead of assuming encap action
is always on default attribute. However, the cited commit didn't modify
mlx5e_invalidate_encap() which prevents it from destroying correct modify
header action which leads to a warning [0]. Fix the issue by using correct
attribute.
[0]:
Feb 21 09:47:35 c-237-177-40-045 kernel: WARNING: CPU: 17 PID: 654 at drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:684 mlx5e_tc_attach_mod_hdr+0x1cc/0x230 [mlx5_core]
Feb 21 09:47:35 c-237-177-40-045 kernel: RIP: 0010:mlx5e_tc_attach_mod_hdr+0x1cc/0x230 [mlx5_core]
Feb 21 09:47:35 c-237-177-40-045 kernel: Call Trace:
Feb 21 09:47:35 c-237-177-40-045 kernel: <TASK>
Feb 21 09:47:35 c-237-177-40-045 kernel: mlx5e_tc_fib_event_work+0x8e3/0x1f60 [mlx5_core]
Feb 21 09:47:35 c-237-177-40-045 kernel: ? mlx5e_take_all_encap_flows+0xe0/0xe0 [mlx5_core]
Feb 21 09:47:35 c-237-177-40-045 kernel: ? lock_downgrade+0x6d0/0x6d0
Feb 21 09:47:35 c-237-177-40-045 kernel: ? lockdep_hardirqs_on_prepare+0x273/0x3f0
Feb 21 09:47:35 c-237-177-40-045 kernel: ? lockdep_hardirqs_on_prepare+0x273/0x3f0
Feb 21 09:47:35 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310
Feb 21 09:47:35 c-237-177-40-045 kernel: ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0
Feb 21 09:47:35 c-237-177-40-045 kernel: ? pwq_dec_nr_in_flight+0x230/0x230
Feb 21 09:47:35 c-237-177-40-045 kernel: ? rwlock_bug.part.0+0x90/0x90
Feb 21 09:47:35 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0
Feb 21 09:47:35 c-237-177-40-045 kernel: ? __kthread_parkme+0xd9/0x1d0
Fixes: 8300f22526 ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
SW Steering uses RC QP for writing STEs to ICM. This writingis done in LB
(loopback), and FL (force-loopback) QP is preferred for performance. FL is
available when RoCE is enabled or disabled based on RoCE caps.
This patch adds reading of FL capability from HCA caps in addition to the
existing reading from RoCE caps, thus fixing the case where we didn't
have loopback enabled when RoCE was disabled.
Fixes: 7304d603a5 ("net/mlx5: DR, Add support for force-loopback QP")
Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When calculating crc for hash index we use the function crc32 that
calculates for little-endian (LE) arch.
Then we convert it to network endianness using htonl(), but it's wrong
to do the conversion in BE archs since the crc32 value is already LE.
The solution is to switch the bytes from the crc result for all types
of arc.
Fixes: 40416d8ede ("net/mlx5: DR, Replace CRC32 implementation to use kernel lib")
Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In case user switch a device from switchdev mode to legacy mode, mlx5
first unpair the E-switch and afterwards unload the uplink vport.
From the other hand, in case user remove or reload a device, mlx5
first unload the uplink vport and afterwards unpair the E-switch.
The latter is causing a bug[1], hence, handle pairing of E-switch as
part of uplink un/load APIs.
[1]
In case VF_LAG is used, every tc fdb flow is duplicated to the peer
esw. However, the original esw keeps a pointer to this duplicated
flow, not the peer esw.
e.g.: if user create tc fdb flow over esw0, the flow is duplicated
over esw1, in FW/HW, but in SW, esw0 keeps a pointer to the duplicated
flow.
During module unload while a peer tc fdb flow is still offloaded, in
case the first device to be removed is the peer device (esw1 in the
example above), the peer net-dev is destroyed, and so the mlx5e_priv
is memset to 0.
Afterwards, the peer device is trying to unpair himself from the
original device (esw0 in the example above). Unpair API invoke the
original device to clear peer flow from its eswitch (esw0), but the
peer flow, which is stored over the original eswitch (esw0), is
trying to use the peer mlx5e_priv, which is memset to 0 and result in
bellow kernel-oops.
[ 157.964081 ] BUG: unable to handle page fault for address: 000000000002ce60
[ 157.964662 ] #PF: supervisor read access in kernel mode
[ 157.965123 ] #PF: error_code(0x0000) - not-present page
[ 157.965582 ] PGD 0 P4D 0
[ 157.965866 ] Oops: 0000 [#1] SMP
[ 157.967670 ] RIP: 0010:mlx5e_tc_del_fdb_flow+0x48/0x460 [mlx5_core]
[ 157.976164 ] Call Trace:
[ 157.976437 ] <TASK>
[ 157.976690 ] __mlx5e_tc_del_fdb_peer_flow+0xe6/0x100 [mlx5_core]
[ 157.977230 ] mlx5e_tc_clean_fdb_peer_flows+0x67/0x90 [mlx5_core]
[ 157.977767 ] mlx5_esw_offloads_unpair+0x2d/0x1e0 [mlx5_core]
[ 157.984653 ] mlx5_esw_offloads_devcom_event+0xbf/0x130 [mlx5_core]
[ 157.985212 ] mlx5_devcom_send_event+0xa3/0xb0 [mlx5_core]
[ 157.985714 ] esw_offloads_disable+0x5a/0x110 [mlx5_core]
[ 157.986209 ] mlx5_eswitch_disable_locked+0x152/0x170 [mlx5_core]
[ 157.986757 ] mlx5_eswitch_disable+0x51/0x80 [mlx5_core]
[ 157.987248 ] mlx5_unload+0x2a/0xb0 [mlx5_core]
[ 157.987678 ] mlx5_uninit_one+0x5f/0xd0 [mlx5_core]
[ 157.988127 ] remove_one+0x64/0xe0 [mlx5_core]
[ 157.988549 ] pci_device_remove+0x31/0xa0
[ 157.988933 ] device_release_driver_internal+0x18f/0x1f0
[ 157.989402 ] driver_detach+0x3f/0x80
[ 157.989754 ] bus_remove_driver+0x70/0xf0
[ 157.990129 ] pci_unregister_driver+0x34/0x90
[ 157.990537 ] mlx5_cleanup+0xc/0x1c [mlx5_core]
[ 157.990972 ] __x64_sys_delete_module+0x15a/0x250
[ 157.991398 ] ? exit_to_user_mode_prepare+0xea/0x110
[ 157.991840 ] do_syscall_64+0x3d/0x90
[ 157.992198 ] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fixes: 04de7dda73 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
Fixes: 1418ddd96a ("net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
DEVX can issue a general command, which is not used by mlx5 driver.
In case such command is failed, mlx5 is trying to collect the failure
data, However, mlx5 doesn't create a storage for this command, since
mlx5 doesn't use it. This lead to array-index-out-of-bounds error.
Fix it by checking whether the command is known before collecting the
failure data.
Fixes: 34f46ae0d4 ("net/mlx5: Add command failures data to debugfs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
A successful call to cgroup_css_set_fork() will always have taken
a ref on kargs->cset (regardless of CLONE_INTO_CGROUP), so always
do a corresponding put in cgroup_css_set_put_fork().
Without this, a cset and its contained css structures will be
leaked for some fork failures. The following script reproduces
the leak for a fork failure due to exceeding pids.max in the
pids controller. A similar thing can happen if we jump to the
bad_fork_cancel_cgroup label in copy_process().
[ -z "$1" ] && echo "Usage $0 pids-root" && exit 1
PID_ROOT=$1
CGROUP=$PID_ROOT/foo
[ -e $CGROUP ] && rmdir -f $CGROUP
mkdir $CGROUP
echo 5 > $CGROUP/pids.max
echo $$ > $CGROUP/cgroup.procs
fork_bomb()
{
set -e
for i in $(seq 10); do
/bin/sleep 3600 &
done
}
(fork_bomb) &
wait
echo $$ > $PID_ROOT/cgroup.procs
kill $(cat $CGROUP/cgroup.procs)
rmdir $CGROUP
Fixes: ef2c41cf38 ("clone3: allow spawning processes into cgroups")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull x86 fix from Dave Hansen:
"This works around and issue where the INVLPG instruction may miss
invalidating kernel TLB entries in recent hybrid CPUs.
I do expect an eventual microcode fix for this. When the microcode
version numbers are known, we can circle back around and add them the
model table to disable this workaround"
* tag 'x86_urgent_for_6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Avoid incomplete Global INVLPG flushes
Pull module fix from Luis Chamberlain:
"Only one fix has trickled through. Harshit Mogalapalli found a
use-after-free bug through static analysis with smatch"
* tag 'modules-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
module: Fix use-after-free bug in read_file_mod_stats()
When target mode is enabled, the pci_irq_get_affinity() function may return
a NULL value in qla_mapq_init_qp_cpu_map() due to the qla24xx_enable_msix()
code that handles IRQ settings for target mode. This leads to a crash due
to a NULL pointer dereference.
This patch fixes the issue by adding a check for the NULL value returned by
pci_irq_get_affinity() and introducing a 'cpu_mapped' boolean flag to the
qla_qpair structure, ensuring that the qpair's CPU affinity is updated when
it has not been mapped to a CPU.
Fixes: 1d201c81d4 ("scsi: qla2xxx: Select qpair depending on which CPU post_cmd() gets called")
Signed-off-by: Gleb Chesnokov <gleb.chesnokov@scst.dev>
Link: https://lore.kernel.org/r/56b416f2-4e0f-b6cf-d6d5-b7c372e3c6a2@scst.dev
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Smatch warns:
kernel/module/stats.c:394 read_file_mod_stats()
warn: passing freed memory 'buf'
We are passing 'buf' to simple_read_from_buffer() after freeing it.
Fix this by changing the order of 'simple_read_from_buffer' and 'kfree'.
Fixes: df3e764d8e ("module: add debug stats to help identify memory pressure")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
The tpg->np_login_sem is a semaphore that is used to serialize the login
process when multiple login threads run concurrently against the same
target portal group.
The iscsi_target_locate_portal() function finds the tpg, calls
iscsit_access_np() against the np_login_sem semaphore and saves the tpg
pointer in conn->tpg;
If iscsi_target_locate_portal() fails, the caller will check for the
conn->tpg pointer and, if it's not NULL, then it will assume that
iscsi_target_locate_portal() called iscsit_access_np() on the semaphore.
Make sure that conn->tpg gets initialized only if iscsit_access_np() was
successful, otherwise iscsit_deaccess_np() may end up being called against
a semaphore we never took, allowing more than one thread to access the same
tpg.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Link: https://lore.kernel.org/r/20230508162219.1731964-4-mlombard@redhat.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the initiator suddenly stops sending data during a login while keeping
the TCP connection open, the login_work won't be scheduled and will never
release the login semaphore; concurrent login operations will therefore get
stuck and fail.
The bug is due to the inability of the login timeout code to properly
handle this particular case.
Fix the problem by replacing the old per-NP login timer with a new
per-connection timer.
The timer is started when an initiator connects to the target; if it
expires, it sends a SIGINT signal to the thread pointed at by the
conn->login_kworker pointer.
conn->login_kworker is set by calling the iscsit_set_login_timer_kworker()
helper, initially it will point to the np thread; When the login
operation's control is in the process of being passed from the NP-thread to
login_work, the conn->login_worker pointer is set to NULL. Finally,
login_kworker will be changed to point to the worker thread executing the
login_work job.
If conn->login_kworker is NULL when the timer expires, it means that the
login operation hasn't been completed yet but login_work isn't running, in
this case the timer will mark the login process as failed and will schedule
login_work so the latter will be forced to free the resources it holds.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Link: https://lore.kernel.org/r/20230508162219.1731964-2-mlombard@redhat.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull NFS client fixes from Anna Schumaker:
"Stable Fix:
- Don't change task->tk_status after the call to rpc_exit_task
Other Bugfixes:
- Convert kmap_atomic() to kmap_local_folio()
- Fix a potential double free with READ_PLUS"
* tag 'nfs-for-6.4-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFSv4.2: Fix a potential double free with READ_PLUS
SUNRPC: Don't change task->tk_status after the call to rpc_exit_task
NFS: Convert kmap_atomic() to kmap_local_folio()
The commit 4f7e723643 ("cgroup: Fix threadgroup_rwsem <-> cpus_read_lock()
deadlock") fixed the deadlock between cgroup_threadgroup_rwsem and
cpus_read_lock() by introducing cgroup_attach_{lock,unlock}() and removing
cpus_read_{lock,unlock}() from cpuset_attach(). But cgroup_transfer_tasks()
was missed and not handled, which will cause th following warning:
WARNING: CPU: 0 PID: 589 at kernel/cpu.c:526 lockdep_assert_cpus_held+0x32/0x40
CPU: 0 PID: 589 Comm: kworker/1:4 Not tainted 6.4.0-rc2-next-20230517 #50
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Workqueue: events cpuset_hotplug_workfn
RIP: 0010:lockdep_assert_cpus_held+0x32/0x40
<...>
Call Trace:
<TASK>
cpuset_attach+0x40/0x240
cgroup_migrate_execute+0x452/0x5e0
? _raw_spin_unlock_irq+0x28/0x40
cgroup_transfer_tasks+0x1f3/0x360
? find_held_lock+0x32/0x90
? cpuset_hotplug_workfn+0xc81/0xed0
cpuset_hotplug_workfn+0xcb1/0xed0
? process_one_work+0x248/0x5b0
process_one_work+0x2b9/0x5b0
worker_thread+0x56/0x3b0
? process_one_work+0x5b0/0x5b0
kthread+0xf1/0x120
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
So just use the cgroup_attach_{lock,unlock}() helper to fix it.
Reported-by: Zhao Gongyi <zhaogongyi@bytedance.com>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Fixes: 05c7b7a92c ("cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug")
Cc: stable@vger.kernel.org # v5.17+
Signed-off-by: Tejun Heo <tj@kernel.org>
The LRU and LRU_PERCPU maps allocate a new element on update before locking the
target hash table bucket. Right after that the maps try to lock the bucket.
If this fails, then maps return -EBUSY to the caller without releasing the
allocated element. This makes the element untracked: it doesn't belong to
either of free lists, and it doesn't belong to the hash table, so can't be
re-used; this eventually leads to the permanent -ENOMEM on LRU map updates,
which is unexpected. Fix this by returning the element to the local free list
if bucket locking fails.
Fixes: 20b6cc34ea ("bpf: Avoid hashtab deadlock with map_locked")
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Link: https://lore.kernel.org/r/20230522154558.2166815-1-aspsk@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Add comment in arch_sync_dma_for_device() and handle the direction flag in
arch_sync_dma_for_cpu().
When receiving data from the device (DMA_FROM_DEVICE) unconditionally
purge the data cache in arch_sync_dma_for_cpu().
Signed-off-by: Helge Deller <deller@gmx.de>
When running on an AMD vIOMMU, we observed multiple invalidations (of
decreasing power of 2 aligned sizes) when unmapping a single page.
Domain flush takes gather bounds (end-start) as size param. However,
gather->end is defined as the last inclusive address (start + size - 1).
This leads to an off by 1 error.
With this patch, verified that 1 invalidation occurs when unmapping a
single page.
Fixes: a270be1b3f ("iommu/amd: Use only natural aligned flushes in a VM")
Cc: stable@vger.kernel.org # >= 5.15
Signed-off-by: Jon Pan-Doh <pandoh@google.com>
Tested-by: Sudheer Dantuluri <dantuluris@google.com>
Suggested-by: Gary Zibrat <gzibrat@google.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Acked-by: Nadav Amit <namit@vmware.com>
Link: https://lore.kernel.org/r/20230426203256.237116-1-pandoh@google.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Drivers are supposed to list the domain types they support in their
domain_alloc() ops so when we add new domain types, like BLOCKING or SVA,
they don't start breaking.
This ended up providing an empty UNMANAGED domain when the core code asked
for a BLOCKING domain, which happens to be the fallback for drivers that
don't support it, but this is completely wrong for SVA.
Check for the DMA types AMD supports and reject every other kind.
Fixes: 136467962e ("iommu: Add IOMMU SVA domain support")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/0-v1-2ac37b893728+da-amd_check_types_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
GALog exists to propagate interrupts into all vCPUs in the system when
interrupts are marked as non running (e.g. when vCPUs aren't running). A
GALog overflow happens when there's in no space in the log to record the
GATag of the interrupt. So when the GALOverflow condition happens, the
GALog queue is processed and the GALog is restarted, as the IOMMU
manual indicates in section "2.7.4 Guest Virtual APIC Log Restart
Procedure":
| * Wait until MMIO Offset 2020h[GALogRun]=0b so that all request
| entries are completed as circumstances allow. GALogRun must be 0b to
| modify the guest virtual APIC log registers safely.
| * Write MMIO Offset 0018h[GALogEn]=0b.
| * As necessary, change the following values (e.g., to relocate or
| resize the guest virtual APIC event log):
| - the Guest Virtual APIC Log Base Address Register
| [MMIO Offset 00E0h],
| - the Guest Virtual APIC Log Head Pointer Register
| [MMIO Offset 2040h][GALogHead], and
| - the Guest Virtual APIC Log Tail Pointer Register
| [MMIO Offset 2048h][GALogTail].
| * Write MMIO Offset 2020h[GALOverflow] = 1b to clear the bit (W1C).
| * Write MMIO Offset 0018h[GALogEn] = 1b, and either set
| MMIO Offset 0018h[GAIntEn] to enable the GA log interrupt or clear
| the bit to disable it.
Failing to handle the GALog overflow means that none of the VFs (in any
guest) will work with IOMMU AVIC forcing the user to power cycle the
host. When handling the event it resumes the GALog without resizing
much like how it is done in the event handler overflow. The
[MMIO Offset 2020h][GALOverflow] bit might be set in status register
without the [MMIO Offset 2020h][GAInt] bit, so when deciding to poll
for GA events (to clear space in the galog), also check the overflow
bit.
[suravee: Check for GAOverflow without GAInt, toggle CONTROL_GAINT_EN]
Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20230419201154.83880-3-joao.m.martins@oracle.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
On KVM GSI routing table updates, specially those where they have vIOMMUs
with interrupt remapping enabled (to boot >255vcpus setups without relying
on KVM_FEATURE_MSI_EXT_DEST_ID), a VMM may update the backing VF MSIs
with a new VCPU affinity.
On AMD with AVIC enabled, the new vcpu affinity info is updated via:
avic_pi_update_irte()
irq_set_vcpu_affinity()
amd_ir_set_vcpu_affinity()
amd_iommu_{de}activate_guest_mode()
Where the IRTE[GATag] is updated with the new vcpu affinity. The GATag
contains VM ID and VCPU ID, and is used by IOMMU hardware to signal KVM
(via GALog) when interrupt cannot be delivered due to vCPU is in
blocking state.
The issue is that amd_iommu_activate_guest_mode() will essentially
only change IRTE fields on transitions from non-guest-mode to guest-mode
and otherwise returns *with no changes to IRTE* on already configured
guest-mode interrupts. To the guest this means that the VF interrupts
remain affined to the first vCPU they were first configured, and guest
will be unable to issue VF interrupts and receive messages like this
from spurious interrupts (e.g. from waking the wrong vCPU in GALog):
[ 167.759472] __common_interrupt: 3.34 No irq handler for vector
[ 230.680927] mlx5_core 0000:00:02.0: mlx5_cmd_eq_recover:247:(pid
3122): Recovered 1 EQEs on cmd_eq
[ 230.681799] mlx5_core 0000:00:02.0:
wait_func_handle_exec_timeout:1113:(pid 3122): cmd[0]: CREATE_CQ(0x400)
recovered after timeout
[ 230.683266] __common_interrupt: 3.34 No irq handler for vector
Given the fact that amd_ir_set_vcpu_affinity() uses
amd_iommu_activate_guest_mode() underneath it essentially means that VCPU
affinity changes of IRTEs are nops. Fix it by dropping the check for
guest-mode at amd_iommu_activate_guest_mode(). Same thing is applicable to
amd_iommu_deactivate_guest_mode() although, even if the IRTE doesn't change
underlying DestID on the host, the VFIO IRQ handler will still be able to
poke at the right guest-vCPU.
Fixes: b9c6ff94e4 ("iommu/amd: Re-factor guest virtual APIC (de-)activation code")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20230419201154.83880-2-joao.m.martins@oracle.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Smatch complains that
drivers/iommu/rockchip-iommu.c:1306 rk_iommu_probe() warn: missing unwind goto?
The rk_iommu_probe function, after obtaining the irq value through
platform_get_irq, directly returns an error if the returned value
is negative, without releasing any resources.
Fix this by adding a new error handling label "err_pm_disable" and
use a goto statement to redirect to the error handling process. In
order to preserve the original semantics, set err to the value of irq.
Fixes: 1aa55ca9b1 ("iommu/rockchip: Move irq request past pm_runtime_enable")
Signed-off-by: Chao Wang <D202280639@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20230417030421.2777-1-D202280639@hust.edu.cn
Signed-off-by: Joerg Roedel <jroedel@suse.de>
On riscv64, linux-next-20233030 (and for several days earlier),
there is a kconfig warning:
WARNING: unmet direct dependencies detected for IOMMU_IO_PGTABLE_LPAE
Depends on [n]: IOMMU_SUPPORT [=y] && (ARM || ARM64 || COMPILE_TEST [=n]) && !GENERIC_ATOMIC64 [=n]
Selected by [y]:
- IPMMU_VMSA [=y] && IOMMU_SUPPORT [=y] && (ARCH_RENESAS [=y] || COMPILE_TEST [=n]) && !GENERIC_ATOMIC64 [=n]
and build errors:
riscv64-linux-ld: drivers/iommu/io-pgtable-arm.o: in function `.L140':
io-pgtable-arm.c:(.init.text+0x1e8): undefined reference to `alloc_io_pgtable_ops'
riscv64-linux-ld: drivers/iommu/io-pgtable-arm.o: in function `.L168':
io-pgtable-arm.c:(.init.text+0xab0): undefined reference to `free_io_pgtable_ops'
riscv64-linux-ld: drivers/iommu/ipmmu-vmsa.o: in function `.L140':
ipmmu-vmsa.c:(.text+0xbc4): undefined reference to `free_io_pgtable_ops'
riscv64-linux-ld: drivers/iommu/ipmmu-vmsa.o: in function `.L0 ':
ipmmu-vmsa.c:(.text+0x145e): undefined reference to `alloc_io_pgtable_ops'
Add ARM || ARM64 || COMPILE_TEST dependencies to IPMMU_VMSA to prevent
these issues, i.e., so that ARCH_RENESAS on RISC-V is not allowed.
This makes the ARCH dependencies become:
depends on (ARCH_RENESAS && (ARM || ARM64)) || COMPILE_TEST
but that can be a bit hard to read.
Fixes: 8292493c22 ("riscv: Kconfig.socs: Add ARCH_RENESAS kconfig option")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: iommu@lists.linux.dev
Cc: Conor Dooley <conor@kernel.org>
Cc: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20230330165817.21920-1-rdunlap@infradead.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Merge series from Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>:
Series of fixes for issues found during development and testing,
primarly for avs driver.
For a bigjoiner configuration display->crtc_disable() will be called
first for the slave CRTCs and then for the master CRTC. However slave
CRTCs will be actually disabled only after the master CRTC is disabled
(from the encoder disable hooks called with the master CRTC state).
Hence the slave PIPEDMCs can be disabled only after the master CRTC is
disabled, make this so.
intel_encoders_post_pll_disable() must be called only for the master
CRTC, as for the other two encoder disable hooks. While at it fix this
up as well. This didn't cause a problem, since
intel_encoders_post_pll_disable() will call the corresponding hook only
for an encoder/connector connected to the given CRTC, however slave
CRTCs will have no associated encoder/connector.
Fixes: 3af2ff0840 ("drm/i915: Enable a PIPEDMC whenever its corresponding pipe is enabled")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230510103131.1618266-2-imre.deak@intel.com
(cherry picked from commit 7eeef32719)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
It's reported that the recording started right after the driver probe
doesn't work properly, and it turned out that this is related with the
codec auto-suspend. Namely, after the probe phase, the usage count
goes zero, and the auto-suspend is programmed, but the codec is kept
still active until the auto-suspend expiration. When an application
(e.g. alsactl) updates the mixer values at this moment, the values are
cached but not actually written. Then, starting arecord thereafter
also results in the silence because of the missing unmute.
The root cause is the handling of "lazy update" mode; when a mixer
value is updated *after* the suspend, it should update only the cache
and exits. At the resume, the cached value is written to the device,
in turn. The problem is that the current code misinterprets the state
of auto-suspend as if it were already suspended.
Although we can add the check of the actual device state after
pm_runtime_get_if_in_use() for catching the missing state, this won't
suffice; the second call of regmap_update_bits_check() will skip
writing the register because the cache has been already updated by the
first call. So we'd need fixes in two different places.
OTOH, a simpler fix is to replace pm_runtime_get_if_in_use() with
pm_runtime_get_if_active() (with ign_usage_count=true). This change
implies that the driver takes the pm refcount if the device is still
in ACTIVE state and continues the processing. A small caveat is that
this will leave the auto-suspend timer. But, since the timer callback
itself checks the device state and aborts gracefully when it's active,
this won't be any substantial problem.
Long story short: we address the missing register-write problem just
by replacing the pm_runtime_*() call in snd_hda_keep_power_up().
Fixes: fc4f000bf8 ("ALSA: hda - Fix unexpected resume through regmap code path")
Reported-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
Closes: https://lore.kernel.org/r/a7478636-af11-92ab-731c-9b13c582a70d@linux.intel.com
Suggested-by: Cezary Rojewski <cezary.rojewski@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230518113520.15213-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
On 68030/020, an instruction such as, moveml %a2-%a3/%a5,%sp@- may cause
a stack page fault during instruction execution (i.e. not at an
instruction boundary) and produce a format 0xB exception frame.
In this situation, the value of USP will be unreliable. If a signal is
to be delivered following the exception, this USP value is used to
calculate the location for a signal frame. This can result in a
corrupted user stack.
The corruption was detected in dash (actually in glibc) where it showed
up as an intermittent "stack smashing detected" message and crash
following signal delivery for SIGCHLD.
It was hard to reproduce that failure because delivery of the signal
raced with the page fault and because the kernel places an unpredictable
gap of up to 7 bytes between the USP and the signal frame.
A format 0xB exception frame can be produced by a bus error or an
address error. The 68030 Users Manual says that address errors occur
immediately upon detection during instruction prefetch. The instruction
pipeline allows prefetch to overlap with other instructions, which means
an address error can arise during the execution of a different
instruction. So it seems likely that this patch may help in the address
error case also.
Reported-and-tested-by: Stan Johnson <userm57@yahoo.com>
Link: https://lore.kernel.org/all/CAMuHMdW3yD22_ApemzW_6me3adq6A458u1_F0v-1EYwK_62jPA@mail.gmail.com/
Cc: Michael Schmitz <schmitzmic@gmail.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: stable@vger.kernel.org
Co-developed-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Finn Thain <fthain@linux-m68k.org>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/9e66262a754fcba50208aa424188896cc52a1dd1.1683365892.git.fthain@linux-m68k.org
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Currently, if the device is offline and all the channel paths are
either configured or varied offline, the associated subchannel gets
unregistered. Don't unregister the subchannel, instead unregister
offline device.
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
When working in slave mode it seems the timing is exceedingly tight.
The TX FIFO can never empty, because the master is driving the clock so
zeros would be sent for those bytes where the FIFO is empty.
Return to interleaving the writing of the TX FIFO and the reading
of the RX FIFO to try to ensure the data is available when required.
Fixes: a84c11e16d ("spi: spi-cadence: Avoid read of RX FIFO before its ready")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20230518093927.711358-1-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
In mutex_init() lockdep identifies a lock by defining a special static
key for each lock class. However if we wrap the macro in a function,
like in drmm_mutex_init(), we end up generating:
int drmm_mutex_init(struct drm_device *dev, struct mutex *lock)
{
static struct lock_class_key __key;
__mutex_init((lock), "lock", &__key);
....
}
The static __key here is what lockdep uses to identify the lock class,
however since this is just a normal function the key here will be
created once, where all callers then use the same key. In effect the
mutex->depmap.key will be the same pointer for different
drmm_mutex_init() callers. This then results in impossible lockdep
splats since lockdep thinks completely unrelated locks are the same lock
class.
To fix this turn drmm_mutex_init() into a macro such that it generates a
different "static struct lock_class_key __key" for each invocation,
which looks to be inline with what mutex_init() wants.
v2:
- Revamp the commit message with clearer explanation of the issue.
- Rather export __drmm_mutex_release() than static inline.
Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reported-by: Sarah Walker <sarah.walker@imgtec.com>
Fixes: e13f13e039 ("drm: Add DRM-managed mutex_init()")
Cc: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230519090733.489019-1-matthew.auld@intel.com
When doing plpmtu probe, the probe size is growing every time when it
receives the ACK during the Search state until the probe fails. When
the failure occurs, pl.probe_high is set and it goes to the Complete
state.
However, if the link pmtu is huge, like 65535 in loopback_dev, the probe
eventually keeps using SCTP_MAX_PLPMTU as the probe size and never fails.
Because of that, pl.probe_high can not be set, and the plpmtu probe can
never go to the Complete state.
Fix it by setting pl.probe_high to SCTP_MAX_PLPMTU when the probe size
grows to SCTP_MAX_PLPMTU in sctp_transport_pl_recv(). Also, not allow
the probe size greater than SCTP_MAX_PLPMTU in the Complete state.
Fixes: b87641aff9 ("sctp: do state transition when a probe succeeds on HB ACK recv path")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull irqchip fixes from Marc Zyngier:
- MIPS GIC fixes for issues that could result in either
loss of state in the interrupt controller, or a deadlock
- Workaround for Mediatek Chromebooks that only save/restore
partial state when turning the GIC redistributors off,
resulting if fireworks if Linux uses interrupt priorities
for pseudo-NMIs
- Fix the MBIGEN error handling on init
- Mark meson-gpio OF data structures as __maybe_unused,
avoiding compilation warnings on non-OF setups
Link: https://lore.kernel.org/lkml/20230521101812.2520740-1-maz@kernel.org
Both tb_xdomain_enable_paths() and tb_xdomain_disable_paths() expect -1,
not 0, if the corresponding ring is not needed. For this reason change
the driver to use correct value for the rings that are not needed.
Fixes: 180b068942 ("thunderbolt: Allow multiple DMA tunnels over a single XDomain connection")
Cc: stable@vger.kernel.org
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
This change adds methods in the XFRM-I input path that ensures that
policies are checked prior to processing of the subsequent decapsulated
packet, after which the relevant policies may no longer be resolvable
(due to changing src/dst/proto/etc).
Notably, raw ESP/AH packets did not perform policy checks inherently,
whereas all other encapsulated packets (UDP, TCP encapsulated) do policy
checks after calling xfrm_input handling in the respective encapsulation
layer.
Fixes: b0355dbbf1 ("Fix XFRM-I support for nested ESP tunnels")
Test: Verified with additional Android Kernel Unit tests
Test: Verified against Android CTS
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This change allows inbound traffic through nested IPsec tunnels to
successfully match policies and templates, while retaining the secpath
stack trace as necessary for netfilter policies.
Specifically, this patch marks secpath entries that have already matched
against a relevant policy as having been verified, allowing it to be
treated as optional and skipped after a tunnel decapsulation (during
which the src/dst/proto/etc may have changed, and the correct policy
chain no long be resolvable).
This approach is taken as opposed to the iteration in b0355dbbf1,
where the secpath was cleared, since that breaks subsequent validations
that rely on the existence of the secpath entries (netfilter policies, or
transport-in-tunnel mode, where policies remain resolvable).
Fixes: b0355dbbf1 ("Fix XFRM-I support for nested ESP tunnels")
Test: Tested against Android Kernel Unit Tests
Test: Tested against Android CTS
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Commit 1e8fed873e ("powerpc: drop ranges for definition of
ARCH_FORCE_MAX_ORDER") removed the limits on the possible values for
ARCH_FORCE_MAX_ORDER.
However removing the ranges entirely causes some common work flows to
break. For example building a defconfig (which uses 64K pages), changing
the page size to 4K, and rebuilding used to work, because
ARCH_FORCE_MAX_ORDER would be clamped to 12 by the ranges.
With the ranges removed it creates a kernel that builds but crashes at
boot:
kernel BUG at mm/huge_memory.c:470!
Oops: Exception in kernel mode, sig: 5 [#1]
...
NIP hugepage_init+0x9c/0x278
LR do_one_initcall+0x80/0x320
Call Trace:
do_one_initcall+0x80/0x320
kernel_init_freeable+0x304/0x3ac
kernel_init+0x30/0x1a0
ret_from_kernel_user_thread+0x14/0x1c
The reasoning for removing the ranges was that some of the values were
too large. So take that into account and limit the maximums to 10 which
is the default max, except for the 4K case which uses 12.
Fixes: 1e8fed873e ("powerpc: drop ranges for definition of ARCH_FORCE_MAX_ORDER")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230519113806.370635-1-mpe@ellerman.id.au
[ cmllamas: clean forward port from commit 015ac18be7 ("binder: fix
UAF of alloc->vma in race with munmap()") in 5.10 stable. It is needed
in mainline after the revert of commit a43cfc87ca ("android: binder:
stop saving a pointer to the VMA") as pointed out by Liam. The commit
log and tags have been tweaked to reflect this. ]
In commit 720c241924 ("ANDROID: binder: change down_write to
down_read") binder assumed the mmap read lock is sufficient to protect
alloc->vma inside binder_update_page_range(). This used to be accurate
until commit dd2283f260 ("mm: mmap: zap pages with read mmap_sem in
munmap"), which now downgrades the mmap_lock after detaching the vma
from the rbtree in munmap(). Then it proceeds to teardown and free the
vma with only the read lock held.
This means that accesses to alloc->vma in binder_update_page_range() now
will race with vm_area_free() in munmap() and can cause a UAF as shown
in the following KASAN trace:
==================================================================
BUG: KASAN: use-after-free in vm_insert_page+0x7c/0x1f0
Read of size 8 at addr ffff16204ad00600 by task server/558
CPU: 3 PID: 558 Comm: server Not tainted 5.10.150-00001-gdc8dcf942daa #1
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x2a0
show_stack+0x18/0x2c
dump_stack+0xf8/0x164
print_address_description.constprop.0+0x9c/0x538
kasan_report+0x120/0x200
__asan_load8+0xa0/0xc4
vm_insert_page+0x7c/0x1f0
binder_update_page_range+0x278/0x50c
binder_alloc_new_buf+0x3f0/0xba0
binder_transaction+0x64c/0x3040
binder_thread_write+0x924/0x2020
binder_ioctl+0x1610/0x2e5c
__arm64_sys_ioctl+0xd4/0x120
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
Allocated by task 559:
kasan_save_stack+0x38/0x6c
__kasan_kmalloc.constprop.0+0xe4/0xf0
kasan_slab_alloc+0x18/0x2c
kmem_cache_alloc+0x1b0/0x2d0
vm_area_alloc+0x28/0x94
mmap_region+0x378/0x920
do_mmap+0x3f0/0x600
vm_mmap_pgoff+0x150/0x17c
ksys_mmap_pgoff+0x284/0x2dc
__arm64_sys_mmap+0x84/0xa4
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
Freed by task 560:
kasan_save_stack+0x38/0x6c
kasan_set_track+0x28/0x40
kasan_set_free_info+0x24/0x4c
__kasan_slab_free+0x100/0x164
kasan_slab_free+0x14/0x20
kmem_cache_free+0xc4/0x34c
vm_area_free+0x1c/0x2c
remove_vma+0x7c/0x94
__do_munmap+0x358/0x710
__vm_munmap+0xbc/0x130
__arm64_sys_munmap+0x4c/0x64
el0_svc_common.constprop.0+0xac/0x270
do_el0_svc+0x38/0xa0
el0_svc+0x1c/0x2c
el0_sync_handler+0xe8/0x114
el0_sync+0x180/0x1c0
[...]
==================================================================
To prevent the race above, revert back to taking the mmap write lock
inside binder_update_page_range(). One might expect an increase of mmap
lock contention. However, binder already serializes these calls via top
level alloc->mutex. Also, there was no performance impact shown when
running the binder benchmark tests.
Fixes: c0fd210178 ("Revert "android: binder: stop saving a pointer to the VMA"")
Fixes: dd2283f260 ("mm: mmap: zap pages with read mmap_sem in munmap")
Reported-by: Jann Horn <jannh@google.com>
Closes: https://lore.kernel.org/all/20230518144052.xkj6vmddccq4v66b@revolver
Cc: <stable@vger.kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Acked-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20230519195950.1775656-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Sigma-Delta ADCs supported by this driver can use SDO as an interrupt
line to indicate the completion of a conversion. However, some devices
cannot properly detect the completion of a conversion by an interrupt.
This is for the reason mentioned in the following commit.
commit e9849777d0 ("genirq: Add flag to force mask in
disable_irq[_nosync]()")
A read operation is performed by an extra interrupt before the completion
of a conversion. At this time, the value read from the ADC data register
is the same as the previous conversion result. This patch fixes the issue
by setting IRQ_DISABLE_UNLAZY flag.
Fixes: 0c6ef985a1 ("iio: adc: ad7791: fix IRQ flags")
Fixes: 1a913270e5 ("iio: adc: ad7793: Fix IRQ flag")
Fixes: e081102f30 ("iio: adc: ad7780: Fix IRQ flag")
Fixes: 89a86da5cb ("iio: adc: ad7192: Add IRQ flag")
Fixes: 79ef91493f ("iio: adc: ad7124: Set IRQ type to falling")
Signed-off-by: Masahiro Honda <honda@mechatrax.com>
Link: https://lore.kernel.org/r/20230518110816.248-1-honda@mechatrax.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix compiler warnings on btnxpuart
- Fix potential double free on hci_conn_unlink
- Fix UAF on hci_conn_hash_flush
* tag 'for-net-2023-05-19' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: btnxpuart: Fix compiler warnings
Bluetooth: Unlink CISes when LE disconnects in hci_conn_del
Bluetooth: Fix UAF in hci_conn_hash_flush again
Bluetooth: Refcnt drop must be placed last in hci_conn_unlink
Bluetooth: Fix potential double free caused by hci_conn_unlink
====================
Link: https://lore.kernel.org/r/20230519233056.2024340-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the virtual interface's feature is updated, it synchronizes the
updated feature for its own lower interface.
This propagation logic should be worked as the iteration, not recursively.
But it works recursively due to the netdev notification unexpectedly.
This problem occurs when it disables LRO only for the team and bonding
interface type.
team0
|
+------+------+-----+-----+
| | | | |
team1 team2 team3 ... team200
If team0's LRO feature is updated, it generates the NETDEV_FEAT_CHANGE
event to its own lower interfaces(team1 ~ team200).
It is worked by netdev_sync_lower_features().
So, the NETDEV_FEAT_CHANGE notification logic of each lower interface
work iteratively.
But generated NETDEV_FEAT_CHANGE event is also sent to the upper
interface too.
upper interface(team0) generates the NETDEV_FEAT_CHANGE event for its own
lower interfaces again.
lower and upper interfaces receive this event and generate this
event again and again.
So, the stack overflow occurs.
But it is not the infinite loop issue.
Because the netdev_sync_lower_features() updates features before
generating the NETDEV_FEAT_CHANGE event.
Already synchronized lower interfaces skip notification logic.
So, it is just the problem that iteration logic is changed to the
recursive unexpectedly due to the notification mechanism.
Reproducer:
ip link add team0 type team
ethtool -K team0 lro on
for i in {1..200}
do
ip link add team$i master team0 type team
ethtool -K team$i lro on
done
ethtool -K team0 lro off
In order to fix it, the notifier_ctx member of bonding/team is introduced.
Reported-by: syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com
Fixes: fd867d51f8 ("net/core: generic support for disabling netdev features down stack")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230517143010.3596250-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, hci_conn_del calls hci_conn_unlink for BR/EDR, (e)SCO, and
CIS connections, i.e., everything except LE connections. However, if
(e)SCO connections are unlinked when BR/EDR disconnects, CIS connections
should also be unlinked when LE disconnects.
In terms of disconnection behavior, CIS and (e)SCO connections are not
too different. One peculiarity of CIS is that when CIS connections are
disconnected, the CIS handle isn't deleted, as per [BLUETOOTH CORE
SPECIFICATION Version 5.4 | Vol 4, Part E] 7.1.6 Disconnect command:
All SCO, eSCO, and CIS connections on a physical link should be
disconnected before the ACL connection on the same physical
connection is disconnected. If it does not, they will be
implicitly disconnected as part of the ACL disconnection.
...
Note: As specified in Section 7.7.5, on the Central, the handle
for a CIS remains valid even after disconnection and, therefore,
the Host can recreate a disconnected CIS at a later point in
time using the same connection handle.
Since hci_conn_link invokes both hci_conn_get and hci_conn_hold,
hci_conn_unlink should perform both hci_conn_put and hci_conn_drop as
well. However, currently it performs only hci_conn_put.
This patch makes hci_conn_unlink call hci_conn_drop as well, which
simplifies the logic in hci_conn_del a bit and may benefit future users
of hci_conn_unlink. But it is noted that this change additionally
implies that hci_conn_unlink can queue disc_work on conn itself, with
the following call stack:
hci_conn_unlink(conn) [conn->parent == NULL]
-> hci_conn_unlink(child) [child->parent == conn]
-> hci_conn_drop(child->parent)
-> queue_delayed_work(&conn->disc_work)
Queued disc_work after hci_conn_del can be spurious, so during the
process of hci_conn_del, it is necessary to make the call to
cancel_delayed_work(&conn->disc_work) after invoking hci_conn_unlink.
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Co-developed-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Commit 06149746e7 ("Bluetooth: hci_conn: Add support for linking
multiple hcon") reintroduced a previously fixed bug [1] ("KASAN:
slab-use-after-free Read in hci_conn_hash_flush"). This bug was
originally fixed by commit 5dc7d23e16 ("Bluetooth: hci_conn: Fix
possible UAF").
The hci_conn_unlink function was added to avoid invalidating the link
traversal caused by successive hci_conn_del operations releasing extra
connections. However, currently hci_conn_unlink itself also releases
extra connections, resulted in the reintroduced bug.
This patch follows a more robust solution for cleaning up all
connections, by repeatedly removing the first connection until there are
none left. This approach does not rely on the inner workings of
hci_conn_del and ensures proper cleanup of all connections.
Meanwhile, we need to make sure that hci_conn_del never fails. Indeed it
doesn't, as it now always returns zero. To make this a bit clearer, this
patch also changes its return type to void.
Reported-by: syzbot+8bb72f86fc823817bc5d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-bluetooth/000000000000aa920505f60d25ad@google.com/
Fixes: 06149746e7 ("Bluetooth: hci_conn: Add support for linking multiple hcon")
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Co-developed-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
If hci_conn_put(conn->parent) reduces conn->parent's reference count to
zero, it can immediately deallocate conn->parent. At the same time,
conn->link->list has its head in conn->parent, causing use-after-free
problems in the latter list_del_rcu(&conn->link->list).
This problem can be easily solved by reordering the two operations,
i.e., first performing the list removal with list_del_rcu and then
decreasing the refcnt with hci_conn_put.
Reported-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Closes: https://lore.kernel.org/linux-bluetooth/CABBYNZ+1kce8_RJrLNOXd_8=Mdpb=2bx4Nto-hFORk=qiOkoCg@mail.gmail.com/
Fixes: 06149746e7 ("Bluetooth: hci_conn: Add support for linking multiple hcon")
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
kfree()-ing the scratch page isn't enough, we also need to set the pointer
back to NULL to avoid a double-free in the case of a resend.
Fixes: fbd2a05f29 (NFSv4.2: Rework scratch handling for READ_PLUS)
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Some calls to rpc_exit_task() may deliberately change the value of
task->tk_status, for instance because it gets checked by the RPC call's
rpc_release() callback. That makes it wrong to reset the value to
task->tk_rpc_status.
In particular this causes a bug where the rpc_call_done() callback tries
to fail over a set of pNFS/flexfiles writes to a different IP address,
but the reset of task->tk_status causes nfs_commit_release_pages() to
immediately mark the file as having a fatal error.
Fixes: 39494194f9 ("SUNRPC: Fix races with rpc_killall_tasks()")
Cc: stable@vger.kernel.org # 6.1.x
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
kmap_atomic() is deprecated in favor of kmap_local_{folio,page}().
Therefore, replace kmap_atomic() with kmap_local_folio() in
nfs_readdir_folio_array_append().
kmap_atomic() disables page-faults and preemption (the latter only for
!PREEMPT_RT kernels), However, the code within the mapping/un-mapping in
nfs_readdir_folio_array_append() does not depend on the above-mentioned
side effects.
Therefore, a mere replacement of the old API with the new one is all that
is required (i.e., there is no need to explicitly add any calls to
pagefault_disable() and/or preempt_disable()).
Tested with (x)fstests in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel
with HIGHMEM64GB enabled.
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Fixes: ec108d3cc7 ("NFS: Convert readdir page array functions to use a folio")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
A narrow load from a 64-bit context field results in a 64-bit load
followed potentially by a 64-bit right-shift and then a bitwise AND
operation to extract the relevant data.
In the case of a 32-bit access, an immediate mask of 0xffffffff is used
to construct a 64-bit BPP_AND operation which then sign-extends the mask
value and effectively acts as a glorified no-op. For example:
0: 61 10 00 00 00 00 00 00 r0 = *(u32 *)(r1 + 0)
results in the following code generation for a 64-bit field:
ldr x7, [x7] // 64-bit load
mov x10, #0xffffffffffffffff
and x7, x7, x10
Fix the mask generation so that narrow loads always perform a 32-bit AND
operation:
ldr x7, [x7] // 64-bit load
mov w10, #0xffffffff
and w7, w7, w10
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Krzesimir Nowak <krzesimir@kinvolk.io>
Cc: Andrey Ignatov <rdna@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Fixes: 31fd85816d ("bpf: permits narrower load from bpf program context fields")
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230518102528.1341-1-will@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
There are two place if the at_xdmac_interleaved_queue_desc() fails which
could lead to a NULL dereference where "first" is NULL and we call
list_add_tail(&first->desc_node, ...). In the first caller, the return
is not checked so add a check for that. In the next caller, the return
is checked but if it fails on the first iteration through the loop then
it will lead to a NULL pointer dereference.
Fixes: 4e5385784e ("dmaengine: at_xdmac: handle numf > 1")
Fixes: 62b5cb757f ("dmaengine: at_xdmac: fix memory leak in interleaved mode")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Link: https://lore.kernel.org/r/21282b66-9860-410a-83df-39c17fcf2f1b@kili.mountain
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Since host stage-2 mappings are created lazily, we cannot rely solely on
the pte in order to recover the target physical address when checking a
host-initiated memory transition as this permits donation of unmapped
regions corresponding to MMIO or "no-map" memory.
Instead of inspecting the pte, move the addr_is_allowed_memory() check
into the host callback function where it is passed the physical address
directly from the walker.
Cc: Quentin Perret <qperret@google.com>
Fixes: e82edcc75c ("KVM: arm64: Implement do_share() helper for sharing memory")
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230518095844.1178-1-will@kernel.org
vgic_its_create() changes the vgic state without holding the
config_lock, which triggers a lockdep warning in vgic_v4_init():
[ 358.667941] WARNING: CPU: 3 PID: 178 at arch/arm64/kvm/vgic/vgic-v4.c:245 vgic_v4_init+0x15c/0x7a8
...
[ 358.707410] vgic_v4_init+0x15c/0x7a8
[ 358.708550] vgic_its_create+0x37c/0x4a4
[ 358.709640] kvm_vm_ioctl+0x1518/0x2d80
[ 358.710688] __arm64_sys_ioctl+0x7ac/0x1ba8
[ 358.711960] invoke_syscall.constprop.0+0x70/0x1e0
[ 358.713245] do_el0_svc+0xe4/0x2d4
[ 358.714289] el0_svc+0x44/0x8c
[ 358.715329] el0t_64_sync_handler+0xf4/0x120
[ 358.716615] el0t_64_sync+0x190/0x194
Wrap the whole of vgic_its_create() with config_lock since, in addition
to calling vgic_v4_init(), it also modifies the global kvm->arch.vgic
state.
Fixes: f003277311 ("KVM: arm64: Use config_lock to protect vgic state")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230518100914.2837292-3-jean-philippe@linaro.org
Lockdep reports a circular lock dependency between the srcu and the
config_lock:
[ 262.179917] -> #1 (&kvm->srcu){.+.+}-{0:0}:
[ 262.182010] __synchronize_srcu+0xb0/0x224
[ 262.183422] synchronize_srcu_expedited+0x24/0x34
[ 262.184554] kvm_io_bus_register_dev+0x324/0x50c
[ 262.185650] vgic_register_redist_iodev+0x254/0x398
[ 262.186740] vgic_v3_set_redist_base+0x3b0/0x724
[ 262.188087] kvm_vgic_addr+0x364/0x600
[ 262.189189] vgic_set_common_attr+0x90/0x544
[ 262.190278] vgic_v3_set_attr+0x74/0x9c
[ 262.191432] kvm_device_ioctl+0x2a0/0x4e4
[ 262.192515] __arm64_sys_ioctl+0x7ac/0x1ba8
[ 262.193612] invoke_syscall.constprop.0+0x70/0x1e0
[ 262.195006] do_el0_svc+0xe4/0x2d4
[ 262.195929] el0_svc+0x44/0x8c
[ 262.196917] el0t_64_sync_handler+0xf4/0x120
[ 262.198238] el0t_64_sync+0x190/0x194
[ 262.199224]
[ 262.199224] -> #0 (&kvm->arch.config_lock){+.+.}-{3:3}:
[ 262.201094] __lock_acquire+0x2b70/0x626c
[ 262.202245] lock_acquire+0x454/0x778
[ 262.203132] __mutex_lock+0x190/0x8b4
[ 262.204023] mutex_lock_nested+0x24/0x30
[ 262.205100] vgic_mmio_write_v3_misc+0x5c/0x2a0
[ 262.206178] dispatch_mmio_write+0xd8/0x258
[ 262.207498] __kvm_io_bus_write+0x1e0/0x350
[ 262.208582] kvm_io_bus_write+0xe0/0x1cc
[ 262.209653] io_mem_abort+0x2ac/0x6d8
[ 262.210569] kvm_handle_guest_abort+0x9b8/0x1f88
[ 262.211937] handle_exit+0xc4/0x39c
[ 262.212971] kvm_arch_vcpu_ioctl_run+0x90c/0x1c04
[ 262.214154] kvm_vcpu_ioctl+0x450/0x12f8
[ 262.215233] __arm64_sys_ioctl+0x7ac/0x1ba8
[ 262.216402] invoke_syscall.constprop.0+0x70/0x1e0
[ 262.217774] do_el0_svc+0xe4/0x2d4
[ 262.218758] el0_svc+0x44/0x8c
[ 262.219941] el0t_64_sync_handler+0xf4/0x120
[ 262.221110] el0t_64_sync+0x190/0x194
Note that the current report, which can be triggered by the vgic_irq
kselftest, is a triple chain that includes slots_lock, but after
inverting the slots_lock/config_lock dependency, the actual problem
reported above remains.
In several places, the vgic code calls kvm_io_bus_register_dev(), which
synchronizes the srcu, while holding config_lock (#1). And the MMIO
handler takes the config_lock while holding the srcu read lock (#0).
Break dependency #1, by registering the distributor and redistributors
without holding config_lock. The ITS also uses kvm_io_bus_register_dev()
but already relies on slots_lock to serialize calls.
The distributor iodev is created on the first KVM_RUN call. Multiple
threads will race for vgic initialization, and only the first one will
see !vgic_ready() under the lock. To serialize those threads, rely on
slots_lock rather than config_lock.
Redistributors are created earlier, through KVM_DEV_ARM_VGIC_GRP_ADDR
ioctls and vCPU creation. Similarly, serialize the iodev creation with
slots_lock, and the rest with config_lock.
Fixes: f003277311 ("KVM: arm64: Use config_lock to protect vgic state")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230518100914.2837292-2-jean-philippe@linaro.org
Two dma_wmb() are added in the XDP TX path to ensure proper ordering of
descriptor and buffer updates:
1. A dma_wmb() is added after updating the last BD to make sure
the updates to rest of the descriptor are visible before
transferring ownership to FEC.
2. A dma_wmb() is also added after updating the bdp to ensure these
updates are visible before updating txq->bd.cur.
3. Start the xmit of the frame immediately right after configuring the
tx descriptor.
Fixes: 6d6b39f180 ("net: fec: add initial XDP support")
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I would like to be copied on new patches submitted on this driver.
I am relatively familiar with the code, having practically maintained
it for a while.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
HW adds segment size to the payload length
in the IPv6 header. Fix payload length to
just TCP header length instead of 'TCP header
size + IPv6 header size'.
Fixes: 86d7476078 ("octeontx2-pf: TCP segmentation offload support")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid early devlink info return if errors arise with MCDI commands
executed for getting the required info from the device. The rationale
is some commands can fail but later ones could still give useful data.
Moreover, some nvram partitions could not be present which needs to be
handled as a non error.
The specific errors are reported through system messages and if any
error appears, it will be reported generically through extack.
Fixes 14743ddd24 ("sfc: add devlink info support for ef100")
Signed-off-by: Alejandro Lucero <alejandro.lucero-palau@amd.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We found a crash when using SMCRv2 with 2 Mellanox ConnectX-4. It
can be reproduced by:
- smc_run nginx
- smc_run wrk -t 32 -c 500 -d 30 http://<ip>:<port>
BUG: kernel NULL pointer dereference, address: 0000000000000014
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 8000000108713067 P4D 8000000108713067 PUD 151127067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 4 PID: 2441 Comm: kworker/4:249 Kdump: loaded Tainted: G W E 6.4.0-rc1+ #42
Workqueue: smc_hs_wq smc_listen_work [smc]
RIP: 0010:smc_clc_send_confirm_accept+0x284/0x580 [smc]
RSP: 0018:ffffb8294b2d7c78 EFLAGS: 00010a06
RAX: ffff8f1873238880 RBX: ffffb8294b2d7dc8 RCX: 0000000000000000
RDX: 00000000000000b4 RSI: 0000000000000001 RDI: 0000000000b40c00
RBP: ffffb8294b2d7db8 R08: ffff8f1815c5860c R09: 0000000000000000
R10: 0000000000000400 R11: 0000000000000000 R12: ffff8f1846f56180
R13: ffff8f1815c5860c R14: 0000000000000001 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff8f1aefd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000014 CR3: 00000001027a0001 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? mlx5_ib_map_mr_sg+0xa1/0xd0 [mlx5_ib]
? smcr_buf_map_link+0x24b/0x290 [smc]
? __smc_buf_create+0x4ee/0x9b0 [smc]
smc_clc_send_accept+0x4c/0xb0 [smc]
smc_listen_work+0x346/0x650 [smc]
? __schedule+0x279/0x820
process_one_work+0x1e5/0x3f0
worker_thread+0x4d/0x2f0
? __pfx_worker_thread+0x10/0x10
kthread+0xe5/0x120
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2c/0x50
</TASK>
During the CLC handshake, server sequentially tries available SMCRv2
and SMCRv1 devices in smc_listen_work().
If an SMCRv2 device is found. SMCv2 based link group and link will be
assigned to the connection. Then assumed that some buffer assignment
errors happen later in the CLC handshake, such as RMB registration
failure, server will give up SMCRv2 and try SMCRv1 device instead. But
the resources assigned to the connection won't be reset.
When server tries SMCRv1 device, the connection creation process will
be executed again. Since conn->lnk has been assigned when trying SMCRv2,
it will not be set to the correct SMCRv1 link in
smcr_lgr_conn_assign_link(). So in such situation, conn->lgr points to
correct SMCRv1 link group but conn->lnk points to the SMCRv2 link
mistakenly.
Then in smc_clc_send_confirm_accept(), conn->rmb_desc->mr[link->link_idx]
will be accessed. Since the link->link_idx is not correct, the related
MR may not have been initialized, so crash happens.
| Try SMCRv2 device first
| |-> conn->lgr: assign existed SMCRv2 link group;
| |-> conn->link: assign existed SMCRv2 link (link_idx may be 1 in SMC_LGR_SYMMETRIC);
| |-> sndbuf & RMB creation fails, quit;
|
| Try SMCRv1 device then
| |-> conn->lgr: create SMCRv1 link group and assign;
| |-> conn->link: keep SMCRv2 link mistakenly;
| |-> sndbuf & RMB creation succeed, only RMB->mr[link_idx = 0]
| initialized.
|
| Then smc_clc_send_confirm_accept() accesses
| conn->rmb_desc->mr[conn->link->link_idx, which is 1], then crash.
v
This patch tries to fix this by cleaning conn->lnk before assigning
link. In addition, it is better to reset the connection and clean the
resources assigned if trying SMCRv2 failed in buffer creation or
registration.
Fixes: e49300a6bf ("net/smc: add listen processing for SMC-Rv2")
Link: https://lore.kernel.org/r/20220523055056.2078994-1-liuyacan@corp.netease.com/
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the end of the test, there will be an error message induced by the
`ip netns del ns1` command in cleanup()
Tests passed: 201
Tests failed: 0
Cannot remove namespace file "/run/netns/ns1": No such file or directory
This can even be reproduced with just `./fib_tests.sh -h` as we're
calling cleanup() on exit.
Redirect the error message to /dev/null to mute it.
V2: Update commit message and fixes tag.
V3: resubmit due to missing netdev ML in V2
Fixes: b60417a9f2 ("selftest: fib_tests: Always cleanup before exit")
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NAPI gets called with budget of 0 from netpoll, which has interrupts
disabled. We should try to free some space on Tx rings and nothing
else.
Specifically do not try to handle XDP TX or try to refill Rx buffers -
we can't use the page pool from IRQ context. Don't check if IRQs moved,
either, that makes no sense in netpoll. Netpoll calls _all_ the rings
from whatever CPU it happens to be invoked on.
In general do as little as possible, the work quickly adds up when
there's tens of rings to poll.
The immediate stack trace I was seeing is:
__do_softirq+0xd1/0x2c0
__local_bh_enable_ip+0xc7/0x120
</IRQ>
<TASK>
page_pool_put_defragged_page+0x267/0x320
mlx5e_free_xdpsq_desc+0x99/0xd0
mlx5e_poll_xdpsq_cq+0x138/0x3b0
mlx5e_napi_poll+0xc3/0x8b0
netpoll_poll_dev+0xce/0x150
AFAIU page pool takes a BH lock, releases it and since BH is now
enabled tries to run softirqs.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Fixes: 60bbf7eeef ("mlx5: use page_pool for xdp_return_frame call")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
tls: rx: strp: fix inline crypto offload
The local strparser version I added to TLS does not preserve
decryption status, which breaks inline crypto (NIC offload).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When receive buffer is small, or the TCP rx queue looks too
complicated to bother using it directly - we allocate a new
skb and copy data into it.
We already use sk->sk_allocation... but nothing actually
sets it to GFP_ATOMIC on the ->sk_data_ready() path.
Users of HW offload are far more likely to experience problems
due to scheduling while atomic. "Copy mode" is very rarely
triggered with SW crypto.
Fixes: 84c61fe1a7 ("tls: rx: do not use the standard strparser")
Tested-by: Shai Amiram <samiram@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When receive buffer is small we try to copy out the data from
TCP into a skb maintained by TLS to prevent connection from
stalling. Unfortunately if a single record is made up of a mix
of decrypted and non-decrypted skbs combining them into a single
skb leads to loss of decryption status, resulting in decryption
errors or data corruption.
Similarly when trying to use TCP receive queue directly we need
to make sure that all the skbs within the record have the same
status. If we don't the mixed status will be detected correctly
but we'll CoW the anchor, again collapsing it into a single paged
skb without decrypted status preserved. So the "fixup" code will
not know which parts of skb to re-encrypt.
Fixes: 84c61fe1a7 ("tls: rx: do not use the standard strparser")
Tested-by: Shai Amiram <samiram@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We'll need to copy input skbs individually in the next patch.
Factor that code out (without assuming we're copying a full record).
Tested-by: Shai Amiram <samiram@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We call tls_rx_msg_size(skb) before doing skb->len += chunk.
So the tls_rx_msg_size() code will see old skb->len, most
likely leading to an over-read.
Worst case we will over read an entire record, next iteration
will try to trim the skb but may end up turning frag len negative
or discarding the subsequent record (since we already told TCP
we've read it during previous read but now we'll trim it out of
the skb).
Fixes: 84c61fe1a7 ("tls: rx: do not use the standard strparser")
Tested-by: Shai Amiram <samiram@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a record is partially decrypted we'll have to CoW it, anyway,
so go into copy mode and allocate a writable skb right away.
This will make subsequent fix simpler because we won't have to
teach tls_strp_msg_make_copy() how to copy skbs while preserving
decrypt status.
Tested-by: Shai Amiram <samiram@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
alloc_skb_with_frags() fills in page frag sizes but does not
set skb->len and skb->data_len. Set those correctly otherwise
device offload will most likely generate an empty skb and
hit the BUG() at the end of __skb_nsg().
Fixes: 84c61fe1a7 ("tls: rx: do not use the standard strparser")
Tested-by: Shai Amiram <samiram@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb->len covers the entire skb, including the frag_list.
In fact we're guaranteed that rxm->full_len <= skb->len,
so since the change under Fixes we were not checking decrypt
status of any skb but the first.
Note that the skb_pagelen() added here may feel a bit costly,
but it's removed by subsequent fixes, anyway.
Reported-by: Tariq Toukan <tariqt@nvidia.com>
Fixes: 86b259f6f8 ("tls: rx: device: bound the frag walk")
Tested-by: Shai Amiram <samiram@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently in cdc_ncm_check_tx_max(), if dwNtbOutMaxSize is lower than
the calculated "min" value, but greater than zero, the logic sets
tx_max to dwNtbOutMaxSize. This is then used to allocate a new SKB in
cdc_ncm_fill_tx_frame() where all the data is handled.
For small values of dwNtbOutMaxSize the memory allocated during
alloc_skb(dwNtbOutMaxSize, GFP_ATOMIC) will have the same size, due to
how size is aligned at alloc time:
size = SKB_DATA_ALIGN(size);
size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
Thus we hit the same bug that we tried to squash with
commit 2be6d4d16a ("net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero")
Low values of dwNtbOutMaxSize do not cause an issue presently because at
alloc_skb() time more memory (512b) is allocated than required for the
SKB headers alone (320b), leaving some space (512b - 320b = 192b)
for CDC data (172b).
However, if more elements (for example 3 x u64 = [24b]) were added to
one of the SKB header structs, say 'struct skb_shared_info',
increasing its original size (320b [320b aligned]) to something larger
(344b [384b aligned]), then suddenly the CDC data (172b) no longer
fits in the spare SKB data area (512b - 384b = 128b).
Consequently the SKB bounds checking semantics fails and panics:
skbuff: skb_over_panic: text:ffffffff831f755b len:184 put:172 head:ffff88811f1c6c00 data:ffff88811f1c6c00 tail:0xb8 end:0x80 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:113!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 57 Comm: kworker/0:2 Not tainted 5.15.106-syzkaller-00249-g19c0ed55a470 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/14/2023
Workqueue: mld mld_ifc_work
RIP: 0010:skb_panic net/core/skbuff.c:113 [inline]
RIP: 0010:skb_over_panic+0x14c/0x150 net/core/skbuff.c:118
[snip]
Call Trace:
<TASK>
skb_put+0x151/0x210 net/core/skbuff.c:2047
skb_put_zero include/linux/skbuff.h:2422 [inline]
cdc_ncm_ndp16 drivers/net/usb/cdc_ncm.c:1131 [inline]
cdc_ncm_fill_tx_frame+0x11ab/0x3da0 drivers/net/usb/cdc_ncm.c:1308
cdc_ncm_tx_fixup+0xa3/0x100
Deal with too low values of dwNtbOutMaxSize, clamp it in the range
[USB_CDC_NCM_NTB_MIN_OUT_SIZE, CDC_NCM_NTB_MAX_SIZE_TX]. We ensure
enough data space is allocated to handle CDC data by making sure
dwNtbOutMaxSize is not smaller than USB_CDC_NCM_NTB_MIN_OUT_SIZE.
Fixes: 289507d336 ("net: cdc_ncm: use sysfs for rx/tx aggregation tuning")
Cc: stable@vger.kernel.org
Reported-by: syzbot+9f575a1f15fc0c01ed69@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=b982f1059506db48409d
Link: https://lore.kernel.org/all/20211202143437.1411410-1-lee.jones@linaro.org/
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230517133808.1873695-2-tudor.ambarus@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move cxl_await_media_ready() to cxl_pci probe before driver starts issuing
IDENTIFY and retrieving memory device information to ensure that the
device is ready to provide the information. Allow cxl_pci_probe() to succeed
even if media is not ready. Cache the media failure in cxlds and don't ask
the device for any media information.
The rationale for proceeding in the !media_ready case is to allow for
mailbox operations to interrogate and/or remediate the device. After
media is repaired then rebinding the cxl_pci driver is expected to
restart the capacity scan.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Fixes: b39cb1052a ("cxl/mem: Register CXL memX devices")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168445310026.3251520.8124296540679268206.stgit@djiang5-mobl3
[djbw: fixup cxl_test]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The Memory_Info_Valid bit (CXL 3.0 8.1.3.8.2) indicates that the CXL
Range Size High and Size Low registers are valid. The bit must be set
within 1 second of reset deassertion to the device. Check valid bit
before we check the Memory_Active bit when waiting for
cxl_await_media_ready() to ensure that the memory info is valid for
consumption. Also ensures both DVSEC ranges 1 and 2 are ready if DVSEC
Capability indicates they are both supported.
Fixes: 523e594d9c ("cxl/pci: Implement wait for media active")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168444687469.3134781.11033518965387297327.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Derick noticed, when testing hot plug, that hot-add behaves nominally
after a removal. However, if the hot-add is done without a prior
removal, CXL.mem accesses fail. It turns out that the original
implementation of the port driver and region programming wrongly assumed
that platform-firmware always enables the host-bridge HDM decoder
capability. Add support turning on switch-level HDM decoders in the case
where platform-firmware has not.
The implementation is careful to only arrange for the enable to be
undone if the current instance of the driver was the one that did the
enable. This is to interoperate with platform-firmware that may expect
CXL.mem to remain active after the driver is shutdown. This comes at the
cost of potentially not shutting down the enable on kexec flows, but it
is mitigated by the fact that the related HDM decoders still need to be
enabled on an individual basis.
Cc: <stable@vger.kernel.org>
Reported-by: Derick Marks <derick.w.marks@intel.com>
Fixes: 54cdbf845c ("cxl/port: Add a driver for 'struct cxl_port' objects")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/168437998331.403037.15719879757678389217.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In the BE hw_params configuration, the existing code checks if any of the
existing FEs are prepared, running, paused or suspended - and skips the
configuration in those cases. This allows multiple calls of hw_params
which the ALSA state machine supports.
This check is not handled for the prepare stage, which can lead to the
same BE being prepared multiple times. This patch adds a check similar to
that of the hw_params, with the main difference being that the suspended
state is allowed: the ALSA state machine allows a transition from
suspended to prepared with hw_params skipped.
This problem was detected on Intel IPC4/SoundWire devices, where the BE
dailink .prepare stage is used to configure the SoundWire stream with a
bank switch. Multiple .prepare calls lead to conflicts with the .trigger
operation with IPC4 configurations. This problem was not detected earlier
on Intel devices, HDaudio BE dailinks detect that the link is already
prepared and skip the configuration, and for IPC3 devices there is no BE
trigger.
Link: https://github.com/thesofproject/sof/issues/7596
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com
Link: https://lore.kernel.org/r/20230517185731.487124-1-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org
Linux 6.4-rc2
* tag 'v6.4-rc2': (162 commits)
Linux 6.4-rc2
parisc: Fix encoding of swp_entry due to added SWP_EXCLUSIVE flag
ext4: bail out of ext4_xattr_ibody_get() fails for any reason
ext4: add bounds checking in get_max_inline_xattr_value_size()
ext4: add indication of ro vs r/w mounts in the mount message
ext4: fix deadlock when converting an inline directory in nojournal mode
ext4: improve error recovery code paths in __ext4_remount()
ext4: improve error handling from ext4_dirhash()
ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled
ext4: check iomap type only if ext4_iomap_begin() does not fail
ext4: avoid a potential slab-out-of-bounds in ext4_group_desc_csum
ext4: fix data races when using cached status extents
ext4: avoid deadlock in fs reclaim with page writeback
ext4: fix invalid free tracking in ext4_xattr_move_to_block()
ext4: remove a BUG_ON in ext4_mb_release_group_pa()
ext4: allow ext4_get_group_info() to fail
cxl: Add missing return to cdat read error path
tools/testing/cxl: Use DEFINE_STATIC_SRCU()
x86/retbleed: Fix return thunk alignment
Documentation/block: drop the request.rst file
...
Currently, when regmap_raw_write() splits the data, it uses the
max_raw_write value defined for the bus. For any bus that includes
the target register address in the max_raw_write value, the chunked
transmission will always exceed the maximum transmission length.
To avoid this problem, subtract the length of the register and the
padding from the maximum transmission.
Signed-off-by: Jim Wylder <jwylder@google.com
Link: https://lore.kernel.org/r/20230517152444.3690870-2-jwylder@google.com
Signed-off-by: Mark Brown <broonie@kernel.org
Long message loopback slice is used for achieving traffic balance between
QPs. It prevents the problem that QPs with large traffic occupying the
hardware pipeline for a long time and QPs with small traffic cannot be
scheduled.
Currently, its maximum value is set to 16K, which means only after a QP
sends 16K will the second QP be scheduled. This value is too large, which
will lead to unbalanced traffic scheduling, and thus it needs to be
modified.
The setting range of the long message loopback slice is modified to be
from 1024 (the lower limit supported by hardware) to mtu. Actual testing
shows that this value can significantly reduce error in hardware traffic
scheduling.
This solution is compatible with both HIP08 and HIP09. The modified
lp_pktn_ini has a maximum value of 2 (when mtu is 256), so the range
checking code for lp_pktn_ini is no longer necessary and needs to be
deleted.
Fixes: 0e60778efb ("RDMA/hns: Modify the value of MAX_LP_MSG_LEN to meet hardware compatibility")
Link: https://lore.kernel.org/r/20230512092245.344442-4-huangjunxian6@hisilicon.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
For hns, the specification of an entry like resource (E.g. WQE/CQE/EQE)
depends on BT page size, buf page size and hopnum. For user mode, the buf
page size depends on UMEM. Therefore, the actual specification is
controlled by BT page size and hopnum.
The current BT page size and hopnum are obtained from firmware. This makes
the driver inflexible and introduces unnecessary constraints. Resource
allocation failures occur in many scenarios.
This patch will calculate whether the BT page size set by firmware is
sufficient before allocating BT, and increase the BT page size if it is
insufficient.
Fixes: 1133401412 ("RDMA/hns: Optimize base address table config flow for qp buffer")
Link: https://lore.kernel.org/r/20230512092245.344442-3-huangjunxian6@hisilicon.com
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
On HIP08, the queried timeout attr is different from the timeout attr
configured by the user.
It is found by rdma-core testcase test_rdmacm_async_traffic:
======================================================================
FAIL: test_rdmacm_async_traffic (tests.test_rdmacm.CMTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "./tests/test_rdmacm.py", line 33, in test_rdmacm_async_traffic
self.two_nodes_rdmacm_traffic(CMAsyncConnection, self.rdmacm_traffic,
File "./tests/base.py", line 382, in two_nodes_rdmacm_traffic
raise(res)
AssertionError
Fixes: 926a01dc00 ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/20230512092245.344442-2-huangjunxian6@hisilicon.com
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The INVLPG instruction is used to invalidate TLB entries for a
specified virtual address. When PCIDs are enabled, INVLPG is supposed
to invalidate TLB entries for the specified address for both the
current PCID *and* Global entries. (Note: Only kernel mappings set
Global=1.)
Unfortunately, some INVLPG implementations can leave Global
translations unflushed when PCIDs are enabled.
As a workaround, never enable PCIDs on affected processors.
I expect there to eventually be microcode mitigations to replace this
software workaround. However, the exact version numbers where that
will happen are not known today. Once the version numbers are set in
stone, the processor list can be tweaked to only disable PCIDs on
affected processors with affected microcode.
Note: if anyone wants a quick fix that doesn't require patching, just
stick 'nopcid' on your kernel command-line.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Shifting signed 32-bit value by 31 bits is undefined, so changing
significant bit to unsigned. It was spotted by UBSAN.
So let's just fix this by using the BIT() helper for all SB_* flags.
Fixes: e462ec50cb ("VFS: Differentiate mount flags (MS_*) from internal superblock flags")
Signed-off-by: Hao Ge <gehao@kylinos.cn>
Message-Id: <20230424051835.374204-1-gehao@kylinos.cn>
[brauner@kernel.org: use BIT() for all SB_* flags]
Signed-off-by: Christian Brauner <brauner@kernel.org>
Our CI system caught a lockdep splat:
======================================================
WARNING: possible circular locking dependency detected
6.3.0-rc7+ #1167 Not tainted
------------------------------------------------------
kswapd0/46 is trying to acquire lock:
ffff8c6543abd650 (sb_internal#2){++++}-{0:0}, at: btrfs_commit_inode_delayed_inode+0x5f/0x120
but task is already holding lock:
ffffffffabe61b40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x4aa/0x7a0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (fs_reclaim){+.+.}-{0:0}:
fs_reclaim_acquire+0xa5/0xe0
kmem_cache_alloc+0x31/0x2c0
alloc_extent_state+0x1d/0xd0
__clear_extent_bit+0x2e0/0x4f0
try_release_extent_mapping+0x216/0x280
btrfs_release_folio+0x2e/0x90
invalidate_inode_pages2_range+0x397/0x470
btrfs_cleanup_dirty_bgs+0x9e/0x210
btrfs_cleanup_one_transaction+0x22/0x760
btrfs_commit_transaction+0x3b7/0x13a0
create_subvol+0x59b/0x970
btrfs_mksubvol+0x435/0x4f0
__btrfs_ioctl_snap_create+0x11e/0x1b0
btrfs_ioctl_snap_create_v2+0xbf/0x140
btrfs_ioctl+0xa45/0x28f0
__x64_sys_ioctl+0x88/0xc0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
-> #0 (sb_internal#2){++++}-{0:0}:
__lock_acquire+0x1435/0x21a0
lock_acquire+0xc2/0x2b0
start_transaction+0x401/0x730
btrfs_commit_inode_delayed_inode+0x5f/0x120
btrfs_evict_inode+0x292/0x3d0
evict+0xcc/0x1d0
inode_lru_isolate+0x14d/0x1e0
__list_lru_walk_one+0xbe/0x1c0
list_lru_walk_one+0x58/0x80
prune_icache_sb+0x39/0x60
super_cache_scan+0x161/0x1f0
do_shrink_slab+0x163/0x340
shrink_slab+0x1d3/0x290
shrink_node+0x300/0x720
balance_pgdat+0x35c/0x7a0
kswapd+0x205/0x410
kthread+0xf0/0x120
ret_from_fork+0x29/0x50
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(fs_reclaim);
lock(sb_internal#2);
lock(fs_reclaim);
lock(sb_internal#2);
*** DEADLOCK ***
3 locks held by kswapd0/46:
#0: ffffffffabe61b40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x4aa/0x7a0
#1: ffffffffabe50270 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x113/0x290
#2: ffff8c6543abd0e0 (&type->s_umount_key#44){++++}-{3:3}, at: super_cache_scan+0x38/0x1f0
stack backtrace:
CPU: 0 PID: 46 Comm: kswapd0 Not tainted 6.3.0-rc7+ #1167
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x58/0x90
check_noncircular+0xd6/0x100
? save_trace+0x3f/0x310
? add_lock_to_list+0x97/0x120
__lock_acquire+0x1435/0x21a0
lock_acquire+0xc2/0x2b0
? btrfs_commit_inode_delayed_inode+0x5f/0x120
start_transaction+0x401/0x730
? btrfs_commit_inode_delayed_inode+0x5f/0x120
btrfs_commit_inode_delayed_inode+0x5f/0x120
btrfs_evict_inode+0x292/0x3d0
? lock_release+0x134/0x270
? __pfx_wake_bit_function+0x10/0x10
evict+0xcc/0x1d0
inode_lru_isolate+0x14d/0x1e0
__list_lru_walk_one+0xbe/0x1c0
? __pfx_inode_lru_isolate+0x10/0x10
? __pfx_inode_lru_isolate+0x10/0x10
list_lru_walk_one+0x58/0x80
prune_icache_sb+0x39/0x60
super_cache_scan+0x161/0x1f0
do_shrink_slab+0x163/0x340
shrink_slab+0x1d3/0x290
shrink_node+0x300/0x720
balance_pgdat+0x35c/0x7a0
kswapd+0x205/0x410
? __pfx_autoremove_wake_function+0x10/0x10
? __pfx_kswapd+0x10/0x10
kthread+0xf0/0x120
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x50
</TASK>
This happens because when we abort the transaction in the transaction
commit path we call invalidate_inode_pages2_range on our block group
cache inodes (if we have space cache v1) and any delalloc inodes we may
have. The plain invalidate_inode_pages2_range() call passes through
GFP_KERNEL, which makes sense in most cases, but not here. Wrap these
two invalidate callees with memalloc_nofs_save/memalloc_nofs_restore to
make sure we don't end up with the fs reclaim dependency under the
transaction dependency.
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since f8a53bb58e ("btrfs: handle checksum generation in the storage
layer") the failures of btrfs_csum_one_bio() are handled via
bio_end_io().
This means, we can return BLK_STS_RESOURCE from btrfs_csum_one_bio() in
case the allocation of the ordered sums fails.
This also fixes a syzkaller report, where injecting a failure into the
kvzalloc() call results in a BUG_ON().
Reported-by: syzbot+d8941552e21eac774778@syzkaller.appspotmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently we allow a block group not to be marked read-only for scrub.
But for RAID56 block groups if we require the block group to be
read-only, then we're allowed to use cached content from scrub stripe to
reduce unnecessary RAID56 reads.
So this patch would:
- Make btrfs_inc_block_group_ro() try harder
During my tests, for cases like btrfs/061 and btrfs/064, we can hit
ENOSPC from btrfs_inc_block_group_ro() calls during scrub.
The reason is if we only have one single data chunk, and trying to
scrub it, we won't have any space left for any newer data writes.
But this check should be done by the caller, especially for scrub
cases we only temporarily mark the chunk read-only.
And newer data writes would always try to allocate a new data chunk
when needed.
- Return error for scrub if we failed to mark a RAID56 chunk read-only
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If static allocation and dynamic allocation GPIOs are present,
dynamic allocation pollutes the numberspace for static allocation,
causing static allocation to fail.
Enforce dynamic allocation above GPIO_DYNAMIC_BASE.
Seen on a GTA04 when omap-gpio (static) and twl-gpio (dynamic)
raced:
[some successful registrations of omap_gpio instances]
[ 2.553833] twl4030_gpio twl4030-gpio: gpio (irq 145) chaining IRQs 161..178
[ 2.561401] gpiochip_find_base: found new base at 160
[ 2.564392] gpio gpiochip5: (twl4030): added GPIO chardev (254:5)
[ 2.564544] gpio gpiochip5: registered GPIOs 160 to 177 on twl4030
[...]
[ 2.692169] omap-gpmc 6e000000.gpmc: GPMC revision 5.0
[ 2.697357] gpmc_mem_init: disabling cs 0 mapped at 0x0-0x1000000
[ 2.703643] gpiochip_find_base: found new base at 178
[ 2.704376] gpio gpiochip6: (omap-gpmc): added GPIO chardev (254:6)
[ 2.704589] gpio gpiochip6: registered GPIOs 178 to 181 on omap-gpmc
[...]
[ 2.840393] gpio gpiochip7: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 2.849365] gpio gpiochip7: (gpio-160-191): GPIO integer space overlap, cannot add chip
[ 2.857513] gpiochip_add_data_with_key: GPIOs 160..191 (gpio-160-191) failed to register, -16
[ 2.866149] omap_gpio 48310000.gpio: error -EBUSY: Could not register gpio chip
On that device it is fixed invasively by
commit 92bf78b33b ("gpio: omap: use dynamic allocation of base")
but let's also fix that for devices where there is still
a mixture of static and dynamic allocation.
Fixes: 7b61212f2a ("gpiolib: Get rid of ARCH_NR_GPIOS")
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
Reviewed-by: <christophe.leroy@csgroup.eu>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
This driver's debugfs files have had a read operation since commit
2a9e27408e ("gpio: mockup: rework debugfs interface"), but were
still being created with write-only mode bits. Update them to
indicate that the files can also be read.
Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
Fixes: 2a9e27408e ("gpio: mockup: rework debugfs interface")
Cc: stable@kernel.org # v5.1+
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
We get a warning when PM is not set:
../drivers/dma/ti/k3-udma.c:5552:12: warning: 'udma_pm_resume' defined but not used [-Wunused-function]
5552 | static int udma_pm_resume(struct device *dev)
| ^~~~~~~~~~~~~~
../drivers/dma/ti/k3-udma.c:5530:12: warning: 'udma_pm_suspend' defined but not used [-Wunused-function]
5530 | static int udma_pm_suspend(struct device *dev)
| ^~~~~~~~~~~~~~~
Fix this by annotating pm function with __maybe_unused
Fixes: fbe05149e4 ("dmaengine: ti: k3-udma: Add system suspend/resume support")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/r/20230516174311.117264-1-vkoul@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Smatch warns:
drivers/dma/idxd/cdev.c:327:
idxd_cdev_open() warn: 'sva' was already freed.
When idxd_wq_set_pasid() fails, the current code unbinds sva and then
goes to 'failed_set_pasid' where iommu_sva_unbind_device is called
again causing the above warning.
[ device_user_pasid_enabled(idxd) is still true when calling
failed_set_pasid ]
Fix this by removing additional unbind when idxd_wq_set_pasid() fails
Fixes: b022f59725 ("dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230509060716.2830630-1-harshit.m.mogalapalli@oracle.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The rt5682 driver switches its regmap to cache-only when the
device suspends and back to regular mode on resume. When the
jack detect interrupt fires rt5682_irq() schedules the jack
detect work. This can result in invalid reads from the regmap
in cache-only mode if the work runs before the device has
resumed:
[ 56.245502] rt5682 9-001a: ASoC: error at soc_component_read_no_lock on rt5682.9-001a for register: [0x000000f0] -16
Disable the jack detection interrupt during suspend and
re-enable it on resume. The driver already schedules the
jack detection work on resume, so any state change during
suspend is still handled.
This is essentially the same as commit f7d00a9be1 ("SoC:
rt5682s: Disable jack detection interrupt during suspend")
for the rt5682s.
Cc: stable@kernel.org
Signed-off-by: Matthias Kaehlcke <mka@chromium.org
Reviewed-by: Douglas Anderson <dianders@chromium.org
Reviewed-by: Stephen Boyd <swboyd@chromium.org
Link: https://lore.kernel.org/r/20230516164629.1.Ibf79e94b3442eecc0054d2b478779cc512d967fc@changeid
Signed-off-by: Mark Brown <broonie@kernel.org
When we run syzkaller we get below Out of Bounds error.
"KASAN: slab-out-of-bounds Read in regcache_flat_read"
Below is the backtrace of the issue:
BUG: KASAN: slab-out-of-bounds in regcache_flat_read+0x10c/0x110
Read of size 4 at addr ffffff8088fbf714 by task syz-executor.4/14144
CPU: 6 PID: 14144 Comm: syz-executor.4 Tainted: G W
Hardware name: Qualcomm Technologies, Inc. sc7280 CRD platform (rev5+) (DT)
Call trace:
dump_backtrace+0x0/0x4ec
show_stack+0x34/0x50
dump_stack_lvl+0xdc/0x11c
print_address_description+0x30/0x2d8
kasan_report+0x178/0x1e4
__asan_report_load4_noabort+0x44/0x50
regcache_flat_read+0x10c/0x110
regcache_read+0xf8/0x5a0
_regmap_read+0x45c/0x86c
_regmap_update_bits+0x128/0x290
regmap_update_bits_base+0xc0/0x15c
snd_soc_component_update_bits+0xa8/0x22c
snd_soc_component_write_field+0x68/0xd4
tx_macro_put_dec_enum+0x1d0/0x268
snd_ctl_elem_write+0x288/0x474
By Error checking and checking valid values issue gets rectifies.
Signed-off-by: Ravulapati Vishnu Vardhan Rao <quic_visr@quicinc.com
Link: https://lore.kernel.org/r/20230511112532.16106-1-quic_visr@quicinc.com
Signed-off-by: Mark Brown <broonie@kernel.org
Device uses 4KB size blocks for user pages indirect list while the
driver creates those blocks with the size of PAGE_SIZE of the kernel. On
kernels with PAGE_SIZE different than 4KB (ARM RHEL), this leads to a
failure on register MR with indirect list because of the miss
communication between driver and device.
Fixes: 40909f664d ("RDMA/efa: Add EFA verbs implementation")
Link: https://lore.kernel.org/r/20230511115103.13876-1-ynachum@amazon.com
Reviewed-by: Firas Jahjah <firasj@amazon.com>
Reviewed-by: Michael Margolin <mrgolin@amazon.com>
Signed-off-by: Yonatan Nachum <ynachum@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The logic used for power_supply_is_system_supplied() counts all power
supplies and assumes that the system is running from AC if there is
either a non-battery power-supply reporting to be online or if no
power-supplies exist at all.
The second rule is for desktop systems, that don't have any
battery/charger devices. These systems will incorrectly report to be
powered from battery once a device scope power-supply is registered
(e.g. a HID device), since these power-supplies increase the counter.
Apart from HID devices, recent dGPUs provide UCSI power supplies on a
desktop systems. The dGPU by default doesn't have anything plugged in so
it's 'offline'. This makes power_supply_is_system_supplied() return 0
with a count of 1 meaning all drivers that use this get a wrong judgement.
To fix this case adjust the logic to also examine the scope of the power
supply. If the power supply is deemed a device power supply, then don't
count it.
Cc: Evan Quan <Evan.Quan@amd.com>
Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
gcc on aarch64 reports
drivers/phy/mediatek/phy-mtk-hdmi-mt8195.c: In function ‘mtk_hdmi_pll_set_rate’:
drivers/phy/mediatek/phy-mtk-hdmi-mt8195.c:240:52: error: ‘-mgeneral-regs-only’
is incompatible with the use of floating-point types
240 | else if (tmds_clk >= 54 * MEGA && tmds_clk < 148.35 * MEGA)
Floating point should not be used, so rework the floating point comparisons
to fixed point.
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Link: https://lore.kernel.org/r/20230502145005.2927101-1-trix@redhat.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The commit e335bb51cc ("x86/unwind: Ensure stack pointer is aligned")
tried to align the stack pointer in show_trace_log_lvl(), otherwise the
"stack < stack_info.end" check can't guarantee that the last read does
not go past the end of the stack.
However, we have the same problem with the initial value of the stack
pointer, it can also be unaligned. So without this patch this trivial
kernel module
#include <linux/module.h>
static int init(void)
{
asm volatile("sub $0x4,%rsp");
dump_stack();
asm volatile("add $0x4,%rsp");
return -EAGAIN;
}
module_init(init);
MODULE_LICENSE("GPL");
crashes the kernel.
Fixes: e335bb51cc ("x86/unwind: Ensure stack pointer is aligned")
Signed-off-by: Vernon Lovejoy <vlovejoy@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20230512104232.GA10227@redhat.com
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
When tooling reads ELF notes, it assumes each note entry is aligned to
the value listed in the .note section header's sh_addralign field.
The kernel-created ELF notes in the .note.Linux and .note.Xen sections
are aligned to 4 bytes. This causes the toolchain to set those
sections' sh_addralign values to 4.
On the other hand, the GCC-created .note.gnu.property section has an
sh_addralign value of 8 for some reason, despite being based on struct
Elf32_Nhdr which only needs 4-byte alignment.
When the mismatched input sections get linked together into the vmlinux
.notes output section, the higher alignment "wins", resulting in an
sh_addralign of 8, which confuses tooling. For example:
$ readelf -n .tmp_vmlinux.btf
...
readelf: .tmp_vmlinux.btf: Warning: note with invalid namesz and/or descsz found at offset 0x170
readelf: .tmp_vmlinux.btf: Warning: type: 0x4, namesize: 0x006e6558, descsize: 0x00008801, alignment: 8
In this case readelf thinks there's alignment padding where there is
none, so it starts reading an ELF note in the middle.
With newer toolchains (e.g., latest Fedora Rawhide), a similar mismatch
triggers a build failure when combined with CONFIG_X86_KERNEL_IBT:
btf_encoder__encode: btf__dedup failed!
Failed to encode BTF
libbpf: failed to find '.BTF' ELF section in vmlinux
FAILED: load BTF from vmlinux: No data available
make[1]: *** [scripts/Makefile.vmlinux:35: vmlinux] Error 255
This latter error was caused by pahole crashing when it encountered the
corrupt .notes section. This crash has been fixed in dwarves version
1.25. As Tianyi Liu describes:
"Pahole reads .notes to look for LINUX_ELFNOTE_BUILD_LTO. When LTO is
enabled, pahole needs to call cus__merge_and_process_cu to merge
compile units, at which point there should only be one unspecified
type (used to represent some compilation information) in the global
context.
However, when the kernel is compiled without LTO, if pahole calls
cus__merge_and_process_cu due to alignment issues with notes,
multiple unspecified types may appear after merging the cus, and
older versions of pahole only support up to one. This is why pahole
1.24 crashes, while newer versions support multiple. However, the
latest version of pahole still does not solve the problem of
incorrect LTO recognition, so compiling the kernel may be slower
than normal."
Even with the newer pahole, the note section misaligment issue still
exists and pahole is misinterpreting the LTO note. Fix it by discarding
the .note.gnu.property section. While GNU properties are important for
user space (and VDSO), they don't seem to have any use for vmlinux.
(In fact, they're already getting (inadvertently) stripped from vmlinux
when CONFIG_DEBUG_INFO_BTF is enabled. The BTF data is extracted from
vmlinux.o with "objcopy --only-section=.BTF" into .btf.vmlinux.bin.o.
That file doesn't have .note.gnu.property, so when it gets modified and
linked back into the main object, the linker automatically strips it
(see "How GNU properties are merged" in the ld man page).)
Reported-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lkml.kernel.org/bpf/57830c30-cd77-40cf-9cd1-3bb608aa602e@app.fastmail.com
Debugged-by: Tianyi Liu <i.pear@outlook.com>
Suggested-by: Joan Bruguera Micó <joanbrugueram@gmail.com>
Link: https://lore.kernel.org/r/20230418214925.ay3jpf2zhw75kgmd@treble
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Resolve USB 3.0 gadget failure for QM and QXPB0 in super speed mode with
single IN and OUT endpoints, like mass storage devices, due to incorrect
ACTUAL_MEM_SIZE in ep_cap2 (32k instead of actual 18k). Implement dt
property cdns,on-chip-buff-size to override ep_cap2 and set it to 18k for
imx8QM and imx8QXP chips. No adverse effects for 8QXP C0.
Cc: stable@vger.kernel.org
Fixes: dce49449e0 ("usb: cdns3: allocate TX FIFO size according to composite EP number")
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
In cdns3-gadget.c, 'cdns,on-chip-buff-size' was read using
device_property_read_u16(). It resulted in 0 if a 32bit value was used
in dts. This commit fixes the dt binding doc to declare it as u16.
Cc: stable@vger.kernel.org
Fixes: 68989fe1c3 ("dt-bindings: usb: Convert cdns-usb3.txt to YAML schema")
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Dan Carpenter reported that commit fea087fc29 "irqchip/mbigen: move
to use bus_get_dev_root()" leads to the following Smatch static checker
warning:
drivers/irqchip/irq-mbigen.c:258 mbigen_of_create_domain()
error: potentially dereferencing uninitialized 'child'.
It should not cause a problem on real hardware, but better to fix the
warning, let's move the bus_get_dev_root() out of the loop, and unify
the error handling to silence it.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230505090654.12793-1-wangkefeng.wang@huawei.com
Since we may hold gic_lock in hardirq context, use raw spinlock
makes more sense given that it is for low-level interrupt handling
routine and the critical section is small.
Fixes BUG:
[ 0.426106] =============================
[ 0.426257] [ BUG: Invalid wait context ]
[ 0.426422] 6.3.0-rc7-next-20230421-dirty #54 Not tainted
[ 0.426638] -----------------------------
[ 0.426766] swapper/0/1 is trying to lock:
[ 0.426954] ffffffff8104e7b8 (gic_lock){....}-{3:3}, at: gic_set_type+0x30/08
Fixes: 95150ae8b3 ("irqchip: mips-gic: Implement irq_set_type callback")
Cc: stable@vger.kernel.org
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Tested-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230424103156.66753-3-jiaxun.yang@flygoat.com
When a GIC local interrupt is not routable, it's vl_map will be used
to control some internal states for core (providing IPTI, IPPCI, IPFDC
input signal for core). Overriding it will interfere core's intetrupt
controller.
Do not touch vl_map if a local interrupt is not routable, we are not
going to remap it.
Before dd098a0e03 (" irqchip/mips-gic: Get rid of the reliance on
irq_cpu_online()"), if a local interrupt is not routable, then it won't
be requested from GIC Local domain, and thus gic_all_vpes_irq_cpu_online
won't be called for that particular interrupt.
Fixes: dd098a0e03 (" irqchip/mips-gic: Get rid of the reliance on irq_cpu_online()")
Cc: stable@vger.kernel.org
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Tested-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230424103156.66753-2-jiaxun.yang@flygoat.com
Some Chromebooks with Mediatek SoCs have a problem where the firmware
doesn't properly save/restore certain GICR registers. Newer
Chromebooks should fix this issue and we may be able to do firmware
updates for old Chromebooks. At the moment, the only known issue with
these Chromebooks is that we can't enable "pseudo NMIs" since the
priority register can be lost. Enabling "pseudo NMIs" on Chromebooks
with the problematic firmware causes crashes and freezes.
Let's detect devices with this problem and then disable "pseudo NMIs"
on them. We'll detect the problem by looking for the presence of the
"mediatek,broken-save-restore-fw" property in the GIC device tree
node. Any devices with fixed firmware will not have this property.
Our detection plan works because we never bake a Chromebook's device
tree into firmware. Instead, device trees are always bundled with the
kernel. We'll update the device trees of all affected Chromebooks and
then we'll never enable "pseudo NMI" on a kernel that is bundled with
old device trees. When a firmware update is shipped that fixes this
issue it will know to patch the device tree to remove the property.
In order to make this work, the quick detection mechanism of the GICv3
code is extended to be able to look for properties in addition to
looking at "compatible".
Reviewed-by: Julius Werner <jwerner@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230515131353.v2.2.I88dc0a0eb1d9d537de61604cd8994ecc55c0cac1@changeid
When trying to turn on the "pseudo NMI" kernel feature in Linux, it
was discovered that all Mediatek-based Chromebooks that ever shipped
(at least ones with GICv3) had a firmware bug where they wouldn't save
certain GIC "GICR" registers properly. If a processor ever entered a
suspend/idle mode where the GICR registers lost state then they'd be
reset to their default state.
As a result of the bug, if you try to enable "pseudo NMIs" on the
affected devices then certain interrupts will unexpectedly get
promoted to be "pseudo NMIs" and cause crashes / freezes / general
mayhem.
ChromeOS is looking to start turning on "pseudo NMIs" in production to
make crash reports more actionable. To do so, we will release firmware
updates for at least some of the affected Mediatek Chromebooks.
However, even when we update the firmware of a Chromebook it's always
possible that a user will end up booting with old firmware. We need to
be able to detect when we're running with firmware that will crash and
burn if pseudo NMIs are enabled.
The current plan is:
* Update the device trees of all affected Chromebooks to include the
'mediatek,broken-save-restore-fw' property. The kernel can use this
to know not to enable certain features like "pseudo NMI". NOTE:
device trees for Chromebooks are never baked into the firmware but
are bundled with the kernel. A kernel will never be configured to
use "pseudo NMIs" and be bundled with an old device tree.
* When we get a fixed firmware for one of these Chromebooks, it will
patch the device tree to remove this property.
For some details, you can also see the public bug
<https://issuetracker.google.com/281831288>
Reviewed-by: Julius Werner <jwerner@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230515131353.v2.1.Iabe67a827e206496efec6beb5616d5a3b99c1e65@changeid
The last argument to the function drm_fbdev_dma_setup() was
changed from desired BPP to desired depth.
In our case the desired depth was 15 but BPP was 16, so we
specified 16 as BPP and we relied on the FB emulation core to
select a format with a suitable depth for the limited bandwidth
and end up with e.g. XRGB1555 like in the past:
[drm] Initialized pl111 1.0.0 20170317 for c1000000.display on minor 0
drm-clcd-pl111 c1000000.display: [drm] requested bpp 16, scaled depth down to 15
drm-clcd-pl111 c1000000.display: enable IM-PD1 CLCD connectors
Console: switching to colour frame buffer device 80x30
drm-clcd-pl111 c1000000.display: [drm] fb0: pl111drmfb frame buffer device
However the current code will fail at that:
[drm] Initialized pl111 1.0.0 20170317 for c1000000.display on minor 0
drm-clcd-pl111 c1000000.display: [drm] bpp/depth value of 16/16 not supported
drm-clcd-pl111 c1000000.display: [drm] No compatible format found
drm-clcd-pl111 c1000000.display: [drm] *ERROR* fbdev: Failed to setup generic emulation (ret=-12)
Fix this by passing the desired depth of 15 for the IM/PD-1 display
instead of 16 to drm_fbdev_dma_setup().
The desired depth is however in turn used for bandwidth limiting
calculations and that was done with a simple / integer division,
whereas we now have to modify that to use DIV_ROUND_UP() so that
we get DIV_ROUND_UP(15, 2) = 2 not 15/2 = 1.
After this the display works again on the Integrator/AP IM/PD-1.
Cc: Emma Anholt <emma@anholt.net>
Cc: stable@vger.kernel.org
Suggested-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 37c90d589d ("drm/fb-helper: Fix single-probe color-format selection")
Link: https://lore.kernel.org/dri-devel/20230102112927.26565-1-tzimmermann@suse.de/
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230515092943.1401558-1-linus.walleij@linaro.org
CHARGE_INHIBITED bit position of the ChargerStatus register is actually
0 not 1. This patch corrects it.
Fixes: feb583e37f ("power: supply: add sbs-charger driver")
Signed-off-by: Daisuke Nojiri <dnojiri@chromium.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
After suspend/resume cycle there is an error message and auto-mode
or CnQF stops working.
[ 5741.447511] amd-pmf AMDI0100:00: SMU cmd failed. err: 0xff
[ 5741.447523] amd-pmf AMDI0100:00: AMD_PMF_REGISTER_RESPONSE:ff
[ 5741.447527] amd-pmf AMDI0100:00: AMD_PMF_REGISTER_ARGUMENT:7
[ 5741.447531] amd-pmf AMDI0100:00: AMD_PMF_REGISTER_MESSAGE:16
[ 5741.447540] amd-pmf AMDI0100:00: [AUTO_MODE] avg power: 0 mW mode: QUIET
This is because the DRAM address used for accessing metrics table
needs to be refreshed after a suspend resume cycle. Add a resume
callback to reset this again.
Fixes: 1a409b35c9 ("platform/x86/amd/pmf: Get performance metrics from PMFW")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20230513011408.958-1-mario.limonciello@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
On ASUS GU604V the key 0x7B is issued when the charger is connected or
disconnected, and key 0xC0 is issued when an external display is
connected or disconnected.
This commit maps them to KE_IGNORE to slience kernel messages about
unknown keys, such as:
kernel: asus_wmi: Unknown key code 0x7b
Signed-off-by: Alexandru Sorodoc <ealex95@gmail.com>
Link: https://lore.kernel.org/r/20230512101517.47416-1-ealex95@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
After TEE has completed processing of TEE_CMD_ID_LOAD_TA, set proper
value in 'return_origin' argument passed by open_session() call. To do
so, add 'return_origin' field to the structure tee_cmd_load_ta. The
Trusted OS shall update return_origin as part of TEE processing.
This change to 'struct tee_cmd_load_ta' interface requires a similar update
in AMD-TEE Trusted OS's TEE_CMD_ID_LOAD_TA interface.
This patch has been verified on Phoenix Birman setup. On older APUs,
return_origin value will be 0.
Cc: stable@vger.kernel.org
Fixes: 757cc3e9ff ("tee: add AMD-TEE driver")
Tested-by: Sourabh Das <sourabh.das@amd.com>
Signed-off-by: Rijo Thomas <Rijo-john.Thomas@amd.com>
Acked-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
i.MX8, i.MX8X, i.MX8XP and i.MX8XL SOC device trees are all based on
imx8-ss-*.dtsi files. For i.MX8X and i.MX8XP these device trees
should be updated with some peripherals removed or updated, similar
to i.MX8XL (imx8dxl-ss-*.dtsi files). However, it looks like only
i.MX8 and i.MX8XL are up to date, but for i.MX8X and i.MX8XP some
of the peripherals got inherited from imx8-ss-*.dtsi files, but in
reality they are not present on SOC.
As a result, during resource partition ownership check U-Boot receives
messages from SCU firmware about these resources not owned by boot
partition. In reality, these resources are not owned by anyone, as
they simply does not exist, but are defined in Linux device tree.
This change removes those peripherals, which are listed during
U-Boot resource partition ownership check as warnings:
## Flattened Device Tree blob at 9d400000
Booting using the fdt blob at 0x9d400000
Loading Device Tree to 00000000fd652000, end 00000000fd67efff ... OK
Disable clock-controller@59580000 rsrc 512 not owned
Disable clock-controller@5ac90000 rsrc 102 not owned
Starting kernel ...
Fixes: ba5a5615d5 ("arm64: dts: freescale: add initial support for colibri imx8x")
Signed-off-by: Andrejs Cainikovs <andrejs.cainikovs@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Remove GPIO3_IO10 from Iris carrier board pinctrl configuration,
this is already defined in the SOM dtsi since this is a
standard SOM functionality (wake-up button).
Duplicating it leads to the following error message
imx8qxp-pinctrl scu:pinctrl: pin IMX8QXP_QSPI0A_DATA1 already requested
Fixes: aefb5e2d97 ("arm64: dts: colibri-imx8x: Add iris carrier board")
Signed-off-by: Emanuele Ghidoli <emanuele.ghidoli@toradex.com>
Signed-off-by: Andrejs Cainikovs <andrejs.cainikovs@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Each carrier board device tree except the eval board one already override
iomuxc pinctrl property to configure unused pins as gpio.
So move also the pinctrl property to eval board device tree.
Leave the pin group definition in imx8x-colibri.dtsi to avoid duplication
and simplify configuration of gpio.
Signed-off-by: Emanuele Ghidoli <emanuele.ghidoli@toradex.com>
Signed-off-by: Andrejs Cainikovs <andrejs.cainikovs@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Fix pinctrl groups to have SODIMM 75 only in one group.
Remove configuration of the pin at SoM level because it is normally
used as CSI_MCLK at camera interface connector.
Without this fix it is not possible, without redefining iomuxc pinctrl,
to use CSI_MCLK signal and leads to the following error messages:
imx8qxp-pinctrl scu:pinctrl: pin IMX8QXP_CSI_MCLK already requested
imx8qxp-pinctrl scu:pinctrl: pin-147 (16-003c) status -22
Fixes: 4d2adf7381 ("arm64: dts: colibri-imx8x: Split pinctrl_hog1")
Signed-off-by: Emanuele Ghidoli <emanuele.ghidoli@toradex.com>
Signed-off-by: Andrejs Cainikovs <andrejs.cainikovs@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
There are a few clocks whose parents are set in mipi_dsi
and lcdif nodes, but these clocks are used by the media_blk_ctrl
power domain. This may cause an issue when re-parenting, because
the media_blk_ctrl may start the clocks before the reparent is
done resulting in a disp_pixel clock having the wrong parent and
rate.
Fix this by moving the assigned-clock-parents and rates to the
media_blk_ctrl node to configure these clocks before they are enabled.
After this patch, both disp1_pix_root and dixp2_pix_root clock
become children of the video_pll1.
video_pll1_ref_sel 24000000
video_pll1 1039500000
video_pll1_bypass 1039500000
video_pll1_out 1039500000
media_disp2_pix 1039500000
media_disp2_pix_root_clk 1039500000
media_disp1_pix 1039500000
media_disp1_pix_root_clk 1039500000
Fixes: eda09fe149 ("arm64: dts: imx8mp: Add display pipeline components")
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Commit b1b90514ea ("spi: spi-cadence: Add support for Slave mode")
updated the code to trigger the IRQ when the FIFO was half empty,
overlapping filling more data into the FIFO and sending what is left.
This appears to cause regressions on the Zynq 7000, for transactions
longer than the FIFO size, below that no overlapping occurs.
It would appear from my testing that any attempt to put new data into
the FIFO whilst data is still transmitting causes data corruption
on both send and receive. If I am reading the commit message right
on commit 49530e6411 ("spi: cadence: Add usleep_range() for
cdns_spi_fill_tx_fifo()"), that would also seem to imply this is the
case.
On the assumption that this isn't an issue on the platform
the original slave mode support was added for, update the
cdns_transfer_one to only set the watermark to 50% of the FIFO size
when in slave mode. There by retaining the new behaviour for slave
mode but reverting to the older behaviour when the SPI is used a
master.
Fixes: b1b90514ea ("spi: spi-cadence: Add support for Slave mode")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com
Link: https://lore.kernel.org/r/20230509164153.3907694-2-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org
Recent changes to cdns_spi_irq introduced some issues.
Firstly, when writing the end of a longer transaction, the code in
cdns_spi_irq will write data into the TX FIFO, then immediately
fall into the if (!xspi->tx_bytes) path and attempt to read data
from the RX FIFO. However this required waiting for the TX FIFO to
empty before the RX data was ready.
Secondly, the variable trans_cnt is now rather inaccurately named
since in cases, where the watermark is set to 1, trans_cnt will be
1 but the count of bytes transferred would be much longer.
Finally, when setting up the transaction we set the watermark to 50%
of the FIFO if the transaction is great than 50% of the FIFO. However,
there is no need to split a tranaction that is smaller than the
whole FIFO, so anything up to the FIFO size can be done in a single
transaction.
Tidy up the code a little, to avoid repeatedly calling
cdns_spi_read_rx_fifo with a count of 1, and correct the three issues
noted above.
Fixes: b1b90514ea ("spi: spi-cadence: Add support for Slave mode")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com
Link: https://lore.kernel.org/r/20230509164153.3907694-1-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org
When building sign-file, the call to get the CFLAGS for libcrypto is
missing white-space between `pkg-config` and `--cflags`:
$(shell $(HOSTPKG_CONFIG)--cflags libcrypto 2> /dev/null)
Removing the redirection of stderr, we see:
$ make -C tools/testing/selftests/bpf sign-file
make: Entering directory '[...]/tools/testing/selftests/bpf'
make: pkg-config--cflags: No such file or directory
SIGN-FILE sign-file
make: Leaving directory '[...]/tools/testing/selftests/bpf'
Add the missing space.
Fixes: fc97590668 ("selftests/bpf: Add test for bpf_verify_pkcs7_signature() kfunc")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Link: https://lore.kernel.org/bpf/20230426215032.415792-1-jeremy@azazel.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This worked before by coincidence, as the regulator was probed and enabled
before PCI RC probe. But probe order changed since commit 259b93b21a
("regulator: Set PROBE_PREFER_ASYNCHRONOUS for drivers that existed in
4.14") and PCIe supply is enabled after RC.
Fix this by adding the regulator to RC node.
The PCIe vaux regulator still needs to be enabled unconditionally for
Mini-PCIe USB-only devices.
Fixes: ef3846247b ("ARM: dts: imx6qdl: add TQ-Systems MBa6x device trees")
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
According to Renesas Electronics (formerly Dialog Semiconductor), the
standard AUTO mode of the PMIC DA9061 can lead to stability problems
depending on the hardware revision. It is recommended to set a defined
mode such as PFM or PWM permanently. So set and limit the mode for
buck 1, 2 and 3 to a fixed one.
Fixes: 611b6c891e ("ARM: dts: imx6ull-dhcom: Add DH electronics DHCOM i.MX6ULL SoM and PDK2 board")
Signed-off-by: Christoph Niedermaier <cniedermaier@dh-electronics.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
While testing the ethernet interface on a Variscite symphony carrier
board using an imx8mn SOM with an onboard ADIN1300 PHY (EC hardware
configuration), the ethernet PHY is not detected.
The ADIN1300 datasheet indicate that the "Management interface
active (t4)" state is reached at most 5ms after the reset signal is
deasserted.
The device tree in Variscite custom git repository uses the following
property:
phy-reset-post-delay = <20>;
Add a new MDIO property 'reset-deassert-us' of 20ms to have the same
delay inside the ethphy node. Adding this property fixes the problem
with the PHY detection.
Note that this SOM can also have an Atheros AR8033 PHY. In this case,
a 1ms deassert delay is sufficient. Add a comment to that effect.
Fixes: ade0176dd8 ("arm64: dts: imx8mn-var-som: Add Variscite VAR-SOM-MX8MN System on Module")
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
There are a few clocks whose parents are set in mipi_dsi
and mxsfb nodes, but these clocks are used by the disp_blk_ctrl
power domain which may cause an issue when re-parenting, resuling
in a disp_pixel clock having the wrong parent and wrong rate.
Fix this by moving the assigned-clock-parents as associate clock
assignments to the power-domain node to setup these clocks before
they are enabled.
Fixes: d825fb6455 ("arm64: dts: imx8mn: Add display pipeline components")
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The reset bit must be always written to the hardware no matter what value
is in a cache or register. Ensure this by using regmap_write_bits()
instead of the regmap_update_bits(). Furthermore, the SWRESET bit may be
self-clearing, so mark the SYSTEM_CONTROL register volatile to guarantee
we do also read the right state - should we ever need to read it.
Finally, writing the SWRESET bit will restore the default register
values. This can cause register cache to be outdated if there are any
register values cached.
Rebuild register cache after SWRESET and use regmap_update_bits() when
performing the reset.
Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>
Fixes: e52afbd610 ("iio: light: ROHM BU27034 Ambient Light Sensor")
Link: https://lore.kernel.org/r/ZFjWhbfuN5XcKty+@fedora
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Commit 28d1a7ac2a ("iio: dac: Add AD5758 support") adds the config AD5758
and the corresponding driver ad5758.c. In the Makefile, the ad5758 driver
is however included when AD5755 is selected, not when AD5758 is selected.
Probably, this was simply a mistake that happened by copy-and-paste and
forgetting to adjust the actual line. Surprisingly, no one has ever noticed
that this driver is actually only included when AD5755 is selected and that
the config AD5758 has actually no effect on the build.
Fixes: 28d1a7ac2a ("iio: dac: Add AD5758 support")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20230508040208.12033-1-lukas.bulwahn@gmail.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The AD7192 provides a specific channel configuration where both negative
and positive inputs are connected to AIN2. This was represented in the
ad7192 driver as a IIO channel with .channel = 2 and .extended_name set
to "shorted".
The problem with this approach, is that the driver provided two IIO
channels with the identifier .channel = 2; one "shorted" and the other
not. This goes against the IIO ABI, as a channel identifier should be
unique.
Address this issue by changing "shorted" channels to being differential
instead, with channel 2 vs. itself, as we're actually measuring AIN2 vs.
itself.
Note that the fix tag is for the commit that moved the driver out of
staging. The bug existed before that, but backporting would become very
complex further down and unlikely to happen.
Fixes: b581f748cc ("staging: iio: adc: ad7192: move out of staging")
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Co-developed-by: Alisa Roman <alisa.roman@analog.com>
Signed-off-by: Alisa Roman <alisa.roman@analog.com>
Reviewed-by: Nuno Sa <nuno.sa@analog.com>
Link: https://lore.kernel.org/r/20230330102100.17590-1-paul@crapouillou.net
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
When apply_acpi_orientation() fails, st_accel_common_probe() will fall back
to iio_read_mount_matrix(), which checks for a mount-matrix device property
and if that is not set falls back to the identity matrix.
But when a sensor has no ACPI companion fwnode, or when the ACPI fwnode
does not have a "_ONT" method apply_acpi_orientation() was returning 0,
causing iio_read_mount_matrix() to never get called resulting in an
invalid mount_matrix:
[root@fedora ~]# cat /sys/bus/iio/devices/iio\:device0/mount_matrix
(null), (null), (null); (null), (null), (null); (null), (null), (null)
Fix this by making apply_acpi_orientation() always return an error when
it did not set the mount_matrix.
Fixes: 3d8ad94bb1 ("iio: accel: st_sensors: Support generic mounting matrix")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Marius Hoch <mail@mariushoch.de>
Link: https://lore.kernel.org/r/20230416212409.310936-1-hdegoede@redhat.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The kerneldoc for iio_gts_find_sel_by_int_time() has an error.
Documentation states that function is searching a selector for a HW-gain
while it is searching a selector for an integration time.
Fix the documentation by saying the function is looking for a selector
for an integration time.
Fixes: 38416c28e1 ("iio: light: Add gain-time-scale helpers")
Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>
Link: https://lore.kernel.org/r/ZEIjI4YUzqPZk/9X@fedora
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Valid values for "adc_chan" are zero to (PALMAS_ADC_CH_MAX - 1).
Smatch detects some buffer overflows caused by this:
drivers/iio/adc/palmas_gpadc.c:721 palmas_gpadc_read_event_value() error: buffer overflow 'adc->thresholds' 16 <= 16
drivers/iio/adc/palmas_gpadc.c:758 palmas_gpadc_write_event_value() error: buffer overflow 'adc->thresholds' 16 <= 16
The effect of this bug in other functions is more complicated but
obviously we should fix all of them.
Fixes: a99544c6c8 ("iio: adc: palmas: add support for iio threshold events")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/14fee94a-7db7-4371-b7d6-e94d86b9561e@kili.mountain
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Smatch reports:
drivers/iio/adc/mxs-lradc-adc.c:766 mxs_lradc_adc_probe() warn:
missing unwind goto?
the order of three init operation:
1.mxs_lradc_adc_trigger_init
2.iio_triggered_buffer_setup
3.mxs_lradc_adc_hw_init
thus, the order of three cleanup operation should be:
1.mxs_lradc_adc_hw_stop
2.iio_triggered_buffer_cleanup
3.mxs_lradc_adc_trigger_remove
we exchange the order of two cleanup operations,
introducing the following differences:
1.if mxs_lradc_adc_trigger_init fails, returns directly;
2.if trigger_init succeeds but iio_triggered_buffer_setup fails,
goto err_trig and remove the trigger.
In addition, we also reorder the unwind that goes on in the
remove() callback to match the new ordering.
Fixes: 6dd112b9f8 ("iio: adc: mxs-lradc: Add support for ADC driver")
Signed-off-by: Jiakai Luo <jkluo@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Link: https://lore.kernel.org/r/20230422133407.72908-1-jkluo@hust.edu.cn
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The problem is these lines:
ret = vref_uv = regulator_get_voltage(adc->vref);
if (ret < 0)
The "ret" variable is type long and "vref_uv" is u32 so that means
the condition can never be true on a 64bit system. A negative error
code from regulator_get_voltage() would be cast to a high positive
u32 value and then remain a high positive value when cast to a long.
The "ret" variable only ever stores ints so it should be declared as
an int. We can delete the "vref_uv" variable and use "ret" directly.
Fixes: 7d02296ac8 ("iio: adc: add imx93 adc support")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Haibo Chen <haibo.chen@nxp.com>
Link: https://lore.kernel.org/r/Y+utEvjfjQRQo2QB@kili
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Bring back the original lockless design in binder_alloc to determine
whether the buffer setup has been completed by the ->mmap() handler.
However, this time use smp_load_acquire() and smp_store_release() to
wrap all the ordering in a single macro call.
Also, add comments to make it evident that binder uses alloc->vma to
determine when the binder_alloc has been fully initialized. In these
scenarios acquiring the mmap_lock is not required.
Fixes: a43cfc87ca ("android: binder: stop saving a pointer to the VMA")
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20230502201220.1756319-3-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In binder_transaction_buffer_release() the 'failed_at' offset indicates
the number of objects to clean up. However, this function was changed by
commit 44d8047f1d ("binder: use standard functions to allocate fds"),
to release all the objects in the buffer when 'failed_at' is zero.
This introduced an issue when a transaction buffer is released without
any objects having been processed so far. In this case, 'failed_at' is
indeed zero yet it is misinterpreted as releasing the entire buffer.
This leads to use-after-free errors where nodes are incorrectly freed
and subsequently accessed. Such is the case in the following KASAN
report:
==================================================================
BUG: KASAN: slab-use-after-free in binder_thread_read+0xc40/0x1f30
Read of size 8 at addr ffff4faf037cfc58 by task poc/474
CPU: 6 PID: 474 Comm: poc Not tainted 6.3.0-12570-g7df047b3f0aa #5
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x94/0xec
show_stack+0x18/0x24
dump_stack_lvl+0x48/0x60
print_report+0xf8/0x5b8
kasan_report+0xb8/0xfc
__asan_load8+0x9c/0xb8
binder_thread_read+0xc40/0x1f30
binder_ioctl+0xd9c/0x1768
__arm64_sys_ioctl+0xd4/0x118
invoke_syscall+0x60/0x188
[...]
Allocated by task 474:
kasan_save_stack+0x3c/0x64
kasan_set_track+0x2c/0x40
kasan_save_alloc_info+0x24/0x34
__kasan_kmalloc+0xb8/0xbc
kmalloc_trace+0x48/0x5c
binder_new_node+0x3c/0x3a4
binder_transaction+0x2b58/0x36f0
binder_thread_write+0x8e0/0x1b78
binder_ioctl+0x14a0/0x1768
__arm64_sys_ioctl+0xd4/0x118
invoke_syscall+0x60/0x188
[...]
Freed by task 475:
kasan_save_stack+0x3c/0x64
kasan_set_track+0x2c/0x40
kasan_save_free_info+0x38/0x5c
__kasan_slab_free+0xe8/0x154
__kmem_cache_free+0x128/0x2bc
kfree+0x58/0x70
binder_dec_node_tmpref+0x178/0x1fc
binder_transaction_buffer_release+0x430/0x628
binder_transaction+0x1954/0x36f0
binder_thread_write+0x8e0/0x1b78
binder_ioctl+0x14a0/0x1768
__arm64_sys_ioctl+0xd4/0x118
invoke_syscall+0x60/0x188
[...]
==================================================================
In order to avoid these issues, let's always calculate the intended
'failed_at' offset beforehand. This is renamed and wrapped in a helper
function to make it clear and convenient.
Fixes: 32e9f56a96 ("binder: don't detect sender/target during buffer cleanup")
Reported-by: Zi Fan Tan <zifantan@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Acked-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20230505203020.4101154-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suzuki writes:
coresight: Fixes for v6.4
Couple of fixes for coresight self-hosted tracing subsystem for v6.4.
These include :
- Fix signedness bug in tmc_etr_buf_insert_barrier_packet()
- Fix memory leak with failing to release the path if trace-id allocation fails
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
* tag 'coresight-fixes-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux:
coresight: perf: Release Coresight path when alloc trace id failed
coresight: Fix signedness bug in tmc_etr_buf_insert_barrier_packet()
Driver populates the list of pages used for Memory region wrongly when
page size is more than system page size. This is causing a failure when
some of the applications that creates MR with page size as 2M. Since HW
can support multiple page sizes, pass the correct page size while creating
the MR.
Also, driver need not adjust the number of pages when HW Queues are
created with user memory. It should work with the number of dma blocks
returned by ib_umem_num_dma_blocks. Fix this calculation also.
Fixes: 0c4dcd6028 ("RDMA/bnxt_re: Refactor hardware queue memory allocation")
Fixes: f6919d5638 ("RDMA/bnxt_re: Code refactor while populating user MRs")
Link: https://lore.kernel.org/r/1683484169-9539-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
gcc-13 may generate calls for __bswap{si,di}2. This breaks the kernel
build when optimization for size is selected. Add __bswap{si,di}2
helpers to fix that.
Cc: stable@vger.kernel.org
Fixes: 19c5699f9a ("xtensa: don't link with libgcc")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Fetch function descriptor pointed to by the signal handler pointer from
userspace on signal delivery and function pointer pointed to by the
sa_restorer on return from the signal handler.
Cc: stable@vger.kernel.org
Fixes: e3ddb8bbe0 ("xtensa: add FDPIC and static PIE support for noMMU")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Some devices have a wrong entry in their button array which points to
a GPIO which is required in another driver, so soc_button_array must
not claim it.
A specific example of this is the Lenovo Yoga Book X90F / X90L,
where the PNP0C40 home button entry points to a GPIO which is not
a home button and which is required by the lenovo-yogabook driver.
Add a DMI quirk table which can specify an ACPI GPIO resource index which
should be skipped; and add an entry for the Lenovo Yoga Book X90F / X90L
to this new DMI quirk table.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230414072116.4497-1-hdegoede@redhat.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reduce the amount of output this dev_dbg() statement emits into logs,
otherwise if system software polls the sysfs entry for data and keeps
getting -ENODATA, it could end up filling the logs up.
This does in fact make systemd journald choke, since during boot the
sysfs power supply entries are polled and if journald starts at the
same time, the journal is just being repeatedly filled up, and the
system stops on trying to start journald without booting any further.
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
According to Mirsad the gpio-sim.sh test appears to FAIL in a wrong way
due to missing initialisation of shell variables:
4.2. Bias settings work correctly
cat: /sys/devices/platform/gpio-sim.0/gpiochip18/sim_gpio0/value: No such file or directory
./gpio-sim.sh: line 393: test: =: unary operator expected
bias setting does not work
GPIO gpio-sim test FAIL
After this change the test passed:
4.2. Bias settings work correctly
GPIO gpio-sim test PASS
His testing environment is AlmaLinux 8.7 on Lenovo desktop box with
the latest Linux kernel based on v6.2:
Linux 6.2.0-mglru-kmlk-andy-09238-gd2980d8d8265 x86_64
Suggested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
This code generates a Smatch warning:
drivers/hwtracing/coresight/coresight-tmc-etr.c:947 tmc_etr_buf_insert_barrier_packet()
error: uninitialized symbol 'bufp'.
The problem is that if tmc_sg_table_get_data() returns -EINVAL, then
when we test if "len < CORESIGHT_BARRIER_PKT_SIZE", the negative "len"
value is type promoted to a high unsigned long value which is greater
than CORESIGHT_BARRIER_PKT_SIZE. Fix this bug by adding an explicit
check for error codes.
Fixes: 75f4e3619f ("coresight: tmc-etr: Add transparent buffer management")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/7d33e244-d8b9-4c27-9653-883a13534b01@kili.mountain
Spi geni driver switches between FIFO and DMA modes based on xfer length.
FIFO mode relies on M_CMD_DONE_EN interrupt for completion while DMA mode
relies on XX_DMA_DONE.
During dynamic switching, if FIFO mode is chosen, FIFO related interrupts
are enabled and DMA related interrupts are disabled. And viceversa.
Chip select shares M_CMD_DONE_EN interrupt with FIFO to check completion.
Now, if a chip select operation is preceded by a DMA xfer, M_CMD_DONE_EN
interrupt would have been disabled and hence it will never receive one
resulting in timeout.
For chip select, in addition to setting the xfer mode to FIFO,
select_mode() to FIFO so that required interrupts are enabled.
Fixes: e5f0dfa78a ("spi: spi-geni-qcom: Add support for SE DMA mode")
Suggested-by: Praveen Talari <quic_ptalari@quicinc.com
Signed-off-by: Vijaya Krishna Nivarthi <quic_vnivarth@quicinc.com
Reviewed-by: Douglas Anderson <dianders@chromium.org
Link: https://lore.kernel.org/r/1683626496-9685-1-git-send-email-quic_vnivarth@quicinc.com
Signed-off-by: Mark Brown <broonie@kernel.org
After commit 1ed5c3b22f ("mmc: sdhci-esdhc-imx: Propagate
ESDHC_FLAG_HS400* only on 8bit bus"), the property "no-mmc-hs400"
from device tree file do not work any more.
This patch reorder the code, which can avoid the warning message
"drop HS400 support since no 8-bit bus" and also make the property
"no-mmc-hs400" from dts file works.
Fixes: 1ed5c3b22f ("mmc: sdhci-esdhc-imx: Propagate ESDHC_FLAG_HS400* only on 8bit bus")
Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230504112222.3599602-1-haibo.chen@nxp.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Requests to the mmc layer usually come through a block device IO.
The exceptions are the ioctl interface, RPMB chardev ioctl
and debugfs, which issue their own blk_mq requests through
blk_execute_rq and do not query the BLK_STS error but the
mmcblk-internal drv_op_result. This patch ensures that drv_op_result
defaults to an error and has to be overwritten by the operation
to be considered successful.
The behavior leads to a bug where the request never propagates
the error, e.g. by directly erroring out at mmc_blk_mq_issue_rq if
mmc_blk_part_switch fails. The ioctl caller of the rpmb chardev then
can never see an error (BLK_STS_IOERR, but drv_op_result is unchanged)
and thus may assume that their call executed successfully when it did not.
While always checking the blk_execute_rq return value would be
advised, let's eliminate the error by always setting
drv_op_result as -EIO to be overwritten on success (or other error)
Fixes: 614f0388f5 ("mmc: block: move single ioctl() commands to block requests")
Signed-off-by: Christian Loehle <cloehle@hyperstone.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/59c17ada35664b818b7bd83752119b2d@hyperstone.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The transmit buffers allocated by the driver can be used to transmit data
by any messages/commands needing the buffer. However, it is not guaranteed
to have been zero-ed before every new transmission and hence it will just
contain residual value from the previous transmission. There are several
reserved fields in the memory descriptors that must be zero(MBZ). The
receiver can reject the transmission if any such MBZ fields are non-zero.
While we can set the whole page to zero, it is not optimal as most of the
fields get initialised to the value required for the current transmission.
So, just set the reserved/MBZ fields to zero in the memory descriptors
explicitly to honour the requirement and keep the receiver happy.
Fixes: cc2195fe53 ("firmware: arm_ffa: Add support for MEM_* interfaces")
Reported-by: Marc Bonnici <marc.bonnici@arm.com>
Link: https://lore.kernel.org/r/20230503131252.12585-1-sudeep.holla@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Each physical partition can provide multiple services each with UUID.
Each such service can be presented as logical partition with a unique
combination of VM ID and UUID. The number of distinct UUID in a system
will be less than or equal to the number of logical partitions.
However, currently it fails to register more than one logical partition
or service within a physical partition as the device name contains only
VM ID while both VM ID and UUID are maintained in the partition information.
The kernel complains with the below message:
| sysfs: cannot create duplicate filename '/devices/arm-ffa-8001'
| CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc7 #8
| Hardware name: FVP Base RevC (DT)
| Call trace:
| dump_backtrace+0xf8/0x118
| show_stack+0x18/0x24
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| sysfs_create_dir_ns+0xe0/0x13c
| kobject_add_internal+0x220/0x3d4
| kobject_add+0x94/0x100
| device_add+0x144/0x5d8
| device_register+0x20/0x30
| ffa_device_register+0x88/0xd8
| ffa_setup_partitions+0x108/0x1b8
| ffa_init+0x2ec/0x3a4
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| kobject_add_internal failed for arm-ffa-8001 with -EEXIST, don't try to
| register things with the same name in the same directory.
| arm_ffa arm-ffa: unable to register device arm-ffa-8001 err=-17
| ARM FF-A: ffa_setup_partitions: failed to register partition ID 0x8001
By virtue of being random enough to avoid collisions when generated in a
distributed system, there is no way to compress UUID keys to the number
of bits required to identify each. We can eliminate '-' in the name but
it is not worth eliminating 4 bytes and add unnecessary logic for doing
that. Also v1.0 doesn't provide the UUID of the partitions which makes
it hard to use the same for the device name.
So to keep it simple, let us alloc an ID using ida_alloc() and append the
same to "arm-ffa" to make up a unique device name. Also stash the id value
in ffa_dev to help freeing the ID later when the device is destroyed.
Fixes: e781858488 ("firmware: arm_ffa: Add initial FFA bus support for device enumeration")
Reported-by: Lucian Paul-Trifu <lucian.paul-trifu@arm.com>
Link: https://lore.kernel.org/r/20230419-ffa_fixes_6-4-v2-3-d9108e43a176@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Commit bb1be74985 ("firmware: arm_ffa: Add v1.1 get_partition_info support")
adds support to discovery the UUIDs of the partitions or just fetch the
partition count using the PARTITION_INFO_GET_RETURN_COUNT_ONLY flag.
However the commit doesn't handle the fact that the older version doesn't
understand the flag and must be MBZ which results in firmware returning
invalid parameter error. That results in the failure of the driver probe
which is in correct.
Limit the usage of the PARTITION_INFO_GET_RETURN_COUNT_ONLY flag for the
versions above v1.0(i.e v1.1 and onwards) which fixes the issue.
Fixes: bb1be74985 ("firmware: arm_ffa: Add v1.1 get_partition_info support")
Reported-by: Jens Wiklander <jens.wiklander@linaro.org>
Reported-by: Marc Bonnici <marc.bonnici@arm.com>
Tested-by: Jens Wiklander <jens.wiklander@linaro.org>
Reviewed-by: Jens Wiklander <jens.wiklander@linaro.org>
Link: https://lore.kernel.org/r/20230419-ffa_fixes_6-4-v2-2-d9108e43a176@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Currently ffa_drv->remove() is called unconditionally from
ffa_device_remove(). Since the driver registration doesn't check for it
and allows it to be registered without .remove callback, we need to check
for the presence of it before executing it from ffa_device_remove() to
above a NULL pointer dereference like the one below:
| Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
| Mem abort info:
| ESR = 0x0000000086000004
| EC = 0x21: IABT (current EL), IL = 32 bits
| SET = 0, FnV = 0
| EA = 0, S1PTW = 0
| FSC = 0x04: level 0 translation fault
| user pgtable: 4k pages, 48-bit VAs, pgdp=0000000881cc8000
| [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
| Internal error: Oops: 0000000086000004 [#1] PREEMPT SMP
| CPU: 3 PID: 130 Comm: rmmod Not tainted 6.3.0-rc7 #6
| Hardware name: FVP Base RevC (DT)
| pstate: 63402809 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=-c)
| pc : 0x0
| lr : ffa_device_remove+0x20/0x2c
| Call trace:
| 0x0
| device_release_driver_internal+0x16c/0x260
| driver_detach+0x90/0xd0
| bus_remove_driver+0xdc/0x11c
| driver_unregister+0x30/0x54
| ffa_driver_unregister+0x14/0x20
| cleanup_module+0x18/0xeec
| __arm64_sys_delete_module+0x234/0x378
| invoke_syscall+0x40/0x108
| el0_svc_common+0xb4/0xf0
| do_el0_svc+0x30/0xa4
| el0_svc+0x2c/0x7c
| el0t_64_sync_handler+0x84/0xf0
| el0t_64_sync+0x190/0x194
Fixes: 244f5d597e ("firmware: arm_ffa: Add missing remove callback to ffa_bus_type")
Link: https://lore.kernel.org/r/20230419-ffa_fixes_6-4-v2-1-d9108e43a176@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
scmi_xfer_raw_worker_init() is specifying a flag, WQ_SYSFS, as @max_active.
Fix it by or'ing WQ_SYSFS into @flags so that it actually enables sysfs
interface and using 0 for @max_active for the default setting.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 3c3d818a93 ("firmware: arm_scmi: Add core raw transmission support")
Link: https://lore.kernel.org/r/ZEGTnajiQm7mkkZS@slm.duckdns.org
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Commit 572086325c ("Drivers: hv: vmbus: Cleanup synic memory free path")
says "Any memory allocations that succeeded will be freed when the caller
cleans up by calling hv_synic_free()", but if the get_zeroed_page() in
hv_synic_alloc() fails, currently hv_synic_free() is not really called
in vmbus_bus_init(), consequently there will be a memory leak, e.g.
hv_context.hv_numa_map is not freed in the error path. Fix this by
updating the goto labels.
Cc: stable@kernel.org
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Fixes: 4df4cb9e99 ("x86/hyperv: Initialize clockevents earlier in CPU onlining")
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20230504224155.10484-1-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
I gave up using MIPS OCTEON hardware after the drivers were deleted from
staging in 2019. Afterwards, the driver seems to be added back but the TODO
contact name was not updated accordingly. Delete my name, as I still get
mails from people asking help with the driver and their systems.
Fixes: 422d97b8b0 ("Revert "staging: octeon: delete driver"")
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Link: https://lore.kernel.org/r/20230427211606.GD881984@darkstar.musicnaut.iki.fi
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The bq24192 model relies on external charger-type detection and once
that is done the bq24190_charger code will update the input current.
In this case, when the initial power_supply_changed() call is made
from the interrupt handler, the input settings are 5V/0.5A which
on many devices is not enough power to charge (while the device is on).
On many devices the fuel-gauge relies in its external_power_changed
callback to timely signal userspace about charging <-> discharging
status changes. Add a power_supply_changed() call after updating
the input current. This allows the fuel-gauge driver to timely recheck
if the battery is charging after the new input current has been applied
and then it can immediately notify userspace about this.
Fixes: 18f8e6f695 ("power: supply: bq24190_charger: Get input_current_limit from our supplier")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
The bq25892 model relies on external charger-type detection and once
that is done the bq25890_charger code will update the input current
and if pumpexpress is used also the input voltage.
In this case, when the initial power_supply_changed() call is made
from the interrupt handler, the input settings are 5V/0.5A which
on many devices is not enough power to charge (while the device is on).
On many devices the fuel-gauge relies in its external_power_changed
callback to timely signal userspace about charging <-> discharging
status changes. Add a power_supply_changed() call after updating
the input current or voltage. This allows the fuel-gauge driver
to timely recheck if the battery is charging after the new input
settings have been applied and then it can immediately notify
userspace about this.
Fixes: 48f45b094d ("power: supply: bq25890: Support higher charging voltages through Pump Express+ protocol")
Fixes: eab25b4f93 ("power: supply: bq25890: On the bq25892 set the IINLIM based on external charger detection")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Use mod_delayed_work() instead of separate cancel_delayed_work_sync() +
schedule_delayed_work() calls.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
bq27xxx_external_power_changed() gets called when the charger is plugged
in or out. Rather then immediately scheduling an update wait 0.5 seconds
for things to stabilize, so that e.g. the (dis)charge current is stable
when bq27xxx_battery_update() runs.
Fixes: 740b755a3b ("bq27x00: Poll battery state")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
On gauges where the current register is signed, there is no charging
flag in the flags register. So only checking flags will not result
in power_supply_changed() getting called when e.g. a charger is plugged
in and the current sign changes from negative (discharging) to
positive (charging).
This causes userspace's notion of the status to lag until userspace
does a poll.
And when a power_supply_leds.c LED trigger is used to indicate charging
status with a LED, this LED will lag until the capacity percentage
changes, which may take many minutes (because the LED trigger only is
updated on power_supply_changed() calls).
Fix this by calling bq27xxx_battery_current_and_status() on gauges with
a signed current register and checking if the status has changed.
Fixes: 297a533b3e ("bq27x00: Cache battery registers")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Move the bq27xxx_battery_update() functions to below
the bq27xxx_battery_current_and_status() function.
This is just moving a block of text, no functional changes.
This is a preparation patch for making bq27xxx_battery_update() check
the status and have it call power_supply_changed() on status changes.
Fixes: 297a533b3e ("bq27x00: Cache battery registers")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Add a cache parameter to bq27xxx_battery_current_and_status() so that
it can optionally use cached flags instead of re-reading them itself.
This is a preparation patch for making bq27xxx_battery_update() check
the status and have it call power_supply_changed() on status changes.
Fixes: 297a533b3e ("bq27x00: Cache battery registers")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Before this patch bq27xxx_battery_teardown() was setting poll_interval = 0
to avoid bq27xxx_battery_update() requeuing the delayed_work item.
There are 2 problems with this:
1. If the driver is unbound through sysfs, rather then the module being
rmmod-ed, this changes poll_interval unexpectedly
2. This is racy, after it being set poll_interval could be changed
before bq27xxx_battery_update() checks it through
/sys/module/bq27xxx_battery/parameters/poll_interval
Fix this by added a removed attribute to struct bq27xxx_device_info and
using that instead of setting poll_interval to 0.
There also is another poll_interval related race on remove(), writing
/sys/module/bq27xxx_battery/parameters/poll_interval will requeue
the delayed_work item for all devices on the bq27xxx_battery_devices
list and the device being removed was only removed from that list
after cancelling the delayed_work item.
Fix this by moving the removal from the bq27xxx_battery_devices list
to before cancelling the delayed_work item.
Fixes: 8cfaaa8118 ("bq27x00_battery: Fix OOPS caused by unregistring bq27x00 driver")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
devm_request_threaded_irq() requested IRQs are only free-ed after
the driver's remove function has ran. So the IRQ could trigger and
call bq27xxx_battery_update() after bq27xxx_battery_teardown() has
already run.
Switch to explicitly free-ing the IRQ in bq27xxx_battery_i2c_remove()
to fix this.
Fixes: 8807feb91b ("power: bq27xxx_battery: Add interrupt handling support")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
bq27xxx_battery_update() assumes / requires that it is only run once,
not multiple times at the same time. But there are 3 possible callers:
1. bq27xxx_battery_poll() delayed_work item handler
2. bq27xxx_battery_irq_handler_thread() I2C IRQ handler
3. bq27xxx_battery_setup()
And there is no protection against these racing with each other,
fix this race condition by making all callers take di->lock:
- Rename bq27xxx_battery_update() to bq27xxx_battery_update_unlocked()
- Add new bq27xxx_battery_update() which takes di->lock and then calls
bq27xxx_battery_update_unlocked()
- Make stale cache check code in bq27xxx_battery_get_property(), which
already takes di->lock directly to check the jiffies, call
bq27xxx_battery_update_unlocked() instead of messing with
the delayed_work item
- Make bq27xxx_battery_update_unlocked() mod the delayed-work item
so that the next poll is delayed to poll_interval milliseconds after
the last update independent of the source of the update
Fixes: 740b755a3b ("bq27x00: Poll battery state")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
When a battery's status changes from charging to full then
the charging-blink-full-solid trigger tries to change
the LED from blinking to solid/on.
As is documented in include/linux/leds.h to deactivate blinking /
to make the LED solid a LED_OFF must be send:
"""
* Deactivate blinking again when the brightness is set to LED_OFF
* via the brightness_set() callback.
"""
led_set_brighness() calls with a brightness value other then 0 / LED_OFF
merely change the brightness of the LED in its on state while it is
blinking.
So power_supply_update_bat_leds() must first send a LED_OFF event
before the LED_FULL to disable blinking.
Fixes: 6501f728c5 ("power_supply: Add new LED trigger charging-blink-solid-full")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
sc27xx_fgu_external_power_changed() dereferences data->battery,
which gets sets in ab8500_btemp_probe() like this:
data->battery = devm_power_supply_register(dev, &sc27xx_fgu_desc,
&fgu_cfg);
As soon as devm_power_supply_register() has called device_add()
the external_power_changed callback can get called. So there is a window
where sc27xx_fgu_external_power_changed() may get called while
data->battery has not been set yet leading to a NULL pointer dereference.
Fixing this is easy. The external_power_changed callback gets passed
the power_supply which will eventually get stored in data->battery,
so sc27xx_fgu_external_power_changed() can simply directly use
the passed in psy argument which is always valid.
After this change sc27xx_fgu_external_power_changed() is reduced to just
"power_supply_changed(psy);" and it has the same prototype. While at it
simply replace it with making the external_power_changed callback
directly point to power_supply_changed.
Cc: Orson Zhai <orsonzhai@gmail.com>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
bq25890_charger_external_power_changed() dereferences bq->charger,
which gets sets in bq25890_power_supply_init() like this:
bq->charger = devm_power_supply_register(bq->dev, &bq->desc, &psy_cfg);
As soon as devm_power_supply_register() has called device_add()
the external_power_changed callback can get called. So there is a window
where bq25890_charger_external_power_changed() may get called while
bq->charger has not been set yet leading to a NULL pointer dereference.
This race hits during boot sometimes on a Lenovo Yoga Book 1 yb1-x90f
when the cht_wcove_pwrsrc (extcon) power_supply is done with detecting
the connected charger-type which happens to exactly hit the small window:
BUG: kernel NULL pointer dereference, address: 0000000000000018
<snip>
RIP: 0010:__power_supply_is_supplied_by+0xb/0xb0
<snip>
Call Trace:
<TASK>
__power_supply_get_supplier_property+0x19/0x50
class_for_each_device+0xb1/0xe0
power_supply_get_property_from_supplier+0x2e/0x50
bq25890_charger_external_power_changed+0x38/0x1b0 [bq25890_charger]
__power_supply_changed_work+0x30/0x40
class_for_each_device+0xb1/0xe0
power_supply_changed_work+0x5f/0xe0
<snip>
Fixing this is easy. The external_power_changed callback gets passed
the power_supply which will eventually get stored in bq->charger,
so bq25890_charger_external_power_changed() can simply directly use
the passed in psy argument which is always valid.
Fixes: eab25b4f93 ("power: supply: bq25890: On the bq25892 set the IINLIM based on external charger detection")
Cc: stable@vger.kernel.org
Cc: Marek Vasut <marex@denx.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
fuel_gauge_external_power_changed() dereferences info->bat,
which gets sets in axp288_fuel_gauge_probe() like this:
info->bat = devm_power_supply_register(dev, &fuel_gauge_desc, &psy_cfg);
As soon as devm_power_supply_register() has called device_add()
the external_power_changed callback can get called. So there is a window
where fuel_gauge_external_power_changed() may get called while
info->bat has not been set yet leading to a NULL pointer dereference.
Fixing this is easy. The external_power_changed callback gets passed
the power_supply which will eventually get stored in info->bat,
so fuel_gauge_external_power_changed() can simply directly use
the passed in psy argument which is always valid.
Fixes: 30abb3d079 ("power: supply: axp288_fuel_gauge: Take lock before updating the valid flag")
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
ab8500_btemp_external_power_changed() dereferences di->btemp_psy,
which gets sets in ab8500_btemp_probe() like this:
di->btemp_psy = devm_power_supply_register(dev, &ab8500_btemp_desc,
&psy_cfg);
As soon as devm_power_supply_register() has called device_add()
the external_power_changed callback can get called. So there is a window
where ab8500_btemp_external_power_changed() may get called while
di->btemp_psy has not been set yet leading to a NULL pointer dereference.
Fixing this is easy. The external_power_changed callback gets passed
the power_supply which will eventually get stored in di->btemp_psy,
so ab8500_btemp_external_power_changed() can simply directly use
the passed in psy argument which is always valid.
And the same applies to ab8500_fg_external_power_changed().
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Enabling a (modular) test should not silently enable additional kernel
functionality, as that may increase the attack vector of a product.
Fix this by:
1. making REGMAP visible if CONFIG_KUNIT_ALL_TESTS is enabled,
2. making REGMAP_KUNIT depend on REGMAP instead of selecting it.
After this, one can safely enable CONFIG_KUNIT_ALL_TESTS=m to build
modules for all appropriate tests for ones system, without pulling in
extra unwanted functionality, while still allowing a tester to manually
enable REGMAP and its test suite on a system where REGMAP is not enabled
by default.
Fixes: 2238959b6a ("regmap: Add some basic kunit tests")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org
Link: https://lore.kernel.org/r/b0a5dbb17c1d5ea482e052e585ae83bb69c48806.1682516005.git.geert@linux-m68k.org
Signed-off-by: Mark Brown <broonie@kernel.org
In pre-production prototypes (of which I only know one person
having one, Peter Geis), GPIO0 pin A5 was tied to the SDMMC
power enable pin on the CM4 connector. On all production models,
this is not the case; instead, this pin is used for the nEXTRST
signal, and the SDMMC power enable pin is always pulled high.
Since everyone currently using the SOQuartz device trees will
want this change, it is made to the tree without splitting the
trees into two separate ones of which users will then inevitably
choose the wrong one.
This fixes USB and PCIe on a wide variety of CM4IO-compatible
boards which use the nEXTRST signal.
Fixes: 5859b5a9c3 ("arm64: dts: rockchip: add SoQuartz CM4IO dts")
Signed-off-by: Nicolas Frattaroli <frattaroli.nicolas@gmail.com>
Link: https://lore.kernel.org/r/20230421152610.21688-1-frattaroli.nicolas@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Currently the ROCK64 device tree specifies two regulators, vcc_host_5v
and vcc_host1_5v for USB VBUS on the device. Both of those are however
specified with RK_PA2 as the GPIO enabling them, causing the following
error when booting:
rockchip-pinctrl pinctrl: pin gpio0-2 already requested by vcc-host-5v-regulator; cannot claim for vcc-host1-5v-regulator
rockchip-pinctrl pinctrl: pin-2 (vcc-host1-5v-regulator) status -22
rockchip-pinctrl pinctrl: could not request pin 2 (gpio0-2) from group usb20-host-drv on device rockchip-pinctrl
reg-fixed-voltage vcc-host1-5v-regulator: Error applying setting, reverse things back
Looking at the schematic, there are in fact three USB regulators,
vcc_host_5v, vcc_host1_5v and vcc_otg_v5. But the enable signal for all
three is driven by Q2604 which is in turn driven by GPIO_A2/PA2.
Since these three regulators are not controllable separately, I removed
the second one which was causing the error and added labels for all
rails to the single regulator.
Signed-off-by: Lorenz Brun <lorenz@brun.one>
Tested-by: Diederik de Haas <didi.debian@cknow.org>
Link: https://lore.kernel.org/r/20230421213841.3079632-1-lorenz@brun.one
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
There is an explicit wait-type violation in debug_object_fill_pool()
for PREEMPT_RT=n kernels which allows them to more easily fill the
object pool and reduce the chance of allocation failures.
Lockdep's wait-type checks are designed to check the PREEMPT_RT
locking rules even for PREEMPT_RT=n kernels and object to this, so
create a lockdep annotation to allow this to stand.
Specifically, create a 'lock' type that overrides the inner wait-type
while it is held -- allowing one to temporarily raise it, such that
the violation is hidden.
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Qi Zheng <zhengqi.arch@bytedance.com>
Link: https://lkml.kernel.org/r/20230429100614.GA1489784@hirez.programming.kicks-ass.net
The wpan maintainers group is switching from Stefan's tree to a group
tree called 'wpan'. We will now maintain:
* wpan/wpan.git master:
Fixes targeting the 'net' tree
* wpan/wpan-next.git master:
Features targeting the 'net-next' tree
* wpan/wpan-next.git staging:
Same as the wpan-next master branch, but we will push there first,
expecting robots to parse the tree and report mistakes we would have
not catch. This branch can be rebased and force pushed, unlike the
others.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Alexander Aring <aahringo@redhat.com>
[Fixed two small typos stefan@datenfreihafen.org]
Link: https://lore.kernel.org/r/20230411090122.419761-1-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.