Pull tracing fixes from Steven Rostedt:
- Do not allow large strings (> 4096) as single write to trace_marker
The size of a string written into trace_marker was determined by the
size of the sub-buffer in the ring buffer. That size is dependent on
the PAGE_SIZE of the architecture as it can be mapped into user
space. But on PowerPC, where PAGE_SIZE is 64K, that made the limit of
the string of writing into trace_marker 64K.
One of the selftests looks at the size of the ring buffer sub-buffers
and writes that plus more into the trace_marker. The write will take
what it can and report back what it consumed so that the user space
application (like echo) will write the rest of the string. The string
is stored in the ring buffer and can be read via the "trace" or
"trace_pipe" files.
The reading of the ring buffer uses vsnprintf(), which uses a
precision "%.*s" to make sure it only reads what is stored in the
buffer, as a bug could cause the string to be non terminated.
With the combination of the precision change and the PAGE_SIZE of 64K
allowing huge strings to be added into the ring buffer, plus the test
that would actually stress that limit, a bug was reported that the
precision used was too big for "%.*s" as the string was close to 64K
in size and the max precision of vsnprintf is 32K.
Linus suggested not to have that precision as it could hide a bug if
the string was again stored without a nul byte.
Another issue that was brought up is that the trace_seq buffer is
also based on PAGE_SIZE even though it is not tied to the
architecture limit like the ring buffer sub-buffer is. Having it be
64K * 2 is simply just too big and wasting memory on systems with 64K
page sizes. It is now hardcoded to 8K which is what all other
architectures with 4K PAGE_SIZE has.
Finally, the write to trace_marker is now limited to 4K as there is
no reason to write larger strings into trace_marker.
- ring_buffer_wait() should not loop.
The ring_buffer_wait() does not have the full context (yet) on if it
should loop or not. Just exit the loop as soon as its woken up and
let the callers decide to loop or not (they already do, so it's a bit
redundant).
- Fix shortest_full field to be the smallest amount in the ring buffer
that a waiter is waiting for. The "shortest_full" field is updated
when a new waiter comes in and wants to wait for a smaller amount of
data in the ring buffer than other waiters. But after all waiters are
woken up, it's not reset, so if another waiter comes in wanting to
wait for more data, it will be woken up when the ring buffer has a
smaller amount from what the previous waiters were waiting for.
- The wake up all waiters on close is incorrectly called frome
.release() and not from .flush() so it will never wake up any waiters
as the .release() will not get called until all .read() calls are
finished. And the wakeup is for the waiters in those .read() calls.
* tag 'trace-ring-buffer-v6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Use .flush() call to wake up readers
ring-buffer: Fix resetting of shortest_full
ring-buffer: Fix waking up ring buffer readers
tracing: Limit trace_marker writes to just 4K
tracing: Limit trace_seq size to just 8K and not depend on architecture PAGE_SIZE
tracing: Remove precision vsnprintf() check from print event
Pull phy fixes from Vinod Koul:
- fixes for Qualcomm qmp-combo driver for ordering of drm and type-c
switch registartion due to drivers might not probe defer after having
registered child devices to avoid triggering a probe deferral loop.
This fixes internal display on Lenovo ThinkPad X13s
* tag 'phy-fixes3-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
phy: qcom-qmp-combo: fix type-c switch registration
phy: qcom-qmp-combo: fix drm bridge registration
The .release() function does not get called until all readers of a file
descriptor are finished.
If a thread is blocked on reading a file descriptor in ring_buffer_wait(),
and another thread closes the file descriptor, it will not wake up the
other thread as ring_buffer_wake_waiters() is called by .release(), and
that will not get called until the .read() is finished.
The issue originally showed up in trace-cmd, but the readers are actually
other processes with their own file descriptors. So calling close() would wake
up the other tasks because they are blocked on another descriptor then the
one that was closed(). But there's other wake ups that solve that issue.
When a thread is blocked on a read, it can still hang even when another
thread closed its descriptor.
This is what the .flush() callback is for. Have the .flush() wake up the
readers.
Link: https://lore.kernel.org/linux-trace-kernel/20240308202432.107909457@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linke li <lilinke99@qq.com>
Cc: Rabin Vincent <rabin@rab.in>
Fixes: f3ddb74ad0 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The "shortest_full" variable is used to keep track of the waiter that is
waiting for the smallest amount on the ring buffer before being woken up.
When a tasks waits on the ring buffer, it passes in a "full" value that is
a percentage. 0 means wake up on any data. 1-100 means wake up from 1% to
100% full buffer.
As all waiters are on the same wait queue, the wake up happens for the
waiter with the smallest percentage.
The problem is that the smallest_full on the cpu_buffer that stores the
smallest amount doesn't get reset when all the waiters are woken up. It
does get reset when the ring buffer is reset (echo > /sys/kernel/tracing/trace).
This means that tasks may be woken up more often then when they want to
be. Instead, have the shortest_full field get reset just before waking up
all the tasks. If the tasks wait again, they will update the shortest_full
before sleeping.
Also add locking around setting of shortest_full in the poll logic, and
change "work" to "rbwork" to match the variable name for rb_irq_work
structures that are used in other places.
Link: https://lore.kernel.org/linux-trace-kernel/20240308202431.948914369@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linke li <lilinke99@qq.com>
Cc: Rabin Vincent <rabin@rab.in>
Fixes: 2c2b0a78b3 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull kvm fixes from Paolo Bonzini:
"KVM GUEST_MEMFD fixes for 6.8:
- Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY
to avoid creating an inconsistent ABI (KVM_MEM_GUEST_MEMFD is not
writable from userspace, so there would be no way to write to a
read-only guest_memfd).
- Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
clear that such VMs are purely for development and testing.
- Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term
plan is to support confidential VMs with deterministic private
memory (SNP and TDX) only in the TDP MMU.
- Fix a bug in a GUEST_MEMFD dirty logging test that caused false
passes.
x86 fixes:
- Fix missing marking of a guest page as dirty when emulating an
atomic access.
- Check for mmu_notifier invalidation events before faulting in the
pfn, and before acquiring mmu_lock, to avoid unnecessary work and
lock contention with preemptible kernels (including
CONFIG_PREEMPT_DYNAMIC in non-preemptible mode).
- Disable AMD DebugSwap by default, it breaks VMSA signing and will
be re-enabled with a better VM creation API in 6.10.
- Do the cache flush of converted pages in svm_register_enc_region()
before dropping kvm->lock, to avoid a race with unregistering of
the same region and the consequent use-after-free issue"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
SEV: disable SEV-ES DebugSwap by default
KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing
KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region()
KVM: selftests: Add a testcase to verify GUEST_MEMFD and READONLY are exclusive
KVM: selftests: Create GUEST_MEMFD for relevant invalid flags testcases
KVM: x86/mmu: Restrict KVM_SW_PROTECTED_VM to the TDP MMU
KVM: x86: Update KVM_SW_PROTECTED_VM docs to make it clear they're a WIP
KVM: Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY
KVM: x86: Mark target gfn of emulated atomic instruction as dirty
A task can wait on a ring buffer for when it fills up to a specific
watermark. The writer will check the minimum watermark that waiters are
waiting for and if the ring buffer is past that, it will wake up all the
waiters.
The waiters are in a wait loop, and will first check if a signal is
pending and then check if the ring buffer is at the desired level where it
should break out of the loop.
If a file that uses a ring buffer closes, and there's threads waiting on
the ring buffer, it needs to wake up those threads. To do this, a
"wait_index" was used.
Before entering the wait loop, the waiter will read the wait_index. On
wakeup, it will check if the wait_index is different than when it entered
the loop, and will exit the loop if it is. The waker will only need to
update the wait_index before waking up the waiters.
This had a couple of bugs. One trivial one and one broken by design.
The trivial bug was that the waiter checked the wait_index after the
schedule() call. It had to be checked between the prepare_to_wait() and
the schedule() which it was not.
The main bug is that the first check to set the default wait_index will
always be outside the prepare_to_wait() and the schedule(). That's because
the ring_buffer_wait() doesn't have enough context to know if it should
break out of the loop.
The loop itself is not needed, because all the callers to the
ring_buffer_wait() also has their own loop, as the callers have a better
sense of what the context is to decide whether to break out of the loop
or not.
Just have the ring_buffer_wait() block once, and if it gets woken up, exit
the function and let the callers decide what to do next.
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNSRZfg@mail.gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240308202431.792933613@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linke li <lilinke99@qq.com>
Cc: Rabin Vincent <rabin@rab.in>
Fixes: e30f53aad2 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull i2c fixes from Wolfram Sang:
"Two patches from Heiner for the i801 are targeting muxes discovered
while working on some other features. Essentially, there is a
reordering when adding optional slaves and proper cleanup upon
registering a mux device.
Christophe fixes the exit path in the wmt driver that was leaving the
clocks hanging, and the last fix from Tommy avoids false error reports
in IRQ"
* tag 'i2c-for-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: aspeed: Fix the dummy irq expected print
i2c: wmt: Fix an error handling path in wmt_i2c_probe()
i2c: i801: Avoid potential double call to gpiod_remove_lookup_table
i2c: i801: Fix using mux_pdev before it's set
Pull firewire fix from Takashi Sakamoto:
"A fix to suppress a warning about unreleased IRQ for 1394 OHCI
hardware when disabling MSI.
In Linux kernel v6.5, a PCI driver for 1394 OHCI hardware was
optimized into the managed device resources. Edmund Raile points out
that the change brings the warning about unreleased IRQ at the call of
pci_disable_msi(), since the API expects that the relevant IRQ has
already been released in advance.
As long as the API is called in .remove callback of PCI device
operation, it is prohibited to maintain the IRQ as the part of managed
device resource. As a workaround, the IRQ is explicitly released at
.remove callback, before the call of pci_disable_msi().
pci_disable_msi() is legacy API nowadays in PCI MSI implementation. I
have a plan to replace it with the modern API in the development for
the future version of Linux kernel. So at present I keep them as is"
* tag 'firewire-fixes-6.8-final' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: ohci: prevent leak of left-over IRQ on unbind
The DebugSwap feature of SEV-ES provides a way for confidential guests to use
data breakpoints. However, because the status of the DebugSwap feature is
recorded in the VMSA, enabling it by default invalidates the attestation
signatures. In 6.10 we will introduce a new API to create SEV VMs that
will allow enabling DebugSwap based on what the user tells KVM to do.
Contextually, we will change the legacy KVM_SEV_ES_INIT API to never
enable DebugSwap.
For compatibility with kernels that pre-date the introduction of DebugSwap,
as well as with those where KVM_SEV_ES_INIT will never enable it, do not enable
the feature by default. If anybody wants to use it, for now they can enable
the sev_es_debug_swap_enabled module parameter, but this will result in a
warning.
Fixes: d1f85fbe83 ("KVM: SEV: Enable data breakpoints in SEV-ES")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM GUEST_MEMFD fixes for 6.8:
- Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to
avoid creating ABI that KVM can't sanely support.
- Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
clear that such VMs are purely a development and testing vehicle, and
come with zero guarantees.
- Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan
is to support confidential VMs with deterministic private memory (SNP
and TDX) only in the TDP MMU.
- Fix a bug in a GUEST_MEMFD negative test that resulted in false passes
when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
KVM x86 fixes for 6.8, round 2:
- When emulating an atomic access, mark the gfn as dirty in the memslot
to fix a bug where KVM could fail to mark the slot as dirty during live
migration, ultimately resulting in guest data corruption due to a dirty
page not being re-copied from the source to the target.
- Check for mmu_notifier invalidation events before faulting in the pfn,
and before acquiring mmu_lock, to avoid unnecessary work and lock
contention. Contending mmu_lock is especially problematic on preemptible
kernels, as KVM may yield mmu_lock in response to the contention, which
severely degrades overall performance due to vCPUs making it difficult
for the task that triggered invalidation to make forward progress.
Note, due to another kernel bug, this fix isn't limited to preemtible
kernels, as any kernel built with CONFIG_PREEMPT_DYNAMIC=y will yield
contended rwlocks and spinlocks.
https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com
Pull ceph fix from Ilya Dryomov:
"A follow-up for sparse read fixes that went into -rc4 -- msgr2 case
was missed and is corrected here"
* tag 'ceph-for-6.8-rc8' of https://github.com/ceph/ceph-client:
libceph: init the cursor when preparing sparse read in msgr2
Pull char/misc driver fixes from Greg KH:
"Here are a few small char/misc and other driver subsystem fixes for
reported issues that have been in my tree.
Included in here are fixes for:
- iio driver fixes for reported problems
- much reported bugfix for a lis3lv02d_i2c regression
- comedi driver bugfix
- mei new device ids
- mei driver fixes
- counter core fix
All of these have been in linux-next with no reported issues, some for
many weeks"
* tag 'char-misc-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
mei: gsc_proxy: match component when GSC is on different bus
misc: fastrpc: Pass proper arguments to scm call
comedi: comedi_test: Prevent timers rescheduling during deletion
comedi: comedi_8255: Correct error in subdevice initialization
misc: lis3lv02d_i2c: Fix regulators getting en-/dis-abled twice on suspend/resume
iio: accel: adxl367: fix I2C FIFO data register
iio: accel: adxl367: fix DEVID read after reset
iio: pressure: dlhl60d: Initialize empty DLH bytes
iio: imu: inv_mpu6050: fix frequency setting when chip is off
iio: pressure: Fixes BMP38x and BMP390 SPI support
iio: imu: inv_mpu6050: fix FIFO parsing when empty
mei: Add Meteor Lake support for IVSC device
mei: me: add arrow lake point H DID
mei: me: add arrow lake point S DID
counter: fix privdata alignment
Pull tty / serial fixes from Greg KH:
"Here are some small remaining tty/serial driver fixes. Included in
here is fixes for:
- vt unicode buffer corruption fix
- imx serial driver fixes, again
- port suspend fix
- 8250_dw driver fix
- fsl_lpuart driver fix
- revert for the qcom_geni_serial driver to fix a reported regression
All of these have been in linux-next with no reported issues"
* tag 'tty-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "tty: serial: simplify qcom_geni_serial_send_chunk_fifo()"
tty: serial: fsl_lpuart: avoid idle preamble pending if CTS is enabled
vt: fix unicode buffer corruption when deleting characters
serial: port: Don't suspend if the port is still busy
serial: 8250_dw: Do not reclock if already at correct rate
tty: serial: imx: Fix broken RS485
Pull USB / Thunderbolt fixes from Greg KH:
"Here are some small remaining fixes for USB and Thunderbolt drivers.
Included in here are fixes for:
- thunderbold NULL dereference fix
- typec driver fixes
- xhci driver regression fix
- usb-storage divide-by-0 fix
- ncm gadget driver fix
All of these have been in linux-next with no reported issues"
* tag 'usb-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: Fix failure to detect ring expansion need.
usb: port: Don't try to peer unused USB ports based on location
usb: gadget: ncm: Fix handling of zero block length packets
usb: typec: altmodes/displayport: create sysfs nodes as driver's default device attribute group
usb: typec: tpcm: Fix PORT_RESET behavior for self powered devices
usb: typec: ucsi: fix UCSI on SM8550 & SM8650 Qualcomm devices
USB: usb-storage: Prevent divide-by-0 error in isd200_ata_command
thunderbolt: Fix NULL pointer dereference in tb_port_update_credits()
Pull pin control fixes from Linus Walleij:
- Fix the PM suspend callback in the STM32 ST32MP257 driver to properly
support suspend
- Drop an extraneous reference put in the debugfs code, this was
confusing the reference counts and causing unsolicited calls to
__free()
* tag 'pinctrl-v6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: don't put the reference to GPIO device in pinctrl_pins_show()
pinctrl: stm32: fix PM support for stm32mp257
Pull input updates from Dmitry Torokhov:
- a revert of endpoint checks in bcm5974 - the driver is being naughty
and pokes at unclaimed USB interface, so the check fails. We need to
fix the driver to claim both interfaces, and then re-implement the
endpoints check
- a fix to Synaptics RMI driver to avoid UAF on driver unload or device
unbinding
- a few new VID/PIDs added to xpad game controller driver
- a change to gpio_keys_polled driver to quiet it when GPIO causes
probe deferral.
* tag 'input-for-v6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: synaptics-rmi4 - fix UAF of IRQ domain on driver removal
Input: gpio_keys_polled - suppress deferred probe error for gpio
Revert "Input: bcm5974 - check endpoint type before starting traffic"
Input: xpad - add additional HyperX Controller Identifiers
Pull sound fixes from Takashi Iwai:
"A collection of small fixes. Half of them are HD-audio quirks while
the rest are various device-specific ASoC fixes"
* tag 'sound-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: wm8962: Fix up incorrect error message in wm8962_set_fll
ASoC: wm8962: Enable both SPKOUTR_ENA and SPKOUTL_ENA in mono mode
ASoC: wm8962: Enable oscillator if selecting WM8962_FLL_OSC
ASoC: dt-bindings: nvidia: Fix 'lge' vendor prefix
ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook
ASoC: amd: yc: Add HP Pavilion Aero Laptop 13-be2xxx(8BD6) into DMI quirk table
ASoC: rcar: adg: correct TIMSEL setting for SSI9
ALSA: hda: cs35l41: Overwrite CS35L41 configuration for ASUS UM5302LA
ALSA: hda/realtek: Add quirks for Lenovo Thinkbook 16P laptops
ALSA: hda: cs35l41: Support Lenovo Thinkbook 16P
ALSA: hda/realtek - Add Headset Mic supported Acer NB platform
ALSA: hda: optimize the probe codec process
ALSA: hda/realtek - Fix headset Mic no show at resume back for Lenovo ALC897 platform
ASoC: Intel: bytcr_rt5640: Add an extra entry for the Chuwi Vi8 tablet
ASoC: madera: Fix typo in madera_set_fll_clks shift value
Pull drm fixes from Dave Airlie:
"Regular fixes (two weeks for i915), scattered across drivers, amdgpu
and i915 being the main ones, with nouveau having a couple of fixes.
One patch got applied for udl, but reverted soon after as the
maintainer has missed some crucial prior discussion.
Seems quiet and normal enough for this stage.
MAINTAINERS
- update email address
core:
- fix polling in certain configurations
buddy:
- fix kunit test warning
panel:
- boe-tv101wum-nl6: timing tuning fixes
i915:
- Fix to extract HDCP information from primary connector
- Check for NULL mmu_interval_notifier before removing
- Fix for #10184: Kernel crash on UHD Graphics 730 (Cc stable)
- Fix for #10284: Boot delay regresion with PSR
- Fix DP connector DSC HW state readout
- Selftest fix to convert msecs to jiffies
xe:
- error path fix
amdgpu:
- SMU14 fix
- Fix possible NULL pointer
- VRR fix
- pwm fix
nouveau:
- fix deadlock in new ioctls fail path
- fix missing locking around object rbtree
udl:
- apply and revert format change"
* tag 'drm-fixes-2024-03-08' of https://gitlab.freedesktop.org/drm/kernel: (21 commits)
nouveau: lock the client object tree.
drm/tests/buddy: fix print format
drm/xe: Return immediately on tile_init failure
drm/amdgpu/pm: Fix the error of pwm1_enable setting
drm/amd/display: handle range offsets in VRR ranges
drm/amd/display: check dc_link before dereferencing
drm/amd/swsmu: modify the gfx activity scaling
Revert "drm/udl: Add ARGB8888 as a format"
drm/i915/panelreplay: Move out psr_init_dpcd() from init_connector()
drm/i915/dp: Fix connector DSC HW state readout
drm/i915/selftests: Fix dependency of some timeouts on HZ
drm/udl: Add ARGB8888 as a format
drm/nouveau: fix stale locked mutex in nouveau_gem_ioctl_pushbuf
drm/i915: Don't explode when the dig port we don't have an AUX CH
MAINTAINERS: Update email address for Tvrtko Ursulin
drm/panel: boe-tv101wum-nl6: Fine tune Himax83102-j02 panel HFP and HBP (again)
drm: Fix output poll work for drm_kms_helper_poll=n
drm/i915: Check before removing mm notifier
drm/i915/hdcp: Extract hdcp structure from correct connector
drm/i915/hdcp: Remove additional timing for reading mst hdcp message
...
When the i2c error condition occurred and master state was not
idle, the master irq function will goto complete state without any
other interrupt handling. It would cause dummy irq expected print.
Under this condition, assign the irq_status into irq_handle.
For example, when the abnormal start / stop occurred (bit 5) with
normal stop status (bit 4) at same time. Then the normal stop status
would not be handled and it would cause irq expected print in
the aspeed_i2c_bus_irq.
...
aspeed-i2c-bus x. i2c-bus: irq handled != irq.
Expected 0x00000030, but was 0x00000020
...
Fixes: 3e9efc3299 ("i2c: aspeed: Handle master/slave combined irq events properly")
Cc: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
Signed-off-by: Tommy Huang <tommy_huang@aspeedtech.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
wmt_i2c_reset_hardware() calls clk_prepare_enable(). So, should an error
occur after it, it should be undone by a corresponding
clk_disable_unprepare() call, as already done in the remove function.
Fixes: 560746eb79 ("i2c: vt8500: Add support for I2C bus on Wondermedia SoCs")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
If registering the platform device fails, the lookup table is
removed in the error path. On module removal we would try to
remove the lookup table again. Fix this by setting priv->lookup
only if registering the platform device was successful.
In addition free the memory allocated for the lookup table in
the error path.
Fixes: d308dfbf62 ("i2c: mux/i801: Switch to use descriptor passing")
Cc: stable@vger.kernel.org
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
i801_probe_optional_slaves() is called before i801_add_mux().
This results in mux_pdev being checked before it's set by
i801_add_mux(). Fix this by changing the order of the calls.
I consider this safe as I see no dependencies.
Fixes: 80e56b86b5 ("i2c: i801: Simplify class-based client device instantiation")
Cc: stable@vger.kernel.org
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
ASoC: Fixes for v6.8
Some more driver specific fixes for v6.8, plus one new x86 platform
quirk. All good fixes to have if you have systems that use the relevant
hardware.
Pull misc fixes from Andrew Morton:
"6 hotfixes. 4 are cc:stable and the remainder pertain to post-6.7
issues or aren't considered to be needed in earlier kernel versions"
* tag 'mm-hotfixes-stable-2024-03-07-16-17' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
scripts/gdb/symbols: fix invalid escape sequence warning
mailmap: fix Kishon's email
init/Kconfig: lower GCC version check for -Warray-bounds
mm, mmap: fix vma_merge() case 7 with vma_ops->close
mm: userfaultfd: fix unexpected change to src_folio when UFFDIO_MOVE fails
mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations
Calling irq_domain_remove() will lead to freeing the IRQ domain
prematurely. The domain is still referenced and will be attempted to get
used via rmi_free_function_list() -> rmi_unregister_function() ->
irq_dispose_mapping() -> irq_get_irq_data()'s ->domain pointer.
With PaX's MEMORY_SANITIZE this will lead to an access fault when
attempting to dereference embedded pointers, as in Torsten's report that
was faulting on the 'domain->ops->unmap' test.
Fix this by releasing the IRQ domain only after all related IRQs have
been deactivated.
Fixes: 24d28e4f12 ("Input: synaptics-rmi4 - convert irq distribution to irq_domain")
Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Link: https://lore.kernel.org/r/20240222142654.856566-1-minipli@grsecurity.net
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Pull spi fix from Mark Brown:
"One small fix for the newly added cs42l43 driver which would have
caused it problems working in some system configurations by needlessly
restricting chip select configurations"
* tag 'spi-fix-v6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: cs42l43: Don't limit native CS to the first chip select
Pull regulator fixes from Mark Brown:
"A couple of small fixes for the rk808 driver, the regulator voltage
configurations were incorrectly described.
The changes are not expected to have practical impact but given that
we're dealing with power it's generally better to follow the hardware
specification as closely as we can to avoid unexpected stresses"
* tag 'regulator-fix-v6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: rk808: fix LDO range on RK806
regulator: rk808: fix buck range on RK806
Pull arm64 fix from Will Deacon:
"A lonely arm64 fix addressing a kprobes regression that we introduced
during the merge window:
- Fix recursive kprobes regression when probing the stack unwinder"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: prohibit probing on arch_kunwind_consume_entry()
Pull erofs fixes from Gao Xiang:
"The main one is a KMSAN fix which addresses an issue introduced in
this cycle so it'd be much better to fix before releasing, and the
remaining one fixes VMA alignment for THP.
Summary:
- Fix a KMSAN uninit-value issue triggered by a crafted image
- Fix VMA alignment for memory mapped files on THP"
* tag 'erofs-for-6.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: apply proper VMA alignment for memory mapped files on THP
erofs: fix uninitialized page cache reported by KMSAN
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf, ipsec and netfilter.
No solution yet for the stmmac issue mentioned in the last PR, but it
proved to be a lockdep false positive, not a blocker.
Current release - regressions:
- dpll: move all dpll<>netdev helpers to dpll code, fix build
regression with old compilers
Current release - new code bugs:
- page_pool: fix netlink dump stop/resume
Previous releases - regressions:
- bpf: fix verifier to check bpf_func_state->callback_depth when
pruning states as otherwise unsafe programs could get accepted
- ipv6: avoid possible UAF in ip6_route_mpath_notify()
- ice: reconfig host after changing MSI-X on VF
- mlx5:
- e-switch, change flow rule destination checking
- add a memory barrier to prevent a possible null-ptr-deref
- switch to using _bh variant of of spinlock where needed
Previous releases - always broken:
- netfilter: nf_conntrack_h323: add protection for bmp length out of
range
- bpf: fix to zero-initialise xdp_rxq_info struct before running XDP
program in CPU map which led to random xdp_md fields
- xfrm: fix UDP encapsulation in TX packet offload
- netrom: fix data-races around sysctls
- ice:
- fix potential NULL pointer dereference in ice_bridge_setlink()
- fix uninitialized dplls mutex usage
- igc: avoid returning frame twice in XDP_REDIRECT
- i40e: disable NAPI right after disabling irqs when handling
xsk_pool
- geneve: make sure to pull inner header in geneve_rx()
- sparx5: fix use after free inside sparx5_del_mact_entry
- dsa: microchip: fix register write order in ksz8_ind_write8()
Misc:
- selftests: mptcp: fixes for diag.sh"
* tag 'net-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
net: pds_core: Fix possible double free in error handling path
netrom: Fix data-races around sysctl_net_busy_read
netrom: Fix a data-race around sysctl_netrom_link_fails_count
netrom: Fix a data-race around sysctl_netrom_routing_control
netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout
netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size
netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay
netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
netrom: Fix a data-race around sysctl_netrom_transport_timeout
netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser
netrom: Fix a data-race around sysctl_netrom_default_path_quality
netfilter: nf_conntrack_h323: Add protection for bmp length out of range
netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout
netfilter: nft_ct: fix l3num expectations with inet pseudo family
netfilter: nf_tables: reject constant set with timeout
netfilter: nf_tables: disallow anonymous set with timeout flag
net/rds: fix WARNING in rds_conn_connect_if_down
net: dsa: microchip: fix register write order in ksz8_ind_write8()
...
When auxiliary_device_add() returns error and then calls
auxiliary_device_uninit(), Callback function pdsc_auxbus_dev_release
calls kfree(padev) to free memory. We shouldn't call kfree(padev)
again in the error handling path.
Fix this by cleaning up the redundant kfree() and putting
the error handling back to where the errors happened.
Fixes: 4569cce43b ("pds_core: add auxiliary_bus devices")
Signed-off-by: Yongzhi Liu <hyperlyzcs@gmail.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240306105714.20597-1-hyperlyzcs@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains fixes for net:
Patch #1 disallows anonymous sets with timeout, except for dynamic sets.
Anonymous sets with timeouts using the pipapo set backend makes
no sense from userspace perspective.
Patch #2 rejects constant sets with timeout which has no practical usecase.
This kind of set, once bound, contains elements that expire but
no new elements can be added.
Patch #3 restores custom conntrack expectations with NFPROTO_INET,
from Florian Westphal.
Patch #4 marks rhashtable anonymous set with timeout as dead from the
commit path to avoid that async GC collects these elements. Rules
that refers to the anonymous set get released with no mutex held
from the commit path.
Patch #5 fixes a UBSAN shift overflow in H.323 conntrack helper,
from Lena Wang.
netfilter pull request 24-03-07
* tag 'nf-24-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_conntrack_h323: Add protection for bmp length out of range
netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout
netfilter: nft_ct: fix l3num expectations with inet pseudo family
netfilter: nf_tables: reject constant set with timeout
netfilter: nf_tables: disallow anonymous set with timeout flag
====================
Link: https://lore.kernel.org/r/20240307021545.149386-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jason Xing says:
====================
netrom: Fix all the data-races around sysctls
As the title said, in this patchset I fix the data-race issues because
the writer and the reader can manipulate the same value concurrently.
====================
Link: https://lore.kernel.org/r/20240304082046.64977-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We need to protect the reader reading the sysctl value
because the value can be changed concurrently.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We need to protect the reader reading sysctl_netrom_default_path_quality
because the value can be changed concurrently.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Steffen Klassert says:
====================
pull request (net): ipsec 2024-03-06
1) Clear the ECN bits flowi4_tos in decode_session4().
This was already fixed but the bug was reintroduced
when decode_session4() switched to us the flow dissector.
From Guillaume Nault.
2) Fix UDP encapsulation in the TX path with packet offload mode.
From Leon Romanovsky,
3) Avoid clang fortify warning in copy_to_user_tmpl().
From Nathan Chancellor.
4) Fix inter address family tunnel in packet offload mode.
From Mike Yu.
* tag 'ipsec-2024-03-06' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: set skb control buffer based on packet offload as well
xfrm: fix xfrm child route lookup for packet offload
xfrm: Avoid clang fortify warning in copy_to_user_tmpl()
xfrm: Pass UDP encapsulation in TX packet offload
xfrm: Clear low order bits of ->flowi4_tos in decode_session4().
====================
Link: https://lore.kernel.org/r/20240306100438.3953516-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf 2024-03-06
We've added 5 non-merge commits during the last 1 day(s) which contain
a total of 5 files changed, 77 insertions(+), 4 deletions(-).
The main changes are:
1) Fix BPF verifier to check bpf_func_state->callback_depth when pruning
states as otherwise unsafe programs could get accepted,
from Eduard Zingerman.
2) Fix to zero-initialise xdp_rxq_info struct before running XDP program in
CPU map which led to random xdp_md fields, from Toke Høiland-Jørgensen.
3) Fix bonding XDP feature flags calculation when bonding device has no
slave devices anymore, from Daniel Borkmann.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
cpumap: Zero-initialise xdp_rxq_info struct before running XDP program
selftests/bpf: Fix up xdp bonding test wrt feature flags
xdp, bonding: Fix feature flags when there are no slave devs anymore
selftests/bpf: test case for callback_depth states pruning logic
bpf: check bpf_func_state->callback_depth when pruning states
====================
Link: https://lore.kernel.org/r/20240306220309.13534-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are mainly two reasons that thp_get_unmapped_area() should be
used for EROFS as other filesystems:
- It's needed to enable PMD mappings as a FSDAX filesystem, see
commit 74d2fad133 ("thp, dax: add thp_get_unmapped_area for pmd
mappings");
- It's useful together with large folios and
CONFIG_READ_ONLY_THP_FOR_FS which enable THPs for mmapped files
(e.g. shared libraries) even without FSDAX. See commit 1854bc6e24
("mm/readahead: Align file mappings for non-DAX").
Fixes: 06252e9ce0 ("erofs: dax support for non-tailpacking regular file")
Fixes: ce529cc25b ("erofs: enable large folios for iomap mode")
Fixes: e6687b8922 ("erofs: enable large folios for fscache mode")
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240306053138.2240206-1-hsiangkao@linux.alibaba.com
UBSAN load reports an exception of BRK#5515 SHIFT_ISSUE:Bitwise shifts
that are out of bounds for their data type.
vmlinux get_bitmap(b=75) + 712
<net/netfilter/nf_conntrack_h323_asn1.c:0>
vmlinux decode_seq(bs=0xFFFFFFD008037000, f=0xFFFFFFD008037018, level=134443100) + 1956
<net/netfilter/nf_conntrack_h323_asn1.c:592>
vmlinux decode_choice(base=0xFFFFFFD0080370F0, level=23843636) + 1216
<net/netfilter/nf_conntrack_h323_asn1.c:814>
vmlinux decode_seq(f=0xFFFFFFD0080371A8, level=134443500) + 812
<net/netfilter/nf_conntrack_h323_asn1.c:576>
vmlinux decode_choice(base=0xFFFFFFD008037280, level=0) + 1216
<net/netfilter/nf_conntrack_h323_asn1.c:814>
vmlinux DecodeRasMessage() + 304
<net/netfilter/nf_conntrack_h323_asn1.c:833>
vmlinux ras_help() + 684
<net/netfilter/nf_conntrack_h323_main.c:1728>
vmlinux nf_confirm() + 188
<net/netfilter/nf_conntrack_proto.c:137>
Due to abnormal data in skb->data, the extension bitmap length
exceeds 32 when decoding ras message then uses the length to make
a shift operation. It will change into negative after several loop.
UBSAN load could detect a negative shift as an undefined behaviour
and reports exception.
So we add the protection to avoid the length exceeding 32. Or else
it will return out of range error and stop decoding.
Fixes: 5e35941d99 ("[NETFILTER]: Add H.323 conntrack/NAT helper")
Signed-off-by: Lena Wang <lena.wang@mediatek.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
While the rhashtable set gc runs asynchronously, a race allows it to
collect elements from anonymous sets with timeouts while it is being
released from the commit path.
Mingi Cho originally reported this issue in a different path in 6.1.x
with a pipapo set with low timeouts which is not possible upstream since
7395dfacff ("netfilter: nf_tables: use timestamp to check for set
element timeout").
Fix this by setting on the dead flag for anonymous sets to skip async gc
in this case.
According to 08e4c8c591 ("netfilter: nf_tables: mark newset as dead on
transaction abort"), Florian plans to accelerate abort path by releasing
objects via workqueue, therefore, this sets on the dead flag for abort
path too.
Cc: stable@vger.kernel.org
Fixes: 5f68718b34 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Reported-by: Mingi Cho <mgcho.minic@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Following is rejected but should be allowed:
table inet t {
ct expectation exp1 {
[..]
l3proto ip
Valid combos are:
table ip t, l3proto ip
table ip6 t, l3proto ip6
table inet t, l3proto ip OR l3proto ip6
Disallow inet pseudeo family, the l3num must be a on-wire protocol known
to conntrack.
Retain NFPROTO_INET case to make it clear its rejected
intentionally rather as oversight.
Fixes: 8059918a13 ("netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This set combination is weird: it allows for elements to be
added/deleted, but once bound to the rule it cannot be updated anymore.
Eventually, all elements expire, leading to an empty set which cannot
be updated anymore. Reject this flags combination.
Cc: stable@vger.kernel.org
Fixes: 761da2935d ("netfilter: nf_tables: add set timeout API support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Anonymous sets are never used with timeout from userspace, reject this.
Exception to this rule is NFT_SET_EVAL to ensure legacy meters still work.
Cc: stable@vger.kernel.org
Fixes: 761da2935d ("netfilter: nf_tables: add set timeout API support")
Reported-by: lonial con <kongln9170@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Limit the max print event of trace_marker to just 4K string size. This must
also be less than the amount that can be held by a trace_seq along with
the text that is before the output (like the task name, PID, CPU, state,
etc). As trace_seq is made to handle large events (some greater than 4K).
Make the max size of a trace_marker write event be 4K which is guaranteed
to fit in the trace_seq buffer.
Link: https://lore.kernel.org/linux-trace-kernel/20240304223433.4ba47dff@gandalf.local.home
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The trace_seq buffer is used to print out entire events. It's typically
set to PAGE_SIZE * 2 as there's some events that can be quite large.
As a side effect, writes to trace_marker is limited by both the size of the
trace_seq buffer as well as the ring buffer's sub-buffer size (which is a
power of PAGE_SIZE). By limiting the trace_seq size, it also limits the
size of the largest string written to trace_marker.
trace_seq does not need to be dependent on PAGE_SIZE like the ring buffer
sub-buffers need to be. Hard code it to 8K which is PAGE_SIZE * 2 on most
architectures. This will also limit the size of trace_marker on those
architectures with greater than 4K PAGE_SIZE.
Link: https://lore.kernel.org/all/20240302111244.3a1674be@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20240304191342.56fb1087@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
As the chip selects can be configured through ACPI/OF/swnode, and
the set_cs() callback will only be called when a native chip select
is being used, there is no reason for the driver to only support the
native chip select as the first chip select. Remove the check that
introduces this limitation.
Fixes: ef75e76716 ("spi: cs42l43: Add SPI controller support")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://msgid.link/r/20240306161004.2205113-1-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull vfs fixes from Christian Brauner:
- Get rid of copy_mc flag in iov_iter which really only makes sense for
the core dumping code so move it out of the generic iov iter code and
make it coredump's problem. See the detailed commit description.
- Revert fs/aio: Make io_cancel() generate completions again
The initial fix here was predicated on the assumption that calling
ki_cancel() didn't complete aio requests. However, that turned out to
be wrong since the two drivers that actually make use of this set a
cancellation function that performs the cancellation correctly. So
revert this change.
- Ensure that the test for IOCB_AIO_RW always happens before the read
from ki_ctx.
* tag 'vfs-6.8-release.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
iov_iter: get rid of 'copy_mc' flag
fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion
Revert "fs/aio: Make io_cancel() generate completions again"
Pull ARM SoC fixes from Arnd Bergmann:
"These should be the final fixes for the soc tree for 6.8, as usual
they mostly deal wtih dts files:
- Qualcomm fixes for pcie4 on sc8280xp, a revert of msm8996 mpm
support, sm6115 interconnect and sm8650 gpio.
- Two fixes for Tegra234 ethernet
- A Makefile fix to actually build the allwinner based orange pi zero
2w device tree
- Fixes for clocks and reset on imx8mp and a DSI display regression
on imx7.
The non-DT fixes are:
- Firmware fixes addressing a kernel panic in op-tee and a minor
regression in microchip/riscv.
- A defconfig change to bring back backlight support after a Kconfig
change"
* tag 'arm-fixes-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
firmware: microchip: Fix over-requested allocation size
tee: optee: Fix kernel panic caused by incorrect error handling
Revert "arm64: dts: qcom: msm8996: Hook up MPM"
arm64: dts: qcom: sc8280xp-x13s: limit pcie4 link speed
arm64: dts: qcom: sc8280xp-crd: limit pcie4 link speed
arm64: dts: imx8mp: Fix LDB clocks property
arm64: dts: imx8mp: Fix TC9595 reset GPIO on DH i.MX8M Plus DHCOM SoM
MAINTAINERS: Use a proper mailinglist for NXP i.MX development
ARM: dts: imx7: remove DSI port endpoints
arm64: dts: allwinner: h616: Add Orange Pi Zero 2W to Makefile
ARM: imx_v6_v7_defconfig: Restore CONFIG_BACKLIGHT_CLASS_DEVICE
arm64: tegra: Fix Tegra234 MGBE power-domains
arm64: tegra: Set the correct PHY mode for MGBE
arm64: dts: qcom: sm6115: Fix missing interconnect-names
arm64: dts: qcom: sm8650-mtp: add gpio74 as reserved gpio
arm64: dts: qcom: sm8650-qrd: add gpio74 as reserved gpio
Pull crypto fixes from Herbert Xu:
"Fix potential use-after-frees in rk3288 and sun8i-ce"
* tag 'v6.8-p6' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: rk3288 - Fix use after free in unprepare
crypto: sun8i-ce - Fix use after free in unprepare
Due to a long-standing issue in driver core, drivers may not probe defer
after having registered child devices to avoid triggering a probe
deferral loop (see fbc35b45f9 ("Add documentation on meaning of
-EPROBE_DEFER")).
Move registration of the typec switch to after looking up clocks and
other resources.
Note that PHY creation can in theory also trigger a probe deferral when
a 'phy' supply is used. This does not seem to affect the QMP PHY driver
but the PHY subsystem should be reworked to address this (i.e. by
separating initialisation and registration of the PHY).
Fixes: 2851117f8f ("phy: qcom-qmp-combo: Introduce orientation switching")
Cc: stable@vger.kernel.org # 6.5
Cc: Bjorn Andersson <quic_bjorande@quicinc.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Acked-by: Vinod Koul <vkoul@kernel.org>
Acked-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20240217150228.5788-7-johan+linaro@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Due to a long-standing issue in driver core, drivers may not probe defer
after having registered child devices to avoid triggering a probe
deferral loop (see fbc35b45f9 ("Add documentation on meaning of
-EPROBE_DEFER")).
This could potentially also trigger a bug in the DRM bridge
implementation which does not expect bridges to go away even if device
links may avoid triggering this (when enabled).
Move registration of the DRM aux bridge to after looking up clocks and
other resources.
Note that PHY creation can in theory also trigger a probe deferral when
a 'phy' supply is used. This does not seem to affect the QMP PHY driver
but the PHY subsystem should be reworked to address this (i.e. by
separating initialisation and registration of the PHY).
Fixes: 35921910bb ("phy: qcom: qmp-combo: switch to DRM_AUX_BRIDGE")
Fixes: 1904c3f578 ("phy: qcom-qmp-combo: Introduce drm_bridge")
Cc: stable@vger.kernel.org # 6.5
Cc: Bjorn Andersson <quic_bjorande@quicinc.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Acked-by: Vinod Koul <vkoul@kernel.org>
Acked-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20240217150228.5788-6-johan+linaro@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Commit 5a95f1ded2 ("firewire: ohci: use devres for requested IRQ")
also removed the call to free_irq() in pci_remove(), leading to a
leftover irq of devm_request_irq() at pci_disable_msi() in pci_remove()
when unbinding the driver from the device
remove_proc_entry: removing non-empty directory 'irq/136', leaking at
least 'firewire_ohci'
Call Trace:
? remove_proc_entry+0x19c/0x1c0
? __warn+0x81/0x130
? remove_proc_entry+0x19c/0x1c0
? report_bug+0x171/0x1a0
? console_unlock+0x78/0x120
? handle_bug+0x3c/0x80
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? remove_proc_entry+0x19c/0x1c0
unregister_irq_proc+0xf4/0x120
free_desc+0x3d/0xe0
? kfree+0x29f/0x2f0
irq_free_descs+0x47/0x70
msi_domain_free_locked.part.0+0x19d/0x1d0
msi_domain_free_irqs_all_locked+0x81/0xc0
pci_free_msi_irqs+0x12/0x40
pci_disable_msi+0x4c/0x60
pci_remove+0x9d/0xc0 [firewire_ohci
01b483699bebf9cb07a3d69df0aa2bee71db1b26]
pci_device_remove+0x37/0xa0
device_release_driver_internal+0x19f/0x200
unbind_store+0xa1/0xb0
remove irq with devm_free_irq() before pci_disable_msi()
also remove it in fail_msi: of pci_probe() as this would lead to
an identical leak
Cc: stable@vger.kernel.org
Fixes: 5a95f1ded2 ("firewire: ohci: use devres for requested IRQ")
Signed-off-by: Edmund Raile <edmund.raile@proton.me>
Link: https://lore.kernel.org/r/20240229144723.13047-2-edmund.raile@proton.me
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Third argument of i915_request_wait() accepts a timeout value in jiffies.
Most users pass either a simple HZ based expression, or a result of
msecs_to_jiffies(), or MAX_SCHEDULE_TIMEOUT, or a very small number not
exceeding 4 if applicable as that value. However, there is one user --
intel_selftest_wait_for_rq() -- that passes a WAIT_FOR_RESET_TIME symbol,
defined as a large constant value that most probably represents a desired
timeout in ms. While that usage results in the intended value of timeout
on usual x86_64 kernel configurations, it is not portable across different
architectures and custom kernel configs.
Rename the symbol to clearly indicate intended units and convert it to
jiffies before use.
Fixes: 3a4bfa091c ("drm/i915/selftest: Fix workarounds selftest for GuC submission")
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: Rahul Kumar Singh <rahul.kumar.singh@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240222113347.648945-2-janusz.krzysztofik@linux.intel.com
(cherry picked from commit 6ee3f54b88)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The cursor is no longer initialized in the OSD client, causing the
sparse read state machine to fall into an infinite loop. The cursor
should be initialized in IN_S_PREPARE_SPARSE_DATA state.
[ idryomov: use msg instead of con->in_msg, changelog ]
Link: https://tracker.ceph.com/issues/64607
Fixes: 8e46a2d068 ("libceph: just wait for more data to be available on the socket")
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-03-05 (idpf, ice, i40e, igc, e1000e)
This series contains updates to idpf, ice, i40e, igc and e1000e drivers.
Emil disables local BH on NAPI schedule for proper handling of softirqs
on idpf.
Jake stops reporting of virtchannel RSS option which in unsupported on
ice.
Rand Deeb adds null check to prevent possible null pointer dereference
on ice.
Michal Schmidt moves DPLL mutex initialization to resolve uninitialized
mutex usage for ice.
Jesse fixes incorrect variable usage for calculating Tx stats on ice.
Ivan Vecera corrects logic for firmware equals check on i40e.
Florian Kauer prevents memory corruption for XDP_REDIRECT on igc.
Sasha reverts an incorrect use of FIELD_GET which caused a regression
for Wake on LAN on e1000e.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This flag is only set by one single user: the magical core dumping code
that looks up user pages one by one, and then writes them out using
their kernel addresses (by using a BVEC_ITER).
That actually ends up being a huge problem, because while we do use
copy_mc_to_kernel() for this case and it is able to handle the possible
machine checks involved, nothing else is really ready to handle the
failures caused by the machine check.
In particular, as reported by Tong Tiangen, we don't actually support
fault_in_iov_iter_readable() on a machine check area.
As a result, the usual logic for writing things to a file under a
filesystem lock, which involves doing a copy with page faults disabled
and then if that fails trying to fault pages in without holding the
locks with fault_in_iov_iter_readable() does not work at all.
We could decide to always just make the MC copy "succeed" (and filling
the destination with zeroes), and that would then create a core dump
file that just ignores any machine checks.
But honestly, this single special case has been problematic before, and
means that all the normal iov_iter code ends up slightly more complex
and slower.
See for example commit c9eec08bac ("iov_iter: Don't deal with
iter->copy_mc in memcpy_from_iter_mc()") where David Howells
re-organized the code just to avoid having to check the 'copy_mc' flags
inside the inner iov_iter loops.
So considering that we have exactly one user, and that one user is a
non-critical special case that doesn't actually ever trigger in real
life (Tong found this with manual error injection), the sane solution is
to just decide that the onus on handling the machine check lines on that
user instead.
Ergo, do the copy_mc_to_kernel() in the core dump logic itself, copying
the user data to a stable kernel page before writing it out.
Fixes: f1982740f5 ("iov_iter: Convert iterate*() to inline funcs")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Link: https://lore.kernel.org/r/20240305133336.3804360-1-tongtiangen@huawei.com
Link: https://lore.kernel.org/all/4e80924d-9c85-f13a-722a-6a5d2b1c225a@huawei.com/
Tested-by: David Howells <dhowells@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reported-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Mike Yu says:
====================
In the XFRM stack, whether a packet is forwarded to the IPv4
or IPv6 stack depends on the family field of the matched SA.
This does not completely work for IPsec packet offload in some
scenario, for example, sending an IPv6 packet that will be
encrypted and encapsulated as an IPv4 packet in HW.
Here are the patches to make IPsec packet offload work on the
mentioned scenario.
====================
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
A few more Qualcomm Arm64 DeviceTree fixes for v6.8
This reduces the link speed of the PCIe bus with WiFi-card connected on the
Lenovo ThinkPad X13s and the Qualcomm Compute Reference Device, avoid
link errors and initialization issues reported by users.
It also reverts the enablement of MPM on MSM8996, which is reported to
prevent boards on this platform from booting for some users.
* tag 'qcom-arm64-fixes-for-6.8-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
Revert "arm64: dts: qcom: msm8996: Hook up MPM"
arm64: dts: qcom: sc8280xp-x13s: limit pcie4 link speed
arm64: dts: qcom: sc8280xp-crd: limit pcie4 link speed
Link: https://lore.kernel.org/r/20240306031208.4218-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This bug was noticed while re-implementing parts of the kernel
driver in userspace using spidev. The goal was to enable some
of the errata workarounds that Microchip describes in their
errata sheet [1].
Both the errata sheet and the regular datasheet of e.g. the KSZ8795
imply that you need to do this for indirect register accesses:
- write a 16-bit value to a control register pair (this value
consists of the indirect register table, and the offset inside
the table)
- either read or write an 8-bit value from the data storage
register (indicated by REG_IND_BYTE in the kernel)
The current implementation has the order swapped. It can be
proven, by reading back some indirect register with known content
(the EEE register modified in ksz8_handle_global_errata() is one of
these), that this implementation does not work.
Private discussion with Oleksij Rempel of Pengutronix has revealed
that the workaround was apparantly never tested on actual hardware.
[1] https://ww1.microchip.com/downloads/aemDocuments/documents/OTH/ProductDocuments/Errata/KSZ87xx-Errata-DS80000687C.pdf
Signed-off-by: Tobias Jakobi (Compleo) <tobias.jakobi.compleo@gmail.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Fixes: 7b6e6235b6 ("net: dsa: microchip: ksz8795: handle eee specif erratum")
Link: https://lore.kernel.org/r/20240304154135.161332-1-tobias.jakobi.compleo@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Older versions of GCC really want to know the full definition
of the type involved in rcu_assign_pointer().
struct dpll_pin is defined in a local header, net/core can't
reach it. Move all the netdev <> dpll code into dpll, where
the type is known. Otherwise we'd need multiple function calls
to jump between the compilation units.
This is the same problem the commit under fixes was trying to address,
but with rcu_assign_pointer() not rcu_dereference().
Some of the exports are not needed, networking core can't
be a module, we only need exports for the helpers used by
drivers.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/all/35a869c8-52e8-177-1d4d-e57578b99b6@linux-m68k.org/
Fixes: 640f41ed33 ("dpll: fix build failure due to rcu_dereference_check() on unknown type")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240305013532.694866-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When running an XDP program that is attached to a cpumap entry, we don't
initialise the xdp_rxq_info data structure being used in the xdp_buff
that backs the XDP program invocation. Tobias noticed that this leads to
random values being returned as the xdp_md->rx_queue_index value for XDP
programs running in a cpumap.
This means we're basically returning the contents of the uninitialised
memory, which is bad. Fix this by zero-initialising the rxq data
structure before running the XDP program.
Fixes: 9216477449 ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap")
Reported-by: Tobias Böhm <tobias@aibor.de>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20240305213132.11955-1-toke@redhat.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Commit 9b0ed890ac ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY")
changed the driver from reporting everything as supported before a device
was bonded into having the driver report that no XDP feature is supported
until a real device is bonded as it seems to be more truthful given
eventually real underlying devices decide what XDP features are supported.
The change however did not take into account when all slave devices get
removed from the bond device. In this case after 9b0ed890ac, the driver
keeps reporting a feature mask of 0x77, that is, NETDEV_XDP_ACT_MASK &
~NETDEV_XDP_ACT_XSK_ZEROCOPY whereas it should have reported a feature
mask of 0.
Fix it by resetting XDP feature flags in the same way as if no XDP program
is attached to the bond device. This was uncovered by the XDP bond selftest
which let BPF CI fail. After adjusting the starting masks on the latter
to 0 instead of NETDEV_XDP_ACT_MASK the test passes again together with
this fix.
Fixes: 9b0ed890ac ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Prashant Batra <prbatra.mail@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Message-ID: <20240305090829.17131-1-daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Eduard Zingerman says:
====================
check bpf_func_state->callback_depth when pruning states
This patch-set fixes bug in states pruning logic hit in mailing list
discussion [0]. The details of the fix are in patch #1.
The main idea for the fix belongs to Yonghong Song,
mine contribution is merely in review and test cases.
There are some changes in verification performance:
File Program Insns (DIFF) States (DIFF)
------------------------- ------------- --------------- --------------
pyperf600_bpf_loop.bpf.o on_event +15 (+0.42%) +0 (+0.00%)
strobemeta_bpf_loop.bpf.o on_event +857 (+37.95%) +60 (+38.96%)
xdp_synproxy_kern.bpf.o syncookie_tc +2892 (+30.39%) +109 (+36.33%)
xdp_synproxy_kern.bpf.o syncookie_xdp +2892 (+30.01%) +109 (+36.09%)
(when tested on a subset of selftests identified by
selftests/bpf/veristat.cfg and Cilium bpf object files from [4])
Changelog:
v2 [2] -> v3:
- fixes for verifier.c commit message as suggested by Yonghong;
- patch-set re-rerouted to 'bpf' tree as suggested in [2];
- patch for test_tcp_custom_syncookie is sent separately to 'bpf-next' [3].
- veristat results updated using 'bpf' tree as baseline and clang 16.
v1 [1] -> v2:
- patch #2 commit message updated to better reflect verifier behavior
with regards to checkpoints tree (suggested by Yonghong);
- veristat results added (suggested by Andrii).
[0] https://lore.kernel.org/bpf/9b251840-7cb8-4d17-bd23-1fc8071d8eef@linux.dev/
[1] https://lore.kernel.org/bpf/20240212143832.28838-1-eddyz87@gmail.com/
[2] https://lore.kernel.org/bpf/20240216150334.31937-1-eddyz87@gmail.com/
[3] https://lore.kernel.org/bpf/20240222150300.14909-1-eddyz87@gmail.com/
[4] https://github.com/anakryiko/cilium
====================
Link: https://lore.kernel.org/r/20240222154121.6991-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The test case was minimized from mailing list discussion [0].
It is equivalent to the following C program:
struct iter_limit_bug_ctx { __u64 a; __u64 b; __u64 c; };
static __naked void iter_limit_bug_cb(void)
{
switch (bpf_get_prandom_u32()) {
case 1: ctx->a = 42; break;
case 2: ctx->b = 42; break;
default: ctx->c = 42; break;
}
}
int iter_limit_bug(struct __sk_buff *skb)
{
struct iter_limit_bug_ctx ctx = { 7, 7, 7 };
bpf_loop(2, iter_limit_bug_cb, &ctx, 0);
if (ctx.a == 42 && ctx.b == 42 && ctx.c == 7)
asm volatile("r1 /= 0;":::"r1");
return 0;
}
The main idea is that each loop iteration changes one of the state
variables in a non-deterministic manner. Hence it is premature to
prune the states that have two iterations left comparing them to
states with one iteration left.
E.g. {{7,7,7}, callback_depth=0} can reach state {42,42,7},
while {{7,7,7}, callback_depth=1} can't.
[0] https://lore.kernel.org/bpf/9b251840-7cb8-4d17-bd23-1fc8071d8eef@linux.dev/
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240222154121.6991-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When comparing current and cached states verifier should consider
bpf_func_state->callback_depth. Current state cannot be pruned against
cached state, when current states has more iterations left compared to
cached state. Current state has more iterations left when it's
callback_depth is smaller.
Below is an example illustrating this bug, minimized from mailing list
discussion [0] (assume that BPF_F_TEST_STATE_FREQ is set).
The example is not a safe program: if loop_cb point (1) is followed by
loop_cb point (2), then division by zero is possible at point (4).
struct ctx {
__u64 a;
__u64 b;
__u64 c;
};
static void loop_cb(int i, struct ctx *ctx)
{
/* assume that generated code is "fallthrough-first":
* if ... == 1 goto
* if ... == 2 goto
* <default>
*/
switch (bpf_get_prandom_u32()) {
case 1: /* 1 */ ctx->a = 42; return 0; break;
case 2: /* 2 */ ctx->b = 42; return 0; break;
default: /* 3 */ ctx->c = 42; return 0; break;
}
}
SEC("tc")
__failure
__flag(BPF_F_TEST_STATE_FREQ)
int test(struct __sk_buff *skb)
{
struct ctx ctx = { 7, 7, 7 };
bpf_loop(2, loop_cb, &ctx, 0); /* 0 */
/* assume generated checks are in-order: .a first */
if (ctx.a == 42 && ctx.b == 42 && ctx.c == 7)
asm volatile("r0 /= 0;":::"r0"); /* 4 */
return 0;
}
Prior to this commit verifier built the following checkpoint tree for
this example:
.------------------------------------- Checkpoint / State name
| .-------------------------------- Code point number
| | .---------------------------- Stack state {ctx.a,ctx.b,ctx.c}
| | | .------------------- Callback depth in frame #0
v v v v
- (0) {7P,7P,7},depth=0
- (3) {7P,7P,7},depth=1
- (0) {7P,7P,42},depth=1
- (3) {7P,7,42},depth=2
- (0) {7P,7,42},depth=2 loop terminates because of depth limit
- (4) {7P,7,42},depth=0 predicted false, ctx.a marked precise
- (6) exit
(a) - (2) {7P,7,42},depth=2
- (0) {7P,42,42},depth=2 loop terminates because of depth limit
- (4) {7P,42,42},depth=0 predicted false, ctx.a marked precise
- (6) exit
(b) - (1) {7P,7P,42},depth=2
- (0) {42P,7P,42},depth=2 loop terminates because of depth limit
- (4) {42P,7P,42},depth=0 predicted false, ctx.{a,b} marked precise
- (6) exit
- (2) {7P,7,7},depth=1 considered safe, pruned using checkpoint (a)
(c) - (1) {7P,7P,7},depth=1 considered safe, pruned using checkpoint (b)
Here checkpoint (b) has callback_depth of 2, meaning that it would
never reach state {42,42,7}.
While checkpoint (c) has callback_depth of 1, and thus
could yet explore the state {42,42,7} if not pruned prematurely.
This commit makes forbids such premature pruning,
allowing verifier to explore states sub-tree starting at (c):
(c) - (1) {7,7,7P},depth=1
- (0) {42P,7,7P},depth=1
...
- (2) {42,7,7},depth=2
- (0) {42,42,7},depth=2 loop terminates because of depth limit
- (4) {42,42,7},depth=0 predicted true, ctx.{a,b,c} marked precise
- (5) division by zero
[0] https://lore.kernel.org/bpf/9b251840-7cb8-4d17-bd23-1fc8071d8eef@linux.dev/
Fixes: bb124da69c ("bpf: keep track of max number of bpf_loop callback iterations")
Suggested-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240222154121.6991-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull cgroup fixes from Tejun Heo:
"Two cpuset fixes. Both are for bugs in error handling paths and low
risk"
* tag 'cgroup-for-6.8-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Fix retval in update_cpumask()
cgroup/cpuset: Fix a memory leak in update_exclusive_cpumask()
Pull integrity fix from Mimi Zohar:
"A single fix to eliminate an unnecessary message"
* tag 'integrity-v6.8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
integrity: eliminate unnecessary "Problem loading X.509 certificate" msg
Pull x86 platform driver fixes from Hans de Goede:
- Fix P2SB regression causing ACPI errors and high CPU load
- Fix error return path in amd_pmf_init_smart_pc()
* tag 'platform-drivers-x86-v6.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/amd/pmf: Fix missing error code in amd_pmf_init_smart_pc()
platform/x86: p2sb: On Goldmont only cache P2SB and SPI devfn BAR
Pull hyperv fixes from Wei Liu:
- Multiple fixes, cleanups and documentations for Hyper-V core code and
drivers
* tag 'hyperv-fixes-signed-20240303' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Drivers: hv: vmbus: make hv_bus const
x86/hyperv: Allow 15-bit APIC IDs for VTL platforms
x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad()
x86/mm: Regularize set_memory_p() parameters and make non-static
x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback
Documentation: hyperv: Add overview of PCI pass-thru device support
Drivers: hv: vmbus: Update indentation in create_gpadl_header()
Drivers: hv: vmbus: Remove duplication and cleanup code in create_gpadl_header()
fbdev/hyperv_fb: Fix logic error for Gen2 VMs in hvfb_getmem()
Drivers: hv: vmbus: Calculate ring buffer size for more efficient use of memory
hv_utils: Allow implicit ICTIMESYNCFLAG_SYNC
Refactoring of the field get conversion introduced a regression in the
legacy Wake On Lan from a magic packet with i219 devices. Rx address
copied not correctly from MAC to PHY with FIELD_GET macro.
Fixes: b9a4525450 ("intel: legacy: field get conversion")
Suggested-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When a frame can not be transmitted in XDP_REDIRECT
(e.g. due to a full queue), it is necessary to free
it by calling xdp_return_frame_rx_napi.
However, this is the responsibility of the caller of
the ndo_xdp_xmit (see for example bq_xmit_all in
kernel/bpf/devmap.c) and thus calling it inside
igc_xdp_xmit (which is the ndo_xdp_xmit of the igc
driver) as well will lead to memory corruption.
In fact, bq_xmit_all expects that it can return all
frames after the last successfully transmitted one.
Therefore, break for the first not transmitted frame,
but do not call xdp_return_frame_rx_napi in igc_xdp_xmit.
This is equally implemented in other Intel drivers
such as the igb.
There are two alternatives to this that were rejected:
1. Return num_frames as all the frames would have been
transmitted and release them inside igc_xdp_xmit.
While it might work technically, it is not what
the return value is meant to represent (i.e. the
number of SUCCESSFULLY transmitted packets).
2. Rework kernel/bpf/devmap.c and all drivers to
support non-consecutively dropped packets.
Besides being complex, it likely has a negative
performance impact without a significant gain
since it is anyway unlikely that the next frame
can be transmitted if the previous one was dropped.
The memory corruption can be reproduced with
the following script which leads to a kernel panic
after a few seconds. It basically generates more
traffic than a i225 NIC can transmit and pushes it
via XDP_REDIRECT from a virtual interface to the
physical interface where frames get dropped.
#!/bin/bash
INTERFACE=enp4s0
INTERFACE_IDX=`cat /sys/class/net/$INTERFACE/ifindex`
sudo ip link add dev veth1 type veth peer name veth2
sudo ip link set up $INTERFACE
sudo ip link set up veth1
sudo ip link set up veth2
cat << EOF > redirect.bpf.c
SEC("prog")
int redirect(struct xdp_md *ctx)
{
return bpf_redirect($INTERFACE_IDX, 0);
}
char _license[] SEC("license") = "GPL";
EOF
clang -O2 -g -Wall -target bpf -c redirect.bpf.c -o redirect.bpf.o
sudo ip link set veth2 xdp obj redirect.bpf.o
cat << EOF > pass.bpf.c
SEC("prog")
int pass(struct xdp_md *ctx)
{
return XDP_PASS;
}
char _license[] SEC("license") = "GPL";
EOF
clang -O2 -g -Wall -target bpf -c pass.bpf.c -o pass.bpf.o
sudo ip link set $INTERFACE xdp obj pass.bpf.o
cat << EOF > trafgen.cfg
{
/* Ethernet Header */
0xe8, 0x6a, 0x64, 0x41, 0xbf, 0x46,
0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
const16(ETH_P_IP),
/* IPv4 Header */
0b01000101, 0, # IPv4 version, IHL, TOS
const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header))
const16(2), # IPv4 ident
0b01000000, 0, # IPv4 flags, fragmentation off
64, # IPv4 TTL
17, # Protocol UDP
csumip(14, 33), # IPv4 checksum
/* UDP Header */
10, 0, 1, 1, # IP Src - adapt as needed
10, 0, 1, 2, # IP Dest - adapt as needed
const16(6666), # UDP Src Port
const16(6666), # UDP Dest Port
const16(1008), # UDP length (UDP header 8 bytes + payload length)
csumudp(14, 34), # UDP checksum
/* Payload */
fill('W', 1000),
}
EOF
sudo trafgen -i trafgen.cfg -b3000MB -o veth1 --cpp
Fixes: 4ff3203610 ("igc: Add support for XDP_REDIRECT action")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Helper i40e_is_fw_ver_eq() compares incorrectly given firmware version
as it returns true when the major version of running firmware is
greater than the given major version that is wrong and results in
failure during getting of DCB configuration where this helper is used.
Fix the check and return true only if the running FW version is exactly
equals to the given version.
Reproducer:
1. Load i40e driver
2. Check dmesg output
[root@host ~]# modprobe i40e
[root@host ~]# dmesg | grep 'i40e.*DCB'
[ 74.750642] i40e 0000:02:00.0: Query for DCB configuration failed, err -EIO aq_err I40E_AQ_RC_EINVAL
[ 74.759770] i40e 0000:02:00.0: DCB init failed -5, disabled
[ 74.966550] i40e 0000:02:00.1: Query for DCB configuration failed, err -EIO aq_err I40E_AQ_RC_EINVAL
[ 74.975683] i40e 0000:02:00.1: DCB init failed -5, disabled
Fixes: cf488e1322 ("i40e: Add other helpers to check version of running firmware and AQ API")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The function ice_bridge_setlink() may encounter a NULL pointer dereference
if nlmsg_find_attr() returns NULL and br_spec is dereferenced subsequently
in nla_for_each_nested(). To address this issue, add a check to ensure that
br_spec is not NULL before proceeding with the nested attribute iteration.
Fixes: b1edc14a3f ("ice: Implement ice_bridge_getlink and ice_bridge_setlink")
Signed-off-by: Rand Deeb <rand.sec96@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The E800 series hardware uses the same iAVF driver as older devices,
including the virtchnl negotiation scheme.
This negotiation scheme includes a mechanism to determine what type of RSS
should be supported, including RSS over PF virtchnl messages, RSS over
firmware AdminQ messages, and RSS via direct register access.
The PF driver will always prefer VIRTCHNL_VF_OFFLOAD_RSS_PF if its
supported by the VF driver. However, if an older VF driver is loaded, it
may request only VIRTCHNL_VF_OFFLOAD_RSS_REG or VIRTCHNL_VF_OFFLOAD_RSS_AQ.
The ice driver happily agrees to support these methods. Unfortunately, the
underlying hardware does not support these mechanisms. The E800 series VFs
don't have the appropriate registers for RSS_REG. The mailbox queue used by
VFs for VF to PF communication blocks messages which do not have the
VF-to-PF opcode.
Stop lying to the VF that it could support RSS over AdminQ or registers, as
these interfaces do not work when the hardware is operating on an E800
series device.
In practice this is unlikely to be hit by any normal user. The iAVF driver
has supported RSS over PF virtchnl commands since 2016, and always defaults
to using RSS_PF if possible.
In principle, nothing actually stops the existing VF from attempting to
access the registers or send an AQ command. However a properly coded VF
will check the capability flags and will report a more useful error if it
detects a case where the driver does not support the RSS offloads that it
does.
Fixes: 1071a8358a ("ice: Implement virtchnl commands for AVF support")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Alan Brady <alan.brady@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix softirq's not being handled during napi_schedule() call when
receiving marker packets for queue disable by disabling local bottom
half.
The issue can be seen on ifdown:
NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
Using ftrace to catch the failing scenario:
ifconfig [003] d.... 22739.830624: softirq_raise: vec=3 [action=NET_RX]
<idle>-0 [003] ..s.. 22739.831357: softirq_entry: vec=3 [action=NET_RX]
No interrupt and CPU is idle.
After the patch when disabling local BH before calling napi_schedule:
ifconfig [003] d.... 22993.928336: softirq_raise: vec=3 [action=NET_RX]
ifconfig [003] ..s1. 22993.928337: softirq_entry: vec=3 [action=NET_RX]
Fixes: c2d548cad1 ("idpf: add TX splitq napi poll support")
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Alan Brady <alan.brady@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This patch intended to fix an well-knonw issue in old drivers where the
endpoint type is taken for granted, which is often triggered by fuzzers.
That was the case for this driver [1], and although the fix seems to be
correct, it uncovered another issue that leads to a regression [2], if
the endpoints of the current interface are checked.
The driver makes use of endpoints that belong to a different interface
rather than the one it binds (it binds to the third interface, but also
accesses an endpoint from a different one). The driver should claim the
interfaces it requires, but that is still not the case.
Given that the regression is more severe than the issue found by
syzkaller, the best approach is reverting the patch that causes the
regression, and trying to fix the underlying problem before checking
the endpoint types again.
Note that reverting this patch will probably trigger the syzkaller bug
at some point.
This reverts commit 2b9c3eb32a.
Link: https://syzkaller.appspot.com/bug?extid=348331f63b034f89b622 [1]
Link: https://lore.kernel.org/linux-input/87sf161jjc.wl-tiwai@suse.de/ [2]
Fixes: 2b9c3eb32a ("Input: bcm5974 - check endpoint type before starting traffic")
Reported-by: Jacopo Radice <jacopo.radice@outlook.com>
Closes: https://bugzilla.suse.com/show_bug.cgi?id=1220030
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Link: https://lore.kernel.org/r/20240305-revert_bcm5974_ep_check-v3-1-527198cf6499@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
On Arrow Lake S systems, MEI is no longer strictly connected to bus 0,
while graphics remain exclusively on bus 0. Adapt the component
matching logic to accommodate this change:
Original behavior: Required both MEI and graphics to be on the same
bus 0.
New behavior: Only enforces graphics to be on bus 0 (integrated),
allowing MEI to reside on any bus.
This ensures compatibility with Arrow Lake S and maintains functionality
for the legacy systems.
Fixes: 1dd924f688 ("mei: gsc_proxy: add gsc proxy driver")
Cc: stable@vger.kernel.org # v6.3+
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Link: https://lore.kernel.org/r/20240220200020.231192-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For CMA memory allocation, ownership is assigned to DSP to make it
accessible by the PD running on the DSP. With current implementation
HLOS VM is stored in the channel structure during rpmsg_probe and
this VM is passed to qcom_scm call as the source VM.
The qcom_scm call will overwrite the passed source VM with the next
VM which would cause a problem in case the scm call is again needed.
Adding a local copy of source VM whereever scm call is made to avoid
this problem.
Fixes: 0871561055 ("misc: fastrpc: Add support for audiopd")
Cc: stable <stable@kernel.org>
Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
Reviewed-by: Elliot Berman <quic_eberman@quicinc.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20240224114247.85953-2-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The comedi_test devices have a couple of timers (ai_timer and ao_timer)
that can be started to simulate hardware interrupts. Their expiry
functions normally reschedule the timer. The driver code calls either
del_timer_sync() or del_timer() to delete the timers from the queue, but
does not currently prevent the timers from rescheduling themselves so
synchronized deletion may be ineffective.
Add a couple of boolean members (one for each timer: ai_timer_enable and
ao_timer_enable) to the device private data structure to indicate
whether the timers are allowed to reschedule themselves. Set the member
to true when adding the timer to the queue, and to false when deleting
the timer from the queue in the waveform_ai_cancel() and
waveform_ao_cancel() functions.
The del_timer_sync() function is also called from the waveform_detach()
function, but the timer enable members will already be set to false when
that function is called, so no change is needed there.
Fixes: 403fe7f34e ("staging: comedi: comedi_test: fix timer race conditions")
Cc: stable@vger.kernel.org # 4.4+
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20240214100747.16203-1-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ring expansion checker may incorrectly assume a completely full ring
is empty, missing the need for expansion.
This is due to a special empty ring case where the dequeue ends up
ahead of the enqueue pointer. This is seen when enqueued TRBs fill up
exactly a segment, with enqueue then pointing to the end link TRB.
Once those TRBs are handled the dequeue pointer will follow the link
TRB and end up pointing to the first entry on the next segment, past
the enqueue.
This same enqueue - dequeue condition can be true if a ring is full,
with enqueue ending on that last link TRB before the dequeue pointer
on the next segment.
This can be seen when queuing several ~510 small URBs via usbfs in
one go before a single one is handled (i.e. dequeue not moved from first
entry in segment).
Expand the ring already when enqueue reaches the link TRB before the
dequeue segment, instead of expanding it when enqueue moves into the
dequeue segment.
Reported-by: Chris Yokum <linux-usb@mail.totalphase.com>
Closes: https://lore.kernel.org/all/949223224.833962.1709339266739.JavaMail.zimbra@totalphase.com
Tested-by: Chris Yokum <linux-usb@mail.totalphase.com>
Fixes: f5af638f06 ("xhci: Fix transfer ring expansion size calculation")
Cc: stable@vger.kernel.org # v6.5+
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20240305132312.955171-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 5c7e105cd1.
As identified by KASAN, the simplification done by the cleanup patch
was not legal.
>From tracing through the code, it can be seen that we're transmitting
from a 4096-byte circular buffer. We copy anywhere from 1-4 bytes from
it each time. The simplification runs into trouble when we get near
the end of the circular buffer. For instance, we might start out with
xmit->tail = 4094 and we want to transfer 4 bytes. With the code
before simplification this was no problem. We'd read buf[4094],
buf[4095], buf[0], and buf[1]. With the new code we'll do a
memcpy(&buf[4094], 4) which reads 2 bytes past the end of the buffer
and then skips transmitting what's at buf[0] and buf[1].
KASAN isn't 100% consistent at reporting this for me, but to be extra
confident in the analysis, I added traces of the tail and tx_bytes and
then wrote a test program:
while true; do
echo -n "abcdefghijklmnopqrstuvwxyz0" > /dev/ttyMSM0
sleep .1
done
I watched the traces over SSH and saw:
qcom_geni_serial_send_chunk_fifo: 4093 4
qcom_geni_serial_send_chunk_fifo: 1 3
Which indicated that one byte should be missing. Sure enough the
output that should have been:
abcdefghijklmnopqrstuvwxyz0
In one case was actually missing a byte:
abcdefghijklmnopqrstuvwyz0
Running "ls -al" on large directories also made the missing bytes
obvious since columns didn't line up.
While the original code may not be the most elegant, we only talking
about copying up to 4 bytes here. Let's just go back to the code that
worked.
Fixes: 5c7e105cd1 ("tty: serial: simplify qcom_geni_serial_send_chunk_fifo()")
Cc: stable <stable@kernel.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Jiri Slaby <jirislaby@kernel.org>
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20240304174952.1.I920a314049b345efd1f69d708e7f74d2213d0b49@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the remote uart device is not connected or not enabled after booting
up, the CTS line is high by default. At this time, if we enable the flow
control when opening the device(for example, using “stty -F /dev/ttyLP4
crtscts” command), there will be a pending idle preamble(first writing 0
and then writing 1 to UARTCTRL_TE will queue an idle preamble) that
cannot be sent out, resulting in the uart port fail to close(waiting for
TX empty), so the user space stty will have to wait for a long time or
forever.
This is an LPUART IP bug(idle preamble has higher priority than CTS),
here add a workaround patch to enable TX CTS after enabling UARTCTRL_TE,
so that the idle preamble does not get stuck due to CTS is deasserted.
Fixes: 380c966c09 ("tty: serial: fsl_lpuart: add 32-bit register interface support")
Cc: stable <stable@kernel.org>
Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Link: https://lore.kernel.org/r/20240305015706.1050769-1-sherry.sun@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While connecting to a Linux host with CDC_NCM_NTB_DEF_SIZE_TX
set to 65536, it has been observed that we receive short packets,
which come at interval of 5-10 seconds sometimes and have block
length zero but still contain 1-2 valid datagrams present.
According to the NCM spec:
"If wBlockLength = 0x0000, the block is terminated by a
short packet. In this case, the USB transfer must still
be shorter than dwNtbInMaxSize or dwNtbOutMaxSize. If
exactly dwNtbInMaxSize or dwNtbOutMaxSize bytes are sent,
and the size is a multiple of wMaxPacketSize for the
given pipe, then no ZLP shall be sent.
wBlockLength= 0x0000 must be used with extreme care, because
of the possibility that the host and device may get out of
sync, and because of test issues.
wBlockLength = 0x0000 allows the sender to reduce latency by
starting to send a very large NTB, and then shortening it when
the sender discovers that there’s not sufficient data to justify
sending a large NTB"
However, there is a potential issue with the current implementation,
as it checks for the occurrence of multiple NTBs in a single
giveback by verifying if the leftover bytes to be processed is zero
or not. If the block length reads zero, we would process the same
NTB infintely because the leftover bytes is never zero and it leads
to a crash. Fix this by bailing out if block length reads zero.
Cc: stable@vger.kernel.org
Fixes: 427694cfaa ("usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call")
Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Link: https://lore.kernel.org/r/20240228115441.2105585-1-quic_kriskura@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The DisplayPort driver's sysfs nodes may be present to the userspace before
typec_altmode_set_drvdata() completes in dp_altmode_probe. This means that
a sysfs read can trigger a NULL pointer error by deferencing dp->hpd in
hpd_show or dp->lock in pin_assignment_show, as dev_get_drvdata() returns
NULL in those cases.
Remove manual sysfs node creation in favor of adding attribute group as
default for devices bound to the driver. The ATTRIBUTE_GROUPS() macro is
not used here otherwise the path to the sysfs nodes is no longer compliant
with the ABI.
Fixes: 0e3bb7d689 ("usb: typec: Add driver for DisplayPort alternate mode")
Cc: stable@vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera@google.com>
Link: https://lore.kernel.org/r/20240229001101.3889432-2-rdbabiera@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While commit 69f89168b3 ("usb: typec: tpcm: Fix issues with power being
removed during reset") fixes the boot issues for bus powered devices such
as LibreTech Renegade Elite/Firefly, it trades off the CC pins NOT being
Hi-Zed during errory recovery (i.e PORT_RESET) for devices which are NOT
bus powered(a.k.a self powered). This change Hi-Zs the CC pins only for
self powered devices, thus preventing brown out for bus powered devices
Adhering to spec is gaining more importance due to the Common charger
initiative enforced by the European Union.
Quoting from the spec:
4.5.2.2.2.1 ErrorRecovery State Requirements
The port shall not drive VBUS or VCONN, and shall present a
high-impedance to ground (above zOPEN) on its CC1 and CC2 pins.
Hi-Zing the CC pins is the inteded behavior for PORT_RESET.
CC pins are set to default state after tErrorRecovery in
PORT_RESET_WAIT_OFF.
4.5.2.2.2.2 Exiting From ErrorRecovery State
A Sink shall transition to Unattached.SNK after tErrorRecovery.
A Source shall transition to Unattached.SRC after tErrorRecovery.
Fixes: 69f89168b3 ("usb: typec: tpcm: Fix issues with power being removed during reset")
Cc: stable@vger.kernel.org
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240228000512.746252-1-badhri@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In packet offload, packets are not encrypted in XFRM stack, so
the next network layer which the packets will be forwarded to
should depend on where the packet came from (either xfrm4_output
or xfrm6_output) rather than the matched SA's family type.
Test: verified IPv6-in-IPv4 packets on Android device with
IPsec packet offload enabled
Signed-off-by: Mike Yu <yumike@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In current code, xfrm_bundle_create() always uses the matched
SA's family type to look up a xfrm child route for the skb.
The route returned by xfrm_dst_lookup() will eventually be
used in xfrm_output_resume() (skb_dst(skb)->ops->local_out()).
If packet offload is used, the above behavior can lead to
calling ip_local_out() for an IPv6 packet or calling
ip6_local_out() for an IPv4 packet, which is likely to fail.
This change fixes the behavior by checking if the matched SA
has packet offload enabled. If not, keep the same behavior;
if yes, use the matched SP's family type for the lookup.
Test: verified IPv6-in-IPv4 packets on Android device with
IPsec packet offload enabled
Signed-off-by: Mike Yu <yumike@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The icl+ power well code currently assumes that every AUX power
well maps to an encoder which is using said power well. That is
by no menas guaranteed as we:
- only register encoders for ports declared in the VBT
- combo PHY HDMI-only encoder no longer get an AUX CH since
commit 9856308c94 ("drm/i915: Only populate aux_ch if really needed")
However we have places such as intel_power_domains_sanitize_state()
that blindly traverse all the possible power wells. So these bits
of code may very well encounbter an aux power well with no associated
encoder.
In this particular case the BIOS seems to have left one AUX power
well enabled even though we're dealing with a HDMI only encoder
on a combo PHY. We then proceed to turn off said power well and
explode when we can't find a matching encoder. As a short term fix
we should be able to just skip the PHY related parts of the power
well programming since we know this situation can only happen with
combo PHYs.
Another option might be to go back to always picking an AUX CH for
all encoders. However I'm a bit wary about that since we might in
theory end up conflicting with the VBT AUX CH assignment. Also
that wouldn't help with encoders not declared in the VBT, should
we ever need to poke the corresponding power wells.
Longer term we need to figure out what the actual relationship
is between the PHY vs. AUX CH vs. AUX power well. Currently this
is entirely unclear.
Cc: stable@vger.kernel.org
Fixes: 9856308c94 ("drm/i915: Only populate aux_ch if really needed")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10184
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240223203216.15210-1-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
(cherry picked from commit 6a8c66bf0e)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
On Goldmont p2sb_bar() only ever gets called for 2 devices, the actual P2SB
devfn 13,0 and the SPI controller which is part of the P2SB, devfn 13,2.
But the current p2sb code tries to cache BAR0 info for all of
devfn 13,0 to 13,7 . This involves calling pci_scan_single_device()
for device 13 functions 0-7 and the hw does not seem to like
pci_scan_single_device() getting called for some of the other hidden
devices. E.g. on an ASUS VivoBook D540NV-GQ065T this leads to continuous
ACPI errors leading to high CPU usage.
Fix this by only caching BAR0 info and thus only calling
pci_scan_single_device() for the P2SB and the SPI controller.
Fixes: 5913320eb0 ("platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe")
Reported-by: Danil Rybakov <danilrybakov249@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218531
Tested-by: Danil Rybakov <danilrybakov249@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240304134356.305375-2-hdegoede@redhat.com
Saeed Mahameed says:
====================
mlx5 fixes 2024-03-01
This series provides bug fixes to mlx5 driver.
Please pull and let me know if there is any problem.
* tag 'mlx5-fixes-2024-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Switch to using _bh variant of of spinlock API in port timestamping NAPI poll context
net/mlx5e: Use a memory barrier to enforce PTP WQ xmit submission tracking occurs after populating the metadata_map
net/mlx5e: Fix MACsec state loss upon state update in offload path
net/mlx5e: Change the warning when ignore_flow_level is not supported
net/mlx5: Check capability for fw_reset
net/mlx5: Fix fw reporter diagnose output
net/mlx5: E-switch, Change flow rule destination checking
Revert "net/mlx5e: Check the number of elements before walk TC rhashtable"
Revert "net/mlx5: Block entering switchdev mode with ns inconsistency"
====================
Link: https://lore.kernel.org/r/20240302070318.62997-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-03-01 (ixgbe, i40e, ice)
This series contains updates to ixgbe, i40e, and ice drivers.
Maciej corrects disable flow for ixgbe, i40e, and ice drivers which could
cause non-functional interface with AF_XDP.
Michal restores host configuration when changing MSI-X count for VFs on
ice driver.
* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: reconfig host after changing MSI-X on VF
ice: reorder disabling IRQ and NAPI in ice_qp_dis
i40e: disable NAPI right after disabling irqs when handling xsk_pool
ixgbe: {dis, en}able irqs in ixgbe_txrx_ring_{dis, en}able
====================
Link: https://lore.kernel.org/r/20240301192549.2993798-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Based on the static analyzis of the code it looks like when an entry
from the MAC table was removed, the entry was still used after being
freed. More precise the vid of the mac_entry was used after calling
devm_kfree on the mac_entry.
The fix consists in first using the vid of the mac_entry to delete the
entry from the HW and after that to free it.
Fixes: b37a1bae74 ("net: sparx5: add mactable support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240301080608.3053468-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kishon updated his email in commit e6aa4edd2f ("MAINTAINERS: Update
Kishon's email address in PCI endpoint subsystem").
However, as he is no longer at TI, his TI email now bounces.
Add the same email as he has in MAINTAINERS to a mailmap, so that
get_maintainer.pl will not output an email that bounces.
(This is neeed as get_maintainer.pl will use "git author" to CC
people who have significantly modified the same file as you.)
Link: https://lkml.kernel.org/r/20240229134318.1201935-1-cassel@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Cc: Kishon Vijay Abraham I <kishon@kernel.org>
Cc: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When debugging issues with a workload using SysV shmem, Michal Hocko has
come up with a reproducer that shows how a series of mprotect() operations
can result in an elevated shm_nattch and thus leak of the resource.
The problem is caused by wrong assumptions in vma_merge() commit
714965ca82 ("mm/mmap: start distinguishing if vma can be removed in
mergeability test"). The shmem vmas have a vma_ops->close callback that
decrements shm_nattch, and we remove the vma without calling it.
vma_merge() has thus historically avoided merging vma's with
vma_ops->close and commit 714965ca82 was supposed to keep it that way.
It relaxed the checks for vma_ops->close in can_vma_merge_after() assuming
that it is never called on a vma that would be a candidate for removal.
However, the vma_merge() code does also use the result of this check in
the decision to remove a different vma in the merge case 7.
A robust solution would be to refactor vma_merge() code in a way that the
vma_ops->close check is only done for vma's that are actually going to be
removed, and not as part of the preliminary checks. That would both solve
the existing bug, and also allow additional merges that the checks
currently prevent unnecessarily in some cases.
However to fix the existing bug first with a minimized risk, and for
easier stable backports, this patch only adds a vma_ops->close check to
the buggy case 7 specifically. All other cases of vma removal are covered
by the can_vma_merge_before() check that includes the test for
vma_ops->close.
The reproducer code, adapted from Michal Hocko's code:
int main(int argc, char *argv[]) {
int segment_id;
size_t segment_size = 20 * PAGE_SIZE;
char * sh_mem;
struct shmid_ds shmid_ds;
key_t key = 0x1234;
segment_id = shmget(key, segment_size,
IPC_CREAT | IPC_EXCL | S_IRUSR | S_IWUSR);
sh_mem = (char *)shmat(segment_id, NULL, 0);
mprotect(sh_mem + 2*PAGE_SIZE, PAGE_SIZE, PROT_NONE);
mprotect(sh_mem + PAGE_SIZE, PAGE_SIZE, PROT_WRITE);
mprotect(sh_mem + 2*PAGE_SIZE, PAGE_SIZE, PROT_WRITE);
shmdt(sh_mem);
shmctl(segment_id, IPC_STAT, &shmid_ds);
printf("nattch after shmdt(): %lu (expected: 0)\n", shmid_ds.shm_nattch);
if (shmctl(segment_id, IPC_RMID, 0))
printf("IPCRM failed %d\n", errno);
return (shmid_ds.shm_nattch) ? 1 : 0;
}
Link: https://lkml.kernel.org/r/20240222215930.14637-2-vbabka@suse.cz
Fixes: 714965ca82 ("mm/mmap: start distinguishing if vma can be removed in mergeability test")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
After ptep_clear_flush(), if we find that src_folio is pinned we will fail
UFFDIO_MOVE and put src_folio back to src_pte entry, but the change to
src_folio->{mapping,index} is not restored in this process. This is not
what we expected, so fix it.
This can cause the rmap for that page to be invalid, possibly resulting
in memory corruption. At least swapout+migration would no longer work,
because we might fail to locate the mappings of that folio.
Link: https://lkml.kernel.org/r/20240222080815.46291-1-zhengqi.arch@bytedance.com
Fixes: adef440691 ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sven reports an infinite loop in __alloc_pages_slowpath() for costly order
__GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO. Such combination
can happen in a suspend/resume context where a GFP_KERNEL allocation can
have __GFP_IO masked out via gfp_allowed_mask.
Quoting Sven:
1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER)
with __GFP_RETRY_MAYFAIL set.
2. page alloc's __alloc_pages_slowpath tries to get a page from the
freelist. This fails because there is nothing free of that costly
order.
3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim,
which bails out because a zone is ready to be compacted; it pretends
to have made a single page of progress.
4. page alloc tries to compact, but this always bails out early because
__GFP_IO is not set (it's not passed by the snd allocator, and even
if it were, we are suspending so the __GFP_IO flag would be cleared
anyway).
5. page alloc believes reclaim progress was made (because of the
pretense in item 3) and so it checks whether it should retry
compaction. The compaction retry logic thinks it should try again,
because:
a) reclaim is needed because of the early bail-out in item 4
b) a zonelist is suitable for compaction
6. goto 2. indefinite stall.
(end quote)
The immediate root cause is confusing the COMPACT_SKIPPED returned from
__alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be
indicating a lack of order-0 pages, and in step 5 evaluating that in
should_compact_retry() as a reason to retry, before incrementing and
limiting the number of retries. There are however other places that
wrongly assume that compaction can happen while we lack __GFP_IO.
To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO
evaluation and switch the open-coded test in try_to_compact_pages() to use
it.
Also use the new helper in:
- compaction_ready(), which will make reclaim not bail out in step 3, so
there's at least one attempt to actually reclaim, even if chances are
small for a costly order
- in_reclaim_compaction() which will make should_continue_reclaim()
return false and we don't over-reclaim unnecessarily
- in __alloc_pages_slowpath() to set a local variable can_compact,
which is then used to avoid retrying reclaim/compaction for costly
allocations (step 5) if we can't compact and also to skip the early
compaction attempt that we do in some cases
Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@suse.cz
Fixes: 3250845d05 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Sven van Ashbrook <svenva@chromium.org>
Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%2Bg@mail.gmail.com/
Tested-by: Karthikeyan Ramasubramanian <kramasub@chromium.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Curtis Malainey <cujomalainey@chromium.org>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
cocci warnings: (new ones prefixed by >>)
>> drivers/firmware/microchip/mpfs-auto-update.c:387:72-78:
ERROR: application of sizeof to pointer
drivers/firmware/microchip/mpfs-auto-update.c:170:72-78:
ERROR: application of sizeof to pointer
response_msg is a pointer to u32, so the size of element it points to is
supposed to be a multiple of sizeof(u32), rather than sizeof(u32 *).
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202403040516.CYxoWTXw-lkp@intel.com/
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Fixes: ec5b0f1193 ("firmware: microchip: add PolarFire SoC Auto Update support")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Timing select registers for SRC and CMD are by default
referring to the corresponding SSI word select.
The calculation rule from HW spec skips SSI8, which has
no clock connection.
>From section 43.2.18 CMD Output Timing Select Register (CMDOUT_TIMSEL),
of R-Car Series, 3rd Generation Hardware User’s Manual Rev.2.20:
CMD0_OUT_DIVCLK_ Output Timing
SEL [4:0] Signal Select
B'0 0110: ssi_ws0
B'0 0111: ssi_ws1
B'0 1000: ssi_ws2
B'0 1001: ssi_ws3
B'0 1010: ssi_ws4
B'0 1011: ssi_ws5
B'0 1100: ssi_ws6
B'0 1101: ssi_ws7
<GAP>
B'0 1110: ssi_ws9
B'0 1111: Setting prohibited
Fix the erroneous prohibited setting of timsel value 1111 (0xf) for SSI9
by using timsel value 1110 (0xe) instead. This is possible because SSI8
is not connected as shown by <GAP> in the table above.
[21.695055] rcar_sound ec500000.sound: b adg[0]-CMDOUT_TIMSEL (32):00000f00/00000f1f
Correct the timsel assignment.
Fixes: 629509c5bc ("ASoC: rsnd: add Gen2 SRC and DMAEngine support")
Suggested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Andreas Pape <Andreas.Pape4@bosch.com>
Signed-off-by: Yeswanth Rayapati <yeswanth.rayapati@in.bosch.com>
Tested-by: Yeswanth Rayapati <yeswanth.rayapati@in.bosch.com>
[erosca: massage commit description]
Signed-off-by: Eugeniu Rosca <eugeniu.rosca@bosch.com>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://msgid.link/r/20240301085003.3057-1-erosca@de.adit-jv.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The linear ranges aren't really matching what they should be. Indeed,
the range is inclusive of the min value, so it makes sense the previous
range does NOT include the max step value representing the min value of
the range in question.
Since 3.4V is represented by the decimal value 232, the previous range
max step value should be 231 and not 232.
No expected change in behavior since 3.4V was mapped with step 232 from
the first range but is now mapped with step 232 from the second range.
While at it, remove the incorrect comment from the second range.
Fixes: f991a220a4 ("regulator: rk808: add rk806 support")
Cc: Quentin Schulz <foss+kernel@0leil.net>
Signed-off-by: Quentin Schulz <quentin.schulz@theobroma-systems.com>
Link: https://msgid.link/r/20240223-rk806-regulator-ranges-v1-2-3904ab70d250@theobroma-systems.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The linear ranges aren't really matching what they should be. Indeed,
the range is inclusive of the min value, so it makes sense the previous
range does NOT include the max step value representing the min value of
the range in question.
Since 1.5V is represented by the decimal value 160, the previous range
max step value should be 159 and not 160. Similarly, 3.4V is represented
by the decimal value 236, so the previous range max value should be 235
and not 237.
The only change in behavior this makes is that this actually modeled
the ranges to map step with decimal value 237 with 3.65V instead of
3.4V (the max supported by the HW).
Fixes: f991a220a4 ("regulator: rk808: add rk806 support")
Cc: Quentin Schulz <foss+kernel@0leil.net>
Signed-off-by: Quentin Schulz <quentin.schulz@theobroma-systems.com>
Link: https://msgid.link/r/20240223-rk806-regulator-ranges-v1-1-3904ab70d250@theobroma-systems.com
Signed-off-by: Mark Brown <broonie@kernel.org>
arm64: tegra: Device tree fixes for v6.8
This contains two fixes to make the MGBE Ethernet devices found on
Tegra234 work properly.
* tag 'tegra-for-6.8-arm64-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
arm64: tegra: Fix Tegra234 MGBE power-domains
arm64: tegra: Set the correct PHY mode for MGBE
Link: https://lore.kernel.org/r/20240226144536.1525704-1-thierry.reding@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
i.MX fixes for 6.8, round 2:
- Update MAINTAINERS to use a public mailing list for NXP i.MX
development.
- Re-enable CONFIG_BACKLIGHT_CLASS_DEVICE in imx_v6_v7_defconfig to fix
a backlight regression.
- Remove DSI port endpoints from i.MX7 SoC DTSI to fix a display
regression.
- Fix LDB clocks property for i.MX8MP device tree.
- Fix TC9595 reset GPIO on DH i.MX8M Plus DHCOM SoM.
* tag 'imx-fixes-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
arm64: dts: imx8mp: Fix LDB clocks property
arm64: dts: imx8mp: Fix TC9595 reset GPIO on DH i.MX8M Plus DHCOM SoM
MAINTAINERS: Use a proper mailinglist for NXP i.MX development
ARM: dts: imx7: remove DSI port endpoints
ARM: imx_v6_v7_defconfig: Restore CONFIG_BACKLIGHT_CLASS_DEVICE
Link: https://lore.kernel.org/r/ZdtPJzdenRybI+Bq@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Qualcomm ARM64 DeviceTree fixes for 6.8
This marks an additional GPIO as protected on SM8650 devices, to avoid
a system reset caused by a security violation with some firmware
versions.
It also adds the missing interconnect-names, which resolves a regression
where one of the I2C busses on SM6115 devices would no longer probe in
Linux.
* tag 'qcom-arm64-fixes-for-6.8' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: dts: qcom: sm6115: Fix missing interconnect-names
arm64: dts: qcom: sm8650-mtp: add gpio74 as reserved gpio
arm64: dts: qcom: sm8650-qrd: add gpio74 as reserved gpio
Link: https://lore.kernel.org/r/20240225025205.479589-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Matthieu Baerts says:
====================
selftests: mptcp: fixes for diag.sh
Here are two patches fixing issues in MPTCP diag.sh kselftest:
- Patch 1 makes sure the exit code is '1' in case of error, and not the
test ID, not to return an exit code that would be wrongly interpreted
by the ksefltests framework, e.g. '4' means 'skip'.
- Patch 2 avoids waiting for unnecessary conditions, which can cause
timeouts in some very slow environments.
====================
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
When creating a lot of listener sockets, it is enough to wait only for
the last one, like we are doing before in diag.sh for other subtests.
If we do a check for each listener sockets, each time listing all
available sockets, it can take a very long time in very slow
environments, at the point we can reach some timeout.
When using the debug kconfig, the waiting time switches from more than
8 sec to 0.1 sec on my side. In slow/busy environments, and with a poll
timeout set to 30 ms, the waiting time could go up to ~100 sec because
the listener socket would timeout and stop, while the script would still
be checking one by one if all sockets are ready. The result is that
after having waited for everything to be ready, all sockets have been
stopped due to a timeout, and it is too late for the script to check how
many there were.
While at it, also removed ss options we don't need: we only need the
filtering options, to count how many listener sockets have been created.
We don't need to ask ss to display internal TCP information, and the
memory if the output is dropped by the 'wc -l' command anyway.
Fixes: b4b51d36bb ("selftests: mptcp: explicitly trigger the listener diag code-path")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/r/20240301063754.2ecefecf@kernel.org
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The test counter 'test_cnt' should not be returned in diag.sh, e.g. what
if only the 4th test fail? Will do 'exit 4' which is 'exit ${KSFT_SKIP}',
the whole test will be marked as skipped instead of 'failed'!
So we should do ret=${KSFT_FAIL} instead.
Fixes: df62f2ec3d ("selftests/mptcp: add diag interface tests")
Cc: stable@vger.kernel.org
Fixes: 42fb6cddec ("selftests: mptcp: more stable diag tests")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If drm_kms_helper_poll=n the output poll work will only get scheduled
from drm_helper_probe_single_connector_modes() to handle a delayed
hotplug event. Since polling is disabled the work in this case should
just call drm_kms_helper_hotplug_event() w/o detecting the state of
connectors and rescheduling the work.
After commit d33a54e399 after a delayed hotplug event above the
connectors did get re-detected in the poll work and the work got
re-scheduled periodically (since poll_running is also false if
drm_kms_helper_poll=n), in effect ignoring the drm_kms_helper_poll=n
kernel param.
Fix the above by calling only drm_kms_helper_hotplug_event() for a
delayed hotplug event if drm_kms_helper_hotplug_event=n, as was done
before d33a54e399.
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: d33a54e399 ("drm/probe_helper: sort out poll_running vs poll_enabled")
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240301152243.1670573-1-imre.deak@intel.com
If message fills up we need to stop writing. 'break' will
only get us out of the iteration over pools of a single
netdev, we need to also stop walking netdevs.
This results in either infinite dump, or missing pools,
depending on whether message full happens on the last
netdev (infinite dump) or non-last (missing pools).
Fixes: 950ab53b77 ("net: page_pool: implement GET in the netlink API")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I'm updating __assign_str() and will be removing the second parameter. To
make sure that it does not break anything, I make sure that it matches the
__string() field, as that is where the string is actually going to be
saved in. To make sure there's nothing that breaks, I added a WARN_ON() to
make sure that what was used in __string() is the same that is used in
__assign_str().
In doing this change, an error was triggered as __assign_str() now expects
the string passed in to be a char * value. I instead had the following
warning:
include/trace/events/qdisc.h: In function ‘trace_event_raw_event_qdisc_reset’:
include/trace/events/qdisc.h:91:35: error: passing argument 1 of 'strcmp' from incompatible pointer type [-Werror=incompatible-pointer-types]
91 | __assign_str(dev, qdisc_dev(q));
That's because the qdisc_enqueue() and qdisc_reset() pass in qdisc_dev(q)
to __assign_str() and to __string(). But that function returns a pointer
to struct net_device and not a string.
It appears that these events are just saving the pointer as a string and
then reading it as a string as well.
Use qdisc_dev(q)->name to save the device instead.
Fixes: a34dac0b90 ("net_sched: add tracepoints for qdisc_reset() and qdisc_destroy()")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When not configured for wakeup lis3lv02d_i2c_suspend() will call
lis3lv02d_poweroff() even if the device has already been turned off
by the runtime-suspend handler and if configured for wakeup and
the device is runtime-suspended at this point then it is not turned
back on to serve as a wakeup source.
Before commit b1b9f7a494 ("misc: lis3lv02d_i2c: Add missing setting
of the reg_ctrl callback"), lis3lv02d_poweroff() failed to disable
the regulators which as a side effect made calling poweroff() twice ok.
Now that poweroff() correctly disables the regulators, doing this twice
triggers a WARN() in the regulator core:
unbalanced disables for regulator-dummy
WARNING: CPU: 1 PID: 92 at drivers/regulator/core.c:2999 _regulator_disable
...
Fix lis3lv02d_i2c_suspend() to not call poweroff() a second time if
already runtime-suspended and add a poweron() call when necessary to
make wakeup work.
lis3lv02d_i2c_resume() has similar issues, with an added weirness that
it always powers on the device if it is runtime suspended, after which
the first runtime-resume will call poweron() again, causing the enabled
count for the regulator to increase by 1 every suspend/resume. These
unbalanced regulator_enable() calls cause the regulator to never
be turned off and trigger the following WARN() on driver unbind:
WARNING: CPU: 1 PID: 1724 at drivers/regulator/core.c:2396 _regulator_put
Fix this by making lis3lv02d_i2c_resume() mirror the new suspend().
Fixes: b1b9f7a494 ("misc: lis3lv02d_i2c: Add missing setting of the reg_ctrl callback")
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Closes: https://lore.kernel.org/regressions/5fc6da74-af0a-4aac-b4d5-a000b39a63a5@molgen.mpg.de/
Cc: stable@vger.kernel.org
Cc: regressions@lists.linux.dev
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> # Dell XPS 15 7590
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Link: https://lore.kernel.org/r/20240220190035.53402-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 09896da073 ("arm64: dts: qcom: msm8996: Hook up MPM") has
hooked up the MPM irq chip on the MSM8996 platform. However this causes
my Dragonboard 820c crash during bootup (usually when probing IOMMUs).
Revert the offending commit for now. Quick debug shows that making
tlmm's wakeup-parent point to the MPM is enough to trigger the crash.
Fixes: 09896da073 ("arm64: dts: qcom: msm8996: Hook up MPM")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240221-msm8996-revert-mpm-v1-1-cdca9e30c9b4@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Pull phy fixes from Vinod Koul:
- qcom: m31 pointer err fix, eusb2 fix redundant zero-out loop and v3
offset fix on qmp-usb
- freescale: fix for dphy alias
* tag 'phy-fixes2-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
phy: qcom-qmp-usb: fix v3 offsets data
phy: qualcomm: eusb2-repeater: Rework init to drop redundant zero-out loop
phy: qcom: phy-qcom-m31: fix wrong pointer pass to PTR_ERR()
phy: freescale: phy-fsl-imx8-mipi-dphy: Fix alias name to use dashes
Pull dmaengine fixes from Vinod Koul:
- dw-edma fixes to improve driver and remote HDMA setup
- fsl-edma fixes for SoC hange, irq init and byte calculations and
sparse fixes
- idxd: safe user copy of completion record fix
- ptdma: consistent DMA mask fix
* tag 'dmaengine-fix2-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: ptdma: use consistent DMA masks
dmaengine: fsl-qdma: add __iomem and struct in union to fix sparse warning
dmaengine: idxd: Ensure safe user copy of completion record
dmaengine: fsl-edma: correct max_segment_size setting
dmaengine: idxd: Remove shadow Event Log head stored in idxd
dmaengine: fsl-edma: correct calculation of 'nbytes' in multi-fifo scenario
dmaengine: fsl-qdma: init irq after reg initialization
dmaengine: fsl-qdma: fix SoC may hang on 16 byte unaligned read
dmaengine: dw-edma: eDMA: Add sync read before starting the DMA transfer in remote setup
dmaengine: dw-edma: HDMA: Add sync read before starting the DMA transfer in remote setup
dmaengine: dw-edma: Add HDMA remote interrupt configuration
dmaengine: dw-edma: HDMA_V0_REMOTEL_STOP_INT_EN typo fix
dmaengine: dw-edma: Fix wrong interrupt bit set for HDMA
dmaengine: dw-edma: Fix the ch_count hdma callback
Pull powerpc fixes from Michael Ellerman:
- Fix IOMMU table initialisation when doing kdump over SR-IOV
- Fix incorrect RTAS function name for resetting TCE tables
- Fix fpu_signal selftest failures since a recent change
Thanks to Gaurav Batra and Nathan Lynch.
* tag 'powerpc-6.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
selftests/powerpc: Fix fpu_signal failures
powerpc/rtas: use correct function name for resetting TCE tables
powerpc/pseries/iommu: IOMMU table is not initialized for kdump over SR-IOV
Pull x86 fixes from Borislav Petkov:
- Do not reserve SETUP_RNG_SEED setup data in the e820 map as it should
be used by kexec only
- Make sure MKTME feature detection happens at an earlier time in the
boot process so that the physical address size supported by the CPU
is properly corrected and MTRR masks are programmed properly, leading
to TDX systems booting without disable_mtrr_cleanup on the cmdline
- Make sure the different address sizes supported by the CPU are read
out as early as possible
* tag 'x86_urgent_for_v6.8_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/e820: Don't reserve SETUP_RNG_SEED in e820
x86/cpu/intel: Detect TME keyid bits before setting MTRR mask registers
x86/cpu: Allow reducing x86_phys_bits during early_identify_cpu()
Pull firewire fixes from Takashi Sakamoto:
"A workaround to suppress the continuous bus resets in the case that
older devices are connected to the modern 1394 OHCI hardware and
devices
In IEEE 1394 Amendment (IEEE 1394a-2000), the short bus reset is added
to resolve the shortcomings of the long bus reset in IEEE 1394-1995.
However, it is well-known that the solution is not necessarily
effective in the mixing environment that both IEEE 1394-1995 PHY and
IEEE 1394a-2000 (or later) PHY exist, as described in section 8.4.6.2
of IEEE 1394a-2000.
The current implementation of firewire stack schedules the short bus
reset when attempting to resolve the mismatch of gap count in the
certain generation of bus topology. It can cause the continuous bus
reset in the issued environment.
The workaround simply uses the long bus reset instead of the short bus
reset. It is desirable to detect whether the issued environment or
not. However, the way to access PHY registers from remote note is
firstly defined in IEEE 1394a-2000, thus it is not available in the
case"
* tag 'firewire-fixes-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: core: use long bus reset on gap count error
We accidently met the issue that the bash prompt is not shown after the
previous command done and until the next input if there's only one CPU
(In our issue other CPUs are isolated by isolcpus=). Further analysis
shows it's because the port entering runtime suspend even if there's
still pending chars in the buffer and the pending chars will only be
processed in next device resuming. We are using amba-pl011 and the
problematic flow is like below:
Bash kworker
tty_write()
file_tty_write()
n_tty_write()
uart_write()
__uart_start()
pm_runtime_get() // wakeup waker
queue_work()
pm_runtime_work()
rpm_resume()
status = RPM_RESUMING
serial_port_runtime_resume()
port->ops->start_tx()
pl011_tx_chars()
uart_write_wakeup()
[…]
__uart_start()
pm_runtime_get() < 0 // because runtime status = RPM_RESUMING
// later data are not commit to the port driver
status = RPM_ACTIVE
rpm_idle() -> rpm_suspend()
This patch tries to fix this by checking the port busy before entering
runtime suspending. A runtime_suspend callback is added for the port
driver. When entering runtime suspend the callback is invoked, if there's
still pending chars in the buffer then flush the buffer.
Fixes: 84a9582fd2 ("serial: core: Start managing serial controllers to enable runtime PM")
Cc: stable <stable@kernel.org>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240226152351.40924-1-yangyicong@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When userspace opens the console, we call set_termios() passing a
termios with the console's configured baud rate. Currently this causes
dw8250_set_termios() to disable and then re-enable the UART clock at
the same frequency as it was originally. This can cause corruption
of any concurrent console output. Fix it by skipping the reclocking
if we are already at the correct rate.
Signed-off-by: Peter Collingbourne <pcc@google.com>
Fixes: 4e26b134bd ("serial: 8250_dw: clock rate handling for all ACPI platforms")
Cc: stable@vger.kernel.org
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20240222192635.1050502-1-pcc@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When about to transmit the function imx_uart_start_tx is called and in
some RS485 configurations this function will call imx_uart_stop_rx. The
problem is that imx_uart_stop_rx will enable loopback in order to
release the RS485 bus, but when loopback is enabled transmitted data
will just be looped to RX.
This patch fixes the above problem by not enabling loopback when about
to transmit.
This driver now works well when used for RS485 half duplex master
configurations.
Fixes: 79d0224f6b ("tty: serial: imx: Handle RS485 DE signal active high")
Cc: stable <stable@kernel.org>
Signed-off-by: Rickard x Andersson <rickaran@axis.com>
Tested-by: Christoph Niedermaier <cniedermaier@dh-electronics.com>
Link: https://lore.kernel.org/r/20240221115304.509811-1-rickaran@axis.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jonathan writes:
IIO: 2nd set of fixes for the 6.8 cycle.
Given this is very late these can wait for the 6.9 cycle if you would
prefer.
adi,adxl367
- Sleep for 15ms after reset to avoid reading before the device is awake.
- Fix FIFO register address.
asc,dlhl60d
- Avoid uninitialized data leak to user-space. Also suppress a false
positive clang warning by refactoring a loop.
bosch,bmp280
- Fix missing extra byte in SPI reads from BMP38x and BMP390 parts
invensense,mpu6050
- Fix handing of empty FIFO which can happen due to a race condition.
- Make sure frequency can be updated more than once when the FIFO is not
enabled.
* tag 'iio-fixes-for-6.8b' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: accel: adxl367: fix I2C FIFO data register
iio: accel: adxl367: fix DEVID read after reset
iio: pressure: dlhl60d: Initialize empty DLH bytes
iio: imu: inv_mpu6050: fix frequency setting when chip is off
iio: pressure: Fixes BMP38x and BMP390 SPI support
iio: imu: inv_mpu6050: fix FIFO parsing when empty
William writes:
First set of Counter fixes for 6.8
One fix to ensure private data in struct counter_device_allochelper has
minimum alignment for safe DMA operations.
* tag 'counter-fixes-for-6.8b' of https://git.kernel.org/pub/scm/linux/kernel/git/wbg/counter:
counter: fix privdata alignment
Mika writes:
thunderbolt: Fix for v6.8-rc7
This includes one USB4/Thunderbolt fix for v6.8-rc7:
- Fix NULL pointer dereference in tb_port_update_credits() on
Apple Thunderbolt 1 hardware.
This has been in linux-next with no reported issues.
* tag 'thunderbolt-for-v6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
thunderbolt: Fix NULL pointer dereference in tb_port_update_credits()
Pull xfs fix from Chandan Babu:
"Drop experimental warning message when mounting an xfs filesystem on
an fsdax device. We now consider xfs on fsdax to be stable"
* tag 'xfs-6.8-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: drop experimental warning for FSDAX
Pull gpio fixes from Bartosz Golaszewski:
- fix resource freeing ordering in error path when adding a GPIO chip
- only set pins to output after the reset is complete in gpio-74x164
* tag 'gpio-fixes-for-v6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: fix resource unwinding order in error path
gpiolib: Fix the error path order in gpiochip_add_data_with_key()
gpio: 74x164: Enable output pins after registers are reset
In commit 19416123ab ("block: define 'struct bvec_iter' as packed"),
what we need is to save the 4byte padding, and avoid `bio` to spread on
one extra cache line.
It is enough to define it as '__packed __aligned(4)', as '__packed'
alone means byte aligned, and can cause compiler to generate horrible
code on architectures that don't support unaligned access in case that
bvec_iter is embedded in other structures.
Cc: Mikulas Patocka <mpatocka@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 19416123ab ("block: define 'struct bvec_iter' as packed")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull SCSI fixes from James Bottomley:
"Two small fixes, all in drivers (the more obsolete mpt3sas and the
newer mpi3mr)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: mpt3sas: Prevent sending diag_reset when the controller is ready
scsi: mpi3mr: Reduce stack usage in mpi3mr_refresh_sas_ports()
The NAPI poll context is a softirq context. Do not use normal spinlock API
in this context to prevent concurrency issues.
Fixes: 3178308ad4 ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
CC: Vadim Fedorenko <vadfed@meta.com>
Just simply reordering the functions mlx5e_ptp_metadata_map_put and
mlx5e_ptpsq_track_metadata in the mlx5e_txwqe_complete context is not good
enough since both the compiler and CPU are free to reorder these two
functions. If reordering does occur, the issue that was supposedly fixed by
7e3f3ba97e ("net/mlx5e: Track xmit submission to PTP WQ after populating
metadata map") will be seen. This will lead to NULL pointer dereferences in
mlx5e_ptpsq_mark_ts_cqes_undelivered in the NAPI polling context due to the
tracking list being populated before the metadata map.
Fixes: 7e3f3ba97e ("net/mlx5e: Track xmit submission to PTP WQ after populating metadata map")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
CC: Vadim Fedorenko <vadfed@meta.com>
The packet number attribute of the SA is incremented by the device rather
than the software stack when enabling hardware offload. Because the packet
number attribute is managed by the hardware, the software has no insight
into the value of the packet number attribute actually written by the
device.
Previously when MACsec offload was enabled, the hardware object for
handling the offload was destroyed when the SA was disabled. Re-enabling
the SA would lead to a new hardware object being instantiated. This new
hardware object would not have any recollection of the correct packet
number for the SA. Instead, destroy the flow steering rule when
deactivating the SA and recreate it upon reactivation, preserving the
original hardware object.
Fixes: 8ff0ac5be1 ("net/mlx5: Add MACsec offload Tx command support")
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Downgrade the print from mlx5_core_warn() to mlx5_core_dbg(), as it
is just a statement of fact that firmware doesn't support ignore flow
level.
And change the wording to "firmware flow level support is missing", to
make it more accurate.
Fixes: ae2ee3be99 ("net/mlx5: CT: Remove warning of ignore_flow_level support for VFs")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Suggested-by: Elliott, Robert (Servers) <elliott@hpe.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Functions which can't access MFRL (Management Firmware Reset Level)
register, have no use of fw_reset structures or events. Remove fw_reset
structures allocation and registration for fw reset events notifications
for these functions.
Having the devlink param enable_remote_dev_reset on functions that don't
have this capability is misleading as these functions are not allowed to
influence the reset flow. Hence, this patch removes this parameter for
such functions.
In addition, return not supported on devlink reload action fw_activate
for these functions.
Fixes: 38b9f903f2 ("net/mlx5: Handle sync reset request event")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Restore fw reporter diagnose to print the syndrome even if it is zero.
Following the cited commit, in this case (syndrome == 0) command returns no
output at all.
This fix restores command output in case syndrome is cleared:
$ devlink health diagnose pci/0000:82:00.0 reporter fw
Syndrome: 0
Fixes: d17f98bf7c ("net/mlx5: devlink health: use retained error fmsg API")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The checking in the cited commit is not accurate. In the common case,
VF destination is internal, and uplink destination is external.
However, uplink destination with packet reformat is considered as
internal because firmware uses LB+hairpin to support it. Update the
checking so header rewrite rules with both internal and external
destinations are not allowed.
Fixes: e0e22d59b4 ("net/mlx5: E-switch, Add checking for flow rule destinations")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This reverts commit 4e25b661f4.
This Commit was mistakenly applied by pulling the wrong tag, remove it.
Fixes: 4e25b661f4 ("net/mlx5e: Check the number of elements before walk TC rhashtable")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This reverts commit 662404b24a.
The revert is required due to the suspicion it is not good for anything
and cause crash.
Fixes: 662404b24a ("net/mlx5e: Block entering switchdev mode with ns inconsistency")
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Pull power supply fixes from Sebastian Reichel:
- Kconfig dependency fix
- bq27xxx-i2c: do not free non-existing IRQ
* tag 'for-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: supply: bq27xxx-i2c: Do not free non existing IRQ
power: supply: mm8013: select REGMAP_I2C
Pull iommufd fixes from Jason Gunthorpe:
"Four syzkaller found bugs:
- Corruption during error unwind in iommufd_access_change_ioas()
- Overlapping IDs in the test suite due to out of order destruction
- Missing locking for access->ioas in the test suite
- False failures in the test suite validation logic with huge pages"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd/selftest: Don't check map/unmap pairing with HUGE_PAGES
iommufd: Fix protection fault in iommufd_test_syz_conv_iova
iommufd/selftest: Fix mock_dev_num bug
iommufd: Fix iopt_access_list_id overwrite bug
Pull devicetree fix from Rob Herring:
"One fix for a bug in fw_devlink handling of OF graph. This doesn't
completely fix the reported problems, but it's with users adding out
of tree code"
* tag 'devicetree-fixes-for-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: property: fw_devlink: Fix stupid bug in remote-endpoint parsing
Pull RISC-V fixes from Palmer Dabbelt:
- detect ".option arch" support on not-yet-released LLVM builds
- fix missing TLB flush when modifying non-leaf PTEs
- fixes for T-Head custom extensions
- fix for systems with the legacy PMU, that manifests as a crash on
kernels built without SBI PMU support
- fix for systems that clear *envcfg on suspend, which manifests as
cbo.zero trapping after resume
- fixes for Svnapot systems, including removing Svnapot support for
huge vmalloc/vmap regions
* tag 'riscv-for-linus-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Sparse-Memory/vmemmap out-of-bounds fix
riscv: Fix pte_leaf_size() for NAPOT
Revert "riscv: mm: support Svnapot in huge vmap"
riscv: Save/restore envcfg CSR during CPU suspend
riscv: Add a custom ISA extension for the [ms]envcfg CSR
riscv: Fix enabling cbo.zero when running in M-mode
perf: RISCV: Fix panic on pmu overflow handler
MAINTAINERS: Update SiFive driver maintainers
drivers: perf: ctr_get_width function for legacy is not defined
drivers: perf: added capabilities for legacy PMU
RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs
riscv: Fix build error if !CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
riscv: mm: fix NOCACHE_THEAD does not set bit[61] correctly
riscv: add CALLER_ADDRx support
RISC-V: Drop invalid test from CONFIG_AS_HAS_OPTION_ARCH
kbuild: Add -Wa,--fatal-warnings to as-instr invocation
riscv: tlb: fix __p*d_free_tlb()
Pull ceph fix from Ilya Dryomov:
"Catch up with mdsmap encoding rectification which ended up being
necessary after all to enable cluster upgrades from problematic
v18.2.0 and v18.2.1 releases"
* tag 'ceph-for-6.8-rc7' of https://github.com/ceph/ceph-client:
ceph: switch to corrected encoding of max_xattr_size in mdsmap
Pull btrfs fixes from David Sterba:
- fix freeing allocated id for anon dev when snapshot creation fails
- fiemap fixes:
- followup for a recent deadlock fix, ranges that fiemap can access
can still race with ordered extent completion
- make sure fiemap with SYNC flag does not race with writes
* tag 'for-6.8-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix double free of anonymous device after snapshot creation failure
btrfs: ensure fiemap doesn't race with writes when FIEMAP_FLAG_SYNC is given
btrfs: fix race between ordered extent completion and fiemap
Pull vfs fixes from Christian Brauner:
"Two small fixes:
- Fix an endless loop during afs directory iteration caused by not
skipping silly-rename files correctly.
- Fix reporting of completion events for aio causing leaks in
userspace. This is based on the fix last week as it's now possible
to recognize aio events submitted through the old aio interface"
* tag 'vfs-6.8-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs/aio: Make io_cancel() generate completions again
afs: Fix endless loop in directory parsing
Pull iommu fix from Joerg Roedel:
- Fix SVA handle sharing in multi device case
* tag 'iommu-fix-v6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/sva: Fix SVA handle sharing in multi device case
Pull EFI fixes from Ard Biesheuvel:
"Only the EFI variable name size change is significant, and will be
backported once it lands. The others are cleanup.
- Fix phys_addr_t size confusion in 32-bit capsule loader
- Reduce maximum EFI variable name size to 512 to work around buggy
firmware
- Drop some redundant code from efivarfs while at it"
* tag 'efi-fixes-for-v6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efivarfs: Drop 'duplicates' bool parameter on efivar_init()
efivarfs: Drop redundant cleanup on fill_super() failure
efivarfs: Request at most 512 bytes for variable names
efi/capsule-loader: fix incorrect allocation size
Pull sound fixes from Takashi Iwai:
"The amount of changes wasn't as small as wished, but all reasonably
small fixes. There is a PCM core API change, which is for correcting
the behavior change we took in 6.8. The rest are device-specific fixes
for ASoC AMD, Qualcomm, Cirrus codecs, HD-audio quirks & co"
* tag 'sound-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: amd: yc: Fix non-functional mic on Lenovo 21J2
ASoC: amd: yc: add new YC platform variant (0x63) support
ALSA: hda/realtek - ALC285 reduce pop noise from Headphone port
ASoC: amd: yc: Add Lenovo ThinkBook 21J0 into DMI quirk table
ALSA: hda/realtek: Add special fixup for Lenovo 14IRP8
ASoC: soc-card: Fix missing locking in snd_soc_card_get_kcontrol()
ALSA: hda/realtek: tas2781: enable subwoofer volume control
ALSA: pcm: clarify and fix default msbits value for all formats
ASoC: qcom: Fix uninitialized pointer dmactl
ALSA: hda/realtek: fix mute/micmute LED For HP mt440
ALSA: Drop leftover snd-rtctimer stuff from Makefile
ALSA: ump: Fix the discard error code from snd_ump_legacy_open()
ALSA: hda/realtek: Enable Mute LED on HP 840 G8 (MB 8AB8)
ASoC: cs35l56: Must clear HALO_STATE before issuing SYSTEM_RESET
ALSA: hda/realtek: Fix top speaker connection on Dell Inspiron 16 Plus 7630
ALSA: firewire-lib: fix to check cycle continuity
Pull drm fixes from Dave Airlie:
"Bunch of fixes, xe, amdgpu, nouveau and tegra all have a few. Then
drm/bridge including some drivers/soc fallout fixes. The biggest thing
in here is a new unit test for some buddy allocator fixes, otherwise a
misc fbcon, ttm unit test and one msm revert.
Seems pretty normal for this stage.
buddy:
- two allocation fixes + unit test
fbcon:
- font restore syzkaller fix
ttm:
- kunit test fix
bridge:
- fix aux-hpd leaks
- fix aux-hpd registration
- fix use after free in soc/qcom
- fix boot on soc/qcom
xe:
- A couple of tracepoint updates from Priyanka and Lucas
- Make sure BINDs are completed before accepting UNBINDs on LR vms
- Don't arbitrarily restrict max number of batched binds
- Add uapi for dumpable bos (agreed on IRC)
- Remove unused uapi flags and a leftover comment
- A couple of fixes related to the execlist backend
msm:
- DP: Revert a change which was causing a HDP regression
amdgpu:
- Fix potential buffer overflow
- Fix power min cap
- Suspend/resume fix
- SI PM fix
- eDP fix
nouveau:
- fix a misreported VRAM sizing
- fix a regression in suspend/resume due to freeing
tegra:
- host1x reset fix
- only remove existing driver if display is possible"
* tag 'drm-fixes-2024-03-01' of https://gitlab.freedesktop.org/drm/kernel: (32 commits)
drm/nouveau: keep DMA buffers required for suspend/resume
nouveau: report byte usage in VRAM usage
drm/xe/xe_trace: Add move_lacks_source detail to xe_bo_move trace
drm/xe: Deny unbinds if uapi ufence pending
drm/xe: Expose user fence from xe_sync_entry
drm/xe: Use pointers in trace events
drm/xe/xe_bo_move: Enhance xe_bo_move trace
drm/xe/mmio: fix build warning for BAR resize on 32-bit
drm/xe: get rid of MAX_BINDS
drm/xe: Use vmalloc for array of bind allocation in bind IOCTL
drm/xe: Don't support execlists in xe_gt_tlb_invalidation layer
drm/xe: Fix execlist splat
drm/xe/uapi: Remove unused flags
drm/xe/uapi: Remove DRM_XE_VM_BIND_FLAG_ASYNC comment left over
drm/xe: Add uapi for dumpable bos
drm/amd/display: Add monitor patch for specific eDP
Revert "drm/msm/dp: use drm_bridge_hpd_notify() to report HPD status changes"
drm/tests/drm_buddy: add alloc_range_bias test
drm/buddy: check range allocation matches alignment
drm/buddy: fix range bias
...
Pull fprobe fix from Masami Hiramatsu:
- allocate entry_data_size buffer for each rethook instance.
This fixes a buffer overrun bug (which leads a kernel crash)
when fprobe user uses its entry_data in the entry_handler.
* tag 'probes-fixes-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
fprobe: Fix to allocate entry_data_size buffer with rethook instances
SETUP_RNG_SEED in setup_data is supplied by kexec and should
not be reserved in the e820 map.
Doing so reserves 16 bytes of RAM when booting with kexec.
(16 bytes because data->len is zeroed by parse_setup_data so only
sizeof(setup_data) is reserved.)
When kexec is used repeatedly, each boot adds two entries in the
kexec-provided e820 map as the 16-byte range splits a larger
range of usable memory. Eventually all of the 128 available entries
get used up. The next split will result in losing usable memory
as the new entries cannot be added to the e820 map.
Fixes: 68b8e9713c ("x86/setup: Use rng seeds from setup_data")
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/ZbmOjKnARGiaYBd5@dwarf.suse.cz
During VSI reconfiguration filters and VSI config which is set in
ice_vf_init_host_cfg() are lost. Recall the host configuration function
to restore them.
Without this config VF on which MSI-X amount was changed might had a
connection problems.
Fixes: 4d38cb44bd ("ice: manage VFs MSI-X using resource tracking")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
ice_qp_dis() currently does things in very mixed way. Tx is stopped
before disabling IRQ on related queue vector, then it takes care of
disabling Rx and finally NAPI is disabled.
Let us start with disabling IRQs in the first place followed by turning
off NAPI. Then it is safe to handle queues.
One subtle change on top of that is that even though ice_qp_ena() looks
more sane, clear ICE_CFG_BUSY as the last thing there.
Fixes: 2d4238f556 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Disable NAPI before shutting down queues that this particular NAPI
contains so that the order of actions in i40e_queue_pair_disable()
mirrors what we do in i40e_queue_pair_enable().
Fixes: 123cecd427 ("i40e: added queue pair disable/enable functions")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently routines that are supposed to toggle state of ring pair do not
take care of associated interrupt with queue vector that these rings
belong to. This causes funky issues such as dead interface due to irq
misconfiguration, as per Pavel's report from Closes: tag.
Add a function responsible for disabling single IRQ in EIMC register and
call this as a very first thing when disabling ring pair during xsk_pool
setup. For enable let's reuse ixgbe_irq_enable_queues(). Besides this,
disable/enable NAPI as first/last thing when dealing with closing or
opening ring pair that xsk_pool is being configured on.
Reported-by: Pavel Vazharov <pavel@x3me.net>
Closes: https://lore.kernel.org/netdev/CAJEV1ijxNyPTwASJER1bcZzS9nMoZJqfR86nu_3jFFVXzZQ4NA@mail.gmail.com/
Fixes: 024aa5800f ("ixgbe: added Rx/Tx ring disable/enable functions")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Limit the WiFi PCIe link speed to Gen2 speed (500 MB/s), which is the
speed that the boot firmware has brought up the link at (and that
Windows uses).
This is specifically needed to avoid a large amount of link errors when
restarting the link during boot (but which are currently not reported).
This also appears to fix intermittent failures to download the ath11k
firmware during boot which can be seen when there is a longer delay
between restarting the link and loading the WiFi driver (e.g. when using
full disk encryption).
Fixes: 123b30a756 ("arm64: dts: qcom: sc8280xp-x13s: enable WiFi controller")
Cc: stable@vger.kernel.org # 6.2
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20240223152124.20042-8-johan+linaro@kernel.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
My recent commit e5d00aaac6 ("selftests/powerpc: Check all FPRs in
fpu_preempt") inadvertently broke the fpu_signal test.
It needs to take into account that fpu_preempt now loads 32 FPRs, so
enlarge darray.
Also use the newly added randomise_darray() to properly randomise darray.
Finally the checking done in signal_fpu_sig() needs to skip checking
f30/f31, because they are used as scratch registers in check_all_fprs(),
called by preempt_fpu(), and so could hold other values when the signal
is taken.
Fixes: e5d00aaac6 ("selftests/powerpc: Check all FPRs in fpu_preempt")
Reported-by: Spoorthy <spoorthy@linux.ibm.com>
Depends-on: 2ba107f679 ("selftests/powerpc: Generate better bit patterns for FPU tests")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240301101035.1230024-1-mpe@ellerman.id.au
In azx_probe_codecs function, when bus->codec_mask is becomes to 0(no codecs),
execute azx_init_chip, bus->codec_mask will be initialized to a value again,
this causes snd_hda_codec_new function to run, the process is as follows:
-->snd_hda_codec_new
-->snd_hda_codec_device_init
-->snd_hdac_device_init---snd_hdac_read_parm(...AC_PAR_VENDOR_ID) 2s
---snd_hdac_read_parm(...AC_PAR_VENDOR_ID) 2s
---snd_hdac_read_parm(...AC_PAR_SUBSYSTEM_ID) 2s
---snd_hdac_read_parm(...AC_PAR_REV_ID) 2s
---snd_hdac_read_parm(...AC_PAR_NODE_COUNT) 2s
when no codecs, read communication is error, each command will be polled for
2 second, a total of 10s, it is easy to some problem.
like this:
2 [ 14.833404][ 6] [ T164] hda 0006:00: Codec #0 probe error; disabling it...
3 [ 14.844178][ 6] [ T164] hda 0006:00: codec_mask = 0x1
4 [ 14.880532][ 6] [ T164] hda 0006:00: too slow response, last cmd=0x0f0000
5 [ 15.891988][ 6] [ T164] hda 0006:00: too slow response, last cmd=0x0f0000
6 [ 16.978090][ 6] [ T164] hda 0006:00: too slow response, last cmd=0x0f0001
7 [ 18.140895][ 6] [ T164] hda 0006:00: too slow response, last cmd=0x0f0002
8 [ 19.135516][ 6] [ T164] hda 0006:00: too slow response, last cmd=0x0f0004
10 [ 19.900086][ 6] [ T164] hda 0006:00: no codecs initialized
11 [ 45.573398][ 2] [ C2] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:0:25]
Here, when bus->codec_mask is 0, use a direct break to avoid execute snd_hda_codec_new function.
Signed-off-by: songxiebing <songxiebing@kylinos.cn>
Link: https://lore.kernel.org/r/20240301011841.7247-1-soxiebing@163.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
It is now possible to disable BQL, but that causes the cpsw driver to break:
drivers/net/ethernet/ti/am65-cpsw-nuss.c:297:28: error: no member named 'dql' in 'struct netdev_queue'
297 | dql_avail(&netif_txq->dql),
There is already a helper function in net/sch_generic.h that could
be used to help here. Move its implementation into the common
linux/netdevice.h along with the other bql interfaces and change
both users over to the new interface.
Fixes: ea7f3cfaa5 ("net: bql: allow the config to be disabled")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current driver has some asymmetry in the runtime PM calls. On lan78xx_open()
it will call usb_autopm_get() and unconditionally usb_autopm_put(). And
on lan78xx_stop() it will call only usb_autopm_put(). So far, it was
working only because this driver do not activate autosuspend by default,
so it was visible only by warning "Runtime PM usage count underflow!".
Since, with current driver, we can't use runtime PM with active link,
execute lan78xx_open()->usb_autopm_put() only in error case. Otherwise,
keep ref counting high as long as interface is open.
Fixes: 55d7de9de6 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The internal delay properties are not mandatory and should have a
documented default value. The device only supports either no delay or a
fixed delay and the device reset default is no delay, document the
default as no delay.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a CoCo VM, when transitioning memory from encrypted to decrypted, or
vice versa, the caller of set_memory_encrypted() or set_memory_decrypted()
is responsible for ensuring the memory isn't in use and isn't referenced
while the transition is in progress. The transition has multiple steps,
and the memory is in an inconsistent state until all steps are complete.
A reference while the state is inconsistent could result in an exception
that can't be cleanly fixed up.
However, the kernel load_unaligned_zeropad() mechanism could cause a stray
reference that can't be prevented by the caller of set_memory_encrypted()
or set_memory_decrypted(), so there's specific code to handle this case.
But a CoCo VM running on Hyper-V may be configured to run with a paravisor,
with the #VC or #VE exception routed to the paravisor. There's no
architectural way to forward the exceptions back to the guest kernel, and
in such a case, the load_unaligned_zeropad() specific code doesn't work.
To avoid this problem, mark pages as "not present" while a transition
is in progress. If load_unaligned_zeropad() causes a stray reference, a
normal page fault is generated instead of #VC or #VE, and the
page-fault-based fixup handlers for load_unaligned_zeropad() resolve the
reference. When the encrypted/decrypted transition is complete, mark the
pages as "present" again.
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://lore.kernel.org/r/20240116022008.1023398-4-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240116022008.1023398-4-mhklinux@outlook.com>
In preparation for temporarily marking pages not present during a
transition between encrypted and decrypted, use slow_virt_to_phys()
in the hypervisor callback. As long as the PFN is correct,
slow_virt_to_phys() works even if the leaf PTE is not present.
The existing functions that depend on vmalloc_to_page() all
require that the leaf PTE be marked present, so they don't work.
Update the comments for slow_virt_to_phys() to note this broader usage
and the requirement to work even if the PTE is not marked present.
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Link: https://lore.kernel.org/r/20240116022008.1023398-2-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240116022008.1023398-2-mhklinux@outlook.com>
create_gpadl_header() creates a message header, and one or more message
bodies if the number of GPADL entries exceeds what fits in the
header. Currently the code for creating the message header is
duplicated in the two halves of the main "if" statement governing
whether message bodies are created.
Eliminate the duplication by making minor tweaks to the logic and
associated comments. While here, simplify the handling of memory
allocation errors, and use umin() instead of open coding it.
For ease of review, the indentation of sizable chunks of code is
*not* changed. A follow-on patch updates only the indentation.
No functional change.
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20240111165451.269418-1-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240111165451.269418-1-mhklinux@outlook.com>
A recent commit removing the use of screen_info introduced a logic
error. The error causes hvfb_getmem() to always return -ENOMEM
for Generation 2 VMs. As a result, the Hyper-V frame buffer
device fails to initialize. The error was introduced by removing
an "else if" clause, leaving Gen2 VMs to always take the -ENOMEM
error path.
Fix the problem by removing the error path "else" clause. Gen 2
VMs now always proceed through the MMIO memory allocation code,
but with "base" and "size" defaulting to 0.
Fixes: 0aa0838c84 ("fbdev/hyperv_fb: Remove firmware framebuffers with aperture helpers")
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Link: https://lore.kernel.org/r/20240201060022.233666-1-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240201060022.233666-1-mhklinux@outlook.com>
The VMBUS_RING_SIZE macro adds space for a ring buffer header to the
requested ring buffer size. The header size is always 1 page, and so
its size varies based on the PAGE_SIZE for which the kernel is built.
If the requested ring buffer size is a large power-of-2 size and the header
size is small, the resulting size is inefficient in its use of memory.
For example, a 512 Kbyte ring buffer with a 4 Kbyte page size results in
a 516 Kbyte allocation, which is rounded to up 1 Mbyte by the memory
allocator, and wastes 508 Kbytes of memory.
In such situations, the exact size of the ring buffer isn't that important,
and it's OK to allocate the 4 Kbyte header at the beginning of the 512
Kbytes, leaving the ring buffer itself with just 508 Kbytes. The memory
allocation can be 512 Kbytes instead of 1 Mbyte and nothing is wasted.
Update VMBUS_RING_SIZE to implement this approach for "large" ring buffer
sizes. "Large" is somewhat arbitrarily defined as 8 times the size of
the ring buffer header (which is of size PAGE_SIZE). For example, for
4 Kbyte PAGE_SIZE, ring buffers of 32 Kbytes and larger use the first
4 Kbytes as the ring buffer header. For 64 Kbyte PAGE_SIZE, ring buffers
of 512 Kbytes and larger use the first 64 Kbytes as the ring buffer
header. In both cases, smaller sizes add space for the header so
the ring size isn't reduced too much by using part of the space for
the header. For example, with a 64 Kbyte page size, we don't want
a 128 Kbyte ring buffer to be reduced to 64 Kbytes by allocating half
of the space for the header. In such a case, the memory allocation
is less efficient, but it's the best that can be done.
While the new algorithm slightly changes the amount of space allocated
for ring buffers by drivers that use VMBUS_RING_SIZE, the devices aren't
known to be sensitive to small changes in ring buffer size, so there
shouldn't be any effect.
Fixes: c1135c7fd0 ("Drivers: hv: vmbus: Introduce types of GPADL")
Fixes: 6941f67ad3 ("hv_netvsc: Calculate correct ring size when PAGE_SIZE is not 4 Kbytes")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218502
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
Link: https://lore.kernel.org/r/20240229004533.313662-1-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240229004533.313662-1-mhklinux@outlook.com>
After shuffling the code, error path wasn't updated correctly.
Fix it here.
Fixes: 2f4133bb5f ("gpiolib: No need to call gpiochip_remove_pin_ranges() twice")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Chip outputs are enabled[1] before actual reset is performed[2] which might
cause pin output value to flip flop if previous pin value was set to 1.
Fix that behavior by making sure chip is fully reset before all outputs are
enabled.
Flip-flop can be noticed when module is removed and inserted again and one of
the pins was changed to 1 before removal. 100 microsecond flipping is
noticeable on oscilloscope (100khz SPI bus).
For a properly reset chip - output is enabled around 100 microseconds (on 100khz
SPI bus) later during probing process hence should be irrelevant behavioral
change.
Fixes: 7ebc194d0f (gpio: 74x164: Introduce 'enable-gpios' property)
Link: https://elixir.bootlin.com/linux/v6.7.4/source/drivers/gpio/gpio-74x164.c#L130 [1]
Link: https://elixir.bootlin.com/linux/v6.7.4/source/drivers/gpio/gpio-74x164.c#L150 [2]
Signed-off-by: Arturas Moskvinas <arturas.moskvinas@gmail.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Nouveau deallocates a few buffers post GPU init which are required for GPU suspend/resume to function correctly.
This is likely not as big an issue on systems where the NVGPU is the only GPU, but on multi-GPU set ups it leads to a regression where the kernel module errors and results in a system-wide rendering freeze.
This commit addresses that regression by moving the two buffers required for suspend and resume to be deallocated at driver unload instead of post init.
Fixes: 042b5f8384 ("drm/nouveau: fix several DMA buffer leaks")
Signed-off-by: Sid Pranjale <sidpranjale127@protonmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Turns out usage is always in bytes not shifted.
Fixes: 72fa02fdf8 ("nouveau: add an ioctl to report vram usage")
Signed-off-by: Dave Airlie <airlied@redhat.com>
UAPI Changes:
- A couple of tracepoint updates from Priyanka and Lucas.
- Make sure BINDs are completed before accepting UNBINDs on LR vms.
- Don't arbitrarily restrict max number of batched binds.
- Add uapi for dumpable bos (agreed on IRC).
- Remove unused uapi flags and a leftover comment.
Driver Changes:
- A couple of fixes related to the execlist backend.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZeCBg4MA2hd1oggN@fedora
A reset fix for host1x, a resource leak fix and a probe fix for aux-hpd,
a use-after-free fix and a boot fix for a pmic_glink qcom driver in
drivers/soc, a fix for the simpledrm/tegra transition, a kunit fix for
the TTM tests, a font handling fix for fbcon, two allocation fixes and a
kunit test to cover them for drm/buddy
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229-angelic-adorable-teal-fbfabb@houat
When creating a snapshot we may do a double free of an anonymous device
in case there's an error committing the transaction. The second free may
result in freeing an anonymous device number that was allocated by some
other subsystem in the kernel or another btrfs filesystem.
The steps that lead to this:
1) At ioctl.c:create_snapshot() we allocate an anonymous device number
and assign it to pending_snapshot->anon_dev;
2) Then we call btrfs_commit_transaction() and end up at
transaction.c:create_pending_snapshot();
3) There we call btrfs_get_new_fs_root() and pass it the anonymous device
number stored in pending_snapshot->anon_dev;
4) btrfs_get_new_fs_root() frees that anonymous device number because
btrfs_lookup_fs_root() returned a root - someone else did a lookup
of the new root already, which could some task doing backref walking;
5) After that some error happens in the transaction commit path, and at
ioctl.c:create_snapshot() we jump to the 'fail' label, and after
that we free again the same anonymous device number, which in the
meanwhile may have been reallocated somewhere else, because
pending_snapshot->anon_dev still has the same value as in step 1.
Recently syzbot ran into this and reported the following trace:
------------[ cut here ]------------
ida_free called for id=51 which is not allocated.
WARNING: CPU: 1 PID: 31038 at lib/idr.c:525 ida_free+0x370/0x420 lib/idr.c:525
Modules linked in:
CPU: 1 PID: 31038 Comm: syz-executor.2 Not tainted 6.8.0-rc4-syzkaller-00410-gc02197fc9076 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
RIP: 0010:ida_free+0x370/0x420 lib/idr.c:525
Code: 10 42 80 3c 28 (...)
RSP: 0018:ffffc90015a67300 EFLAGS: 00010246
RAX: be5130472f5dd000 RBX: 0000000000000033 RCX: 0000000000040000
RDX: ffffc90009a7a000 RSI: 000000000003ffff RDI: 0000000000040000
RBP: ffffc90015a673f0 R08: ffffffff81577992 R09: 1ffff92002b4cdb4
R10: dffffc0000000000 R11: fffff52002b4cdb5 R12: 0000000000000246
R13: dffffc0000000000 R14: ffffffff8e256b80 R15: 0000000000000246
FS: 00007fca3f4b46c0(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f167a17b978 CR3: 000000001ed26000 CR4: 0000000000350ef0
Call Trace:
<TASK>
btrfs_get_root_ref+0xa48/0xaf0 fs/btrfs/disk-io.c:1346
create_pending_snapshot+0xff2/0x2bc0 fs/btrfs/transaction.c:1837
create_pending_snapshots+0x195/0x1d0 fs/btrfs/transaction.c:1931
btrfs_commit_transaction+0xf1c/0x3740 fs/btrfs/transaction.c:2404
create_snapshot+0x507/0x880 fs/btrfs/ioctl.c:848
btrfs_mksubvol+0x5d0/0x750 fs/btrfs/ioctl.c:998
btrfs_mksnapshot+0xb5/0xf0 fs/btrfs/ioctl.c:1044
__btrfs_ioctl_snap_create+0x387/0x4b0 fs/btrfs/ioctl.c:1306
btrfs_ioctl_snap_create_v2+0x1ca/0x400 fs/btrfs/ioctl.c:1393
btrfs_ioctl+0xa74/0xd40
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:871 [inline]
__se_sys_ioctl+0xfe/0x170 fs/ioctl.c:857
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7fca3e67dda9
Code: 28 00 00 00 (...)
RSP: 002b:00007fca3f4b40c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fca3e7abf80 RCX: 00007fca3e67dda9
RDX: 00000000200005c0 RSI: 0000000050009417 RDI: 0000000000000003
RBP: 00007fca3e6ca47a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007fca3e7abf80 R15: 00007fff6bf95658
</TASK>
Where we get an explicit message where we attempt to free an anonymous
device number that is not currently allocated. It happens in a different
code path from the example below, at btrfs_get_root_ref(), so this change
may not fix the case triggered by syzbot.
To fix at least the code path from the example above, change
btrfs_get_root_ref() and its callers to receive a dev_t pointer argument
for the anonymous device number, so that in case it frees the number, it
also resets it to 0, so that up in the call chain we don't attempt to do
the double free.
CC: stable@vger.kernel.org # 5.10+
Link: https://lore.kernel.org/linux-btrfs/000000000000f673a1061202f630@google.com/
Fixes: e03ee2fe87 ("btrfs: do not ASSERT() if the newly created subvolume already got read")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When FIEMAP_FLAG_SYNC is given to fiemap the expectation is that that
are no concurrent writes and we get a stable view of the inode's extent
layout.
When the flag is given we flush all IO (and wait for ordered extents to
complete) and then lock the inode in shared mode, however that leaves open
the possibility that a write might happen right after the flushing and
before locking the inode. So fix this by flushing again after locking the
inode - we leave the initial flushing before locking the inode to avoid
holding the lock and blocking other RO operations while waiting for IO
and ordered extents to complete. The second flushing while holding the
inode's lock will most of the time do nothing or very little since the
time window for new writes to have happened is small.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For fiemap we recently stopped locking the target extent range for the
whole duration of the fiemap call, in order to avoid a deadlock in a
scenario where the fiemap buffer happens to be a memory mapped range of
the same file. This use case is very unlikely to be useful in practice but
it may be triggered by fuzz testing (syzbot, etc).
However by not locking the target extent range for the whole duration of
the fiemap call we can race with an ordered extent. This happens like
this:
1) The fiemap task finishes processing a file extent item that covers
the file range [512K, 1M[, and that file extent item is the last item
in the leaf currently being processed;
2) And ordered extent for the file range [768K, 2M[, in COW mode,
completes (btrfs_finish_one_ordered()) and the file extent item
covering the range [512K, 1M[ is trimmed to cover the range
[512K, 768K[ and then a new file extent item for the range [768K, 2M[
is inserted in the inode's subvolume tree;
3) The fiemap task calls fiemap_next_leaf_item(), which then calls
btrfs_next_leaf() to find the next leaf / item. This finds that the
the next key following the one we previously processed (its type is
BTRFS_EXTENT_DATA_KEY and its offset is 512K), is the key corresponding
to the new file extent item inserted by the ordered extent, which has
a type of BTRFS_EXTENT_DATA_KEY and an offset of 768K;
4) Later the fiemap code ends up at emit_fiemap_extent() and triggers
the warning:
if (cache->offset + cache->len > offset) {
WARN_ON(1);
return -EINVAL;
}
Since we get 1M > 768K, because the previously emitted entry for the
old extent covering the file range [512K, 1M[ ends at an offset that
is greater than the new extent's start offset (768K). This makes fiemap
fail with -EINVAL besides triggering the warning that produces a stack
trace like the following:
[1621.677651] ------------[ cut here ]------------
[1621.677656] WARNING: CPU: 1 PID: 204366 at fs/btrfs/extent_io.c:2492 emit_fiemap_extent+0x84/0x90 [btrfs]
[1621.677899] Modules linked in: btrfs blake2b_generic (...)
[1621.677951] CPU: 1 PID: 204366 Comm: pool Not tainted 6.8.0-rc5-btrfs-next-151+ #1
[1621.677954] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[1621.677956] RIP: 0010:emit_fiemap_extent+0x84/0x90 [btrfs]
[1621.678033] Code: 2b 4c 89 63 (...)
[1621.678035] RSP: 0018:ffffab16089ffd20 EFLAGS: 00010206
[1621.678037] RAX: 00000000004fa000 RBX: ffffab16089ffe08 RCX: 0000000000009000
[1621.678039] RDX: 00000000004f9000 RSI: 00000000004f1000 RDI: ffffab16089ffe90
[1621.678040] RBP: 00000000004f9000 R08: 0000000000001000 R09: 0000000000000000
[1621.678041] R10: 0000000000000000 R11: 0000000000001000 R12: 0000000041d78000
[1621.678043] R13: 0000000000001000 R14: 0000000000000000 R15: ffff9434f0b17850
[1621.678044] FS: 00007fa6e20006c0(0000) GS:ffff943bdfa40000(0000) knlGS:0000000000000000
[1621.678046] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1621.678048] CR2: 00007fa6b0801000 CR3: 000000012d404002 CR4: 0000000000370ef0
[1621.678053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1621.678055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1621.678056] Call Trace:
[1621.678074] <TASK>
[1621.678076] ? __warn+0x80/0x130
[1621.678082] ? emit_fiemap_extent+0x84/0x90 [btrfs]
[1621.678159] ? report_bug+0x1f4/0x200
[1621.678164] ? handle_bug+0x42/0x70
[1621.678167] ? exc_invalid_op+0x14/0x70
[1621.678170] ? asm_exc_invalid_op+0x16/0x20
[1621.678178] ? emit_fiemap_extent+0x84/0x90 [btrfs]
[1621.678253] extent_fiemap+0x766/0xa30 [btrfs]
[1621.678339] btrfs_fiemap+0x45/0x80 [btrfs]
[1621.678420] do_vfs_ioctl+0x1e4/0x870
[1621.678431] __x64_sys_ioctl+0x6a/0xc0
[1621.678434] do_syscall_64+0x52/0x120
[1621.678445] entry_SYSCALL_64_after_hwframe+0x6e/0x76
There's also another case where before calling btrfs_next_leaf() we are
processing a hole or a prealloc extent and we had several delalloc ranges
within that hole or prealloc extent. In that case if the ordered extents
complete before we find the next key, we may end up finding an extent item
with an offset smaller than (or equals to) the offset in cache->offset.
So fix this by changing emit_fiemap_extent() to address these three
scenarios like this:
1) For the first case, steps listed above, adjust the length of the
previously cached extent so that it does not overlap with the current
extent, emit the previous one and cache the current file extent item;
2) For the second case where he had a hole or prealloc extent with
multiple delalloc ranges inside the hole or prealloc extent's range,
and the current file extent item has an offset that matches the offset
in the fiemap cache, just discard what we have in the fiemap cache and
assign the current file extent item to the cache, since it's more up
to date;
3) For the third case where he had a hole or prealloc extent with
multiple delalloc ranges inside the hole or prealloc extent's range
and the offset of the file extent item we just found is smaller than
what we have in the cache, just skip the current file extent item
if its range end at or behind the cached extent's end, because we may
have emitted (to the fiemap user space buffer) delalloc ranges that
overlap with the current file extent item's range. If the file extent
item's range goes beyond the end offset of the cached extent, just
emit the cached extent and cache a subrange of the file extent item,
that goes from the end offset of the cached extent to the end offset
of the file extent item.
Dealing with those cases in those ways makes everything consistent by
reflecting the current state of file extent items in the btree and
without emitting extents that have overlapping ranges (which would be
confusing and violating expectations).
This issue could be triggered often with test case generic/561, and was
also hit and reported by Wang Yugui.
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Link: https://lore.kernel.org/linux-btrfs/20240223104619.701F.409509F4@e16-tech.com/
Fixes: b0ad381fa7 ("btrfs: fix deadlock with fiemap and extent locking")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, WiFi and netfilter.
We have one outstanding issue with the stmmac driver, which may be a
LOCKDEP false positive, not a blocker.
Current release - regressions:
- netfilter: nf_tables: re-allow NFPROTO_INET in
nft_(match/target)_validate()
- eth: ionic: fix error handling in PCI reset code
Current release - new code bugs:
- eth: stmmac: complete meta data only when enabled, fix null-deref
- kunit: fix again checksum tests on big endian CPUs
Previous releases - regressions:
- veth: try harder when allocating queue memory
- Bluetooth:
- hci_bcm4377: do not mark valid bd_addr as invalid
- hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST
Previous releases - always broken:
- info leak in __skb_datagram_iter() on netlink socket
- mptcp:
- map v4 address to v6 when destroying subflow
- fix potential wake-up event loss due to sndbuf auto-tuning
- fix double-free on socket dismantle
- wifi: nl80211: reject iftype change with mesh ID change
- fix small out-of-bound read when validating netlink be16/32 types
- rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
- ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr()
- ip_tunnel: prevent perpetual headroom growth with huge number of
tunnels on top of each other
- mctp: fix skb leaks on error paths of mctp_local_output()
- eth: ice: fixes for DPLL state reporting
- dpll: rely on rcu for netdev_dpll_pin() to prevent UaF
- eth: dpaa: accept phy-interface-type = '10gbase-r' in the device
tree"
* tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits)
dpll: fix build failure due to rcu_dereference_check() on unknown type
kunit: Fix again checksum tests on big endian CPUs
tls: fix use-after-free on failed backlog decryption
tls: separate no-async decryption request handling from async
tls: fix peeking with sync+async decryption
tls: decrement decrypt_pending if no async completion will be called
gtp: fix use-after-free and null-ptr-deref in gtp_newlink()
net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames
igb: extend PTP timestamp adjustments to i211
rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back
tools: ynl: fix handling of multiple mcast groups
selftests: netfilter: add bridge conntrack + multicast test case
netfilter: bridge: confirm multicast packets before passing them up the stack
netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
Bluetooth: qca: Fix triggering coredump implementation
Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT
Bluetooth: qca: Fix wrong event type for patch config command
Bluetooth: Enforce validation on max value of connection interval
Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
Bluetooth: mgmt: Fix limited discoverable off timeout
...
The update_cpumask(), checks for newly requested cpumask by calling
validate_change(), which returns an error on passing an invalid set
of cpu(s). Independent of the error returned, update_cpumask() always
returns zero, suppressing the error and returning success to the user
on writing an invalid cpu range for a cpuset. Fix it by returning
retval instead, which is returned by validate_change().
Fixes: 99fe36ba6f ("cgroup/cpuset: Improve temporary cpumasks handling")
Signed-off-by: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Cc: stable@vger.kernel.org # v6.6+
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull Landlock fix from Mickaël Salaün:
"Fix a potential issue when handling inodes with inconsistent
properties"
* tag 'landlock-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
landlock: Fix asymmetric private inodes referring
This reverts commit ce173474cf.
We cannot correctly deal with NAPOT mappings in vmalloc/vmap because if
some part of a NAPOT mapping is unmapped, the remaining mapping is not
updated accordingly. For example:
ptr = vmalloc_huge(64 * 1024, GFP_KERNEL);
vunmap_range((unsigned long)(ptr + PAGE_SIZE),
(unsigned long)(ptr + 64 * 1024));
leads to the following kernel page table dump:
0xffff8f8000ef0000-0xffff8f8000ef1000 0x00000001033c0000 4K PTE N .. .. D A G . . W R V
Meaning the first entry which was not unmapped still has the N bit set,
which, if accessed first and cached in the TLB, could allow access to the
unmapped range.
That's because the logic to break the NAPOT mapping does not exist and
likely won't. Indeed, to break a NAPOT mapping, we first have to clear
the whole mapping, flush the TLB and then set the new mapping ("break-
before-make" equivalent). That works fine in userspace since we can handle
any pagefault occurring on the remaining mapping but we can't handle a kernel
pagefault on such mapping.
So fix this by reverting the commit that introduced the vmap/vmalloc
support.
Fixes: ce173474cf ("riscv: mm: support Svnapot in huge vmap")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240227205016.121901-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Samuel Holland <samuel.holland@sifive.com> says:
This series fixes a couple of issues related to using the cbo.zero
instruction in userspace. The first patch fixes a bug where the wrong
enable bit gets set if the kernel is running in M-mode. The remaining
patches fix a bug where the enable bit gets reset to its default value
after a nonretentive idle state. I have hardware which reproduces this:
Before this series:
$ tools/testing/selftests/riscv/hwprobe/cbo
TAP version 13
1..3
ok 1 Zicboz block size
# Zicboz block size: 64
Illegal instruction
After applying this series:
$ tools/testing/selftests/riscv/hwprobe/cbo
TAP version 13
1..3
ok 1 Zicboz block size
# Zicboz block size: 64
ok 2 cbo.zero
ok 3 cbo.zero check
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
* b4-shazam-merge:
riscv: Save/restore envcfg CSR during CPU suspend
riscv: Add a custom ISA extension for the [ms]envcfg CSR
riscv: Fix enabling cbo.zero when running in M-mode
Link: https://lore.kernel.org/r/20240228065559.3434837-1-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The [ms]envcfg CSR was added in version 1.12 of the RISC-V privileged
ISA (aka S[ms]1p12). However, bits in this CSR are defined by several
other extensions which may be implemented separately from any particular
version of the privileged ISA (for example, some unrelated errata may
prevent an implementation from claiming conformance with Ss1p12). As a
result, Linux cannot simply use the privileged ISA version to determine
if the CSR is present. It must also check if any of these other
extensions are implemented. It also cannot probe the existence of the
CSR at runtime, because Linux does not require Sstrict, so (in the
absence of additional information) it cannot know if a CSR at that
address is [ms]envcfg or part of some non-conforming vendor extension.
Since there are several standard extensions that imply the existence of
the [ms]envcfg CSR, it becomes unwieldy to check for all of them
wherever the CSR is accessed. Instead, define a custom Xlinuxenvcfg ISA
extension bit that is implied by the other extensions and denotes that
the CSR exists as defined in the privileged ISA, containing at least one
of the fields common between menvcfg and senvcfg.
This extension does not need to be parsed from the devicetree or ISA
string because it can only be implemented as a subset of some other
standard extension.
Cc: <stable@vger.kernel.org> # v6.7+
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240228065559.3434837-3-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add myself as a maintainer for the various SiFive drivers, since I have
been performing cleanup activity on these drivers and reviewing patches
to them for a while now. Remove Palmer as a maintainer, as he is focused
on overall RISC-V architecture support.
Collapse some duplicate entries into the main SiFive drivers entry:
- Conor is already maintainer of standalone cache drivers as a whole,
and these files are also covered by the "sifive" file name regex.
- Paul's git tree has not been updated since 2018, and all file names
matching the "fu540" pattern also match the "sifive" pattern.
- Green has not been active on the LKML for a couple of years.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Acked-by: Paul Walmsley <paul.walmsley@sifive.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20240215234941.1663791-1-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- mgmt: Fix limited discoverable off timeout
- hci_qca: Set BDA quirk bit if fwnode exists in DT
- hci_bcm4377: do not mark valid bd_addr as invalid
- hci_sync: Check the correct flag before starting a scan
- Enforce validation on max value of connection interval
- hci_sync: Fix accept_list when attempting to suspend
- hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
- Avoid potential use-after-free in hci_error_reset
- rfcomm: Fix null-ptr-deref in rfcomm_check_security
- hci_event: Fix wrongly recorded wakeup BD_ADDR
- qca: Fix wrong event type for patch config command
- qca: Fix triggering coredump implementation
* tag 'for-net-2024-02-28' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: qca: Fix triggering coredump implementation
Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT
Bluetooth: qca: Fix wrong event type for patch config command
Bluetooth: Enforce validation on max value of connection interval
Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
Bluetooth: mgmt: Fix limited discoverable off timeout
Bluetooth: hci_event: Fix wrongly recorded wakeup BD_ADDR
Bluetooth: rfcomm: Fix null-ptr-deref in rfcomm_check_security
Bluetooth: hci_sync: Fix accept_list when attempting to suspend
Bluetooth: Avoid potential use-after-free in hci_error_reset
Bluetooth: hci_sync: Check the correct flag before starting a scan
Bluetooth: hci_bcm4377: do not mark valid bd_addr as invalid
====================
Link: https://lore.kernel.org/r/20240228145644.2269088-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the decrypt request goes to the backlog and crypto_aead_decrypt
returns -EBUSY, tls_do_decryption will wait until all async
decryptions have completed. If one of them fails, tls_do_decryption
will return -EBADMSG and tls_decrypt_sg jumps to the error path,
releasing all the pages. But the pages have been passed to the async
callback, and have already been released by tls_decrypt_done.
The only true async case is when crypto_aead_decrypt returns
-EINPROGRESS. With -EBUSY, we already waited so we can tell
tls_sw_recvmsg that the data is available for immediate copy, but we
need to notify tls_decrypt_sg (via the new ->async_done flag) that the
memory has already been released.
Fixes: 8590541473 ("net: tls: handle backlogging of crypto requests")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/4755dd8d9bebdefaa19ce1439b833d6199d4364c.1709132643.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If we're not doing async, the handling is much simpler. There's no
reference counting, we just need to wait for the completion to wake us
up and return its result.
We should preferably also use a separate crypto_wait. I'm not seeing a
UAF as I did in the past, I think aec7961916 ("tls: fix race between
async notify and socket close") took care of it.
This will make the next fix easier.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/47bde5f649707610eaef9f0d679519966fc31061.1709132643.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If we peek from 2 records with a currently empty rx_list, and the
first record is decrypted synchronously but the second record is
decrypted async, the following happens:
1. decrypt record 1 (sync)
2. copy from record 1 to the userspace's msg
3. queue the decrypted record to rx_list for future read(!PEEK)
4. decrypt record 2 (async)
5. queue record 2 to rx_list
6. call process_rx_list to copy data from the 2nd record
We currently pass copied=0 as skip offset to process_rx_list, so we
end up copying once again from the first record. We should skip over
the data we've already copied.
Seen with selftest tls.12_aes_gcm.recv_peek_large_buf_mult_recs
Fixes: 692d7b5d1f ("tls: Fix recvmsg() to be able to peek across multiple records")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/1b132d2b2b99296bfde54e8a67672d90d6d16e71.1709132643.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With mixed sync/async decryption, or failures of crypto_aead_decrypt,
we increment decrypt_pending but we never do the corresponding
decrement since tls_decrypt_done will not be called. In this case, we
should decrement decrypt_pending immediately to avoid getting stuck.
For example, the prequeue prequeue test gets stuck with mixed
modes (one async decrypt + one sync decrypt).
Fixes: 94524d8fc9 ("net/tls: Add support for async decryption of tls records")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/c56d5fc35543891d5319f834f25622360e1bfbec.1709132643.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When resetting the bus after a gap count error, use a long rather than
short bus reset.
IEEE 1394-1995 uses only long bus resets. IEEE 1394a adds the option of
short bus resets. When video or audio transmission is in progress and a
device is hot-plugged elsewhere on the bus, the resulting bus reset can
cause video frame drops or audio dropouts. Short bus resets reduce or
eliminate this problem. Accordingly, short bus resets are almost always
preferred.
However, on a mixed 1394/1394a bus, a short bus reset can trigger an
immediate additional bus reset. This double bus reset can be interpreted
differently by different nodes on the bus, resulting in an inconsistent gap
count after the bus reset. An inconsistent gap count will cause another bus
reset, leading to a neverending bus reset loop. This only happens for some
bus topologies, not for all mixed 1394/1394a buses.
By instead sending a long bus reset after a gap count inconsistency, we
avoid the doubled bus reset, restoring the bus to normal operation.
Signed-off-by: Adam Goldman <adamg@pobox.com>
Link: https://sourceforge.net/p/linux1394/mailman/message/58741624/
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
Patch #1 restores NFPROTO_INET with nft_compat, from Ignat Korchagin.
Patch #2 fixes an issue with bridge netfilter and broadcast/multicast
packets.
There is a day 0 bug in br_netfilter when used with connection tracking.
Conntrack assumes that an nf_conn structure that is not yet added to
hash table ("unconfirmed"), is only visible by the current cpu that is
processing the sk_buff.
For bridge this isn't true, sk_buff can get cloned in between, and
clones can be processed in parallel on different cpu.
This patch disables NAT and conntrack helpers for multicast packets.
Patch #3 adds a selftest to cover for the br_netfilter bug.
netfilter pull request 24-02-29
* tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: netfilter: add bridge conntrack + multicast test case
netfilter: bridge: confirm multicast packets before passing them up the stack
netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
====================
Link: https://lore.kernel.org/r/20240229000135.8780-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Current HSR implementation uses following supervisory frame (even for
HSRv1 the HSR tag is not is not present):
00000000: 01 15 4e 00 01 2d XX YY ZZ 94 77 10 88 fb 00 01
00000010: 7e 1c 17 06 XX YY ZZ 94 77 10 1e 06 XX YY ZZ 94
00000020: 77 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030: 00 00 00 00 00 00 00 00 00 00 00 00
The current code adds extra two bytes (i.e. sizeof(struct hsr_sup_tlv))
when offset for skb_pull() is calculated.
This is wrong, as both 'struct hsrv1_ethhdr_sp' and 'hsrv0_ethhdr_sp'
already have 'struct hsr_sup_tag' defined in them, so there is no need
for adding extra two bytes.
This code was working correctly as with no RedBox support, the check for
HSR_TLV_EOT (0x00) was off by two bytes, which were corresponding to
zeroed padded bytes for minimal packet size.
Fixes: eafaa88b3e ("net: hsr: Add support for redbox supervision frames")
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240228085644.3618044-1-lukma@denx.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
clang complains about a nonsensical test on builds with a 32-bit phys_addr_t,
which means resizing will always fail:
drivers/gpu/drm/xe/xe_mmio.c:109:23: error: result of comparison of constant 4294967296 with expression of type 'resource_size_t' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
109 | root_res->start > 0x100000000ull)
| ~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
Previously, BAR resize was always disallowed on 32-bit kernels, but
this apparently changed recently. Since 32-bit machines can in theory
support PAE/LPAE for large address spaces, this may end up useful,
so change the driver to shut up the warning but still work when
phys_addr_t/resource_size_t is 64 bit wide.
Fixes: 9a6e6c14bf ("drm/xe/mmio: Use non-atomic writeq/readq variant for 32b")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240226124736.1272949-2-arnd@kernel.org
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit f5d3983366)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Mesa has been issuing a single bind operation per ioctl since xe.ko
changed to GPUVA due xe.ko bug #746. If I change Mesa to try again to
issue every single bind operation it can in the same ioctl, it hits
the MAX_BINDS assertion when running Vulkan conformance tests.
Test dEQP-VK.sparse_resources.transfer_queue.3d.rgba32i.1024_128_8
issues 960 bind operations in a single ioctl, it's the most I could
find in the conformance suite.
I don't see a reason to keep the MAX_BINDS restriction: it doesn't
seem to be preventing any specific issue. If the number is too big for
the memory allocations, then those will fail. Nothing related to
num_binds seems to be using the stack. Let's just get rid of it.
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Testcase: dEQP-VK.sparse_resources.transfer_queue.3d.rgba32i.1024_128_8
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/746
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240215005353.1295420-1-paulo.r.zanoni@intel.com
(cherry picked from commit ba6bbdc6ea)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Those cases missed in previous uAPI cleanups were mostly accidentally
brought in from i915 or created to exercise the possibilities of gpuvm
but they are not used by userspace yet, so let's remove them. They can
still be brought back later if needed.
v2:
- Fix XE_VM_FLAG_FAULT_MODE support in xe_lrc.c (Brian Welty)
- Leave DRM_XE_VM_BIND_OP_UNMAP_ALL (José Roberto de Souza)
- Ensure invalid flag values are rejected (Rodrigo Vivi)
v3: Rebase after removal of persistent exec_queues (Francois Dugast)
v4: Rodrigo: Rebase after the new dumpable flag.
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240222232356.175431-1-rodrigo.vivi@intel.com
(cherry picked from commit 84a1ed5e67)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
ASoC: Fixes for v6.8
A few small fixes, some driver specific and one slightly larger one
from Richard which adds a new core helper and updates a small clutch of
drivers to deal with the fact that they were using a helper which
requires that the lock for the list of controls without holding that
lock. We also have some quirks for new AMD based Lenovo systems.
In the commit d73ef2d69c ("rtnetlink: let rtnl_bridge_setlink checks
IFLA_BRIDGE_MODE length"), an adjustment was made to the old loop logic
in the function `rtnl_bridge_setlink` to enable the loop to also check
the length of the IFLA_BRIDGE_MODE attribute. However, this adjustment
removed the `break` statement and led to an error logic of the flags
writing back at the end of this function.
if (have_flags)
memcpy(nla_data(attr), &flags, sizeof(flags));
// attr should point to IFLA_BRIDGE_FLAGS NLA !!!
Before the mentioned commit, the `attr` is granted to be IFLA_BRIDGE_FLAGS.
However, this is not necessarily true fow now as the updated loop will let
the attr point to the last NLA, even an invalid NLA which could cause
overflow writes.
This patch introduces a new variable `br_flag` to save the NLA pointer
that points to IFLA_BRIDGE_FLAGS and uses it to resolve the mentioned
error logic.
Fixes: d73ef2d69c ("rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240227121128.608110-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We never increment the group number iterator, so all groups
get recorded into index 0 of the mcast_groups[] array.
As a result YNL can only handle using the last group.
For example using the "netdev" sample on kernel with
page pool commands results in:
$ ./samples/netdev
YNL: Multicast group 'mgmt' not found
Most families have only one multicast group, so this hasn't
been noticed. Plus perhaps developers usually test the last
group which would have worked.
Fixes: 86878f14d7 ("tools: ynl: user space helpers")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240226214019.1255242-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add test case for multicast packet confirm race.
Without preceding patch, this should result in:
WARNING: CPU: 0 PID: 38 at net/netfilter/nf_conntrack_core.c:1198 __nf_conntrack_confirm+0x3ed/0x5f0
Workqueue: events_unbound macvlan_process_broadcast
RIP: 0010:__nf_conntrack_confirm+0x3ed/0x5f0
? __nf_conntrack_confirm+0x3ed/0x5f0
nf_confirm+0x2ad/0x2d0
nf_hook_slow+0x36/0xd0
ip_local_deliver+0xce/0x110
__netif_receive_skb_one_core+0x4f/0x70
process_backlog+0x8c/0x130
[..]
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
conntrack nf_confirm logic cannot handle cloned skbs referencing
the same nf_conn entry, which will happen for multicast (broadcast)
frames on bridges.
Example:
macvlan0
|
br0
/ \
ethX ethY
ethX (or Y) receives a L2 multicast or broadcast packet containing
an IP packet, flow is not yet in conntrack table.
1. skb passes through bridge and fake-ip (br_netfilter)Prerouting.
-> skb->_nfct now references a unconfirmed entry
2. skb is broad/mcast packet. bridge now passes clones out on each bridge
interface.
3. skb gets passed up the stack.
4. In macvlan case, macvlan driver retains clone(s) of the mcast skb
and schedules a work queue to send them out on the lower devices.
The clone skb->_nfct is not a copy, it is the same entry as the
original skb. The macvlan rx handler then returns RX_HANDLER_PASS.
5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb.
The Macvlan broadcast worker and normal confirm path will race.
This race will not happen if step 2 already confirmed a clone. In that
case later steps perform skb_clone() with skb->_nfct already confirmed (in
hash table). This works fine.
But such confirmation won't happen when eb/ip/nftables rules dropped the
packets before they reached the nf_confirm step in postrouting.
Pablo points out that nf_conntrack_bridge doesn't allow use of stateful
nat, so we can safely discard the nf_conn entry and let inet call
conntrack again.
This doesn't work for bridge netfilter: skb could have a nat
transformation. Also bridge nf prevents re-invocation of inet prerouting
via 'sabotage_in' hook.
Work around this problem by explicit confirmation of the entry at LOCAL_IN
time, before upper layer has a chance to clone the unconfirmed entry.
The downside is that this disables NAT and conntrack helpers.
Alternative fix would be to add locking to all code parts that deal with
unconfirmed packets, but even if that could be done in a sane way this
opens up other problems, for example:
-m physdev --physdev-out eth0 -j SNAT --snat-to 1.2.3.4
-m physdev --physdev-out eth1 -j SNAT --snat-to 1.2.3.5
For multicast case, only one of such conflicting mappings will be
created, conntrack only handles 1:1 NAT mappings.
Users should set create a setup that explicitly marks such traffic
NOTRACK (conntrack bypass) to avoid this, but we cannot auto-bypass
them, ruleset might have accept rules for untracked traffic already,
so user-visible behaviour would change.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217777
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit d0009effa8 ("netfilter: nf_tables: validate NFPROTO_* family") added
some validation of NFPROTO_* families in the nft_compat module, but it broke
the ability to use legacy iptables modules in dual-stack nftables.
While with legacy iptables one had to independently manage IPv4 and IPv6
tables, with nftables it is possible to have dual-stack tables sharing the
rules. Moreover, it was possible to use rules based on legacy iptables
match/target modules in dual-stack nftables.
As an example, the program from [2] creates an INET dual-stack family table
using an xt_bpf based rule, which looks like the following (the actual output
was generated with a patched nft tool as the current nft tool does not parse
dual stack tables with legacy match rules, so consider it for illustrative
purposes only):
table inet testfw {
chain input {
type filter hook prerouting priority filter; policy accept;
bytecode counter packets 0 bytes 0 accept
}
}
After d0009effa8 ("netfilter: nf_tables: validate NFPROTO_* family") we get
EOPNOTSUPP for the above program.
Fix this by allowing NFPROTO_INET for nft_(match/target)_validate(), but also
restrict the functions to classic iptables hooks.
Changes in v3:
* clarify that upstream nft will not display such configuration properly and
that the output was generated with a patched nft tool
* remove example program from commit description and link to it instead
* no code changes otherwise
Changes in v2:
* restrict nft_(match/target)_validate() to classic iptables hooks
* rewrite example program to use unmodified libnftnl
Fixes: d0009effa8 ("netfilter: nf_tables: validate NFPROTO_* family")
Link: https://lore.kernel.org/all/Zc1PfoWN38UuFJRI@calendula/T/#mc947262582c90fec044c7a3398cc92fac7afea72 [1]
Link: https://lore.kernel.org/all/20240220145509.53357-1-ignat@cloudflare.com/ [2]
Reported-by: Jordan Griege <jgriege@cloudflare.com>
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull ACPI fix from Rafael Wysocki:
"Revert a recent EC driver change that introduced an unexpected and
undesirable user-visible difference in behavior (Rafael Wysocki)"
* tag 'acpi-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI: EC: Use a spin lock without disabing interrupts"
Pull power management fix from Rafael Wysocki:
"Fix a latent bug in the intel-pstate cpufreq driver that has been
exposed by the recent schedutil governor changes (Doug Smythies)"
* tag 'pm-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: fix pstate limits enforcement for adjust_perf call back
Pull spi fixes from Mark Brown:
"There's two things here - the big one is a batch of fixes for the
power management in the Cadence QuadSPI driver which had some serious
issues with runtime PM and there's also a revert of one of the last
batch of fixes for ppc4xx which has a dependency on -next but was in
between two mainline fixes so the -next dependency got missed.
The ppc4xx driver is not currently included in any defconfig and has
dependencies that exclude it from allmodconfigs so none of the CI
systems catch issues with it, hence the need for the earlier fixes
series. There's some updates to the PowerPC configs to address this"
* tag 'spi-fix-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: Drop mismerged fix
spi: cadence-qspi: add system-wide suspend and resume callbacks
spi: cadence-qspi: put runtime in runtime PM hooks names
spi: cadence-qspi: remove system-wide suspend helper calls from runtime PM hooks
spi: cadence-qspi: fix pointer reference in runtime PM hooks
Pull regulator fixes from Mark Brown:
"Two small fixes, one small update for the max5970 driver bringing the
driver and DT binding documentation into sync plus a missed update to
the patterns in MAINTAINERS after a DT binding YAML conversion"
* tag 'regulator-fix-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: max5970: Fix regulator child node name
MAINTAINERS: repair entry for MICROCHIP MCP16502 PMIC DRIVER
Pull crypto fixes from Herbert Xu:
"This fixes a regression in lskcipher and an out-of-bound access
in arm64/neonbs"
* tag 'v6.8-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: arm64/neonbs - fix out-of-bounds access on short input
crypto: lskcipher - Copy IV in lskcipher glue code always
Commit 'e3e56c050ab6 ("soc: qcom: rpmhpd: Make power_on actually enable
the domain")' aimed to make sure that a power-domain that is being
enabled without any particular performance-state requested will at least
turn the rail on, to avoid filling DeviceTree with otherwise unnecessary
required-opps properties.
But in the event that aggregation happens on a disabled power-domain, with
an enabled peer without performance-state, both the local and peer
corner are 0. The peer's enabled_corner is not considered, with the
result that the underlying (shared) resource is disabled.
One case where this can be observed is when the display stack keeps mmcx
enabled (but without a particular performance-state vote) in order to
access registers and sync_state happens in the rpmhpd driver. As mmcx_ao
is flushed the state of the peer (mmcx) is not considered and mmcx_ao
ends up turning off "mmcx.lvl" underneath mmcx. This has been observed
several times, but has been painted over in DeviceTree by adding an
explicit vote for the lowest non-disabled performance-state.
Fixes: e3e56c050a ("soc: qcom: rpmhpd: Make power_on actually enable the domain")
Reported-by: Johan Hovold <johan@kernel.org>
Closes: https://lore.kernel.org/linux-arm-msm/ZdMwZa98L23mu3u6@hovoldconsulting.com/
Cc: <stable@vger.kernel.org>
Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Tested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20240226-rpmhpd-enable-corner-fix-v1-1-68c004cec48c@quicinc.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
hci_coredump_qca() uses __hci_cmd_sync() to send a vendor-specific command
to trigger firmware coredump, but the command does not have any event as
its sync response, so it is not suitable to use __hci_cmd_sync(), fixed by
using __hci_cmd_send().
Fixes: 06d3fdfcdf ("Bluetooth: hci_qca: Add qcom devcoredump support")
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
BT adapter going into UNCONFIGURED state during BT turn ON when
devicetree has no local-bd-address node.
Bluetooth will not work out of the box on such devices, to avoid this
problem, added check to set HCI_QUIRK_USE_BDADDR_PROPERTY based on
local-bd-address node entry.
When this quirk is not set, the public Bluetooth address read by host
from controller though HCI Read BD Address command is
considered as valid.
Fixes: e668eb1e15 ("Bluetooth: hci_core: Don't stop BT if the BD address missing in dts")
Signed-off-by: Janaki Ramaiah Thota <quic_janathot@quicinc.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Right now Linux BT stack cannot pass test case "GAP/CONN/CPUP/BV-05-C
'Connection Parameter Update Procedure Invalid Parameters Central
Responder'" in Bluetooth Test Suite revision GAP.TS.p44. [0]
That was revoled by commit c49a8682fc ("Bluetooth: validate BLE
connection interval updates"), but later got reverted due to devices
like keyboards and mice may require low connection interval.
So only validate the max value connection interval to pass the Test
Suite, and let devices to request low connection interval if needed.
[0] https://www.bluetooth.org/docman/handlers/DownloadDoc.ashx?doc_id=229869
Fixes: 68d19d7d99 ("Revert "Bluetooth: validate BLE connection interval updates"")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
LIMITED_DISCOVERABLE flag is not reset from Class of Device and
advertisement on limited discoverable timeout. This prevents to pass PTS
test GAP/DISC/LIMM/BV-02-C
Calling set_discoverable_sync as when the limited discovery is set
correctly update the Class of Device and advertisement.
Signed-off-by: Frédéric Danis <frederic.danis@collabora.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
hci_store_wake_reason() wrongly parses event HCI_Connection_Request
as HCI_Connection_Complete and HCI_Connection_Complete as
HCI_Connection_Request, so causes recording wakeup BD_ADDR error and
potential stability issue, fix it by using the correct field.
Fixes: 2f20216c1d ("Bluetooth: Emit controller suspend and resume events")
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
During our fuzz testing of the connection and disconnection process at the
RFCOMM layer, we discovered this bug. By comparing the packets from a
normal connection and disconnection process with the testcase that
triggered a KASAN report. We analyzed the cause of this bug as follows:
1. In the packets captured during a normal connection, the host sends a
`Read Encryption Key Size` type of `HCI_CMD` packet
(Command Opcode: 0x1408) to the controller to inquire the length of
encryption key.After receiving this packet, the controller immediately
replies with a Command Completepacket (Event Code: 0x0e) to return the
Encryption Key Size.
2. In our fuzz test case, the timing of the controller's response to this
packet was delayed to an unexpected point: after the RFCOMM and L2CAP
layers had disconnected but before the HCI layer had disconnected.
3. After receiving the Encryption Key Size Response at the time described
in point 2, the host still called the rfcomm_check_security function.
However, by this time `struct l2cap_conn *conn = l2cap_pi(sk)->chan->conn;`
had already been released, and when the function executed
`return hci_conn_security(conn->hcon, d->sec_level, auth_type, d->out);`,
specifically when accessing `conn->hcon`, a null-ptr-deref error occurred.
To fix this bug, check if `sk->sk_state` is BT_CLOSED before calling
rfcomm_recv_frame in rfcomm_process_rx.
Signed-off-by: Yuxuan Hu <20373622@buaa.edu.cn>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
During suspend, only wakeable devices can be in acceptlist, so if the
device was previously added it needs to be removed otherwise the device
can end up waking up the system prematurely.
Fixes: 3b42055388 ("Bluetooth: hci_sync: Fix attempting to suspend with unfiltered passive scan")
Signed-off-by: Clancy Shang <clancy.shang@quectel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
While handling the HCI_EV_HARDWARE_ERROR event, if the underlying
BT controller is not responding, the GPIO reset mechanism would
free the hci_dev and lead to a use-after-free in hci_error_reset.
Here's the call trace observed on a ChromeOS device with Intel AX201:
queue_work_on+0x3e/0x6c
__hci_cmd_sync_sk+0x2ee/0x4c0 [bluetooth <HASH:3b4a6>]
? init_wait_entry+0x31/0x31
__hci_cmd_sync+0x16/0x20 [bluetooth <HASH:3b4a 6>]
hci_error_reset+0x4f/0xa4 [bluetooth <HASH:3b4a 6>]
process_one_work+0x1d8/0x33f
worker_thread+0x21b/0x373
kthread+0x13a/0x152
? pr_cont_work+0x54/0x54
? kthread_blkcg+0x31/0x31
ret_from_fork+0x1f/0x30
This patch holds the reference count on the hci_dev while processing
a HCI_EV_HARDWARE_ERROR event to avoid potential crash.
Fixes: c7741d16a5 ("Bluetooth: Perform a power cycle when receiving hardware error event")
Signed-off-by: Ying Hsu <yinghsu@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
There's a very confusing mistake in the code starting a HCI inquiry: We're
calling hci_dev_test_flag() to test for HCI_INQUIRY, but hci_dev_test_flag()
checks hdev->dev_flags instead of hdev->flags. HCI_INQUIRY is a bit that's
set on hdev->flags, not on hdev->dev_flags though.
HCI_INQUIRY equals the integer 7, and in hdev->dev_flags, 7 means
HCI_BONDABLE, so we were actually checking for HCI_BONDABLE here.
The mistake is only present in the synchronous code for starting an inquiry,
not in the async one. Also devices are typically bondable while doing an
inquiry, so that might be the reason why nobody noticed it so far.
Fixes: abfeea476c ("Bluetooth: hci_sync: Convert MGMT_OP_START_DISCOVERY")
Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
A recent commit restored the original (and still documented) semantics
for the HCI_QUIRK_USE_BDADDR_PROPERTY quirk so that the device address
is considered invalid unless an address is provided by firmware.
This specifically means that this flag must only be set for devices with
invalid addresses, but the Broadcom BCM4377 driver has so far been
setting this flag unconditionally.
Fortunately the driver already checks for invalid addresses during setup
and sets the HCI_QUIRK_INVALID_BDADDR flag, which can simply be replaced
with HCI_QUIRK_USE_BDADDR_PROPERTY to indicate that the default address
is invalid but can be overridden by firmware (long term, this should
probably just always be allowed).
Fixes: 6945795bc8 ("Bluetooth: fix use-bdaddr-property quirk")
Cc: stable@vger.kernel.org # 6.5
Reported-by: Felix Zhang <mrman@mrman314.tech>
Link: https://lore.kernel.org/r/77419ffacc5b4875e920e038332575a2a5bff29f.camel@mrman314.tech/
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reported-by: Felix Zhang <mrman@mrman314.tech>
Reviewed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
AC5X spec says PHY init complete bit must be polled until zero.
We see cases in which timeout can take longer than the standard
calculation on AC5X, which is expected following the spec comment above.
According to the spec, we must wait as long as it takes for that bit to
toggle on AC5X.
Cap that with 100 delay loops so we won't get stuck forever.
Fixes: 06c8b667ff ("mmc: sdhci-xenon: Add support to PHYs of Marvell Xenon SDHC")
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Elad Nachman <enachman@marvell.com>
Link: https://lore.kernel.org/r/20240222191714.1216470-3-enachman@marvell.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The fast path usage breakdown describes the detail for 'inet_sock', fix
the markup title.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when suspending driver and stopping workqueue it is checked whether
workqueue is not NULL and if so, it is destroyed.
Function destroy_workqueue() does drain queue and does clear variable, but
it does not set workqueue variable to NULL. This can cause kernel/module
panic if code attempts to clear workqueue that was not initialized.
This scenario is possible when resuming suspended driver in stmmac_resume(),
because there is no handling for failed stmmac_hw_setup(),
which can fail and return if DMA engine has failed to initialize,
and workqueue is initialized after DMA engine.
Should DMA engine fail to initialize, resume will proceed normally,
but interface won't work and TX queue will eventually timeout,
causing 'Reset adapter' error.
This then does destroy workqueue during reset process.
And since workqueue is initialized after DMA engine and can be skipped,
it will cause kernel/module panic.
To secure against this possible crash, set workqueue variable to NULL when
destroying workqueue.
Log/backtrace from crash goes as follows:
[88.031977]------------[ cut here ]------------
[88.031985]NETDEV WATCHDOG: eth0 (sxgmac): transmit queue 1 timed out
[88.032017]WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:477 dev_watchdog+0x390/0x398
<Skipping backtrace for watchdog timeout>
[88.032251]---[ end trace e70de432e4d5c2c0 ]---
[88.032282]sxgmac 16d88000.ethernet eth0: Reset adapter.
[88.036359]------------[ cut here ]------------
[88.036519]Call trace:
[88.036523] flush_workqueue+0x3e4/0x430
[88.036528] drain_workqueue+0xc4/0x160
[88.036533] destroy_workqueue+0x40/0x270
[88.036537] stmmac_fpe_stop_wq+0x4c/0x70
[88.036541] stmmac_release+0x278/0x280
[88.036546] __dev_close_many+0xcc/0x158
[88.036551] dev_close_many+0xbc/0x190
[88.036555] dev_close.part.0+0x70/0xc0
[88.036560] dev_close+0x24/0x30
[88.036564] stmmac_service_task+0x110/0x140
[88.036569] process_one_work+0x1d8/0x4a0
[88.036573] worker_thread+0x54/0x408
[88.036578] kthread+0x164/0x170
[88.036583] ret_from_fork+0x10/0x20
[88.036588]---[ end trace e70de432e4d5c2c1 ]---
[88.036597]Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004
Fixes: 5a5586112b ("net: stmmac: support FPE link partner hand-shaking procedure")
Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sun8i_ce_cipher_unprepare should be called before
crypto_finalize_skcipher_request, because client callbacks may
immediately free memory, that isn't needed anymore. But it will be
used by unprepare after free. Before removing prepare/unprepare
callbacks it was handled by crypto engine in crypto_finalize_request.
Usually that results in a pointer dereference problem during a in
crypto selftest.
Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000030
Mem abort info:
ESR = 0x0000000096000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=000000004716d000
[0000000000000030] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] SMP
This problem is detected by KASAN as well.
==================================================================
BUG: KASAN: slab-use-after-free in sun8i_ce_cipher_do_one+0x6e8/0xf80 [sun8i_ce]
Read of size 8 at addr ffff00000dcdc040 by task 1c15000.crypto-/373
Hardware name: Pine64 PinePhone (1.2) (DT)
Call trace:
dump_backtrace+0x9c/0x128
show_stack+0x20/0x38
dump_stack_lvl+0x48/0x60
print_report+0xf8/0x5d8
kasan_report+0x90/0xd0
__asan_load8+0x9c/0xc0
sun8i_ce_cipher_do_one+0x6e8/0xf80 [sun8i_ce]
crypto_pump_work+0x354/0x620 [crypto_engine]
kthread_worker_fn+0x244/0x498
kthread+0x168/0x178
ret_from_fork+0x10/0x20
Allocated by task 379:
kasan_save_stack+0x3c/0x68
kasan_set_track+0x2c/0x40
kasan_save_alloc_info+0x24/0x38
__kasan_kmalloc+0xd4/0xd8
__kmalloc+0x74/0x1d0
alg_test_skcipher+0x90/0x1f0
alg_test+0x24c/0x830
cryptomgr_test+0x38/0x60
kthread+0x168/0x178
ret_from_fork+0x10/0x20
Freed by task 379:
kasan_save_stack+0x3c/0x68
kasan_set_track+0x2c/0x40
kasan_save_free_info+0x38/0x60
__kasan_slab_free+0x100/0x170
slab_free_freelist_hook+0xd4/0x1e8
__kmem_cache_free+0x15c/0x290
kfree+0x74/0x100
kfree_sensitive+0x80/0xb0
alg_test_skcipher+0x12c/0x1f0
alg_test+0x24c/0x830
cryptomgr_test+0x38/0x60
kthread+0x168/0x178
ret_from_fork+0x10/0x20
The buggy address belongs to the object at ffff00000dcdc000
which belongs to the cache kmalloc-256 of size 256
The buggy address is located 64 bytes inside of
freed 256-byte region [ffff00000dcdc000, ffff00000dcdc100)
Signed-off-by: Andrey Skvortsov <andrej.skvortzov@gmail.com>
Fixes: 4136212ab1 ("crypto: sun8i-ce - Remove prepare/unprepare request")
Cc: <stable@vger.kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Kalle Valo says:
====================
wireless fixes for v6.8-rc7
Few remaining fixes, hopefully the last wireless pull request to v6.8.
Two fixes to the stack and two to iwlwifi but no high priority fixes
this time.
* tag 'wireless-2024-02-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: only call drv_sta_rc_update for uploaded stations
MAINTAINERS: wifi: Add N: ath1*k entries to match .yaml files
MAINTAINERS: wifi: update Jeff Johnson e-mail address
wifi: iwlwifi: mvm: fix the TXF mapping for BZ devices
wifi: iwlwifi: mvm: ensure offloading TID queue exists
wifi: nl80211: reject iftype change with mesh ID change
====================
Link: https://lore.kernel.org/r/20240227135751.C5EC6C43390@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently using plain XDP/ZC sockets on stmmac results in a kernel crash:
|[ 255.822584] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
|[...]
|[ 255.822764] Call trace:
|[ 255.822766] stmmac_tx_clean.constprop.0+0x848/0xc38
The program counter indicates xsk_tx_metadata_complete(). It works on
compl->tx_timestamp, which is not set by xsk_tx_metadata_to_compl() due to
missing meta data. Therefore, call xsk_tx_metadata_complete() only when
meta data is actually used.
Tested on imx93 without XDP, with XDP and with XDP/ZC.
Fixes: 1347b41931 ("net: stmmac: Add Tx HWTS support to XDP ZC")
Suggested-by: Serge Semin <fancer.lancer@gmail.com>
Tested-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://lore.kernel.org/netdev/87r0h7wg8u.fsf@kurt.kurt.home/
Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20240222-stmmac_xdp-v2-1-4beee3a037e4@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The MII code does not check the return value of mdio_read (among
others), and therefore no error code should be sent. A previous fix to
the use of an uninitialized variable propagates negative error codes,
that might lead to wrong operations by the MII library.
An example of such issues is the use of mii_nway_restart by the dm9601
driver. The mii_nway_restart function does not check the value returned
by mdio_read, which in this case might be a negative number which could
contain the exact bit the function checks (BMCR_ANENABLE = 0x1000).
Return zero in case of error, as it is common practice in users of
mdio_read to avoid wrong uses of the return value.
Fixes: 8f8abb863f ("net: usb: dm9601: fix uninitialized variable use in dm9601_mdio_read")
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Peter Korsgaard <peter@korsgaard.com>
Link: https://lore.kernel.org/r/20240225-dm9601_ret_err-v1-1-02c1d959ea59@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull lsm fixes from Paul Moore:
"Two small patches, one for AppArmor and one for SELinux, to fix
potential uninitialized variable problems in the new LSM syscalls we
added during the v6.8 merge window.
We haven't been able to get a response from John on the AppArmor
patch, but considering both the importance of the patch and it's
rather simple nature it seems like a good idea to get this merged
sooner rather than later.
I'm sure John is just taking some much needed vacation; if we need to
revise this when he gets back to his email we can"
* tag 'lsm-pr-20240227' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
apparmor: fix lsm_get_self_attr()
selinux: fix lsm_get_self_attr()
Pull misc fixes from Andrew Morton:
"Six hotfixes. Three are cc:stable and the remainder address post-6.7
issues or aren't considered appropriate for backporting"
* tag 'mm-hotfixes-stable-2024-02-27-14-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/debug_vm_pgtable: fix BUG_ON with pud advanced test
mm: cachestat: fix folio read-after-free in cache walk
MAINTAINERS: add memory mapping entry with reviewers
mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index
kasan: revert eviction of stack traces in generic mode
stackdepot: use variable size records for non-evictable entries
Currently, GPU resets can now be performed successfully on the Raven
series. While GPU reset is required for the S3 suspend abort case.
So now can enable gpu reset for S3 abort cases on the Raven series.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Adds a check in the map_hw_resources function to prevent a potential
buffer overflow. The function was accessing arrays using an index that
could potentially be greater than the size of the arrays, leading to a
buffer overflow.
Adds a check to ensure that the index is within the bounds of the
arrays. If the index is out of bounds, an error message is printed and
break it will continue execution with just ignoring extra data early to
prevent the buffer overflow.
Reported by smatch:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml2_wrapper.c:79 map_hw_resources() error: buffer overflow 'dml2->v20.scratch.dml_to_dc_pipe_mapping.disp_cfg_to_stream_id' 6 <= 7
drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml2_wrapper.c:81 map_hw_resources() error: buffer overflow 'dml2->v20.scratch.dml_to_dc_pipe_mapping.disp_cfg_to_plane_id' 6 <= 7
Fixes: 7966f319c6 ("drm/amd/display: Introduce DML2")
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Qingqing Zhuo <Qingqing.Zhuo@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Roman Li <roman.li@amd.com>
Reviewed-by: Roman Li <roman.li@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
One patch of a series of three that was sent fixing issues with the
ppc4xx driver was targeted at -next, unfortunately it being sandwiched
between two others that targeted mainline tripped up my workflow and
caused it to get merged along with the others. The ppc4xx driver is
only buildable in very limited configurations so none of the CI catches
issues with it.
Fixes: de4af897dd ("spi: ppc4xx: Fix fallout from rename in struct spi_bitbang")
Signed-off-by: Mark Brown <broonie@kernel.org>
When rebuilding the lif after an FLR, be sure to restore the
current netdev features, not do the usual first time feature
init. This prevents losing user changes to things like TSO
or vlan tagging states.
Fixes: 45b84188a0 ("ionic: keep filters across FLR")
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Since we now have potential cases of NULL cmd_regs and info_regs
during a reset recovery, and left NULL if a reset recovery has
failed, we need to check that they exist before we use them.
Most of the cases were covered in the original patch where we
verify before doing the ioreadb() for health or cmd status.
However, we need to protect a few uses of io mem that could
be hit in error recovery or asynchronous threads calls as well
(e.g. ethtool or devlink handlers).
Fixes: 219e183272 ("ionic: no fw read when PCI reset failed")
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
AER recovery handler can trigger a PCI Reset after tearing
down the device setup in the error detection handler. The PCI
Reset handler will also attempt to tear down the device setup,
and this second tear down needs to know that it doesn't need
to call pci_release_regions() a second time. We can clear
num_bars on tear down and use that to decide later if we need
to clear the resources. This prevents a harmless but disturbing
warning message
resource: Trying to free nonexistent resource <0xXXXXXXXXXX-0xXXXXXXXXXX>
Fixes: c3a910e1c4 ("ionic: fill out pci error handlers")
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
If a directory has a block with only ".__afsXXXX" files in it (from
uncompleted silly-rename), these .__afsXXXX files are skipped but without
advancing the file position in the dir_context. This leads to
afs_dir_iterate() repeating the block again and again.
Fix this by making the code that skips the .__afsXXXX file also manually
advance the file position.
The symptoms are a soft lookup:
watchdog: BUG: soft lockup - CPU#3 stuck for 52s! [check:5737]
...
RIP: 0010:afs_dir_iterate_block+0x39/0x1fd
...
? watchdog_timer_fn+0x1a6/0x213
...
? asm_sysvec_apic_timer_interrupt+0x16/0x20
? afs_dir_iterate_block+0x39/0x1fd
afs_dir_iterate+0x10a/0x148
afs_readdir+0x30/0x4a
iterate_dir+0x93/0xd3
__do_sys_getdents64+0x6b/0xd4
This is almost certainly the actual fix for:
https://bugzilla.kernel.org/show_bug.cgi?id=218496
Fixes: 57e9d49c54 ("afs: Hide silly-rename files from userspace")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/786185.1708694102@warthog.procyon.org.uk
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Markus Suvanto <markus.suvanto@gmail.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Matthieu Baerts says:
====================
mptcp: more misc. fixes for v6.8
This series includes 6 types of fixes:
- Patch 1 fixes v4 mapped in v6 addresses support for the userspace PM,
when asking to delete a subflow. It was done everywhere else, but not
there. Patch 2 validates the modification, thanks to a subtest in
mptcp_join.sh. These patches can be backported up to v5.19.
- Patch 3 is a small fix for a recent bug-fix patch, just to avoid
printing an irrelevant warning (pr_warn()) once. It can be backported
up to v5.6, alongside the bug-fix that has been introduced in the
v6.8-rc5.
- Patches 4 to 6 are fixes for bugs found by Paolo while working on
TCP_NOTSENT_LOWAT support for MPTCP. These fixes can improve the
performances in some cases. Patches can be backported up to v5.6,
v5.11 and v6.7 respectively.
- Patch 7 makes sure 'ss -M' is available when starting MPTCP Join
selftest as it is required for some subtests since v5.18.
- Patch 8 fixes a possible double-free on socket dismantle. The issue
always existed, but was unnoticed because it was not causing any
problem so far. This fix can be backported up to v5.6.
- Patch 9 is a fix for a very recent patch causing lockdep warnings in
subflow diag. The patch causing the regression -- which fixes another
issue present since v5.7 -- should be part of the future v6.8-rc6.
Patch 10 validates the modification, thanks to a new subtest in
diag.sh.
====================
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-0-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The mptcp diag interface already experienced a few locking bugs
that lockdep and appropriate coverage have detected in advance.
Let's add a test-case triggering the relevant code path, to prevent
similar issues in the future.
Be careful to cope with very slow environments.
Note that we don't need an explicit timeout on the mptcp_connect
subprocess to cope with eventual bug/hang-up as the final cleanup
terminating the child processes will take care of that.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-10-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
when MPTCP server accepts an incoming connection, it clones its listener
socket. However, the pointer to 'inet_opt' for the new socket has the same
value as the original one: as a consequence, on program exit it's possible
to observe the following splat:
BUG: KASAN: double-free in inet_sock_destruct+0x54f/0x8b0
Free of addr ffff888485950880 by task swapper/25/0
CPU: 25 PID: 0 Comm: swapper/25 Kdump: loaded Not tainted 6.8.0-rc1+ #609
Hardware name: Supermicro SYS-6027R-72RF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0 07/26/2013
Call Trace:
<IRQ>
dump_stack_lvl+0x32/0x50
print_report+0xca/0x620
kasan_report_invalid_free+0x64/0x90
__kasan_slab_free+0x1aa/0x1f0
kfree+0xed/0x2e0
inet_sock_destruct+0x54f/0x8b0
__sk_destruct+0x48/0x5b0
rcu_do_batch+0x34e/0xd90
rcu_core+0x559/0xac0
__do_softirq+0x183/0x5a4
irq_exit_rcu+0x12d/0x170
sysvec_apic_timer_interrupt+0x6b/0x80
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x16/0x20
RIP: 0010:cpuidle_enter_state+0x175/0x300
Code: 30 00 0f 84 1f 01 00 00 83 e8 01 83 f8 ff 75 e5 48 83 c4 18 44 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc fb 45 85 ed <0f> 89 60 ff ff ff 48 c1 e5 06 48 c7 43 18 00 00 00 00 48 83 44 2b
RSP: 0018:ffff888481cf7d90 EFLAGS: 00000202
RAX: 0000000000000000 RBX: ffff88887facddc8 RCX: 0000000000000000
RDX: 1ffff1110ff588b1 RSI: 0000000000000019 RDI: ffff88887fac4588
RBP: 0000000000000004 R08: 0000000000000002 R09: 0000000000043080
R10: 0009b02ea273363f R11: ffff88887fabf42b R12: ffffffff932592e0
R13: 0000000000000004 R14: 0000000000000000 R15: 00000022c880ec80
cpuidle_enter+0x4a/0xa0
do_idle+0x310/0x410
cpu_startup_entry+0x51/0x60
start_secondary+0x211/0x270
secondary_startup_64_no_verify+0x184/0x18b
</TASK>
Allocated by task 6853:
kasan_save_stack+0x1c/0x40
kasan_save_track+0x10/0x30
__kasan_kmalloc+0xa6/0xb0
__kmalloc+0x1eb/0x450
cipso_v4_sock_setattr+0x96/0x360
netlbl_sock_setattr+0x132/0x1f0
selinux_netlbl_socket_post_create+0x6c/0x110
selinux_socket_post_create+0x37b/0x7f0
security_socket_post_create+0x63/0xb0
__sock_create+0x305/0x450
__sys_socket_create.part.23+0xbd/0x130
__sys_socket+0x37/0xb0
__x64_sys_socket+0x6f/0xb0
do_syscall_64+0x83/0x160
entry_SYSCALL_64_after_hwframe+0x6e/0x76
Freed by task 6858:
kasan_save_stack+0x1c/0x40
kasan_save_track+0x10/0x30
kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x12c/0x1f0
kfree+0xed/0x2e0
inet_sock_destruct+0x54f/0x8b0
__sk_destruct+0x48/0x5b0
subflow_ulp_release+0x1f0/0x250
tcp_cleanup_ulp+0x6e/0x110
tcp_v4_destroy_sock+0x5a/0x3a0
inet_csk_destroy_sock+0x135/0x390
tcp_fin+0x416/0x5c0
tcp_data_queue+0x1bc8/0x4310
tcp_rcv_state_process+0x15a3/0x47b0
tcp_v4_do_rcv+0x2c1/0x990
tcp_v4_rcv+0x41fb/0x5ed0
ip_protocol_deliver_rcu+0x6d/0x9f0
ip_local_deliver_finish+0x278/0x360
ip_local_deliver+0x182/0x2c0
ip_rcv+0xb5/0x1c0
__netif_receive_skb_one_core+0x16e/0x1b0
process_backlog+0x1e3/0x650
__napi_poll+0xa6/0x500
net_rx_action+0x740/0xbb0
__do_softirq+0x183/0x5a4
The buggy address belongs to the object at ffff888485950880
which belongs to the cache kmalloc-64 of size 64
The buggy address is located 0 bytes inside of
64-byte region [ffff888485950880, ffff8884859508c0)
The buggy address belongs to the physical page:
page:0000000056d1e95e refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888485950700 pfn:0x485950
flags: 0x57ffffc0000800(slab|node=1|zone=2|lastcpupid=0x1fffff)
page_type: 0xffffffff()
raw: 0057ffffc0000800 ffff88810004c640 ffffea00121b8ac0 dead000000000006
raw: ffff888485950700 0000000000200019 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888485950780: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff888485950800: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff888485950880: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
^
ffff888485950900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff888485950980: 00 00 00 00 00 01 fc fc fc fc fc fc fc fc fc fc
Something similar (a refcount underflow) happens with CALIPSO/IPv6. Fix
this by duplicating IP / IPv6 options after clone, so that
ip{,6}_sock_destruct() doesn't end up freeing the same memory area twice.
Fixes: cf7da0d66c ("mptcp: Create SUBFLOW socket for incoming connections")
Cc: stable@vger.kernel.org
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-8-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
when inserting not contiguous data in the subflow write queue,
the protocol creates a new skb and prevent the TCP stack from
merging it later with already queued skbs by setting the EOR marker.
Still no push flag is explicitly set at the end of previous GSO
packet, making the aggregation on the receiver side sub-optimal -
and packetdrill self-tests less predictable.
Explicitly mark the end of not contiguous DSS with the push flag.
Fixes: 6d0060f600 ("mptcp: Write MPTCP DSS headers to outgoing data packets")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-4-162e87e48497@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This fixes a possible UAF in if_nlmsg_size(),
which can run without RTNL.
Add rcu protection to "struct dpll_pin"
Move netdev_dpll_pin() from netdevice.h to dpll.h to
decrease name pollution.
Note: This looks possible to no longer acquire RTNL in
netdev_dpll_pin_assign() later in net-next.
v2: do not force rcu_read_lock() in rtnl_dpll_pin_size() (Jiri Pirko)
Fixes: 5f18426928 ("netdev: expose DPLL pin handle for netdevice")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240223123208.3543319-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Same as LAN7800, LAN7850 can be used without EEPROM. If EEPROM is not
present or not flashed, LAN7850 will fail to sync the speed detected by the PHY
with the MAC. In case link speed is 100Mbit, it will accidentally work,
otherwise no data can be transferred.
Better way would be to implement link_up callback, or set auto speed
configuration unconditionally. But this changes would be more intrusive.
So, for now, set it only if no EEPROM is found.
Fixes: e69647a19c ("lan78xx: Set ASD in MAC_CR when EEE is enabled.")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20240222123839.2816561-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the driver detects that the controller is not ready before sending the
first IOC facts command, it will wait for a maximum of 10 seconds for it to
become ready. However, even if the controller becomes ready within 10
seconds, the driver will still issue a diagnostic reset.
Modify the driver to avoid sending a diag reset if the controller becomes
ready within the 10-second wait time.
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20240221071724.14986-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Doubling the number of PHYs also doubled the stack usage of this function,
exceeding the 32-bit limit of 1024 bytes:
drivers/scsi/mpi3mr/mpi3mr_transport.c: In function 'mpi3mr_refresh_sas_ports':
drivers/scsi/mpi3mr/mpi3mr_transport.c:1818:1: error: the frame size of 1636 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
Since the sas_io_unit_pg0 structure is already allocated dynamically, use
the same method here. The size of the allocation can be smaller based on
the actual number of phys now, so use this as an upper bound.
Fixes: cb5b608946 ("scsi: mpi3mr: Increase maximum number of PHYs to 64 from 32")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Sathya Prakash Veerichetty <sathya.prakash@broadcom.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20240123130754.2011469-1-arnd@kernel.org
Tested-by: John Garry <john.g.garry@oracle.com> #build only
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since MOCK_HUGE_PAGE_SIZE was introduced it allows the core code to invoke
mock with large page sizes. This confuses the validation logic that checks
that map/unmap are paired.
This is because the page size computed for map is based on the physical
address and in many cases will always be the base page size, however the
entire range generated by iommufd will be passed to map.
Randomly iommufd can see small groups of physically contiguous pages,
(say 8k unaligned and grouped together), but that group crosses a huge
page boundary. The map side will observe this as a contiguous run and mark
it accordingly, but there is a chance the unmap side will end up
terminating interior huge pages in the middle of that group and trigger a
validation failure. Meaning the validation only works if the core code
passes the iova/length directly from iommufd to mock.
syzkaller randomly hits this with failures like:
WARNING: CPU: 0 PID: 11568 at drivers/iommu/iommufd/selftest.c:461 mock_domain_unmap_pages+0x1c0/0x250
Modules linked in:
CPU: 0 PID: 11568 Comm: syz-executor.0 Not tainted 6.8.0-rc3+ #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:mock_domain_unmap_pages+0x1c0/0x250
Code: 2b e8 94 37 0f ff 48 d1 eb 31 ff 48 b8 00 00 00 00 00 00 20 00 48 21 c3 48 89 de e8 aa 32 0f ff 48 85 db 75 07 e8 70 37 0f ff <0f> 0b e8 69 37 0f ff 31 f6 31 ff e8 90 32 0f ff e8 5b 37 0f ff 4c
RSP: 0018:ffff88800e707490 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff822dfae6
RDX: ffff88800cf86400 RSI: ffffffff822dfaf0 RDI: 0000000000000007
RBP: ffff88800e7074d8 R08: 0000000000000000 R09: ffffed1001167c90
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000001500000
R13: 0000000000083000 R14: 0000000000000001 R15: 0000000000000800
FS: 0000555556048480(0000) GS:ffff88806d400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2dc23000 CR3: 0000000008cbb000 CR4: 0000000000350eb0
Call Trace:
<TASK>
__iommu_unmap+0x281/0x520
iommu_unmap+0xc9/0x180
iopt_area_unmap_domain_range+0x1b1/0x290
iopt_area_unpin_domain+0x590/0x800
__iopt_area_unfill_domain+0x22e/0x650
iopt_area_unfill_domain+0x47/0x60
iopt_unfill_domain+0x187/0x590
iopt_table_remove_domain+0x267/0x2d0
iommufd_hwpt_paging_destroy+0x1f1/0x370
iommufd_object_remove+0x2a3/0x490
iommufd_device_detach+0x23a/0x2c0
iommufd_selftest_destroy+0x7a/0xf0
iommufd_fops_release+0x1d3/0x340
__fput+0x272/0xb50
__fput_sync+0x4b/0x60
__x64_sys_close+0x8b/0x110
do_syscall_64+0x71/0x140
entry_SYSCALL_64_after_hwframe+0x46/0x4e
Do the simple thing and just disable the validation when the huge page
tests are being run.
Fixes: 7db521e23f ("iommufd/selftest: Hugepage mock domain support")
Link: https://lore.kernel.org/r/0-v1-1e17e60a5c8a+103fb-iommufd_mock_hugepg_jgg@nvidia.com
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Syzkaller reported the following bug:
general protection fault, probably for non-canonical address 0xdffffc0000000038: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000001c0-0x00000000000001c7]
Call Trace:
lock_acquire
lock_acquire+0x1ce/0x4f0
down_read+0x93/0x4a0
iommufd_test_syz_conv_iova+0x56/0x1f0
iommufd_test_access_rw.isra.0+0x2ec/0x390
iommufd_test+0x1058/0x1e30
iommufd_fops_ioctl+0x381/0x510
vfs_ioctl
__do_sys_ioctl
__se_sys_ioctl
__x64_sys_ioctl+0x170/0x1e0
do_syscall_x64
do_syscall_64+0x71/0x140
This is because the new iommufd_access_change_ioas() sets access->ioas to
NULL during its process, so the lock might be gone in a concurrent racing
context.
Fix this by doing the same access->ioas sanity as iommufd_access_rw() and
iommufd_access_pin_pages() functions do.
Cc: stable@vger.kernel.org
Fixes: 9227da7816 ("iommufd: Add iommufd_access_change_ioas(_id) helpers")
Link: https://lore.kernel.org/r/3f1932acaf1dd494d404c04364d73ce8f57f3e5e.1708636627.git.nicolinc@nvidia.com
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Syzkaller reported the following bug:
sysfs: cannot create duplicate filename '/devices/iommufd_mock4'
Call Trace:
sysfs_warn_dup+0x71/0x90
sysfs_create_dir_ns+0x1ee/0x260
? sysfs_create_mount_point+0x80/0x80
? spin_bug+0x1d0/0x1d0
? do_raw_spin_unlock+0x54/0x220
kobject_add_internal+0x221/0x970
kobject_add+0x11c/0x1e0
? lockdep_hardirqs_on_prepare+0x273/0x3e0
? kset_create_and_add+0x160/0x160
? kobject_put+0x5d/0x390
? bus_get_dev_root+0x4a/0x60
? kobject_put+0x5d/0x390
device_add+0x1d5/0x1550
? __fw_devlink_link_to_consumers.isra.0+0x1f0/0x1f0
? __init_waitqueue_head+0xcb/0x150
iommufd_test+0x462/0x3b60
? lock_release+0x1fe/0x640
? __might_fault+0x117/0x170
? reacquire_held_locks+0x4b0/0x4b0
? iommufd_selftest_destroy+0xd0/0xd0
? __might_fault+0xbe/0x170
iommufd_fops_ioctl+0x256/0x350
? iommufd_option+0x180/0x180
? __lock_acquire+0x1755/0x45f0
__x64_sys_ioctl+0xa13/0x1640
The bug is triggered when Syzkaller created multiple mock devices but
didn't destroy them in the same sequence, messing up the mock_dev_num
counter. Replace the atomic with an mock_dev_ida.
Cc: stable@vger.kernel.org
Fixes: 23a1b46f15 ("iommufd/selftest: Make the mock iommu driver into a real driver")
Link: https://lore.kernel.org/r/5af41d5af6d5c013cc51de01427abb8141b3587e.1708636627.git.nicolinc@nvidia.com
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Syzkaller reported the following WARN_ON:
WARNING: CPU: 1 PID: 4738 at drivers/iommu/iommufd/io_pagetable.c:1360
Call Trace:
iommufd_access_change_ioas+0x2fe/0x4e0
iommufd_access_destroy_object+0x50/0xb0
iommufd_object_remove+0x2a3/0x490
iommufd_object_destroy_user
iommufd_access_destroy+0x71/0xb0
iommufd_test_staccess_release+0x89/0xd0
__fput+0x272/0xb50
__fput_sync+0x4b/0x60
__do_sys_close
__se_sys_close
__x64_sys_close+0x8b/0x110
do_syscall_x64
The mismatch between the access pointer in the list and the passed-in
pointer is resulting from an overwrite of access->iopt_access_list_id, in
iopt_add_access(). Called from iommufd_access_change_ioas() when
xa_alloc() succeeds but iopt_calculate_iova_alignment() fails.
Add a new_id in iopt_add_access() and only update iopt_access_list_id when
returning successfully.
Cc: stable@vger.kernel.org
Fixes: 9227da7816 ("iommufd: Add iommufd_access_change_ioas(_id) helpers")
Link: https://lore.kernel.org/r/2dda7acb25b8562ec5f1310de828ef5da9ef509c.1708636627.git.nicolinc@nvidia.com
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Pull mtd fixes from Miquel Raynal:
"Many NAND page layouts have been added to the Marvell NAND controller
but could not be used in practice so they are being removed.
Regarding the SPI-NAND area, Gigadevice chips were not using the right
buffer for an ECC status check operation.
Aside from these driver fixes, there is also a refcount fix in the MTD
core nodes parsing logic"
* tag 'mtd/fixes-for-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: marvell: fix layouts
mtd: Fix possible refcounting issue when going through partition nodes
mtd: spinand: gigadevice: Fix the get ecc status issue
Pull btrfs fixes from David Sterba:
"A more fixes for recently reported or discovered problems:
- fix corner case of send that would generate potentially large
stream of zeros if there's a hole at the end of the file
- fix chunk validation in zoned mode on conventional zones, it was
possible to create chunks that would not be allowed on sequential
zones
- fix validation of dev-replace ioctl filenames
- fix KCSAN warnings about access to block reserve struct members"
* tag 'for-6.8-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix data race at btrfs_use_block_rsv() when accessing block reserve
btrfs: fix data races when accessing the reserved amount of block reserves
btrfs: send: don't issue unnecessary zero writes for trailing hole
btrfs: dev-replace: properly validate device names
btrfs: zoned: don't skip block group profile checks on conventional zones
The addition of bal_rank_mask with encoding version 17 was merged
into ceph.git in Oct 2022 and made it into v18.2.0 release normally.
A few months later, the much delayed addition of max_xattr_size got
merged, also with encoding version 17, placed before bal_rank_mask
in the encoding -- but it didn't make v18.2.0 release.
The way this ended up being resolved on the MDS side is that
bal_rank_mask will continue to be encoded in version 17 while
max_xattr_size is now encoded in version 18. This does mean that
older kernels will misdecode version 17, but this is also true for
v18.2.0 and v18.2.1 clients in userspace.
The best we can do is backport this adjustment -- see ceph.git
commit 78abfeaff27fee343fb664db633de5b221699a73 for details.
[ idryomov: changelog ]
Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/64440
Fixes: d93231a6bc ("ceph: prevent a client from exceeding the MDS maximum xattr size")
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When CONFIG_NTFS3_LZX_XPRESS is not set then we get the following build
error:
fs/ntfs3/frecord.c:2460:16: error: unused variable ‘i_size’
Signed-off-by: Mark O'Donovan <shiftee@posteo.net>
Fixes: 4fd6c08a16 ("fs/ntfs3: Use i_size_read and i_size_write")
Tested-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When linking or renaming a file, if only one of the source or
destination directory is backed by an S_PRIVATE inode, then the related
set of layer masks would be used as uninitialized by
is_access_to_paths_allowed(). This would result to indeterministic
access for one side instead of always being allowed.
This bug could only be triggered with a mounted filesystem containing
both S_PRIVATE and !S_PRIVATE inodes, which doesn't seem possible.
The collect_domain_accesses() calls return early if
is_nouser_or_private() returns false, which means that the directory's
superblock has SB_NOUSER or its inode has S_PRIVATE. Because rename or
link actions are only allowed on the same mounted filesystem, the
superblock is always the same for both source and destination
directories. However, it might be possible in theory to have an
S_PRIVATE parent source inode with an !S_PRIVATE parent destination
inode, or vice versa.
To make sure this case is not an issue, explicitly initialized both set
of layer masks to 0, which means to allow all actions on the related
side. If at least on side has !S_PRIVATE, then
collect_domain_accesses() and is_access_to_paths_allowed() check for the
required access rights.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Shervin Oloumi <enlightened@chromium.org>
Cc: stable@vger.kernel.org
Fixes: b91c3e4ea7 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER")
Link: https://lore.kernel.org/r/20240219190345.2928627-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
MKTME repurposes the high bit of physical address to key id for encryption
key and, even though MAXPHYADDR in CPUID[0x80000008] remains the same,
the valid bits in the MTRR mask register are based on the reduced number
of physical address bits.
detect_tme() in arch/x86/kernel/cpu/intel.c detects TME and subtracts
it from the total usable physical bits, but it is called too late.
Move the call to early_init_intel() so that it is called in setup_arch(),
before MTRRs are setup.
This fixes boot on TDX-enabled systems, which until now only worked with
"disable_mtrr_cleanup". Without the patch, the values written to the
MTRRs mask registers were 52-bit wide (e.g. 0x000fffff_80000800) and
the writes failed; with the patch, the values are 46-bit wide, which
matches the reduced MAXPHYADDR that is shown in /proc/cpuinfo.
Reported-by: Zixi Chen <zixchen@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240131230902.1867092-3-pbonzini%40redhat.com
In commit fbf6449f84 ("x86/sev-es: Set x86_virt_bits to the correct
value straight away, instead of a two-phase approach"), the initialization
of c->x86_phys_bits was moved after this_cpu->c_early_init(c). This is
incorrect because early_init_amd() expected to be able to reduce the
value according to the contents of CPUID leaf 0x8000001f.
Fortunately, the bug was negated by init_amd()'s call to early_init_amd(),
which does reduce x86_phys_bits in the end. However, this is very
late in the boot process and, most notably, the wrong value is used for
x86_phys_bits when setting up MTRRs.
To fix this, call get_cpu_address_sizes() as soon as X86_FEATURE_CPUID is
set/cleared, and c->extended_cpuid_level is retrieved.
Fixes: fbf6449f84 ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240131230902.1867092-2-pbonzini%40redhat.com
Commit a5a923038d (fbdev: fbcon: Properly revert changes when
vc_resize() failed) started restoring old font data upon failure (of
vc_resize()). But it performs so only for user fonts. It means that the
"system"/internal fonts are not restored at all. So in result, the very
first call to fbcon_do_set_font() performs no restore at all upon
failing vc_resize().
This can be reproduced by Syzkaller to crash the system on the next
invocation of font_get(). It's rather hard to hit the allocation failure
in vc_resize() on the first font_set(), but not impossible. Esp. if
fault injection is used to aid the execution/failure. It was
demonstrated by Sirius:
BUG: unable to handle page fault for address: fffffffffffffff8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD cb7b067 P4D cb7b067 PUD cb7d067 PMD 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 8007 Comm: poc Not tainted 6.7.0-g9d1694dc91ce #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:fbcon_get_font+0x229/0x800 drivers/video/fbdev/core/fbcon.c:2286
Call Trace:
<TASK>
con_font_get drivers/tty/vt/vt.c:4558 [inline]
con_font_op+0x1fc/0xf20 drivers/tty/vt/vt.c:4673
vt_k_ioctl drivers/tty/vt/vt_ioctl.c:474 [inline]
vt_ioctl+0x632/0x2ec0 drivers/tty/vt/vt_ioctl.c:752
tty_ioctl+0x6f8/0x1570 drivers/tty/tty_io.c:2803
vfs_ioctl fs/ioctl.c:51 [inline]
...
So restore the font data in any case, not only for user fonts. Note the
later 'if' is now protected by 'old_userfont' and not 'old_data' as the
latter is always set now. (And it is supposed to be non-NULL. Otherwise
we would see the bug above again.)
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Fixes: a5a923038d ("fbdev: fbcon: Properly revert changes when vc_resize() failed")
Reported-and-tested-by: Ubisectech Sirius <bugreport@ubisectech.com>
Cc: Ubisectech Sirius <bugreport@ubisectech.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Helge Deller <deller@gmx.de>
Cc: linux-fbdev@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20240208114411.14604-1-jirislaby@kernel.org
It seems that if userspace provides a correct IFA_TARGET_NETNSID value
but no IFA_ADDRESS and IFA_LOCAL attributes, inet6_rtm_getaddr()
returns -EINVAL with an elevated "struct net" refcount.
Fixes: 6ecf4c37eb ("ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test that we keep GRO flag in sync when XDP is disabled while
the device is closed.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
veth sets NETIF_F_GRO automatically when XDP is enabled,
because both features use the same NAPI machinery.
The logic to clear NETIF_F_GRO sits in veth_disable_xdp() which
is called both on ndo_stop and when XDP is turned off.
To avoid the flag from being cleared when the device is brought
down, the clearing is skipped when IFF_UP is not set.
Bringing the device down should indeed not modify its features.
Unfortunately, this means that clearing is also skipped when
XDP is disabled _while_ the device is down. And there's nothing
on the open path to bring the device features back into sync.
IOW if user enables XDP, disables it and then brings the device
up we'll end up with a stray GRO flag set but no NAPI instances.
We don't depend on the GRO flag on the datapath, so the datapath
won't crash. We will crash (or hang), however, next time features
are sync'ed (either by user via ethtool or peer changing its config).
The GRO flag will go away, and veth will try to disable the NAPIs.
But the open path never created them since XDP was off, the GRO flag
was a stray. If NAPI was initialized before we'll hang in napi_disable().
If it never was we'll crash trying to stop uninitialized hrtimer.
Move the GRO flag updates to the XDP enable / disable paths,
instead of mixing them with the ndo_open / ndo_close paths.
Fixes: d3256efd8e ("veth: allow enabling NAPI even without XDP")
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: syzbot+039399a9b96297ddedca@syzkaller.appspotmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After a couple recent changes in LLVM, there is a warning (or error with
CONFIG_WERROR=y or W=e) from the compile time fortify source routines,
specifically the memset() in copy_to_user_tmpl().
In file included from net/xfrm/xfrm_user.c:14:
...
include/linux/fortify-string.h:438:4: error: call to '__write_overflow_field' declared with 'warning' attribute: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror,-Wattribute-warning]
438 | __write_overflow_field(p_size_field, size);
| ^
1 error generated.
While ->xfrm_nr has been validated against XFRM_MAX_DEPTH when its value
is first assigned in copy_templates() by calling validate_tmpl() first
(so there should not be any issue in practice), LLVM/clang cannot really
deduce that across the boundaries of these functions. Without that
knowledge, it cannot assume that the loop stops before i is greater than
XFRM_MAX_DEPTH, which would indeed result a stack buffer overflow in the
memset().
To make the bounds of ->xfrm_nr clear to the compiler and add additional
defense in case copy_to_user_tmpl() is ever used in a path where
->xfrm_nr has not been properly validated against XFRM_MAX_DEPTH first,
add an explicit bound check and early return, which clears up the
warning.
Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1985
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Pull bcachefs fixes from Kent Overstreet:
"Some more mostly boring fixes, but some not
User reported ones:
- the BTREE_ITER_FILTER_SNAPSHOTS one fixes a really nasty
performance bug; user reported an untar initially taking two
seconds and then ~2 minutes
- kill a __GFP_NOFAIL in the buffered read path; this was a leftover
from the trickier fix to kill __GFP_NOFAIL in readahead, where we
can't return errors (and have to silently truncate the read
ourselves).
bcachefs can't use GFP_NOFAIL for folio state unlike iomap based
filesystems because our folio state is just barely too big, 2MB
hugepages cause us to exceed the 2 page threshhold for GFP_NOFAIL.
additionally, the flags argument was just buggy, we weren't
supplying GFP_KERNEL previously (!)"
* tag 'bcachefs-2024-02-25' of https://evilpiepirate.org/git/bcachefs:
bcachefs: fix bch2_save_backtrace()
bcachefs: Fix check_snapshot() memcpy
bcachefs: Fix bch2_journal_flush_device_pins()
bcachefs: fix iov_iter count underflow on sub-block dio read
bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree
bcachefs: Kill __GFP_NOFAIL in buffered read path
bcachefs: fix backpointer_to_text() when dev does not exist
Pull two documentation build fixes from Jonathan Corbet:
- The XFS online fsck documentation uses incredibly deeply nested
subsection and list nesting; that broke the PDF docs build. Tweak a
parameter to tell LaTeX to allow the deeper nesting.
- Fix a 6.8 PDF-build regression
* tag 'docs-6.8-fixes3' of git://git.lwn.net/linux:
docs: translations: use attribute to store current language
docs: Instruct LaTeX to cope with deeper nesting
Pull USB fixes from Greg KH:
"Here are some small USB fixes for 6.8-rc6 to resolve some reported
problems. These include:
- regression fixes with typec tpcm code as reported by many
- cdnsp and cdns3 driver fixes
- usb role setting code bugfixes
- build fix for uhci driver
- ncm gadget driver bugfix
- MAINTAINERS entry update
All of these have been in linux-next all week with no reported issues
and there is at least one fix in here that is in Thorsten's regression
list that is being tracked"
* tag 'usb-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: tpcm: Fix issues with power being removed during reset
MAINTAINERS: Drop myself as maintainer of TYPEC port controller drivers
usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs
Revert "usb: typec: tcpm: reset counter when enter into unattached state after try role"
usb: gadget: omap_udc: fix USB gadget regression on Palm TE
usb: dwc3: gadget: Don't disconnect if not started
usb: cdns3: fix memory double free when handle zero packet
usb: cdns3: fixed memory use after free at cdns3_gadget_ep_disable()
usb: roles: don't get/set_role() when usb_role_switch is unregistered
usb: roles: fix NULL pointer issue when put module's reference
usb: cdnsp: fixed issue with incorrect detecting CDNSP family controllers
usb: cdnsp: blocked some cdns3 specific code
usb: uhci-grlib: Explicitly include linux/platform_device.h
Pull tty/serial driver fixes from Greg KH:
"Here are three small serial/tty driver fixes for 6.8-rc6 that resolve
the following reported errors:
- riscv hvc console driver fix that was reported by many
- amba-pl011 serial driver fix for RS485 mode
- stm32 serial driver fix for RS485 mode
All of these have been in linux-next all week with no reported
problems"
* tag 'tty-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: amba-pl011: Fix DMA transmission in RS485 mode
serial: stm32: do not always set SER_RS485_RX_DURING_TX if RS485 is enabled
tty: hvc: Don't enable the RISC-V SBI console by default
Pull x86 fixes from Borislav Petkov:
- Make sure clearing CPU buffers using VERW happens at the latest
possible point in the return-to-userspace path, otherwise memory
accesses after the VERW execution could cause data to land in CPU
buffers again
* tag 'x86_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
KVM/VMX: Move VERW closer to VMentry for MDS mitigation
KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH
x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
x86/entry_32: Add VERW just before userspace transition
x86/entry_64: Add VERW just before userspace transition
x86/bugs: Add asm helpers for executing VERW
Pull irq fixes from Borislav Petkov:
- Make sure GICv4 always gets initialized to prevent a kexec-ed kernel
from silently failing to set it up
- Do not call bus_get_dev_root() for the mbigen irqchip as it always
returns NULL - use NULL directly
- Fix hardware interrupt number truncation when assigning MSI
interrupts
- Correct sending end-of-interrupt messages to disabled interrupts
lines on RISC-V PLIC
* tag 'irq_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Do not assume vPE tables are preallocated
irqchip/mbigen: Don't use bus_get_dev_root() to find the parent
PCI/MSI: Prevent MSI hardware interrupt number truncation
irqchip/sifive-plic: Enable interrupt if needed before EOI
Pull erofs fix from Gao Xiang:
- Fix page refcount leak when looking up specific inodes
introduced by metabuf reworking
* tag 'erofs-for-6.8-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix refcount on the metabuf used for inode lookup
Pull RCU pathwalk fixes from Al Viro:
"We still have some races in filesystem methods when exposed to RCU
pathwalk. This series is a result of code audit (the second round of
it) and it should deal with most of that stuff.
Still pending: ntfs3 ->d_hash()/->d_compare() and ceph_d_revalidate().
Up to maintainers (a note for NTFS folks - when documentation says
that a method may not block, it *does* imply that blocking allocations
are to be avoided. Really)"
[ More explanations for people who aren't familiar with the vagaries of
RCU path walking: most of it is hidden from filesystems, but if a
filesystem actively participates in the low-level path walking it
needs to make sure the fields involved in that walk are RCU-safe.
That "actively participate in low-level path walking" includes things
like having its own ->d_hash()/->d_compare() routines, or by having
its own directory permission function that doesn't just use the common
helpers. Having a ->d_revalidate() function will also have this issue.
Note that instead of making everything RCU safe you can also choose to
abort the RCU pathwalk if your operation cannot be done safely under
RCU, but that obviously comes with a performance penalty. One common
pattern is to allow the simple cases under RCU, and abort only if you
need to do something more complicated.
So not everything needs to be RCU-safe, and things like the inode etc
that the VFS itself maintains obviously already are. But these fixes
tend to be about properly RCU-delaying things like ->s_fs_info that
are maintained by the filesystem and that got potentially released too
early. - Linus ]
* tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ext4_get_link(): fix breakage in RCU mode
cifs_get_link(): bail out in unsafe case
fuse: fix UAF in rcu pathwalks
procfs: make freeing proc_fs_info rcu-delayed
procfs: move dropping pde and pid from ->evict_inode() to ->free_inode()
nfs: fix UAF on pathwalk running into umount
nfs: make nfs_set_verifier() safe for use in RCU pathwalk
afs: fix __afs_break_callback() / afs_drop_open_mmap() race
hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info
exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper
affs: free affs_sb_info with kfree_rcu()
rcu pathwalk: prevent bogus hard errors from may_lookup()
fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
Pull vfs fixes from Al Viro:
"A couple of fixes - revert of regression from this cycle and a fix for
erofs failure exit breakage (had been there since way back)"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
erofs: fix handling kern_mount() failure
Revert "get rid of DCACHE_GENOCIDE"
The "media_ldb_root_clk" is the gate clock to enable or disable the clock
provided by CCM(Clock Control Module) to LDB instead of the "media_ldb"
clock which is the parent of the "media_ldb_root_clk" clock as a composite
clock. Fix LDB clocks property by referencing the "media_ldb_root_clk"
clock instead of the "media_ldb" clock.
Fixes: e7567840ec ("arm64: dts: imx8mp: Reorder clock and reg properties")
Fixes: 94e6197dad ("arm64: dts: imx8mp: Add LCDIF2 & LDB nodes")
Signed-off-by: Liu Ying <victor.liu@nxp.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Reviewed-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The TC9595 reset GPIO is SAI1_RXC / GPIO4_IO01, fix the DT accordingly.
The SAI5_RXD0 / GPIO3_IO21 is thus far unused TC9595 interrupt line.
Fixes: 20d0b83e71 ("arm64: dts: imx8mp: Add TC9595 bridge on DH electronics i.MX8M Plus DHCOM")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
So far we used an internal linux-imx@nxp.com email address to
gather all patches related to NXP i.MX development.
Let's switch to an open mailing list that provides ability
for people from the community to subscribe and also have
a proper archive.
List interface at: https://lists.linux.dev.
Archive is at: https://lore.kernel.org/imx/
Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
This fixes the display not working on colibri imx7, the driver fails to
load with the following error:
mxsfb 30730000.lcdif: error -ENODEV: Cannot connect bridge
NXP i.MX7 LCDIF is connected to both the Parallel LCD Display and to a
MIPI DSI IP block, currently it's not possible to describe the
connection to both.
Remove the port endpoint from the SOC dtsi to prevent regressions, this
would need to be defined on the board DTS.
Reported-by: Hiago De Franco <hiagofranco@gmail.com>
Closes: https://lore.kernel.org/r/34yzygh3mbwpqr2re7nxmhyxy3s7qmqy4vhxvoyxnoguktriur@z66m7gvpqlia/
Fixes: edbbae7fba ("ARM: dts: imx7: add MIPI-DSI support")
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The 'duplicates' bool argument is always true when efivar_init() is
called from its only caller so let's just drop it instead.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Al points out that kill_sb() will be called if efivarfs_fill_super()
fails and so there is no point in cleaning up the efivar entry list.
Reported-by: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Work around a quirk in a few old (2011-ish) UEFI implementations, where
a call to `GetNextVariableName` with a buffer size larger than 512 bytes
will always return EFI_INVALID_PARAMETER.
There is some lore around EFI variable names being up to 1024 bytes in
size, but this has no basis in the UEFI specification, and the upper
bounds are typically platform specific, and apply to the entire variable
(name plus payload).
Given that Linux does not permit creating files with names longer than
NAME_MAX (255) bytes, 512 bytes (== 256 UTF-16 characters) is a
reasonable limit.
Cc: <stable@vger.kernel.org> # 6.1+
Signed-off-by: Tim Schumacher <timschumi@gmx.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
1) errors from ext4_getblk() should not be propagated to caller
unless we are really sure that we would've gotten the same error
in non-RCU pathwalk.
2) we leak buffer_heads if ext4_getblk() is successful, but bh is
not uptodate.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
->d_revalidate() bails out there, anyway. It's not enough
to prevent getting into ->get_link() in RCU mode, but that
could happen only in a very contrieved setup. Not worth
trying to do anything fancy here unless ->d_revalidate()
stops kicking out of RCU mode at least in some cases.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
->permission(), ->get_link() and ->inode_get_acl() might dereference
->s_fs_info (and, in case of ->permission(), ->s_fs_info->fc->user_ns
as well) when called from rcu pathwalk.
Freeing ->s_fs_info->fc is rcu-delayed; we need to make freeing ->s_fs_info
and dropping ->user_ns rcu-delayed too.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
makes proc_pid_ns() safe from rcu pathwalk (put_pid_ns()
is still synchronous, but that's not a problem - it does
rcu-delay everything that needs to be)
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
that keeps both around until struct inode is freed, making access
to them safe from rcu-pathwalk
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
NFS ->d_revalidate(), ->permission() and ->get_link() need to access
some parts of nfs_server when called in RCU mode:
server->flags
server->caps
*(server->io_stats)
and, worst of all, call
server->nfs_client->rpc_ops->have_delegation
(the last one - as NFS_PROTO(inode)->have_delegation()). We really
don't want to RCU-delay the entire nfs_free_server() (it would have
to be done with schedule_work() from RCU callback, since it can't
be made to run from interrupt context), but actual freeing of
nfs_server and ->io_stats can be done via call_rcu() just fine.
nfs_client part is handled simply by making nfs_free_client() use
kfree_rcu().
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
nfs_set_verifier() relies upon dentry being pinned; if that's
the case, grabbing ->d_lock stabilizes ->d_parent and guarantees
that ->d_parent points to a positive dentry. For something
we'd run into in RCU mode that is *not* true - dentry might've
been through dentry_kill() just as we grabbed ->d_lock, with
its parent going through the same just as we get to into
nfs_set_verifier_locked(). It might get to detaching inode
(and zeroing ->d_inode) before nfs_set_verifier_locked() gets
to fetching that; we get an oops as the result.
That can happen in nfs{,4} ->d_revalidate(); the call chain in
question is nfs_set_verifier_locked() <- nfs_set_verifier() <-
nfs_lookup_revalidate_delegated() <- nfs{,4}_do_lookup_revalidate().
We have checked that the parent had been positive, but that's
done before we get to nfs_set_verifier() and it's possible for
memory pressure to pick our dentry as eviction candidate by that
time. If that happens, back-to-back attempts to kill dentry and
its parent are quite normal. Sure, in case of eviction we'll
fail the ->d_seq check in the caller, but we need to survive
until we return there...
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In __afs_break_callback() we might check ->cb_nr_mmap and if it's non-zero
do queue_work(&vnode->cb_work). In afs_drop_open_mmap() we decrement
->cb_nr_mmap and do flush_work(&vnode->cb_work) if it reaches zero.
The trouble is, there's nothing to prevent __afs_break_callback() from
seeing ->cb_nr_mmap before the decrement and do queue_work() after both
the decrement and flush_work(). If that happens, we might be in trouble -
vnode might get freed before the queued work runs.
__afs_break_callback() is always done under ->cb_lock, so let's make
sure that ->cb_nr_mmap can change from non-zero to zero while holding
->cb_lock (the spinlock component of it - it's a seqlock and we don't
need to mess with the counter).
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
->d_hash() and ->d_compare() use those, so we need to delay freeing
them.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
That stuff can be accessed by ->d_hash()/->d_compare(); as it is, we have
a hard-to-hit UAF if rcu pathwalk manages to get into ->d_hash() on a filesystem
that is in process of getting shut down.
Besides, having nls and upcase table cleanup moved from ->put_super() towards
the place where sbi is freed makes for simpler failure exits.
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If lazy call of ->permission() returns a hard error, check that
try_to_unlazy() succeeds before returning it. That both makes
life easier for ->permission() instances and closes the race
in ENOTDIR handling - it is possible that positive d_can_lookup()
seen in link_path_walk() applies to the state *after* unlink() +
mkdir(), while nd->inode matches the state prior to that.
Normally seeing e.g. EACCES from permission check in rcu pathwalk
means that with some timings non-rcu pathwalk would've run into
the same; however, running into a non-executable regular file
in the middle of a pathname would not get to permission check -
it would fail with ENOTDIR instead.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Avoids fun races in RCU pathwalk... Same goes for freeing LSM shite
hanging off super_block's arse.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
check_snapshot() copies the bch_snapshot to a temporary to easily handle
older versions that don't have all the fields of the current version,
but it lacked a min() to correctly handle keys newer and larger than the
current version.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If a journal write errored, the list of devices it was written to could
be empty - we're not supposed to mark an empty replicas list.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_direct_IO_read() checks the request offset and size for sector
alignment and then falls through to a couple calculations to shrink
the size of the request based on the inode size. The problem is that
these checks round up to the fs block size, which runs the risk of
underflowing iter->count if the block size happens to be large
enough. This is triggered by fstest generic/361 with a 4k block
size, which subsequently leads to a crash. To avoid this crash,
check that the shorten length doesn't exceed the overall length of
the iter.
Fixes:
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Su Yue <glass.su@suse.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If we're in FILTER_SNAPSHOTS mode and we start scanning a range of the
keyspace where no keys are visible in the current snapshot, we have a
problem - we'll scan for a very long time before scanning terminates.
Awhile back, this was fixed for most cases with peek_upto() (and
assertions that enforce that it's being used).
But the fix missed the fact that the inodes btree is different - every
key offset is in a different snapshot tree, not just the inode field.
Fixes:
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Recently, we fixed our __GFP_NOFAIL usage in the readahead path, but the
easy one in read_single_folio() (where wa can return an error) was
missed - oops.
Fixes:
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Pull powerpc fixes from Michael Ellerman:
- Fix a crash when hot adding a PCI device to an LPAR since
recent changes
- Fix nested KVM level-2 guest reboot failure due to empty
'arch_compat'
Thanks to Amit Machhiwal, Aneesh Kumar K.V (IBM), Brian King, Gaurav
Batra, and Vaibhav Jain.
* tag 'powerpc-6.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV: Fix L2 guest reboot failure due to empty 'arch_compat'
powerpc/pseries/iommu: DLPAR add doesn't completely initialize pci_controller
Pull iommu fixes from Joerg Roedel:
- Intel VT-d fixes for nested domain handling:
- Cache invalidation for changes in a parent domain
- Dirty tracking setting for parent and nested domains
- Fix a constant-out-of-range warning
- ARM SMMU fixes:
- Fix CD allocation from atomic context when using SVA with SMMUv3
- Revert the conversion of SMMUv2 to domain_alloc_paging(), as it
breaks the boot for Qualcomm MSM8996 devices
- Restore SVA handle sharing in core code as it turned out there are
still drivers relying on it
* tag 'iommu-fixes-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/sva: Restore SVA handle sharing
iommu/arm-smmu-v3: Do not use GFP_KERNEL under as spinlock
iommu/vt-d: Fix constant-out-of-range warning
iommu/vt-d: Set SSADE when attaching to a parent with dirty tracking
iommu/vt-d: Add missing dirty tracking set for parent domain
iommu/vt-d: Wrap the dirty tracking loop to be a helper
iommu/vt-d: Remove domain parameter for intel_pasid_setup_dirty_tracking()
iommu/vt-d: Add missing device iotlb flush for parent domain
iommu/vt-d: Update iotlb in nested domain attach
iommu/vt-d: Add missing iotlb flush for parent domain
iommu/vt-d: Add __iommu_flush_iotlb_psi()
iommu/vt-d: Track nested domains in parent
Revert "iommu/arm-smmu: Convert to domain_alloc_paging()"
Pull cxl fixes from Dan Williams:
"A collection of significant fixes for the CXL subsystem.
The largest change in this set, that bordered on "new development", is
the fix for the fact that the location of the new qos_class attribute
did not match the Documentation. The fix ends up deleting more code
than it added, and it has a new unit test to backstop basic errors in
this interface going forward. So the "red-diff" and unit test saved
the "rip it out and try again" response.
In contrast, the new notification path for firmware reported CXL
errors (CXL CPER notifications) has a locking context bug that can not
be fixed with a red-diff. Given where the release cycle stands, it is
not comfortable to squeeze in that fix in these waning days. So, that
receives the "back it out and try again later" treatment.
There is a regression fix in the code that establishes memory NUMA
nodes for platform CXL regions. That has an ack from x86 folks. There
are a couple more fixups for Linux to understand (reassemble) CXL
regions instantiated by platform firmware. The policy around platforms
that do not match host-physical-address with system-physical-address
(i.e. systems that have an address translation mechanism between the
address range reported in the ACPI CEDT.CFMWS and endpoint decoders)
has been softened to abort driver load rather than teardown the memory
range (can cause system hangs). Lastly, there is a robustness /
regression fix for cases where the driver would previously continue in
the face of error, and a fixup for PCI error notification handling.
Summary:
- Fix NUMA initialization from ACPI CEDT.CFMWS
- Fix region assembly failures due to async init order
- Fix / simplify export of qos_class information
- Fix cxl_acpi initialization vs single-window-init failures
- Fix handling of repeated 'pci_channel_io_frozen' notifications
- Workaround platforms that violate host-physical-address ==
system-physical address assumptions
- Defer CXL CPER notification handling to v6.9"
* tag 'cxl-fixes-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/acpi: Fix load failures due to single window creation failure
acpi/ghes: Remove CXL CPER notifications
cxl/pci: Fix disabling memory if DVSEC CXL Range does not match a CFMWS window
cxl/test: Add support for qos_class checking
cxl: Fix sysfs export of qos_class for memdev
cxl: Remove unnecessary type cast in cxl_qos_class_verify()
cxl: Change 'struct cxl_memdev_state' *_perf_list to single 'struct cxl_dpa_perf'
cxl/region: Allow out of order assembly of autodiscovered regions
cxl/region: Handle endpoint decoders in cxl_region_find_decoder()
x86/numa: Fix the sort compare func used in numa_fill_memblks()
x86/numa: Fix the address overlap check in numa_fill_memblks()
cxl/pci: Skip to handle RAS errors if CXL.mem device is detached
Pull device mapper fix from Mike Snitzer:
- Fix DM integrity and verity targets to not use excessive stack when
they recheck in the error path.
* tag 'for-6.8/dm-fix-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-integrity, dm-verity: reduce stack usage for recheck
Pull SCSI fixes from James Bottomley:
"Six fixes: the four driver ones are pretty trivial.
The larger two core changes are to try to fix various USB attached
devices which have somewhat eccentric ways of handling the VPD and
other mode pages which necessitate multiple revalidates (that were
removed in the interests of efficiency) and updating the heuristic for
supported VPD pages"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: jazz_esp: Only build if SCSI core is builtin
scsi: smartpqi: Fix disable_managed_interrupts
scsi: ufs: Uninitialized variable in ufshcd_devfreq_target()
scsi: target: pscsi: Fix bio_put() for error case
scsi: core: Consult supported VPD page list prior to fetching page
scsi: sd: usb_storage: uas: Access media prior to querying device properties
Pull i2c fix from Wolfram Sang:
"A bugfix for host drivers"
* tag 'i2c-for-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx: when being a target, mark the last read as processed
Pull LoongArch fixes from Huacai Chen:
"Fix two cpu-hotplug issues, fix the init sequence about FDT system,
fix the coding style of dts, and fix the wrong CPUCFG ID handling of
KVM"
* tag 'loongarch-fixes-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: KVM: Streamline kvm_check_cpucfg() and improve comments
LoongArch: KVM: Rename _kvm_get_cpucfg() to _kvm_get_cpucfg_mask()
LoongArch: KVM: Fix input validation of _kvm_get_cpucfg() & kvm_check_cpucfg()
LoongArch: dts: Minor whitespace cleanup
LoongArch: Call early_init_fdt_scan_reserved_mem() earlier
LoongArch: Update cpu_sibling_map when disabling nonboot CPUs
LoongArch: Disable IRQ before init_fn() for nonboot CPUs
The newly added integrity_recheck() function has another larger stack
allocation, just like its caller integrity_metadata(). When it gets
inlined, the combination of the two exceeds the warning limit for 32-bit
architectures and possibly risks an overflow when this is called from
a deep call chain through a file system:
drivers/md/dm-integrity.c:1767:13: error: stack frame size (1048) exceeds limit (1024) in 'integrity_metadata' [-Werror,-Wframe-larger-than]
1767 | static void integrity_metadata(struct work_struct *w)
Since the caller at this point is done using its checksum buffer,
just reuse the same buffer in the new function to avoid the double
allocation.
[Mikulas: add "noinline" to integrity_recheck and verity_recheck.
These functions are only called on error, so they shouldn't bloat the
stack frame or code size of the caller.]
Fixes: c88f5e553f ("dm-integrity: recheck the integrity tag after a failure")
Fixes: 9177f3c0de ("dm-verity: recheck the hash after a failure")
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
There is a loophole in pstate limit clamping for the intel_cpufreq CPU
frequency scaling driver (intel_pstate in passive mode), schedutil CPU
frequency scaling governor, HWP (HardWare Pstate) control enabled, when
the adjust_perf call back path is used.
Fix it.
Fixes: a365ab6b9d cpufreq: intel_pstate: Implement the ->adjust_perf() callback
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Architectures like powerpc add debug checks to ensure we find only devmap
PUD pte entries. These debug checks are only done with CONFIG_DEBUG_VM.
This patch marks the ptes used for PUD advanced test devmap pte entries so
that we don't hit on debug checks on architecture like ppc64 as below.
WARNING: CPU: 2 PID: 1 at arch/powerpc/mm/book3s64/radix_pgtable.c:1382 radix__pud_hugepage_update+0x38/0x138
....
NIP [c0000000000a7004] radix__pud_hugepage_update+0x38/0x138
LR [c0000000000a77a8] radix__pudp_huge_get_and_clear+0x28/0x60
Call Trace:
[c000000004a2f950] [c000000004a2f9a0] 0xc000000004a2f9a0 (unreliable)
[c000000004a2f980] [000d34c100000000] 0xd34c100000000
[c000000004a2f9a0] [c00000000206ba98] pud_advanced_tests+0x118/0x334
[c000000004a2fa40] [c00000000206db34] debug_vm_pgtable+0xcbc/0x1c48
[c000000004a2fc10] [c00000000000fd28] do_one_initcall+0x60/0x388
Also
kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:202!
....
NIP [c000000000096510] pudp_huge_get_and_clear_full+0x98/0x174
LR [c00000000206bb34] pud_advanced_tests+0x1b4/0x334
Call Trace:
[c000000004a2f950] [000d34c100000000] 0xd34c100000000 (unreliable)
[c000000004a2f9a0] [c00000000206bb34] pud_advanced_tests+0x1b4/0x334
[c000000004a2fa40] [c00000000206db34] debug_vm_pgtable+0xcbc/0x1c48
[c000000004a2fc10] [c00000000000fd28] do_one_initcall+0x60/0x388
Link: https://lkml.kernel.org/r/20240129060022.68044-1-aneesh.kumar@kernel.org
Fixes: 27af67f356 ("powerpc/book3s64/mm: enable transparent pud hugepage")
Signed-off-by: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In cachestat, we access the folio from the page cache's xarray to compute
its page offset, and check for its dirty and writeback flags. However, we
do not hold a reference to the folio before performing these actions,
which means the folio can concurrently be released and reused as another
folio/page/slab.
Get around this altogether by just using xarray's existing machinery for
the folio page offsets and dirty/writeback states.
This changes behavior for tmpfs files to now always report zeroes in their
dirty and writeback counters. This is okay as tmpfs doesn't follow
conventional writeback cache behavior: its pages get "cleaned" during
swapout, after which they're no longer resident etc.
Link: https://lkml.kernel.org/r/20240220153409.GA216065@cmpxchg.org
Fixes: cf264e1329 ("cachestat: implement cachestat syscall")
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Jann Horn <jannh@google.com>
Cc: <stable@vger.kernel.org> [6.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Recently there have been a number of patches which have affected various
aspects of the memory mapping logic as implemented in mm/mmap.c where it
would have been useful for regular contributors to have been notified.
Add an entry for this part of mm in particular with regular contributors
tagged as reviewers.
Link: https://lkml.kernel.org/r/20240220064410.4639-1-lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This partially reverts commits cc478e0b6b, 63b85ac56a, 08d7c94d96,
a414d4286f, and 773688a6cb to make use of variable-sized stack depot
records, since eviction of stack entries from stack depot forces fixed-
sized stack records. Care was taken to retain the code cleanups by the
above commits.
Eviction was added to generic KASAN as a response to alleviating the
additional memory usage from fixed-sized stack records, but this still
uses more memory than previously.
With the re-introduction of variable-sized records for stack depot, we can
just switch back to non-evictable stack records again, and return back to
the previous performance and memory usage baseline.
Before (observed after a KASAN kernel boot):
pools: 597
refcounted_allocations: 17547
refcounted_frees: 6477
refcounted_in_use: 11070
freelist_size: 3497
persistent_count: 12163
persistent_bytes: 1717008
After:
pools: 319
refcounted_allocations: 0
refcounted_frees: 0
refcounted_in_use: 0
freelist_size: 0
persistent_count: 29397
persistent_bytes: 5183536
As can be seen from the counters, with a generic KASAN config, refcounted
allocations and evictions are no longer used. Due to using variable-sized
records, I observe a reduction of 278 stack depot pools (saving 4448 KiB)
with my test setup.
Link: https://lkml.kernel.org/r/20240129100708.39460-2-elver@google.com
Fixes: cc478e0b6b ("kasan: avoid resetting aux_lock")
Fixes: 63b85ac56a ("kasan: stop leaking stack trace handles")
Fixes: 08d7c94d96 ("kasan: memset free track in qlink_free")
Fixes: a414d4286f ("kasan: handle concurrent kasan_record_aux_stack calls")
Fixes: 773688a6cb ("kasan: use stack_depot_put for Generic mode")
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The bit-sliced implementation of AES-CTR operates on blocks of 128
bytes, and will fall back to the plain NEON version for tail blocks or
inputs that are shorter than 128 bytes to begin with.
It will call straight into the plain NEON asm helper, which performs all
memory accesses in granules of 16 bytes (the size of a NEON register).
For this reason, the associated plain NEON glue code will copy inputs
shorter than 16 bytes into a temporary buffer, given that this is a rare
occurrence and it is not worth the effort to work around this in the asm
code.
The fallback from the bit-sliced NEON version fails to take this into
account, potentially resulting in out-of-bounds accesses. So clone the
same workaround, and use a temp buffer for short in/outputs.
Fixes: fc074e1300 ("crypto: arm64/aes-neonbs-ctr - fallback to plain NEON for final chunk")
Cc: <stable@vger.kernel.org>
Reported-by: syzbot+f1ceaa1a09ab891e1934@syzkaller.appspotmail.com
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The lskcipher glue code for skcipher needs to copy the IV every
time rather than only on the first and last request. Otherwise
those algorithms that use IV to perform chaining may break, e.g.,
CBC.
This is because crypto_skcipher_import/export do not include the
IV as part of the saved state.
Reported-by: syzbot+b90b904ef6bdfdafec1d@syzkaller.appspotmail.com
Fixes: 662ea18d08 ("crypto: skcipher - Make use of internal state")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When being a target, NAK from the controller means that all bytes have
been transferred. So, the last byte needs also to be marked as
'processed'. Otherwise index registers of backends may not increase.
Fixes: f7414cd692 ("i2c: imx: support slave mode for imx I2C driver")
Signed-off-by: Corey Minyard <minyard@acm.org>
Tested-by: Andrew Manley <andrew.manley@sealingtech.com>
Reviewed-by: Andrew Manley <andrew.manley@sealingtech.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
[wsa: fixed comment and commit message to properly describe the case]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
In apparmor_getselfattr() when an invalid AppArmor attribute is
requested, or a value hasn't been explicitly set for the requested
attribute, the label passed to aa_put_label() is not properly
initialized which can cause problems when the pointer value is non-NULL
and AppArmor attempts to drop a reference on the bogus label object.
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: John Johansen <john.johansen@canonical.com>
Fixes: 223981db9b ("AppArmor: Add selfattr hooks")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Reviewed-by: Paul Moore <paul@paul-moore.com>
[PM: description changes as discussed with MS]
Signed-off-by: Paul Moore <paul@paul-moore.com>
selinux_getselfattr() doesn't properly initialize the string pointer
it passes to selinux_lsm_getattr() which can cause a problem when an
attribute hasn't been explicitly set; selinux_lsm_getattr() returns
0/success, but does not set or initialize the string label/attribute.
Failure to properly initialize the string causes problems later in
selinux_getselfattr() when the function attempts to kfree() the
string.
Cc: Casey Schaufler <casey@schaufler-ca.com>
Fixes: 762c934317 ("SELinux: Add selfattr hooks")
Suggested-by: Paul Moore <paul@paul-moore.com>
[PM: description changes as discussed in the thread]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Pull parisc architecture fixes from Helge Deller:
"Fixes CPU hotplug, the parisc stack unwinder and two possible build
errors in kprobes and ftrace area:
- Fix CPU hotplug
- Fix unaligned accesses and faults in stack unwinder
- Fix potential build errors by always including asm-generic/kprobes.h
- Fix build bug by add missing CONFIG_DYNAMIC_FTRACE check"
* tag 'parisc-for-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix stack unwinder
parisc/kprobes: always include asm-generic/kprobes.h
parisc/ftrace: add missing CONFIG_DYNAMIC_FTRACE check
Revert "parisc: Only list existing CPUs in cpu_possible_mask"
Pull arm and RISC-V SoC fixes from Arnd Bergmann:
"The Rockchip and IMX8 platforms get a number of fixes for dts files in
order to address some misconfigurations, including a regression for
USB-C support on some boards.
The other dts fixes are part of a series by Rob Herring to clean up
another class of dtc compiler warnings across all platforms, with a
few others helping out as well. With this, we can enable the warning
for the coming merge window without introducing regressions.
Conor Dooley has collected fixes for RISC-V platforms, both for the
dts files and for platofrm specific drivers.
The ep93xx platform gets a regression for for its gpio descriptors"
* tag 'arm-fixes-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (28 commits)
ARM: dts: renesas: rcar-gen2: Add missing #interrupt-cells to DA9063 nodes
cache: ax45mp_cache: Align end size to cache boundary in ax45mp_dma_cache_wback()
arm64: dts: qcom: Fix interrupt-map cell sizes
arm: dts: Fix dtc interrupt_map warnings
arm64: dts: Fix dtc interrupt_provider warnings
arm: dts: Fix dtc interrupt_provider warnings
arm64: dts: freescale: Disable interrupt_map check
ARM: ep93xx: Add terminator to gpiod_lookup_table
riscv: dts: sifive: add missing #interrupt-cells to pmic
arm64: dts: rockchip: Correct Indiedroid Nova GPIO Names
arm64: dts: rockchip: Drop interrupts property from rk3328 pwm-rockchip node
arm64: dts: rockchip: set num-cs property for spi on px30
arm64: dts: rockchip: minor rk3588 whitespace cleanup
riscv: dts: starfive: replace underscores in node names
bus: imx-weim: fix valid range check
Revert "arm64: dts: imx8mn-var-som-symphony: Describe the USB-C connector"
Revert "arm64: dts: imx8mp-dhcom-pdk3: Describe the USB-C connector"
arm64: dts: tqma8mpql: fix audio codec iov-supply
arm64: dts: rockchip: drop unneeded status from rk3588-jaguar gpio-leds
ARM: dts: rockchip: Drop interrupts property from pwm-rockchip nodes
...
Pull arm64 fixes from Will Deacon:
"A simple fix to a definition in the CXL PMU driver, a couple of
patches to restore SME control registers on the resume path (since
Arm's fast model now clears them) and a revert for our jump label asm
constraints after Geert noticed they broke the build with GCC 5.5.
There was then the ensuing discussion about raising the minimum GCC
(and corresponding binutils) versions at [1], but for now we'll keep
things working as they were until that goes ahead.
- Revert fix to jump label asm constraints, as it regresses the build
with some GCC 5.5 toolchains.
- Restore SME control registers when resuming from suspend
- Fix incorrect filter definition in CXL PMU driver"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/sme: Restore SMCR_EL1.EZT0 on exit from suspend
arm64/sme: Restore SME registers on exit from suspend
Revert "arm64: jump_label: use constraints "Si" instead of "i""
perf: CXL: fix CPMU filter value mask length
Retry page faults without acquiring mmu_lock, and without even faulting
the page into the primary MMU, if the resolved gfn is covered by an active
invalidation. Contending for mmu_lock is especially problematic on
preemptible kernels as the mmu_notifier invalidation task will yield
mmu_lock (see rwlock_needbreak()), delay the in-progress invalidation, and
ultimately increase the latency of resolving the page fault. And in the
worst case scenario, yielding will be accompanied by a remote TLB flush,
e.g. if the invalidation covers a large range of memory and vCPUs are
accessing addresses that were already zapped.
Faulting the page into the primary MMU is similarly problematic, as doing
so may acquire locks that need to be taken for the invalidation to
complete (the primary MMU has finer grained locks than KVM's MMU), and/or
may cause unnecessary churn (getting/putting pages, marking them accessed,
etc).
Alternatively, the yielding issue could be mitigated by teaching KVM's MMU
iterators to perform more work before yielding, but that wouldn't solve
the lock contention and would negatively affect scenarios where a vCPU is
trying to fault in an address that is NOT covered by the in-progress
invalidation.
Add a dedicated lockess version of the range-based retry check to avoid
false positives on the sanity check on start+end WARN, and so that it's
super obvious that checking for a racing invalidation without holding
mmu_lock is unsafe (though obviously useful).
Wrap mmu_invalidate_in_progress in READ_ONCE() to ensure that pre-checking
invalidation in a loop won't put KVM into an infinite loop, e.g. due to
caching the in-progress flag and never seeing it go to '0'.
Force a load of mmu_invalidate_seq as well, even though it isn't strictly
necessary to avoid an infinite loop, as doing so improves the probability
that KVM will detect an invalidation that already completed before
acquiring mmu_lock and bailing anyways.
Do the pre-check even for non-preemptible kernels, as waiting to detect
the invalidation until mmu_lock is held guarantees the vCPU will observe
the worst case latency in terms of handling the fault, and can generate
even more mmu_lock contention. E.g. the vCPU will acquire mmu_lock,
detect retry, drop mmu_lock, re-enter the guest, retake the fault, and
eventually re-acquire mmu_lock. This behavior is also why there are no
new starvation issues due to losing the fairness guarantees provided by
rwlocks: if the vCPU needs to retry, it _must_ drop mmu_lock, i.e. waiting
on mmu_lock doesn't guarantee forward progress in the face of _another_
mmu_notifier invalidation event.
Note, adding READ_ONCE() isn't entirely free, e.g. on x86, the READ_ONCE()
may generate a load into a register instead of doing a direct comparison
(MOV+TEST+Jcc instead of CMP+Jcc), but practically speaking the added cost
is a few bytes of code and maaaaybe a cycle or three.
Reported-by: Yan Zhao <yan.y.zhao@intel.com>
Closes: https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@yzhao56-desk.sh.intel.com
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>
Cc: Yuan Yao <yuan.yao@linux.intel.com>
Cc: Xu Yilun <yilun.xu@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/r/20240222012640.2820927-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Pull s390 fixes from Heiko Carstens:
- Fix invalid -EBUSY on ccw_device_start() which can lead to failing
device initialization
- Add missing multiplication by 8 in __iowrite64_copy() to get the
correct byte length before calling zpci_memcpy_toio()
- Various config updates
* tag 's390-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cio: fix invalid -EBUSY on ccw_device_start
s390: use the correct count for __iowrite64_copy()
s390/configs: update default configurations
s390/configs: enable INIT_STACK_ALL_ZERO in all configurations
s390/configs: provide compat topic configuration target
Pull misc fixes from Andrew Morton:
"A batch of MM (and one non-MM) hotfixes.
Ten are cc:stable and the remainder address post-6.7 issues or aren't
considered appropriate for backporting"
* tag 'mm-hotfixes-stable-2024-02-22-15-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
kasan: guard release_free_meta() shadow access with kasan_arch_is_ready()
mm/damon/lru_sort: fix quota status loss due to online tunings
mm/damon/reclaim: fix quota stauts loss due to online tunings
MAINTAINERS: mailmap: update Shakeel's email address
mm/damon/sysfs-schemes: handle schemes sysfs dir removal before commit_schemes_quota_goals
mm: memcontrol: clarify swapaccount=0 deprecation warning
mm/memblock: add MEMBLOCK_RSRV_NOINIT into flagname[] array
mm/zswap: invalidate duplicate entry when !zswap_enabled
lib/Kconfig.debug: TEST_IOV_ITER depends on MMU
mm/swap: fix race when skipping swapcache
mm/swap_state: update zswap LRU's protection range with the folio locked
selftests/mm: uffd-unit-test check if huge page size is 0
mm/damon/core: check apply interval in damon_do_apply_schemes()
mm: zswap: fix missing folio cleanup in writeback race path
Pull device mapper fixes from Mike Snitzer:
- Stable fixes for 3 DM targets (integrity, verity and crypt) to
address systemic failure that can occur if user provided pages map to
the same block.
- Fix DM crypt to not allow modifying data that being encrypted for
authenticated encryption.
- Fix DM crypt and verity targets to align their respective bvec_iter
struct members to avoid the need for byte level access (due to
__packed attribute) that is costly on some arches (like RISC).
* tag 'for-6.8/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-crypt, dm-integrity, dm-verity: bump target version
dm-verity, dm-crypt: align "struct bvec_iter" correctly
dm-crypt: recheck the integrity tag after a failure
dm-crypt: don't modify the data when using authenticated encryption
dm-verity: recheck the hash after a failure
dm-integrity: recheck the integrity tag after a failure
Pull drm fixes from Dave Airlie:
"This is the weekly drm fixes. Non-drivers there is a fbdev/sparc fix,
syncobj, ttm and buddy fixes.
On the driver side, ivpu, meson, i915 have a small fix each. Then
amdgpu and xe have a bunch. Nouveau has some minor uapi additions to
give userspace some useful info along with a Kconfig change to allow
the new GSP firmware paths to be used by default on the GPUs it
supports.
Seems about the usual amount for this time of release cycle.
fbdev:
- fix sparc undefined reference
syncobj:
- fix sync obj fence waiting
- handle NULL fence in syncobj eventfd code
ttm:
- fix invalid free
buddy:
- fix list handling
- fix 32-bit build
meson:
- don't remove bridges from other drivers
nouveau:
- fix build warnings
- add two minor info parameters
- add a Kconfig to allow GSP by default on some GPUs
ivpu:
- allow fw to do initial tile config
i915:
- fix TV mode
amdgpu:
- Suspend/resume fixes
- Backlight error fix
- DCN 3.5 fixes
- Misc fixes
xe:
- Remove support for persistent exec_queues
- Drop a reduntant sysfs newline printout
- A three-patch fix for a VM_BIND rebind optimization path
- Fix a modpost warning on an xe KUNIT module"
* tag 'drm-fixes-2024-02-23' of git://anongit.freedesktop.org/drm/drm: (27 commits)
nouveau: add an ioctl to report vram usage
nouveau: add an ioctl to return vram bar size.
nouveau/gsp: add kconfig option to enable GSP paths by default
drm/amdgpu: Fix the runtime resume failure issue
drm/amd/display: fix null-pointer dereference on edid reading
drm/amd/display: Fix memory leak in dm_sw_fini()
drm/amd/display: fix input states translation error for dcn35 & dcn351
drm/amd/display: Fix potential null pointer dereference in dc_dmub_srv
drm/amd/display: Only allow dig mapping to pwrseq in new asic
drm/amd/display: adjust few initialization order in dm
drm/syncobj: handle NULL fence in syncobj_eventfd_entry_func
drm/syncobj: call drm_syncobj_fence_add_wait when WAIT_AVAILABLE flag is set
drm/ttm: Fix an invalid freeing on already freed page in error path
sparc: Fix undefined reference to fb_is_primary_device
drm/xe: Fix modpost warning on xe_mocs kunit module
drm/xe/xe_gt_idle: Drop redundant newline in name
drm/xe: Return 2MB page size for compact 64k PTEs
drm/xe: Add XE_VMA_PTE_64K VMA flag
drm/xe: Fix xe_vma_set_pte_size
drm/xe/uapi: Remove support for persistent exec_queues
...
Pull ata fixes from Niklas Cassel:
- Do not try to set a sleeping device to standby. Sleep is a deeper
sleep state than standby, and needs a reset to wake up the drive. A
system resume will reset the port. Sending a command other than reset
to a sleeping device is not wise, as the command will timeout (Damien
Le Moal)
- Do not try to put a device to standby twice during system shutdown.
ata_dev_power_set_standby() is currently called twice during
shutdown, once after the scsi device is removed, and another when
ata_pci_shutdown_one() executes. Modify ata_dev_power_set_standby()
to do nothing if the device is already in standby (Damien Le Moal)
- Add a quirk for ASM1064 to fixup the number of implemented ports. We
probe all ports that the hardware reports to be implemented. Probing
ports that are not implemented causes significantly increased boot
time (Andrey Jr. Melnikov)
- Fix error handling for the ahci_ceva driver. Ensure that the
ahci_ceva driver does a proper cleanup of its resources in the error
path (Radhey Shyam Pandey)
* tag 'ata-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-core: Do not call ata_dev_power_set_standby() twice
ata: ahci_ceva: fix error handling for Xilinx GT PHY support
ahci: asm1064: correct count of reported ports
ata: libata-core: Do not try to set sleeping devices to standby
Pull gpio fix from Bartosz Golaszewski:
- fix a use-case where no pins are mapped to GPIOs in
gpiochip_generic_config()
* tag 'gpio-fixes-for-v6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: Handle no pin_ranges in gpiochip_generic_config()
Before attempting to support the pre-ratification version of vector
found on older T-Head CPUs, disallow "v" in riscv,isa on these
platforms. The deprecated property has no clear way to communicate
the specific version of vector that is supported and much of the vendor
provided software puts "v" in the isa string. riscv,isa-extensions
should be used instead. This should not be too much of a burden for
these systems, as the vendor shipped devicetrees and firmware do not
work with a mainline kernel and will require updating.
We can limit this restriction to only ignore v in riscv,isa on CPUs
that report T-Head's vendor ID and a zero marchid. Newer T-Head CPUs
that support the ratified version of vector should report non-zero
marchid, according to Guo Ren [1].
Link: https://lore.kernel.org/linux-riscv/CAJF2gTRy5eK73=d6s7CVy9m9pB8p4rAoMHM3cZFwzg=AuF7TDA@mail.gmail.com/ [1]
Fixes: dc6667a4e7 ("riscv: Extending cpufeature.c to detect V-extension")
Co-developed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20240223-tidings-shabby-607f086cb4d7@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull hwmon fix from Guenter Roeck:
"Fix a global-out-of-bounds bug in nct6775 driver"
* tag 'hwmon-for-v6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (nct6775) Fix access to temperature configuration registers
Prior to commit 092edaddb6 ("iommu: Support mm PASID 1:n with sva
domains") the code allowed a SVA handle to be bound multiple times to the
same (mm, device) pair. This was alluded to in the kdoc comment, but we
had understood this to be more a remark about allowing multiple devices,
not a literal same-driver re-opening the same SVA.
It turns out uacce and idxd were both relying on the core code to handle
reference counting for same-device same-mm scenarios. As this looks hard
to resolve in the drivers bring it back to the core code.
The new design has changed the meaning of the domain->users refcount to
refer to the number of devices that are sharing that domain for the same
mm. This is part of the design to lift the SVA domain de-duplication out
of the drivers.
Return the old behavior by explicitly de-duplicating the struct iommu_sva
handle. The same (mm, device) will return the same handle pointer and the
core code will handle tracking this. The last unbind of the handle will
destroy it.
Fixes: 092edaddb6 ("iommu: Support mm PASID 1:n with sva domains")
Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Closes: https://lore.kernel.org/all/20240221110658.529-1-zhangfei.gao@linaro.org/
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/0-v1-9455fc497a6f+3b4-iommu_sva_sharing_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Arm SMMU fixes for 6.8
- Fix CD allocation from atomic context when using SVA with SMMUv3
- Revert the conversion of SMMUv2 to domain_alloc_paging(), as it
breaks the boot for Qualcomm MSM8996 devices
A recent DRM series purporting to simplify support for "transparent
bridges" and handling of probe deferrals ironically exposed a
use-after-free issue on pmic_glink_altmode probe deferral.
This has manifested itself as the display subsystem occasionally failing
to initialise and NULL-pointer dereferences during boot of machines like
the Lenovo ThinkPad X13s.
Specifically, the dp-hpd bridge is currently registered before all
resources have been acquired which means that it can also be
deregistered on probe deferrals.
In the meantime there is a race window where the new aux bridge driver
(or PHY driver previously) may have looked up the dp-hpd bridge and
stored a (non-reference-counted) pointer to the bridge which is about to
be deallocated.
When the display controller is later initialised, this triggers a
use-after-free when attaching the bridges:
dp -> aux -> dp-hpd (freed)
which may, for example, result in the freed bridge failing to attach:
[drm:drm_bridge_attach [drm]] *ERROR* failed to attach bridge /soc@0/phy@88eb000 to encoder TMDS-31: -16
or a NULL-pointer dereference:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
...
Call trace:
drm_bridge_attach+0x70/0x1a8 [drm]
drm_aux_bridge_attach+0x24/0x38 [aux_bridge]
drm_bridge_attach+0x80/0x1a8 [drm]
dp_bridge_init+0xa8/0x15c [msm]
msm_dp_modeset_init+0x28/0xc4 [msm]
The DRM bridge implementation is clearly fragile and implicitly built on
the assumption that bridges may never go away. In this case, the fix is
to move the bridge registration in the pmic_glink_altmode driver to
after all resources have been looked up.
Incidentally, with the new dp-hpd bridge implementation, which registers
child devices, this is also a requirement due to a long-standing issue
in driver core that can otherwise lead to a probe deferral loop (see
commit fbc35b45f9 ("Add documentation on meaning of -EPROBE_DEFER")).
[DB: slightly fixed commit message by adding the word 'commit']
Fixes: 080b4e2485 ("soc: qcom: pmic_glink: Introduce altmode support")
Fixes: 2bcca96abf ("soc: qcom: pmic-glink: switch to DRM_AUX_HPD_BRIDGE")
Cc: <stable@vger.kernel.org> # 6.3
Cc: Bjorn Andersson <andersson@kernel.org>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240217150228.5788-4-johan+linaro@kernel.org
Combining allocation and registration is an anti-pattern that should be
avoided. Add two new functions for allocating and registering an dp-hpd
bridge with a proper 'devm' prefix so that it is clear that these are
device managed interfaces.
devm_drm_dp_hpd_bridge_alloc()
devm_drm_dp_hpd_bridge_add()
The new interface will be used to fix a use-after-free bug in the
Qualcomm PMIC GLINK driver and may prevent similar issues from being
introduced elsewhere.
The existing drm_dp_hpd_bridge_register() is reimplemented using the
above and left in place for now.
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240217150228.5788-3-johan+linaro@kernel.org
snd_soc_card_get_kcontrol() must be holding a read lock on
card->controls_rwsem while walking the controls list.
Compare with snd_ctl_find_numid().
The existing function is renamed snd_soc_card_get_kcontrol_locked()
so that it can be called from contexts that are already holding
card->controls_rwsem (for example, control get/put functions).
There are few direct or indirect callers of
snd_soc_card_get_kcontrol(), and most are safe. Three require
changes, which have been included in this patch:
codecs/cs35l45.c:
cs35l45_activate_ctl() is called from a control put() function so
is changed to call snd_soc_card_get_kcontrol_locked().
codecs/cs35l56.c:
cs35l56_sync_asp1_mixer_widgets_with_firmware() is called from
control get()/put() functions so is changed to call
snd_soc_card_get_kcontrol_locked().
fsl/fsl_xcvr.c:
fsl_xcvr_activate_ctl() is called from three places, one of which
already holds card->controls_rwsem:
1. fsl_xcvr_mode_put(), a control put function, which will
already be holding card->controls_rwsem.
2. fsl_xcvr_startup(), a DAI startup function.
3. fsl_xcvr_shutdown(), a DAI shutdown function.
To fix this, fsl_xcvr_activate_ctl() has been changed to call
snd_soc_card_get_kcontrol_locked() so that it is safe to call
directly from fsl_xcvr_mode_put().
The fsl_xcvr_startup() and fsl_xcvr_shutdown() functions have been
changed to take a read lock on card->controls_rsem() around calls
to fsl_xcvr_activate_ctl(). While this is not very elegant, it
keeps the change small, to avoid this patch creating a large
collateral churn in fsl/fsl_xcvr.c.
Analysis of other callers of snd_soc_card_get_kcontrol() is that
they do not need any changes, they are not holding card->controls_rwsem
when they call snd_soc_card_get_kcontrol().
Direct callers of snd_soc_card_get_kcontrol():
fsl/fsl_spdif.c: fsl_spdif_dai_probe() - DAI probe function
fsl/fsl_micfil.c: voice_detected_fn() - IRQ handler
Indirect callers via soc_component_notify_control():
codecs/cs42l43: cs42l43_mic_shutter() - IRQ handler
codecs/cs42l43: cs42l43_spk_shutter() - IRQ handler
codecs/ak4118.c: ak4118_irq_handler() - IRQ handler
codecs/wm_adsp.c: wm_adsp_write_ctl() - not currently used
Indirect callers via snd_soc_limit_volume():
qcom/sc8280xp.c: sc8280xp_snd_init() - DAIlink init function
ti/rx51.c: rx51_aic34_init() - DAI init function
I don't have hardware to test the fsl/*, qcom/sc828xp.c, ti/rx51.c
and ak4118.c changes.
Backport note:
The fsl/, qcom/, cs35l45, cs35l56 and cs42l43 callers were added
since the Fixes commit so won't all be present on older kernels.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 209c6cdfd2 ("ASoC: soc-card: move snd_soc_card_get_kcontrol() to soc-card")
Link: https://lore.kernel.org/r/20240221123710.690224-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
RISC-V Devicetree fixes for v6.8-rc6
Two fixes for W=2 issues in devicetrees, which should constitute fixes
for all reasonable-to-fix W=2 problems on RISC-V. The others are caused
by standard USB and MMC property names containing underscores that are
not likely to ever change.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
* tag 'riscv-dt-fixes-for-v6.8-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
riscv: dts: sifive: add missing #interrupt-cells to pmic
riscv: dts: starfive: replace underscores in node names
Link: https://lore.kernel.org/r/20240221-foil-glade-09dbf1aa3fe2@spud
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Commit 3ce4f9c3fb ("net/ps3_gelic_net: Add gelic_descr structures") of
6.8-rc1 had a copy-and-paste error where the pointer that holds the
allocated SKB (struct gelic_descr.skb) was set to NULL after the SKB was
allocated. This resulted in a kernel panic when the SKB pointer was
accessed.
This fix moves the initialization of the gelic_descr to before the SKB
is allocated.
Reported-by: sambat goson <sombat3960@gmail.com>
Fixes: 3ce4f9c3fb ("net/ps3_gelic_net: Add gelic_descr structures")
Signed-off-by: Geoff Levand <geoff@infradead.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 5d93cfcf73 ("net: dpaa: Convert to phylink"), we support
the "10gbase-r" phy-mode through a driver-based conversion of "xgmii",
but we still don't actually support it when the device tree specifies
"10gbase-r" proper.
This is because boards such as LS1046A-RDB do not define pcs-handle-names
(for whatever reason) in the ethernet@f0000 device tree node, and the
code enters through this code path:
err = of_property_match_string(mac_node, "pcs-handle-names", "xfi");
// code takes neither branch and falls through
if (err >= 0) {
(...)
} else if (err != -EINVAL && err != -ENODATA) {
goto _return_fm_mac_free;
}
(...)
/* For compatibility, if pcs-handle-names is missing, we assume this
* phy is the first one in pcsphy-handle
*/
err = of_property_match_string(mac_node, "pcs-handle-names", "sgmii");
if (err == -EINVAL || err == -ENODATA)
pcs = memac_pcs_create(mac_node, 0); // code takes this branch
else if (err < 0)
goto _return_fm_mac_free;
else
pcs = memac_pcs_create(mac_node, err);
// A default PCS is created and saved in "pcs"
// This determination fails and mistakenly saves the default PCS
// memac->sgmii_pcs instead of memac->xfi_pcs, because at this
// stage, mac_dev->phy_if == PHY_INTERFACE_MODE_10GBASER.
if (err && mac_dev->phy_if == PHY_INTERFACE_MODE_XGMII)
memac->xfi_pcs = pcs;
else
memac->sgmii_pcs = pcs;
In other words, in the absence of pcs-handle-names, the default
xfi_pcs assignment logic only works when in the device tree we have
PHY_INTERFACE_MODE_XGMII.
By reversing the order between the fallback xfi_pcs assignment and the
"xgmii" overwrite with "10gbase-r", we are able to support both values
in the device tree, with identical behavior.
Currently, it is impossible to make the s/xgmii/10gbase-r/ device tree
conversion, because it would break forward compatibility (new device
tree with old kernel). The only way to modify existing device trees to
phy-interface-mode = "10gbase-r" is to fix stable kernels to accept this
value and handle it properly.
One reason why the conversion is desirable is because with pre-phylink
kernels, the Aquantia PHY driver used to warn about the improper use
of PHY_INTERFACE_MODE_XGMII [1]. It is best to have a single (latest)
device tree that works with all supported stable kernel versions.
Note that the blamed commit does not constitute a regression per se.
Older stable kernels like 6.1 still do not work with "10gbase-r", but
for a different reason. That is a battle for another time.
[1] https://lore.kernel.org/netdev/20240214-ls1046-dts-use-10gbase-r-v1-1-8c2d68547393@concurrent-rt.com/
Fixes: 5d93cfcf73 ("net: dpaa: Convert to phylink")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Sean Anderson <sean.anderson@seco.com>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do the cache flush of converted pages in svm_register_enc_region() before
dropping kvm->lock to fix use-after-free issues where region and/or its
array of pages could be freed by a different task, e.g. if userspace has
__unregister_enc_region_locked() already queued up for the region.
Note, the "obvious" alternative of using local variables doesn't fully
resolve the bug, as region->pages is also dynamically allocated. I.e. the
region structure itself would be fine, but region->pages could be freed.
Flushing multiple pages under kvm->lock is unfortunate, but the entire
flow is a rare slow path, and the manual flush is only needed on CPUs that
lack coherency for encrypted memory.
Fixes: 19a23da539 ("Fix unsynchronized access to sev members through svm_register_enc_region")
Reported-by: Gabe Kirkpatrick <gkirkpatrick@google.com>
Cc: Josh Eads <josheads@google.com>
Cc: Peter Gonda <pgonda@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20240217013430.2079561-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Return used most significant bits from sample bit-width rather than the whole
physical sample word size. The starting bit offset is defined in the format
itself.
The behaviour is not changed for 32-bit formats like S32_LE. But with this
change - msbits value 24 instead 32 is returned for 24-bit formats like S24_LE
etc.
Also, commit 2112aa0349 ("ALSA: pcm: Introduce MSBITS subformat interface")
compares sample bit-width not physical sample bit-width to reset MSBITS_MAX bit
from the subformat bitmask.
Probably no applications are using msbits value for other than S32_LE/U32_LE
formats, because no drivers are reducing msbits value for other formats (with
the msb offset) at the moment.
For sanity, increase PCM protocol version, letting the user space to detect
the changed behaviour.
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20240222173649.1447549-1-perex@perex.cz
Signed-off-by: Takashi Iwai <tiwai@suse.de>
When a station has not been uploaded yet, receiving SMPS or channel width
notification action frames can lead to rate_control_rate_update calling
drv_sta_rc_update with uninitialized driver private data.
Fix this by adding a missing check for sta->uploaded.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://msgid.link/20240221140535.16102-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The PTDMA driver sets DMA masks in two different places for the same
device inconsistently. First call is in pt_pci_probe(), where it uses
48bit mask. The second call is in pt_dmaengine_register(), where it
uses a 64bit mask. Using 64bit dma mask causes IO_PAGE_FAULT errors
on DMA transfers between main memory and other devices.
Without the extra call it works fine. Additionally the second call
doesn't check the return value so it can silently fail.
Remove the superfluous dma_set_mask() call and only use 48bit mask.
Cc: stable@vger.kernel.org
Fixes: b0b4a6b105 ("dmaengine: ptdma: register PTDMA controller as a DMA resource")
Reviewed-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Signed-off-by: Tadeusz Struk <tstruk@gigaio.com>
Link: https://lore.kernel.org/r/20240222163053.13842-1-tstruk@gigaio.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
All the checks currently done in kvm_check_cpucfg can be realized with
early returns, so just do that to avoid extra cognitive burden related
to the return value handling.
While at it, clean up comments of _kvm_get_cpucfg_mask() and
kvm_check_cpucfg(), by removing comments that are merely restatement of
the code nearby, and paraphrasing the rest so they read more natural for
English speakers (that likely are not familiar with the actual Chinese-
influenced grammar).
No functional changes intended.
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The function is not actually a getter of guest CPUCFG, but rather
validation of the input CPUCFG ID plus information about the supported
bit flags of that CPUCFG leaf. So rename it to avoid confusion.
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The range check for the CPUCFG ID is wrong (should have been a ||
instead of &&) and useless in effect, so fix the obvious mistake.
Furthermore, the juggling of the temp return value is unnecessary,
because it is semantically equivalent and more readable to just
return at every switch case's end. This is done too to avoid potential
bugs in the future related to the unwanted complexity.
Also, the return value of _kvm_get_cpucfg is meant to be checked, but
this was not done, so bad CPUCFG IDs wrongly fall back to the default
case and 0 is incorrectly returned; check the return value to fix the
UAPI behavior.
While at it, also remove the redundant range check in kvm_check_cpucfg,
because out-of-range CPUCFG IDs are already rejected by the -EINVAL
as returned by _kvm_get_cpucfg().
Fixes: db1ecca22e ("LoongArch: KVM: Add LSX (128bit SIMD) support")
Fixes: 118e10cd89 ("LoongArch: KVM: Add LASX (256bit SIMD) support")
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The unflatten_and_copy_device_tree() function contains a call to
memblock_alloc(). This means that memblock is allocating memory before
any of the reserved memory regions are set aside in the arch_mem_init()
function which calls early_init_fdt_scan_reserved_mem(). Therefore,
there is a possibility for memblock to allocate from any of the
reserved memory regions.
Hence, move the call to early_init_fdt_scan_reserved_mem() to be earlier
in the init sequence, so that the reserved memory regions are set aside
before any allocations are done using memblock.
Cc: stable@vger.kernel.org
Fixes: 88d4d957ed ("LoongArch: Add FDT booting support from efi system table")
Signed-off-by: Oreoluwa Babatunde <quic_obabatun@quicinc.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The PAPR spec spells the function name as
"ibm,reset-pe-dma-windows"
but in practice firmware uses the singular form:
"ibm,reset-pe-dma-window"
in the device tree. Since we have the wrong spelling in the RTAS
function table, reverse lookups (token -> name) fail and warn:
unexpected failed lookup for token 86
WARNING: CPU: 1 PID: 545 at arch/powerpc/kernel/rtas.c:659 __do_enter_rtas_trace+0x2a4/0x2b4
CPU: 1 PID: 545 Comm: systemd-udevd Not tainted 6.8.0-rc4 #30
Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NL1060_028) hv:phyp pSeries
NIP [c0000000000417f0] __do_enter_rtas_trace+0x2a4/0x2b4
LR [c0000000000417ec] __do_enter_rtas_trace+0x2a0/0x2b4
Call Trace:
__do_enter_rtas_trace+0x2a0/0x2b4 (unreliable)
rtas_call+0x1f8/0x3e0
enable_ddw.constprop.0+0x4d0/0xc84
dma_iommu_dma_supported+0xe8/0x24c
dma_set_mask+0x5c/0xd8
mlx5_pci_init.constprop.0+0xf0/0x46c [mlx5_core]
probe_one+0xfc/0x32c [mlx5_core]
local_pci_probe+0x68/0x12c
pci_call_probe+0x68/0x1ec
pci_device_probe+0xbc/0x1a8
really_probe+0x104/0x570
__driver_probe_device+0xb8/0x224
driver_probe_device+0x54/0x130
__driver_attach+0x158/0x2b0
bus_for_each_dev+0xa8/0x120
driver_attach+0x34/0x48
bus_add_driver+0x174/0x304
driver_register+0x8c/0x1c4
__pci_register_driver+0x68/0x7c
mlx5_init+0xb8/0x118 [mlx5_core]
do_one_initcall+0x60/0x388
do_init_module+0x7c/0x2a4
init_module_from_file+0xb4/0x108
idempotent_init_module+0x184/0x34c
sys_finit_module+0x90/0x114
And oopses are possible when lockdep is enabled or the RTAS
tracepoints are active, since those paths dereference the result of
the lookup.
Use the correct spelling to match firmware's behavior, adjusting the
related constants to match.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: 8252b88294 ("powerpc/rtas: improve function information lookups")
Reported-by: Gaurav Batra <gbatra@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240222-rtas-fix-ibm-reset-pe-dma-window-v1-1-7aaf235ac63c@linux.ibm.com
When kdump kernel tries to copy dump data over SR-IOV, LPAR panics due
to NULL pointer exception:
Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc000000020847ad4
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: mlx5_core(+) vmx_crypto pseries_wdt papr_scm libnvdimm mlxfw tls psample sunrpc fuse overlay squashfs loop
CPU: 12 PID: 315 Comm: systemd-udevd Not tainted 6.4.0-Test102+ #12
Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_008) hv:phyp pSeries
NIP: c000000020847ad4 LR: c00000002083b2dc CTR: 00000000006cd18c
REGS: c000000029162ca0 TRAP: 0300 Not tainted (6.4.0-Test102+)
MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 48288244 XER: 00000008
CFAR: c00000002083b2d8 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1
...
NIP _find_next_zero_bit+0x24/0x110
LR bitmap_find_next_zero_area_off+0x5c/0xe0
Call Trace:
dev_printk_emit+0x38/0x48 (unreliable)
iommu_area_alloc+0xc4/0x180
iommu_range_alloc+0x1e8/0x580
iommu_alloc+0x60/0x130
iommu_alloc_coherent+0x158/0x2b0
dma_iommu_alloc_coherent+0x3c/0x50
dma_alloc_attrs+0x170/0x1f0
mlx5_cmd_init+0xc0/0x760 [mlx5_core]
mlx5_function_setup+0xf0/0x510 [mlx5_core]
mlx5_init_one+0x84/0x210 [mlx5_core]
probe_one+0x118/0x2c0 [mlx5_core]
local_pci_probe+0x68/0x110
pci_call_probe+0x68/0x200
pci_device_probe+0xbc/0x1a0
really_probe+0x104/0x540
__driver_probe_device+0xb4/0x230
driver_probe_device+0x54/0x130
__driver_attach+0x158/0x2b0
bus_for_each_dev+0xa8/0x130
driver_attach+0x34/0x50
bus_add_driver+0x16c/0x300
driver_register+0xa4/0x1b0
__pci_register_driver+0x68/0x80
mlx5_init+0xb8/0x100 [mlx5_core]
do_one_initcall+0x60/0x300
do_init_module+0x7c/0x2b0
At the time of LPAR dump, before kexec hands over control to kdump
kernel, DDWs (Dynamic DMA Windows) are scanned and added to the FDT.
For the SR-IOV case, default DMA window "ibm,dma-window" is removed from
the FDT and DDW added, for the device.
Now, kexec hands over control to the kdump kernel.
When the kdump kernel initializes, PCI busses are scanned and IOMMU
group/tables created, in pci_dma_bus_setup_pSeriesLP(). For the SR-IOV
case, there is no "ibm,dma-window". The original commit: b1fc44eaa9,
fixes the path where memory is pre-mapped (direct mapped) to the DDW.
When TCEs are direct mapped, there is no need to initialize IOMMU
tables.
iommu_table_setparms_lpar() only considers "ibm,dma-window" property
when initiallizing IOMMU table. In the scenario where TCEs are
dynamically allocated for SR-IOV, newly created IOMMU table is not
initialized. Later, when the device driver tries to enter TCEs for the
SR-IOV device, NULL pointer execption is thrown from iommu_area_alloc().
The fix is to initialize the IOMMU table with DDW property stored in the
FDT. There are 2 points to remember:
1. For the dedicated adapter, kdump kernel would encounter both
default and DDW in FDT. In this case, DDW property is used to
initialize the IOMMU table.
2. A DDW could be direct or dynamic mapped. kdump kernel would
initialize IOMMU table and mark the existing DDW as
"dynamic". This works fine since, at the time of table
initialization, iommu_table_clear() makes some space in the
DDW, for some predefined number of TCEs which are needed for
kdump to succeed.
Fixes: b1fc44eaa9 ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window")
Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240125203017.61014-1-gbatra@linux.ibm.com
Currently, mctp_local_output only takes ownership of skb on success, and
we may leak an skb if mctp_local_output fails in specific states; the
skb ownership isn't transferred until the actual output routing occurs.
Instead, make mctp_local_output free the skb on all error paths up to
the route action, so it always consumes the passed skb.
Fixes: 833ef3b91d ("mctp: Populate socket implementation")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240220081053.1439104-1-jk@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-02-20 (ice)
This series contains updates to ice driver only.
Yochai sets parent device to properly reflect connection state between
source DPLL and output pin.
Arkadiusz fixes additional issues related to DPLL; proper reporting of
phase_adjust value and preventing use/access of data while resetting.
Amritha resolves ASSERT_RTNL() being triggered on certain reset/rebuild
flows.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Fix ASSERT_RTNL() warning during certain scenarios
ice: fix pin phase adjust updates on PF reset
ice: fix dpll periodic work data updates on PF reset
ice: fix dpll and dpll_pin data access on PF reset
ice: fix dpll input pin phase_adjust value updates
ice: fix connection state of DPLL and out pin
====================
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240220214444.1039759-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzkaller triggered following kasan splat:
BUG: KASAN: use-after-free in __skb_flow_dissect+0x19d1/0x7a50 net/core/flow_dissector.c:1170
Read of size 1 at addr ffff88812fb4000e by task syz-executor183/5191
[..]
kasan_report+0xda/0x110 mm/kasan/report.c:588
__skb_flow_dissect+0x19d1/0x7a50 net/core/flow_dissector.c:1170
skb_flow_dissect_flow_keys include/linux/skbuff.h:1514 [inline]
___skb_get_hash net/core/flow_dissector.c:1791 [inline]
__skb_get_hash+0xc7/0x540 net/core/flow_dissector.c:1856
skb_get_hash include/linux/skbuff.h:1556 [inline]
ip_tunnel_xmit+0x1855/0x33c0 net/ipv4/ip_tunnel.c:748
ipip_tunnel_xmit+0x3cc/0x4e0 net/ipv4/ipip.c:308
__netdev_start_xmit include/linux/netdevice.h:4940 [inline]
netdev_start_xmit include/linux/netdevice.h:4954 [inline]
xmit_one net/core/dev.c:3548 [inline]
dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3564
__dev_queue_xmit+0x7c1/0x3d60 net/core/dev.c:4349
dev_queue_xmit include/linux/netdevice.h:3134 [inline]
neigh_connected_output+0x42c/0x5d0 net/core/neighbour.c:1592
...
ip_finish_output2+0x833/0x2550 net/ipv4/ip_output.c:235
ip_finish_output+0x31/0x310 net/ipv4/ip_output.c:323
..
iptunnel_xmit+0x5b4/0x9b0 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x1dbc/0x33c0 net/ipv4/ip_tunnel.c:831
ipgre_xmit+0x4a1/0x980 net/ipv4/ip_gre.c:665
__netdev_start_xmit include/linux/netdevice.h:4940 [inline]
netdev_start_xmit include/linux/netdevice.h:4954 [inline]
xmit_one net/core/dev.c:3548 [inline]
dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3564
...
The splat occurs because skb->data points past skb->head allocated area.
This is because neigh layer does:
__skb_pull(skb, skb_network_offset(skb));
... but skb_network_offset() returns a negative offset and __skb_pull()
arg is unsigned. IOW, we skb->data gets "adjusted" by a huge value.
The negative value is returned because skb->head and skb->data distance is
more than 64k and skb->network_header (u16) has wrapped around.
The bug is in the ip_tunnel infrastructure, which can cause
dev->needed_headroom to increment ad infinitum.
The syzkaller reproducer consists of packets getting routed via a gre
tunnel, and route of gre encapsulated packets pointing at another (ipip)
tunnel. The ipip encapsulation finds gre0 as next output device.
This results in the following pattern:
1). First packet is to be sent out via gre0.
Route lookup found an output device, ipip0.
2).
ip_tunnel_xmit for gre0 bumps gre0->needed_headroom based on the future
output device, rt.dev->needed_headroom (ipip0).
3).
ip output / start_xmit moves skb on to ipip0. which runs the same
code path again (xmit recursion).
4).
Routing step for the post-gre0-encap packet finds gre0 as output device
to use for ipip0 encapsulated packet.
tunl0->needed_headroom is then incremented based on the (already bumped)
gre0 device headroom.
This repeats for every future packet:
gre0->needed_headroom gets inflated because previous packets' ipip0 step
incremented rt->dev (gre0) headroom, and ipip0 incremented because gre0
needed_headroom was increased.
For each subsequent packet, gre/ipip0->needed_headroom grows until
post-expand-head reallocations result in a skb->head/data distance of
more than 64k.
Once that happens, skb->network_header (u16) wraps around when
pskb_expand_head tries to make sure that skb_network_offset() is unchanged
after the headroom expansion/reallocation.
After this skb_network_offset(skb) returns a different (and negative)
result post headroom expansion.
The next trip to neigh layer (or anything else that would __skb_pull the
network header) makes skb->data point to a memory location outside
skb->head area.
v2: Cap the needed_headroom update to an arbitarily chosen upperlimit to
prevent perpetual increase instead of dropping the headroom increment
completely.
Reported-and-tested-by: syzbot+bfde3bef047a81b8fde6@syzkaller.appspotmail.com
Closes: https://groups.google.com/g/syzkaller-bugs/c/fL9G6GtWskY/m/VKk_PR5FBAAJ
Fixes: 243aad830e ("ip_gre: include route header_len in max_headroom calculation")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240220135606.4939-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
BUG: KMSAN: uninit-value in nla_validate_range_unsigned lib/nlattr.c:222 [inline]
BUG: KMSAN: uninit-value in nla_validate_int_range lib/nlattr.c:336 [inline]
BUG: KMSAN: uninit-value in validate_nla lib/nlattr.c:575 [inline]
BUG: KMSAN: uninit-value in __nla_validate_parse+0x2e20/0x45c0 lib/nlattr.c:631
nla_validate_range_unsigned lib/nlattr.c:222 [inline]
nla_validate_int_range lib/nlattr.c:336 [inline]
validate_nla lib/nlattr.c:575 [inline]
...
The message in question matches this policy:
[NFTA_TARGET_REV] = NLA_POLICY_MAX(NLA_BE32, 255),
but because NLA_BE32 size in minlen array is 0, the validation
code will read past the malformed (too small) attribute.
Note: Other attributes, e.g. BITFIELD32, SINT, UINT.. are also missing:
those likely should be added too.
Reported-by: syzbot+3f497b07aa3baf2fb4d0@syzkaller.appspotmail.com
Reported-by: xingwei lee <xrivendell7@gmail.com>
Closes: https://lore.kernel.org/all/CABOYnLzFYHSnvTyS6zGa-udNX55+izqkOt2sB9WDqUcEGW6n8w@mail.gmail.com/raw
Fixes: ecaf75ffd5 ("netlink: introduce bigendian integer types")
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240221172740.5092-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit bfac19e239 ("fbdev: mx3fb: Remove the driver") backlight
is no longer functional.
The fbdev mx3fb driver used to automatically select
CONFIG_BACKLIGHT_CLASS_DEVICE.
Now that the mx3fb driver has been removed, enable the
CONFIG_BACKLIGHT_CLASS_DEVICE option so that backlight can still work
by default.
Tested on a imx6dl-sabresd board.
Cc: stable@vger.kernel.org
Fixes: bfac19e239 ("fbdev: mx3fb: Remove the driver")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Reviewed-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Tested-by: Francesco Dolcini <francesco.dolcini@toradex.com> # Toradex Colibri iMX7
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Extend set_memory_region_test's invalid flags subtest to verify that
GUEST_MEMFD is incompatible with READONLY. GUEST_MEMFD doesn't currently
support writes from userspace and KVM doesn't support emulated MMIO on
private accesses, and so KVM is supposed to reject the GUEST_MEMFD+READONLY
in order to avoid configuration that KVM can't support.
Link: https://lore.kernel.org/r/20240222190612.2942589-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Actually create a GUEST_MEMFD instance and pass it to KVM when doing
negative tests for KVM_SET_USER_MEMORY_REGION2 + KVM_MEM_GUEST_MEMFD.
Without a valid GUEST_MEMFD file descriptor, KVM_SET_USER_MEMORY_REGION2
will always fail with -EINVAL, resulting in false passes for any and all
tests of illegal combinations of KVM_MEM_GUEST_MEMFD and other flags.
Fixes: 5d74316466 ("KVM: selftests: Add a memory region subtest to validate invalid flags")
Link: https://lore.kernel.org/r/20240222190612.2942589-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Advertise and support software-protected VMs if and only if the TDP MMU is
enabled, i.e. disallow KVM_SW_PROTECTED_VM if TDP is enabled for KVM's
legacy/shadow MMU. TDP support for the shadow MMU is maintenance-only,
e.g. support for TDX and SNP will also be restricted to the TDP MMU.
Fixes: 89ea60c2c7 ("KVM: x86: Add support for "protected VMs" that can utilize private memory")
Link: https://lore.kernel.org/r/20240222190612.2942589-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Rewrite the help message for KVM_SW_PROTECTED_VM to make it clear that
software-protected VMs are a development and testing vehicle for
guest_memfd(), and that attempting to use KVM_SW_PROTECTED_VM for anything
remotely resembling a "real" VM will fail. E.g. any memory accesses from
KVM will incorrectly access shared memory, nested TDP is wildly broken,
and so on and so forth.
Update KVM's API documentation with similar warnings to discourage anyone
from attempting to run anything but selftests with KVM_X86_SW_PROTECTED_VM.
Fixes: 89ea60c2c7 ("KVM: x86: Add support for "protected VMs" that can utilize private memory")
Link: https://lore.kernel.org/r/20240222190612.2942589-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Disallow creating read-only memslots that support GUEST_MEMFD, as
GUEST_MEMFD is fundamentally incompatible with KVM's semantics for
read-only memslots. Read-only memslots allow the userspace VMM to emulate
option ROMs by filling the backing memory with readable, executable code
and data, while triggering emulated MMIO on writes. GUEST_MEMFD doesn't
currently support writes from userspace and KVM doesn't support emulated
MMIO on private accesses, i.e. the guest can only ever read zeros, and
writes will always be treated as errors.
Cc: Fuad Tabba <tabba@google.com>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Isaku Yamahata <isaku.yamahata@gmail.com>
Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
Cc: Chao Peng <chao.p.peng@linux.intel.com>
Fixes: a7800aa80e ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
Link: https://lore.kernel.org/r/20240222190612.2942589-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
This reports the currently used vram allocations.
userspace using this has been proposed for nvk, but
it's a rather trivial uapi addition.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This returns the BAR resources size so userspace can make
decisions based on rebar support.
userspace using this has been proposed for nvk, but
it's a rather trivial uapi addition.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The new riscv specific arch_hugetlb_migration_supported() must be
guarded with a #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION to avoid
the following build error:
In file included from include/linux/hugetlb.h:851,
from kernel/fork.c:52:
>> arch/riscv/include/asm/hugetlb.h:15:42: error: static declaration of 'arch_hugetlb_migration_supported' follows non-static declaration
15 | #define arch_hugetlb_migration_supported arch_hugetlb_migration_supported
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/hugetlb.h:916:20: note: in expansion of macro 'arch_hugetlb_migration_supported'
916 | static inline bool arch_hugetlb_migration_supported(struct hstate *h)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/riscv/include/asm/hugetlb.h:14:6: note: previous declaration of 'arch_hugetlb_migration_supported' with type 'bool(struct hstate *)' {aka '_Bool(struct hstate *)'}
14 | bool arch_hugetlb_migration_supported(struct hstate *h);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402110258.CV51JlEI-lkp@intel.com/
Fixes: ce68c03545 ("riscv: Fix arch_hugetlb_migration_supported() for NAPOT")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240211083640.756583-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
CALLER_ADDRx returns caller's address at specified level, they are used
for several tracers. These macros eventually use
__builtin_return_address(n) to get the caller's address if arch doesn't
define their own implementation.
In RISC-V, __builtin_return_address(n) only works when n == 0, we need
to walk the stack frame to get the caller's address at specified level.
data.level started from 'level + 3' due to the call flow of getting
caller's address in RISC-V implementation. If we don't have additional
three iteration, the level is corresponding to follows:
callsite -> return_address -> arch_stack_walk -> walk_stackframe
| | | |
level 3 level 2 level 1 level 0
Fixes: 10626c32e3 ("riscv/ftrace: Add basic support")
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Zong Li <zong.li@sifive.com>
Link: https://lore.kernel.org/r/20240202015102.26251-1-zong.li@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This single fix is also part of a larger cleanup, so I'm merging it
into my fixes branch so it can be shared with for-next.
* commit '8246601a7d391ce8207408149d65732f28af81a1':
riscv: tlb: fix __p*d_free_tlb()
Pull block fixes from Jens Axboe:
"Mostly just fixlets for md, but also a sed-opal parsing fix"
* tag 'block-6.8-2024-02-22' of git://git.kernel.dk/linux:
block: sed-opal: handle empty atoms when parsing response
md: Don't suspend the array for interrupted reshape
md: Don't register sync_thread for reshape directly
md: Make sure md_do_sync() will set MD_RECOVERY_DONE
md: Don't ignore read-only array in md_check_recovery()
md: Don't ignore suspended array in md_check_recovery()
md: Fix missing release of 'active_io' for flush
Pull iommufd fixes from Jason Gunthorpe:
- Fix dirty tracking bitmap collection when using reporting bitmaps
that are not neatly aligned to u64's or match the IO page table radix
tree layout.
- Add self tests to cover the cases that were found to be broken.
- Add missing enforcement of invalidation type in the uapi.
- Fix selftest config generation
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
selftests/iommu: fix the config fragment
iommufd: Reject non-zero data_type if no data_len is provided
iommufd/iova_bitmap: Consider page offset for the pages to be pinned
iommufd/selftest: Add mock IO hugepages tests
iommufd/selftest: Hugepage mock domain support
iommufd/selftest: Refactor mock_domain_read_and_clear_dirty()
iommufd/selftest: Refactor dirty bitmap tests
iommufd/iova_bitmap: Handle recording beyond the mapped pages
iommufd/selftest: Test u64 unaligned bitmaps
iommufd/iova_bitmap: Switch iova_bitmap::bitmap to an u8 array
iommufd/iova_bitmap: Bounds check mapped::pages access
Pull x86 platform driver fixes from Hans de Goede:
"Regression fixes:
- Fix INT0002 vGPIO events no longer working after 6.8 ACPI SCI
changes
- AMD-PMF: Fix laptops (e.g. Framework 13 AMD) hanging on suspend
- x86-android-tablets: Fix touchscreen no longer working on Lenovo
Yogabook
- x86-android-tablets: Fix serdev instantiation regression
- intel-vbtn: Fix ThinkPad X1 Tablet Gen2 no longer suspending
Bug fixes:
- think-lmi: Fix changing BIOS settings on Lenovo workstations
- touchscreen_dmi: Fix Hi8 Air touchscreen data sometimes missing
- AMD-PMF: Fix Smart PC support not working after suspend/resume
Other misc small fixes"
* tag 'platform-drivers-x86-v6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: thinkpad_acpi: Only update profile if successfully converted
platform/x86: intel-vbtn: Stop calling "VBDL" from notify_handler
platform/x86: x86-android-tablets: Fix acer_b1_750_goodix_gpios name
platform/x86: x86-android-tablets: Fix serdev instantiation no longer working
platform/x86: Add new get_serdev_controller() helper
platform/x86: x86-android-tablets: Fix keyboard touchscreen on Lenovo Yogabook1 X90
platform/x86/amd/pmf: Fix a potential race with policy binary sideload
platform/x86/amd/pmf: Fixup error handling for amd_pmf_init_smart_pc()
platform/x86/amd/pmf: Add debugging message for missing policy data
platform/x86/amd/pmf: Fix a suspend hang on Framework 13
platform/x86/amd/pmf: Fix TEE enact command failure after suspend and resume
platform/x86/amd/pmf: Remove smart_pc_status enum
platform/x86: touchscreen_dmi: Consolidate Goodix upside-down touchscreen data
platform/x86: touchscreen_dmi: Allow partial (prefix) matches for ACPI names
platform/x86: intel: int0002_vgpio: Pass IRQF_ONESHOT to request_irq()
platform/x86: think-lmi: Fix password opcode ordering for workstations
Pull clk fixes from Stephen Boyd:
"Here are some Samsung clk driver fixes I've been sitting on for far
too long.
They fix the bindings and clk driver for the Google GS101 SoC"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: samsung: clk-gs101: comply with the new dt cmu_misc clock names
dt-bindings: clock: gs101: rename cmu_misc clock-names
Pull vfs fixes from Christian Brauner:
- Fix a memory leak in cachefiles
- Restrict aio cancellations to I/O submitted through the aio
interfaces as this is otherwise causing issues for I/O submitted
via io_uring
- Increase buffer for afs volume status to avoid overflow
- Fix a missing zero-length check in unbuffered writes in the
netfs library. If generic_write_checks() returns zero make
netfs_unbuffered_write_iter() return right away
- Prevent a leak in i_dio_count caused by netfs_begin_read() operating
past i_size. It will return early and leave i_dio_count incremented
- Account for ipv4 addresses as well as ipv6 addresses when processing
incoming callbacks in afs
* tag 'vfs-6.8-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio
afs: Increase buffer size in afs_update_volume_status()
afs: Fix ignored callbacks over ipv4
cachefiles: fix memory leak in cachefiles_add_cache()
netfs: Fix missing zero-length check in unbuffered write
netfs: Fix i_dio_count leak on DIO read past i_size
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf and netfilter.
Current release - regressions:
- af_unix: fix another unix GC hangup
Previous releases - regressions:
- core: fix a possible AF_UNIX deadlock
- bpf: fix NULL pointer dereference in sk_psock_verdict_data_ready()
- netfilter: nft_flow_offload: release dst in case direct xmit path
is used
- bridge: switchdev: ensure MDB events are delivered exactly once
- l2tp: pass correct message length to ip6_append_data
- dccp/tcp: unhash sk from ehash for tb2 alloc failure after
check_estalblished()
- tls: fixes for record type handling with PEEK
- devlink: fix possible use-after-free and memory leaks in
devlink_init()
Previous releases - always broken:
- bpf: fix an oops when attempting to read the vsyscall page through
bpf_probe_read_kernel
- sched: act_mirred: use the backlog for mirred ingress
- netfilter: nft_flow_offload: fix dst refcount underflow
- ipv6: sr: fix possible use-after-free and null-ptr-deref
- mptcp: fix several data races
- phonet: take correct lock to peek at the RX queue
Misc:
- handful of fixes and reliability improvements for selftests"
* tag 'net-6.8.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
l2tp: pass correct message length to ip6_append_data
net: phy: realtek: Fix rtl8211f_config_init() for RTL8211F(D)(I)-VD-CG PHY
selftests: ioam: refactoring to align with the fix
Fix write to cloned skb in ipv6_hop_ioam()
phonet/pep: fix racy skb_queue_empty() use
phonet: take correct lock to peek at the RX queue
net: sparx5: Add spinlock for frame transmission from CPU
net/sched: flower: Add lock protection when remove filter handle
devlink: fix port dump cmd type
net: stmmac: Fix EST offset for dwmac 5.10
tools: ynl: don't leak mcast_groups on init error
tools: ynl: make sure we always pass yarg to mnl_cb_run
net: mctp: put sock on tag allocation failure
netfilter: nf_tables: use kzalloc for hook allocation
netfilter: nf_tables: register hooks last when adding new chain/flowtable
netfilter: nft_flow_offload: release dst in case direct xmit path is used
netfilter: nft_flow_offload: reset dst in route object after setting up flow
netfilter: nf_tables: set dormant flag on hook register failure
selftests: tls: add test for peeking past a record of a different type
selftests: tls: add test for merging of same-type control messages
...
On Tegra186, secure world applications may need to access host1x
during suspend/resume, and rely on the kernel to keep Host1x out
of reset during the suspend cycle. As such, as a quirk,
skip asserting Host1x's reset on Tegra186.
We don't need to keep the clocks enabled, as BPMP ensures the clock
stays on while Host1x is being used. On newer SoC's, the reset line
is inaccessible, so there is no need for the quirk.
Fixes: b7c00cdf6d ("gpu: host1x: Enable system suspend callbacks")
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240222010517.1573931-1-cyndis@kapsi.fi
[Why]
Currently there is an error while translating input clock sates into
output clock states. The highest fclk setting from output sates is
being dropped because of this error.
[How]
For dcn35 and dcn351, make output_states equal to input states.
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Swapnil Patel <swapnil.patel@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes potential null pointer dereference warnings in the
dc_dmub_srv_cmd_list_queue_execute() and dc_dmub_srv_is_hw_pwr_up()
functions.
In both functions, the 'dc_dmub_srv' variable was being dereferenced
before it was checked for null. This could lead to a null pointer
dereference if 'dc_dmub_srv' is null. The fix is to check if
'dc_dmub_srv' is null before dereferencing it.
Thus moving the null checks for 'dc_dmub_srv' to the beginning of the
functions to ensure that 'dc_dmub_srv' is not null when it is
dereferenced.
Found by smatch & thus fixing the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/dc_dmub_srv.c:133 dc_dmub_srv_cmd_list_queue_execute() warn: variable dereferenced before check 'dc_dmub_srv' (see line 128)
drivers/gpu/drm/amd/amdgpu/../display/dc/dc_dmub_srv.c:1167 dc_dmub_srv_is_hw_pwr_up() warn: variable dereferenced before check 'dc_dmub_srv' (see line 1164)
Fixes: 028bac5834 ("drm/amd/display: decouple dmcub execution to reduce lock granularity")
Fixes: 65138eb72e ("drm/amd/display: Add DCN35 DMUB")
Cc: JinZe.Xu <jinze.xu@amd.com>
Cc: Hersen Wu <hersenxs.wu@amd.com>
Cc: Josip Pavic <josip.pavic@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Qingqing Zhuo <Qingqing.Zhuo@amd.com>
Cc: Harry Wentland <Harry.Wentland@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull tracing fix from Steven Rostedt:
- While working on the ring buffer I noticed that the counter used for
knowing where the end of the data is on a sub-buffer was not a full
"int" but just 20 bits. It was masked out to 0xfffff.
With the new code that allows the user to change the size of the
sub-buffer, it is theoretically possible to ask for a size bigger
than 2^20. If that happens, unexpected results may occur as there's
no code checking if the counter overflowed the 20 bits of the write
mask. There are other checks to make sure events fit in the
sub-buffer, but if the sub-buffer itself is too big, that is not
checked.
Add a check in the resize of the sub-buffer to make sure that it
never goes beyond the size of the counter that holds how much data is
on it.
* tag 'trace-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ring-buffer: Do not let subbuf be bigger than write mask
[Why]
Observe error message "Can't retrieve aconnector in hpd_rx_irq_offload_work"
when boot up with a mst tbt4 dock connected. After analyzing, there are few
parts needed to be adjusted:
1. hpd_rx_offload_wq[].aconnector is not initialzed before the dmub outbox
hpd_irq handler get registered which causes the error message.
2. registeration of hpd and hpd_rx_irq event for usb4 dp tunneling is not
aligned with legacy interface sequence
[How]
Put DMUB_NOTIFICATION_HPD and DMUB_NOTIFICATION_HPD_IRQ handler
registration into register_hpd_handlers() to align other interfaces and
get hpd_rx_offload_wq[].aconnector initialized earlier than that.
Leave DMUB_NOTIFICATION_AUX_REPLY registered as it was since we need that
while calling dc_link_detect(). USB4 connection status will be proactively
detected by dc_link_detect_connection_type() in amdgpu_dm_initialize_drm_device()
Cc: Stable <stable@vger.kernel.org>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Acked-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The MGBE power-domains on Tegra234 are mapped to the MGBE controllers as
follows:
MGBE0 (0x68000000) --> Power-Domain MGBEB
MGBE1 (0x69000000) --> Power-Domain MGBEC
MGBE2 (0x6a000000) --> Power-Domain MGBED
Update the device-tree nodes for Tegra234 to correct this.
Fixes: 610cdf3186 ("arm64: tegra: Add MGBE nodes on Tegra234")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The s390 common I/O layer (CIO) returns an unexpected -EBUSY return code
when drivers try to start I/O while a path-verification (PV) process is
pending. This can lead to failed device initialization attempts with
symptoms like broken network connectivity after boot.
Fix this by replacing the -EBUSY return code with a deferred condition
code 1 reply to make path-verification handling consistent from a
driver's point of view.
The problem can be reproduced semi-regularly using the following process,
while repeating steps 2-3 as necessary (example assumes an OSA device
with bus-IDs 0.0.a000-0.0.a002 on CHPID 0.02):
1. echo 0.0.a000,0.0.a001,0.0.a002 >/sys/bus/ccwgroup/drivers/qeth/group
2. echo 0 > /sys/bus/ccwgroup/devices/0.0.a000/online
3. echo 1 > /sys/bus/ccwgroup/devices/0.0.a000/online ; \
echo on > /sys/devices/css0/chp0.02/status
Background information:
The common I/O layer starts path-verification I/Os when it receives
indications about changes in a device path's availability. This occurs
for example when hardware events indicate a change in channel-path
status, or when a manual operation such as a CHPID vary or configure
operation is performed.
If a driver attempts to start I/O while a PV is running, CIO reports a
successful I/O start (ccw_device_start() return code 0). Then, after
completion of PV, CIO synthesizes an interrupt response that indicates
an asynchronous status condition that prevented the start of the I/O
(deferred condition code 1).
If a PV indication arrives while a device is busy with driver-owned I/O,
PV is delayed until after I/O completion was reported to the driver's
interrupt handler. To ensure that PV can be started eventually, CIO
reports a device busy condition (ccw_device_start() return code -EBUSY)
if a driver tries to start another I/O while PV is pending.
In some cases this -EBUSY return code causes device drivers to consider
a device not operational, resulting in failed device initialization.
Note: The code that introduced the problem was added in 2003. Symptoms
started appearing with the following CIO commit that causes a PV
indication when a device is removed from the cio_ignore list after the
associated parent subchannel device was probed, but before online
processing of the CCW device has started:
2297791c92 ("s390/cio: dont unregister subchannel from child-drivers")
During boot, the cio_ignore list is modified by the cio_ignore dracut
module [1] as well as Linux vendor-specific systemd service scripts[2].
When combined, this commit and boot scripts cause a frequent occurrence
of the problem during boot.
[1] https://github.com/dracutdevs/dracut/tree/master/modules.d/81cio_ignore
[2] https://github.com/SUSE/s390-tools/blob/master/cio_ignore.service
Cc: stable@vger.kernel.org # v5.15+
Fixes: 2297791c92 ("s390/cio: dont unregister subchannel from child-drivers")
Tested-By: Thorsten Winkler <twinkler@linux.ibm.com>
Reviewed-by: Thorsten Winkler <twinkler@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The config fragment doesn't follow the correct format to enable those
config options which make the config options getting missed while
merging with other configs.
➜ merge_config.sh -m .config tools/testing/selftests/iommu/config
Using .config as base
Merging tools/testing/selftests/iommu/config
➜ make olddefconfig
.config:5295:warning: unexpected data: CONFIG_IOMMUFD
.config:5296:warning: unexpected data: CONFIG_IOMMUFD_TEST
While at it, add CONFIG_FAULT_INJECTION as well which is needed for
CONFIG_IOMMUFD_TEST. If CONFIG_FAULT_INJECTION isn't present in base
config (such as x86 defconfig), CONFIG_IOMMUFD_TEST doesn't get enabled.
Fixes: 57f0988706 ("iommufd: Add a selftest")
Link: https://lore.kernel.org/r/20240222074934.71380-1-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
During syncobj_eventfd_entry_func, dma_fence_chain_find_seqno may set
the fence to NULL if the given seqno is signaled and a later seqno has
already been submitted. In that case, the eventfd should be signaled
immediately which currently does not happen.
This is a similar issue to the one addressed by commit b19926d4f3
("drm/syncobj: Deal with signalled fences in drm_syncobj_find_fence.").
As a fix, if the return value of dma_fence_chain_find_seqno indicates
success but it sets the fence to NULL, we will assign a stub fence to
ensure the following code still signals the eventfd.
v1 -> v2: assign a stub fence instead of signaling the eventfd
Signed-off-by: Erik Kurzinger <ekurzinger@nvidia.com>
Fixes: c7a4722971 ("drm/syncobj: add IOCTL to register an eventfd")
Signed-off-by: Simon Ser <contact@emersion.fr>
Link: https://patchwork.freedesktop.org/patch/msgid/20240221184527.37667-1-ekurzinger@nvidia.com
If the SMMU is configured to use a two level CD table then
arm_smmu_write_ctx_desc() allocates a CD table leaf internally using
GFP_KERNEL. Due to recent changes this is being done under a spinlock to
iterate over the device list - thus it will trigger a sleeping while
atomic warning:
arm_smmu_sva_set_dev_pasid()
mutex_lock(&sva_lock);
__arm_smmu_sva_bind()
arm_smmu_mmu_notifier_get()
spin_lock_irqsave()
arm_smmu_write_ctx_desc()
arm_smmu_get_cd_ptr()
arm_smmu_alloc_cd_leaf_table()
dmam_alloc_coherent(GFP_KERNEL)
This is a 64K high order allocation and really should not be done
atomically.
At the moment the rework of the SVA to follow the new API is half
finished. Recently the CD table memory was moved from the domain to the
master, however we have the confusing situation where the SVA code is
wrongly using the RID domains device's list to track which CD tables the
SVA is installed in.
Remove the logic to replicate the CD across all the domain's masters
during attach. We know which master and which CD table the PASID should be
installed in.
Right now SVA only works when dma-iommu.c is in control of the RID
translation, which means we have a single iommu_domain shared across the
entire group and that iommu_domain is not shared outside the group.
Critically this means that the iommu_group->devices list and RID's
smmu_domain->devices list describe the same set of masters.
For PCI cases the core code also insists on singleton groups so there is
only one entry in the smmu_domain->devices list that is equal to the
master being passed in to arm_smmu_sva_set_dev_pasid().
Only non-PCI cases may have multi-device groups. However, the core code
will repeat the calls to arm_smmu_sva_set_dev_pasid() across the entire
iommu_group->devices list.
Instead of having arm_smmu_mmu_notifier_get() indirectly loop over all the
devices in the group via the RID's smmu_domain, rely on
__arm_smmu_sva_bind() to be called for each device in the group and
install the repeated CD entry that way.
This avoids taking the spinlock to access the devices list and permits the
arm_smmu_write_ctx_desc() to use a sleeping allocation. Leave the
arm_smmu_mm_release() as a confusing situation, this requires tracking
attached masters inside the SVA domain.
Removing the loop allows arm_smmu_write_ctx_desc() to be called outside
the spinlock and thus is safe to use GFP_KERNEL.
Move the clearing of the CD into arm_smmu_sva_remove_dev_pasid() so that
arm_smmu_mmu_notifier_get/put() remain paired functions.
Fixes: 24503148c5 ("iommu/arm-smmu-v3: Refactor write_ctx_desc")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/4e25d161-0cf8-4050-9aa3-dfa21cd63e56@moroto.mountain/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Michael Shavit <mshavit@google.com>
Link: https://lore.kernel.org/r/0-v3-11978fc67151+112-smmu_cd_atomic_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
Each SPI controller is expected to call the spi_controller_suspend() and
spi_controller_resume() callbacks at system-wide suspend and resume.
It (1) handles the kthread worker for queued controllers and (2) marks
the controller as suspended to have spi_sync() fail while the
controller is unavailable.
Those two operations do not require the controller to be active, we do
not need to increment the runtime PM usage counter.
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Link: https://msgid.link/r/20240222-cdns-qspi-pm-fix-v4-4-6b6af8bcbf59@bootlin.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The ->runtime_suspend() and ->runtime_resume() callbacks are not
expected to call spi_controller_suspend() and spi_controller_resume().
Remove calls to those in the cadence-qspi driver.
Those helpers have two roles currently:
- They stop/start the queue, including dealing with the kworker.
- They toggle the SPI controller SPI_CONTROLLER_SUSPENDED flag. It
requires acquiring ctlr->bus_lock_mutex.
Step one is irrelevant because cadence-qspi is not queued. Step two
however has two implications:
- A deadlock occurs, because ->runtime_resume() is called in a context
where the lock is already taken (in the ->exec_op() callback, where
the usage count is incremented).
- It would disallow all operations once the device is auto-suspended.
Here is a brief call tree highlighting the mutex deadlock:
spi_mem_exec_op()
...
spi_mem_access_start()
mutex_lock(&ctlr->bus_lock_mutex)
cqspi_exec_mem_op()
pm_runtime_resume_and_get()
cqspi_resume()
spi_controller_resume()
mutex_lock(&ctlr->bus_lock_mutex)
...
spi_mem_access_end()
mutex_unlock(&ctlr->bus_lock_mutex)
...
Fixes: 0578a6dbfe ("spi: spi-cadence-quadspi: add runtime pm support")
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Link: https://msgid.link/r/20240222-cdns-qspi-pm-fix-v4-2-6b6af8bcbf59@bootlin.com
Signed-off-by: Mark Brown <broonie@kernel.org>
dev_get_drvdata() gets used to acquire the pointer to cqspi and the SPI
controller. Neither embed the other; this lead to memory corruption.
On a given platform (Mobileye EyeQ5) the memory corruption is hidden
inside cqspi->f_pdata. Also, this uninitialised memory is used as a
mutex (ctlr->bus_lock_mutex) by spi_controller_suspend().
Fixes: 2087e85bb6 ("spi: cadence-quadspi: fix suspend-resume implementations")
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Link: https://msgid.link/r/20240222-cdns-qspi-pm-fix-v4-1-6b6af8bcbf59@bootlin.com
Signed-off-by: Mark Brown <broonie@kernel.org>
When waiting for a syncobj timeline point whose fence has not yet been
submitted with the WAIT_FOR_SUBMIT flag, a callback is registered using
drm_syncobj_fence_add_wait and the thread is put to sleep until the
timeout expires. If the fence is submitted before then,
drm_syncobj_add_point will wake up the sleeping thread immediately which
will proceed to wait for the fence to be signaled.
However, if the WAIT_AVAILABLE flag is used instead,
drm_syncobj_fence_add_wait won't get called, meaning the waiting thread
will always sleep for the full timeout duration, even if the fence gets
submitted earlier. If it turns out that the fence *has* been submitted
by the time it eventually wakes up, it will still indicate to userspace
that the wait completed successfully (it won't return -ETIME), but it
will have taken much longer than it should have.
To fix this, we must call drm_syncobj_fence_add_wait if *either* the
WAIT_FOR_SUBMIT flag or the WAIT_AVAILABLE flag is set. The only
difference being that with WAIT_FOR_SUBMIT we will also wait for the
fence to be signaled after it has been submitted while with
WAIT_AVAILABLE we will return immediately.
IGT test patch: https://lists.freedesktop.org/archives/igt-dev/2024-January/067537.html
v1 -> v2: adjust lockdep_assert_none_held_once condition
(cherry picked from commit 8c44ea8163)
Fixes: 01d6c35783 ("drm/syncobj: add support for timeline point wait v8")
Signed-off-by: Erik Kurzinger <ekurzinger@nvidia.com>
Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Simon Ser <contact@emersion.fr>
Link: https://patchwork.freedesktop.org/patch/msgid/20240119163208.3723457-1-ekurzinger@nvidia.com
At btrfs_use_block_rsv() we read the size of a block reserve without
locking its spinlock, which makes KCSAN complain because the size of a
block reserve is always updated while holding its spinlock. The report
from KCSAN is the following:
[653.313148] BUG: KCSAN: data-race in btrfs_update_delayed_refs_rsv [btrfs] / btrfs_use_block_rsv [btrfs]
[653.314755] read to 0x000000017f5871b8 of 8 bytes by task 7519 on cpu 0:
[653.314779] btrfs_use_block_rsv+0xe4/0x2f8 [btrfs]
[653.315606] btrfs_alloc_tree_block+0xdc/0x998 [btrfs]
[653.316421] btrfs_force_cow_block+0x220/0xe38 [btrfs]
[653.317242] btrfs_cow_block+0x1ac/0x568 [btrfs]
[653.318060] btrfs_search_slot+0xda2/0x19b8 [btrfs]
[653.318879] btrfs_del_csums+0x1dc/0x798 [btrfs]
[653.319702] __btrfs_free_extent.isra.0+0xc24/0x2028 [btrfs]
[653.320538] __btrfs_run_delayed_refs+0xd3c/0x2390 [btrfs]
[653.321340] btrfs_run_delayed_refs+0xae/0x290 [btrfs]
[653.322140] flush_space+0x5e4/0x718 [btrfs]
[653.322958] btrfs_preempt_reclaim_metadata_space+0x102/0x2f8 [btrfs]
[653.323781] process_one_work+0x3b6/0x838
[653.323800] worker_thread+0x75e/0xb10
[653.323817] kthread+0x21a/0x230
[653.323836] __ret_from_fork+0x6c/0xb8
[653.323855] ret_from_fork+0xa/0x30
[653.323887] write to 0x000000017f5871b8 of 8 bytes by task 576 on cpu 3:
[653.323906] btrfs_update_delayed_refs_rsv+0x1a4/0x250 [btrfs]
[653.324699] btrfs_add_delayed_data_ref+0x468/0x6d8 [btrfs]
[653.325494] btrfs_free_extent+0x76/0x120 [btrfs]
[653.326280] __btrfs_mod_ref+0x6a8/0x6b8 [btrfs]
[653.327064] btrfs_dec_ref+0x50/0x70 [btrfs]
[653.327849] walk_up_proc+0x236/0xa50 [btrfs]
[653.328633] walk_up_tree+0x21c/0x448 [btrfs]
[653.329418] btrfs_drop_snapshot+0x802/0x1328 [btrfs]
[653.330205] btrfs_clean_one_deleted_snapshot+0x184/0x238 [btrfs]
[653.330995] cleaner_kthread+0x2b0/0x2f0 [btrfs]
[653.331781] kthread+0x21a/0x230
[653.331800] __ret_from_fork+0x6c/0xb8
[653.331818] ret_from_fork+0xa/0x30
So add a helper to get the size of a block reserve while holding the lock.
Reading the field while holding the lock instead of using the data_race()
annotation is used in order to prevent load tearing.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
At space_info.c we have several places where we access the ->reserved
field of a block reserve without taking the block reserve's spinlock
first, which makes KCSAN warn about a data race since that field is
always updated while holding the spinlock.
The reports from KCSAN are like the following:
[117.193526] BUG: KCSAN: data-race in btrfs_block_rsv_release [btrfs] / need_preemptive_reclaim [btrfs]
[117.195148] read to 0x000000017f587190 of 8 bytes by task 6303 on cpu 3:
[117.195172] need_preemptive_reclaim+0x222/0x2f0 [btrfs]
[117.195992] __reserve_bytes+0xbb0/0xdc8 [btrfs]
[117.196807] btrfs_reserve_metadata_bytes+0x4c/0x120 [btrfs]
[117.197620] btrfs_block_rsv_add+0x78/0xa8 [btrfs]
[117.198434] btrfs_delayed_update_inode+0x154/0x368 [btrfs]
[117.199300] btrfs_update_inode+0x108/0x1c8 [btrfs]
[117.200122] btrfs_dirty_inode+0xb4/0x140 [btrfs]
[117.200937] btrfs_update_time+0x8c/0xb0 [btrfs]
[117.201754] touch_atime+0x16c/0x1e0
[117.201789] filemap_read+0x674/0x728
[117.201823] btrfs_file_read_iter+0xf8/0x410 [btrfs]
[117.202653] vfs_read+0x2b6/0x498
[117.203454] ksys_read+0xa2/0x150
[117.203473] __s390x_sys_read+0x68/0x88
[117.203495] do_syscall+0x1c6/0x210
[117.203517] __do_syscall+0xc8/0xf0
[117.203539] system_call+0x70/0x98
[117.203579] write to 0x000000017f587190 of 8 bytes by task 11 on cpu 0:
[117.203604] btrfs_block_rsv_release+0x2e8/0x578 [btrfs]
[117.204432] btrfs_delayed_inode_release_metadata+0x7c/0x1d0 [btrfs]
[117.205259] __btrfs_update_delayed_inode+0x37c/0x5e0 [btrfs]
[117.206093] btrfs_async_run_delayed_root+0x356/0x498 [btrfs]
[117.206917] btrfs_work_helper+0x160/0x7a0 [btrfs]
[117.207738] process_one_work+0x3b6/0x838
[117.207768] worker_thread+0x75e/0xb10
[117.207797] kthread+0x21a/0x230
[117.207830] __ret_from_fork+0x6c/0xb8
[117.207861] ret_from_fork+0xa/0x30
So add a helper to get the reserved amount of a block reserve while
holding the lock. The value may be not be up to date anymore when used by
need_preemptive_reclaim() and btrfs_preempt_reclaim_metadata_space(), but
that's ok since the worst it can do is cause more reclaim work do be done
sooner rather than later. Reading the field while holding the lock instead
of using the data_race() annotation is used in order to prevent load
tearing.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we have a sparse file with a trailing hole (from the last extent's end
to i_size) and then create an extent in the file that ends before the
file's i_size, then when doing an incremental send we will issue a write
full of zeroes for the range that starts immediately after the new extent
ends up to i_size. While this isn't incorrect because the file ends up
with exactly the same data, it unnecessarily results in using extra space
at the destination with one or more extents full of zeroes instead of
having a hole. In same cases this results in using megabytes or even
gigabytes of unnecessary space.
Example, reproducer:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdh
MNT=/mnt/sdh
mkfs.btrfs -f $DEV
mount $DEV $MNT
# Create 1G sparse file.
xfs_io -f -c "truncate 1G" $MNT/foobar
# Create base snapshot.
btrfs subvolume snapshot -r $MNT $MNT/mysnap1
# Create send stream (full send) for the base snapshot.
btrfs send -f /tmp/1.snap $MNT/mysnap1
# Now write one extent at the beginning of the file and one somewhere
# in the middle, leaving a gap between the end of this second extent
# and the file's size.
xfs_io -c "pwrite -S 0xab 0 128K" \
-c "pwrite -S 0xcd 512M 128K" \
$MNT/foobar
# Now create a second snapshot which is going to be used for an
# incremental send operation.
btrfs subvolume snapshot -r $MNT $MNT/mysnap2
# Create send stream (incremental send) for the second snapshot.
btrfs send -p $MNT/mysnap1 -f /tmp/2.snap $MNT/mysnap2
# Now recreate the filesystem by receiving both send streams and
# verify we get the same content that the original filesystem had
# and file foobar has only two extents with a size of 128K each.
umount $MNT
mkfs.btrfs -f $DEV
mount $DEV $MNT
btrfs receive -f /tmp/1.snap $MNT
btrfs receive -f /tmp/2.snap $MNT
echo -e "\nFile fiemap in the second snapshot:"
# Should have:
#
# 128K extent at file range [0, 128K[
# hole at file range [128K, 512M[
# 128K extent file range [512M, 512M + 128K[
# hole at file range [512M + 128K, 1G[
xfs_io -r -c "fiemap -v" $MNT/mysnap2/foobar
# File should be using 256K of data (two 128K extents).
echo -e "\nSpace used by the file: $(du -h $MNT/mysnap2/foobar | cut -f 1)"
umount $MNT
Running the test, we can see with fiemap that we get an extent for the
range [512M, 1G[, while in the source filesystem we have an extent for
the range [512M, 512M + 128K[ and a hole for the rest of the file (the
range [512M + 128K, 1G[):
$ ./test.sh
(...)
File fiemap in the second snapshot:
/mnt/sdh/mysnap2/foobar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..255]: 26624..26879 256 0x0
1: [256..1048575]: hole 1048320
2: [1048576..2097151]: 2156544..3205119 1048576 0x1
Space used by the file: 513M
This happens because once we finish processing an inode, at
finish_inode_if_needed(), we always issue a hole (write operations full
of zeros) if there's a gap between the end of the last processed extent
and the file's size, even if that range is already a hole in the parent
snapshot. Fix this by issuing the hole only if the range is not already
a hole.
After this change, running the test above, we get the expected layout:
$ ./test.sh
(...)
File fiemap in the second snapshot:
/mnt/sdh/mysnap2/foobar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..255]: 26624..26879 256 0x0
1: [256..1048575]: hole 1048320
2: [1048576..1048831]: 26880..27135 256 0x1
3: [1048832..2097151]: hole 1048320
Space used by the file: 256K
A test case for fstests will follow soon.
CC: stable@vger.kernel.org # 6.1+
Reported-by: Dorai Ashok S A <dash.btrfs@inix.me>
Link: https://lore.kernel.org/linux-btrfs/c0bf7818-9c45-46a8-b3d3-513230d0c86e@inix.me/
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
On a zoned filesystem with conventional zones, we're skipping the block
group profile checks for the conventional zones.
This allows converting a zoned filesystem's data block groups to RAID when
all of the zones backing the chunk are on conventional zones. But this
will lead to problems, once we're trying to allocate chunks backed by
sequential zones.
So also check for conventional zones when loading a block group's profile
on them.
Reported-by: HAN Yuwei <hrx@bupt.moe>
Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aae84cc@bupt.moe/#t
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
l2tp_ip6_sendmsg needs to avoid accounting for the transport header
twice when splicing more data into an already partially-occupied skbuff.
To manage this, we check whether the skbuff contains data using
skb_queue_empty when deciding how much data to append using
ip6_append_data.
However, the code which performed the calculation was incorrect:
ulen = len + skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0;
...due to C operator precedence, this ends up setting ulen to
transhdrlen for messages with a non-zero length, which results in
corrupted packets on the wire.
Add parentheses to correct the calculation in line with the original
intent.
Fixes: 9d4c75800f ("ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()")
Cc: David Howells <dhowells@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240220122156.43131-1-tparkin@katalix.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commit eb9299bead ("ACPI: EC: Use a spin lock without disabing
interrupts") introduced an unexpected user-visible change in
behavior, which is a significant CPU load increase when the EC
is in use.
This most likely happens due to increased spinlock contention
and so reducing this effect would require a major rework of the
EC driver locking. There is no time for this in the current
cycle, so revert commit eb9299bead.
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218511
Reported-by: Dieter Mummenschanz <dmummenschanz@web.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) If user requests to wake up a table and hook fails, restore the
dormant flag from the error path, from Florian Westphal.
2) Reset dst after transferring it to the flow object, otherwise dst
gets released twice from the error path.
3) Release dst in case the flowtable selects a direct xmit path, eg.
transmission to bridge port. Otherwise, dst is memleaked.
4) Register basechain and flowtable hooks at the end of the command.
Error path releases these datastructure without waiting for the
rcu grace period.
5) Use kzalloc() to initialize struct nft_hook to fix a KMSAN report
on access to hook type, also from Florian Westphal.
netfilter pull request 24-02-22
* tag 'nf-24-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: use kzalloc for hook allocation
netfilter: nf_tables: register hooks last when adding new chain/flowtable
netfilter: nft_flow_offload: release dst in case direct xmit path is used
netfilter: nft_flow_offload: reset dst in route object after setting up flow
netfilter: nf_tables: set dormant flag on hook register failure
====================
Link: https://lore.kernel.org/r/20240222000843.146665-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Borkmann says:
====================
pull-request: bpf 2024-02-22
The following pull-request contains BPF updates for your *net* tree.
We've added 11 non-merge commits during the last 24 day(s) which contain
a total of 15 files changed, 217 insertions(+), 17 deletions(-).
The main changes are:
1) Fix a syzkaller-triggered oops when attempting to read the vsyscall
page through bpf_probe_read_kernel and friends, from Hou Tao.
2) Fix a kernel panic due to uninitialized iter position pointer in
bpf_iter_task, from Yafang Shao.
3) Fix a race between bpf_timer_cancel_and_free and bpf_timer_cancel,
from Martin KaFai Lau.
4) Fix a xsk warning in skb_add_rx_frag() (under CONFIG_DEBUG_NET)
due to incorrect truesize accounting, from Sebastian Andrzej Siewior.
5) Fix a NULL pointer dereference in sk_psock_verdict_data_ready,
from Shigeru Yoshida.
6) Fix a resolve_btfids warning when bpf_cpumask symbol cannot be
resolved, from Hari Bathini.
bpf-for-netdev
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, sockmap: Fix NULL pointer dereference in sk_psock_verdict_data_ready()
selftests/bpf: Add negtive test cases for task iter
bpf: Fix an issue due to uninitialized bpf_iter_task
selftests/bpf: Test racing between bpf_timer_cancel_and_free and bpf_timer_cancel
bpf: Fix racing between bpf_timer_cancel_and_free and bpf_timer_cancel
selftest/bpf: Test the read of vsyscall page under x86-64
x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()
x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h
bpf, scripts: Correct GPL license name
xsk: Add truesize to skb_add_rx_frag().
bpf: Fix warning for bpf_cpumask in verifier
====================
Link: https://lore.kernel.org/r/20240221231826.1404-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commit bb726b753f ("net: phy: realtek: add support for
RTL8211F(D)(I)-VD-CG") extended support of the driver from the existing
support for RTL8211F(D)(I)-CG PHY to the newer RTL8211F(D)(I)-VD-CG PHY.
While that commit indicated that the RTL8211F_PHYCR2 register is not
supported by the "VD-CG" PHY model and therefore updated the corresponding
section in rtl8211f_config_init() to be invoked conditionally, the call to
"genphy_soft_reset()" was left as-is, when it should have also been invoked
conditionally. This is because the call to "genphy_soft_reset()" was first
introduced by the commit 0a4355c2b7 ("net: phy: realtek: add dt property
to disable CLKOUT clock") since the RTL8211F guide indicates that a PHY
reset should be issued after setting bits in the PHYCR2 register.
As the PHYCR2 register is not applicable to the "VD-CG" PHY model, fix the
rtl8211f_config_init() function by invoking "genphy_soft_reset()"
conditionally based on the presence of the "PHYCR2" register.
Fixes: bb726b753f ("net: phy: realtek: add support for RTL8211F(D)(I)-VD-CG")
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240220070007.968762-1-s-vadapalli@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
ioam6_parser uses a packet socket. After the fix to prevent writing to
cloned skb's, the receiver does not see its IOAM data anymore, which
makes input/forward ioam-selftests to fail. As a workaround,
ioam6_parser now uses an IPv6 raw socket and leverages ancillary data to
get hop-by-hop options. As a consequence, the hook is "after" the IOAM
data insertion by the receiver and all tests are working again.
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
ioam6_fill_trace_data() writes inside the skb payload without ensuring
it's writeable (e.g., not cloned). This function is called both from the
input and output path. The output path (ioam6_iptunnel) already does the
check. This commit provides a fix for the input path, inside
ipv6_hop_ioam(). It also updates ip6_parse_tlv() to refresh the network
header pointer ("nh") when returning from ipv6_hop_ioam().
Fixes: 9ee11f0fff ("ipv6: ioam: Data plane support for Pre-allocated Trace")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The receive queues are protected by their respective spin-lock, not
the socket lock. This could lead to skb_peek() unexpectedly
returning NULL or a pointer to an already dequeued socket buffer.
Fixes: 9641458d3e ("Phonet: Pipe End Point for Phonet Pipes protocol")
Signed-off-by: Rémi Denis-Courmont <courmisch@gmail.com>
Link: https://lore.kernel.org/r/20240218081214.4806-2-remi@remlab.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In erofs_find_target_block() when erofs_dirnamecmp() returns 0,
we do not assign the target metabuf. This causes the caller
erofs_namei()'s erofs_put_metabuf() at the end to be not effective
leaving the refcount on the page.
As the page from metabuf (buf->page) is never put, such page cannot be
migrated or reclaimed. Fix it now by putting the metabuf from
previous loop and assigning the current metabuf to target before
returning so caller erofs_namei() can do the final put as it was
intended.
Fixes: 500edd0956 ("erofs: use meta buffers for inode lookup")
Cc: <stable@vger.kernel.org> # 5.18+
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240221210348.3667795-1-dhavale@google.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Both registers used when doing manual injection or fdma injection are
shared between all the net devices of the switch. It was noticed that
when having two process which each of them trying to inject frames on
different ethernet ports, that the HW started to behave strange, by
sending out more frames then expected. When doing fdma injection it is
required to set the frame in the DCB and then make sure that the next
pointer of the last DCB is invalid. But because there is no locks for
this, then easily this pointer between the DCB can be broken and then it
would create a loop of DCBs. And that means that the HW will
continuously transmit these frames in a loop. Until the SW will break
this loop.
Therefore to fix this issue, add a spin lock for when accessing the
registers for manual or fdma injection.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Fixes: f3cad2611a ("net: sparx5: add hostmode with phylink support")
Link: https://lore.kernel.org/r/20240219080043.1561014-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Unlike other commands, due to a c&p error, port dump fills-up cmd with
wrong value, different from port-get request cmd, port-get doit reply
and port notification.
Fix it by filling cmd with value DEVLINK_CMD_PORT_NEW.
Skimmed through devlink userspace implementations, none of them cares
about this cmd value. Only ynl, for which, this is actually a fix, as it
expects doit and dumpit ops rsp_value to be the same.
Omit the fixes tag, even thought this is fix, better to target this for
next release.
Fixes: bfcd3a4661 ("Introduce devlink infrastructure")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240220075245.75416-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix EST offset for dwmac 5.10.
Currently configuring Qbv doesn't work as expected. The schedule is
configured, but never confirmed:
|[ 128.250219] imx-dwmac 428a0000.ethernet eth1: configured EST
The reason seems to be the refactoring of the EST code which set the wrong
EST offset for the dwmac 5.10. After fixing this it works as before:
|[ 106.359577] imx-dwmac 428a0000.ethernet eth1: configured EST
|[ 128.430715] imx-dwmac 428a0000.ethernet eth1: EST: SWOL has been switched
Tested on imx93.
Fixes: c3f3b97238 ("net: stmmac: Refactor EST implementation")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://lore.kernel.org/r/20240220-stmmac_est-v1-1-c41f9ae2e7b7@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is one common error handler in ynl - ynl_cb_error().
It expects priv to be a pointer to struct ynl_parse_arg AKA yarg.
To avoid potential crashes if we encounter a stray NLMSG_ERROR
always pass yarg as priv (or a struct which has it as the first
member).
ynl_cb_null() has a similar problem directly - it expects yarg
but priv passed by the caller is ys.
Found by code inspection.
Fixes: 86878f14d7 ("tools: ynl: user space helpers")
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240220161112.2735195-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
KMSAN reports unitialized variable when registering the hook,
reg->hook_ops_type == NF_HOOK_OP_BPF)
~~~~~~~~~~~ undefined
This is a small structure, just use kzalloc to make sure this
won't happen again when new fields get added to nf_hook_ops.
Fixes: 7b4b2fa375 ("netfilter: annotate nf_tables base hook ops")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Register hooks last when adding chain/flowtable to ensure that packets do
not walk over datastructure that is being released in the error path
without waiting for the rcu grace period.
Fixes: 91c7b38dc9 ("netfilter: nf_tables: use new transaction infrastructure to handle chain")
Fixes: 3b49e2e94e ("netfilter: nf_tables: add flow table netlink frontend")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
dst is transferred to the flow object, route object does not own it
anymore. Reset dst in route object, otherwise if flow_offload_add()
fails, error path releases dst twice, leading to a refcount underflow.
Fixes: a3c90f7a23 ("netfilter: nf_tables: flow offload expression")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We need to set the dormant flag again if we fail to register
the hooks.
During memory pressure hook registration can fail and we end up
with a table marked as active but no registered hooks.
On table/base chain deletion, nf_tables will attempt to unregister
the hook again which yields a warn splat from the nftables core.
Reported-and-tested-by: syzbot+de4025c006ec68ac56fc@syzkaller.appspotmail.com
Fixes: 179d9ba555 ("netfilter: nf_tables: fix table flag updates")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Nathan Chancellor <nathan@kernel.org> says:
Eric reported that builds of LLVM with [1] (close to tip of tree) have
CONFIG_AS_HAS_OPTION_ARCH=n because the test for expected failure on
invalid input has started succeeding.
This Kconfig test was added because '.option arch' only causes an
assembler warning when it is unsupported, rather than a hard error,
which is what users of as-instr expect when something is unsupported.
This can be resolved by turning assembler warnings into errors with
'-Wa,--fatal-warnings' like we do with the compiler with '-Werror',
which is what the first patch does. The second patch removes the invalid
test, as the valid test is good enough with fatal warnings.
I have diffed several configurations for the different architectures
that use as-instr and I have found no issues.
[1]: 3ac9fe69f7
* b4-shazam-merge:
RISC-V: Drop invalid test from CONFIG_AS_HAS_OPTION_ARCH
kbuild: Add -Wa,--fatal-warnings to as-instr invocation
Link: https://lore.kernel.org/r/20240125-fix-riscv-option-arch-llvm-18-v1-0-390ac9cc3cd0@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Sabrina Dubroca says:
====================
tls: fixes for record type handling with PEEK
There are multiple bugs in tls_sw_recvmsg's handling of record types
when MSG_PEEK flag is used, which can lead to incorrectly merging two
records:
- consecutive non-DATA records shouldn't be merged, even if they're
the same type (partly handled by the test at the end of the main
loop)
- records of the same type (even DATA) shouldn't be merged if one
record of a different type comes in between
====================
Link: https://lore.kernel.org/r/cover.1708007371.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If we queue 3 records:
- record 1, type DATA
- record 2, some other type
- record 3, type DATA
and do a recv(PEEK), the rx_list will contain the first two records.
The next large recv will walk through the rx_list and copy data from
record 1, then stop because record 2 is a different type. Since we
haven't filled up our buffer, we will process the next available
record. It's also DATA, so we can merge it with the current read.
We shouldn't do that, since there was a record in between that we
ignored.
Add a flag to let process_rx_list inform tls_sw_recvmsg that it had
more data available.
Fixes: 692d7b5d1f ("tls: Fix recvmsg() to be able to peek across multiple records")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/f00c0c0afa080c60f016df1471158c1caf983c34.1708007371.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If we have a non-DATA record on the rx_list and another record of the
same type still on the queue, we will end up merging them:
- process_rx_list copies the non-DATA record
- we start the loop and process the first available record since it's
of the same type
- we break out of the loop since the record was not DATA
Just check the record type and jump to the end in case process_rx_list
did some work.
Fixes: 692d7b5d1f ("tls: Fix recvmsg() to be able to peek across multiple records")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/bd31449e43bd4b6ff546f5c51cf958c31c511deb.1708007371.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
PEEK needs to leave decrypted records on the rx_list so that we can
receive them later on, so it jumps back into the async code that
queues the skb. Unfortunately that makes us skip the
TLS_RECORD_TYPE_DATA check at the bottom of the main loop, so if two
records of the same (non-DATA) type are queued, we end up merging
them.
Add the same record type check, and make it unlikely to not penalize
the async fastpath. Async decrypt only applies to data record, so this
check is only needed for PEEK.
process_rx_list also has similar issues.
Fixes: 692d7b5d1f ("tls: Fix recvmsg() to be able to peek across multiple records")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/3df2eef4fdae720c55e69472b5bea668772b45a2.1708007371.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Akira Yokosawa reported [1] that the "translations" extension we added in
commit 7418ec5b15 ("docs: translations: add translations links when they
exist") broke the build on Sphinx versions v6.1.3 through 7.1.2 (possibly
others) with the following error:
Exception occurred:
File "/usr/lib/python3.12/site-packages/sphinx/util/nodes.py", line 624, in _copy_except__document
newnode = self.__class__(rawsource=self.rawsource, **self.attributes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: LanguagesNode.__init__() missing 1 required positional argument: 'current_language'
The full traceback has been saved in /tmp/sphinx-err-7xmwytuu.log, if you want to report the issue to the developers.
Solve this problem by making 'current_language' a true element attribute
of the LanguagesNode element, which is probably the more correct way to do
it anyway.
Tested on Sphinx 2.x, 3.x, 6.x, and 7.x.
[1]: https://lore.kernel.org/all/54a56c2e-a27c-45a0-b712-02a7bc7d2673@gmail.com/
Fixes: 7418ec5b15 ("docs: translations: add translations links when they exist")
Reported-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Closes: https://lore.kernel.org/all/54a56c2e-a27c-45a0-b712-02a7bc7d2673@gmail.com/
Tested-by: Akira Yokosawa <akiyks@gmail.com> # Sphinx 4.3.2, 5.3.0 and 6.2.1
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240215064109.1193556-1-vegard.nossum@oracle.com
The GIC/ITS code is designed to ensure to pick up any preallocated LPI
tables on the redistributors, as enabling LPIs is a one-way switch. There
is no such restriction for vLPIs, and for GICv4.1 it is expected to
allocate a new vPE table at boot.
This works as intended when initializing an ITS, however when setting up a
redistributor in cpu_init_lpis() the early return for preallocated RD
tables skips straight past the GICv4 setup. This all comes to a head when
trying to kexec() into a new kernel, as the new kernel silently fails to
set up GICv4, leading to a complete loss of SGIs and LPIs for KVM VMs.
Slap a band-aid on the problem by ensuring its_cpu_init_lpis() always
initializes GICv4 on the way out, even if the other RD tables were
preallocated.
Fixes: 6479450f72 ("irqchip/gic-v4: Fix occasional VLPI drop")
Reported-by: George Cherian <gcherian@marvell.com>
Co-developed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240219185809.286724-2-oliver.upton@linux.dev
For regular system shutdown, ata_dev_power_set_standby() will be
executed twice: once the scsi device is removed and another when
ata_pci_shutdown_one() executes and EH completes unloading the devices.
Make the second call to ata_dev_power_set_standby() do nothing by using
ata_dev_power_is_active() and return if the device is already in
standby.
Fixes: 2da4c5e24e ("ata: libata-core: Improve ata_dev_power_set_active()")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
bus_get_dev_root() returns sp->dev_root which is set in subsys_register(),
but subsys_register() is not called by platform_bus_init().
Therefor for the platform_bus_type, bus_get_dev_root() always returns NULL.
This makes mbigen_of_create_domain() always return -ENODEV.
Don't try to retrieve the parent via bus_get_dev_root() and
unconditionally hand a NULL pointer to of_platform_device_create() to
fix this.
Fixes: fea087fc29 ("irqchip/mbigen: move to use bus_get_dev_root()")
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240220111429.110666-1-chenjun102@huawei.com
Pull kvm fixes from Paolo Bonzini:
"Two fixes for ARM ITS emulation. Unmapped interrupts were used instead
of ignored, causing NULL pointer dereferences"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: arm64: vgic-its: Test for valid IRQ in MOVALL handler
KVM: arm64: vgic-its: Test for valid IRQ in its_sync_lpi_pending_table()
Pull btrfs fixes from David Sterba:
- Fix a deadlock in fiemap.
There was a big lock around the whole operation that can interfere
with a page fault and mkwrite.
Reducing the lock scope can also speed up fiemap
- Fix range condition for extent defragmentation which could lead to
worse layout in some cases
* tag 'for-6.8-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix deadlock with fiemap and extent locking
btrfs: defrag: avoid unnecessary defrag caused by incorrect extent size
Pull crypto fix from Herbert Xu:
"Fix a stack overflow in virtio"
* tag 'v6.8-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: virtio/akcipher - Fix stack overflow on memcpy
syzbot reported the following NULL pointer dereference issue [1]:
BUG: kernel NULL pointer dereference, address: 0000000000000000
[...]
RIP: 0010:0x0
[...]
Call Trace:
<TASK>
sk_psock_verdict_data_ready+0x232/0x340 net/core/skmsg.c:1230
unix_stream_sendmsg+0x9b4/0x1230 net/unix/af_unix.c:2293
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
___sys_sendmsg net/socket.c:2638 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
do_syscall_64+0xf9/0x240
entry_SYSCALL_64_after_hwframe+0x6f/0x77
If sk_psock_verdict_data_ready() and sk_psock_stop_verdict() are called
concurrently, psock->saved_data_ready can be NULL, causing the above issue.
This patch fixes this issue by calling the appropriate data ready function
using the sk_psock_data_ready() helper and protecting it from concurrency
with sk->sk_callback_lock.
Fixes: 6df7f764cd ("bpf, sockmap: Wake up polling after data copy")
Reported-by: syzbot+fd7b34375c1c8ce29c93@syzkaller.appspotmail.com
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+fd7b34375c1c8ce29c93@syzkaller.appspotmail.com
Acked-by: John Fastabend <john.fastabend@gmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=fd7b34375c1c8ce29c93 [1]
Link: https://lore.kernel.org/bpf/20240218150933.6004-1-syoshida@redhat.com
If kiocb_set_cancel_fn() is called for I/O submitted via io_uring, the
following kernel warning appears:
WARNING: CPU: 3 PID: 368 at fs/aio.c:598 kiocb_set_cancel_fn+0x9c/0xa8
Call trace:
kiocb_set_cancel_fn+0x9c/0xa8
ffs_epfile_read_iter+0x144/0x1d0
io_read+0x19c/0x498
io_issue_sqe+0x118/0x27c
io_submit_sqes+0x25c/0x5fc
__arm64_sys_io_uring_enter+0x104/0xab0
invoke_syscall+0x58/0x11c
el0_svc_common+0xb4/0xf4
do_el0_svc+0x2c/0xb0
el0_svc+0x2c/0xa4
el0t_64_sync_handler+0x68/0xb4
el0t_64_sync+0x1a4/0x1a8
Fix this by setting the IOCB_AIO_RW flag for read and write I/O that is
submitted by libaio.
Suggested-by: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Avi Kivity <avi@scylladb.com>
Cc: Sandeep Dhavale <dhavale@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: stable@vger.kernel.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240215204739.2677806-2-bvanassche@acm.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
In the case where __lpass_get_dmactl_handle is called and the driver
id dai_id is invalid the pointer dmactl is not being assigned a value,
and dmactl contains a garbage value since it has not been initialized
and so the null check may not work. Fix this to initialize dmactl to
NULL. One could argue that modern compilers will set this to zero, but
it is useful to keep this initialized as per the same way in functions
__lpass_platform_codec_intf_init and lpass_cdc_dma_daiops_hw_params.
Cleans up clang scan build warning:
sound/soc/qcom/lpass-cdc-dma.c:275:7: warning: Branch condition
evaluates to a garbage value [core.uninitialized.Branch]
Fixes: b81af585ea ("ASoC: qcom: Add lpass CPU driver for codec dma control")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://msgid.link/r/20240221134804.3475989-1-colin.i.king@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The cited commit [1] added framer support under drivers/net/wan,
which is covered by NETWORKING [GENERAL]. And it is implied
that framer-provider.h and framer.h, which were also added
buy the same patch, are also maintained as part of NETWORKING [GENERAL].
Make this explicit by adding these files to the corresponding
section in MAINTAINERS.
[1] 82c944d05b ("net: wan: Add framer framework support")
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot reported another task hung in __unix_gc(). [0]
The current while loop assumes that all of the left candidates
have oob_skb and calling kfree_skb(oob_skb) releases the remaining
candidates.
However, I missed a case that oob_skb has self-referencing fd and
another fd and the latter sk is placed before the former in the
candidate list. Then, the while loop never proceeds, resulting
the task hung.
__unix_gc() has the same loop just before purging the collected skb,
so we can call kfree_skb(oob_skb) there and let __skb_queue_purge()
release all inflight sockets.
[0]:
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 2784 Comm: kworker/u4:8 Not tainted 6.8.0-rc4-syzkaller-01028-g71b605d32017 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
Workqueue: events_unbound __unix_gc
RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x70 kernel/kcov.c:200
Code: 89 fb e8 23 00 00 00 48 8b 3d 84 f5 1a 0c 48 89 de 5b e9 43 26 57 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <f3> 0f 1e fa 48 8b 04 24 65 48 8b 0d 90 52 70 7e 65 8b 15 91 52 70
RSP: 0018:ffffc9000a17fa78 EFLAGS: 00000287
RAX: ffffffff8a0a6108 RBX: ffff88802b6c2640 RCX: ffff88802c0b3b80
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
RBP: ffffc9000a17fbf0 R08: ffffffff89383f1d R09: 1ffff1100ee5ff84
R10: dffffc0000000000 R11: ffffed100ee5ff85 R12: 1ffff110056d84ee
R13: ffffc9000a17fae0 R14: 0000000000000000 R15: ffffffff8f47b840
FS: 0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffef5687ff8 CR3: 0000000029b34000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<NMI>
</NMI>
<TASK>
__unix_gc+0xe69/0xf40 net/unix/garbage.c:343
process_one_work kernel/workqueue.c:2633 [inline]
process_scheduled_works+0x913/0x1420 kernel/workqueue.c:2706
worker_thread+0xa5f/0x1000 kernel/workqueue.c:2787
kthread+0x2ef/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242
</TASK>
Reported-and-tested-by: syzbot+ecab4d36f920c3574bf9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ecab4d36f920c3574bf9
Fixes: 25236c91b5 ("af_unix: Fix task hung while purging oob_skb in GC.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In newer hardware, IPA supports more than 32 endpoints. Some
registers--such as IPA interrupt registers--represent endpoints
as bits in a 4-byte register, and such registers are repeated as
needed to represent endpoints beyond the first 32.
In ipa_interrupt_suspend_clear_all(), we clear all pending IPA
suspend interrupts by reading all status register(s) and writing
corresponding registers to clear interrupt conditions.
Unfortunately the number of registers to read/write is calculated
incorrectly, and as a result we access *many* more registers than
intended. This bug occurs only when the IPA hardware signals a
SUSPEND interrupt, which happens when a packet is received for an
endpoint (or its underlying GSI channel) that is suspended. This
situation is difficult to reproduce, but possible.
Fix this by correctly computing the number of interrupt registers to
read and write. This is the only place in the code where registers
that map endpoints or channels this way perform this calculation.
Fixes: f298ba785e ("net: ipa: add a parameter to suspend registers")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
AF reserves MCAM entries for each PF, VF present in the
system and populates the entry with DMAC and action with
default RSS so that basic packet I/O works. Since PF/VF is
not aware of the RSS action installed by AF, AF only fixup
the actions of the rules installed by PF/VF with corresponding
default RSS action. This worked well for rules installed by
PF/VF for features like RX VLAN offload and DMAC filters but
rules involving action like drop/forward to queue are also
getting modified by AF. Hence fix it by setting the default
RSS action only if requested by PF/VF.
Fixes: 967db3529e ("octeontx2-af: add support for multicast/promisc packet replication feature")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
KVM/arm64 fixes for 6.8, take #3
- Check for the validity of interrupts handled by a MOVALL
command
- Check for the validity of interrupts while reading the
pending state on enabling LPIs.
On 32-bit builds, the vt-d driver causes a warning with clang:
drivers/iommu/intel/nested.c:112:13: error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned long' is always false [-Werror,-Wtautological-constant-out-of-range-compare]
112 | if (npages == U64_MAX)
| ~~~~~~ ^ ~~~~~~~
Make the variable a 64-bit type, which matches both the caller and the
use anyway.
Fixes: f6f3721244 ("iommu/vt-d: Add iotlb flush for nested domain")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20240213095832.455245-1-arnd@kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Setting dirty tracking for a s2 domain requires to loop all the related
devices and set the dirty tracking enable bit in the PASID table entry.
This includes the devices that are attached to the nested domains of a
s2 domain if this s2 domain is used as parent. However, the existing dirty
tracking set only loops s2 domain's own devices. It will miss dirty page
logs in the parent domain.
Now, the parent domain tracks the nested domains, so it can loop the
nested domains and the devices attached to the nested domains to ensure
dirty tracking on the parent is set completely.
Fixes: b41e38e225 ("iommu/vt-d: Add nested domain allocation")
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240208082307.15759-9-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
ATS-capable devices cache the result of nested translation. This result
relies on the mappings in s2 domain (a.k.a. parent). When there are
modifications in the s2 domain, the related nested translation caches on
the device should be flushed. This includes the devices that are attached
to the s1 domain. However, the existing code ignores this fact to only
loops its own devices.
As there is no easy way to identify the exact set of nested translations
affected by the change of s2 domain. So, this just flushes the entire
device iotlb on the device.
As above, driver loops the s2 domain's s1_domains list and loops the
devices list of each s1_domain to flush the entire device iotlb on the
devices.
Fixes: b41e38e225 ("iommu/vt-d: Add nested domain allocation")
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240208082307.15759-6-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If a domain is used as the parent in nested translation its mappings might
be cached using DID of the nested domain. But the existing code ignores
this fact to only invalidate the iotlb entries tagged by the domain's own
DID.
Loop the s1_domains list, if any, to invalidate all iotlb entries related
to the target s2 address range. According to VT-d spec there is no need for
software to explicitly flush the affected s1 cache. It's implicitly done by
HW when s2 cache is invalidated.
Fixes: b41e38e225 ("iommu/vt-d: Add nested domain allocation")
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240208082307.15759-4-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Today the parent domain (s2_domain) is unaware of which DID's are
used by and which devices are attached to nested domains (s1_domain)
nested on it. This leads to a problem that some operations (flush
iotlb/devtlb and enable dirty tracking) on parent domain only apply to
DID's and devices directly tracked in the parent domain hence are
incomplete.
This tracks the nested domains in list in parent domain. With this,
operations on parent domain can loop the nested domains and refer to
the devices and iommu_array to ensure the operations on parent domain
take effect on all the affected devices and iommus.
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240208082307.15759-2-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Commit 1fd4a5a36f ("drm/connector: Rename legacy TV property") failed
to update all the users of the struct drm_tv_connector_state mode field,
which resulted in a build failure in i915.
However, a subsequent commit in the same series reintroduced a mode
field in that structure, with a different semantic but the same type,
with the assumption that all previous users were updated.
Since that didn't happen, the i915 driver now compiles, but mixes
accesses to the legacy_mode field and the newer mode field, but with the
previous semantics.
This obviously doesn't work very well, so we need to update the accesses
that weren't in the legacy renaming commit.
Fixes: 1fd4a5a36f ("drm/connector: Rename legacy TV property")
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240220131251.453060-1-mripard@kernel.org
(cherry picked from commit bf7626f19d)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The expectation is that cxl_parse_cfwms() continues in the face the of
failure as evidenced by code like:
cxlrd = cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb);
if (IS_ERR(cxlrd))
return 0;
There are other error paths in that function which mistakenly follow
idiomatic expectations and return an error when they should not. Most of
those mistakes are innocuous checks that hardly ever fail in practice.
However, a recent change succeed in making the implementation more
fragile by applying an idiomatic, but still wrong "fix" [1]. In this
failure case the kernel reports:
cxl root0: Failed to populate active decoder targets
cxl_acpi ACPI0017:00: Failed to add decode range: [mem 0x00000000-0x7fffffff flags 0x200]
...which is a real issue with that one window (to be fixed separately),
but ends up failing the entirety of cxl_acpi_probe().
Undo that recent breakage while also removing the confusion about
ignoring errors. Update all exits paths to return an error per typical
expectations and let an outer wrapper function handle dropping the
error.
Fixes: 91019b5bc7 ("cxl/acpi: Return 'rc' instead of '0' in cxl_parse_cfmws()") [1]
Cc: <stable@vger.kernel.org>
Cc: Breno Leitao <leitao@debian.org>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Initial tests with the CXL CPER implementation identified that error
reports were being duplicated in the log and the trace event [1]. Then
it was discovered that the notification handler took sleeping locks
while the GHES event handling runs in spin_lock_irqsave() context [2]
While the duplicate reporting was fixed in v6.8-rc4, the fix for the
sleeping-lock-vs-atomic collision would enjoy more time to settle and
gain some test cycles. Given how late it is in the development cycle,
remove the CXL hookup for now and try again during the next merge
window.
Note that end result is that v6.8 does not emit CXL CPER payloads to the
kernel log, but this is in line with the CXL trend to move error
reporting to trace events instead of the kernel log.
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: http://lore.kernel.org/r/20240108165855.00002f5a@Huawei.com [1]
Closes: http://lore.kernel.org/r/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain [2]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull rdma fixes from Jason Gunthorpe:
"Mostly irdma and bnxt_re fixes:
- Missing error unwind in hf1
- For bnxt - fix fenching behavior to work on new chips, fail
unsupported SRQ resize back to userspace, propogate SRQ FW failure
back to userspace.
- Correctly fail unsupported SRQ resize back to userspace in bnxt
- Adjust a memcpy in mlx5 to not overflow a struct field.
- Prevent userspace from triggering mlx5 fw syndrome logging from
sysfs
- Use the correct access mode for MLX5_IB_METHOD_DEVX_OBJ_MODIFY to
avoid a userspace failure on modify
- For irdma - Don't UAF a concurrent tasklet during destroy, prevent
userspace from issuing invalid QP attrs, fix a possible CQ
overflow, capture a missing HW async error event
- sendmsg() triggerable memory access crash in hfi1
- Fix the srpt_service_guid parameter to not crash due to missing
function pointer
- Don't leak objects in error unwind in qedr
- Don't weirdly cast function pointers in srpt"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/srpt: fix function pointer cast warnings
RDMA/qedr: Fix qedr_create_user_qp error flow
RDMA/srpt: Support specifying the srpt_service_guid parameter
IB/hfi1: Fix sdma.h tx->num_descs off-by-one error
RDMA/irdma: Add AE for too many RNRS
RDMA/irdma: Set the CQ read threshold for GEN 1
RDMA/irdma: Validate max_send_wr and max_recv_wr
RDMA/irdma: Fix KASAN issue with tasklet
RDMA/mlx5: Relax DEVX access upon modify commands
IB/mlx5: Don't expose debugfs entries for RRoCE general parameters if not supported
RDMA/mlx5: Fix fortify source warning while accessing Eth segment
RDMA/bnxt_re: Add a missing check in bnxt_qplib_query_srq
RDMA/bnxt_re: Return error for SRQ resize
RDMA/bnxt_re: Fix unconditional fence for newer adapters
RDMA/bnxt_re: Remove a redundant check inside bnxt_re_vf_res_config
RDMA/bnxt_re: Avoid creating fence MR for newer adapters
IB/hfi1: Fix a memleak in init_credit_return
release_free_meta() accesses the shadow directly through the path
kasan_slab_free
__kasan_slab_free
kasan_release_object_meta
release_free_meta
kasan_mem_to_shadow
There are no kasan_arch_is_ready() guards here, allowing an oops when the
shadow is not initialized. The oops can be seen on a Power8 KVM guest.
This patch adds the guard to release_free_meta(), as it's the first level
that specifically requires the shadow.
It is safe to put the guard at the start of this function, before the
stack put: only kasan_save_free_info() can initialize the saved stack,
which itself is guarded with kasan_arch_is_ready() by its caller
poison_slab_object(). If the arch becomes ready before
release_free_meta() then we will not observe KASAN_SLAB_FREE_META in the
object's shadow, so we will not put an uninitialized stack either.
Link: https://lkml.kernel.org/r/20240213033958.139383-1-bgray@linux.ibm.com
Fixes: 63b85ac56a ("kasan: stop leaking stack trace handles")
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
For online parameters change, DAMON_LRU_SORT creates new schemes based on
latest values of the parameters and replaces the old schemes with the new
one. When creating it, the internal status of the quotas of the old
schemes is not preserved. As a result, charging of the quota starts from
zero after the online tuning. The data that collected to estimate the
throughput of the scheme's action is also reset, and therefore the
estimation should start from the scratch again. Because the throughput
estimation is being used to convert the time quota to the effective size
quota, this could result in temporal time quota inaccuracy. It would be
recovered over time, though. In short, the quota accuracy could be
temporarily degraded after online parameters update.
Fix the problem by checking the case and copying the internal fields for
the status.
Link: https://lkml.kernel.org/r/20240216194025.9207-3-sj@kernel.org
Fixes: 40e983cca9 ("mm/damon: introduce DAMON-based LRU-lists Sorting")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [6.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm/damon: fix quota status loss due to online tunings".
DAMON_RECLAIM and DAMON_LRU_SORT is not preserving internal quota status
when applying new user parameters, and hence could cause temporal quota
accuracy degradation. Fix it by preserving the status.
This patch (of 2):
For online parameters change, DAMON_RECLAIM creates new scheme based on
latest values of the parameters and replaces the old scheme with the new
one. When creating it, the internal status of the quota of the old
scheme is not preserved. As a result, charging of the quota starts from
zero after the online tuning. The data that collected to estimate the
throughput of the scheme's action is also reset, and therefore the
estimation should start from the scratch again. Because the throughput
estimation is being used to convert the time quota to the effective size
quota, this could result in temporal time quota inaccuracy. It would be
recovered over time, though. In short, the quota accuracy could be
temporarily degraded after online parameters update.
Fix the problem by checking the case and copying the internal fields for
the status.
Link: https://lkml.kernel.org/r/20240216194025.9207-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20240216194025.9207-2-sj@kernel.org
Fixes: e035c280f6 ("mm/damon/reclaim: support online inputs update")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [5.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
'commit_schemes_quota_goals' command handler,
damos_sysfs_set_quota_scores() assumes the number of schemes sysfs
directory will be same to the number of schemes of the DAMON context. The
assumption is wrong since users can remove schemes sysfs directories while
DAMON is running. In the case, illegal memory accesses can happen. Fix
it by checking the case.
Link: https://lkml.kernel.org/r/20240213023633.124928-1-sj@kernel.org
Fixes: d91beaa505 ("mm/damon/sysfs-schemes: implement a command for scheme quota goals only commit")
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Trying to run the iov_iter unit test on a nommu system such as the qemu
kc705-nommu emulation results in a crash.
KTAP version 1
# Subtest: iov_iter
# module: kunit_iov_iter
1..9
BUG: failure at mm/nommu.c:318/vmap()!
Kernel panic - not syncing: BUG!
The test calls vmap() directly, but vmap() is not supported on nommu
systems, causing the crash. TEST_IOV_ITER therefore needs to depend on
MMU.
Link: https://lkml.kernel.org/r/20240208153010.1439753-1-linux@roeck-us.net
Fixes: 2d71340ff1 ("iov_iter: Kunit tests for copying to/from an iterator")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When skipping swapcache for SWP_SYNCHRONOUS_IO, if two or more threads
swapin the same entry at the same time, they get different pages (A, B).
Before one thread (T0) finishes the swapin and installs page (A) to the
PTE, another thread (T1) could finish swapin of page (B), swap_free the
entry, then swap out the possibly modified page reusing the same entry.
It breaks the pte_same check in (T0) because PTE value is unchanged,
causing ABA problem. Thread (T0) will install a stalled page (A) into the
PTE and cause data corruption.
One possible callstack is like this:
CPU0 CPU1
---- ----
do_swap_page() do_swap_page() with same entry
<direct swapin path> <direct swapin path>
<alloc page A> <alloc page B>
swap_read_folio() <- read to page A swap_read_folio() <- read to page B
<slow on later locks or interrupt> <finished swapin first>
... set_pte_at()
swap_free() <- entry is free
<write to page B, now page A stalled>
<swap out page B to same swap entry>
pte_same() <- Check pass, PTE seems
unchanged, but page A
is stalled!
swap_free() <- page B content lost!
set_pte_at() <- staled page A installed!
And besides, for ZRAM, swap_free() allows the swap device to discard the
entry content, so even if page (B) is not modified, if swap_read_folio()
on CPU0 happens later than swap_free() on CPU1, it may also cause data
loss.
To fix this, reuse swapcache_prepare which will pin the swap entry using
the cache flag, and allow only one thread to swap it in, also prevent any
parallel code from putting the entry in the cache. Release the pin after
PT unlocked.
Racers just loop and wait since it's a rare and very short event. A
schedule_timeout_uninterruptible(1) call is added to avoid repeated page
faults wasting too much CPU, causing livelock or adding too much noise to
perf statistics. A similar livelock issue was described in commit
029c4628b2 ("mm: swap: get rid of livelock in swapin readahead")
Reproducer:
This race issue can be triggered easily using a well constructed
reproducer and patched brd (with a delay in read path) [1]:
With latest 6.8 mainline, race caused data loss can be observed easily:
$ gcc -g -lpthread test-thread-swap-race.c && ./a.out
Polulating 32MB of memory region...
Keep swapping out...
Starting round 0...
Spawning 65536 workers...
32746 workers spawned, wait for done...
Round 0: Error on 0x5aa00, expected 32746, got 32743, 3 data loss!
Round 0: Error on 0x395200, expected 32746, got 32743, 3 data loss!
Round 0: Error on 0x3fd000, expected 32746, got 32737, 9 data loss!
Round 0 Failed, 15 data loss!
This reproducer spawns multiple threads sharing the same memory region
using a small swap device. Every two threads updates mapped pages one by
one in opposite direction trying to create a race, with one dedicated
thread keep swapping out the data out using madvise.
The reproducer created a reproduce rate of about once every 5 minutes, so
the race should be totally possible in production.
After this patch, I ran the reproducer for over a few hundred rounds and
no data loss observed.
Performance overhead is minimal, microbenchmark swapin 10G from 32G
zram:
Before: 10934698 us
After: 11157121 us
Cached: 13155355 us (Dropping SWP_SYNCHRONOUS_IO flag)
[kasong@tencent.com: v4]
Link: https://lkml.kernel.org/r/20240219082040.7495-1-ryncsn@gmail.com
Link: https://lkml.kernel.org/r/20240206182559.32264-1-ryncsn@gmail.com
Fixes: 0bcac06f27 ("mm, swap: skip swapcache for swapin of synchronous device")
Reported-by: "Huang, Ying" <ying.huang@intel.com>
Closes: https://lore.kernel.org/lkml/87bk92gqpx.fsf_-_@yhuang6-desk2.ccr.corp.intel.com/
Link: https://github.com/ryncsn/emm-test-project/tree/master/swap-stress-race [1]
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When a folio is swapped in, the protection size of the corresponding zswap
LRU is incremented, so that the zswap shrinker is more conservative with
its reclaiming action. This field is embedded within the struct lruvec,
so updating it requires looking up the folio's memcg and lruvec. However,
currently this lookup can happen after the folio is unlocked, for instance
if a new folio is allocated, and swap_read_folio() unlocks the folio
before returning. In this scenario, there is no stability guarantee for
the binding between a folio and its memcg and lruvec:
* A folio's memcg and lruvec can be freed between the lookup and the
update, leading to a UAF.
* Folio migration can clear the now-unlocked folio's memcg_data, which
directs the zswap LRU protection size update towards the root memcg
instead of the original memcg. This was recently picked up by the
syzbot thanks to a warning in the inlined folio_lruvec() call.
Move the zswap LRU protection range update above the swap_read_folio()
call, and only when a new page is allocated, to prevent this.
[nphamcs@gmail.com: add VM_WARN_ON_ONCE() to zswap_folio_swapin()]
Link: https://lkml.kernel.org/r/20240206180855.3987204-1-nphamcs@gmail.com
[nphamcs@gmail.com: remove unneeded if (folio) checks]
Link: https://lkml.kernel.org/r/20240206191355.83755-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20240205232442.3240571-1-nphamcs@gmail.com
Fixes: b5ba474f3f ("zswap: shrink zswap pool based on memory pressure")
Reported-by: syzbot+17a611d10af7d18a7092@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@google.com/
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kdamond_apply_schemes() checks apply intervals of schemes and avoid
further applying any schemes if no scheme passed its apply interval.
However, the following schemes applying function, damon_do_apply_schemes()
iterates all schemes without the apply interval check. As a result, the
shortest apply interval is applied to all schemes. Fix the problem by
checking the apply interval in damon_do_apply_schemes().
Link: https://lkml.kernel.org/r/20240205201306.88562-1-sj@kernel.org
Fixes: 42f994b714 ("mm/damon/core: implement scheme-specific apply interval")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [6.7.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In zswap_writeback_entry(), after we get a folio from
__read_swap_cache_async(), we grab the tree lock again to check that the
swap entry was not invalidated and recycled. If it was, we delete the
folio we just added to the swap cache and exit.
However, __read_swap_cache_async() returns the folio locked when it is
newly allocated, which is always true for this path, and the folio is
ref'd. Make sure to unlock and put the folio before returning.
This was discovered by code inspection, probably because this path handles
a race condition that should not happen often, and the bug would not crash
the system, it will only strand the folio indefinitely.
Link: https://lkml.kernel.org/r/20240125085127.1327013-1-yosryahmed@google.com
Fixes: 04fc781608 ("mm: fix zswap writeback race condition")
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The addition of the XFS online fsck documentation starting with
commit a8f6c2e54d ("xfs: document the motivation for online fsck design")
added a deeper level of nesting than LaTeX is prepared to deal with. That
caused a pdfdocs build failure with the helpful "Too deeply nested" error
message buried deeply in Documentation/output/filesystems.log.
Increase the "maxlistdepth" parameter to instruct LaTeX that it needs to
deal with the deeper nesting whether it wants to or not.
Suggested-by: Akira Yokosawa <akiyks@gmail.com>
Tested-by: Akira Yokosawa <akiyks@gmail.com>
Cc: stable@vger.kernel.org # v6.4+
Link: https://lore.kernel.org/linux-doc/67f6ac60-7957-4b92-9d72-a08fbad0e028@gmail.com/
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Commit 91fdbce7e8 ("ice: Add support in the driver for associating
queue with napi") invoked the netif_queue_set_napi() call. This
kernel function requires to be called with rtnl_lock taken,
otherwise ASSERT_RTNL() warning will be triggered. ice_vsi_rebuild()
initiating this call is under rtnl_lock when the rebuild is in
response to configuration changes from external interfaces (such as
tc, ethtool etc. which holds the lock). But, the VSI rebuild
generated from service tasks and resets (PFR/CORER/GLOBR) is not
under rtnl lock protection. Handle these cases as well to hold lock
before the kernel call (by setting the 'locked' boolean to false).
netif_queue_set_napi() is also used to clear previously set napi
in the q_vector unroll flow. Handle this for locked/lockless execution
paths.
Fixes: 91fdbce7e8 ("ice: Add support in the driver for associating queue with napi")
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Several Freescale Layerscape platforms extirq binding use a malformed
interrupt-map property missing parent address cells. These are
documented in of_irq_imap_abusers list in drivers/of/irq.c. In order to
enable dtc interrupt_map check tree wide, we need to disable it for
these platforms which will not be fixed (as that would break
compatibility).
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20240213-arm-dt-cleanups-v1-1-f2dee1292525@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Some fixes to make devicetrees conform to bindings better (pwm irqs), dt
styling fixes (unneeded jaguar status, whitespaces, Cool Pi regulator
naming) and functionality fixes (px30 spi chipselect number, allowing
rk3588-evb1 to turn off, pcie lane numbers on CoolPi, wrong gpio-names
on Indidroid Nova and some CoolPi sdmmc aliases to match what uboot uses).
* tag 'v6.8-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: Correct Indiedroid Nova GPIO Names
arm64: dts: rockchip: Drop interrupts property from rk3328 pwm-rockchip node
arm64: dts: rockchip: set num-cs property for spi on px30
arm64: dts: rockchip: minor rk3588 whitespace cleanup
arm64: dts: rockchip: drop unneeded status from rk3588-jaguar gpio-leds
ARM: dts: rockchip: Drop interrupts property from pwm-rockchip nodes
arm64: dts: rockchip: Fix the num-lanes of pcie3x4 on Cool Pi CM5 EVB
arm64: dts: rockchip: rename vcc5v0_usb30_host regulator for Cool Pi CM5 EVB
arm64: dts: rockchip: aliase sdmmc as mmc1 for Cool Pi CM5 EVB
arm64: dts: rockchip: aliase sdmmc as mmc1 for Cool Pi 4B
arm64: dts: rockchip: mark system power controller on rk3588-evb1
Link: https://lore.kernel.org/r/2450634.jE0xQCEvom@phil
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Guenter Roeck reports that commit a64056bb5a ("drm/tests/drm_buddy:
add alloc_contiguous test") causes build failures on 32-bit targets:
"This patch breaks the build on all 32-bit systems since it introduces
an unhandled direct 64-bit divide operation.
ERROR: modpost: "__umoddi3" [drivers/gpu/drm/tests/drm_buddy_test.ko] undefined!
ERROR: modpost: "__moddi3" [drivers/gpu/drm/tests/drm_buddy_test.ko] undefined!"
and the uses of 'u64' are all entirely pointless. Yes, the arguments to
drm_buddy_init() and drm_buddy_alloc_blocks() are in fact of type 'u64',
but none of the values here are remotely relevant, and the compiler will
happily just do the type expansion.
Of course, in a perfect world the compiler would also have just noticed
that all the values in question are tiny, and range analysis would have
shown that doing a 64-bit divide is pointless, but that is admittedly
expecting a fair amount of the compiler.
IOW, we shouldn't write code that the compiler then has to notice is
unnecessarily complicated just to avoid extra work. We do have fairly
high expectations of compilers, but kernel code should be reasonable to
begin with.
It turns out that there are also other issues with this code: the KUnit
assertion messages have incorrect types in the format strings, but
that's a widely spread issue caused by the KUnit infrastructure not
having enabled format string verification. We'll get that sorted out
separately.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: a64056bb5a ("drm/tests/drm_buddy: add alloc_contiguous test")
Link: https://lore.kernel.org/all/538327ff-8d34-41d5-a9ae-1a334744f5ae@roeck-us.net/
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"struct bvec_iter" is defined with the __packed attribute, so it is
aligned on a single byte. On X86 (and on other architectures that support
unaligned addresses in hardware), "struct bvec_iter" is accessed using the
8-byte and 4-byte memory instructions, however these instructions are less
efficient if they operate on unaligned addresses.
(on RISC machines that don't have unaligned access in hardware, GCC
generates byte-by-byte accesses that are very inefficient - see [1])
This commit reorders the entries in "struct dm_verity_io" and "struct
convert_context", so that "struct bvec_iter" is aligned on 8 bytes.
[1] https://lore.kernel.org/all/ZcLuWUNRZadJr0tQ@fedora/T/
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
If a userspace process reads (with O_DIRECT) multiple blocks into the same
buffer, dm-crypt reports an authentication error [1]. The error is
reported in a log and it may cause RAID leg being kicked out of the
array.
This commit fixes dm-crypt, so that if integrity verification fails, the
data is read again into a kernel buffer (where userspace can't modify it)
and the integrity tag is rechecked. If the recheck succeeds, the content
of the kernel buffer is copied into the user buffer; if the recheck fails,
an integrity error is reported.
[1] https://people.redhat.com/~mpatocka/testcases/blk-auth-modify/read2.c
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
It was said that authenticated encryption could produce invalid tag when
the data that is being encrypted is modified [1]. So, fix this problem by
copying the data into the clone bio first and then encrypt them inside the
clone bio.
This may reduce performance, but it is needed to prevent the user from
corrupting the device by writing data with O_DIRECT and modifying them at
the same time.
[1] https://lore.kernel.org/all/20240207004723.GA35324@sol.localdomain/T/
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
If a userspace process reads (with O_DIRECT) multiple blocks into the same
buffer, dm-verity reports an error [1].
This commit fixes dm-verity, so that if hash verification fails, the data
is read again into a kernel buffer (where userspace can't modify it) and
the hash is rechecked. If the recheck succeeds, the content of the kernel
buffer is copied into the user buffer; if the recheck fails, an error is
reported.
[1] https://people.redhat.com/~mpatocka/testcases/blk-auth-modify/read2.c
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
If a userspace process reads (with O_DIRECT) multiple blocks into the same
buffer, dm-integrity reports an error [1]. The error is reported in a log
and it may cause RAID leg being kicked out of the array.
This commit fixes dm-integrity, so that if integrity verification fails,
the data is read again into a kernel buffer (where userspace can't modify
it) and the integrity tag is rechecked. If the recheck succeeds, the
content of the kernel buffer is copied into the user buffer; if the
recheck fails, an integrity error is reported.
[1] https://people.redhat.com/~mpatocka/testcases/blk-auth-modify/read2.c
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Do not allow to set phase adjust value for a pin if PF reset is in
progress, this would cause confusing netlink extack errors as the firmware
cannot process the request properly during the reset time.
Return (-EBUSY) and report extack error for the user who tries configure
pin phase adjust during the reset time.
Test by looping execution of below steps until netlink error appears:
- perform PF reset
$ echo 1 > /sys/class/net/<ice PF>/device/reset
- change pin phase adjust value:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \
--do pin-set --json '{"id":0, "phase-adjust":1000}'
Fixes: 90e1c90750 ("ice: dpll: implement phase related callbacks")
Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Do not allow dpll periodic work function to acquire data from firmware
if PF reset is in progress. Acquiring data will cause dmesg errors as the
firmware cannot respond or process the request properly during the reset
time.
Test by looping execution of below step until dmesg error appears:
- perform PF reset
$ echo 1 > /sys/class/net/<ice PF>/device/reset
Fixes: d7999f5ea6 ("ice: implement dpll interface to control cgu")
Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Do not allow to acquire data or alter configuration of dpll and pins
through firmware if PF reset is in progress, this would cause confusing
netlink extack errors as the firmware cannot respond or process the
request properly during the reset time.
Return (-EBUSY) and extack error for the user who tries access/modify
the config of dpll/pin through firmware during the reset time.
The PF reset and kernel access to dpll data are both asynchronous. It is
not possible to guard all the possible reset paths with any determinictic
approach. I.e., it is possible that reset starts after reset check is
performed (or if the reset would be checked after mutex is locked), but at
the same time it is not possible to wait for dpll mutex unlock in the
reset flow.
This is best effort solution to at least give a clue to the user
what is happening in most of the cases, knowing that there are possible
race conditions where the user could see a different error received
from firmware due to reset unexpectedly starting.
Test by looping execution of below steps until netlink error appears:
- perform PF reset
$ echo 1 > /sys/class/net/<ice PF>/device/reset
- i.e. try to alter/read dpll/pin config:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \
--dump pin-get
Fixes: d7999f5ea6 ("ice: implement dpll interface to control cgu")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The value of phase_adjust for input pin shall be updated in
ice_dpll_pin_state_update(..). Fix by adding proper argument to the
firmware query function call - a pin's struct field pointer where the
phase_adjust value during driver runtime is stored.
Previously the phase_adjust used to misinform user about actual
phase_adjust value. I.e., if phase_adjust was set to a non zero value and
if driver was reloaded, the user would see the value equal 0, which is
not correct - the actual value is equal to value set before driver reload.
Fixes: 90e1c90750 ("ice: dpll: implement phase related callbacks")
Reviewed-by: Alan Brady <alan.brady@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix the connection state between source DPLL and output pin, updating the
attribute 'state' of 'parent_device'. Previously, the connection state
was broken, and didn't reflect the correct state.
When 'state_on_dpll_set' is called with the value
'DPLL_PIN_STATE_CONNECTED' (1), the output pin will switch to the given
DPLL, and the state of the given DPLL will be set to connected.
E.g.:
--do pin-set --json '{"id":2, "parent-device":{"parent-id":1,
"state": 1 }}'
This command will connect DPLL device with id 1 to output pin with id 2.
When 'state_on_dpll_set' is called with the value
'DPLL_PIN_STATE_DISCONNECTED' (2) and the given DPLL is currently
connected, then the output pin will be disabled.
E.g:
--do pin-set --json '{"id":2, "parent-device":{"parent-id":1,
"state": 2 }}'
This command will disable output pin with id 2 if DPLL device with ID 1 is
connected to it; otherwise, the command is ignored.
Fixes: d7999f5ea6 ("ice: implement dpll interface to control cgu")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Yochai Hagvi <yochai.hagvi@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
On some systems, sys_membarrier can be very expensive, causing overall
slowdowns for everything. So put a lock on the path in order to
serialize the accesses to prevent the ability for this to be called at
too high of a frequency and saturate the machine.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-and-tested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Fixes: 22e4ebb975 ("membarrier: Provide expedited private command")
Fixes: c5f58bd58f ("membarrier: Provide GLOBAL_EXPEDITED command")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
i.MX fixes for 6.8:
- A tqma8mpql device tree fix to correct audio codec iov-supply.
- A couple of USB-C connector DT description revert to fix regression
on imx8mp-dhcom-pdk3 and imx8mn-var-som-symphony board.
- Fix valid range check for imx-weim bus driver.
- Disable UART4 on Data Modul i.MX8M Plus eDM SBC to avoid boot hang
in case that RDC protection is in place.
* tag 'imx-fixes-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
bus: imx-weim: fix valid range check
Revert "arm64: dts: imx8mn-var-som-symphony: Describe the USB-C connector"
Revert "arm64: dts: imx8mp-dhcom-pdk3: Describe the USB-C connector"
arm64: dts: tqma8mpql: fix audio codec iov-supply
arm64: dts: imx8mp: Disable UART4 by default on Data Modul i.MX8M Plus eDM SBC
Link: https://lore.kernel.org/r/20240206151744.2459-1-shawnguo2@yeah.net
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Randomly a Lenovo Z13 will trigger a kernel warning traceback from this
condition:
```
if (WARN_ON((profile < 0) || (profile >= ARRAY_SIZE(profile_names))))
```
This happens because thinkpad-acpi always assumes that
convert_dytc_to_profile() successfully updated the profile. On the
contrary a condition can occur that when dytc_profile_refresh() is called
the profile doesn't get updated as there is a -EOPNOTSUPP branch.
Catch this situation and avoid updating the profile. Also log this into
dynamic debugging in case any other modes should be added in the future.
Fixes: c3bfcd4c67 ("platform/x86: thinkpad_acpi: Add platform profile support")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20240217022311.113879-1-mario.limonciello@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
After commit b286f4e87e ("serial: core: Move tty and serdev to be
children of serial core port device") x86_instantiate_serdev() no longer
works due to the serdev-controller-device moving in the device hierarchy
from (e.g.) /sys/devices/pci0000:00/8086228A:00/serial0 to
/sys/devices/pci0000:00/8086228A:00/8086228A:00:0/8086228A:00:0.0/serial0
Use the new get_serdev_controller() helper function to fix this.
Fixes: b286f4e87e ("serial: core: Move tty and serdev to be children of serial core port device")
Cc: Tony Lindgren <tony@atomide.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240216201721.239791-4-hdegoede@redhat.com
In some cases UART attached devices which require an in kernel driver,
e.g. UART attached Bluetooth HCIs are described in the ACPI tables
by an ACPI device with a broken or missing UartSerialBusV2() resource.
This causes the kernel to create a /dev/ttyS# char-device for the UART
instead of creating an in kernel serdev-controller + serdev-device pair
for the in kernel driver.
The quirk handling in acpi_quirk_skip_serdev_enumeration() makes the kernel
create a serdev-controller device for these UARTs instead of a /dev/ttyS#.
Instantiating the actual serdev-device to bind to is up to pdx86 code,
so far this was handled by the x86-android-tablets code. But since
commit b286f4e87e ("serial: core: Move tty and serdev to be children of
serial core port device") the serdev-controller device has moved in the
device hierarchy from (e.g.) /sys/devices/pci0000:00/8086228A:00/serial0 to
/sys/devices/pci0000:00/8086228A:00/8086228A:00:0/8086228A:00:0.0/serial0 .
This makes this a bit trickier to do and another driver is in the works
which will also need this functionality.
Add a new helper to get the serdev-controller device, so that the new
code for this can be shared.
Fixes: b286f4e87e ("serial: core: Move tty and serdev to be children of serial core port device")
Cc: Tony Lindgren <tony@atomide.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240216201721.239791-3-hdegoede@redhat.com
After commit 4014ae236b ("platform/x86: x86-android-tablets: Stop using
gpiolib private APIs") the touchscreen in the keyboard half of
the Lenovo Yogabook1 X90 stopped working with the following error:
Goodix-TS i2c-goodix_ts: error -EBUSY: Failed to get irq GPIO
The problem is that when getting the IRQ for instantiated i2c_client-s
from a GPIO (rather then using an IRQ directly from the IOAPIC),
x86_acpi_irq_helper_get() now properly requests the GPIO, which disallows
other drivers from requesting it. Normally this is a good thing, but
the goodix touchscreen also uses the IRQ as an output during reset
to select which of its 2 possible I2C addresses should be used.
Add a new free_gpio flag to struct x86_acpi_irq_data to deal with this
and release the GPIO after getting the IRQ in this special case.
Fixes: 4014ae236b ("platform/x86: x86-android-tablets: Stop using gpiolib private APIs")
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240216201721.239791-2-hdegoede@redhat.com
The fields in SMCR_EL1 and SMPRI_EL1 reset to an architecturally UNKNOWN
value. Since we do not otherwise manage the traps configured in this
register at runtime we need to reconfigure them after a suspend in case
nothing else was kind enough to preserve them for us.
The vector length will be restored as part of restoring the SME state for
the next SME using task.
Fixes: a1f4ccd25c ("arm64/sme: Provide Kconfig for SME")
Reported-by: Jackson Cooper-Driver <Jackson.Cooper-Driver@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240213-arm64-sme-resume-v3-1-17e05e493471@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
This reverts commit f9daab0ad0.
Geert reports that his particular GCC 5.5 vintage toolchain fails to
build an arm64 defconfig because of this change:
| arch/arm64/include/asm/jump_label.h:25:2: error: invalid 'asm':
| invalid operand
| asm goto(
^
Aopparently, this is something we claim to support, so let's revert back
to the old jump label constraint for now while discussions about raising
the minimum GCC version are ongoing.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/CAMuHMdX+6fnAf8Hm6EqYJPAjrrLO9T7c=Gu3S8V_pqjSDowJ6g@mail.gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
Similar to gpiochip_generic_request() and gpiochip_generic_free() the
gpiochip_generic_config() function needs to handle the case where there
are no pinctrl pins mapped to the GPIOs, usually through the gpio-ranges
device tree property.
Commit f34fd6ee1b ("gpio: dwapb: Use generic request, free and
set_config") set the .set_config callback to gpiochip_generic_config()
in the dwapb GPIO driver so the GPIO API can set pinctrl configuration
for the corresponding pins. Most boards using the dwapb driver do not
set the gpio-ranges device tree property though, and in this case
gpiochip_generic_config() would return -EPROPE_DEFER rather than the
previous -ENOTSUPP return value. This in turn makes
gpio_set_config_with_argument_optional() fail and propagate the error to
any driver requesting GPIOs.
Fixes: 2956b5d94a ("pinctrl / gpio: Introduce .set_config() callback for GPIO chips")
Reported-by: Jisheng Zhang <jszhang@kernel.org>
Closes: https://lore.kernel.org/linux-gpio/ZdC_g3U4l0CJIWzh@xhacker/
Tested-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Currently, rebooting a pseries nested qemu-kvm guest (L2) results in
below error as L1 qemu sends PVR value 'arch_compat' == 0 via
ppc_set_compat ioctl. This triggers a condition failure in
kvmppc_set_arch_compat() resulting in an EINVAL.
qemu-system-ppc64: Unable to set CPU compatibility mode in KVM: Invalid
argument
Also, a value of 0 for arch_compat generally refers the default
compatibility of the host. But, arch_compat, being a Guest Wide Element
in nested API v2, cannot be set to 0 in GSB as PowerVM (L0) expects a
non-zero value. A value of 0 triggers a kernel trap during a reboot and
consequently causes it to fail:
[ 22.106360] reboot: Restarting system
KVM: unknown exit, hardware reason ffffffffffffffea
NIP 0000000000000100 LR 000000000000fe44 CTR 0000000000000000 XER 0000000020040092 CPU#0
MSR 0000000000001000 HID0 0000000000000000 HF 6c000000 iidx 3 didx 3
TB 00000000 00000000 DECR 0
GPR00 0000000000000000 0000000000000000 c000000002a8c300 000000007fe00000
GPR04 0000000000000000 0000000000000000 0000000000001002 8000000002803033
GPR08 000000000a000000 0000000000000000 0000000000000004 000000002fff0000
GPR12 0000000000000000 c000000002e10000 0000000105639200 0000000000000004
GPR16 0000000000000000 000000010563a090 0000000000000000 0000000000000000
GPR20 0000000105639e20 00000001056399c8 00007fffe54abab0 0000000105639288
GPR24 0000000000000000 0000000000000001 0000000000000001 0000000000000000
GPR28 0000000000000000 0000000000000000 c000000002b30840 0000000000000000
CR 00000000 [ - - - - - - - - ] RES 000@ffffffffffffffff
SRR0 0000000000000000 SRR1 0000000000000000 PVR 0000000000800200 VRSAVE 0000000000000000
SPRG0 0000000000000000 SPRG1 0000000000000000 SPRG2 0000000000000000 SPRG3 0000000000000000
SPRG4 0000000000000000 SPRG5 0000000000000000 SPRG6 0000000000000000 SPRG7 0000000000000000
HSRR0 0000000000000000 HSRR1 0000000000000000
CFAR 0000000000000000
LPCR 0000000000020400
PTCR 0000000000000000 DAR 0000000000000000 DSISR 0000000000000000
kernel:trap=0xffffffea | pc=0x100 | msr=0x1000
This patch updates kvmppc_set_arch_compat() to use the host PVR value if
'compat_pvr' == 0 indicating that qemu doesn't want to enforce any
specific PVR compat mode.
The relevant part of the code might need a rework if PowerVM implements
a support for `arch_compat == 0` in nestedv2 API.
Fixes: 19d31c5f11 ("KVM: PPC: Add support for nestedv2 guests")
Reviewed-by: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240207054526.3720087-1-amachhiw@linux.ibm.com
Since the merge of b717dfbf73 ("Revert "usb: typec: tcpm: fix
cc role at port reset"") into mainline the LibreTech Renegade
Elite/Firefly has died during boot, the main symptom observed in testing
is a sudden stop in console output. Gábor Stefanik identified in review
that the patch would cause power to be removed from devices without
batteries (like this board), observing that while the patch is correct
according to the spec this appears to be an oversight in the spec.
Given that the change makes previously working systems unusable let's
revert it, there was some discussion of identifying systems that have
alternative power and implementing the standards conforming behaviour in
only that case.
Fixes: b717dfbf73 ("Revert "usb: typec: tcpm: fix cc role at port reset"")
Cc: stable <stable@kernel.org>
Cc: Badhri Jagan Sridharan <badhri@google.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240212-usb-fix-renegade-v1-1-22c43c88d635@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
if you have a variable that holds NULL or a pointer to live struct mount,
do not shove ERR_PTR() into it - not if you later treat "not NULL" as
"holds a pointer to object".
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
During VMentry VERW is executed to mitigate MDS. After VERW, any memory
access like register push onto stack may put host data in MDS affected
CPU buffers. A guest can then use MDS to sample host data.
Although likelihood of secrets surviving in registers at current VERW
callsite is less, but it can't be ruled out. Harden the MDS mitigation
by moving the VERW mitigation late in VMentry path.
Note that VERW for MMIO Stale Data mitigation is unchanged because of
the complexity of per-guest conditional VERW which is not easy to handle
that late in asm with no GPRs available. If the CPU is also affected by
MDS, VERW is unconditionally executed late in asm regardless of guest
having MMIO access.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/20240213-delay-verw-v8-6-a6216d83edb7%40linux.intel.com
The VERW mitigation at exit-to-user is enabled via a static branch
mds_user_clear. This static branch is never toggled after boot, and can
be safely replaced with an ALTERNATIVE() which is convenient to use in
asm.
Switch to ALTERNATIVE() to use the VERW mitigation late in exit-to-user
path. Also remove the now redundant VERW in exc_nmi() and
arch_exit_to_user_mode().
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20240213-delay-verw-v8-4-a6216d83edb7%40linux.intel.com
Mitigation for MDS is to use VERW instruction to clear any secrets in
CPU Buffers. Any memory accesses after VERW execution can still remain
in CPU buffers. It is safer to execute VERW late in return to user path
to minimize the window in which kernel data can end up in CPU buffers.
There are not many kernel secrets to be had after SWITCH_TO_USER_CR3.
Add support for deploying VERW mitigation after user register state is
restored. This helps minimize the chances of kernel data ending up into
CPU buffers after executing VERW.
Note that the mitigation at the new location is not yet enabled.
Corner case not handled
=======================
Interrupts returning to kernel don't clear CPUs buffers since the
exit-to-user path is expected to do that anyways. But, there could be
a case when an NMI is generated in kernel after the exit-to-user path
has cleared the buffers. This case is not handled and NMI returning to
kernel don't clear CPU buffers because:
1. It is rare to get an NMI after VERW, but before returning to userspace.
2. For an unprivileged user, there is no known way to make that NMI
less rare or target it.
3. It would take a large number of these precisely-timed NMIs to mount
an actual attack. There's presumably not enough bandwidth.
4. The NMI in question occurs after a VERW, i.e. when user state is
restored and most interesting data is already scrubbed. Whats left
is only the data that NMI touches, and that may or may not be of
any interest.
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20240213-delay-verw-v8-2-a6216d83edb7%40linux.intel.com
MDS mitigation requires clearing the CPU buffers before returning to
user. This needs to be done late in the exit-to-user path. Current
location of VERW leaves a possibility of kernel data ending up in CPU
buffers for memory accesses done after VERW such as:
1. Kernel data accessed by an NMI between VERW and return-to-user can
remain in CPU buffers since NMI returning to kernel does not
execute VERW to clear CPU buffers.
2. Alyssa reported that after VERW is executed,
CONFIG_GCC_PLUGIN_STACKLEAK=y scrubs the stack used by a system
call. Memory accesses during stack scrubbing can move kernel stack
contents into CPU buffers.
3. When caller saved registers are restored after a return from
function executing VERW, the kernel stack accesses can remain in
CPU buffers(since they occur after VERW).
To fix this VERW needs to be moved very late in exit-to-user path.
In preparation for moving VERW to entry/exit asm code, create macros
that can be used in asm. Also make VERW patching depend on a new feature
flag X86_FEATURE_CLEAR_CPU_BUF.
Reported-by: Alyssa Milburn <alyssa.milburn@intel.com>
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20240213-delay-verw-v8-1-a6216d83edb7%40linux.intel.com
Debugging shows a large number of unaligned access traps in the unwinder
code. Code analysis reveals a number of issues with this code:
- handle_interruption is passed twice through
dereference_kernel_function_descriptor()
- ret_from_kernel_thread, syscall_exit, intr_return,
_switch_to_ret, and _call_on_stack are passed through
dereference_kernel_function_descriptor() even though they are
not declared as function pointers.
To fix the problems, drop one of the calls to
dereference_kernel_function_descriptor() for handle_interruption,
and compare the other pointers directly.
Fixes: 6414b30b39 ("parisc: unwind: Avoid missing prototype warning for handle_interruption()")
Fixes: 8e0ba125c2 ("parisc/unwind: fix unwinder when CONFIG_64BIT is enabled")
Cc: Helge Deller <deller@gmx.de>
Cc: Sven Schnelle <svens@stackframe.org>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Helge Deller <deller@gmx.de>
On my EliteBook 840 G8 Notebook PC (ProdId 5S7R6EC#ABD; built 2022 for
german market) the Mute LED is always on. The mute button itself works
as expected. alsa-info.sh shows a different subsystem-id 0x8ab9 for
Realtek ALC285 Codec, thus the existing quirks for HP 840 G8 don't work.
Therefore, add a new quirk for this type of EliteBook.
Signed-off-by: Hans Peter <flurry123@gmx.ch>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240219164518.4099-1-flurry123@gmx.ch
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The driver must write 0 to HALO_STATE before sending the SYSTEM_RESET
command to the firmware.
HALO_STATE is in DSP memory, which is preserved across a soft reset.
The SYSTEM_RESET command does not change the value of HALO_STATE.
There is period of time while the CS35L56 is resetting, before the
firmware has started to boot, where a read of HALO_STATE will return
the value it had before the SYSTEM_RESET. If the driver does not
clear HALO_STATE, this would return BOOT_DONE status even though the
firmware has not booted.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 8a731fd37f ("ASoC: cs35l56: Move utility functions to shared file")
Link: https://msgid.link/r/20240216140535.1434933-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
While calculating the hardware interrupt number for a MSI interrupt, the
higher bits (i.e. from bit-5 onwards a.k.a domain_nr >= 32) of the PCI
domain number gets truncated because of the shifted value casting to return
type of pci_domain_nr() which is 'int'. This for example is resulting in
same hardware interrupt number for devices 0019:00:00.0 and 0039:00:00.0.
To address this cast the PCI domain number to 'irq_hw_number_t' before left
shifting it to calculate the hardware interrupt number.
Please note that this fixes the issue only on 64-bit systems and doesn't
change the behavior for 32-bit systems i.e. the 32-bit systems continue to
have the issue. Since the issue surfaces only if there are too many PCIe
controllers in the system which usually is the case in modern server
systems and they don't tend to run 32-bit kernels.
Fixes: 3878eaefb8 ("PCI/MSI: Enhance core to support hierarchy irqdomain")
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240115135649.708536-1-vidyas@nvidia.com
RISC-V PLIC cannot "end-of-interrupt" (EOI) disabled interrupts, as
explained in the description of Interrupt Completion in the PLIC spec:
"The PLIC signals it has completed executing an interrupt handler by
writing the interrupt ID it received from the claim to the claim/complete
register. The PLIC does not check whether the completion ID is the same
as the last claim ID for that target. If the completion ID does not match
an interrupt source that *is currently enabled* for the target, the
completion is silently ignored."
Commit 69ea463021 ("irqchip/sifive-plic: Fixup EOI failed when masked")
ensured that EOI is successful by enabling interrupt first, before EOI.
Commit a1706a1c50 ("irqchip/sifive-plic: Separate the enable and mask
operations") removed the interrupt enabling code from the previous
commit, because it assumes that interrupt should already be enabled at the
point of EOI.
However, this is incorrect: there is a window after a hart claiming an
interrupt and before irq_desc->lock getting acquired, interrupt can be
disabled during this window. Thus, EOI can be invoked while the interrupt
is disabled, effectively nullify this EOI. This results in the interrupt
never gets asserted again, and the device who uses this interrupt appears
frozen.
Make sure that interrupt is really enabled before EOI.
Fixes: a1706a1c50 ("irqchip/sifive-plic: Separate the enable and mask operations")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Samuel Holland <samuel@sholland.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: linux-riscv@lists.infradead.org
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240131081933.144512-1-namcao@linutronix.de
amd_pmf_init_smart_pc() calls out to amd_pmf_get_bios_buffer() but
the error handling flow doesn't clean everything up allocated
memory.
As amd_pmf_get_bios_buffer() is only called by amd_pmf_init_smart_pc(),
fold it into the function and add labels to clean up any step that
can fail along the way. Explicitly set everything allocated to NULL as
there are other features that may access some of the same variables.
Fixes: 7c45534afa ("platform/x86/amd/pmf: Add support for PMF Policy Binary")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20240217014107.113749-3-mario.limonciello@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
TEE enact command failures are seen after each suspend/resume cycle;
fix this by cancelling the policy builder workqueue before going into
suspend and reschedule the workqueue after resume.
[ 629.516792] ccp 0000:c2:00.2: tee: command 0x5 timed out, disabling PSP
[ 629.516835] amd-pmf AMDI0102:00: TEE enact cmd failed. err: ffff000e, ret:0
[ 630.550464] amd-pmf AMDI0102:00: AMD_PMF_REGISTER_RESPONSE:1
[ 630.550511] amd-pmf AMDI0102:00: AMD_PMF_REGISTER_ARGUMENT:7
[ 630.550548] amd-pmf AMDI0102:00: AMD_PMF_REGISTER_MESSAGE:16
Fixes: ae82cef7d9 ("platform/x86/amd/pmf: Add support for PMF-TA interaction")
Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20240216064112.962582-2-Shyam-sundar.S-k@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
On some devices the ACPI name of the touchscreen is e.g. either
MSSL1680:00 or MSSL1680:01 depending on the BIOS version.
This happens for example on the "Chuwi Hi8 Air" tablet where the initial
commit's ts_data uses "MSSL1680:00" but the tablets from the github issue
and linux-hardware.org probe linked below both use "MSSL1680:01".
Replace the strcmp() match on ts_data->acpi_name with a strstarts()
check to allow using a partial match on just the ACPI HID of "MSSL1680"
and change the ts_data->acpi_name for the "Chuwi Hi8 Air" accordingly
to fix the touchscreen not working on models where it is "MSSL1680:01".
Note this drops the length check for I2C_NAME_SIZE. This never was
necessary since the ACPI names used are never more then 11 chars and
I2C_NAME_SIZE is 20 so the replaced strncmp() would always stop long
before reaching I2C_NAME_SIZE.
Link: https://linux-hardware.org/?computer=AC4301C0542A
Fixes: bbb97d728f ("platform/x86: touchscreen_dmi: Add info for the Chuwi Hi8 Air tablet")
Closes: https://github.com/onitake/gsl-firmware/issues/91
Cc: stable@vger.kernel.org
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240212120608.30469-1-hdegoede@redhat.com
Since commit 7a36b901a6 ("ACPI: OSL: Use a threaded interrupt handler
for SCI") the ACPI OSL code passes IRQF_ONESHOT when requesting the SCI.
Since the INT0002 GPIO is typically shared with the ACPI SCI the INT0002
driver must pass the same flags.
This fixes the INT0002 driver failing to probe due to following error +
as well as removing the backtrace that follows this error:
"genirq: Flags mismatch irq 9. 00000084 (INT0002) vs. 00002080 (acpi)"
Fixes: 7a36b901a6 ("ACPI: OSL: Use a threaded interrupt handler for SCI")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240210110149.12803-1-hdegoede@redhat.com
Incorporate a test case to assess the handling of invalid flags or
task__nullable parameters passed to bpf_iter_task_new(). Prior to the
preceding commit, this scenario could potentially trigger a kernel panic.
However, with the previous commit, this test case is expected to function
correctly.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240217114152.1623-3-laoar.shao@gmail.com
Failure to initialize it->pos, coupled with the presence of an invalid
value in the flags variable, can lead to it->pos referencing an invalid
task, potentially resulting in a kernel panic. To mitigate this risk, it's
crucial to ensure proper initialization of it->pos to NULL.
Fixes: ac8148d957 ("bpf: bpf_iter_task_next: use next_task(kit->task) rather than next_task(kit->pos)")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/bpf/20240217114152.1623-2-laoar.shao@gmail.com
This selftest is based on a Alexei's test adopted from an internal
user to troubleshoot another bug. During this exercise, a separate
racing bug was discovered between bpf_timer_cancel_and_free
and bpf_timer_cancel. The details can be found in the previous
patch.
This patch is to add a selftest that can trigger the bug.
I can trigger the UAF everytime in my qemu setup with KASAN. The idea
is to have multiple user space threads running in a tight loop to exercise
both bpf_map_update_elem (which calls into bpf_timer_cancel_and_free)
and bpf_timer_cancel.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20240215211218.990808-2-martin.lau@linux.dev
The following race is possible between bpf_timer_cancel_and_free
and bpf_timer_cancel. It will lead a UAF on the timer->timer.
bpf_timer_cancel();
spin_lock();
t = timer->time;
spin_unlock();
bpf_timer_cancel_and_free();
spin_lock();
t = timer->timer;
timer->timer = NULL;
spin_unlock();
hrtimer_cancel(&t->timer);
kfree(t);
/* UAF on t */
hrtimer_cancel(&t->timer);
In bpf_timer_cancel_and_free, this patch frees the timer->timer
after a rcu grace period. This requires a rcu_head addition
to the "struct bpf_hrtimer". Another kfree(t) happens in bpf_timer_init,
this does not need a kfree_rcu because it is still under the
spin_lock and timer->timer has not been visible by others yet.
In bpf_timer_cancel, rcu_read_lock() is added because this helper
can be used in a non rcu critical section context (e.g. from
a sleepable bpf prog). Other timer->timer usages in helpers.c
have been audited, bpf_timer_cancel() is the only place where
timer->timer is used outside of the spin_lock.
Another solution considered is to mark a t->flag in bpf_timer_cancel
and clear it after hrtimer_cancel() is done. In bpf_timer_cancel_and_free,
it busy waits for the flag to be cleared before kfree(t). This patch
goes with a straight forward solution and frees timer->timer after
a rcu grace period.
Fixes: b00628b1c7 ("bpf: Introduce bpf timers.")
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20240215211218.990808-1-martin.lau@linux.dev
FORTIFY_SOURCE has been ignoring 0-sized destinations while the kernel
code base has been converted to flexible arrays. In order to enforce
the 0-sized destinations (e.g. with __counted_by), the remaining 0-sized
destinations need to be handled. Unfortunately, struct vic_provinfo
resists full conversion, as it contains a flexible array of flexible
arrays, which is only possible with the 0-sized fake flexible array.
Use unsafe_memcpy() to avoid future false positives under
CONFIG_FORTIFY_SOURCE.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since there is a utility available for this, use
the API rather than open code.
Fixes: 13943d6c82 ("ionic: prevent pci disable of already disabled device")
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While working on the patchset to remove extent locking I got a lockdep
splat with fiemap and pagefaulting with my new extent lock replacement
lock.
This deadlock exists with our normal code, we just don't have lockdep
annotations with the extent locking so we've never noticed it.
Since we're copying the fiemap extent to user space on every iteration
we have the chance of pagefaulting. Because we hold the extent lock for
the entire range we could mkwrite into a range in the file that we have
mmap'ed. This would deadlock with the following stack trace
[<0>] lock_extent+0x28d/0x2f0
[<0>] btrfs_page_mkwrite+0x273/0x8a0
[<0>] do_page_mkwrite+0x50/0xb0
[<0>] do_fault+0xc1/0x7b0
[<0>] __handle_mm_fault+0x2fa/0x460
[<0>] handle_mm_fault+0xa4/0x330
[<0>] do_user_addr_fault+0x1f4/0x800
[<0>] exc_page_fault+0x7c/0x1e0
[<0>] asm_exc_page_fault+0x26/0x30
[<0>] rep_movs_alternative+0x33/0x70
[<0>] _copy_to_user+0x49/0x70
[<0>] fiemap_fill_next_extent+0xc8/0x120
[<0>] emit_fiemap_extent+0x4d/0xa0
[<0>] extent_fiemap+0x7f8/0xad0
[<0>] btrfs_fiemap+0x49/0x80
[<0>] __x64_sys_ioctl+0x3e1/0xb50
[<0>] do_syscall_64+0x94/0x1a0
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
I wrote an fstest to reproduce this deadlock without my replacement lock
and verified that the deadlock exists with our existing locking.
To fix this simply don't take the extent lock for the entire duration of
the fiemap. This is safe in general because we keep track of where we
are when we're searching the tree, so if an ordered extent updates in
the middle of our fiemap call we'll still emit the correct extents
because we know what offset we were on before.
The only place we maintain the lock is searching delalloc. Since the
delalloc stuff can change during writeback we want to lock the extent
range so we have a consistent view of delalloc at the time we're
checking to see if we need to set the delalloc flag.
With this patch applied we no longer deadlock with my testcase.
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
With the following file extent layout, defrag would do unnecessary IO
and result more on-disk space usage.
# mkfs.btrfs -f $dev
# mount $dev $mnt
# xfs_io -f -c "pwrite 0 40m" $mnt/foobar
# sync
# xfs_io -f -c "pwrite 40m 16k" $mnt/foobar
# sync
Above command would lead to the following file extent layout:
item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
generation 7 type 1 (regular)
extent data disk byte 298844160 nr 41943040
extent data offset 0 nr 41943040 ram 41943040
extent compression 0 (none)
item 7 key (257 EXTENT_DATA 41943040) itemoff 15763 itemsize 53
generation 8 type 1 (regular)
extent data disk byte 13631488 nr 16384
extent data offset 0 nr 16384 ram 16384
extent compression 0 (none)
Which is mostly fine. We can allow the final 16K to be merged with the
previous 40M, but it's upon the end users' preference.
But if we defrag the file using the default parameters, it would result
worse file layout:
# btrfs filesystem defrag $mnt/foobar
# sync
item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
generation 7 type 1 (regular)
extent data disk byte 298844160 nr 41943040
extent data offset 0 nr 8650752 ram 41943040
extent compression 0 (none)
item 7 key (257 EXTENT_DATA 8650752) itemoff 15763 itemsize 53
generation 9 type 1 (regular)
extent data disk byte 340787200 nr 33292288
extent data offset 0 nr 33292288 ram 33292288
extent compression 0 (none)
item 8 key (257 EXTENT_DATA 41943040) itemoff 15710 itemsize 53
generation 8 type 1 (regular)
extent data disk byte 13631488 nr 16384
extent data offset 0 nr 16384 ram 16384
extent compression 0 (none)
Note the original 40M extent is still there, but a new 32M extent is
created for no benefit at all.
[CAUSE]
There is an existing check to make sure we won't defrag a large enough
extent (the threshold is by default 32M).
But the check is using the length to the end of the extent:
range_len = em->len - (cur - em->start);
/* Skip too large extent */
if (range_len >= extent_thresh)
goto next;
This means, for the first 8MiB of the extent, the range_len is always
smaller than the default threshold, and would not be defragged.
But after the first 8MiB, the remaining part would fit the requirement,
and be defragged.
Such different behavior inside the same extent caused the above problem,
and we should avoid different defrag decision inside the same extent.
[FIX]
Instead of using @range_len, just use @em->len, so that we have a
consistent decision among the same file extent.
Now with this fix, we won't touch the extent, thus not making it any
worse.
Reported-by: Filipe Manana <fdmanana@suse.com>
Fixes: 0cb5950f3f ("btrfs: fix deadlock when reserving space during defrag")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Platform clock and phy error resources are not cleaned up in Xilinx GT PHY
error path.
To fix introduce the function ceva_ahci_platform_enable_resources() which
is a customized version of ahci_platform_enable_resources() and inline with
SATA IP programming sequence it does:
- Assert SATA reset
- Program PS GTR phy
- Bring SATA by de-asserting the reset
- Wait for GT lane PLL to be locked
ceva_ahci_platform_enable_resources() is also used in the resume path
as the same SATA programming sequence (as in probe) should be followed.
Also cleanup the mixed usage of ahci_platform_enable_resources() and custom
implementation in the probe function as both are not required.
Fixes: 9a9d3abe24 ("ata: ahci: ceva: Update the driver to support xilinx GT phy")
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
The ASM1064 SATA host controller always reports wrongly,
that it has 24 ports. But in reality, it only has four ports.
before:
ahci 0000:04:00.0: SSS flag set, parallel bus scan disabled
ahci 0000:04:00.0: AHCI 0001.0301 32 slots 24 ports 6 Gbps 0xffff0f impl SATA mode
ahci 0000:04:00.0: flags: 64bit ncq sntf stag pm led only pio sxs deso sadm sds apst
after:
ahci 0000:04:00.0: ASM1064 has only four ports
ahci 0000:04:00.0: forcing port_map 0xffff0f -> 0xf
ahci 0000:04:00.0: SSS flag set, parallel bus scan disabled
ahci 0000:04:00.0: AHCI 0001.0301 32 slots 24 ports 6 Gbps 0xf impl SATA mode
ahci 0000:04:00.0: flags: 64bit ncq sntf stag pm led only pio sxs deso sadm sds apst
Signed-off-by: "Andrey Jr. Melnikov" <temnota.am@gmail.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
In bond priority testing, we set the primary interface to eth1 and add
eth0,1,2 to bond in serial. This is OK in normal times. But when in
debug kernel, the bridge port that eth0,1,2 connected would start
slowly (enter blocking, forwarding state), which caused the primary
interface down for a while after enslaving and active slave changed.
Here is a test log from Jakub's debug test[1].
[ 400.399070][ T50] br0: port 1(s0) entered disabled state
[ 400.400168][ T50] br0: port 4(s2) entered disabled state
[ 400.941504][ T2791] bond0: (slave eth0): making interface the new active one
[ 400.942603][ T2791] bond0: (slave eth0): Enslaving as an active interface with an up link
[ 400.943633][ T2766] br0: port 1(s0) entered blocking state
[ 400.944119][ T2766] br0: port 1(s0) entered forwarding state
[ 401.128792][ T2792] bond0: (slave eth1): making interface the new active one
[ 401.130771][ T2792] bond0: (slave eth1): Enslaving as an active interface with an up link
[ 401.131643][ T69] br0: port 2(s1) entered blocking state
[ 401.132067][ T69] br0: port 2(s1) entered forwarding state
[ 401.346201][ T2793] bond0: (slave eth2): Enslaving as a backup interface with an up link
[ 401.348414][ T50] br0: port 4(s2) entered blocking state
[ 401.348857][ T50] br0: port 4(s2) entered forwarding state
[ 401.519669][ T250] bond0: (slave eth0): link status definitely down, disabling slave
[ 401.526522][ T250] bond0: (slave eth1): link status definitely down, disabling slave
[ 401.526986][ T250] bond0: (slave eth2): making interface the new active one
[ 401.629470][ T250] bond0: (slave eth0): link status definitely up
[ 401.630089][ T250] bond0: (slave eth1): link status definitely up
[...]
# TEST: prio (active-backup ns_ip6_target primary_reselect 1) [FAIL]
# Current active slave is eth2 but not eth1
Fix it by setting active slave to primary slave specifically before
testing.
[1] https://netdev-3.bots.linux.dev/vmksft-bonding-dbg/results/464301/1-bond-options-sh/stdout
Fixes: 481b56e039 ("selftests: bonding: re-format bond option tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before commit 07c30ea586 ("serial: Do not hold the port lock when setting
rx-during-tx GPIO") the SER_RS485_RX_DURING_TX flag was only set if the
rx-during-tx mode was not controlled by a GPIO. Now the flag is set
unconditionally when RS485 is enabled. This results in an incorrect setting
if the rx-during-tx GPIO is not asserted.
Fix this by setting the flag only if the rx-during-tx mode is not
controlled by a GPIO and thus restore the correct behaviour.
Cc: stable@vger.kernel.org # 6.6+
Fixes: 07c30ea586 ("serial: Do not hold the port lock when setting rx-during-tx GPIO")
Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Link: https://lore.kernel.org/r/20240216224709.9928-1-l.sanfilippo@kunbus.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is observed sometimes when tethering is used over NCM with Windows 11
as host, at some instances, the gadget_giveback has one byte appended at
the end of a proper NTB. When the NTB is parsed, unwrap call looks for
any leftover bytes in SKB provided by u_ether and if there are any pending
bytes, it treats them as a separate NTB and parses it. But in case the
second NTB (as per unwrap call) is faulty/corrupt, all the datagrams that
were parsed properly in the first NTB and saved in rx_list are dropped.
Adding a few custom traces showed the following:
[002] d..1 7828.532866: dwc3_gadget_giveback: ep1out:
req 000000003868811a length 1025/16384 zsI ==> 0
[002] d..1 7828.532867: ncm_unwrap_ntb: K: ncm_unwrap_ntb toprocess: 1025
[002] d..1 7828.532867: ncm_unwrap_ntb: K: ncm_unwrap_ntb nth: 1751999342
[002] d..1 7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb seq: 0xce67
[002] d..1 7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb blk_len: 0x400
[002] d..1 7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb ndp_len: 0x10
[002] d..1 7828.532869: ncm_unwrap_ntb: K: Parsed NTB with 1 frames
In this case, the giveback is of 1025 bytes and block length is 1024.
The rest 1 byte (which is 0x00) won't be parsed resulting in drop of
all datagrams in rx_list.
Same is case with packets of size 2048:
[002] d..1 7828.557948: dwc3_gadget_giveback: ep1out:
req 0000000011dfd96e length 2049/16384 zsI ==> 0
[002] d..1 7828.557949: ncm_unwrap_ntb: K: ncm_unwrap_ntb nth: 1751999342
[002] d..1 7828.557950: ncm_unwrap_ntb: K: ncm_unwrap_ntb blk_len: 0x800
Lecroy shows one byte coming in extra confirming that the byte is coming
in from PC:
Transfer 2959 - Bytes Transferred(1025) Timestamp((18.524 843 590)
- Transaction 8391 - Data(1025 bytes) Timestamp(18.524 843 590)
--- Packet 4063861
Data(1024 bytes)
Duration(2.117us) Idle(14.700ns) Timestamp(18.524 843 590)
--- Packet 4063863
Data(1 byte)
Duration(66.160ns) Time(282.000ns) Timestamp(18.524 845 722)
According to Windows driver, no ZLP is needed if wBlockLength is non-zero,
because the non-zero wBlockLength has already told the function side the
size of transfer to be expected. However, there are in-market NCM devices
that rely on ZLP as long as the wBlockLength is multiple of wMaxPacketSize.
To deal with such devices, it pads an extra 0 at end so the transfer is no
longer multiple of wMaxPacketSize.
Cc: <stable@vger.kernel.org>
Fixes: 9f6ce4240a ("usb: gadget: f_ncm.c added")
Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Link: https://lore.kernel.org/r/20240205074650.200304-1-quic_kriskura@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The reverted commit makes the state machine only ever go from SRC_ATTACH_WAIT
to SNK_TRY in endless loop when toggling. After revert it goes to SRC_ATTACHED
after initially trying SNK_TRY earlier, as it should for toggling to ever detect
the power source mode and the port is again able to provide power to attached
power sinks.
This reverts commit 2d6d801270.
Cc: stable@vger.kernel.org
Fixes: 2d6d801270 ("usb: typec: tcpm: reset counter when enter into unattached state after try role")
Signed-off-by: Ondrej Jirman <megi@xff.cz>
Link: https://lore.kernel.org/r/20240217162023.1719738-1-megi@xff.cz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
829 if (request->complete) {
830 spin_unlock(&priv_dev->lock);
831 usb_gadget_giveback_request(&priv_ep->endpoint,
832 request);
833 spin_lock(&priv_dev->lock);
834 }
835
836 if (request->buf == priv_dev->zlp_buf)
837 cdns3_gadget_ep_free_request(&priv_ep->endpoint, request);
Driver append an additional zero packet request when queue a packet, which
length mod max packet size is 0. When transfer complete, run to line 831,
usb_gadget_giveback_request() will free this requestion. 836 condition is
true, so cdns3_gadget_ep_free_request() free this request again.
Log:
[ 1920.140696][ T150] BUG: KFENCE: use-after-free read in cdns3_gadget_giveback+0x134/0x2c0 [cdns3]
[ 1920.140696][ T150]
[ 1920.151837][ T150] Use-after-free read at 0x000000003d1cd10b (in kfence-#36):
[ 1920.159082][ T150] cdns3_gadget_giveback+0x134/0x2c0 [cdns3]
[ 1920.164988][ T150] cdns3_transfer_completed+0x438/0x5f8 [cdns3]
Add check at line 829, skip call usb_gadget_giveback_request() if it is
additional zero length packet request. Needn't call
usb_gadget_giveback_request() because it is allocated in this driver.
Cc: stable@vger.kernel.org
Fixes: 7733f6c32e ("usb: cdns3: Add Cadence USB3 DRD Driver")
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Acked-by: Peter Chen <peter.chen@kernel.org>
Link: https://lore.kernel.org/r/20240202154217.661867-2-Frank.Li@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In current design, usb role class driver will get usb_role_switch parent's
module reference after the user get usb_role_switch device and put the
reference after the user put the usb_role_switch device. However, the
parent device of usb_role_switch may be removed before the user put the
usb_role_switch. If so, then, NULL pointer issue will be met when the user
put the parent module's reference.
This will save the module pointer in structure of usb_role_switch. Then,
we don't need to find module by iterating long relations.
Fixes: 5c54fcac9a ("usb: roles: Take care of driver module reference counting")
cc: stable@vger.kernel.org
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20240129093739.2371530-1-xu.yang_2@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
host.c file has some parts of code that were introduced for CDNS3 driver
and should not be used with CDNSP driver.
This patch blocks using these parts of codes by CDNSP driver.
These elements include:
- xhci_plat_cdns3_xhci object
- cdns3 specific XECP_PORT_CAP_REG register
- cdns3 specific XECP_AUX_CTRL_REG1 register
cc: stable@vger.kernel.org
Fixes: 3d82904559 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
Signed-off-by: Pawel Laszczak <pawell@cadence.com>
Link: https://lore.kernel.org/r/20240206104018.48272-1-pawell@cadence.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The local helper function to compare the given pair of cycle count
evaluates them. If the left value is less than the right value, the
function returns negative value.
If the safe cycle is less than the current cycle, it is the case of
cycle lost. However, it is not currently handled properly.
This commit fixes the bug.
Cc: <stable@vger.kernel.org>
Fixes: 705794c53b ("ALSA: firewire-lib: check cycle continuity")
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20240218033026.72577-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This fixes relying upon linux/of_platform.h to include
linux/platform_device.h, which it no longer does, thereby fixing
compilation problems like:
In file included from drivers/usb/host/uhci-hcd.c:850:
drivers/usb/host/uhci-grlib.c: In function 'uhci_hcd_grlib_probe':
drivers/usb/host/uhci-grlib.c:92:29: error: invalid use of undefined type 'struct platform_device'
92 | struct device_node *dn = op->dev.of_node;
| ^~
Fixes: ef175b29a2 ("of: Stop circularly including of_device.h and of_platform.h")
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Link: https://lore.kernel.org/r/20240129075056.1511630-1-andreas@gaisler.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull Kbuild fixes from Masahiro Yamada:
- Reformat nested if-conditionals in Makefiles with 4 spaces
- Fix CONFIG_DEBUG_INFO_BTF builds for big endian
- Fix modpost for module srcversion
- Fix an escape sequence warning in gen_compile_commands.py
- Fix kallsyms to ignore ARMv4 thunk symbols
* tag 'kbuild-fixes-v6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kallsyms: ignore ARMv4 thunks along with others
modpost: trim leading spaces when processing source files list
gen_compile_commands: fix invalid escape sequence warning
kbuild: Fix changing ELF file type for output of gen_btf for big endian
docs: kconfig: Fix grammar and formatting
kbuild: use 4-space indentation when followed by conditionals
Pull x86 fix from Borislav Petkov:
- Use a GB page for identity mapping only when memory of this size is
requested so that mapping of reserved regions is prevented which
would otherwise lead to system crashes on UV machines
* tag 'x86_urgent_for_v6.8_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
Pull irq fixes from Borislav Petkov:
- Fix GICv4.1 affinity update
- Restore a quirk for ACPI-based GICv4 systems
- Handle non-coherent GICv4 redistributors properly
- Prevent spurious interrupts on Broadcom devices using GIC v3
architecture
- Other minor fixes
* tag 'irq_urgent_for_v6.8_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Fix GICv4.1 VPE affinity update
irqchip/gic-v3-its: Restore quirk probing for ACPI-based systems
irqchip/gic-v3-its: Handle non-coherent GICv4 redistributors
irqchip/qcom-mpm: Fix IS_ERR() vs NULL check in qcom_mpm_init()
irqchip/loongson-eiointc: Use correct struct type in eiointc_domain_alloc()
irqchip/irq-brcmstb-l2: Add write memory barrier before exit
Pull i2c fixes from Wolfram Sang:
"Two fixes for i801 and qcom-geni devices. Meanwhile, a fix from Arnd
addresses a compilation error encountered during compile test on
powerpc"
* tag 'i2c-for-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: i801: Fix block process call transactions
i2c: pasemi: split driver into two separate modules
i2c: qcom-geni: Correct I2C TRE sequence
Justin Chen says:
====================
net: bcmasp: bug fixes for bcmasp
Fix two bugs.
- Indicate that PM is managed by mac to prevent double pm calls. This
doesn't lead to a crash, but waste a noticable amount of time
suspending/resuming.
- Sanity check for OOB write was off by one. Leading to a false error
when using the full array.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A sanity check for OOB write is off by one leading to a false positive
when the array is full.
Fixes: 9b90aca97f ("net: ethernet: bcmasp: fix possible OOB write in bcmasp_netfilt_get_all_active()")
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid the PHY library call unnecessarily into the suspend/resume
functions by setting phydev->mac_managed_pm to true. The ASP driver
essentially does exactly what mdio_bus_phy_resume() does.
Fixes: 490cb41200 ("net: bcmasp: Add support for ASP2.0 Ethernet controller")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthieu Baerts says:
====================
mptcp: misc. fixes for v6.8
This series includes 4 types of fixes:
Patches 1 and 2 force the path-managers not to allocate a new address
entry when dealing with the "special" ID 0, reserved to the address of
the initial subflow. These patches can be backported up to v5.19 and
v5.12 respectively.
Patch 3 to 6 fix the in-kernel path-manager not to create duplicated
subflows. Patch 6 is the main fix, but patches 3 to 5 are some kind of
pre-requisities: they fix some data races that could also lead to the
creation of unexpected subflows. These patches can be backported up to
v5.7, v5.10, v6.0, and v5.15 respectively.
Note that patch 3 modifies the existing ULP API. No better solutions
have been found for -net, and there is some similar prior art, see
commit 0df48c26d8 ("tcp: add tcpi_bytes_acked to tcp_info"). Please
also note that TLS ULP Diag has likely the same issue.
Patches 7 to 9 fix issues in the selftests, when executing them on older
kernels, e.g. when testing the last version of these kselftests on the
v5.15.148 kernel as it is done by LKFT when validating stable kernels.
These patches only avoid printing expected errors the console and
marking some tests as "OK" while they have been skipped. Patches 7 and 8
can be backported up to v6.6.
Patches 10 to 13 make sure all MPTCP selftests subtests have a unique
name. It is important to have a unique (sub)test name in TAP, because
that's the test identifier. Some CI environments might drop tests with
duplicated names. Patches 10 to 12 can be backported up to v6.6.
====================
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is important to have a unique (sub)test name in TAP, because some CI
environments drop tests with duplicated name.
Some 'cestab' subtests from the diag selftest had the same names, e.g.:
....chk 0 cestab
Now the previous value is taken, to have different names, e.g.:
....chk 2->0 cestab after flush
While at it, the 'after flush' info is added, similar to what is done
with the 'in use' subtests. Also inspired by these 'in use' subtests,
'many' is displayed instead of a large number:
many msk socket present [ ok ]
....chk many msk in use [ ok ]
....chk many cestab [ ok ]
....chk many->0 msk in use after flush [ ok ]
....chk many->0 cestab after flush [ ok ]
Fixes: 81ab772819 ("selftests: mptcp: diag: check CURRESTAB counters")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is important to have a unique (sub)test name in TAP, because some CI
environments drop tests with duplicated name.
Some 'in use' subtests from the diag selftest had the same names, e.g.:
chk 0 msk in use after flush
Now the previous value is taken, to have different names, e.g.:
chk 2->0 msk in use after flush
While at it, avoid repeating the full message, declare it once in the
helper.
Fixes: ce99025736 ("selftests: mptcp: diag: format subtests results in TAP")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is important to have a unique (sub)test name in TAP, because some CI
environments drop tests with duplicated names.
Some subtests from the userspace_pm selftest had the same names. That's
because different subflows are created (and deleted) between the same
pair of IP addresses.
Simply adding the destination port in the name is then enough to have
different names, because the destination port is always different.
Note that adding such info takes a bit more space, so we need to
increase a bit the width to print the name, simply to keep all the
'[ OK ]' aligned as before.
Fixes: f589234e1a ("selftests: mptcp: userspace_pm: format subtests results in TAP")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The selftest was correctly recording all the results, but the 'reverse
direction' part was missing in the name when needed.
It is important to have a unique (sub)test name in TAP, because some CI
environments drop tests with duplicated name.
Fixes: 675d99338e ("selftests: mptcp: simult flows: format subtests results in TAP")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the 'Fixes' commit mentioned below, the command that is executed
in __chk_nr() helper can return nothing if the feature is not supported.
This is the case when the MPTCP CURRESTAB counter is not supported.
To avoid this warning ...
./diag.sh: line 65: [: !=: unary operator expected
... we just need to surround '$nr' with double quotes, to support an
empty string when the feature is not supported.
Fixes: 81ab772819 ("selftests: mptcp: diag: check CURRESTAB counters")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the 'Fixes' commit mentioned below, and if the kernel being tested
doesn't support the 'fullmesh' flag, this error will be printed:
netlink error -22 (Invalid argument)
./pm_nl_ctl: bailing out due to netlink error[s]
But that can be normal if the kernel doesn't support the feature, no
need to print this worrying error message while everything else looks
OK. So we can mute stderr. Failures will still be detected if any.
Fixes: 1dc88d241f ("selftests: mptcp: pm_nl_ctl: always look for errors")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the feature is not supported by older kernels, and instead of just
ignoring some tests, we should mark them as skipped, so we can still
track them.
Fixes: d85555ac11 ("selftests: mptcp: pm_netlink: format subtests results in TAP")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fullmesh endpoints could end-up unexpectedly generating duplicate
subflows - same local and remote addresses - when multiple incoming
ADD_ADDR are processed before the PM creates the subflow for the local
endpoints.
Address the issue explicitly checking for duplicates at subflow
creation time.
To avoid a quadratic computational complexity, track the unavailable
remote address ids in a temporary bitmap and initialize such bitmap
with the remote ids of all the existing subflows matching the local
address currently processed.
The above allows additionally replacing the existing code checking
for duplicate entry in the current set with a simple bit test
operation.
Fixes: 2843ff6f36 ("mptcp: remote addresses fullmesh")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/435
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The local address id is accessed lockless by the NL PM, add
all the required ONCE annotation. There is a caveat: the local
id can be initialized late in the subflow life-cycle, and its
validity is controlled by the local_id_valid flag.
Remove such flag and encode the validity in the local_id field
itself with negative value before initialization. That allows
accessing the field consistently with a single read operation.
Fixes: 0ee4261a36 ("mptcp: implement mptcp_pm_remove_subflow")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the introduction of the subflow ULP diag interface, the
dump callback accessed all the subflow data with lockless.
We need either to annotate all the read and write operation accordingly,
or acquire the subflow socket lock. Let's do latter, even if slower, to
avoid a diffstat havoc.
Fixes: 5147dfb508 ("mptcp: allow dumping subflow context to userspace")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just the same as userspace PM, a new parameter needs_id is added for
in-kernel PM mptcp_pm_nl_append_new_local_addr() too.
Add a new helper mptcp_pm_has_addr_attr_id() to check whether an address
ID is set from PM or not.
In mptcp_pm_nl_get_local_id(), needs_id is always true, but in
mptcp_pm_nl_add_addr_doit(), pass mptcp_pm_has_addr_attr_id() to
needs_it.
Fixes: efd5a4c04e ("mptcp: add the address ID assignment bitmap")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When userspace PM requires to create an ID 0 subflow in "userspace pm
create id 0 subflow" test like this:
userspace_pm_add_sf $ns2 10.0.3.2 0
An ID 1 subflow, in fact, is created.
Since in mptcp_pm_nl_append_new_local_addr(), 'id 0' will be treated as
no ID is set by userspace, and will allocate a new ID immediately:
if (!e->addr.id)
e->addr.id = find_next_zero_bit(pernet->id_bitmap,
MPTCP_PM_MAX_ADDR_ID + 1,
1);
To solve this issue, a new parameter needs_id is added for
mptcp_userspace_pm_append_new_local_addr() to distinguish between
whether userspace PM has set an ID 0 or whether userspace PM has
not set any address.
needs_id is true in mptcp_userspace_pm_get_local_id(), but false in
mptcp_pm_nl_announce_doit() and mptcp_pm_nl_subflow_create_doit().
Fixes: e5ed101a60 ("mptcp: userspace pm allow creating id 0 subflow")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
inet: fix NLM_F_DUMP_INTR logic
Make sure NLM_F_DUMP_INTR is generated if dev_base_seq and
dev_addr_genid are changed by the same amount.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net->dev_base_seq and ipv6.dev_addr_genid are monotonically increasing.
If we XOR their values, we could miss to detect if both values
were changed with the same amount.
Fixes: 63998ac24f ("ipv6: provide addr and netconf dump consistency info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net->dev_base_seq and ipv4.dev_addr_genid are monotonically increasing.
If we XOR their values, we could miss to detect if both values
were changed with the same amount.
Fixes: 0465277f6b ("ipv4: provide addr and netconf dump consistency info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Hong found that when using ftruncate to expand an empty file,
exfat_ent_set() will fail if discontinuous clusters are allocated.
The reason is that the empty file does not have a cluster chain,
but exfat_ent_set() attempts to append the newly allocated cluster
to the cluster chain. In addition, exfat_find_last_cluster() only
supports finding the last cluster in a non-empty file.
So this commit adds a check whether the file is empty. If the file
is empty, exfat_find_last_cluster() and exfat_ent_set() are no longer
called as they do not need to be called.
Fixes: f55c096f62 ("exfat: do not zero the extended part")
Reported-by: Eric Hong <erichong@qnap.com>
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Pull powerpc fixes from Michael Ellerman:
"This is a bit of a big batch for rc4, but just due to holiday hangover
and because I didn't send any fixes last week due to a late revert
request. I think next week should be back to normal.
- Fix ftrace bug on boot caused by exit text sections with
'-fpatchable-function-entry'
- Fix accuracy of stolen time on pseries since the switch to
VIRT_CPU_ACCOUNTING_GEN
- Fix a crash in the IOMMU code when doing DLPAR remove
- Set pt_regs->link on scv entry to fix BPF stack unwinding
- Add missing PPC_FEATURE_BOOKE on 64-bit e5500/e6500, which broke
gdb
- Fix boot on some 6xx platforms with STRICT_KERNEL_RWX enabled
- Fix build failures with KASAN enabled and 32KB stack size
- Some other minor fixes
Thanks to Arnd Bergmann, Benjamin Gray, Christophe Leroy, David
Engraf, Gaurav Batra, Jason Gunthorpe, Jiangfeng Xiao, Matthias
Schiffer, Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nysal Jan K.A,
R Nageswara Sastry, Shivaprasad G Bhat, Shrikanth Hegde, Spoorthy,
Srikar Dronamraju, and Venkat Rao Bagalkote"
* tag 'powerpc-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/iommu: Fix the missing iommu_group_put() during platform domain attach
powerpc/pseries: fix accuracy of stolen time
powerpc/ftrace: Ignore ftrace locations in exit text sections
powerpc/cputable: Add missing PPC_FEATURE_BOOKE on PPC64 Book-E
powerpc/kasan: Limit KASAN thread size increase to 32KB
Revert "powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add"
powerpc: 85xx: mark local functions static
powerpc: udbg_memcons: mark functions static
powerpc/kasan: Fix addr error caused by page alignment
powerpc/6xx: set High BAT Enable flag on G2_LE cores
selftests/powerpc/papr_vpd: Check devfd before get_system_loc_code()
powerpc/64: Set task pt_regs->link to the LR value on scv entry
powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add
powerpc/pseries/papr-sysparm: use u8 arrays for payloads
Pull bcachefs fixes from Kent Overstreet:
"Mostly pretty trivial, the user visible ones are:
- don't barf when replicas_required > replicas
- fix check_version_upgrade() so it doesn't do something nonsensical
when we're downgrading"
* tag 'bcachefs-2024-02-17' of https://evilpiepirate.org/git/bcachefs:
bcachefs: Fix missing va_end()
bcachefs: Fix check_version_upgrade()
bcachefs: Clamp replicas_required to replicas
bcachefs: fix missing endiannes conversion in sb_members
bcachefs: fix kmemleak in __bch2_read_super error handling path
bcachefs: Fix missing bch2_err_class() calls
If 'dev' or 'data' is NULL, the 'priv' variable has an incorrect address
when dereferencing calling netdev_err().
Since we get as 'dev_id' or 'data' what was passed as the 'dev' argument
to request_irq() during interrupt initialization (that is, the net_device
and rx/tx queue pointers initialized at the time of the call) and since
there are usually no checks for the 'dev_id' argument in such handlers
in other drivers, remove these checks from the handlers in stmmac driver.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 8532f613bc ("net: stmmac: introduce MSI Interrupt routines for mac, safety, RX & TX")
Signed-off-by: Pavel Sakharov <p.sakharov@ispras.ru>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull driver core fixes from Greg KH:
"Here are some driver core fixes, a kobject fix, and a documentation
update for 6.8-rc5. In detail these changes are:
- devlink fixes for reported issues with 6.8-rc1
- topology scheduling regression fix that has been reported by many
- kobject loosening of checks change in -rc1 is now reverted as some
codepaths seemed to need the checks
- documentation update for the CVE process. Has been reviewed by
many, the last minute change to the document was to bring the .rst
format back into the the new style rules, the contents did not
change.
All of these, except for the documentation update, have been in
linux-next for over a week. The documentation update has been reviewed
for weeks by a group of developers, and in public for a week and the
wording has stabilized for now. If future changes are needed, we can
do so before 6.8-final is out (or anytime after that)"
* tag 'driver-core-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Documentation: Document the Linux Kernel CVE process
Revert "kobject: Remove redundant checks for whether ktype is NULL"
driver core: fw_devlink: Improve logs for cycle detection
driver core: fw_devlink: Improve detection of overlapping cycles
driver core: Fix device_link_flag_is_sync_state_only()
topology: Set capacity_freq_ref in all cases
Pull char / miscdriver fixes from Greg KH:
"Here is a small set of char/misc and IIO driver fixes for 6.8-rc5.
Included in here are:
- lots of iio driver fixes for reported issues
- nvmem device naming fixup for reported problem
- interconnect driver fixes for reported issues
All of these have been in linux-next for a while with no reported the
issues (the nvmem patch was included in a different branch in
linux-next before sent to me for inclusion here)"
* tag 'char-misc-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
nvmem: include bit index in cell sysfs file name
iio: adc: ad4130: only set GPIO_CTRL if pin is unused
iio: adc: ad4130: zero-initialize clock init data
interconnect: qcom: x1e80100: Add missing ACV enable_mask
interconnect: qcom: sm8650: Use correct ACV enable_mask
iio: accel: bma400: Fix a compilation problem
iio: commom: st_sensors: ensure proper DMA alignment
iio: hid-sensor-als: Return 0 for HID_USAGE_SENSOR_TIME_TIMESTAMP
iio: move LIGHT_UVA and LIGHT_UVB to the end of iio_modifier
staging: iio: ad5933: fix type mismatch regression
iio: humidity: hdc3020: fix temperature offset
iio: adc: ad7091r8: Fix error code in ad7091r8_gpio_setup()
iio: adc: ad_sigma_delta: ensure proper DMA alignment
iio: imu: adis: ensure proper DMA alignment
iio: humidity: hdc3020: Add Makefile, Kconfig and MAINTAINERS entry
iio: imu: bno055: serdev requires REGMAP
iio: magnetometer: rm3100: add boundary check for the value read from RM3100_REG_TMRC
iio: pressure: bmp280: Add missing bmp085 to SPI id table
iio: core: fix memleak in iio_device_register_sysfs
interconnect: qcom: sm8550: Enable sync_state
...
Pull tty / serial fixes from Greg KH:
"Here are three small tty and serial driver fixes for 6.8-rc5:
- revert a 8250_pci1xxxx off-by-one change that was incorrect
- two changes to fix the transmit path of the mxs-auart driver,
fixing a regression in the 6.2 release
All of these have been in linux-next for over a week with no reported
issues"
* tag 'tty-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: mxs-auart: fix tx
serial: core: introduce uart_port_tx_flags()
serial: 8250_pci1xxxx: partially revert off by one patch
Pull USB / Thunderbolt fixes from Greg KH:
"Here are two small fixes for 6.8-rc5:
- thunderbolt to fix a reported issue on many platforms
- dwc3 driver revert of a commit that caused problems in -rc1
Both of these changes have been in linux-next for over a week with no
reported issues"
* tag 'usb-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
Revert "usb: dwc3: Support EBC feature of DWC_usb31"
thunderbolt: Fix setting the CNS bit in ROUTER_CS_5
Pull media fixes from Mauro Carvalho Chehab:
- regression fix for rkisp1 shared IRQ logic
- fix atomisp breakage due to a kAPI change
- permission fix for remote controller BPF support
- memleak fix in ir_toy driver
- Kconfig dependency fix for pwm-ir-rx
* tag 'media/v6.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: pwm-ir-tx: Depend on CONFIG_HIGH_RES_TIMERS
media: ir_toy: fix a memleak in irtoy_tx
media: rc: bpf attach/detach requires write permission
media: atomisp: Adjust for v4l2_subdev_state handling changes in 6.8
media: rkisp1: Fix IRQ handling due to shared interrupts
media: Revert "media: rkisp1: Drop IRQF_SHARED"
Pull pci fixes from Bjorn Helgaas:
- Keep bridges in D0 if we need to poll downstream devices for PME to
resolve a v6.6 regression where we failed to enumerate devices below
bridges put in D3hot by runtime PM, e.g., NVMe drives connected via
Thunderbolt or USB4 docks (Alex Williamson)
- Add Siddharth Vadapalli as PCI TI DRA7XX/J721E reviewer
* tag 'pci-v6.8-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
MAINTAINERS: Add Siddharth Vadapalli as PCI TI DRA7XX/J721E reviewer
PCI: Fix active state requirement in PME polling
Pull probes fix from Masami Hiramatsu:
- tracing/probes: Fix BTF structure member finder to find the members
which are placed after any anonymous union member correctly.
* tag 'probes-fixes-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/probes: Fix to search structure fields correctly
Pull smb client fixes from Steve French:
"Five smb3 client fixes, most also for stable:
- Two multichannel fixes (one to fix potential handle leak on retry)
- Work around possible serious data corruption (due to change in
folios in 6.3, for cases when non standard maximum write size
negotiated)
- Symlink creation fix
- Multiuser automount fix"
* tag '6.8-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: Fix regression in writes when non-standard maximum write size negotiated
smb: client: handle path separator of created SMB symlinks
smb: client: set correct id, uid and cruid for multiuser automounts
cifs: update the same create_guid on replay
cifs: fix underflow in parse_server_interfaces()
Fix to search a field from the structure which has anonymous union
correctly.
Since the reference `type` pointer was updated in the loop, the search
loop suddenly aborted where it hits an anonymous union. Thus it can not
find the field after the anonymous union. This avoids updating the
cursor `type` pointer in the loop.
Link: https://lore.kernel.org/all/170791694361.389532.10047514554799419688.stgit@devnote2/
Fixes: 302db0f5b3 ("tracing/probes: Add a function to search a member of a struct/union")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Three fixes are included here. Two are strictly hardware-related
for the i801 and qcom-geni devices. Meanwhile, a fix from Arnd
addresses a compilation error encountered during compile test on
powerpc.
The Linux CXL subsystem is built on the assumption that HPA == SPA.
That is, the host physical address (HPA) the HDM decoder registers are
programmed with are system physical addresses (SPA).
During HDM decoder setup, the DVSEC CXL range registers (cxl-3.1,
8.1.3.8) are checked if the memory is enabled and the CXL range is in
a HPA window that is described in a CFMWS structure of the CXL host
bridge (cxl-3.1, 9.18.1.3).
Now, if the HPA is not an SPA, the CXL range does not match a CFMWS
window and the CXL memory range will be disabled then. The HDM decoder
stops working which causes system memory being disabled and further a
system hang during HDM decoder initialization, typically when a CXL
enabled kernel boots.
Prevent a system hang and do not disable the HDM decoder if the
decoder's CXL range is not found in a CFMWS window.
Note the change only fixes a hardware hang, but does not implement
HPA/SPA translation. Support for this can be added in a follow on
patch series.
Signed-off-by: Robert Richter <rrichter@amd.com>
Fixes: 34e37b4c43 ("cxl/port: Enable HDM Capability after validating DVSEC Ranges")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240216160113.407141-1-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Current implementation exports only to
/sys/bus/cxl/devices/.../memN/qos_class. With both ram and pmem exposed,
the second registered sysfs attribute is rejected as duplicate. It's not
possible to create qos_class under the dev_groups via the driver due to
the ram and pmem sysfs sub-directories already created by the device sysfs
groups. Move the ram and pmem qos_class to the device sysfs groups and add
a call to sysfs_update() after the perf data are validated so the
qos_class can be visible. The end results should be
/sys/bus/cxl/devices/.../memN/ram/qos_class and
/sys/bus/cxl/devices/.../memN/pmem/qos_class.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240206190431.1810289-4-dave.jiang@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Autodiscovered regions can fail to assemble if they are not discovered
in HPA decode order. The user will see failure messages like:
[] cxl region0: endpoint5: HPA order violation region1
[] cxl region0: endpoint5: failed to allocate region reference
The check that is causing the failure helps the CXL driver enforce
a CXL spec mandate that decoders be committed in HPA order. The
check is needless for autodiscovered regions since their decoders
are already programmed. Trying to enforce order in the assembly of
these regions is useless because they are assembled once all their
member endpoints arrive, and there is no guarantee on the order in
which endpoints are discovered during probe.
Keep the existing check, but for autodiscovered regions, allow the
out of order assembly after a sanity check that the lesser numbered
decoder has the lesser HPA starting address.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Wonjae Lee <wj28.lee@samsung.com>
Link: https://lore.kernel.org/r/3dec69ee97524ab229a20c6739272c3000b18408.1706736863.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The compare function used to sort memblks into starting address
order fails when the result of its u64 address subtraction gets
truncated to an int upon return.
The impact of the bad sort is that memblks will be filled out
incorrectly. Depending on the set of memblks, a user may see no
errors at all but still have a bad fill, or see messages reporting
a node overlap that leads to numa init failure:
[] node 0 [mem: ] overlaps with node 1 [mem: ]
[] No NUMA configuration found
Replace with a comparison that can only result in: 1, 0, -1.
Fixes: 8f012db27c ("x86/numa: Introduce numa_fill_memblks()")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/99dcb3ae87e04995e9f293f6158dc8fa0749a487.1705085543.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
numa_fill_memblks() fills in the gaps in numa_meminfo memblks over a
physical address range. To do so, it first creates a list of existing
memblks that overlap that address range. The issue is that it is off
by one when comparing to the end of the address range, so memblks
that do not overlap are selected.
The impact of selecting a memblk that does not actually overlap is
that an existing memblk may be filled when the expected action is to
do nothing and return NUMA_NO_MEMBLK to the caller. The caller can
then add a new NUMA node and memblk.
Replace the broken open-coded search for address overlap with the
memblock helper memblock_addrs_overlap(). Update the kernel doc
and in code comments.
Suggested by: "Huang, Ying" <ying.huang@intel.com>
Fixes: 8f012db27c ("x86/numa: Introduce numa_fill_memblks()")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/10a3e6109c34c21a8dd4c513cf63df63481a2b07.1705085543.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When emulating an atomic access on behalf of the guest, mark the target
gfn dirty if the CMPXCHG by KVM is attempted and doesn't fault. This
fixes a bug where KVM effectively corrupts guest memory during live
migration by writing to guest memory without informing userspace that the
page is dirty.
Marking the page dirty got unintentionally dropped when KVM's emulated
CMPXCHG was converted to do a user access. Before that, KVM explicitly
mapped the guest page into kernel memory, and marked the page dirty during
the unmap phase.
Mark the page dirty even if the CMPXCHG fails, as the old data is written
back on failure, i.e. the page is still written. The value written is
guaranteed to be the same because the operation is atomic, but KVM's ABI
is that all writes are dirty logged regardless of the value written. And
more importantly, that's what KVM did before the buggy commit.
Huge kudos to the folks on the Cc list (and many others), who did all the
actual work of triaging and debugging.
Fixes: 1c2361f667 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Cc: Pasha Tatashin <tatashin@google.com>
Cc: Michael Krebs <mkrebs@google.com>
base-commit: 6769ea8da8a93ed4630f1ce64df6aafcaabfce64
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20240215010004.1456078-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Commit e4bb020f3d ("riscv: detect assembler support for .option arch")
added two tests, one for a valid value to '.option arch' that should
succeed and one for an invalid value that is expected to fail to make
sure that support for '.option arch' is properly detected because Clang
does not error when '.option arch' is not supported:
$ clang --target=riscv64-linux-gnu -Werror -x assembler -c -o /dev/null <(echo '.option arch, +m')
/dev/fd/63:1:9: warning: unknown option, expected 'push', 'pop', 'rvc', 'norvc', 'relax' or 'norelax'
.option arch, +m
^
$ echo $?
0
Unfortunately, the invalid test started being accepted by Clang after
the linked llvm-project change, which causes CONFIG_AS_HAS_OPTION_ARCH
and configurations that depend on it to be silently disabled, even
though those versions do support '.option arch'.
The invalid test can be avoided altogether by using
'-Wa,--fatal-warnings', which will turn all assembler warnings into
errors, like '-Werror' does for the compiler:
$ clang --target=riscv64-linux-gnu -Werror -Wa,--fatal-warnings -x assembler -c -o /dev/null <(echo '.option arch, +m')
/dev/fd/63:1:9: error: unknown option, expected 'push', 'pop', 'rvc', 'norvc', 'relax' or 'norelax'
.option arch, +m
^
$ echo $?
1
The as-instr macros have been updated to make use of this flag, so
remove the invalid test, which allows CONFIG_AS_HAS_OPTION_ARCH to work
for all compiler versions.
Cc: stable@vger.kernel.org
Fixes: e4bb020f3d ("riscv: detect assembler support for .option arch")
Link: 3ac9fe69f7
Reported-by: Eric Biggers <ebiggers@kernel.org>
Closes: https://lore.kernel.org/r/20240121011341.GA97368@sol.localdomain/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Tested-by: Andy Chiu <andybnac@gmail.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20240125-fix-riscv-option-arch-llvm-18-v1-2-390ac9cc3cd0@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Certain assembler instruction tests may only induce warnings from the
assembler on an unsupported instruction or option, which causes as-instr
to succeed when it was expected to fail. Some tests workaround this
limitation by additionally testing that invalid input fails as expected.
However, this is fragile if the assembler is changed to accept the
invalid input, as it will cause the instruction/option to be unavailable
like it was unsupported even when it is.
Use '-Wa,--fatal-warnings' in the as-instr macro to turn these warnings
into hard errors, which avoids this fragility and makes tests more
robust and well formed.
Cc: stable@vger.kernel.org
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Tested-by: Andy Chiu <andybnac@gmail.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20240125-fix-riscv-option-arch-llvm-18-v1-1-390ac9cc3cd0@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The SED Opal response parsing function response_parse() does not
handle the case of an empty atom in the response. This causes
the entry count to be too high and the response fails to be
parsed. Recognizing, but ignoring, empty atoms allows response
handling to succeed.
Signed-off-by: Greg Joyce <gjoyce@linux.ibm.com>
Link: https://lore.kernel.org/r/20240216210417.3526064-2-gjoyce@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The bq27xxx i2c-client may not have an IRQ, in which case
client->irq will be 0. bq27xxx_battery_i2c_probe() already has
an if (client->irq) check wrapping the request_threaded_irq().
But bq27xxx_battery_i2c_remove() unconditionally calls
free_irq(client->irq) leading to:
[ 190.310742] ------------[ cut here ]------------
[ 190.310843] Trying to free already-free IRQ 0
[ 190.310861] WARNING: CPU: 2 PID: 1304 at kernel/irq/manage.c:1893 free_irq+0x1b8/0x310
Followed by a backtrace when unbinding the driver. Add
an if (client->irq) to bq27xxx_battery_i2c_remove() mirroring
probe() to fix this.
Fixes: 444ff00734 ("power: supply: bq27xxx: Fix I2C IRQ race on remove")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240215155133.70537-1-hdegoede@redhat.com
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Pull SCSI fixes from James Bottomley:
"Three fixes: the two fnic ones are a revert and a refix, which is why
the diffstat is a bit big. The target one also extracts a function to
add a check for configuration and so looks bigger than it is"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: fnic: Move fnic_fnic_flush_tx() to a work queue
scsi: Revert "scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock"
scsi: target: Fix unmap setup during configuration
Pull workqueue fix from Tejun Heo:
"Just one patch to revert commit ca10d851b9 ("workqueue: Override
implicit ordered attribute in workqueue_apply_unbound_cpumask()").
This commit could break ordering guarantees for ordered workqueues.
The problem that the commit tried to resolve partially - making
ordered workqueues follow unbound cpumask - is fully solved in
wq/for-6.9 branch"
* tag 'wq-for-6.8-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
Revert "workqueue: Override implicit ordered attribute in workqueue_apply_unbound_cpumask()"
Pull io_uring fix from Jens Axboe:
"Just a single fix for a regression in how overflow is handled for
multishot accept requests"
* tag 'io_uring-6.8-2024-02-16' of git://git.kernel.dk/linux:
io_uring/net: fix multishot accept overflow handling
Pull ceph fixes from Ilya Dryomov:
"Additional cap handling fixes from Xiubo to avoid "client isn't
responding to mclientcaps(revoke)" stalls on the MDS side"
* tag 'ceph-for-6.8-rc5' of https://github.com/ceph/ceph-client:
ceph: add ceph_cap_unlink_work to fire check_caps() immediately
ceph: always queue a writeback when revoking the Fb caps
The NOKPROBE_SYMBOL macro (and others) were moved to
asm-generic/kprobes.h in 2017 by commit 7d134b2ce6 ("kprobes: move
kprobe declarations to asm-generic/kprobes.h"), and this new header
was included by asm/kprobes.h unconditionally on all architectures.
When kprobe support was added to parisc in 2017 by commit
8858ac8e9e ("parisc: Implement kprobes"), that header was only
included when CONFIG_KPROBES was enabled.
This can lead to build failures when NOKPROBE_SYMBOL is used, but
CONFIG_KPROBES is disabled. This mistake however was never actually
noticed because linux/kprobes.h also includes asm-generic/kprobes.h
(though I do not understand why that is, because it also includes
asm/kprobes.h).
To prevent eventual build failures, I suggest to always include
asm-generic/kprobes.h on parisc, just like all the other architectures
do. This way, including asm/kprobes.h suffices, and nobody (outside
of arch/) ever needs to explicitly include asm-generic/kprobes.h.
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes a bug revealed by -Wmissing-prototypes when
CONFIG_FUNCTION_GRAPH_TRACER is enabled but not CONFIG_DYNAMIC_FTRACE:
arch/parisc/kernel/ftrace.c:82:5: error: no previous prototype for 'ftrace_enable_ftrace_graph_caller' [-Werror=missing-prototypes]
82 | int ftrace_enable_ftrace_graph_caller(void)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/parisc/kernel/ftrace.c:88:5: error: no previous prototype for 'ftrace_disable_ftrace_graph_caller' [-Werror=missing-prototypes]
88 | int ftrace_disable_ftrace_graph_caller(void)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Pull KVM fixes from Paolo Bonzini:
"ARM:
- Avoid dropping the page refcount twice when freeing an unlinked
page-table subtree.
- Don't source the VFIO Kconfig twice
- Fix protected-mode locking order between kvm and vcpus
RISC-V:
- Fix steal-time related sparse warnings
x86:
- Cleanup gtod_is_based_on_tsc() to return "bool" instead of an "int"
- Make a KVM_REQ_NMI request while handling KVM_SET_VCPU_EVENTS if
and only if the incoming events->nmi.pending is non-zero. If the
target vCPU is in the UNITIALIZED state, the spurious request will
result in KVM exiting to userspace, which in turn causes QEMU to
constantly acquire and release QEMU's global mutex, to the point
where the BSP is unable to make forward progress.
- Fix a type (u8 versus u64) goof that results in pmu->fixed_ctr_ctrl
being incorrectly truncated, and ultimately causes KVM to think a
fixed counter has already been disabled (KVM thinks the old value
is '0').
- Fix a stack leak in KVM_GET_MSRS where a failed MSR read from
userspace that is ultimately ignored due to ignore_msrs=true
doesn't zero the output as intended.
Selftests cleanups and fixes:
- Remove redundant newlines from error messages.
- Delete an unused variable in the AMX test (which causes build
failures when compiling with -Werror).
- Fail instead of skipping tests if open(), e.g. of /dev/kvm, fails
with an error code other than ENOENT (a Hyper-V selftest bug
resulted in an EMFILE, and the test eventually got skipped).
- Fix TSC related bugs in several Hyper-V selftests.
- Fix a bug in the dirty ring logging test where a sem_post() could
be left pending across multiple runs, resulting in incorrect
synchronization between the main thread and the vCPU worker thread.
- Relax the dirty log split test's assertions on 4KiB mappings to fix
false positives due to the number of mappings for memslot 0 (used
for code and data that is NOT being dirty logged) changing, e.g.
due to NUMA balancing"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
KVM: arm64: Fix double-free following kvm_pgtable_stage2_free_unlinked()
RISC-V: KVM: Use correct restricted types
RISC-V: paravirt: Use correct restricted types
RISC-V: paravirt: steal_time should be static
KVM: selftests: Don't assert on exact number of 4KiB in dirty log split test
KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test
KVM: x86: Fix KVM_GET_MSRS stack info leak
KVM: arm64: Do not source virt/lib/Kconfig twice
KVM: x86/pmu: Fix type length error when reading pmu->fixed_ctr_ctrl
KVM: x86: Make gtod_is_based_on_tsc() return 'bool'
KVM: selftests: Make hyperv_clock require TSC based system clocksource
KVM: selftests: Run clocksource dependent tests with hyperv_clocksource_tsc_page too
KVM: selftests: Use generic sys_clocksource_is_tsc() in vmx_nested_tsc_scaling_test
KVM: selftests: Generalize check_clocksource() from kvm_clock_test
KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu
KVM: arm64: Fix circular locking dependency
KVM: selftests: Fail tests when open() fails with !ENOENT
KVM: selftests: Avoid infinite loop in hyperv_features when invtsc is missing
KVM: selftests: Delete superfluous, unused "stage" variable in AMX test
KVM: selftests: x86_64: Remove redundant newlines
...
Pull tracing fixes from Steven Rostedt:
- Fix the #ifndef that didn't have the 'CONFIG_' prefix on
HAVE_DYNAMIC_FTRACE_WITH_REGS
The fix to have dynamic trampolines work with x86 broke arm64 as the
config used in the #ifdef was HAVE_DYNAMIC_FTRACE_WITH_REGS and not
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS which removed the fix that the
previous fix was to fix.
- Fix tracing_on state
The code to test if "tracing_on" is set incorrectly used
ring_buffer_record_is_on() which returns false if the ring buffer
isn't able to be written to.
But the ring buffer disable has several bits that disable it. One is
internal disabling which is used for resizing and other modifications
of the ring buffer. But the "tracing_on" user space visible flag
should only report if tracing is actually on and not internally
disabled, as this can cause confusion as writing "1" when it is
disabled will not enable it.
Instead use ring_buffer_record_is_set_on() which shows the user space
visible settings.
- Fix a false positive kmemleak on saved cmdlines
Now that the saved_cmdlines structure is allocated via alloc_page()
and not via kmalloc() it has become invisible to kmemleak. The
allocation done to one of its pointers was flagged as a dangling
allocation leak. Make kmemleak aware of this allocation and free.
- Fix synthetic event dynamic strings
An update that cleaned up the synthetic event code removed the return
value of trace_string(), and had it return zero instead of the
length, causing dynamic strings in the synthetic event to always have
zero size.
- Clean up documentation and header files for seq_buf
* tag 'trace-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
seq_buf: Fix kernel documentation
seq_buf: Don't use "proxy" headers
tracing/synthetic: Fix trace_string() return value
tracing: Inform kmemleak of saved_cmdlines allocation
tracing: Use ring_buffer_record_is_set_on() in tracer_tracing_is_on()
tracing: Fix HAVE_DYNAMIC_FTRACE_WITH_REGS ifdef
Pull arm64 fixes from Will Deacon:
"It's a little busier than normal, but it's still not a lot of code and
things seem fairly quiet in general:
- Fix allocation failure during SVE coredumps
- Fix handling of SVE context on signal delivery
- Enable Neoverse N2 CPU errata workarounds for Microsoft's "Azure
Cobalt 100" clone
- Work around CMN PMU erratum in AmpereOneX implementation
- Fix typo in CXL PMU event definition
- Fix jump label asm constraints"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/sve: Lower the maximum allocation for the SVE ptrace regset
arm64: Subscribe Microsoft Azure Cobalt 100 to ARM Neoverse N2 errata
perf/arm-cmn: Workaround AmpereOneX errata AC04_MESH_1 (incorrect child count)
arm64: jump_label: use constraints "Si" instead of "i"
arm64: fix typo in comments
perf: CXL: fix mismatched cpmu event opcode
arm64/signal: Don't assume that TIF_SVE means we saved SVE state
clang-16 warns about casting between incompatible function types:
drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c:161:10: error: cast from 'void (*)(const struct firmware *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict]
161 | .fini = (void(*)(void *))release_firmware,
This one was done to use the generic shadow_fw_release() function as a
callback for struct nvbios_source. Change it to use the same prototype
as the other five instances, with a trivial helper function that actually
calls release_firmware.
Fixes: 70c0f263cc ("drm/nouveau/bios: pull in basic vbios subdev, more to come later")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240213095753.455062-1-arnd@kernel.org
Pull zonefs fix from Damien Le Moal:
- Fix direct write error handling to avoid a race between failed IO
completion and the submission path itself which can result in an
invalid file size exposed to the user after the failed IO.
* tag 'zonefs-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: Improve error handling
Pull sound fixes from Takashi Iwai:
"A collection of device-specific fixes. It became a bit bigger than
wished, but all look reasonably small and safe to apply.
- A few Cirrus Logic CS35L56 and CS42L43 driver fixes
- ASoC SOF fixes and workarounds
- Various ASoC Intel fixes
- Lots of HD-, USB-audio and AMD ACP quirks"
* tag 'sound-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (33 commits)
ALSA: usb-audio: More relaxed check of MIDI jack names
ALSA: hda/realtek: fix mute/micmute LED For HP mt645
ALSA: hda/realtek: cs35l41: Fix order and duplicates in quirks table
ALSA: hda/realtek: cs35l41: Fix device ID / model name
ALSA: hda/realtek: cs35l41: Add internal speaker support for ASUS UM3402 with missing DSD
ASoC: cs35l56: Workaround for ACPI with broken spk-id-gpios property
ALSA: hda: Add Lenovo Legion 7i gen7 sound quirk
ASoC: SOF: IPC3: fix message bounds on ipc ops
ASoC: SOF: ipc4-pcm: Workaround for crashed firmware on system suspend
ASoC: q6dsp: fix event handler prototype
ASoC: SOF: Intel: pci-lnl: Change the topology path to intel/sof-ipc4-tplg
ASoC: SOF: Intel: pci-tgl: Change the default paths and firmware names
ASoC: amd: yc: Fix non-functional mic on Lenovo 82UU
ASoC: rt5645: Add DMI quirk for inverted jack-detect on MeeGoPad T8
ASoC: rt5645: Make LattePanda board DMI match more precise
ASoC: SOF: amd: Fix locking in ACP IRQ handler
ASoC: rt5645: Fix deadlock in rt5645_jack_detect_work()
ASoC: Intel: cht_bsw_rt5645: Cleanup codec_name handling
ASoC: Intel: Boards: Fix NULL pointer deref in BYT/CHT boards
ASoC: cs35l56: Remove default from IRQ1_CFG register
...
Pull gpio fixes from Bartosz Golaszewski:
- add missing stubs for functions that are not built with GPIOLIB
disabled
* tag 'gpio-fixes-for-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: add gpio_device_get_label() stub for !GPIOLIB
gpiolib: add gpio_device_get_base() stub for !GPIOLIB
gpiolib: add gpiod_to_gpio_device() stub for !GPIOLIB
Pull lsm fix from Paul Moore:
"One small LSM patch to fix a potential integer overflow in the newly
added lsm_set_self_attr() syscall"
* tag 'lsm-pr-20240215' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lsm: fix integer overflow in lsm_set_self_attr() syscall
Currently when the kernel fails to add a cert to the .machine keyring,
it will throw an error immediately in the function integrity_add_key.
Since the kernel will try adding to the .platform keyring next or throw
an error (in the caller of integrity_add_key i.e. add_to_machine_keyring),
so there is no need to throw an error immediately in integrity_add_key.
Reported-by: itrymybest80@protonmail.com
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2239331
Fixes: d19967764b ("integrity: Introduce a Linux keyring called machine")
Reviewed-by: Eric Snowberg <eric.snowberg@oracle.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
head is defined in idxd->evl as a shadow of head in the EVLSTATUS register.
There are two issues related to the shadow head:
1. Mismatch between the shadow head and the state of the EVLSTATUS
register:
If Event Log is supported, upon completion of the Enable Device command,
the Event Log head in the variable idxd->evl->head should be cleared to
match the state of the EVLSTATUS register. But the variable is not reset
currently, leading mismatch between the variable and the register state.
The mismatch causes incorrect processing of Event Log entries.
2. Unnecessary shadow head definition:
The shadow head is unnecessary as head can be read directly from the
EVLSTATUS register. Reading head from the register incurs no additional
cost because event log head and tail are always read together and
tail is already read directly from the register as required by hardware.
Remove the shadow Event Log head stored in idxd->evl to address the
mentioned issues.
Fixes: 244da66cda ("dmaengine: idxd: setup event log configuration")
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240215024931.1739621-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The MSM8996 platform has registers setup different to the rest of QMP v3
USB platforms. It has PCS region at 0x600 and no PCS_MISC region, while
other platforms have PCS region at 0x800 and PCS_MISC at 0x600. This
results in the malfunctioning USB host on some of the platforms. The
commit f74c35b630 ("phy: qcom-qmp-usb: fix register offsets for
ipq8074/ipq6018") fixed the issue for IPQ platforms, but missed the
SDM845 which has the same register layout.
To simplify future platform addition and to make the driver more future
proof, rename qmp_usb_offsets_v3 to qmp_usb_offsets_v3_msm8996 (to mark
its peculiarity), rename qmp_usb_offsets_ipq8074 to qmp_usb_offsets_v3
and use it for SDM845 platform.
Fixes: 2be22aae6b ("phy: qcom-qmp-usb: populate offsets configuration")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20240213133824.2218916-1-dmitry.baryshkov@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
If we're redirecting the skb, and haven't called tcf_mirred_forward(),
yet, we need to tell the core to drop the skb by setting the retcode
to SHOT. If we have called tcf_mirred_forward(), however, the skb
is out of our hands and returning SHOT will lead to UaF.
Move the retval override to the error path which actually need it.
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Fixes: e5cf1baf92 ("act_mirred: use TC_ACT_REINSERT when possible")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The test Davide added in commit ca22da2fbd ("act_mirred: use the backlog
for nested calls to mirred ingress") hangs our testing VMs every 10 or so
runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
lockdep.
The problem as previously described by Davide (see Link) is that
if we reverse flow of traffic with the redirect (egress -> ingress)
we may reach the same socket which generated the packet. And we may
still be holding its socket lock. The common solution to such deadlocks
is to put the packet in the Rx backlog, rather than run the Rx path
inline. Do that for all egress -> ingress reversals, not just once
we started to nest mirred calls.
In the past there was a concern that the backlog indirection will
lead to loss of error reporting / less accurate stats. But the current
workaround does not seem to address the issue.
Fixes: 53592b3640 ("net/sched: act_mirred: Implement ingress actions")
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Suggested-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver uses functions that are supplied by the Kconfig symbol
PHYLIB, so select it to ensure that they are built as needed.
When CONFIG_ADIN1110=y and CONFIG_PHYLIB=m, there are multiple build
(linker) errors that are resolved by this Kconfig change:
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_net_open':
drivers/net/ethernet/adi/adin1110.c:933: undefined reference to `phy_start'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_probe_netdevs':
drivers/net/ethernet/adi/adin1110.c:1603: undefined reference to `get_phy_device'
ld: drivers/net/ethernet/adi/adin1110.c:1609: undefined reference to `phy_connect'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_disconnect_phy':
drivers/net/ethernet/adi/adin1110.c:1226: undefined reference to `phy_disconnect'
ld: drivers/net/ethernet/adi/adin1110.o: in function `devm_mdiobus_alloc':
include/linux/phy.h:455: undefined reference to `devm_mdiobus_alloc_size'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_register_mdiobus':
drivers/net/ethernet/adi/adin1110.c:529: undefined reference to `__devm_mdiobus_register'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_net_stop':
drivers/net/ethernet/adi/adin1110.c:958: undefined reference to `phy_stop'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_disconnect_phy':
drivers/net/ethernet/adi/adin1110.c:1226: undefined reference to `phy_disconnect'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_adjust_link':
drivers/net/ethernet/adi/adin1110.c:1077: undefined reference to `phy_print_status'
ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_ioctl':
drivers/net/ethernet/adi/adin1110.c:790: undefined reference to `phy_do_ioctl'
ld: drivers/net/ethernet/adi/adin1110.o:(.rodata+0xf60): undefined reference to `phy_ethtool_get_link_ksettings'
ld: drivers/net/ethernet/adi/adin1110.o:(.rodata+0xf68): undefined reference to `phy_ethtool_set_link_ksettings'
Fixes: bc93e19d08 ("net: ethernet: adi: Add ADIN1110 support")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402070626.eZsfVHG5-lkp@intel.com/
Cc: Lennart Franzen <lennart@lfdomain.com>
Cc: Alexandru Tachici <alexandru.tachici@analog.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Reviewed-by: Nuno Sa <nuno.sa@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Waldekranz says:
====================
net: bridge: switchdev: Ensure MDB events are delivered exactly once
When a device is attached to a bridge, drivers will request a replay
of objects that were created before the device joined the bridge, that
are still of interest to the joining port. Typical examples include
FDB entries and MDB memberships on other ports ("foreign interfaces")
or on the bridge itself.
Conversely when a device is detached, the bridge will synthesize
deletion events for all those objects that are still live, but no
longer applicable to the device in question.
This series eliminates two races related to the synching and
unsynching phases of a bridge's MDB with a joining or leaving device,
that would cause notifications of such objects to be either delivered
twice (1/2), or not at all (2/2).
A similar race to the one solved by 1/2 still remains for the
FDB. This is much harder to solve, due to the lockless operation of
the FDB's rhashtable, and is therefore knowingly left out of this
series.
v1 -> v2:
- Squash the previously separate addition of
switchdev_port_obj_act_is_deferred into first consumer.
- Use ether_addr_equal to compare MAC addresses.
- Document switchdev_port_obj_act_is_deferred (renamed from
switchdev_port_obj_is_deferred in v1, to indicate that we also match
on the action).
- Delay allocations of MDB objects until we know they're needed.
- Use non-RCU version of the hash list iterator, now that the MDB is
not scanned while holding the RCU read lock.
- Add Fixes tag to commit message
v2 -> v3:
- Fix unlocking in error paths
- Access RCU protected port list via mlock_dereference, since MDB is
guaranteed to remain constant for the duration of the scan.
v3 -> v4:
- Limit the search for exiting deferred events in 1/2 to only apply to
additions, since the problem does not exist in the deletion case.
- Add 2/2, to plug a related race when unoffloading an indirectly
associated device.
v4 -> v5:
- Fix grammatical errors in kerneldoc of
switchdev_port_obj_act_is_deferred
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When unoffloading a device, it is important to ensure that all
relevant deferred events are delivered to it before it disassociates
itself from the bridge.
Before this change, this was true for the normal case when a device
maps 1:1 to a net_bridge_port, i.e.
br0
/
swp0
When swp0 leaves br0, the call to switchdev_deferred_process() in
del_nbp() makes sure to process any outstanding events while the
device is still associated with the bridge.
In the case when the association is indirect though, i.e. when the
device is attached to the bridge via an intermediate device, like a
LAG...
br0
/
lag0
/
swp0
...then detaching swp0 from lag0 does not cause any net_bridge_port to
be deleted, so there was no guarantee that all events had been
processed before the device disassociated itself from the bridge.
Fix this by always synchronously processing all deferred events before
signaling completion of unoffloading back to the driver.
Fixes: 4e51bf44a0 ("net: bridge: move the switchdev object replay helpers to "push" mode")
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before this change, generation of the list of MDB events to replay
would race against the creation of new group memberships, either from
the IGMP/MLD snooping logic or from user configuration.
While new memberships are immediately visible to walkers of
br->mdb_list, the notification of their existence to switchdev event
subscribers is deferred until a later point in time. So if a replay
list was generated during a time that overlapped with such a window,
it would also contain a replay of the not-yet-delivered event.
The driver would thus receive two copies of what the bridge internally
considered to be one single event. On destruction of the bridge, only
a single membership deletion event was therefore sent. As a
consequence of this, drivers which reference count memberships (at
least DSA), would be left with orphan groups in their hardware
database when the bridge was destroyed.
This is only an issue when replaying additions. While deletion events
may still be pending on the deferred queue, they will already have
been removed from br->mdb_list, so no duplicates can be generated in
that scenario.
To a user this meant that old group memberships, from a bridge in
which a port was previously attached, could be reanimated (in
hardware) when the port joined a new bridge, without the new bridge's
knowledge.
For example, on an mv88e6xxx system, create a snooping bridge and
immediately add a port to it:
root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \
> ip link set dev x3 up master br0
And then destroy the bridge:
root@infix-06-0b-00:~$ ip link del dev br0
root@infix-06-0b-00:~$ mvls atu
ADDRESS FID STATE Q F 0 1 2 3 4 5 6 7 8 9 a
DEV:0 Marvell 88E6393X
33:33:00:00:00:6a 1 static - - 0 . . . . . . . . . .
33:33:ff:87:e4:3f 1 static - - 0 . . . . . . . . . .
ff:ff:ff:ff:ff:ff 1 static - - 0 1 2 3 4 5 6 7 8 9 a
root@infix-06-0b-00:~$
The two IPv6 groups remain in the hardware database because the
port (x3) is notified of the host's membership twice: once via the
original event and once via a replay. Since only a single delete
notification is sent, the count remains at 1 when the bridge is
destroyed.
Then add the same port (or another port belonging to the same hardware
domain) to a new bridge, this time with snooping disabled:
root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \
> ip link set dev x3 up master br1
All multicast, including the two IPv6 groups from br0, should now be
flooded, according to the policy of br1. But instead the old
memberships are still active in the hardware database, causing the
switch to only forward traffic to those groups towards the CPU (port
0).
Eliminate the race in two steps:
1. Grab the write-side lock of the MDB while generating the replay
list.
This prevents new memberships from showing up while we are generating
the replay list. But it leaves the scenario in which a deferred event
was already generated, but not delivered, before we grabbed the
lock. Therefore:
2. Make sure that no deferred version of a replay event is already
enqueued to the switchdev deferred queue, before adding it to the
replay list, when replaying additions.
Fixes: 4f2673b3a2 ("net: bridge: add helper to replay port and host-joined mdb entries")
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
iucv_path_table is a dynamically allocated array of pointers to
struct iucv_path items. Yet, its size is calculated as if it was
an array of struct iucv_path items.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Olliver reported that his system crashes when plugging in Thunderbolt 1
device:
BUG: kernel NULL pointer dereference, address: 0000000000000020
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:tb_port_do_update_credits+0x1b/0x130 [thunderbolt]
Call Trace:
<TASK>
? __die+0x23/0x70
? page_fault_oops+0x171/0x4e0
? exc_page_fault+0x7f/0x180
? asm_exc_page_fault+0x26/0x30
? tb_port_do_update_credits+0x1b/0x130
? tb_switch_update_link_attributes+0x83/0xd0
tb_switch_add+0x7a2/0xfe0
tb_scan_port+0x236/0x6f0
tb_handle_hotplug+0x6db/0x900
process_one_work+0x171/0x340
worker_thread+0x27b/0x3a0
? __pfx_worker_thread+0x10/0x10
kthread+0xe5/0x120
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
This is due the fact that some Thunderbolt 1 devices only have one lane
adapter. Fix this by checking for the lane 1 before we read its credits.
Reported-by: Olliver Schinagl <oliver@schinagl.nl>
Closes: https://lore.kernel.org/linux-usb/c24c7882-6254-4e68-8f22-f3e8f65dc84f@schinagl.nl/
Fixes: 81af2952e6 ("thunderbolt: Add support for asymmetric link")
Cc: stable@vger.kernel.org
Cc: Gil Fine <gil.fine@linux.intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
The conversion to netfs in the 6.3 kernel caused a regression when
maximum write size is set by the server to an unexpected value which is
not a multiple of 4096 (similarly if the user overrides the maximum
write size by setting mount parm "wsize", but sets it to a value that
is not a multiple of 4096). When negotiated write size is not a
multiple of 4096 the netfs code can skip the end of the final
page when doing large sequential writes, causing data corruption.
This section of code is being rewritten/removed due to a large
netfs change, but until that point (ie for the 6.3 kernel until now)
we can not support non-standard maximum write sizes.
Add a warning if a user specifies a wsize on mount that is not
a multiple of 4096 (and round down), also add a change where we
round down the maximum write size if the server negotiates a value
that is not a multiple of 4096 (we also have to check to make sure that
we do not round it down to zero).
Reported-by: R. Diez" <rdiez-2006@rd10.de>
Fixes: d08089f649 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Suggested-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Acked-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Tested-by: Matthew Ruffell <matthew.ruffell@canonical.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Cc: stable@vger.kernel.org # v6.3+
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Under x86-64, when using bpf_probe_read_kernel{_str}() or
bpf_probe_read{_str}() to read vsyscall page, the read may trigger oops,
so add one test case to ensure that the problem is fixed. Beside those
four bpf helpers mentioned above, testing the read of vsyscall page by
using bpf_probe_read_user{_str} and bpf_copy_from_user{_task}() as well.
The test case passes the address of vsyscall page to these six helpers
and checks whether the returned values are expected:
1) For bpf_probe_read_kernel{_str}()/bpf_probe_read{_str}(), the
expected return value is -ERANGE as shown below:
bpf_probe_read_kernel_common
copy_from_kernel_nofault
// false, return -ERANGE
copy_from_kernel_nofault_allowed
2) For bpf_probe_read_user{_str}(), the expected return value is -EFAULT
as show below:
bpf_probe_read_user_common
copy_from_user_nofault
// false, return -EFAULT
__access_ok
3) For bpf_copy_from_user(), the expected return value is -EFAULT:
// return -EFAULT
bpf_copy_from_user
copy_from_user
_copy_from_user
// return false
access_ok
4) For bpf_copy_from_user_task(), the expected return value is -EFAULT:
// return -EFAULT
bpf_copy_from_user_task
access_process_vm
// return 0
vma_lookup()
// return 0
expand_stack()
The occurrence of oops depends on the availability of CPU SMAP [1]
feature and there are three possible configurations of vsyscall page in
the boot cmd-line: vsyscall={xonly|none|emulate}, so there are a total
of six possible combinations. Under all these combinations, the test
case runs successfully.
[1]: https://en.wikipedia.org/wiki/Supervisor_Mode_Access_Prevention
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20240202103935.3154011-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When trying to use copy_from_kernel_nofault() to read vsyscall page
through a bpf program, the following oops was reported:
BUG: unable to handle page fault for address: ffffffffff600000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 3231067 P4D 3231067 PUD 3233067 PMD 3235067 PTE 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 20390 Comm: test_progs ...... 6.7.0+ #58
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
RIP: 0010:copy_from_kernel_nofault+0x6f/0x110
......
Call Trace:
<TASK>
? copy_from_kernel_nofault+0x6f/0x110
bpf_probe_read_kernel+0x1d/0x50
bpf_prog_2061065e56845f08_do_probe_read+0x51/0x8d
trace_call_bpf+0xc5/0x1c0
perf_call_bpf_enter.isra.0+0x69/0xb0
perf_syscall_enter+0x13e/0x200
syscall_trace_enter+0x188/0x1c0
do_syscall_64+0xb5/0xe0
entry_SYSCALL_64_after_hwframe+0x6e/0x76
</TASK>
......
---[ end trace 0000000000000000 ]---
The oops is triggered when:
1) A bpf program uses bpf_probe_read_kernel() to read from the vsyscall
page and invokes copy_from_kernel_nofault() which in turn calls
__get_user_asm().
2) Because the vsyscall page address is not readable from kernel space,
a page fault exception is triggered accordingly.
3) handle_page_fault() considers the vsyscall page address as a user
space address instead of a kernel space address. This results in the
fix-up setup by bpf not being applied and a page_fault_oops() is invoked
due to SMAP.
Considering handle_page_fault() has already considered the vsyscall page
address as a userspace address, fix the problem by disallowing vsyscall
page read for copy_from_kernel_nofault().
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: syzbot+72aa0161922eba61b50e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/bpf/CAG48ez06TZft=ATH1qh2c5mpS5BT8UakwNkzi6nvK5_djC-4Nw@mail.gmail.com
Reported-by: xingwei lee <xrivendell7@gmail.com>
Closes: https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33LvScw@mail.gmail.com
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240202103935.3154011-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Write error handling is racy and can sometime lead to the error recovery
path wrongly changing the inode size of a sequential zone file to an
incorrect value which results in garbage data being readable at the end
of a file. There are 2 problems:
1) zonefs_file_dio_write() updates a zone file write pointer offset
after issuing a direct IO with iomap_dio_rw(). This update is done
only if the IO succeed for synchronous direct writes. However, for
asynchronous direct writes, the update is done without waiting for
the IO completion so that the next asynchronous IO can be
immediately issued. However, if an asynchronous IO completes with a
failure right before the i_truncate_mutex lock protecting the update,
the update may change the value of the inode write pointer offset
that was corrected by the error path (zonefs_io_error() function).
2) zonefs_io_error() is called when a read or write error occurs. This
function executes a report zone operation using the callback function
zonefs_io_error_cb(), which does all the error recovery handling
based on the current zone condition, write pointer position and
according to the mount options being used. However, depending on the
zoned device being used, a report zone callback may be executed in a
context that is different from the context of __zonefs_io_error(). As
a result, zonefs_io_error_cb() may be executed without the inode
truncate mutex lock held, which can lead to invalid error processing.
Fix both problems as follows:
- Problem 1: Perform the inode write pointer offset update before a
direct write is issued with iomap_dio_rw(). This is safe to do as
partial direct writes are not supported (IOMAP_DIO_PARTIAL is not
set) and any failed IO will trigger the execution of zonefs_io_error()
which will correct the inode write pointer offset to reflect the
current state of the one on the device.
- Problem 2: Change zonefs_io_error_cb() into zonefs_handle_io_error()
and call this function directly from __zonefs_io_error() after
obtaining the zone information using blkdev_report_zones() with a
simple callback function that copies to a local stack variable the
struct blk_zone obtained from the device. This ensures that error
handling is performed holding the inode truncate mutex.
This change also simplifies error handling for conventional zone files
by bypassing the execution of report zones entirely. This is safe to
do because the condition of conventional zones cannot be read-only or
offline and conventional zone files are always fully mapped with a
constant file size.
Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 8dcc1a9d90 ("fs: New zonefs file system")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
md_start_sync() will suspend the array if there are spares that can be
added or removed from conf, however, if reshape is still in progress,
this won't happen at all or data will be corrupted(remove_and_add_spares
won't be called from md_choose_sync_action for reshape), hence there is
no need to suspend the array if reshape is not done yet.
Meanwhile, there is a potential deadlock for raid456:
1) reshape is interrupted;
2) set one of the disk WantReplacement, and add a new disk to the array,
however, recovery won't start until the reshape is finished;
3) then issue an IO across reshpae position, this IO will wait for
reshape to make progress;
4) continue to reshape, then md_start_sync() found there is a spare disk
that can be added to conf, mddev_suspend() is called;
Step 4 and step 3 is waiting for each other, deadlock triggered. Noted
this problem is found by code review, and it's not reporduced yet.
Fix this porblem by don't suspend the array for interrupted reshape,
this is safe because conf won't be changed until reshape is done.
Fixes: bc08041b32 ("md: suspend array in md_start_sync() if array need reconfiguration")
Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240201092559.910982-6-yukuai1@huaweicloud.com
Currently, if reshape is interrupted, then reassemble the array will
register sync_thread directly from pers->run(), in this case
'MD_RECOVERY_RUNNING' is set directly, however, there is no guarantee
that md_do_sync() will be executed, hence stop_sync_thread() will hang
because 'MD_RECOVERY_RUNNING' can't be cleared.
Last patch make sure that md_do_sync() will set MD_RECOVERY_DONE,
however, following hang can still be triggered by dm-raid test
shell/lvconvert-raid-reshape.sh occasionally:
[root@fedora ~]# cat /proc/1982/stack
[<0>] stop_sync_thread+0x1ab/0x270 [md_mod]
[<0>] md_frozen_sync_thread+0x5c/0xa0 [md_mod]
[<0>] raid_presuspend+0x1e/0x70 [dm_raid]
[<0>] dm_table_presuspend_targets+0x40/0xb0 [dm_mod]
[<0>] __dm_destroy+0x2a5/0x310 [dm_mod]
[<0>] dm_destroy+0x16/0x30 [dm_mod]
[<0>] dev_remove+0x165/0x290 [dm_mod]
[<0>] ctl_ioctl+0x4bb/0x7b0 [dm_mod]
[<0>] dm_ctl_ioctl+0x11/0x20 [dm_mod]
[<0>] vfs_ioctl+0x21/0x60
[<0>] __x64_sys_ioctl+0xb9/0xe0
[<0>] do_syscall_64+0xc6/0x230
[<0>] entry_SYSCALL_64_after_hwframe+0x6c/0x74
Meanwhile mddev->recovery is:
MD_RECOVERY_RUNNING |
MD_RECOVERY_INTR |
MD_RECOVERY_RESHAPE |
MD_RECOVERY_FROZEN
Fix this problem by remove the code to register sync_thread directly
from raid10 and raid5. And let md_check_recovery() to register
sync_thread.
Fixes: f67055780c ("[PATCH] md: Checkpoint and allow restart of raid5 reshape")
Fixes: f52f5c71f3 ("md: fix stopping sync thread")
Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240201092559.910982-5-yukuai1@huaweicloud.com
stop_sync_thread() will interrupt md_do_sync(), and md_do_sync() must
set MD_RECOVERY_DONE, so that follow up md_check_recovery() will
unregister sync_thread, clear MD_RECOVERY_RUNNING and wake up
stop_sync_thread().
If MD_RECOVERY_WAIT is set or the array is read-only, md_do_sync() will
return without setting MD_RECOVERY_DONE, and after commit f52f5c71f3
("md: fix stopping sync thread"), dm-raid switch from
md_reap_sync_thread() to stop_sync_thread() to unregister sync_thread
from md_stop() and md_stop_writes(), causing the test
shell/lvconvert-raid-reshape.sh hang.
We shouldn't switch back to md_reap_sync_thread() because it's
problematic in the first place. Fix the problem by making sure
md_do_sync() will set MD_RECOVERY_DONE.
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Closes: https://lore.kernel.org/all/ece2b06f-d647-6613-a534-ff4c9bec1142@redhat.com/
Fixes: d5d885fd51 ("md: introduce new personality funciton start()")
Fixes: 5fd6c1dce0 ("[PATCH] md: allow checkpoint of recovery with version-1 superblock")
Fixes: f52f5c71f3 ("md: fix stopping sync thread")
Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240201092559.910982-4-yukuai1@huaweicloud.com
Usually if the array is not read-write, md_check_recovery() won't
register new sync_thread in the first place. And if the array is
read-write and sync_thread is registered, md_set_readonly() will
unregister sync_thread before setting the array read-only. md/raid
follow this behavior hence there is no problem.
After commit f52f5c71f3 ("md: fix stopping sync thread"), following
hang can be triggered by test shell/integrity-caching.sh:
1) array is read-only. dm-raid update super block:
rs_update_sbs
ro = mddev->ro
mddev->ro = 0
-> set array read-write
md_update_sb
2) register new sync thread concurrently.
3) dm-raid set array back to read-only:
rs_update_sbs
mddev->ro = ro
4) stop the array:
raid_dtr
md_stop
stop_sync_thread
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
md_wakeup_thread_directly(mddev->sync_thread);
wait_event(..., !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
5) sync thread done:
md_do_sync
set_bit(MD_RECOVERY_DONE, &mddev->recovery);
md_wakeup_thread(mddev->thread);
6) daemon thread can't unregister sync thread:
md_check_recovery
if (!md_is_rdwr(mddev) &&
!test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
return;
-> -> MD_RECOVERY_RUNNING can't be cleared, hence step 4 hang;
The root cause is that dm-raid manipulate 'mddev->ro' by itself,
however, dm-raid really should stop sync thread before setting the
array read-only. Unfortunately, I need to read more code before I
can refacter the handler of 'mddev->ro' in dm-raid, hence let's fix
the problem the easy way for now to prevent dm-raid regression.
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Closes: https://lore.kernel.org/all/9801e40-8ac7-e225-6a71-309dcf9dc9aa@redhat.com/
Fixes: ecbfb9f118 ("dm raid: add raid level takeover support")
Fixes: f52f5c71f3 ("md: fix stopping sync thread")
Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240201092559.910982-3-yukuai1@huaweicloud.com
mddev_suspend() never stop sync_thread, hence it doesn't make sense to
ignore suspended array in md_check_recovery(), which might cause
sync_thread can't be unregistered.
After commit f52f5c71f3 ("md: fix stopping sync thread"), following
hang can be triggered by test shell/integrity-caching.sh:
1) suspend the array:
raid_postsuspend
mddev_suspend
2) stop the array:
raid_dtr
md_stop
__md_stop_writes
stop_sync_thread
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
md_wakeup_thread_directly(mddev->sync_thread);
wait_event(..., !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
3) sync thread done:
md_do_sync
set_bit(MD_RECOVERY_DONE, &mddev->recovery);
md_wakeup_thread(mddev->thread);
4) daemon thread can't unregister sync thread:
md_check_recovery
if (mddev->suspended)
return; -> return directly
md_read_sync_thread
clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
-> MD_RECOVERY_RUNNING can't be cleared, hence step 2 hang;
This problem is not just related to dm-raid, fix it by ignoring
suspended array in md_check_recovery(). And follow up patches will
improve dm-raid better to frozen sync thread during suspend.
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Closes: https://lore.kernel.org/all/8fb335e-6d2c-dbb5-d7-ded8db5145a@redhat.com/
Fixes: 68866e425b ("MD: no sync IO while suspended")
Fixes: f52f5c71f3 ("md: fix stopping sync thread")
Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240201092559.910982-2-yukuai1@huaweicloud.com
Commit c92a6b5d63 ("scsi: core: Query VPD size before getting full
page") removed the logic which checks whether a VPD page is present on
the supported pages list before asking for the page itself. That was
done because SPC helpfully states "The Supported VPD Pages VPD page
list may or may not include all the VPD pages that are able to be
returned by the device server". Testing had revealed a few devices
that supported some of the 0xBn pages but didn't actually list them in
page 0.
Julian Sikorski bisected a problem with his drive resetting during
discovery to the commit above. As it turns out, this particular drive
firmware will crash if we attempt to fetch page 0xB9.
Various approaches were attempted to work around this. In the end,
reinstating the logic that consults VPD page 0 before fetching any
other page was the path of least resistance. A firmware update for the
devices which originally compelled us to remove the check has since
been released.
Link: https://lore.kernel.org/r/20240214221411.2888112-1-martin.petersen@oracle.com
Fixes: c92a6b5d63 ("scsi: core: Query VPD size before getting full page")
Cc: stable@vger.kernel.org
Cc: Bart Van Assche <bvanassche@acm.org>
Reported-by: Julian Sikorski <belegdol@gmail.com>
Tested-by: Julian Sikorski <belegdol@gmail.com>
Reviewed-by: Lee Duncan <lee.duncan@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull networking fixes from Jakub Kicinski:
"Including fixes from can, wireless and netfilter.
Current release - regressions:
- af_unix: fix task hung while purging oob_skb in GC
- pds_core: do not try to run health-thread in VF path
Current release - new code bugs:
- sched: act_mirred: don't zero blockid when net device is being
deleted
Previous releases - regressions:
- netfilter:
- nat: restore default DNAT behavior
- nf_tables: fix bidirectional offload, broken when unidirectional
offload support was added
- openvswitch: limit the number of recursions from action sets
- eth: i40e: do not allow untrusted VF to remove administratively set
MAC address
Previous releases - always broken:
- tls: fix races and bugs in use of async crypto
- mptcp: prevent data races on some of the main socket fields, fix
races in fastopen handling
- dpll: fix possible deadlock during netlink dump operation
- dsa: lan966x: fix crash when adding interface under a lag when some
of the ports are disabled
- can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
Misc:
- a handful of fixes and reliability improvements for selftests
- fix sysfs documentation missing net/ in paths
- finish the work of squashing the missing MODULE_DESCRIPTION()
warnings in networking"
* tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
net: fill in MODULE_DESCRIPTION()s for missing arcnet
net: fill in MODULE_DESCRIPTION()s for mdio_devres
net: fill in MODULE_DESCRIPTION()s for ppp
net: fill in MODULE_DESCRIPTION()s for fddik/skfp
net: fill in MODULE_DESCRIPTION()s for plip
net: fill in MODULE_DESCRIPTION()s for ieee802154/fakelb
net: fill in MODULE_DESCRIPTION()s for xen-netback
net: ravb: Count packets instead of descriptors in GbEth RX path
pppoe: Fix memory leak in pppoe_sendmsg()
net: sctp: fix skb leak in sctp_inq_free()
net: bcmasp: Handle RX buffer allocation failure
net-timestamp: make sk_tskey more predictable in error path
selftests: tls: increase the wait in poll_partial_rec_async
ice: Add check for lport extraction to LAG init
netfilter: nf_tables: fix bidirectional offload regression
netfilter: nat: restore default DNAT behavior
netfilter: nft_set_pipapo: fix missing : in kdoc
igc: Remove temporary workaround
igb: Fix string truncation warnings in igb_set_fw_version
can: netlink: Fix TDCO calculation using the old data bittiming
...
Pull xen fixes from Juergen Gross:
"Fixes and simple cleanups:
- use a proper flexible array instead of a one-element array in order
to avoid array-bounds sanitizer errors
- add NULL pointer checks after allocating memory
- use memdup_array_user() instead of open-coding it
- fix a rare race condition in Xen event channel allocation code
- make struct bus_type instances const
- make kerneldoc inline comments match reality"
* tag 'for-linus-6.8a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/events: close evtchn after mapping cleanup
xen/gntalloc: Replace UAPI 1-element array
xen: balloon: make balloon_subsys const
xen: pcpu: make xen_pcpu_subsys const
xen/privcmd: Use memdup_array_user() in alloc_ioreq()
x86/xen: Add some null pointer checking to smp.c
xen/xenbus: document will_handle argument for xenbus_watch_path()
Gfx11 debug flags mask is currently set with an implicit assumption that
no other mqd update flags exist. This needs to be fixed with newly
introduced flag UPDATE_FLAG_IS_GWS by the previous patch.
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In certain cooperative group dispatch scenarios the default SPI resource
allocation may cause reduced per-CU workgroup occupancy. Set
COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang
scenarions.
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Suggested-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The dcn30_get_gamcor_current() function is responsible for determining
the current gamma correction mode used by the display controller.
However, the 'mode' variable, which stores the gamma correction mode,
was not initialized before its first usage, leading to an uninitialized
symbol error.
Thus initializes the 'mode' variable with a default value of LUT_BYPASS
before the conditional statements in the function, improves code clarity
and stability, ensuring correct behavior of the
dcn30_get_gamcor_current() function in determining the gamma correction
mode.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn30/dcn30_dpp_cm.c:77 dpp30_get_gamcor_current() error: uninitialized symbol 'mode'.
Fixes: 03f54d7d34 ("drm/amd/display: Add DCN3 DPP")
Cc: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Roman Li <roman.li@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
As part of a cleanup amdgpu_dm_fini() function, which is typically
called when a device is being shut down or a driver is being unloaded
The below error message suggests that there is a potential null pointer
dereference issue with adev->dm.dc.
In the below, line of code where adev->dm.dc is used without a preceding
null check:
for (i = 0; i < adev->dm.dc->caps.max_links; i++) {
To fix this issue, add a null check for adev->dm.dc before this line.
Reported by smatch:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:1959 amdgpu_dm_fini() error: we previously assumed 'adev->dm.dc' could be null (see line 1943)
Fixes: 006c26a0f1 ("drm/amd/display: Fix crash on device remove/driver unload")
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Roman Li <roman.li@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
commit ab4750332d ("drm/amdgpu/sdma5.2: add begin/end_use ring
callbacks") caused GFXOFF control to be used more heavily and the
codepath that was removed from commit 0dee726395 ("drm/amd: flush any
delayed gfxoff on suspend entry") now can be exercised at suspend again.
Users report that by using GNOME to suspend the lockscreen trigger will
cause SDMA traffic and the system can deadlock.
This reverts commit 0dee726395.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: ab4750332d ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
commit 5095d54181 ("drm/amd: Evict resources during PM ops prepare()
callback") intentionally moved the eviction of resources to earlier in
the suspend process, but this introduced a subtle change that it occurs
before adev->in_s0ix or adev->in_s3 are set. This meant that APUs
actually started to evict resources at suspend time as well.
Explicitly set s0ix or s3 in the prepare() stage, and unset them if the
prepare() stage failed.
v2: squash in warning fix from Stephen Rothwell
Reported-by: Jürg Billeter <j@bitron.ch>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132#note_2271038
Fixes: 5095d54181 ("drm/amd: Evict resources during PM ops prepare() callback")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
when 'find_dcfclk_for_voltage()' function is looping over
VG_NUM_SOC_VOLTAGE_LEVELS (which is 8), but the size of the DcfClocks
array is VG_NUM_DCFCLK_DPM_LEVELS (which is 7).
When the loop variable i reaches 7, the function tries to access
clock_table->DcfClocks[7]. However, since the size of the DcfClocks
array is 7, the valid indices are 0 to 6. Index 7 is beyond the size of
the array, leading to a buffer overflow.
Reported by smatch & thus fixing the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn301/vg_clk_mgr.c:550 find_dcfclk_for_voltage() error: buffer overflow 'clock_table->DcfClocks' 7 <= 7
Fixes: 3a83e4e64b ("drm/amd/display: Add dcn3.01 support to DC (v2)")
Cc: Roman Li <Roman.Li@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
'max_chunks_fbc_mode' is only declared and assigned a value under a
specific condition in the following lines:
if (data->fbc_en[i] == 1) {
max_chunks_fbc_mode = 128 - dmif_chunk_buff_margin;
}
If 'data->fbc_en[i]' is not equal to 1 for any i, max_chunks_fbc_mode
will not be initialized if it's used outside of this for loop.
Ensure that 'max_chunks_fbc_mode' is properly initialized before it's
used. Initialize it to a default value right after its declaration to
ensure that it gets a value assigned under all possible control flow
paths.
Thus fixing the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/basics/dce_calcs.c:914 calculate_bandwidth() error: uninitialized symbol 'max_chunks_fbc_mode'.
drivers/gpu/drm/amd/amdgpu/../display/dc/basics/dce_calcs.c:917 calculate_bandwidth() error: uninitialized symbol 'max_chunks_fbc_mode'.
Fixes: 4562236b3b ("drm/amd/dc: Add dc display driver (v2)")
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
wait_time_microsec = max(wait_time_microsec, (uint32_t)
DPIA_CLK_SYNC_DELAY);
Above line is trying to assign the maximum value between
'wait_time_microsec' and 'DPIA_CLK_SYNC_DELAY' to wait_time_microsec.
However, 'wait_time_microsec' has not been assigned a value before this
line, initialize 'wait_time_microsec' at the point of declaration.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_training_dpia.c:697 dpia_training_eq_non_transparent() error: uninitialized symbol 'wait_time_microsec'.
Fixes: 630168a973 ("drm/amd/display: move dp link training logic to link_dp_training")
Cc: Wenjing Liu <wenjing.liu@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
These ANDs should be ORs or it will lead to a NULL dereference.
Fixes: fb5a3d0370 ("drm/amd/display: Add NULL test for 'timing generator' in 'dcn21_set_pipe()'")
Fixes: 886571d217 ("drm/amd/display: Fix 'panel_cntl' could be null in 'dcn21_set_backlight_level()'")
Reviewed-by: Anthony Koo <anthony.koo@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We have observed that there are quite a number of PSR-SU panels on the
market that are unable to keep up with what user space throws at them,
resulting in hangs and random black screens. So, make damage clips
support configurable and disable it by default for PSR-SU displays.
Cc: stable@vger.kernel.org
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In commit 4356e9f841 ("work around gcc bugs with 'asm goto' with
outputs") I did the gcc workaround unconditionally, because the cause of
the bad code generation wasn't entirely clear.
In the meantime, Jakub Jelinek debugged the issue, and has come up with
a fix in gcc [2], which also got backported to the still maintained
branches of gcc-11, gcc-12 and gcc-13.
Note that while the fix technically wasn't in the original gcc-14
branch, Jakub says:
"while it is true that no GCC 14 snapshots until today (or whenever the
fix will be committed) have the fix, for GCC trunk it is up to the
distros to use the latest snapshot if they use it at all and would
allow better testing of the kernel code without the workaround, so
that if there are other issues they won't be discovered years later.
Most userland code doesn't actually use asm goto with outputs..."
so we will consider gcc-14 to be fixed - if somebody is using gcc
snapshots of the gcc-14 before the fix, they should upgrade.
Note that while the bug goes back to gcc-11, in practice other gcc
changes seem to have effectively hidden it since gcc-12.1 as per a
bisect by Jakub. So even a gcc-14 snapshot without the fix likely
doesn't show actual problems.
Also, make the default 'asm_goto_output()' macro mark the asm as
volatile by hand, because of an unrelated gcc issue [1] where it doesn't
match the documented behavior ("asm goto is always volatile").
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979 [1]
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 [2]
Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
Requested-by: Jakub Jelinek <jakub@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull devicetree fixes from Rob Herring:
- Improve devlink dependency parsing for DT graphs
- Fix devlink handling of io-channels dependencies
- Fix PCI addressing in marvell,prestera example
- A few schema fixes for property constraints
- Improve performance of DT unprobed devices kselftest
- Fix regression in DT_SCHEMA_FILES handling
- Fix compile error in unittest for !OF_DYNAMIC
* tag 'devicetree-fixes-for-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: ufs: samsung,exynos-ufs: Add size constraints on "samsung,sysreg"
of: property: Add in-ports/out-ports support to of_graph_get_port_parent()
of: property: Improve finding the supplier of a remote-endpoint property
of: property: Improve finding the consumer of a remote-endpoint property
net: marvell,prestera: Fix example PCI bus addressing
of: unittest: Fix compile in the non-dynamic case
of: property: fix typo in io-channels
dt-bindings: tpm: Drop type from "resets"
dt-bindings: display: nxp,tda998x: Fix 'audio-ports' constraints
dt-bindings: xilinx: replace Piyush Mehta maintainership
kselftest: dt: Stop relying on dirname to improve performance
dt-bindings: don't anchor DT_SCHEMA_FILES to bindings directory
Pull spi fixes from Mark Brown:
"A smallish collection of fixes for SPI, all driver specific, plus one
device ID addition for a new Intel part.
The ppc4xx isn't routinely covered by most of the automated testing so
there were some errors that were missed in some of the recent API
conversions, otherwise there's nothing super remarkable here"
* tag 'spi-fix-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi-mxs: Fix chipselect glitch
spi: intel-pci: Add support for Lunar Lake-M SPI serial flash
spi: omap2-mcspi: Revert FIFO support without DMA
spi: ppc4xx: Drop write-only variable
spi: ppc4xx: Fix fallout from rename in struct spi_bitbang
spi: ppc4xx: Fix fallout from include cleanup
spi: spi-ppc4xx: include missing platform_device.h
spi: imx: fix the burst length at DMA mode and CPU mode
Pull regmap test fixes from Mark Brown:
"Guenter runs a lot of KUnit tests so noticed that there were a couple
of the regmap tests, including the newly added noinc test, which could
show spurious failures due to the use of randomly generated test
values. These changes handle the randomly generated data properly"
* tag 'regmap-fix-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: kunit: Ensure that changed bytes are actually different
regmap: kunit: fix raw noinc write test wrapping
Pull HID fixes from Jiri Kosina:
- fix for 'MSC_SERIAL = 0' corner case handling in wacom driver (Jason
Gerecke)
- ACPI S3 suspend/resume fix for intel-ish-hid (Even Xu)
- race condition fix preventing Wacom driver from losing events shortly
after initialization (Jason Gerecke)
- fix preventing certain Logitech HID++ devices from spamming kernel
log (Oleksandr Natalenko)
* tag 'hid-for-linus-2024021501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: wacom: generic: Avoid reporting a serial of '0' to userspace
HID: Intel-ish-hid: Ishtp: Fix sensor reads after ACPI S3 suspend
HID: multitouch: Add required quirk for Synaptics 0xcddc device
HID: wacom: Do not register input devices until after hid_hw_start
HID: logitech-hidpp: Do not flood kernel log
The brute force iommu_flush_iotlb_all() was good enough for unmap, but
in some cases a map operation could require removing a table pte entry
to replace with a block entry. This also requires tlb invalidation.
Missing this was resulting an obscure iova fault on what should be a
valid buffer address.
Thanks to Robin Murphy for helping me understand the cause of the fault.
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: stable@vger.kernel.org
Fixes: b145c6e65e ("drm/msm: Add support to create a local pagetable")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/578117/
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-02-06 (igb, igc)
This series contains updates to igb and igc drivers.
Kunwu Chan adjusts firmware version string implementation to resolve
possible NULL pointer issue for igb.
Sasha removes workaround on igc.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
igc: Remove temporary workaround
igb: Fix string truncation warnings in igb_set_fw_version
====================
Link: https://lore.kernel.org/r/20240214180347.3219650-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Breno Leitao says:
====================
Fix MODULE_DESCRIPTION() for net (p6)
There are a few network modules left that misses MODULE_DESCRIPTION(),
causing a warnning when compiling with W=1. Example:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/....
This last patchset solves the problem for all the missing driver. It is
not expect to see any warning for the driver/net and net/ directory once
all these patches have landed.
v1: https://lore.kernel.org/all/20240213112122.404045-1-leitao@debian.org/
====================
Link: https://lore.kernel.org/r/20240214152741.670178-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The USB audio driver tries to retrieve MIDI jack name strings that can
be used for rawmidi substream names and sequencer port names, but its
checking is too strict: often the firmware provides the jack info for
unexpected directions, and then we miss the info although it's
present.
In this patch, the code to extract the jack info is changed to allow
both in and out directions in a single loop. That is, the former two
functions to obtain the descriptor pointers for jack in and out are
changed to a single function that returns iJack of the corresponding
jack ID, no matter which direction is used. It's a code
simplification at the same time as well as the fix.
Fixes: eb596e0fd1 ("ALSA: usb-audio: generate midi streaming substream names from jack names")
Link: https://lore.kernel.org/r/20240215153144.26047-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
ASoC: Fixes for v6.8
A relatively large set of fixes and quirk additions here but they're all
driver specific, people seem to be back into the swing of things after
the holidays. This is all driver specific and much of it fairly minor.
lld is now able to build ARMv4 and ARMv4T kernels, which means it can
generate thunks for those (__ARMv4PILongThunk_*, __ARMv4PILongBXThunk_*)
that can interfere with kallsyms table generation since they do not get
ignore like the corresponding ARMv5+ ones are:
Inconsistent kallsyms data
Try "make KALLSYMS_EXTRA_PASS=1" as a workaround
Replace the hardcoded list of thunk symbols with a more general regex that
covers this one along with future symbols that follow the same pattern.
Fixes: 5eb6e28043 ("ARM: 9289/1: Allow pre-ARMv5 builds with ld.lld 16.0.0 and newer")
Fixes: efe6e30680 ("kallsyms: fix nonconverging kallsyms table with lld")
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
The buffer_pg variable needs to hold an order-5 allocation (32 x
PAGE_SIZE) which, under memory pressure may fail to be allocated. Deal
with that error condition properly to avoid doing a NULL pointer
de-reference in the subsequent call to dma_map_page().
In addition, the err_reclaim_tx error label in bcmasp_netif_init() needs
to ensure that the TX NAPI object is properly deleted, otherwise
unregister_netdev() will spin forever attempting to test and clear
the NAPI_STATE_HASHED bit.
Fixes: 490cb41200 ("net: bcmasp: Add support for ASP2.0 Ethernet controller")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Justin Chen <justin.chen@broadcom.com>
Link: https://lore.kernel.org/r/20240213173339.3438713-1-florian.fainelli@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following batch contains Netfilter fixes for net:
1) Missing : in kdoc field in nft_set_pipapo.
2) Restore default DNAT behavior When a DNAT rule is configured via
iptables with different port ranges, from Kyle Swenson.
3) Restore flowtable hardware offload for bidirectional flows
by setting NF_FLOW_HW_BIDIRECTIONAL flag, from Felix Fietkau.
netfilter pull request 24-02-15
* tag 'nf-24-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: fix bidirectional offload regression
netfilter: nat: restore default DNAT behavior
netfilter: nft_set_pipapo: fix missing : in kdoc
====================
Link: https://lore.kernel.org/r/20240214233818.7946-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Doug Anderson observed that ChromeOS crashes are being reported which
include failing allocations of order 7 during core dumps due to ptrace
allocating storage for regsets:
chrome: page allocation failure: order:7,
mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO),
nodemask=(null),cpuset=urgent,mems_allowed=0
...
regset_get_alloc+0x1c/0x28
elf_core_dump+0x3d8/0xd8c
do_coredump+0xeb8/0x1378
with further investigation showing that this is:
[ 66.957385] DOUG: Allocating 279584 bytes
which is the maximum size of the SVE regset. As Doug observes it is not
entirely surprising that such a large allocation of contiguous memory might
fail on a long running system.
The SVE regset is currently sized to hold SVE registers with a VQ of
SVE_VQ_MAX which is 512, substantially more than the architectural maximum
of 16 which we might see even in a system emulating the limits of the
architecture. Since we don't expose the size we tell the regset core
externally let's define ARCH_SVE_VQ_MAX with the actual architectural
maximum and use that for the regset, we'll still overallocate most of the
time but much less so which will be helpful even if the core is fixed to
not require contiguous allocations.
Specify ARCH_SVE_VQ_MAX in terms of the maximum value that can be written
into ZCR_ELx.LEN (where this is set in the hardware). For consistency
update the maximum SME vector length to be specified in the same style
while we are at it.
We could also teach the ptrace core about runtime discoverable regset sizes
but that would be a more invasive change and this is being observed in
practical systems.
Reported-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20240213-arm64-sve-ptrace-regset-size-v2-1-c7600ca74b9b@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Marc Kleine-Budde says:
====================
pull-request: can 2024-02-14
this is a pull request of 3 patches for net/master.
the first patch is by Ziqi Zhao and targets the CAN J1939 protocol, it
fixes a potential deadlock by replacing the spinlock by an rwlock.
Oleksij Rempel's patch adds a missing spin_lock_bh() to prevent a
potential Use-After-Free in the CAN J1939's
setsockopt(SO_J1939_FILTER).
Maxime Jayat contributes a patch to fix the transceiver delay
compensation (TDCO) calculation, which is needed for higher CAN-FD bit
rates (usually 2Mbit/s).
* tag 'linux-can-fixes-for-6.8-20240214' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: netlink: Fix TDCO calculation using the old data bittiming
can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER)
can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
====================
Link: https://lore.kernel.org/r/20240214140348.2412776-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When SOF_TIMESTAMPING_OPT_ID is used to ambiguate timestamped datagrams,
the sk_tskey can become unpredictable in case of any error happened
during sendmsg(). Move increment later in the code and make decrement of
sk_tskey in error path. This solution is still racy in case of multiple
threads doing snedmsg() over the very same socket in parallel, but still
makes error path much more predictable.
Fixes: 09c2d251b7 ("net-timestamp: add key to disambiguate concurrent datagrams")
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240213110428.1681540-1-vadfed@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
It's currently possible to change the mesh ID when the
interface isn't yet in mesh mode, at the same time as
changing it into mesh mode. This leads to an overwrite
of data in the wdev->u union for the interface type it
currently has, causing cfg80211_change_iface() to do
wrong things when switching.
We could probably allow setting an interface to mesh
while setting the mesh ID at the same time by doing a
different order of operations here, but realistically
there's no userspace that's going to do this, so just
disallow changes in iftype when setting mesh ID.
Cc: stable@vger.kernel.org
Fixes: 29cbe68c51 ("cfg80211/mac80211: add mesh join/leave commands")
Reported-by: syzbot+dd4779978217b1973180@syzkaller.appspotmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
clang-16 warns about a cast between incompatible function types:
drivers/gpu/drm/xe/xe_range_fence.c:155:10: error: cast from 'void (*)(const void *)' to 'void (*)(struct xe_range_fence *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict]
155 | .free = (void (*)(struct xe_range_fence *rfence)) kfree,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Avoid this with a trivial helper function that calls kfree() here.
v2:
- s/* rfence/*rfence/ (Thomas)
Fixes: 845f64bdbf ("drm/xe: Introduce a range-fence utility")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240213095719.454865-1-arnd@kernel.org
(cherry picked from commit f2c9364db5)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
The function xe_vm_prepare_vma was blindly accepting zero as the
number of fences and forwarded that to drm_exec_prepare_obj.
However, that leads to an out-of-bounds shift in the
dma_resv_reserve_fences() and while one could argue that the
dma_resv code should be robust against that, avoid attempting
to reserve zero fences.
Relevant stack trace:
[773.183188] ------------[ cut here ]------------
[773.183199] UBSAN: shift-out-of-bounds in ../include/linux/log2.h:57:13
[773.183241] shift exponent 64 is too large for 64-bit type 'long unsigned int'
[773.183254] CPU: 2 PID: 1816 Comm: xe_evict Tainted: G U 6.8.0-rc3-xe #1
[773.183256] Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 2014 10/14/2022
[773.183257] Call Trace:
[773.183258] <TASK>
[773.183260] dump_stack_lvl+0xaf/0xd0
[773.183266] dump_stack+0x10/0x20
[773.183283] ubsan_epilogue+0x9/0x40
[773.183286] __ubsan_handle_shift_out_of_bounds+0x10f/0x170
[773.183293] dma_resv_reserve_fences.cold+0x2b/0x48
[773.183295] ? ww_mutex_lock+0x3c/0x110
[773.183301] drm_exec_prepare_obj+0x45/0x60 [drm_exec]
[773.183313] xe_vm_prepare_vma+0x33/0x70 [xe]
[773.183375] xe_vma_destroy_unlocked+0x55/0xa0 [xe]
[773.183427] xe_vm_close_and_put+0x526/0x940 [xe]
Fixes: 2714d50936 ("drm/xe: Convert pagefaulting code to use drm_exec")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240208132115.3132-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit eb538b5574)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Test runners on debug kernels occasionally fail with:
# # RUN tls_err.13_aes_gcm.poll_partial_rec_async ...
# # tls.c:1883:poll_partial_rec_async:Expected poll(&pfd, 1, 5) (0) == 1 (1)
# # tls.c:1870:poll_partial_rec_async:Expected status (256) == 0 (0)
# # poll_partial_rec_async: Test failed at step #17
# # FAIL tls_err.13_aes_gcm.poll_partial_rec_async
# not ok 699 tls_err.13_aes_gcm.poll_partial_rec_async
# # FAILED: 698 / 699 tests passed.
This points to the second poll() in the test which is expected
to wait for the sender to send the rest of the data.
Apparently under some conditions that doesn't happen within 5ms,
bump the timeout to 20ms.
Fixes: 23fcb62bc1 ("selftests: tls: add tests for poll behavior")
Link: https://lore.kernel.org/r/20240213142055.395564-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Johannes Berg says:
====================
Valentine's day edition, with just few fixes because
that's how we love it ;-)
iwlwifi:
- correct A3 in A-MSDUs
- fix crash when operating as AP and running out of station
slots to use
- clear link ID to correct some later checks against it
- fix error codes in SAR table loading
- fix error path in PPAG table read
mac80211:
- reload a pointer after SKB may have changed
(only in certain monitor inject mode scenarios)
* tag 'wireless-2024-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: iwlwifi: mvm: fix a crash when we run out of stations
wifi: iwlwifi: uninitialized variable in iwl_acpi_get_ppag_table()
wifi: iwlwifi: Fix some error codes
wifi: iwlwifi: clear link_id in time_event
wifi: iwlwifi: mvm: use correct address 3 in A-MSDU
wifi: mac80211: reload info pointer in ieee80211_tx_dequeue()
====================
Link: https://lore.kernel.org/r/20240214184326.132813-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If we hit CQ ring overflow when attempting to post a multishot accept
completion, we don't properly save the result or return code. This
results in losing the accepted fd value.
Instead, we return the result from the poll operation that triggered
the accept retry. This is generally POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
which is 0xc3, or 195, which looks like a valid file descriptor, but it
really has no connection to that.
Handle this like we do for other multishot completions - assign the
result, and return IOU_STOP_MULTISHOT to cancel any further completions
from this request when overflow is hit. This preserves the result, as we
should, and tells the application that the request needs to be re-armed.
Cc: stable@vger.kernel.org
Fixes: 515e269612 ("io_uring: revert "io_uring fix multishot accept ordering"")
Link: https://github.com/axboe/liburing/issues/1062
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull MIPS fixes from Thomas Bogendoerfer:
- Fix for broken ipv6 checksums
- Fix handling of exceptions in delay slots
* tag 'mips-fixes_6.8_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
mm/memory: Use exception ip to search exception tables
MIPS: Clear Cause.BD in instruction_pointer_set
ptrace: Introduce exception_ip arch hook
MIPS: Add 'memory' clobber to csum_ipv6_magic() inline assembler
Pull landlock test fixes from Mickaël Salaün:
"Fix build issues for tests, and improve test compatibility"
* tag 'landlock-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
selftests/landlock: Fix capability for net_test
selftests/landlock: Fix fs_test build with old libc
selftests/landlock: Fix net_test build with old libc
Pull btrfs fixes from David Sterba:
"A few regular fixes and one fix for space reservation regression since
6.7 that users have been reporting:
- fix over-reservation of metadata chunks due to not keeping proper
balance between global block reserve and delayed refs reserve; in
practice this leaves behind empty metadata block groups, the
workaround is to reclaim them by using the '-musage=1' balance
filter
- other space reservation fixes:
- do not delete unused block group if it may be used soon
- do not reserve space for checksums for NOCOW files
- fix extent map assertion failure when writing out free space inode
- reject encoded write if inode has nodatasum flag set
- fix chunk map leak when loading block group zone info"
* tag 'for-6.8-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: don't refill whole delayed refs block reserve when starting transaction
btrfs: zoned: fix chunk map leak when loading block group zone info
btrfs: reject encoded write if inode has nodatasum flag set
btrfs: don't reserve space for checksums when writing to nocow files
btrfs: add new unused block groups to the list of unused block groups
btrfs: do not delete unused block group if it may be used soon
btrfs: add and use helper to check if block group is used
btrfs: don't drop extent_map for free space inode on write error
Pull KUnit fix from Shuah Khan:
"One important fix to unregister kunit_bus when KUnit module is
unloaded.
Not doing so causes an error when KUnit module tries to re-register
the bus when it gets reloaded"
* tag 'linux_kselftest-kunit-fixes-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: device: Unregister the kunit_bus on shutdown
Commit 8f84780b84 ("netfilter: flowtable: allow unidirectional rules")
made unidirectional flow offload possible, while completely ignoring (and
breaking) bidirectional flow offload for nftables.
Add the missing flag that was left out as an exercise for the reader :)
Cc: Vlad Buslov <vladbu@nvidia.com>
Fixes: 8f84780b84 ("netfilter: flowtable: allow unidirectional rules")
Reported-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When a DNAT rule is configured via iptables with different port ranges,
iptables -t nat -A PREROUTING -p tcp -d 10.0.0.2 -m tcp --dport 32000:32010
-j DNAT --to-destination 192.168.0.10:21000-21010
we seem to be DNATing to some random port on the LAN side. While this is
expected if --random is passed to the iptables command, it is not
expected without passing --random. The expected behavior (and the
observed behavior prior to the commit in the "Fixes" tag) is the traffic
will be DNAT'd to 192.168.0.10:21000 unless there is a tuple collision
with that destination. In that case, we expect the traffic to be
instead DNAT'd to 192.168.0.10:21001, so on so forth until the end of
the range.
This patch intends to restore the behavior observed prior to the "Fixes"
tag.
Fixes: 6ed5943f87 ("netfilter: nat: remove l4 protocol port rovers")
Signed-off-by: Kyle Swenson <kyle.swenson@est.tech>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add missing : in kdoc field names.
Fixes: 8683f4b995 ("nft_set_pipapo: Prepare for vectorised implementation: helpers")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
get_line() does not trim the leading spaces, but the
parse_source_files() expects to get lines with source files paths where
the first space occurs after the file path.
Fixes: 70f30cfe5b ("modpost: use read_text_file() and get_line() for reading text files")
Signed-off-by: Radek Krejci <radek.krejci@oracle.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
According to the Intel datasheets, software must reset the block
buffer index twice for block process call transactions: once before
writing the outgoing data to the buffer, and once again before
reading the incoming data from the buffer.
The driver is currently missing the second reset, causing the wrong
portion of the block buffer to be read.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Reported-by: Piotr Zakowski <piotr.zakowski@intel.com>
Closes: https://lore.kernel.org/linux-i2c/20240213120553.7b0ab120@endymion.delvare/
Fixes: 315cd67c94 ("i2c: i801: Add Block Write-Block Read Process Call support")
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
On powerpc, it is possible to compile test both the new apple (arm) and
old pasemi (powerpc) drivers for the i2c hardware at the same time,
which leads to a warning about linking the same object file twice:
scripts/Makefile.build:244: drivers/i2c/busses/Makefile: i2c-pasemi-core.o is added to multiple modules: i2c-apple i2c-pasemi
Rework the driver to have an explicit helper module, letting Kbuild
take care of whether this should be built-in or a loadable driver.
Fixes: 9bc5f4f660 ("i2c: pasemi: Split pci driver to its own file")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Sven Peter <sven@svenpeter.dev>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
security_setselfattr() has an integer overflow bug that leads to
out-of-bounds access when userspace provides bogus input:
`lctx->ctx_len + sizeof(*lctx)` is checked against `lctx->len` (and,
redundantly, also against `size`), but there are no checks on
`lctx->ctx_len`.
Therefore, userspace can provide an `lsm_ctx` with `->ctx_len` set to a
value between `-sizeof(struct lsm_ctx)` and -1, and this bogus `->ctx_len`
will then be passed to an LSM module as a buffer length, causing LSM
modules to perform out-of-bounds accesses.
The following reproducer will demonstrate this under ASAN (if AppArmor is
loaded as an LSM):
```
struct lsm_ctx {
uint64_t id;
uint64_t flags;
uint64_t len;
uint64_t ctx_len;
char ctx[];
};
int main(void) {
size_t size = sizeof(struct lsm_ctx);
struct lsm_ctx *ctx = malloc(size);
ctx->id = 104/*LSM_ID_APPARMOR*/;
ctx->flags = 0;
ctx->len = size;
ctx->ctx_len = -sizeof(struct lsm_ctx);
syscall(
460/*__NR_lsm_set_self_attr*/,
/*attr=*/ 100/*LSM_ATTR_CURRENT*/,
/*ctx=*/ ctx,
/*size=*/ size,
/*flags=*/ 0
);
}
```
Fixes: a04a119808 ("LSM: syscalls for current process attributes")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subj tweak, removed ref to ASAN splat that isn't included]
Signed-off-by: Paul Moore <paul@paul-moore.com>
It has been observed that some USB/UAS devices return generic properties
hardcoded in firmware for mode pages for a period of time after a device
has been discovered. The reported properties are either garbage or they do
not accurately reflect the characteristics of the physical storage device
attached in the case of a bridge.
Prior to commit 1e029397d1 ("scsi: sd: Reorganize DIF/DIX code to
avoid calling revalidate twice") we would call revalidate several
times during device discovery. As a result, incorrect values would
eventually get replaced with ones accurately describing the attached
storage. When we did away with the redundant revalidate pass, several
cases were reported where devices reported nonsensical values or would
end up in write-protected state.
An initial attempt at addressing this issue involved introducing a
delayed second revalidate invocation. However, this approach still
left some devices reporting incorrect characteristics.
Tasos Sahanidis debugged the problem further and identified that
introducing a READ operation prior to MODE SENSE fixed the problem and that
it wasn't a timing issue. Issuing a READ appears to cause the devices to
update their state to reflect the actual properties of the storage
media. Device properties like vendor, model, and storage capacity appear to
be correctly reported from the get-go. It is unclear why these devices
defer populating the remaining characteristics.
Match the behavior of a well known commercial operating system and
trigger a READ operation prior to querying device characteristics to
force the device to populate the mode pages.
The additional READ is triggered by a flag set in the USB storage and
UAS drivers. We avoid issuing the READ for other transport classes
since some storage devices identify Linux through our particular
discovery command sequence.
Link: https://lore.kernel.org/r/20240213143306.2194237-1-martin.petersen@oracle.com
Fixes: 1e029397d1 ("scsi: sd: Reorganize DIF/DIX code to avoid calling revalidate twice")
Cc: stable@vger.kernel.org
Reported-by: Tasos Sahanidis <tasos@tasossah.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Tasos Sahanidis <tasos@tasossah.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
PHY_CONTROL register works as defined in the IEEE 802.3 specification
(IEEE 802.3-2008 22.2.4.1). Tidy up the temporary workaround.
User impact: PHY can now be powered down when the ethernet link is down.
Testing hints: ip link set down <device> (or just disconnect the
ethernet cable).
Oldest tested NVM version is: 1045:740.
Fixes: 5586838fe9 ("igc: Add code for PHY support")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Commit 1978d3ead8 ("intel: fix string truncation warnings")
fixes '-Wformat-truncation=' warnings in igb_main.c by using kasprintf.
drivers/net/ethernet/intel/igb/igb_main.c:3092:53: warning:‘%d’ directive output may be truncated writing between 1 and 5 bytes into a region of size between 1 and 13 [-Wformat-truncation=]
3092 | "%d.%d, 0x%08x, %d.%d.%d",
| ^~
drivers/net/ethernet/intel/igb/igb_main.c:3092:34: note:directive argument in the range [0, 65535]
3092 | "%d.%d, 0x%08x, %d.%d.%d",
| ^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/intel/igb/igb_main.c:3092:34: note:directive argument in the range [0, 65535]
drivers/net/ethernet/intel/igb/igb_main.c:3090:25: note:‘snprintf’ output between 23 and 43 bytes into a destination of size 32
kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure.
Fix this warning by using a larger space for adapter->fw_version,
and then fall back and continue to use snprintf.
Fixes: 1978d3ead8 ("intel: fix string truncation warnings")
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Cc: Kunwu Chan <kunwu.chan@hotmail.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
KVM selftests fixes/cleanups (and one KVM x86 cleanup) for 6.8:
- Remove redundant newlines from error messages.
- Delete an unused variable in the AMX test (which causes build failures when
compiling with -Werror).
- Fail instead of skipping tests if open(), e.g. of /dev/kvm, fails with an
error code other than ENOENT (a Hyper-V selftest bug resulted in an EMFILE,
and the test eventually got skipped).
- Fix TSC related bugs in several Hyper-V selftests.
- Fix a bug in the dirty ring logging test where a sem_post() could be left
pending across multiple runs, resulting in incorrect synchronization between
the main thread and the vCPU worker thread.
- Relax the dirty log split test's assertions on 4KiB mappings to fix false
positives due to the number of mappings for memslot 0 (used for code and
data that is NOT being dirty logged) changing, e.g. due to NUMA balancing.
- Have KVM's gtod_is_based_on_tsc() return "bool" instead of an "int" (the
function generates boolean values, and all callers treat the return value as
a bool).
KVM x86 fixes for 6.8:
- Make a KVM_REQ_NMI request while handling KVM_SET_VCPU_EVENTS if and only
if the incoming events->nmi.pending is non-zero. If the target vCPU is in
the UNITIALIZED state, the spurious request will result in KVM exiting to
userspace, which in turn causes QEMU to constantly acquire and release
QEMU's global mutex, to the point where the BSP is unable to make forward
progress.
- Fix a type (u8 versus u64) goof that results in pmu->fixed_ctr_ctrl being
incorrectly truncated, and ultimately causes KVM to think a fixed counter
has already been disabled (KVM thinks the old value is '0').
- Fix a stack leak in KVM_GET_MSRS where a failed MSR read from userspace
that is ultimately ignored due to ignore_msrs=true doesn't zero the output
as intended.
The bpf_doc script refers to the GPL as the "GNU Privacy License".
I strongly suspect that the author wanted to refer to the GNU General
Public License, under which the Linux kernel is released, as, to the
best of my knowledge, there is no license named "GNU Privacy License".
This patch corrects the license name in the script accordingly.
Fixes: 56a092c895 ("bpf: add script and prepare bpf.h for new helpers documentation")
Signed-off-by: Gianmarco Lusvardi <glusvardi@posteo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20240213230544.930018-3-glusvardi@posteo.net
Creating sysfs files for all Cells caused a boot failure for linux-6.8-rc1 on
Apple M1, which (in downstream dts files) has multiple nvmem cells that use the
same byte address. This causes the device probe to fail with
[ 0.605336] sysfs: cannot create duplicate filename '/devices/platform/soc@200000000/2922bc000.efuse/apple_efuses_nvmem0/cells/efuse@a10'
[ 0.605347] CPU: 7 PID: 1 Comm: swapper/0 Tainted: G S 6.8.0-rc1-arnd-5+ #133
[ 0.605355] Hardware name: Apple Mac Studio (M1 Ultra, 2022) (DT)
[ 0.605362] Call trace:
[ 0.605365] show_stack+0x18/0x2c
[ 0.605374] dump_stack_lvl+0x60/0x80
[ 0.605383] dump_stack+0x18/0x24
[ 0.605388] sysfs_warn_dup+0x64/0x80
[ 0.605395] sysfs_add_bin_file_mode_ns+0xb0/0xd4
[ 0.605402] internal_create_group+0x268/0x404
[ 0.605409] sysfs_create_groups+0x38/0x94
[ 0.605415] devm_device_add_groups+0x50/0x94
[ 0.605572] nvmem_populate_sysfs_cells+0x180/0x1b0
[ 0.605682] nvmem_register+0x38c/0x470
[ 0.605789] devm_nvmem_register+0x1c/0x6c
[ 0.605895] apple_efuses_probe+0xe4/0x120
[ 0.606000] platform_probe+0xa8/0xd0
As far as I can tell, this is a problem for any device with multiple cells on
different bits of the same address. Avoid the issue by changing the file name
to include the first bit number.
Fixes: 0331c61194 ("nvmem: core: Expose cells through sysfs")
Link: https://github.com/AsahiLinux/linux/blob/bd0a1a7d4/arch/arm64/boot/dts/apple/t600x-dieX.dtsi#L156
Cc: <regressions@lists.linux.dev>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Rafał Miłecki <rafal@milecki.pl>
Cc: Chen-Yu Tsai <wenst@chromium.org>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <asahi@lists.linux.dev>
Cc: Sven Peter <sven@svenpeter.dev>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Reviewed-by: Eric Curtin <ecurtin@redhat.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20240209163454.98051-1-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jonathan writes:
IIO: 1st set of fixes for the 6.8 cycle
Usual mixed bag of issues introduced this cycle and fixes for long term
issues that have been identified recently + one case where I messed up
a merge resolution and dropped the build file changes.
Most important is the userspace ABI fix for the iio_modifier enum
where we accidentally added new entries in the middle rather than at
the end.
IIO Core
- Close a memory leak in an error path.
- Move LIGHT_UVA and LIGHT_UVB definitions to end of the iio_modifier
enum to avoid breaking older userspace. (not yet in a released kernel
thankfully).
adi,adis
- Fix a DMA buffer alignment issue that was missing in series that fixed
these across IIO.
adi,ad-sigma-delta
- Fix a DMA buffer alignment issue that was missing in series that fixed
these across IIO.
adi,ad4130
- Zero init remaining fields of clock init data.
- Only set GPIO control bits on pins that aren't in use for anything else.
adi,ad5933
- Fix an old bug due to type mismatch. This is a rare device so good to
get some new test coverage.
adi,ad7091r
- Use right variable for an error return code.
bosch,bma400
- Add missing CONFIG_REGMAP_I2C dependency.
bosch,bmp280:
- Add missing bmp085 ID to the SPI table to avoid mismatch with the
of_device_id table.
hid-sensors:
- Avoid returning an error for timestamp read back that succeeds.
pni,rm3100
- Check value read from RM31000_REG_TMRC register is valid before using
it. Hardening to avoid a real world issue seen on some faulty hardware.
st,st-sensors
- Fix a DMA buffer alignment issue that was missing in series that fixed
these across IIO.
ti,hdc3020
- Add missing Kconfig and Makefile entrees accidentally dropped when patches
were applied.
- Fix wrong temperature offset (negated)
* tag 'iio-fixes-for-6.8a' of http://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: adc: ad4130: only set GPIO_CTRL if pin is unused
iio: adc: ad4130: zero-initialize clock init data
iio: accel: bma400: Fix a compilation problem
iio: commom: st_sensors: ensure proper DMA alignment
iio: hid-sensor-als: Return 0 for HID_USAGE_SENSOR_TIME_TIMESTAMP
iio: move LIGHT_UVA and LIGHT_UVB to the end of iio_modifier
staging: iio: ad5933: fix type mismatch regression
iio: humidity: hdc3020: fix temperature offset
iio: adc: ad7091r8: Fix error code in ad7091r8_gpio_setup()
iio: adc: ad_sigma_delta: ensure proper DMA alignment
iio: imu: adis: ensure proper DMA alignment
iio: humidity: hdc3020: Add Makefile, Kconfig and MAINTAINERS entry
iio: imu: bno055: serdev requires REGMAP
iio: magnetometer: rm3100: add boundary check for the value read from RM3100_REG_TMRC
iio: pressure: bmp280: Add missing bmp085 to SPI id table
iio: core: fix memleak in iio_device_register_sysfs
The following 3 locks would race against each other, causing the
deadlock situation in the Syzbot bug report:
- j1939_socks_lock
- active_session_list_lock
- sk_session_queue_lock
A reasonable fix is to change j1939_socks_lock to an rwlock, since in
the rare situations where a write lock is required for the linked list
that j1939_socks_lock is protecting, the code does not attempt to
acquire any more locks. This would break the circular lock dependency,
where, for example, the current thread already locks j1939_socks_lock
and attempts to acquire sk_session_queue_lock, and at the same time,
another thread attempts to acquire j1939_socks_lock while holding
sk_session_queue_lock.
NOTE: This patch along does not fix the unregister_netdevice bug
reported by Syzbot; instead, it solves a deadlock situation to prepare
for one or more further patches to actually fix the Syzbot bug, which
appears to be a reference counting problem within the j1939 codebase.
Reported-by: <syzbot+1591462f226d9cbf0564@syzkaller.appspotmail.com>
Signed-off-by: Ziqi Zhao <astrajoan@yahoo.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20230721162226.8639-1-astrajoan@yahoo.com
[mkl: remove unrelated newline change]
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
clang-16 warns about the mismatched prototypes for the devm_* callbacks:
drivers/net/ethernet/ti/cpts.c:691:12: error: cast from 'void (*)(struct clk_hw *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict]
691 | (void(*)(void *))clk_hw_unregister_mux,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/device.h:406:34: note: expanded from macro 'devm_add_action_or_reset'
406 | __devm_add_action_or_reset(dev, action, data, #action)
| ^~~~~~
drivers/net/ethernet/ti/cpts.c:703:12: error: cast from 'void (*)(struct device_node *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict]
703 | (void(*)(void *))of_clk_del_provider,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/device.h:406:34: note: expanded from macro 'devm_add_action_or_reset'
406 | __devm_add_action_or_reset(dev, action, data, #action)
Use separate helper functions for this instead, using the expected prototypes
with a void* argument.
Fixes: a3047a81ba ("net: ethernet: ti: cpts: add support for ext rftclk selection")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
clang-16 warns about a function pointer cast:
drivers/net/ethernet/brocade/bna/bnad.c:1995:4: error: cast from 'void (*)(struct delayed_work *)' to 'work_func_t' (aka 'void (*)(struct work_struct *)') converts to incompatible function type [-Werror,-Wcast-function-type-strict]
1995 | (work_func_t)bnad_tx_cleanup);
drivers/net/ethernet/brocade/bna/bnad.c:2252:4: error: cast from 'void (*)(void *)' to 'work_func_t' (aka 'void (*)(struct work_struct *)') converts to incompatible function type [-Werror,-Wcast-function-type-strict]
2252 | (work_func_t)(bnad_rx_cleanup));
The problem here is mixing up work_struct and delayed_work, which relies
the former being the first member of the latter.
Change the code to use consistent types here to address the warning and
make it more robust against workqueue interface changes.
Side note: the use of a delayed workqueue for cleaning up TX descriptors
is probably a bad idea since this introduces a noticeable delay. The
driver currently does not appear to use BQL, but if one wanted to add
that, this would have to be changed as well.
Fixes: 01b54b1451 ("bna: tx rx cleanup fix")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 67f562e3e1 ("net/smc: transfer fasync_list in case of fallback")
leaves the socket's fasync list pointer within a container socket as well.
When the latter is destroyed, '__sock_release()' warns about its non-empty
fasync list, which is a dangling pointer to previously freed fasync list
of an underlying TCP socket. Fix this spurious warning by nullifying
fasync list of a container socket.
Fixes: 67f562e3e1 ("net/smc: transfer fasync_list in case of fallback")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-02-12 (i40e)
This series contains updates to i40e driver only.
Ivan Vecera corrects the looping value used while waiting for queues to
be disabled as well as an incorrect mask being used for DCB
configuration.
Maciej resolves an issue related to XDP traffic; removing a double call to
i40e_pf_rxq_wait() and accounting for XDP rings when stopping rings.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Turning on CONFIG_DMA_API_DEBUG_SG results in the following warning:
DMA-API: mmci-pl18x 48220000.mmc: cacheline tracking EEXIST,
overlapping mappings aren't supported
WARNING: CPU: 1 PID: 51 at kernel/dma/debug.c:568
add_dma_entry+0x234/0x2f4
Modules linked in:
CPU: 1 PID: 51 Comm: kworker/1:2 Not tainted 6.1.28 #1
Hardware name: STMicroelectronics STM32MP257F-EV1 Evaluation Board (DT)
Workqueue: events_freezable mmc_rescan
Call trace:
add_dma_entry+0x234/0x2f4
debug_dma_map_sg+0x198/0x350
__dma_map_sg_attrs+0xa0/0x110
dma_map_sg_attrs+0x10/0x2c
sdmmc_idma_prep_data+0x80/0xc0
mmci_prep_data+0x38/0x84
mmci_start_data+0x108/0x2dc
mmci_request+0xe4/0x190
__mmc_start_request+0x68/0x140
mmc_start_request+0x94/0xc0
mmc_wait_for_req+0x70/0x100
mmc_send_tuning+0x108/0x1ac
sdmmc_execute_tuning+0x14c/0x210
mmc_execute_tuning+0x48/0xec
mmc_sd_init_uhs_card.part.0+0x208/0x464
mmc_sd_init_card+0x318/0x89c
mmc_attach_sd+0xe4/0x180
mmc_rescan+0x244/0x320
DMA API debug brings to light leaking dma-mappings as dma_map_sg and
dma_unmap_sg are not correctly balanced.
If an error occurs in mmci_cmd_irq function, only mmci_dma_error
function is called and as this API is not managed on stm32 variant,
dma_unmap_sg is never called in this error path.
Signed-off-by: Christophe Kerello <christophe.kerello@foss.st.com>
Fixes: 46b723dd86 ("mmc: mmci: add stm32 sdmmc variant")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240207143951.938144-1-christophe.kerello@foss.st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
In ata ata_dev_power_set_standby(), check that the target device is not
sleeping. If it is, there is no need to do anything.
Fixes: aa3998dbeb ("ata: libata-scsi: Disable scsi device manage_system_start_stop")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
NPC transmit side mcam rules can use the pcifunc (in packet metadata
added by hardware) of transmitting device for mcam lookup similar to
the channel of receiving device at receive side.
The commit 18603683d7 ("octeontx2-af: Remove channel verification
while installing MCAM rules") removed the receive side channel
verification to save hardware MCAM filters while switching packets
across interfaces but missed removing transmit side checks.
This patch removes transmit side rules validation.
Fixes: 18603683d7 ("octeontx2-af: Remove channel verification while installing MCAM rules")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
clang-16 notices that srpt_qp_event() gets called through an incompatible
pointer here:
drivers/infiniband/ulp/srpt/ib_srpt.c:1815:5: error: cast from 'void (*)(struct ib_event *, struct srpt_rdma_ch *)' to 'void (*)(struct ib_event *, void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict]
1815 | = (void(*)(struct ib_event *, void*))srpt_qp_event;
Change srpt_qp_event() to use the correct prototype and adjust the
argument inside of it.
Fixes: a42d985bd5 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20240213100728.458348-1-arnd@kernel.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
The patch 51d9760799 ("ALSA: hda/realtek:
Add quirks for ASUS Zenbook 2022 Models") modified the entry 1043:1e2e
from "ASUS UM3402" to "ASUS UM6702RA/RC" and added another entry for
"ASUS UM3402" with 104e:1ee2.
The first entry was correct, while the new one corresponds to model
"ASUS UM6702RA/RC"
Fix the model names for both devices.
Fixes: 51d9760799 ("ALSA: hda/realtek: Add quirks for ASUS Zenbook 2022 Models")
Signed-off-by: Jean-Loïc Charroud <lagiraudiere+linux@free.fr>
Link: https://lore.kernel.org/r/1656546983.650349575.1707867732866.JavaMail.zimbra@free.fr
Signed-off-by: Takashi Iwai <tiwai@suse.de>
At W=2 dtc complains:
hifive-unmatched-a00.dts:120.10-238.4: Warning (interrupt_provider): /soc/i2c@10030000/pmic@58: Missing '#interrupt-cells' in interrupt provider
Add the missing property.
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
powerVM hypervisor updates the VPA fields with stolen time data.
It currently reports enqueue_dispatch_tb and ready_enqueue_tb for
this purpose. In linux these two fields are used to report the stolen time.
The VPA fields are updated at the TB frequency. On powerPC its mostly
set at 512Mhz. Hence this needs a conversion to ns when reporting it
back as rest of the kernel timings are in ns. This conversion is already
handled in tb_to_ns function. So use that function to report accurate
stolen time.
Observed this issue and used an Capped Shared Processor LPAR(SPLPAR) to
simplify the experiments. In all these cases, 100% VP Load is run using
stress-ng workload. Values of stolen time is in percentages as reported
by mpstat. With the patch values are close to expected.
6.8.rc1 +Patch
12EC/12VP 0.0 0.0
12EC/24VP 25.7 50.2
12EC/36VP 37.3 69.2
12EC/48VP 38.5 78.3
Fixes: 0e8a631328 ("powerpc/pseries: Implement CONFIG_PARAVIRT_TIME_ACCOUNTING")
Cc: stable@vger.kernel.org # v6.1+
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240213052635.231597-1-sshegde@linux.ibm.com
Michael reported that we are seeing an ftrace bug on bootup when KASAN
is enabled and we are using -fpatchable-function-entry:
ftrace: allocating 47780 entries in 18 pages
ftrace-powerpc: 0xc0000000020b3d5c: No module provided for non-kernel address
------------[ ftrace bug ]------------
ftrace faulted on modifying
[<c0000000020b3d5c>] 0xc0000000020b3d5c
Initializing ftrace call sites
ftrace record flags: 0
(0)
expected tramp: c00000000008cef4
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:2180 ftrace_bug+0x3c0/0x424
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0-rc3-00120-g0f71dcfb4aef #860
Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
NIP: c0000000003aa81c LR: c0000000003aa818 CTR: 0000000000000000
REGS: c0000000033cfab0 TRAP: 0700 Not tainted (6.5.0-rc3-00120-g0f71dcfb4aef)
MSR: 8000000002021033 <SF,VEC,ME,IR,DR,RI,LE> CR: 28028240 XER: 00000000
CFAR: c0000000002781a8 IRQMASK: 3
...
NIP [c0000000003aa81c] ftrace_bug+0x3c0/0x424
LR [c0000000003aa818] ftrace_bug+0x3bc/0x424
Call Trace:
ftrace_bug+0x3bc/0x424 (unreliable)
ftrace_process_locs+0x5f4/0x8a0
ftrace_init+0xc0/0x1d0
start_kernel+0x1d8/0x484
With CONFIG_FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY=y and
CONFIG_KASAN=y, compiler emits nops in functions that it generates for
registering and unregistering global variables (unlike with -pg and
-mprofile-kernel where calls to _mcount() are not generated in those
functions). Those functions then end up in INIT_TEXT and EXIT_TEXT
respectively. We don't expect to see any profiled functions in
EXIT_TEXT, so ftrace_init_nop() assumes that all addresses that aren't
in the core kernel text belongs to a module. Since these functions do
not match that criteria, we see the above bug.
Address this by having ftrace ignore all locations in the text exit
sections of vmlinux.
Fixes: 0f71dcfb4a ("powerpc/ftrace: Add support for -fpatchable-function-entry")
Cc: stable@vger.kernel.org # v6.6+
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Reviewed-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240213175410.1091313-1-naveen@kernel.org
Commit e320a76db4 ("powerpc/cputable: Split cpu_specs[] out of
cputable.h") moved the cpu_specs to separate header files. Previously
PPC_FEATURE_BOOKE was enabled by CONFIG_PPC_BOOK3E_64. The definition in
cpu_specs_e500mc.h for PPC64 no longer enables PPC_FEATURE_BOOKE.
This breaks user space reading the ELF hwcaps and expect
PPC_FEATURE_BOOKE. Debugging an application with gdb is no longer
working on e5500/e6500 because the 64-bit detection relies on
PPC_FEATURE_BOOKE for Book-E.
Fixes: e320a76db4 ("powerpc/cputable: Split cpu_specs[] out of cputable.h")
Cc: stable@vger.kernel.org # v6.1+
Signed-off-by: David Engraf <david.engraf@sysgo.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240207092758.1058893-1-david.engraf@sysgo.com
KASAN is seen to increase stack usage, to the point that it was reported
to lead to stack overflow on some 32-bit machines (see link).
To avoid overflows the stack size was doubled for KASAN builds in
commit 3e8635fb2e ("powerpc/kasan: Force thread size increase with
KASAN").
However with a 32KB stack size to begin with, the doubling leads to a
64KB stack, which causes build errors:
arch/powerpc/kernel/switch.S:249: Error: operand out of range (0x000000000000fe50 is not between 0xffffffffffff8000 and 0x0000000000007fff)
Although the asm could be reworked, in practice a 32KB stack seems
sufficient even for KASAN builds - the additional usage seems to be in
the 2-3KB range for a 64-bit KASAN build.
So only increase the stack for KASAN if the stack size is < 32KB.
Fixes: 18f14afe28 ("powerpc/64s: Increase default stack size to 32KB")
Reported-by: Spoorthy <spoorthy@linux.ibm.com>
Reported-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Benjamin Gray <bgray@linux.ibm.com>
Link: https://lore.kernel.org/linuxppc-dev/bug-207129-206035@https.bugzilla.kernel.org%2F/
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240212064244.3924505-1-mpe@ellerman.id.au
When also downgrading, check_version_upgrade() could pick a new version
greater than the latest supported version.
Fixes:
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This prevents going emergency read only when the user has specified
replicas_required > replicas.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Remove superfluous initialization of status variable in
nvmet_execute_admin_connect() and nvmet_execute_io_connect(), since it
will get overwritten by nvmet_copy_from_sgl().
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Initializing an eMMC that's connected via a 1-bit bus is current failing,
if the HW (DT) informs that 4-bit bus is supported. In fact this is a
regression, as we were earlier capable of falling back to 1-bit mode, when
switching to 4/8-bit bus failed. Therefore, let's restore the behaviour.
Log for Samsung eMMC 5.1 chip connected via 1bit bus (only D0 pin)
Before patch:
[134509.044225] mmc0: switch to bus width 4 failed
[134509.044509] mmc0: new high speed MMC card at address 0001
[134509.054594] mmcblk0: mmc0:0001 BGUF4R 29.1 GiB
[134509.281602] mmc0: switch to bus width 4 failed
[134509.282638] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[134509.282657] Buffer I/O error on dev mmcblk0, logical block 0, async page read
[134509.284598] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[134509.284602] Buffer I/O error on dev mmcblk0, logical block 0, async page read
[134509.284609] ldm_validate_partition_table(): Disk read failed.
[134509.286495] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[134509.286500] Buffer I/O error on dev mmcblk0, logical block 0, async page read
[134509.288303] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[134509.288308] Buffer I/O error on dev mmcblk0, logical block 0, async page read
[134509.289540] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[134509.289544] Buffer I/O error on dev mmcblk0, logical block 0, async page read
[134509.289553] mmcblk0: unable to read partition table
[134509.289728] mmcblk0boot0: mmc0:0001 BGUF4R 31.9 MiB
[134509.290283] mmcblk0boot1: mmc0:0001 BGUF4R 31.9 MiB
[134509.294577] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[134509.295835] I/O error, dev mmcblk0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[134509.295841] Buffer I/O error on dev mmcblk0, logical block 0, async page read
After patch:
[134551.089613] mmc0: switch to bus width 4 failed
[134551.090377] mmc0: new high speed MMC card at address 0001
[134551.102271] mmcblk0: mmc0:0001 BGUF4R 29.1 GiB
[134551.113365] mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20 p21
[134551.114262] mmcblk0boot0: mmc0:0001 BGUF4R 31.9 MiB
[134551.114925] mmcblk0boot1: mmc0:0001 BGUF4R 31.9 MiB
Fixes: 577fb13199 ("mmc: rework selection of bus speed mode")
Cc: stable@vger.kernel.org
Signed-off-by: Ivan Semenov <ivan@semenov.dev>
Link: https://lore.kernel.org/r/20240206172845.34316-1-ivan@semenov.dev
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
xsk_build_skb() allocates a page and adds it to the skb via
skb_add_rx_frag() and specifies 0 for truesize. This leads to a warning
in skb_add_rx_frag() with CONFIG_DEBUG_NET enabled because size is
larger than truesize.
Increasing truesize requires to add the same amount to socket's
sk_wmem_alloc counter in order not to underflow the counter during
release in the destructor (sock_wfree()).
Pass the size of the allocated page as truesize to skb_add_rx_frag().
Add this mount to socket's sk_wmem_alloc counter.
Fixes: cf24f5a5fe ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20240202163221.2488589-1-bigeasy@linutronix.de
The ACPI in some SoundWire laptops has a spk-id-gpios property but
it points to the wrong Device node. This patch adds a workaround to
try to get the GPIO directly from the correct Device node.
If the attempt to get the GPIOs from the property fails, the workaround
looks for the SDCA node "AF01", which is where the GpioIo resource is
defined. If this exists, a spk-id-gpios mapping is added to that node
and then the GPIO is got from that node using the property.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://msgid.link/r/20240209111840.1543630-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The px30 has two spi controllers with two chip-selects each.
The num-cs property is specified as the total number of chip
selects a controllers has and is used since 2020 to find uses
of chipselects outside that range in the Rockchip spi driver.
Without the property set, the default is 1, so spi devices
using the second chipselect will not be created.
Fixes: eb1262e3cc ("spi: spi-rockchip: use num-cs property and ctlr->enable_gpiods")
Signed-off-by: Heiko Stuebner <heiko.stuebner@cherry.de>
Reviewed-by: Quentin Schulz <quentin.schulz@theobroma-systems.com>
Link: https://lore.kernel.org/r/20240119101656.965744-1-heiko@sntech.de
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Paolo Abeni says:
====================
selftests: net: more pmtu.sh fixes
The mentioned test is still flaky, unusally enough in 'fast'
environments.
Patch 2/2 [try to] address the existing issues, while patch 1/2
introduces more strict tests for the existing net helpers, to hopefully
prevent future pain.
====================
Link: https://lore.kernel.org/r/cover.1707731086.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The netdev CI is reporting failures for the pmtu test:
[ 115.929264] br0: port 2(vxlan_a) entered forwarding state
# 2024/02/08 17:33:22 socat[7871] E bind(7, {AF=10 [0000:0000:0000:0000:0000:0000:0000:0000]:50000}, 28): Address already in use
# 2024/02/08 17:33:22 socat[7877] E write(7, 0x5598fb6ff000, 8192): Connection refused
# TEST: IPv6, bridged vxlan4: PMTU exceptions [FAIL]
# File size 0 mismatches exepcted value in locally bridged vxlan test
The root cause is apparently a socket created by a previous iteration
of the relevant loop still lasting in LAST_ACK state.
Note that even the file size check is racy, the receiver process dumping
the file could still be running in background
Allow the listener to bound on the same local port via SO_REUSEADDR and
collect file output file size only after the listener completion.
Fixes: 136a1b434b ("selftests: net: test vxlan pmtu exceptions with tcp")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/4f51c11a1ce7ca7a4dabd926cffff63dadac9ba1.1707731086.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The helper waiting for a listener port can match any socket whose
hexadecimal representation of source or destination addresses
matches that of the given port.
Additionally, any socket state is accepted.
All the above can let the helper return successfully before the
relevant listener is actually ready, with unexpected results.
So far I could not find any related failure in the netdev CI, but
the next patch is going to make the critical event more easily
reproducible.
Address the issue matching the port hex only vs the relevant socket
field and additionally checking the socket state for TCP sockets.
Fixes: 3bdd9fd29c ("selftests/net: synchronize udpgro tests' tx and rx connection")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/192b3dbc443d953be32991d1b0ca432bd4c65008.1707731086.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The mentioned test is failing in slow environments:
# SO_TXTIME ipv4 clock monotonic
# ./so_txtime: recv: timeout: Resource temporarily unavailable
not ok 1 selftests: net: so_txtime.sh # exit=1
Tuning the tolerance in the test binary is error-prone and doomed
to failures is slow-enough environment.
Just resort to suppress any error in such cases. Note to suppress
them we need first to refactor a bit the code moving it to explicit
error handling.
Fixes: af5136f950 ("selftests/net: SO_TXTIME with ETF and FQ")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/2142d9ed4b5c5aa07dd1b455779625d91b175373.1707730902.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The gro self-tests sends the packets to be aggregated with
multiple write operations.
When running is slow environment, it's hard to guarantee that
the GRO engine will wait for the last packet in an intended
train.
The above causes almost deterministic failures in our CI for
the 'large' test-case.
Address the issue explicitly ignoring failures for such case
in slow environments (KSFT_MACHINE_SLOW==true).
Fixes: 7d1575014a ("selftests/net: GRO coalesce test")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/97d3ba83f5a2bfeb36f6bc0fb76724eb3dafb608.1707729403.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit 28270e25c6 ("btrfs: always reserve space for delayed refs
when starting transaction") we started not only to reserve metadata space
for the delayed refs a caller of btrfs_start_transaction() might generate
but also to try to fully refill the delayed refs block reserve, because
there are several case where we generate delayed refs and haven't reserved
space for them, relying on the global block reserve. Relying too much on
the global block reserve is not always safe, and can result in hitting
-ENOSPC during transaction commits or worst, in rare cases, being unable
to mount a filesystem that needs to do orphan cleanup or anything that
requires modifying the filesystem during mount, and has no more
unallocated space and the metadata space is nearly full. This was
explained in detail in that commit's change log.
However the gap between the reserved amount and the size of the delayed
refs block reserve can be huge, so attempting to reserve space for such
a gap can result in allocating many metadata block groups that end up
not being used. After a recent patch, with the subject:
"btrfs: add new unused block groups to the list of unused block groups"
We started to add new block groups that are unused to the list of unused
block groups, to avoid having them around for a very long time in case
they are never used, because a block group is only added to the list of
unused block groups when we deallocate the last extent or when mounting
the filesystem and the block group has 0 bytes used. This is not a problem
introduced by the commit mentioned earlier, it always existed as our
metadata space reservations are, most of the time, pessimistic and end up
not using all the space they reserved, so we can occasionally end up with
one or two unused metadata block groups for a long period. However after
that commit mentioned earlier, we are just more pessimistic in the
metadata space reservations when starting a transaction and therefore the
issue is more likely to happen.
This however is not always enough because we might create unused metadata
block groups when reserving metadata space at a high rate if there's
always a gap in the delayed refs block reserve and the cleaner kthread
isn't triggered often enough or is busy with other work (running delayed
iputs, cleaning deleted roots, etc), not to mention the block group's
allocated space is only usable for a new block group after the transaction
used to remove it is committed.
A user reported that he's getting a lot of allocated metadata block groups
but the usage percentage of metadata space was very low compared to the
total allocated space, specially after running a series of block group
relocations.
So for now stop trying to refill the gap in the delayed refs block reserve
and reserve space only for the delayed refs we are expected to generate
when starting a transaction.
CC: stable@vger.kernel.org # 6.7+
Reported-by: Ivan Shapovalov <intelfx@intelfx.name>
Link: https://lore.kernel.org/linux-btrfs/9cdbf0ca9cdda1b4c84e15e548af7d7f9f926382.camel@intelfx.name/
Link: https://lore.kernel.org/linux-btrfs/CAL3q7H6802ayLHUJFztzZAVzBLJAGdFx=6FHNNy87+obZXXZpQ@mail.gmail.com/
Tested-by: Ivan Shapovalov <intelfx@intelfx.name>
Reported-by: Heddxh <g311571057@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAE93xANEby6RezOD=zcofENYZOT-wpYygJyauyUAZkLv6XVFOA@mail.gmail.com/
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
At btrfs_load_block_group_zone_info() we never drop a reference on the
chunk map we have looked up, therefore leaking a reference on it. So
add the missing btrfs_free_chunk_map() at the end of the function.
Fixes: 7dc66abb5a ("btrfs: use a dedicated data structure for chunk maps")
Reported-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently we allow an encoded write against inodes that have the NODATASUM
flag set, either because they are NOCOW files or they were created while
the filesystem was mounted with "-o nodatasum". This results in having
compressed extents without corresponding checksums, which is a filesystem
inconsistency reported by 'btrfs check'.
For example, running btrfs/281 with MOUNT_OPTIONS="-o nodatacow" triggers
this and 'btrfs check' errors out with:
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
[4/7] checking fs roots
root 256 inode 257 errors 1040, bad file extent, some csum missing
root 256 inode 258 errors 1040, bad file extent, some csum missing
ERROR: errors found in fs roots
(...)
So reject encoded writes if the target inode has NODATASUM set.
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently when doing a write to a file we always reserve metadata space
for inserting data checksums. However we don't need to do it if we have
a nodatacow file (-o nodatacow mount option or chattr +C) or if checksums
are disabled (-o nodatasum mount option), as in that case we are only
adding unnecessary pressure to metadata reservations.
For example on x86_64, with the default node size of 16K, a 4K buffered
write into a nodatacow file is reserving 655360 bytes of metadata space,
as it's accounting for checksums. After this change, which stops reserving
space for checksums if we have a nodatacow file or checksums are disabled,
we only need to reserve 393216 bytes of metadata.
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Merge series from Peter Ujfalusi <peter.ujfalusi@linux.intel.com>:
Hi,
Align the IPC4 firmware path/name and the topology path to the documentation:
default_fw_path: intel/sof-ipc4/{platform_name}
default_lib_path: intel/sof-ipc4-lib/{platform_name}
default_tplg_path: intel/sof-ipc4-tplg
default_fw_filename: sof-{platform_name}.ri
Tiger Lake and Lunar Lake support is not yet available via the official
firmware release, the paths can be changed now to avoid misalignment in the
future.
Regards,
Peter
---
Peter Ujfalusi (2):
ASoC: SOF: Intel: pci-tgl: Change the default paths and firmware names
ASoC: SOF: Intel: pci-lnl: Change the topology path to
intel/sof-ipc4-tplg
sound/soc/sof/intel/pci-lnl.c | 2 +-
sound/soc/sof/intel/pci-tgl.c | 64 +++++++++++++++++------------------
2 files changed, 33 insertions(+), 33 deletions(-)
--
2.43.0
Commit a8b9cf62ad ("ftrace: Fix DIRECT_CALLS to use SAVE_REGS by
default") attempted to fix an issue with direct trampolines on x86, see
its description for details. However, it wrongly referenced the
HAVE_DYNAMIC_FTRACE_WITH_REGS config option and the problem is still
present.
Add the missing "CONFIG_" prefix for the logic to work as intended.
Link: https://lore.kernel.org/linux-trace-kernel/20240213132434.22537-1-petr.pavlu@suse.com
Fixes: a8b9cf62ad ("ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
NVM Express TP4167 provides a way for controllers to report a relaxed
execution constraint. Specifically, it notifies of exclusivity for IO
vs. admin commands instead of grouping these together. If set, then we
don't need to freeze IO in order to execute that admin command. The
freezing distrupts IO processes, so it's nice to avoid that if the
controller tells us it's not necessary.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Underscores should not be used in node names (dtc with W=2 warns about
them), so replace them with hyphens.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Pull tracing tooling fixes from Steven Rostedt:
"RTLA:
- rtla tools are exiting with a positive value when usage() is
called. Make them return 0 if the usage was called via -h/--help
- the -P priority sets the sched priority for rtla workload. When the
SCHED_OTHER scheduler is selected, it sets the rt_priority instead
of the nice parameter. Setting the nice value is the correct thing,
so fix it
- rtla is failing to compile with clang due to unsupported options
from gcc. Adjusting the compiler/linker options makes clang work
properly
- Remove the sched_getattr() unused function on utils.c
- Fixes for variable initialization and size, reported by clang
Verification:
- rv is failing to compile with clang due to unsupported options from
gcc. Adjusting the compiler/linker options makes clang work
properly
- Fix an uninitialized variable on in_kernel.c reported by clang"
* tag 'trace-tools-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tools/rtla: Exit with EXIT_SUCCESS when help is invoked
tools/rtla: Replace setting prio with nice for SCHED_OTHER
tools/rv: Fix curr_reactor uninitialized variable
tools/rv: Fix Makefile compiler options for clang
tools/rtla: Remove unused sched_getattr() function
tools/rtla: Fix clang warning about mount_point var size
tools/rtla: Fix uninitialized bucket/data->bucket_size warning
tools/rtla: Fix Makefile compiler options for clang
In nvmf_connect_io_queue(), if connect I/O command fails, we log the
error and continue for authentication. This overrides error captured
from __nvme_submit_sync_cmd(), causing wrong return value.
Add goto out_free_data after logging connect error to fix the issue.
Fixes: f50fff73d6 ("nvme: implement In-Band authentication")
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
There was a change in the mxs-dma engine that uses a new custom flag.
The change was not applied to the mxs spi driver.
This results in chipselect being deasserted too early.
This fixes the chipselect problem by using the new flag in the mxs-spi
driver.
Fixes: ceeeb99cd8 ("dmaengine: mxs: rename custom flag")
Signed-off-by: Ralf Schlatterbeck <rsc@runtux.com>
Link: https://msgid.link/r/20240202115330.wxkbfmvd76sy3a6a@runtux.com
Signed-off-by: Mark Brown <broonie@kernel.org>
gcc-14 notices that the allocation with sizeof(void) on 32-bit architectures
is not enough for a 64-bit phys_addr_t:
drivers/firmware/efi/capsule-loader.c: In function 'efi_capsule_open':
drivers/firmware/efi/capsule-loader.c:295:24: error: allocation of insufficient size '4' for type 'phys_addr_t' {aka 'long long unsigned int'} with size '8' [-Werror=alloc-size]
295 | cap_info->phys = kzalloc(sizeof(void *), GFP_KERNEL);
| ^
Use the correct type instead here.
Fixes: f24c4d4780 ("efi/capsule-loader: Reinstate virtual capsule mapping")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
When the system is suspended while audio is active, the
sof_ipc4_pcm_hw_free() is invoked to reset the pipelines since during
suspend the DSP is turned off, streams will be re-started after resume.
If the firmware crashes during while audio is running (or when we reset
the stream before suspend) then the sof_ipc4_set_multi_pipeline_state()
will fail with IPC error and the state change is interrupted.
This will cause misalignment between the kernel and firmware state on next
DSP boot resulting errors returned by firmware for IPC messages, eventually
failing the audio resume.
On stream close the errors are ignored so the kernel state will be
corrected on the next DSP boot, so the second boot after the DSP panic.
If sof_ipc4_trigger_pipelines() is called from sof_ipc4_pcm_hw_free() then
state parameter is SOF_IPC4_PIPE_RESET and only in this case.
Treat a forced pipeline reset similarly to how we treat a pcm_free by
ignoring error on state sending to allow the kernel's state to be
consistent with the state the firmware will have after the next boot.
Link: https://github.com/thesofproject/sof/issues/8721
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://msgid.link/r/20240213115233.15716-1-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
clang-16 points out a mismatch in function types that was hidden
by a typecast:
sound/soc/qcom/qdsp6/q6apm-dai.c:355:38: error: cast from 'void (*)(uint32_t, uint32_t, uint32_t *, void *)' (aka 'void (*)(unsigned int, unsigned int, unsigned int *, void *)') to 'q6apm_cb' (aka 'void (*)(unsigned int, unsigned int, void *, void *)') converts to incompatible function type [-Werror,-Wcast-function-type-strict]
355 | prtd->graph = q6apm_graph_open(dev, (q6apm_cb)event_handler, prtd, graph_id);
sound/soc/qcom/qdsp6/q6apm-dai.c:499:38: error: cast from 'void (*)(uint32_t, uint32_t, uint32_t *, void *)' (aka 'void (*)(unsigned int, unsigned int, unsigned int *, void *)') to 'q6apm_cb' (aka 'void (*)(unsigned int, unsigned int, void *, void *)') converts to incompatible function type [-Werror,-Wcast-function-type-strict]
499 | prtd->graph = q6apm_graph_open(dev, (q6apm_cb)event_handler_compr, prtd, graph_id);
The only difference here is the 'payload' argument, which is not even
used in this function, so just fix its type and remove the cast.
Fixes: 88b60bf047 ("ASoC: q6dsp: q6apm-dai: Add open/free compress DAI callbacks")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://msgid.link/r/20240213101105.459402-1-arnd@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
The currently used paths and firmware name reflects the reference firmware
convention:
default_fw_path: intel/avs/{platform_name}
default_lib_path: intel/avs-lib/{platform_name}
default_tplg_path: intel/avs-tplg
default_fw_filename: dsp_basefw.bin
The SOF supports building the firmware for cAVS2.5 platforms using IPC4 and
it is the preferred IPC4 implementation to be used on these devices.
Change the paths and firmware names to reflect this:
default_fw_path: intel/sof-ipc4/{platform_name}
default_lib_path: intel/sof-ipc4-lib/{platform_name}
default_tplg_path: intel/sof-ipc4-tplg
default_fw_filename: sof-{platform_name}.ri
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://msgid.link/r/20240213080418.21256-2-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
When CONFIG_PTP_1588_CLOCK=m and CONFIG_TI_ICSSG_PRUETH=y, there are
kconfig dependency warnings and build errors referencing PTP functions.
Fix these by making TI_ICSSG_PRUETH depend on PTP_1588_CLOCK_OPTIONAL.
Fixes these build errors and warnings:
WARNING: unmet direct dependencies detected for TI_ICSS_IEP
Depends on [m]: NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_TI [=y] && PTP_1588_CLOCK_OPTIONAL [=m] && TI_PRUSS [=y]
Selected by [y]:
- TI_ICSSG_PRUETH [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_TI [=y] && PRU_REMOTEPROC [=y] && ARCH_K3 [=y] && OF [=y] && TI_K3_UDMA_GLUE_LAYER [=y]
aarch64-linux-ld: drivers/net/ethernet/ti/icssg/icss_iep.o: in function `icss_iep_get_ptp_clock_idx':
icss_iep.c:(.text+0x1d4): undefined reference to `ptp_clock_index'
aarch64-linux-ld: drivers/net/ethernet/ti/icssg/icss_iep.o: in function `icss_iep_exit':
icss_iep.c:(.text+0xde8): undefined reference to `ptp_clock_unregister'
aarch64-linux-ld: drivers/net/ethernet/ti/icssg/icss_iep.o: in function `icss_iep_init':
icss_iep.c:(.text+0x176c): undefined reference to `ptp_clock_register'
Fixes: 186734c158 ("net: ti: icssg-prueth: add packet timestamping and ptp support")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Roger Quadros <rogerq@ti.com>
Cc: Md Danish Anwar <danishanwar@ti.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://lore.kernel.org/r/20240211061152.14696-1-rdunlap@infradead.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.
Add a check and bail out early on remove too.
Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---
Fixes: 2af23ceb86 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The xf86-input-wacom driver does not treat '0' as a valid serial
number and will drop any input report which contains an
MSC_SERIAL = 0 event. The kernel driver already takes care to
avoid sending any MSC_SERIAL event if the value of serial[0] == 0
(which is the case for devices that don't actually report a
serial number), but this is not quite sufficient.
Only the lower 32 bits of the serial get reported to userspace,
so if this portion of the serial is zero then there can still
be problems.
This commit allows the driver to report either the lower 32 bits
if they are non-zero or the upper 32 bits otherwise.
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Signed-off-by: Tatsunosuke Tobita <tatsunosuke.tobita@wacom.com>
Fixes: f85c9dc678 ("HID: wacom: generic: Support tool ID and additional tool types")
CC: stable@vger.kernel.org # v4.10
Signed-off-by: Jiri Kosina <jkosina@suse.com>
After legacy suspend/resume via ACPI S3, sensor read operation fails
with timeout. Also, it will cause delay in resume operation as there
will be retries on failure.
This is caused by commit f645a90e8f ("HID: intel-ish-hid:
ishtp-hid-client: use helper functions for connection"), which used
helper functions to simplify connect, reset and disconnect process.
Also avoid freeing and allocating client buffers again during reconnect
process.
But there is a case, when ISH firmware resets after ACPI S3 suspend,
ishtp bus driver frees client buffers. Since there is no realloc again
during reconnect, there are no client buffers available to send connection
requests to the firmware. Without successful connection to the firmware,
subsequent sensor reads will timeout.
To address this issue, ishtp bus driver does not free client buffers on
warm reset after S3 resume. Simply add the buffers from the read list
to free list of buffers.
Fixes: f645a90e8f ("HID: intel-ish-hid: ishtp-hid-client: use helper functions for connection")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218442
Signed-off-by: Even Xu <even.xu@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
When updating the affinity of a VPE, the VMOVP command is currently skipped
if the two CPUs are part of the same VPE affinity.
But this is wrong, as the doorbell corresponding to this VPE is still
delivered on the 'old' CPU, which screws up the balancing. Furthermore,
offlining that 'old' CPU results in doorbell interrupts generated for this
VPE being discarded.
The harsh reality is that VMOVP cannot be elided when a set_affinity()
request occurs. It needs to be obeyed, and if an optimisation is to be
made, it is at the point where the affinity change request is made (such as
in KVM).
Drop the VMOVP elision altogether, and only use the vpe_table_mask
to try and stay within the same ITS affinity group if at all possible.
Fixes: dd3f050a21 (irqchip/gic-v4.1: Implement the v4.1 flavour of VMOVP)
Reported-by: Kunkun Jiang <jiangkunkun@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240213101206.2137483-4-maz@kernel.org
While refactoring the way the ITSs are probed, the handling of quirks
applicable to ACPI-based platforms was lost. As a result, systems such as
HIP07 lose their GICv4 functionnality, and some other may even fail to
boot, unless they are configured to boot with DT.
Move the enabling of quirks into its_probe_one(), making it common to all
firmware implementations.
Fixes: 9585a495ac ("irqchip/gic-v3-its: Split allocation from initialisation of its_node")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240213101206.2137483-3-maz@kernel.org
Although the GICv3 code base has gained some handling of systems failing to
handle the shareability attributes, the GICv4 side of things has been
firmly ignored.
This is unfortunate, as the new recent addition of the "dma-noncoherent" is
supposed to apply to all of the GICR tables, and not just the ones that are
common to v3 and v4.
Add some checks to handle the VPROPBASE/VPENDBASE shareability and
cacheability attributes in the same way we deal with the other GICR_BASE
registers, wrapping the flag check in a helper for improved readability.
Note that this has been found by inspection only, as I don't have access to
HW that suffers from this particular issue.
Fixes: 3a0fff0fb6 ("irqchip/gic-v3: Enable non-coherent redistributors/ITSes DT probing")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Link: https://lore.kernel.org/r/20240213101206.2137483-2-maz@kernel.org
When unlinking a file the check caps could be delayed for more than
5 seconds, but in MDS side it maybe waiting for the clients to
release caps.
This will use the cap_wq work queue and a dedicated list to help
fire the check_caps() and dirty buffer flushing immediately.
Link: https://tracker.ceph.com/issues/50223
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
In case there is 'Fw' dirty caps and 'CHECK_CAPS_FLUSH' is set we
will always ignore queue a writeback. Queue a writeback is very
important because it will block kclient flushing the snapcaps to
MDS and which will block MDS waiting for revoking the 'Fb' caps.
Link: https://tracker.ceph.com/issues/50223
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
eiointc_domain_alloc() uses struct eiointc, which is not defined, for a
pointer. Older compilers treat that as a forward declaration and due to
assignment of a void pointer there is no warning emitted. As the variable
is then handed in as a void pointer argument to irq_domain_set_info() the
code is functional.
Use struct eiointc_priv instead.
[ tglx: Rewrote changelog ]
Fixes: dd281e1a1a ("irqchip: Add Loongson Extended I/O interrupt controller support")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Huacai Chen <chenhuacai@loongson.cn>
Link: https://lore.kernel.org/r/20240130082722.2912576-2-maobibo@loongson.cn
shutdown_pirq and startup_pirq are not taking the
irq_mapping_update_lock because they can't due to lock inversion. Both
are called with the irq_desc->lock being taking. The lock order,
however, is first irq_mapping_update_lock and then irq_desc->lock.
This opens multiple races:
- shutdown_pirq can be interrupted by a function that allocates an event
channel:
CPU0 CPU1
shutdown_pirq {
xen_evtchn_close(e)
__startup_pirq {
EVTCHNOP_bind_pirq
-> returns just freed evtchn e
set_evtchn_to_irq(e, irq)
}
xen_irq_info_cleanup() {
set_evtchn_to_irq(e, -1)
}
}
Assume here event channel e refers here to the same event channel
number.
After this race the evtchn_to_irq mapping for e is invalid (-1).
- __startup_pirq races with __unbind_from_irq in a similar way. Because
__startup_pirq doesn't take irq_mapping_update_lock it can grab the
evtchn that __unbind_from_irq is currently freeing and cleaning up. In
this case even though the event channel is allocated, its mapping can
be unset in evtchn_to_irq.
The fix is to first cleanup the mappings and then close the event
channel. In this way, when an event channel gets allocated it's
potential previous evtchn_to_irq mappings are guaranteed to be unset already.
This is also the reverse order of the allocation where first the event
channel is allocated and then the mappings are setup.
On a 5.10 kernel prior to commit 3fcdaf3d76 ("xen/events: modify internal
[un]bind interfaces"), we hit a BUG like the following during probing of NVMe
devices. The issue is that during nvme_setup_io_queues, pci_free_irq
is called for every device which results in a call to shutdown_pirq.
With many nvme devices it's therefore likely to hit this race during
boot because there will be multiple calls to shutdown_pirq and
startup_pirq are running potentially in parallel.
------------[ cut here ]------------
blkfront: xvda: barrier or flush: disabled; persistent grants: enabled; indirect descriptors: enabled; bounce buffer: enabled
kernel BUG at drivers/xen/events/events_base.c:499!
invalid opcode: 0000 [#1] SMP PTI
CPU: 44 PID: 375 Comm: kworker/u257:23 Not tainted 5.10.201-191.748.amzn2.x86_64 #1
Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
Workqueue: nvme-reset-wq nvme_reset_work
RIP: 0010:bind_evtchn_to_cpu+0xdf/0xf0
Code: 5d 41 5e c3 cc cc cc cc 44 89 f7 e8 2b 55 ad ff 49 89 c5 48 85 c0 0f 84 64 ff ff ff 4c 8b 68 30 41 83 fe ff 0f 85 60 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00
RSP: 0000:ffffc9000d533b08 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
RDX: 0000000000000028 RSI: 00000000ffffffff RDI: 00000000ffffffff
RBP: ffff888107419680 R08: 0000000000000000 R09: ffffffff82d72b00
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000001ed
R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000002
FS: 0000000000000000(0000) GS:ffff88bc8b500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000002610001 CR4: 00000000001706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
? show_trace_log_lvl+0x1c1/0x2d9
? show_trace_log_lvl+0x1c1/0x2d9
? set_affinity_irq+0xdc/0x1c0
? __die_body.cold+0x8/0xd
? die+0x2b/0x50
? do_trap+0x90/0x110
? bind_evtchn_to_cpu+0xdf/0xf0
? do_error_trap+0x65/0x80
? bind_evtchn_to_cpu+0xdf/0xf0
? exc_invalid_op+0x4e/0x70
? bind_evtchn_to_cpu+0xdf/0xf0
? asm_exc_invalid_op+0x12/0x20
? bind_evtchn_to_cpu+0xdf/0xf0
? bind_evtchn_to_cpu+0xc5/0xf0
set_affinity_irq+0xdc/0x1c0
irq_do_set_affinity+0x1d7/0x1f0
irq_setup_affinity+0xd6/0x1a0
irq_startup+0x8a/0xf0
__setup_irq+0x639/0x6d0
? nvme_suspend+0x150/0x150
request_threaded_irq+0x10c/0x180
? nvme_suspend+0x150/0x150
pci_request_irq+0xa8/0xf0
? __blk_mq_free_request+0x74/0xa0
queue_request_irq+0x6f/0x80
nvme_create_queue+0x1af/0x200
nvme_create_io_queues+0xbd/0xf0
nvme_setup_io_queues+0x246/0x320
? nvme_irq_check+0x30/0x30
nvme_reset_work+0x1c8/0x400
process_one_work+0x1b0/0x350
worker_thread+0x49/0x310
? process_one_work+0x350/0x350
kthread+0x11b/0x140
? __kthread_bind_mask+0x60/0x60
ret_from_fork+0x22/0x30
Modules linked in:
---[ end trace a11715de1eee1873 ]---
Fixes: d46a78b05c ("xen: implement pirq type event channels")
Cc: stable@vger.kernel.org
Co-debugged-by: Andrew Panyakin <apanyaki@amazon.com>
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20240124163130.31324-1-mheyne@amazon.de
Signed-off-by: Juergen Gross <jgross@suse.com>
It was observed on Broadcom devices that use GIC v3 architecture L1
interrupt controllers as the parent of brcmstb-l2 interrupt controllers
that the deactivation of the parent interrupt could happen before the
brcmstb-l2 deasserted its output. This would lead the GIC to reactivate the
interrupt only to find that no L2 interrupt was pending. The result was a
spurious interrupt invoking handle_bad_irq() with its associated
messaging. While this did not create a functional problem it is a waste of
cycles.
The hazard exists because the memory mapped bus writes to the brcmstb-l2
registers are buffered and the GIC v3 architecture uses a very efficient
system register write to deactivate the interrupt.
Add a write memory barrier prior to invoking chained_irq_exit() to
introduce a dsb(st) on those systems to ensure the system register write
cannot be executed until the memory mapped writes are visible to the
system.
[ florian: Added Fixes tag ]
Fixes: 7f646e9276 ("irqchip: brcmstb-l2: Add Broadcom Set Top Box Level-2 interrupt controller")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240210012449.3009125-1-florian.fainelli@broadcom.com
This issue was found and also debugged by Carl Lei <me@xecycle.info>.
If the device is not enabled, iblock/file will have not setup their
se_device to bdev/file mappings. If a user tries to config the unmap
settings at this time, we will then crash trying to access a NULL pointer
where the bdev/file should be.
This patch adds a check to make sure the device is configured before
we try to call the configure_unmap callout.
Fixes: 34bd1dcacf ("scsi: target: Detect UNMAP support post configuration")
Reported-by: Carl Lei <me@xecycle.info>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20240209215247.5213-1-michael.christie@oracle.com
Reviewed-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The cited commit introduces and uses the string constants dpp_tx_err and
dpp_rx_err. These are assigned to constant fields of the array
dwxgmac3_error_desc.
It has been reported that on GCC 6 and 7.5.0 this results in warnings
such as:
.../dwxgmac2_core.c:836:20: error: initialiser element is not constant
{ true, "TDPES0", dpp_tx_err },
I have been able to reproduce this using: GCC 7.5.0, 8.4.0, 9.4.0 and 10.5.0.
But not GCC 13.2.0.
So it seems this effects older compilers but not newer ones.
As Jon points out in his report, the minimum compiler supported by
the kernel is GCC 5.1, so it does seem that this ought to be fixed.
It is not clear to me what combination of 'const', if any, would address
this problem. So this patch takes of using #defines for the string
constants
Compile tested only.
Fixes: 46eba193d0 ("net: stmmac: xgmac: fix handling of DPP safety error for DMA channels")
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Closes: https://lore.kernel.org/netdev/c25eb595-8d91-40ea-9f52-efa15ebafdbc@nvidia.com/
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402081135.lAxxBXHk-lkp@intel.com/
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240208-xgmac-const-v1-1-e69a1eeabfc8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Seth reported that on his side XDP traffic can not survive a round of
down/up against i40e interface. Dmesg output was telling us that we were
not able to disable the very first XDP ring. That was due to the fact
that in i40e_vsi_stop_rings() in a pre-work that is done before calling
i40e_vsi_wait_queues_disabled(), XDP Tx queues were not taken into the
account.
To fix this, let us distinguish between Rx and Tx queue boundaries and
take into the account XDP queues for Tx side.
Reported-by: Seth Forshee <sforshee@kernel.org>
Closes: https://lore.kernel.org/netdev/ZbkE7Ep1N1Ou17sA@do-x1extreme/
Fixes: 65662a8dcd ("i40e: Fix logic of disabling queues")
Tested-by: Seth Forshee <sforshee@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently, when interface is being brought down and
i40e_vsi_stop_rings() is called, i40e_pf_rxq_wait() is called two times,
which is wrong. To showcase this scenario, simplified call stack looks
as follows:
i40e_vsi_stop_rings()
i40e_control wait rx_q()
i40e_control_rx_q()
i40e_pf_rxq_wait()
i40e_vsi_wait_queues_disabled()
i40e_pf_rxq_wait() // redundant call
To fix this, let us s/i40e_control_wait_rx_q/i40e_control_rx_q within
i40e_vsi_stop_rings().
Fixes: 65662a8dcd ("i40e: Fix logic of disabling queues")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Mask used for clearing PRTDCB_RETSTCC register in function
i40e_dcb_hw_rx_ets_bw_config() is incorrect as there is used
define I40E_PRTDCB_RETSTCC_ETSTC_SHIFT instead of define
I40E_PRTDCB_RETSTCC_ETSTC_MASK.
The PRTDCB_RETSTCC register is used to configure whether ETS
or strict priority is used as TSA in Rx for particular TC.
In practice it means that once the register is set to use ETS
as TSA then it is not possible to switch back to strict priority
without CoreR reset.
Fix the value in the clearing mask.
Fixes: 90bc8e003b ("i40e: Add hardware configuration for software based DCB")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The function i40e_pf_wait_queues_disabled() iterates all PF's VSIs
up to 'pf->hw.func_caps.num_vsis' but this is incorrect because
the real number of VSIs can be up to 'pf->num_alloc_vsi' that
can be higher. Fix this loop.
Fixes: 69129dc39f ("i40e: Modify Tx disable wait flow in case of DCB reconfiguration")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently when PF administratively sets VF's MAC address and the VF
is put down (VF tries to delete all MACs) then the MAC is removed
from MAC filters and primary VF MAC is zeroed.
Do not allow untrusted VF to remove primary MAC when it was set
administratively by PF.
Reproducer:
1) Create VF
2) Set VF interface up
3) Administratively set the VF's MAC
4) Put VF interface down
[root@host ~]# echo 1 > /sys/class/net/enp2s0f0/device/sriov_numvfs
[root@host ~]# ip link set enp2s0f0v0 up
[root@host ~]# ip link set enp2s0f0 vf 0 mac fe:6c:b5:da:c7:7d
[root@host ~]# ip link show enp2s0f0
23: enp2s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 3c:ec:ef:b7:dd:04 brd ff:ff:ff:ff:ff:ff
vf 0 link/ether fe:6c:b5:da:c7:7d brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
[root@host ~]# ip link set enp2s0f0v0 down
[root@host ~]# ip link show enp2s0f0
23: enp2s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 3c:ec:ef:b7:dd:04 brd ff:ff:ff:ff:ff:ff
vf 0 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
Fixes: 700bbf6c1f ("i40e: allow VF to remove any MAC filter")
Fixes: ceb29474bb ("i40e: Add support for VF to specify its primary MAC address")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240208180335.1844996-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When ident_pud_init() uses only gbpages to create identity maps, large
ranges of addresses not actually requested can be included in the
resulting table; a 4K request will map a full GB. On UV systems, this
ends up including regions that will cause hardware to halt the system
if accessed (these are marked "reserved" by BIOS). Even processor
speculation into these regions is enough to trigger the system halt.
Only use gbpages when map creation requests include the full GB page
of space. Fall back to using smaller 2M pages when only portions of a
GB page are included in the request.
No attempt is made to coalesce mapping requests. If a request requires
a map entry at the 2M (pmd) level, subsequent mapping requests within
the same 1G region will also be at the pmd level, even if adjacent or
overlapping such requests could have been combined to map a full
gbpage. Existing usage starts with larger regions and then adds
smaller regions, so this should not have any great consequence.
[ dhansen: fix up comment formatting, simplifty changelog ]
Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240126164841.170866-1-steve.wahl%40hpe.com
Pull documentation fix from Jonathan Corbet:
"A single fix to the kernel_feat extension for a bug that will crash
the docs build in some situations"
* tag 'docs-6.8-fixes2' of git://git.lwn.net/linux:
docs: kernel_feat.py: fix build error for missing files
Clear Cause.BD after we use instruction_pointer_set to override
EPC.
This can prevent exception_epc check against instruction code at
new return address.
It won't be considered as "in delay slot" after epc being overridden
anyway.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
After 'lib: checksum: Use aligned accesses for ip_fast_csum and
csum_ipv6_magic tests' was applied, the test_csum_ipv6_magic unit test
started failing for all mips platforms, both little and bit endian.
Oddly enough, adding debug code into test_csum_ipv6_magic() made the
problem disappear.
The gcc manual says:
"The "memory" clobber tells the compiler that the assembly code performs
memory reads or writes to items other than those listed in the input
and output operands (for example, accessing the memory pointed to by one
of the input parameters)
"
This is definitely the case for csum_ipv6_magic(). Indeed, adding the
'memory' clobber fixes the problem.
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Convert path separator to CIFS_DIR_SEP(cifs_sb) from symlink target
before sending it over the wire otherwise the created SMB symlink may
become innaccesible from server side.
Fixes: 514d793e27 ("smb: client: allow creating symlinks via reparse points")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
For i2c read operation in GSI mode, we are getting timeout
due to malformed TRE basically incorrect TRE sequence
in gpi(drivers/dma/qcom/gpi.c) driver.
I2C driver has geni_i2c_gpi(I2C_WRITE) function which generates GO TRE and
geni_i2c_gpi(I2C_READ)generates DMA TRE. Hence to generate GO TRE before
DMA TRE, we should move geni_i2c_gpi(I2C_WRITE) before
geni_i2c_gpi(I2C_READ) inside the I2C GSI mode transfer function
i.e. geni_i2c_gpi_xfer().
TRE stands for Transfer Ring Element - which is basically an element with
size of 4 words. It contains all information like slave address,
clk divider, dma address value data size etc).
Mainly we have 3 TREs(Config, GO and DMA tre).
- CONFIG TRE : consists of internal register configuration which is
required before start of the transfer.
- DMA TRE : contains DDR/Memory address, called as DMA descriptor.
- GO TRE : contains Transfer directions, slave ID, Delay flags, Length
of the transfer.
I2c driver calls GPI driver API to config each TRE depending on the
protocol.
For read operation tre sequence will be as below which is not aligned
to hardware programming guide.
- CONFIG tre
- DMA tre
- GO tre
As per Qualcomm's internal Hardware Programming Guide, we should configure
TREs in below sequence for any RX only transfer.
- CONFIG tre
- GO tre
- DMA tre
Fixes: d8703554f4 ("i2c: qcom-geni: Add support for GPI DMA")
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Tested-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> # qrb5165-rb5
Co-developed-by: Mukesh Kumar Savaliya <quic_msavaliy@quicinc.com>
Signed-off-by: Mukesh Kumar Savaliya <quic_msavaliy@quicinc.com>
Signed-off-by: Viken Dadhaniya <quic_vdadhani@quicinc.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Pull vfs fixes from Christian Brauner:
- Fix performance regression introduced by moving the security
permission hook out of do_clone_file_range() and into its caller
vfs_clone_file_range().
This causes the security hook to be called in situation were it
wasn't called before as the fast permission checks were left in
do_clone_file_range().
Fix this by merging the two implementations back together and
restoring the old ordering: fast permission checks first, expensive
ones later.
- Tweak mount_setattr() permission checking so that mount properties on
the real rootfs can be changed.
When we added mount_setattr() we added additional checks compared to
legacy mount(2). If the mount had a parent then verify that the
caller and the mount namespace the mount is attached to match and if
not make sure that it's an anonymous mount.
But the real rootfs falls into neither category. It is neither an
anoymous mount because it is obviously attached to the initial mount
namespace but it also obviously doesn't have a parent mount. So that
means legacy mount(2) allows changing mount properties on the real
rootfs but mount_setattr(2) blocks this. This causes regressions (See
the commit for details).
Fix this by relaxing the check. If the mount has a parent or if it
isn't a detached mount, verify that the mount namespaces of the
caller and the mount are the same. Technically, we could probably
write this even simpler and check that the mount namespaces match if
it isn't a detached mount. But the slightly longer check makes it
clearer what conditions one needs to think about.
* tag 'vfs-6.8-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: relax mount_setattr() permission checks
remap_range: merge do_clone_file_range() into vfs_clone_file_range()
The conversion to kvcalloc() mixed up the object size and count
arguments, causing a warning:
drivers/gpu/drm/nouveau/nouveau_svm.c: In function 'nouveau_svm_fault_buffer_ctor':
drivers/gpu/drm/nouveau/nouveau_svm.c:1010:40: error: 'kvcalloc' sizes specified with 'sizeof' in the earlier argument and not in the later argument [-Werror=calloc-transposed-args]
1010 | buffer->fault = kvcalloc(sizeof(*buffer->fault), buffer->entries, GFP_KERNEL);
| ^
drivers/gpu/drm/nouveau/nouveau_svm.c:1010:40: note: earlier argument should specify number of elements, later size of each element
The behavior is still correct aside from the warning, but fixing it avoids
the warnings and can help the compiler track the individual objects better.
Fixes: 71e4bbca07 ("nouveau/svm: Use kvcalloc() instead of kvzalloc()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240212112230.1117284-1-arnd@kernel.org
During the cache sync test we verify that values we expect to have been
written only to the cache do not appear in the hardware. This works most
of the time but since we randomly generate both the original and new values
there is a low probability that these values may actually be the same.
Wrap get_random_bytes() to ensure that the values are different, there
are other tests which should have similar verification that we actually
changed something.
While we're at it refactor the test to use three changed values rather
than attempting to use one of them twice, that just complicates checking
that our new values are actually new.
We use random generation to try to avoid data dependencies in the tests.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/r/20240211-regmap-kunit-random-change-v3-1-e387a9ea4468@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
MCSPI controller have few limitations regarding the transaction
size when the FIFO buffer is enabled and the WCNT feature is used
to find the end of word, in this case if WCNT is not a multiple of
the FIFO Almost Empty Level (AEL), then the FIFO empty event is not
generated correctly. In addition to this limitation, few other unknown
sequence of events that causes the FIFO empty status to not reflect the
exact status were found when FIFO is being used without DMA enabled
during extended testing in AM65x platform. Till the exact root cause
is found and fixed, revert the FIFO support without DMA.
See J721E Technical Reference Manual (SPRUI1C), section 12.1.5
for further details: http://www.ti.com/lit/pdf/spruil1
This reverts commit 75223bbea8 ("spi: omap2-mcspi: Add FIFO support
without DMA")
Signed-off-by: Vaishnav Achath <vaishnav.a@ti.com>
Link: https://msgid.link/r/20240212120049.438495-1-vaishnav.a@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The Documentation/ABI/testing/sysfs-class-net-statistics documentation
is pointing to the wrong path for the interface. Documentation is
pointing to /sys/class/<iface>, instead of /sys/class/net/<iface>.
Fix it by adding the `net/` directory before the interface.
Fixes: 6044f97006 ("net: sysfs: document /sys/class/net/statistics/*")
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
nouveau_abi16_ioctl_channel_alloc() and nouveau_cli_init() simply call
their corresponding *_fini() counterpart. This can lead to
nouveau_sched_fini() being called without struct nouveau_sched ever
being initialized in the first place.
Instead of embedding struct nouveau_sched into struct nouveau_cli and
struct nouveau_chan_abi16, allocate struct nouveau_sched separately,
such that we can check for the corresponding pointer to be NULL in the
particular *_fini() functions.
It makes sense to allocate struct nouveau_sched separately anyway, since
in a subsequent commit we can also avoid to allocate a struct
nouveau_sched in nouveau_abi16_ioctl_channel_alloc() at all, if the
VM_BIND uAPI has been disabled due to the legacy uAPI being used.
Fixes: 5f03a507b2 ("drm/nouveau: implement 1:1 scheduler - entity relationship")
Reported-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Closes: https://lore.kernel.org/nouveau/20240131213917.1545604-1-ttabi@nvidia.com/
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240202000606.3526-1-dakr@redhat.com
Matthieu Baerts says:
====================
mptcp: locking cleanup & misc. fixes
Patches 1-4 are fixes for issues found by Paolo while working on adding
TCP_NOTSENT_LOWAT support. The latter will need to track more states
under the msk data lock. Since the locking msk locking schema is already
quite complex, do a long awaited clean-up step by moving several
confusing lockless initialization under the relevant locks. Note that it
is unlikely a real race could happen even prior to such patches as the
MPTCP-level state machine implicitly ensures proper serialization of the
write accesses, even lacking explicit lock. But still, simplification is
welcome and this will help for the maintenance. This can be backported
up to v5.6.
Patch 5 is a fix for the userspace PM, not to add new local address
entries if the address is already in the list. This behaviour can be
seen since v5.19.
Patch 6 fixes an issue when Fastopen is used. The issue can happen since
v6.2. A previous fix has already been applied, but not taking care of
all cases according to syzbot.
Patch 7 updates Geliang's email address in the MAINTAINERS file.
====================
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before adding a new entry in mptcp_userspace_pm_get_local_id(), it's
better to check whether this address is already in userspace pm local
address list. If it's in the list, no need to add a new entry, just
return it's address ID and use this address.
Fixes: 8b20137012 ("mptcp: read attributes of addr entries managed by userspace PMs")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <geliang.tang@linux.dev>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most MPTCP-level related fields are under the mptcp data lock
protection, but are written one-off without such lock at MPC
complete time, both for the client and the server
Leverage the mptcp_propagate_state() infrastructure to move such
initialization under the proper lock client-wise.
The server side critical init steps are done by
mptcp_subflow_fully_established(): ensure the caller properly held the
relevant lock, and avoid acquiring the same lock in the nested scopes.
There are no real potential races, as write access to such fields
is implicitly serialized by the MPTCP state machine; the primary
goal is consistency.
Fixes: d22f4988ff ("mptcp: process MP_CAPABLE data option")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 'msk->write_seq' and 'msk->snd_nxt' are always updated under
the msk socket lock, except at MPC handshake completiont time.
Builds-up on the previous commit to move such init under the relevant
lock.
There are no known problems caused by the potential race, the
primary goal is consistency.
Fixes: 6d0060f600 ("mptcp: Write MPTCP DSS headers to outgoing data packets")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
mptcp_rcv_space_init() is supposed to happen under the msk socket
lock, but active msk socket does that without such protection.
Leverage the existing mptcp_propagate_state() helper to that extent.
We need to ensure mptcp_rcv_space_init will happen before
mptcp_rcv_space_adjust(), and the release_cb does not assure that:
explicitly check for such condition.
While at it, move the wnd_end initialization out of mptcp_rcv_space_init(),
it never belonged there.
Note that the race does not produce ill effect in practice, but
change allows cleaning-up and defying better the locking model.
Fixes: a6b118febb ("mptcp: add receive buffer auto-tuning")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Such field is there to avoid acquiring the data lock in a few spots,
but it adds complexity to the already non trivial locking schema.
All the relevant call sites (mptcp-level re-injection, set socket
options), are slow-path, drop such field in favor of 'cb_flags', adding
the relevant locking.
This patch could be seen as an improvement, instead of a fix. But it
simplifies the next patch. The 'Fixes' tag has been added to help having
this series backported to stable.
Fixes: e9d09baca6 ("mptcp: avoid atomic bit manipulation when possible")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
net: more three misplaced fields
We recently reorganized some structures for better data locality
in networking fast paths.
This series moves three fields that were not correctly classified.
There probably more to come.
Reference : https://lwn.net/Articles/951321/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
dev->lstats is notably used from loopback ndo_start_xmit()
and other virtual drivers.
Per cpu stats updates are dirtying per-cpu data,
but the pointer itself is read-only.
Fixes: 43a71cd66b ("net-device: reorganize net_device fast path variables")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Cc: Simon Horman <horms@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When compiling rtla with clang, I am getting the following warnings:
$ make HOSTCC=clang CC=clang LLVM_IAS=1
[..]
clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions
-fstack-protector-strong -fasynchronous-unwind-tables
-fstack-clash-protection -Wall -Werror=format-security
-Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
$(pkg-config --cflags libtracefs)
-c -o src/osnoise_hist.o src/osnoise_hist.c
src/osnoise_hist.c:138:6: warning: variable 'bucket' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
138 | if (data->bucket_size)
| ^~~~~~~~~~~~~~~~~
src/osnoise_hist.c:149:6: note: uninitialized use occurs here
149 | if (bucket < entries)
| ^~~~~~
src/osnoise_hist.c:138:2: note: remove the 'if' if its condition is always true
138 | if (data->bucket_size)
| ^~~~~~~~~~~~~~~~~~~~~~
139 | bucket = duration / data->bucket_size;
src/osnoise_hist.c:132:12: note: initialize the variable 'bucket' to silence this warning
132 | int bucket;
| ^
| = 0
1 warning generated.
[...]
clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions
-fstack-protector-strong -fasynchronous-unwind-tables
-fstack-clash-protection -Wall -Werror=format-security
-Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
$(pkg-config --cflags libtracefs)
-c -o src/timerlat_hist.o src/timerlat_hist.c
src/timerlat_hist.c:181:6: warning: variable 'bucket' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
181 | if (data->bucket_size)
| ^~~~~~~~~~~~~~~~~
src/timerlat_hist.c:204:6: note: uninitialized use occurs here
204 | if (bucket < entries)
| ^~~~~~
src/timerlat_hist.c:181:2: note: remove the 'if' if its condition is always true
181 | if (data->bucket_size)
| ^~~~~~~~~~~~~~~~~~~~~~
182 | bucket = latency / data->bucket_size;
src/timerlat_hist.c:175:12: note: initialize the variable 'bucket' to silence this warning
175 | int bucket;
| ^
| = 0
1 warning generated.
This is a legit warning, but data->bucket_size is always > 0 (see
timerlat_hist_parse_args()), so the if is not necessary.
Remove the unneeded if (data->bucket_size) to avoid the warning.
Link: https://lkml.kernel.org/r/6e1b1665cd99042ae705b3e0fc410858c4c42346.1707217097.git.bristot@kernel.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Donald Zickus <dzickus@redhat.com>
Fixes: 1eeb6328e8 ("rtla/timerlat: Add timerlat hist mode")
Fixes: 829a6c0b56 ("rtla/osnoise: Add the hist mode")
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Issue IP reset before shutdown in order to
complete all upstream requests to the SOC.
Without this DevTLB is complaining about
incomplete transactions and NPU cannot resume from
suspend.
This problem is only happening on recent IFWI
releases.
IP reset in rare corner cases can mess up PCI
configuration, so save it before the reset.
After this happens it is also impossible to
issue PLL requests and D0->D3->D0 cycle is needed
to recover the NPU. Add WP 0 request on power up,
so the PUNIT is always notified about NPU reset.
Use D0/D3 cycle for recovery as it can recover
from failed IP reset and FLR cannot.
Fixes: 3f7c063492 ("accel/ivpu/37xx: Fix hangs related to MMIO reset")
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240207102446.3126981-1-jacek.lawrynowicz@linux.intel.com
Merge series from Hans de Goede <hdegoede@redhat.com>:
While testing 6.8 on a Bay Trail device with a ALC5640 codec
I noticed a regression in 6.8 which causes a NULL pointer deref
in probe().
All BYT/CHT Intel machine drivers are affected. Patch 1/2 of
this series fixes all of them.
Patch 2/2 adds some small cleanups to cht_bsw_rt5645.c for
issues which I noticed while working on 1/2.
File open requests made to the server contain a
CreateGuid, which is used by the server to identify
the open request. If the same request needs to be
replayed, it needs to be sent with the same CreateGuid
in the durable handle v2 context.
Without doing so, we could end up leaking handles on
the server when:
1. multichannel is used AND
2. connection goes down, but not for all channels
This is because the replayed open request would have a
new CreateGuid and the server will treat this as a new
request and open a new handle.
This change fixes this by reusing the existing create_guid
stored in the cached fid struct.
REF: MS-SMB2 4.9 Replay Create Request on an Alternate Channel
Fixes: 4f1fffa237 ("cifs: commands that are retried should have replay flag set")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
In this loop, we step through the buffer and after each item we check
if the size_left is greater than the minimum size we need. However,
the problem is that "bytes_left" is type ssize_t while sizeof() is type
size_t. That means that because of type promotion, the comparison is
done as an unsigned and if we have negative bytes left the loop
continues instead of ending.
Fixes: fe856be475 ("CIFS: parse and store info on iface queries")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull timer fix from Borislav Petkov:
- Make sure a warning is issued when a hrtimer gets queued after the
timers have been migrated on the CPU down path and thus said timer
will get ignored
* tag 'timers_urgent_for_v6.8_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimer: Report offline hrtimer enqueue
Pull x86 fixes from Borislav Petkov:
- Correct the minimum CPU family for Transmeta Crusoe in Kconfig so
that such hw can boot again
- Do not take into accout XSTATE buffer size info supplied by userspace
when constructing a sigreturn frame
- Switch get_/put_user* to EX_TYPE_UACCESS exception handling when an
MCE is encountered so that it can be properly recovered from instead
of simply panicking
* tag 'x86_urgent_for_v6.8_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/Kconfig: Transmeta Crusoe is CPU family 5, not 6
x86/fpu: Stop relying on userspace for info to fault in xsave buffer
x86/lib: Revert to _ASM_EXTABLE_UA() for {get,put}_user() fixups
A recent change in acp_irq_thread() was meant to address a potential race
condition while trying to acquire the hardware semaphore responsible for
the synchronization between firmware and host IPC interrupts.
This resulted in an improper use of the IPC spinlock, causing normal
kernel memory allocations (which may sleep) inside atomic contexts:
1707255557.133976 kernel: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:315
...
1707255557.134757 kernel: sof_ipc3_rx_msg+0x70/0x130 [snd_sof]
1707255557.134793 kernel: acp_sof_ipc_irq_thread+0x1e0/0x550 [snd_sof_amd_acp]
1707255557.134855 kernel: acp_irq_thread+0xa3/0x130 [snd_sof_amd_acp]
1707255557.134904 kernel: ? irq_thread+0xb5/0x1e0
1707255557.134947 kernel: ? __pfx_irq_thread_fn+0x10/0x10
1707255557.134985 kernel: irq_thread_fn+0x23/0x60
Moreover, there are attempts to lock a mutex from the same atomic
context:
1707255557.136357 kernel: =============================
1707255557.136393 kernel: [ BUG: Invalid wait context ]
1707255557.136413 kernel: 6.8.0-rc3-next-20240206-audio-next #9 Tainted: G W
1707255557.136432 kernel: -----------------------------
1707255557.136451 kernel: irq/66-AudioDSP/502 is trying to lock:
1707255557.136470 kernel: ffff965152f26af8 (&sb->s_type->i_mutex_key#2){+.+.}-{3:3}, at: start_creating.part.0+0x5f/0x180
...
1707255557.137429 kernel: start_creating.part.0+0x5f/0x180
1707255557.137457 kernel: __debugfs_create_file+0x61/0x210
1707255557.137475 kernel: snd_sof_debugfs_io_item+0x75/0xc0 [snd_sof]
1707255557.137494 kernel: sof_ipc3_do_rx_work+0x7cf/0x9f0 [snd_sof]
1707255557.137513 kernel: sof_ipc3_rx_msg+0xb3/0x130 [snd_sof]
1707255557.137532 kernel: acp_sof_ipc_irq_thread+0x1e0/0x550 [snd_sof_amd_acp]
1707255557.137551 kernel: acp_irq_thread+0xa3/0x130 [snd_sof_amd_acp]
Fix the issues by reducing the lock scope in acp_irq_thread(), so that
it guards only the hardware semaphore acquiring attempt. Additionally,
restore the initial locking in acp_sof_ipc_irq_thread() to synchronize
the handling of immediate replies from DSP core.
Fixes: 802134c8c2 ("ASoC: SOF: amd: Refactor spinlock_irq(&sdev->ipc_lock) sequence in irq_handler")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Link: https://lore.kernel.org/r/20240208234315.2182048-1-cristian.ciocaltea@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
There is a path in rt5645_jack_detect_work(), where rt5645->jd_mutex
is left locked forever. That may lead to deadlock
when rt5645_jack_detect_work() is called for the second time.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: cdba4301ad ("ASoC: rt5650: add mutex to avoid the jack detection failure")
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Link: https://lore.kernel.org/r/1707645514-21196-1-git-send-email-khoroshilov@ispras.ru
Signed-off-by: Mark Brown <broonie@kernel.org>
4 fixes / cleanups to the rt5645 mc driver's codec_name handling:
1. In the for loop looking for the dai_index for the codec, replace
card->dai_link[i] with cht_dailink[i]. The for loop already uses
ARRAY_SIZE(cht_dailink) as bound and card->dai_link is just a pointer to
cht_dailink using card->dai_link only obfuscates that cht_dailink is being
modified directly rather then say a copy of cht_dailink. Using
cht_dailink[i] also makes the code consistent with other machine drivers.
2. Don't set cht_dailink[dai_index].codecs->name in the for loop,
this immediately gets overridden using acpi_dev_name(adev) directly
below the loop.
3. Add a missing break to the loop.
4. Remove the now no longer used (only set, never read) codec_name field
from struct cht_mc_private.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240210134400.24913-3-hdegoede@redhat.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Since commit 13f58267cd ("ASoC: soc.h: don't create dummy Component
via COMP_DUMMY()") dummy snd_soc_dai_link.codecs entries no longer
have a name set.
This means that when looking for the codec dai_link the machine
driver can no longer unconditionally run strcmp() on
snd_soc_dai_link.codecs[0].name since this may now be NULL.
Add a check for snd_soc_dai_link.codecs[0].name being NULL to all
BYT/CHT machine drivers to avoid NULL pointer dereferences in
their probe() methods.
Fixes: 13f58267cd ("ASoC: soc.h: don't create dummy Component via COMP_DUMMY()")
Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240210134400.24913-2-hdegoede@redhat.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull misc fixes from Andrew Morton:
"21 hotfixes. 12 are cc:stable and the remainder pertain to post-6.7
issues or aren't considered to be needed in earlier kernel versions"
* tag 'mm-hotfixes-stable-2024-02-10-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
nilfs2: fix potential bug in end_buffer_async_write
mm/damon/sysfs-schemes: fix wrong DAMOS tried regions update timeout setup
nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
MAINTAINERS: Leo Yan has moved
mm/zswap: don't return LRU_SKIP if we have dropped lru lock
fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super
mailmap: switch email address for John Moon
mm: zswap: fix objcg use-after-free in entry destruction
mm/madvise: don't forget to leave lazy MMU mode in madvise_cold_or_pageout_pte_range()
arch/arm/mm: fix major fault accounting when retrying under per-VMA lock
selftests: core: include linux/close_range.h for CLOSE_RANGE_* macros
mm/memory-failure: fix crash in split_huge_page_to_list from soft_offline_page
mm: memcg: optimize parent iteration in memcg_rstat_updated()
nilfs2: fix data corruption in dsync block recovery for small block sizes
mm/userfaultfd: UFFDIO_MOVE implementation should use ptep_get()
exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock)
fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats
fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand()
getrusage: use sig->stats_lock rather than lock_task_sighand()
getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()
...
During xfstest tests, there are some kmemleak reports e.g. generic/051 with
if USE_KMEMLEAK=yes:
====================================================================
EXPERIMENTAL kmemleak reported some memory leaks! Due to the way kmemleak
works, the leak might be from an earlier test, or something totally unrelated.
unreferenced object 0xffff9ef905aaf778 (size 8):
comm "mount.bcachefs", pid 169844, jiffies 4295281209 (age 87.040s)
hex dump (first 8 bytes):
a5 cc cc cc cc cc cc cc ........
backtrace:
[<ffffffff87fd9a43>] __kmem_cache_alloc_node+0x1f3/0x2c0
[<ffffffff87f49b66>] kmalloc_trace+0x26/0xb0
[<ffffffffc0a3fefe>] __bch2_read_super+0xfe/0x4e0 [bcachefs]
[<ffffffffc0a3ad22>] bch2_fs_open+0x262/0x1710 [bcachefs]
[<ffffffffc09c9e24>] bch2_mount+0x4c4/0x640 [bcachefs]
[<ffffffff88080c90>] legacy_get_tree+0x30/0x60
[<ffffffff8802c748>] vfs_get_tree+0x28/0xf0
[<ffffffff88061fe5>] path_mount+0x475/0xb60
[<ffffffff880627e5>] __x64_sys_mount+0x105/0x140
[<ffffffff88932642>] do_syscall_64+0x42/0xf0
[<ffffffff88a000e6>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
unreferenced object 0xffff9ef96cdc4fc0 (size 32):
comm "mount.bcachefs", pid 169844, jiffies 4295281209 (age 87.040s)
hex dump (first 32 bytes):
2f 64 65 76 2f 6d 61 70 70 65 72 2f 74 65 73 74 /dev/mapper/test
2d 31 00 cc cc cc cc cc cc cc cc cc cc cc cc cc -1..............
backtrace:
[<ffffffff87fd9a43>] __kmem_cache_alloc_node+0x1f3/0x2c0
[<ffffffff87f4a081>] __kmalloc_node_track_caller+0x51/0x150
[<ffffffff87f3adc2>] kstrdup+0x32/0x60
[<ffffffffc0a3ff1a>] __bch2_read_super+0x11a/0x4e0 [bcachefs]
[<ffffffffc0a3ad22>] bch2_fs_open+0x262/0x1710 [bcachefs]
[<ffffffffc09c9e24>] bch2_mount+0x4c4/0x640 [bcachefs]
[<ffffffff88080c90>] legacy_get_tree+0x30/0x60
[<ffffffff8802c748>] vfs_get_tree+0x28/0xf0
[<ffffffff88061fe5>] path_mount+0x475/0xb60
[<ffffffff880627e5>] __x64_sys_mount+0x105/0x140
[<ffffffff88932642>] do_syscall_64+0x42/0xf0
[<ffffffff88a000e6>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
====================================================================
The leak happens if bdev_open_by_path() failed to open a block device
then it goes label 'out' directly without call of bch2_free_super().
Fix it by going to label 'err' instead of 'out' if bdev_open_by_path()
fails.
Signed-off-by: Su Yue <glass.su@suse.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Jakub Kicinski says:
====================
net: tls: fix some issues with async encryption
valis was reporting a race on socket close so I sat down to try to fix it.
I used Sabrina's async crypto debug patch to test... and in the process
run into some of the same issues, and created very similar fixes :(
I didn't realize how many of those patches weren't applied. Once I found
Sabrina's code [1] it turned out to be so similar in fact that I added
her S-o-b's and Co-develop'eds in a semi-haphazard way.
With this series in place all expected tests pass with async crypto.
Sabrina had a few more fixes, but I'll leave those to her, things are
not crashing anymore.
[1] https://lore.kernel.org/netdev/cover.1694018970.git.sd@queasysnail.net/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We double count async, non-zc rx data. The previous fix was
lucky because if we fully zc async_copy_bytes is 0 so we add 0.
Decrypted already has all the bytes we handled, in all cases.
We don't have to adjust anything, delete the erroneous line.
Fixes: 4d42cd6bc2 ("tls: rx: fix return value for async crypto")
Co-developed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This exact case was fail for async crypto and we weren't
catching it.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
tls_decrypt_sg doesn't take a reference on the pages from clear_skb,
so the put_page() in tls_decrypt_done releases them, and we trigger
a use-after-free in process_rx_list when we try to read from the
partially-read skb.
Fixes: fd31f3996a ("tls: rx: decrypt into a fresh skb")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we're setting the CRYPTO_TFM_REQ_MAY_BACKLOG flag on our
requests to the crypto API, crypto_aead_{encrypt,decrypt} can return
-EBUSY instead of -EINPROGRESS in valid situations. For example, when
the cryptd queue for AESNI is full (easy to trigger with an
artificially low cryptd.cryptd_max_cpu_qlen), requests will be enqueued
to the backlog but still processed. In that case, the async callback
will also be called twice: first with err == -EINPROGRESS, which it
seems we can just ignore, then with err == 0.
Compared to Sabrina's original patch this version uses the new
tls_*crypt_async_wait() helpers and converts the EBUSY to
EINPROGRESS to avoid having to modify all the error handling
paths. The handling is identical.
Fixes: a54667f672 ("tls: Add support for encryption using async offload accelerator")
Fixes: 94524d8fc9 ("net/tls: Add support for async decryption of tls records")
Co-developed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/netdev/9681d1febfec295449a62300938ed2ae66983f28.1694018970.git.sd@queasysnail.net/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similarly to previous commit, the submitting thread (recvmsg/sendmsg)
may exit as soon as the async crypto handler calls complete().
Reorder scheduling the work before calling complete().
This seems more logical in the first place, as it's
the inverse order of what the submitting thread will do.
Reported-by: valis <sec@valis.email>
Fixes: a42055e8d2 ("net/tls: Add support for async encryption of records for performance")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The submitting thread (one which called recvmsg/sendmsg)
may exit as soon as the async crypto handler calls complete()
so any code past that point risks touching already freed data.
Try to avoid the locking and extra flags altogether.
Have the main thread hold an extra reference, this way
we can depend solely on the atomic ref counter for
synchronization.
Don't futz with reiniting the completion, either, we are now
tightly controlling when completion fires.
Reported-by: valis <sec@valis.email>
Fixes: 0cada33241 ("net/tls: fix race condition causing kernel panic")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Factor out waiting for async encrypt and decrypt to finish.
There are already multiple copies and a subsequent fix will
need more. No functional changes.
Note that crypto_wait_req() returns wait->err
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Update a potentially stale firmware attribute (Maurizio)
- Fixes for the recent verbose error logging (Keith, Chaitanya)
- Protection information payload size fix for passthrough (Francis)
- Fix for a queue freezing issue in virtblk (Yi)
- blk-iocost underflow fix (Tejun)
- blk-wbt task detection fix (Jan)
* tag 'block-6.8-2024-02-10' of git://git.kernel.dk/linux:
virtio-blk: Ensure no requests in virtqueues before deleting vqs.
blk-iocost: Fix an UBSAN shift-out-of-bounds warning
nvme: use ns->head->pi_size instead of t10_pi_tuple structure size
nvme-core: fix comment to reflect right functions
nvme: move passthrough logging attribute to head
blk-wbt: Fix detection of dirty-throttled tasks
nvme-host: fix the updating of the firmware version
Pull firewire fix from Takashi Sakamoto:
"A change to accelerate the device detection step in some cases.
In the self-identification step after bus-reset, all nodes in the same
bus broadcast selfID packet including the value of gap count. The
value is related to the cable hops between nodes, and used to
calculate the subaction gap and the arbitration reset gap.
When each node has the different value of the gap count, the
asynchronous communication between them is unreliable, since an
asynchronous transaction could be interrupted by another asynchronous
transaction before completion. The gap count inconsistency can be
resolved by several ways; e.g. the transfer of PHY configuration
packet and generation of bus-reset.
The current implementation of firewire stack can correctly detect the
gap count inconsistency, however the recovery action from the
inconsistency tends to be delayed after reading configuration ROM of
root node. This results in the long time to probe devices in some
combinations of hardware.
Here the stack is changed to schedule the action as soon as possible"
* tag 'firewire-fixes-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: core: send bus reset promptly on gap count error
Pull smb server fixes from Steve French:
"Two ksmbd server fixes:
- memory leak fix
- a minor kernel-doc fix"
* tag '6.8-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: free aux buffer if ksmbd_iov_pin_rsp_read fails
ksmbd: Add kernel-doc for ksmbd_extract_sharename() function
This reverts commit 5785160732.
Unfortunately, while we only call that thing once, the callback
*can* be called more than once for the same dentry - all it
takes is rename_lock being touched while we are in d_walk().
For now let's revert it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull SCSI fixes from James Bottomley:
"Three small driver fixes and one core fix.
The core fix being a fixup to the one in the last pull request which
didn't entirely move checking of scsi_host_busy() out from under the
host lock"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Remove the ufshcd_release() in ufshcd_err_handling_prepare()
scsi: ufs: core: Fix shift issue in ufshcd_clear_cmd()
scsi: lpfc: Use unsigned type for num_sge
scsi: core: Move scsi_host_busy() out of host lock if it is for per-command
Pull smb client fixes from Steve French:
- reconnect fix
- multichannel channel selection fix
- minor mount warning fix
- reparse point fix
- null pointer check improvement
* tag '6.8-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3: clarify mount warning
cifs: handle cases where multiple sessions share connection
cifs: change tcon status when need_reconnect is set on it
smb: client: set correct d_type for reparse points under DFS mounts
smb3: add missing null server pointer check
Pull ceph fixes from Ilya Dryomov:
"Some fscrypt-related fixups (sparse reads are used only for encrypted
files) and two cap handling fixes from Xiubo and Rishabh"
* tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client:
ceph: always check dir caps asynchronously
ceph: prevent use-after-free in encode_cap_msg()
ceph: always set initial i_blkbits to CEPH_FSCRYPT_BLOCK_SHIFT
libceph: just wait for more data to be available on the socket
libceph: rename read_sparse_msg_*() to read_partial_sparse_msg_*()
libceph: fail sparse-read if the data length doesn't match
Pull ntfs3 fixes from Konstantin Komarov:
"Fixed:
- size update for compressed file
- some logic errors, overflows
- memory leak
- some code was refactored
Added:
- implement super_operations::shutdown
Improved:
- alternative boot processing
- reduced stack usage"
* tag 'ntfs3_for_6.8' of https://github.com/Paragon-Software-Group/linux-ntfs3: (28 commits)
fs/ntfs3: Slightly simplify ntfs_inode_printk()
fs/ntfs3: Add ioctl operation for directories (FITRIM)
fs/ntfs3: Fix oob in ntfs_listxattr
fs/ntfs3: Fix an NULL dereference bug
fs/ntfs3: Update inode->i_size after success write into compressed file
fs/ntfs3: Fixed overflow check in mi_enum_attr()
fs/ntfs3: Correct function is_rst_area_valid
fs/ntfs3: Use i_size_read and i_size_write
fs/ntfs3: Prevent generic message "attempt to access beyond end of device"
fs/ntfs3: use non-movable memory for ntfs3 MFT buffer cache
fs/ntfs3: Use kvfree to free memory allocated by kvmalloc
fs/ntfs3: Disable ATTR_LIST_ENTRY size check
fs/ntfs3: Fix c/mtime typo
fs/ntfs3: Add NULL ptr dereference checking at the end of attr_allocate_frame()
fs/ntfs3: Add and fix comments
fs/ntfs3: ntfs3_forced_shutdown use int instead of bool
fs/ntfs3: Implement super_operations::shutdown
fs/ntfs3: Drop suid and sgid bits as a part of fpunch
fs/ntfs3: Add file_modified
fs/ntfs3: Correct use bh_read
...
We've had issues with gcc and 'asm goto' before, and we created a
'asm_volatile_goto()' macro for that in the past: see commits
3f0116c323 ("compiler/gcc4: Add quirk for 'asm goto' miscompilation
bug") and a9f180345f ("compiler/gcc4: Make quirk for
asm_volatile_goto() unconditional").
Then, much later, we ended up removing the workaround in commit
43c249ea0b ("compiler-gcc.h: remove ancient workaround for gcc PR
58670") because we no longer supported building the kernel with the
affected gcc versions, but we left the macro uses around.
Now, Sean Christopherson reports a new version of a very similar
problem, which is fixed by re-applying that ancient workaround. But the
problem in question is limited to only the 'asm goto with outputs'
cases, so instead of re-introducing the old workaround as-is, let's
rename and limit the workaround to just that much less common case.
It looks like there are at least two separate issues that all hit in
this area:
(a) some versions of gcc don't mark the asm goto as 'volatile' when it
has outputs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420
which is easy to work around by just adding the 'volatile' by hand.
(b) Internal compiler errors:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422
which are worked around by adding the extra empty 'asm' as a
barrier, as in the original workaround.
but the problem Sean sees may be a third thing since it involves bad
code generation (not an ICE) even with the manually added 'volatile'.
but the same old workaround works for this case, even if this feels a
bit like voodoo programming and may only be hiding the issue.
Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Breno Leitao says:
====================
net: Fix MODULE_DESCRIPTION() for net (p5)
There are hundreds of network modules that misses MODULE_DESCRIPTION(),
causing a warning when compiling with W=1. Example:
WARNING: modpost: missing MODULE_DESCRIPTION() in net/sched/em_cmp.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/sched/em_nbyte.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/sched/em_u32.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/sched/em_meta.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/sched/em_text.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/sched/em_canid.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/ip_tunnel.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/ipip.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/ip_gre.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/udp_tunnel.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/ip_vti.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/ah4.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/esp4.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/xfrm4_tunnel.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv4/tunnel4.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/xfrm/xfrm_algo.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/xfrm/xfrm_user.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv6/ah6.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv6/esp6.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv6/xfrm6_tunnel.o
WARNING: modpost: missing MODULE_DESCRIPTION() in net/ipv6/tunnel6.o
This part5 of the patchset focus on the missing net/ module, which
are now warning free.
v1: https://lore.kernel.org/all/20240205101400.1480521-1-leitao@debian.org/
v2: https://lore.kernel.org/all/20240207101929.484681-1-leitao@debian.org/
====================
Link: https://lore.kernel.org/r/20240208164244.3818498-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a crash when adding one of the lan966x interfaces under a lag
interface. The issue can be reproduced like this:
ip link add name bond0 type bond miimon 100 mode balance-xor
ip link set dev eth0 master bond0
The reason is because when adding a interface under the lag it would go
through all the ports and try to figure out which other ports are under
that lag interface. And the issue is that lan966x can have ports that are
NULL pointer as they are not probed. So then iterating over these ports
it would just crash as they are NULL pointers.
The fix consists in actually checking for NULL pointers before accessing
something from the ports. Like we do in other places.
Fixes: cabc9d4933 ("net: lan966x: Add lag support for lan966x")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240206123054.3052966-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While testing tdc with parallel tests for mirred to block we caught an
intermittent bug. The blockid was being zeroed out when a net device
was deleted and, thus, giving us an incorrect blockid value whenever
we tried to dump the mirred action. Since we don't increment the block
refcount in the control path (and only use the ID), we don't need to
zero the blockid field whenever a net device is going down.
Fixes: 42f39036cd ("net/sched: act_mirred: Allow mirred to block")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20240207222902.1469398-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Aaron Conole says:
====================
net: openvswitch: limit the recursions from action sets
Open vSwitch module accepts actions as a list from the netlink socket
and then creates a copy which it uses in the action set processing.
During processing of the action list on a packet, the module keeps a
count of the execution depth and exits processing if the action depth
goes too high.
However, during netlink processing the recursion depth isn't checked
anywhere, and the copy trusts that kernel has large enough stack to
accommodate it. The OVS sample action was the original action which
could perform this kinds of recursion, and it originally checked that
it didn't exceed the sample depth limit. However, when sample became
optimized to provide the clone() semantics, the recursion limit was
dropped.
This series adds a depth limit during the __ovs_nla_copy_actions() call
that will ensure we don't exceed the max that the OVS userspace could
generate for a clone().
Additionally, this series provides a selftest in 2/2 that can be used to
determine if the OVS module is allowing unbounded access. It can be
safely omitted where the ovs selftest framework isn't available.
====================
Link: https://lore.kernel.org/r/20240207132416.1488485-1-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a test case into the netlink checks that will show the number of
nested action recursions won't exceed 16. Going to 17 on a small
clone call isn't enough to exhaust the stack on (most) systems, so
it should be safe to run even on systems that don't have the fix
applied.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240207132416.1488485-3-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ovs module allows for some actions to recursively contain an action
list for complex scenarios, such as sampling, checking lengths, etc.
When these actions are copied into the internal flow table, they are
evaluated to validate that such actions make sense, and these calls
happen recursively.
The ovs-vswitchd userspace won't emit more than 16 recursion levels
deep. However, the module has no such limit and will happily accept
limits larger than 16 levels nested. Prevent this by tracking the
number of recursions happening and manually limiting it to 16 levels
nested.
The initial implementation of the sample action would track this depth
and prevent more than 3 levels of recursion, but this was removed to
support the clone use case, rather than limited at the current userspace
limit.
Fixes: 798c166173 ("openvswitch: Optimize sample action for the clone use cases")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240207132416.1488485-2-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a user tries to use the "sec=krb5p" mount parameter to encrypt
data on connection to a server (when authenticating with Kerberos), we
indicate that it is not supported, but do not note the equivalent
recommended mount parameter ("sec=krb5,seal") which turns on encryption
for that mount (and uses Kerberos for auth). Update the warning message.
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Based on our implementation of multichannel, it is entirely
possible that a server struct may not be found in any channel
of an SMB session.
In such cases, we should be prepared to move on and search for
the server struct in the next session.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When a tcon is marked for need_reconnect, the intention
is to have it reconnected.
This change adjusts tcon->status in cifs_tree_connect
when need_reconnect is set. Also, this change has a minor
correction in resetting need_reconnect on success. It makes
sure that it is done with tc_lock held.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Ido Schimmel says:
====================
selftests: forwarding: Various fixes
Fix various problems in the forwarding selftests so that they will pass
in the netdev CI instead of being ignored. See commit messages for
details.
====================
Link: https://lore.kernel.org/r/20240208155529.1199729-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The redirection test case fails in the netdev CI on debug kernels
because an FDB entry is learned despite the presence of a tc filter that
redirects incoming traffic [1].
I am unable to reproduce the failure locally, but I can see how it can
happen given that learning is first enabled and only then the ingress tc
filter is configured. On debug kernels the time window between these two
operations is longer compared to regular kernels, allowing random
packets to be transmitted and trigger learning.
Fix by reversing the order and configure the ingress tc filter before
enabling learning.
[1]
[...]
# TEST: Locked port MAB redirect [FAIL]
# Locked entry created for redirected traffic
Fixes: 38c43a1ce7 ("selftests: forwarding: Add test case for traffic redirection from a locked port")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240208155529.1199729-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Suppress the following grep warnings:
[...]
INFO: # Port group entries configuration tests - (*, G)
TEST: Common port group entries configuration tests (IPv4 (*, G)) [ OK ]
TEST: Common port group entries configuration tests (IPv6 (*, G)) [ OK ]
grep: warning: stray \ before /
grep: warning: stray \ before /
grep: warning: stray \ before /
TEST: IPv4 (*, G) port group entries configuration tests [ OK ]
grep: warning: stray \ before /
grep: warning: stray \ before /
grep: warning: stray \ before /
TEST: IPv6 (*, G) port group entries configuration tests [ OK ]
[...]
They do not fail the test, but do clutter the output.
Fixes: b6d00da086 ("selftests: forwarding: Add bridge MDB test")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240208155529.1199729-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After enabling a multicast querier on the bridge (like the test is
doing), the bridge will wait for the Max Response Delay before starting
to forward according to its MDB in order to let Membership Reports
enough time to be received and processed.
Currently, the test is waiting for exactly the default Max Response
Delay (10 seconds) which is racy and leads to failures [1].
Fix by reducing the Max Response Delay to 1 second.
[1]
[...]
# TEST: IPv4 host entries forwarding tests [FAIL]
# Packet locally received after flood
Fixes: b6d00da086 ("selftests: forwarding: Add bridge MDB test")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240208155529.1199729-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After enabling a multicast querier on the bridge (like the test is
doing), the bridge will wait for the Max Response Delay before starting
to forward according to its MDB in order to let Membership Reports
enough time to be received and processed.
Currently, the test is waiting for exactly the default Max Response
Delay (10 seconds) which is racy and leads to failures [1].
Fix by reducing the Max Response Delay to 1 second.
[1]
[...]
# TEST: L2 miss - Multicast (IPv4) [FAIL]
# Unregistered multicast filter was hit after adding MDB entry
Fixes: 8c33266ae2 ("selftests: forwarding: Add layer 2 miss test cases")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240208155529.1199729-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test toggles the carrier of a bridge port in order to test the
bridge backup port feature.
Due to the linkwatch delayed work the carrier change is not always
reflected fast enough to the bridge driver and packets are not forwarded
as the test expects, resulting in failures [1].
Fix by busy waiting on the bridge port state until it changes to the
desired state following the carrier change.
[1]
# Backup port
# -----------
[...]
# TEST: swp1 carrier off [ OK ]
# TEST: No forwarding out of swp1 [FAIL]
[ 641.995910] br0: port 1(swp1) entered disabled state
# TEST: No forwarding out of vx0 [ OK ]
Fixes: b408453053 ("selftests: net: Add bridge backup port and backup nexthop ID test")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240208123110.1063930-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Space reservations for metadata are, most of the time, pessimistic as we
reserve space for worst possible cases - where tree heights are at the
maximum possible height (8), we need to COW every extent buffer in a tree
path, need to split extent buffers, etc.
For data, we generally reserve the exact amount of space we are going to
allocate. The exception here is when using compression, in which case we
reserve space matching the uncompressed size, as the compression only
happens at writeback time and in the worst possible case we need that
amount of space in case the data is not compressible.
This means that when there's not available space in the corresponding
space_info object, we may need to allocate a new block group, and then
that block group might not be used after all. In this case the block
group is never added to the list of unused block groups and ends up
never being deleted - except if we unmount and mount again the fs, as
when reading block groups from disk we add unused ones to the list of
unused block groups (fs_info->unused_bgs). Otherwise a block group is
only added to the list of unused block groups when we deallocate the
last extent from it, so if no extent is ever allocated, the block group
is kept around forever.
This also means that if we have a bunch of tasks reserving space in
parallel we can end up allocating many block groups that end up never
being used or kept around for too long without being used, which has
the potential to result in ENOSPC failures in case for example we over
allocate too many metadata block groups and then end up in a state
without enough unallocated space to allocate a new data block group.
This is more likely to happen with metadata reservations as of kernel
6.7, namely since commit 28270e25c6 ("btrfs: always reserve space for
delayed refs when starting transaction"), because we started to always
reserve space for delayed references when starting a transaction handle
for a non-zero number of items, and also to try to reserve space to fill
the gap between the delayed block reserve's reserved space and its size.
So to avoid this, when finishing the creation a new block group, add the
block group to the list of unused block groups if it's still unused at
that time. This way the next time the cleaner kthread runs, it will delete
the block group if it's still unused and not needed to satisfy existing
space reservations.
Reported-by: Ivan Shapovalov <intelfx@intelfx.name>
Link: https://lore.kernel.org/linux-btrfs/9cdbf0ca9cdda1b4c84e15e548af7d7f9f926382.camel@intelfx.name/
CC: stable@vger.kernel.org # 6.7+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Before deleting a block group that is in the list of unused block groups
(fs_info->unused_bgs), we check if the block group became used before
deleting it, as extents from it may have been allocated after it was added
to the list.
However even if the block group was not yet used, there may be tasks that
have only reserved space and have not yet allocated extents, and they
might be relying on the availability of the unused block group in order
to allocate extents. The reservation works first by increasing the
"bytes_may_use" field of the corresponding space_info object (which may
first require flushing delayed items, allocating a new block group, etc),
and only later a task does the actual allocation of extents.
For metadata we usually don't end up using all reserved space, as we are
pessimistic and typically account for the worst cases (need to COW every
single node in a path of a tree at maximum possible height, etc). For
data we usually reserve the exact amount of space we're going to allocate
later, except when using compression where we always reserve space based
on the uncompressed size, as compression is only triggered when writeback
starts so we don't know in advance how much space we'll actually need, or
if the data is compressible.
So don't delete an unused block group if the total size of its space_info
object minus the block group's size is less then the sum of used space and
space that may be used (space_info->bytes_may_use), as that means we have
tasks that reserved space and may need to allocate extents from the block
group. In this case, besides skipping the deletion, re-add the block group
to the list of unused block groups so that it may be reconsidered later,
in case the tasks that reserved space end up not needing to allocate
extents from it.
Allowing the deletion of the block group while we have reserved space, can
result in tasks failing to allocate metadata extents (-ENOSPC) while under
a transaction handle, resulting in a transaction abort, or failure during
writeback for the case of data extents.
CC: stable@vger.kernel.org # 6.0+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a helper function to determine if a block group is being used and make
use of it at btrfs_delete_unused_bgs(). This helper will also be used in
future code changes.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While running the CI for an unrelated change I hit the following panic
with generic/648 on btrfs_holes_spacecache.
assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1385
------------[ cut here ]------------
kernel BUG at fs/btrfs/extent_io.c:1385!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 2695096 Comm: fsstress Kdump: loaded Tainted: G W 6.8.0-rc2+ #1
RIP: 0010:__extent_writepage_io.constprop.0+0x4c1/0x5c0
Call Trace:
<TASK>
extent_write_cache_pages+0x2ac/0x8f0
extent_writepages+0x87/0x110
do_writepages+0xd5/0x1f0
filemap_fdatawrite_wbc+0x63/0x90
__filemap_fdatawrite_range+0x5c/0x80
btrfs_fdatawrite_range+0x1f/0x50
btrfs_write_out_cache+0x507/0x560
btrfs_write_dirty_block_groups+0x32a/0x420
commit_cowonly_roots+0x21b/0x290
btrfs_commit_transaction+0x813/0x1360
btrfs_sync_file+0x51a/0x640
__x64_sys_fdatasync+0x52/0x90
do_syscall_64+0x9c/0x190
entry_SYSCALL_64_after_hwframe+0x6e/0x76
This happens because we fail to write out the free space cache in one
instance, come back around and attempt to write it again. However on
the second pass through we go to call btrfs_get_extent() on the inode to
get the extent mapping. Because this is a new block group, and with the
free space inode we always search the commit root to avoid deadlocking
with the tree, we find nothing and return a EXTENT_MAP_HOLE for the
requested range.
This happens because the first time we try to write the space cache out
we hit an error, and on an error we drop the extent mapping. This is
normal for normal files, but the free space cache inode is special. We
always expect the extent map to be correct. Thus the second time
through we end up with a bogus extent map.
Since we're deprecating this feature, the most straightforward way to
fix this is to simply skip dropping the extent map range for this failed
range.
I shortened the test by using error injection to stress the area to make
it easier to reproduce. With this patch in place we no longer panic
with my error injection test.
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull RISC-V fixes from Palmer Dabbelt:
- fix missing TLB flush during early boot on SPARSEMEM_VMEMMAP
configurations
- fixes to correctly implement the break-before-make behavior requried
by the ISA for NAPOT mappings
- fix a missing TLB flush on intermediate mapping changes
- fix build warning about a missing declaration of overflow_stack
- fix performace regression related to incorrect tracking of completed
batch TLB flushes
* tag 'riscv-for-linus-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix arch_tlbbatch_flush() by clearing the batch cpumask
riscv: declare overflow_stack as exported from traps.c
riscv: Fix arch_hugetlb_migration_supported() for NAPOT
riscv: Flush the tlb when a page directory is freed
riscv: Fix hugetlb_mask_last_page() when NAPOT is enabled
riscv: Fix set_huge_pte_at() for NAPOT mapping
riscv: mm: execute local TLB flush after populating vmemmap
Pull tracing fixes from Steven Rostedt:
- Fix broken direct trampolines being called when another callback is
attached the same function.
ARM 64 does not support FTRACE_WITH_REGS, and when it added direct
trampoline calls from ftrace, it removed the "WITH_REGS" flag from
the ftrace_ops for direct trampolines. This broke x86 as x86 requires
direct trampolines to have WITH_REGS.
This wasn't noticed because direct trampolines work as long as the
function it is attached to is not shared with other callbacks (like
the function tracer). When there are other callbacks, a helper
trampoline is called, to call all the non direct callbacks and when
it returns, the direct trampoline is called.
For x86, the direct trampoline sets a flag in the regs field to tell
the x86 specific code to call the direct trampoline. But this only
works if the ftrace_ops had WITH_REGS set. ARM does things
differently that does not require this. For now, set WITH_REGS if the
arch supports WITH_REGS (which ARM does not), and this makes it work
for both ARM64 and x86.
- Fix wasted memory in the saved_cmdlines logic.
The saved_cmdlines is a cache that maps PIDs to COMMs that tracing
can use. Most trace events only save the PID in the event. The
saved_cmdlines file lists PIDs to COMMs so that the tracing tools can
show an actual name and not just a PID for each event. There's an
array of PIDs that map to a small set of saved COMM strings. The
array is set to PID_MAX_DEFAULT which is usually set to 32768. When a
PID comes in, it will add itself to this array along with the index
into the COMM array (note if the system allows more than
PID_MAX_DEFAULT, this cache is similar to cache lines as an update of
a PID that has the same PID_MAX_DEFAULT bits set will flush out
another task with the same matching bits set).
A while ago, the size of this cache was changed to be dynamic and the
array was moved into a structure and created with kmalloc(). But this
new structure had the size of 131104 bytes, or 0x20020 in hex. As
kmalloc allocates in powers of two, it was actually allocating
0x40000 bytes (262144) leaving 131040 bytes of wasted memory. The
last element of this structure was a pointer to the COMM string array
which defaulted to just saving 128 COMMs.
By changing the last field of this structure to a variable length
string, and just having it round up to fill the allocated memory, the
default size of the saved COMM cache is now 8190. This not only uses
the wasted space, but actually saves space by removing the extra
allocation for the COMM names.
* tag 'trace-v6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix wasted memory in saved_cmdlines logic
ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default
Pull probes fixes from Masami Hiramatsu:
- remove unnecessary initial values of kprobes local variables
- probe-events parser bug fixes:
- calculate the argument size and format string after setting type
information from BTF, because BTF can change the size and format
string.
- show $comm parse error correctly instead of failing silently.
* tag 'probes-fixes-v6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
kprobes: Remove unnecessary initial values of variables
tracing/probes: Fix to set arg size and fmt after setting type from BTF
tracing/probes: Fix to show a parse error for bad type for $comm
The commit noted in fixes added a bogus requirement that runtime PM managed
devices need to be in the RPM_ACTIVE state for PME polling. In fact, only
devices in low power states should be polled.
However there's still a requirement that the device config space must be
accessible, which has implications for both the current state of the polled
device and the parent bridge, when present. It's not sufficient to assume
the bridge remains in D0 and cases have been observed where the bridge
passes the D0 test, but the PM state indicates RPM_SUSPENDING and config
space of the polled device becomes inaccessible during pci_pme_wakeup().
Therefore, since the bridge is already effectively required to be in the
RPM_ACTIVE state, formalize this in the code and elevate the PM usage count
to maintain the state while polling the subordinate device.
This resolves a regression reported in the bugzilla below where a
Thunderbolt/USB4 hierarchy fails to scan for an attached NVMe endpoint
downstream of a bridge in a D3hot power state.
Link: https://lore.kernel.org/r/20240123185548.1040096-1-alex.williamson@redhat.com
Fixes: d3fcd73603 ("PCI: Fix runtime PM race with PME polling")
Reported-by: Sanath S <sanath.s@amd.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218360
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Sanath S <sanath.s@amd.com>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Pull EFI fixes from Ard Biesheuvel:
"The only notable change here is the patch that changes the way we deal
with spurious errors from the EFI memory attribute protocol. This will
be backported to v6.6, and is intended to ensure that we will not
paint ourselves into a corner when we tighten this further in order to
comply with MS requirements on signed EFI code.
Note that this protocol does not currently exist in x86 production
systems in the field, only in Microsoft's fork of OVMF, but it will be
mandatory for Windows logo certification for x86 PCs in the future.
- Tighten ELF relocation checks on the RISC-V EFI stub
- Give up if the new EFI memory attributes protocol fails spuriously
on x86
- Take care not to place the kernel in the lowest 16 MB of DRAM on
x86
- Omit special purpose EFI memory from memblock
- Some fixes for the CXL CPER reporting code
- Make the PE/COFF layout of mixed-mode capable images comply with a
strict interpretation of the spec"
* tag 'efi-fixes-for-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
x86/efistub: Use 1:1 file:memory mapping for PE/COFF .compat section
cxl/trace: Remove unnecessary memcpy's
cxl/cper: Fix errant CPER prints for CXL events
efi: Don't add memblocks for soft-reserved memory
efi: runtime: Fix potential overflow of soft-reserved region size
efi/libstub: Add one kernel-doc comment
x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR
x86/efistub: Give up if memory attribute protocol returns an error
riscv/efistub: Tighten ELF relocation check
riscv/efistub: Ensure GP-relative addressing is not used
Pull pci fixes from Bjorn Helgaas:
- Fix an unintentional truncation of DWC MSI-X address to 32 bits and
update similar MSI code to match (Dan Carpenter)
* tag 'pci-v6.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: dwc: Clean up dw_pcie_ep_raise_msi_irq() alignment
PCI: dwc: Fix a 64bit bug in dw_pcie_ep_raise_msix_irq()
Pull hwmon fixes from Guenter Roeck:
- coretemp: Various fixes, and increase number of supported CPU cores
- aspeed-pwm-tacho: Add missing mutex protection
* tag 'hwmon-for-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (coretemp) Enlarge per package core count limit
hwmon: (coretemp) Fix bogus core_id to attr name mapping
hwmon: (coretemp) Fix out-of-bounds memory access
hwmon: (aspeed-pwm-tacho) mutex for tach reading
Pull pmdomain fixes from Ulf Hansson:
"Core:
- Move the unused cleanup to a _sync initcall
Providers:
- mediatek: Fix race conditions at probe/remove with genpd
- renesas: r8a77980-sysc: CR7 must be always on"
* tag 'pmdomain-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: mediatek: fix race conditions with genpd
pmdomain: renesas: r8a77980-sysc: CR7 must be always on
pmdomain: core: Move the unused cleanup to a _sync initcall
Pull gpio fix from Bartosz Golaszewski:
- remove the new GPIO device from the global list unconditionally in
error path in core GPIOLIB
* tag 'gpio-fixes-for-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: remove GPIO device from the list unconditionally in error path
Pull drm fixes from Dave Airlie:
"Regular weekly fixes, xe, amdgpu and msm are most of them, with some
misc in i915, ivpu and nouveau, scattered but nothing too intense at
this point.
i915:
- gvt: docs fix, uninit var, MAINTAINERS
ivpu:
- add aborted job status
- disable d3 hot delay
- mmu fixes
nouveau:
- fix gsp rpc size request
- fix dma buffer leaks
- use common code for gsp mem ctor
xe:
- Fix a loop in an error path
- Fix a missing dma-fence reference
- Fix a retry path on userptr REMAP
- Workaround for a false gcc warning
- Fix missing map of the usm batch buffer in the migrate vm.
- Fix a memory leak.
- Fix a bad assumption of used page size
- Fix hitting a BUG() due to zero pages to map.
- Remove some leftover async bind queue relics
amdgpu:
- Misc NULL/bounds check fixes
- ODM pipe policy fix
- Aborted suspend fixes
- JPEG 4.0.5 fix
- DCN 3.5 fixes
- PSP fix
- DP MST fix
- Phantom pipe fix
- VRAM vendor fix
- Clang fix
- SR-IOV fix
msm:
- DPU:
- fix for kernel doc warnings and smatch warnings in dpu_encoder
- fix for smatch warning in dpu_encoder
- fix the bus bandwidth value for SDM670
- DP:
- fixes to handle unknown bpc case correctly for DP
- fix for MISC0 programming
- GPU:
- dmabuf vmap fix
- a610 UBWC corruption fix (incorrect hbb)
- revert a commit that was making GPU recovery unreliable"
* tag 'drm-fixes-2024-02-09' of git://anongit.freedesktop.org/drm/drm: (43 commits)
drm/xe: Remove TEST_VM_ASYNC_OPS_ERROR
drm/xe/vm: don't ignore error when in_kthread
drm/xe: Assume large page size if VMA not yet bound
drm/xe/display: Fix memleak in display initialization
drm/xe: Map both mem.kernel_bb_pool and usm.bb_pool
drm/xe: circumvent bogus stringop-overflow warning
drm/xe: Pick correct userptr VMA to repin on REMAP op failure
drm/xe: Take a reference in xe_exec_queue_last_fence_get()
drm/xe: Fix loop in vm_bind_ioctl_ops_unwind
drm/amdgpu: Fix HDP flush for VFs on nbio v7.9
drm/amd/display: Implement bounds check for stream encoder creation in DCN301
drm/amd/display: Increase frame-larger-than for all display_mode_vba files
drm/amd/display: Clear phantom stream count and plane count
drm/amdgpu: Avoid fetching VRAM vendor info
drm/amd/display: Disable ODM by default for DCN35
drm/amd/display: Update phantom pipe enable / disable sequence
drm/amd/display: Fix MST Null Ptr for RV
drm/amdgpu: Fix shared buff copy to user
drm/amd/display: Increase eval/entry delay for DCN35
drm/amdgpu: remove asymmetrical irq disabling in jpeg 4.0.5 suspend
...
The generic constraint "i" seems to be copied from x86 or arm (and with
a redundant generic operand modifier "c"). It works with -fno-PIE but
not with -fPIE/-fPIC in GCC's aarch64 port.
The machine constraint "S", which denotes a symbol or label reference
with a constant offset, supports PIC and has been available in GCC since
2012 and in Clang since 7.0. However, Clang before 19 does not support
"S" on a symbol with a constant offset [1] (e.g.
`static_key_false(&nf_hooks_needed[pf][hook])` in
include/linux/netfilter.h), so we use "i" as a fallback.
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Fangrui Song <maskray@google.com>
Link: https://github.com/llvm/llvm-project/pull/80255 [1]
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20240206074552.541154-1-maskray@google.com
Signed-off-by: Will Deacon <will@kernel.org>
When we are in a syscall we will only save the FPSIMD subset even though
the task still has access to the full register set, and on context switch
we will only remove TIF_SVE when loading the register state. This means
that the signal handling code should not assume that TIF_SVE means that
the register state is stored in SVE format, it should instead check the
format that was recorded during save.
Fixes: 8c845e2731 ("arm64/sve: Leave SVE enabled on syscall if we don't context switch")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240130-arm64-sve-signal-regs-v2-1-9fc6f9502782@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
The kernel built with MCRUSOE is unbootable on Transmeta Crusoe. It shows
the following error message:
This kernel requires an i686 CPU, but only detected an i586 CPU.
Unable to boot - please use a kernel appropriate for your CPU.
Remove MCRUSOE from the condition introduced in commit in Fixes, effectively
changing X86_MINIMUM_CPU_FAMILY back to 5 on that machine, which matches the
CPU family given by CPUID.
[ bp: Massage commit message. ]
Fixes: 25d76ac888 ("x86/Kconfig: Explicitly enumerate i686-class CPUs in Kconfig")
Signed-off-by: Aleksander Mazur <deweloper@wp.pl>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20240123134309.1117782-1-deweloper@wp.pl
the driver currently fails to compile on 6.8-rc3 due to:
| spi-ppc4xx.c: In function ‘spi_ppc4xx_of_probe’:
| @346:36: error: invalid use of undefined type ‘struct platform_device’
| 346 | struct device_node *np = op->dev.of_node;
| | ^~
| ... (more similar errors)
it was working with 6.7. Looks like it only needed the include
and its compiling fine!
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Link: https://lore.kernel.org/r/3eb3f9c4407ba99d1cd275662081e46b9e839173.1707490664.git.chunkeey@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The driver never uses the IRQ1_CFG register so there's no need to provide
a default value. It's set as a readable register only for debugging
through the regmap registers file.
A system-specific firmware could overwrite this register with a non-default
value. Therefore the driver can't hardcode what the initial value actually
is. As the register is only for debugging the value can be left unknown
until someone wants to read it through debugfs.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20240209145700.1555950-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
It looks like all distributions will enable INIT_STACK_ALL_ZERO.
Reflect that in the default configurations.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
CONFIG_COMPAT is disabled by default, however compat code still needs to
be compile tested. Add a compat topic configuration target which allows to
enable it easily.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Commit 73cfbfa9ca ("ALSA: hda/cs35l56: Add driver for Cirrus Logic
CS35L56 amplifier") adds configs SND_HDA_SCODEC_CS35L56_{I2C,SPI},
which selects the non-existing config CS_DSP. Note the renaming in
commit d7cfdf17cb ("firmware: cs_dsp: Rename KConfig symbol CS_DSP ->
FW_CS_DSP"), though.
Select the intended config FW_CS_DSP.
This broken select command probably was not noticed as the configs also
select SND_HDA_CS_DSP_CONTROLS and this then selects FW_CS_DSP. So, the
select FW_CS_DSP could actually be dropped, but we will keep this
redundancy in place as the author originally also intended to have this
redundancy of selects in place.
Fixes: 73cfbfa9ca ("ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Simon Trimmer <simont@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20240209082044.3981-1-lukas.bulwahn@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
While looking at improving the saved_cmdlines cache I found a huge amount
of wasted memory that should be used for the cmdlines.
The tracing data saves pids during the trace. At sched switch, if a trace
occurred, it will save the comm of the task that did the trace. This is
saved in a "cache" that maps pids to comms and exposed to user space via
the /sys/kernel/tracing/saved_cmdlines file. Currently it only caches by
default 128 comms.
The structure that uses this creates an array to store the pids using
PID_MAX_DEFAULT (which is usually set to 32768). This causes the structure
to be of the size of 131104 bytes on 64 bit machines.
In hex: 131104 = 0x20020, and since the kernel allocates generic memory in
powers of two, the kernel would allocate 0x40000 or 262144 bytes to store
this structure. That leaves 131040 bytes of wasted space.
Worse, the structure points to an allocated array to store the comm names,
which is 16 bytes times the amount of names to save (currently 128), which
is 2048 bytes. Instead of allocating a separate array, make the structure
end with a variable length string and use the extra space for that.
This is similar to a recommendation that Linus had made about eventfs_inode names:
https://lore.kernel.org/all/20240130190355.11486-5-torvalds@linux-foundation.org/
Instead of allocating a separate string array to hold the saved comms,
have the structure end with: char saved_cmdlines[]; and round up to the
next power of two over sizeof(struct saved_cmdline_buffers) + num_cmdlines * TASK_COMM_LEN
It will use this extra space for the saved_cmdline portion.
Now, instead of saving only 128 comms by default, by using this wasted
space at the end of the structure it can save over 8000 comms and even
saves space by removing the need for allocating the other array.
Link: https://lore.kernel.org/linux-trace-kernel/20240209063622.1f7b6d5f@rorschach.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Mete Durlu <meted@linux.ibm.com>
Fixes: 939c7a4f04 ("tracing: Introduce saved_cmdlines_size file")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Similar to the existing "ports" node name, coresight device tree bindings
have added "in-ports" and "out-ports" as standard node names for a
collection of ports.
Add support for these name to of_graph_get_port_parent() so that
remote-endpoint parsing can find the correct parent node for these
coresight ports too.
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20240207011803.2637531-4-saravanak@google.com
Signed-off-by: Rob Herring <robh@kernel.org>
After commit 4a032827da ("of: property: Simplify of_link_to_phandle()"),
remote-endpoint properties created a fwnode link from the consumer device
to the supplier endpoint. This is a tiny bit inefficient (not buggy) when
trying to create device links or detecting cycles. So, improve this the
same way we improved finding the consumer of a remote-endpoint property.
Fixes: 4a032827da ("of: property: Simplify of_link_to_phandle()")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20240207011803.2637531-3-saravanak@google.com
Signed-off-by: Rob Herring <robh@kernel.org>
We have a more accurate function to find the right consumer of a
remote-endpoint property instead of searching for a parent with
compatible string property. So, use that instead. While at it, make the
code to find the consumer a bit more flexible and based on the property
being parsed.
Fixes: f7514a6630 ("of: property: fw_devlink: Add support for remote-endpoint")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20240207011803.2637531-2-saravanak@google.com
Signed-off-by: Rob Herring <robh@kernel.org>
This reverts commit 398aa9a7e7.
The update to the gadget API to support EBC feature is incomplete. It's
missing at least the following:
* New usage documentation
* Gadget capability check
* Condition for the user to check how many and which endpoints can be
used as "fifo_mode"
* Description of how it can affect completed request (e.g. dwc3 won't
update TRB on completion -- ie. how it can affect request's actual
length report)
Let's revert this until it's ready.
Fixes: 398aa9a7e7 ("usb: dwc3: Support EBC feature of DWC_usb31")
Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Link: https://lore.kernel.org/r/3042f847ff904b4dd4e4cf66a1b9df470e63439e.1707441690.git.Thinh.Nguyen@synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The commit 60c8971899 ("ftrace: Make DIRECT_CALLS work WITH_ARGS
and !WITH_REGS") changed DIRECT_CALLS to use SAVE_ARGS when there
are multiple ftrace_ops at the same function, but since the x86 only
support to jump to direct_call from ftrace_regs_caller, when we set
the function tracer on the same target function on x86, ftrace-direct
does not work as below (this actually works on arm64.)
At first, insmod ftrace-direct.ko to put a direct_call on
'wake_up_process()'.
# insmod kernel/samples/ftrace/ftrace-direct.ko
# less trace
...
<idle>-0 [006] ..s1. 564.686958: my_direct_func: waking up rcu_preempt-17
<idle>-0 [007] ..s1. 564.687836: my_direct_func: waking up kcompactd0-63
<idle>-0 [006] ..s1. 564.690926: my_direct_func: waking up rcu_preempt-17
<idle>-0 [006] ..s1. 564.696872: my_direct_func: waking up rcu_preempt-17
<idle>-0 [007] ..s1. 565.191982: my_direct_func: waking up kcompactd0-63
Setup a function filter to the 'wake_up_process' too, and enable it.
# cd /sys/kernel/tracing/
# echo wake_up_process > set_ftrace_filter
# echo function > current_tracer
# less trace
...
<idle>-0 [006] ..s3. 686.180972: wake_up_process <-call_timer_fn
<idle>-0 [006] ..s3. 686.186919: wake_up_process <-call_timer_fn
<idle>-0 [002] ..s3. 686.264049: wake_up_process <-call_timer_fn
<idle>-0 [002] d.h6. 686.515216: wake_up_process <-kick_pool
<idle>-0 [002] d.h6. 686.691386: wake_up_process <-kick_pool
Then, only function tracer is shown on x86.
But if you enable 'kprobe on ftrace' event (which uses SAVE_REGS flag)
on the same function, it is shown again.
# echo 'p wake_up_process' >> dynamic_events
# echo 1 > events/kprobes/p_wake_up_process_0/enable
# echo > trace
# less trace
...
<idle>-0 [006] ..s2. 2710.345919: p_wake_up_process_0: (wake_up_process+0x4/0x20)
<idle>-0 [006] ..s3. 2710.345923: wake_up_process <-call_timer_fn
<idle>-0 [006] ..s1. 2710.345928: my_direct_func: waking up rcu_preempt-17
<idle>-0 [006] ..s2. 2710.349931: p_wake_up_process_0: (wake_up_process+0x4/0x20)
<idle>-0 [006] ..s3. 2710.349934: wake_up_process <-call_timer_fn
<idle>-0 [006] ..s1. 2710.349937: my_direct_func: waking up rcu_preempt-17
To fix this issue, use SAVE_REGS flag for multiple ftrace_ops flag of
direct_call by default.
Link: https://lore.kernel.org/linux-trace-kernel/170484558617.178953.1590516949390270842.stgit@devnote2
Fixes: 60c8971899 ("ftrace: Make DIRECT_CALLS work WITH_ARGS and !WITH_REGS")
Cc: stable@vger.kernel.org
Cc: Florent Revest <revest@chromium.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
steal_time is not used outside paravirt.c, make it static,
as sparse suggested.
Fixes: fdf68acccf ("RISC-V: paravirt: Implement steal-time support")
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
sizeof(struct virtio_crypto_akcipher_session_para) is less than
sizeof(struct virtio_crypto_op_ctrl_req::u), copying more bytes from
stack variable leads stack overflow. Clang reports this issue by
commands:
make -j CC=clang-14 mrproper >/dev/null 2>&1
make -j O=/tmp/crypto-build CC=clang-14 allmodconfig >/dev/null 2>&1
make -j O=/tmp/crypto-build W=1 CC=clang-14 drivers/crypto/virtio/
virtio_crypto_akcipher_algs.o
Fixes: 59ca6c9338 ("virtio-crypto: implement RSA algorithm")
Link: https://lore.kernel.org/all/0a194a79-e3a3-45e7-be98-83abd3e1cb7e@roeck-us.net/
Cc: <stable@vger.kernel.org>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Tested-by: Nathan Chancellor <nathan@kernel.org> # build
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Recently, handshake_req_destroy_test1 started failing:
Expected handshake_req_destroy_test == req, but
handshake_req_destroy_test == 0000000000000000
req == 0000000060f99b40
not ok 11 req_destroy works
This is because "sock_release(sock)" was replaced with "fput(filp)"
to address a memory leak. Note that sock_release() is synchronous
but fput() usually delays the final close and clean-up.
The delay is not consequential in the other cases that were changed
but handshake_req_destroy_test1 is testing that handshake_req_cancel()
followed by closing the file actually does call the ->hp_destroy
method. Thus the PTR_EQ test at the end has to be sure that the
final close is complete before it checks the pointer.
We cannot use a completion here because if ->hp_destroy is never
called (ie, there is an API bug) then the test will hang.
Reported by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/netdev/ZcKDd1to4MPANCrn@tissot.1015granger.net/T/#mac5c6299f86799f1c71776f3a07f9c566c7c3c40
Fixes: 4a0f07d71b ("net/handshake: Fix memory leak in __sock_create() and sock_alloc_file()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/170724699027.91401.7839730697326806733.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Recently, I've been hitting following deadlock warning during dpll pin
dump:
[52804.637962] ======================================================
[52804.638536] WARNING: possible circular locking dependency detected
[52804.639111] 6.8.0-rc2jiri+ #1 Not tainted
[52804.639529] ------------------------------------------------------
[52804.640104] python3/2984 is trying to acquire lock:
[52804.640581] ffff88810e642678 (nlk_cb_mutex-GENERIC){+.+.}-{3:3}, at: netlink_dump+0xb3/0x780
[52804.641417]
but task is already holding lock:
[52804.642010] ffffffff83bde4c8 (dpll_lock){+.+.}-{3:3}, at: dpll_lock_dumpit+0x13/0x20
[52804.642747]
which lock already depends on the new lock.
[52804.643551]
the existing dependency chain (in reverse order) is:
[52804.644259]
-> #1 (dpll_lock){+.+.}-{3:3}:
[52804.644836] lock_acquire+0x174/0x3e0
[52804.645271] __mutex_lock+0x119/0x1150
[52804.645723] dpll_lock_dumpit+0x13/0x20
[52804.646169] genl_start+0x266/0x320
[52804.646578] __netlink_dump_start+0x321/0x450
[52804.647056] genl_family_rcv_msg_dumpit+0x155/0x1e0
[52804.647575] genl_rcv_msg+0x1ed/0x3b0
[52804.648001] netlink_rcv_skb+0xdc/0x210
[52804.648440] genl_rcv+0x24/0x40
[52804.648831] netlink_unicast+0x2f1/0x490
[52804.649290] netlink_sendmsg+0x36d/0x660
[52804.649742] __sock_sendmsg+0x73/0xc0
[52804.650165] __sys_sendto+0x184/0x210
[52804.650597] __x64_sys_sendto+0x72/0x80
[52804.651045] do_syscall_64+0x6f/0x140
[52804.651474] entry_SYSCALL_64_after_hwframe+0x46/0x4e
[52804.652001]
-> #0 (nlk_cb_mutex-GENERIC){+.+.}-{3:3}:
[52804.652650] check_prev_add+0x1ae/0x1280
[52804.653107] __lock_acquire+0x1ed3/0x29a0
[52804.653559] lock_acquire+0x174/0x3e0
[52804.653984] __mutex_lock+0x119/0x1150
[52804.654423] netlink_dump+0xb3/0x780
[52804.654845] __netlink_dump_start+0x389/0x450
[52804.655321] genl_family_rcv_msg_dumpit+0x155/0x1e0
[52804.655842] genl_rcv_msg+0x1ed/0x3b0
[52804.656272] netlink_rcv_skb+0xdc/0x210
[52804.656721] genl_rcv+0x24/0x40
[52804.657119] netlink_unicast+0x2f1/0x490
[52804.657570] netlink_sendmsg+0x36d/0x660
[52804.658022] __sock_sendmsg+0x73/0xc0
[52804.658450] __sys_sendto+0x184/0x210
[52804.658877] __x64_sys_sendto+0x72/0x80
[52804.659322] do_syscall_64+0x6f/0x140
[52804.659752] entry_SYSCALL_64_after_hwframe+0x46/0x4e
[52804.660281]
other info that might help us debug this:
[52804.661077] Possible unsafe locking scenario:
[52804.661671] CPU0 CPU1
[52804.662129] ---- ----
[52804.662577] lock(dpll_lock);
[52804.662924] lock(nlk_cb_mutex-GENERIC);
[52804.663538] lock(dpll_lock);
[52804.664073] lock(nlk_cb_mutex-GENERIC);
[52804.664490]
The issue as follows: __netlink_dump_start() calls control->start(cb)
with nlk->cb_mutex held. In control->start(cb) the dpll_lock is taken.
Then nlk->cb_mutex is released and taken again in netlink_dump(), while
dpll_lock still being held. That leads to ABBA deadlock when another
CPU races with the same operation.
Fix this by moving dpll_lock taking into dumpit() callback which ensures
correct lock taking order.
Fixes: 9d71b54b65 ("dpll: netlink: Add DPLL framework base functions")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://lore.kernel.org/r/20240207115902.371649-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fixes for v6.8-rc4
DPU:
- fix for kernel doc warnings and smatch warnings in dpu_encoder
- fix for smatch warning in dpu_encoder
- fix the bus bandwidth value for SDM670
DP:
- fixes to handle unknown bpc case correctly for DP. The current code was
spilling over into other bits of DP configuration register, had to be
fixed to avoid the extra shifts which were causing the spill over
- fix for MISC0 programming in DP driver to program the correct
colorimetry value
GPU:
- dmabuf vmap fix
- a610 UBWC corruption fix (incorrect hbb)
- revert a commit that was making GPU recovery unreliable
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGv+tb1+_cp7ftxcMZbbxE9810rvxeaC50eL=msQ+zkm0g@mail.gmail.com
Driver Changes:
- Fix a loop in an error path
- Fix a missing dma-fence reference
- Fix a retry path on userptr REMAP
- Workaround for a false gcc warning
- Fix missing map of the usm batch buffer
in the migrate vm.
- Fix a memory leak.
- Fix a bad assumption of used page size
- Fix hitting a BUG() due to zero pages to map.
- Remove some leftover async bind queue relics
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZcS2LllawGifubsk@fedora
Pull networking fixes from Paolo Abeni:
"Including fixes from WiFi and netfilter.
Current release - regressions:
- nic: intel: fix old compiler regressions
- netfilter: ipset: missing gc cancellations fixed
Current release - new code bugs:
- netfilter: ctnetlink: fix filtering for zone 0
Previous releases - regressions:
- core: fix from address in memcpy_to_iter_csum()
- netfilter: nfnetlink_queue: un-break NF_REPEAT
- af_unix: fix memory leak for dead unix_(sk)->oob_skb in GC.
- devlink: avoid potential loop in devlink_rel_nested_in_notify_work()
- iwlwifi:
- mvm: fix a battery life regression
- fix double-free bug
- mac80211: fix waiting for beacons logic
- nic: nfp: flower: prevent re-adding mac index for bonded port
Previous releases - always broken:
- rxrpc: fix generation of serial numbers to skip zero
- tipc: check the bearer type before calling tipc_udp_nl_bearer_add()
- tunnels: fix out of bounds access when building IPv6 PMTU error
- nic: hv_netvsc: register VF in netvsc_probe if NET_DEVICE_REGISTER
missed
- nic: atlantic: fix DMA mapping for PTP hwts ring
Misc:
- selftests: more fixes to deal with very slow hosts"
* tag 'net-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (80 commits)
netfilter: nft_set_pipapo: remove scratch_aligned pointer
netfilter: nft_set_pipapo: add helper to release pcpu scratch area
netfilter: nft_set_pipapo: store index in scratch maps
netfilter: nft_set_rbtree: skip end interval element from gc
netfilter: nfnetlink_queue: un-break NF_REPEAT
netfilter: nf_tables: use timestamp to check for set element timeout
netfilter: nft_ct: reject direction for ct id
netfilter: ctnetlink: fix filtering for zone 0
s390/qeth: Fix potential loss of L3-IP@ in case of network issues
netfilter: ipset: Missing gc cancellations fixed
octeontx2-af: Initialize maps.
net: ethernet: ti: cpsw: enable mac_managed_pm to fix mdio
net: ethernet: ti: cpsw_new: enable mac_managed_pm to fix mdio
netfilter: nft_set_pipapo: remove static in nft_pipapo_get()
netfilter: nft_compat: restrict match/target protocol to u16
netfilter: nft_compat: reject unused compat flag
netfilter: nft_compat: narrow down revision to unsigned 8-bits
net: intel: fix old compiler regressions
MAINTAINERS: Maintainer change for rds
selftests: cmsg_ipv6: repeat the exact packet
...
Pull pinctrl fix from Linus Walleij:
"A single fix for the AMD driver which affects developer laptops, the
pinctrl/GPIO driver won't probe on some systems"
* tag 'pinctrl-v6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: amd: Add IRQF_ONESHOT to the interrupt request
Pull NVMe fixes from Keith:
"nvme fixes for Linux 6.8
- Update a potentially stale firmware attribute (Maurizio)
- Fixes for the recent verbose error logging (Keith, Chaitanya)
- Protection information payload size fix for passthrough (Francis)"
* tag 'nvme-6.8-2023-02-08' of git://git.infradead.org/nvme:
nvme: use ns->head->pi_size instead of t10_pi_tuple structure size
nvme-core: fix comment to reflect right functions
nvme: move passthrough logging attribute to head
nvme-host: fix the updating of the firmware version
Ensure no remaining requests in virtqueues before resetting vdev and
deleting virtqueues. Otherwise these requests will never be completed.
It may cause the system to become unresponsive.
Function blk_mq_quiesce_queue() can ensure that requests have become
in_flight status, but it cannot guarantee that requests have been
processed by the device. Virtqueues should never be deleted before
all requests become complete status.
Function blk_mq_freeze_queue() ensure that all requests in virtqueues
become complete status. And no requests can enter in virtqueues.
Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20240129085250.1550594-1-yi.sun@unisoc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If the directory passed to the '.. kernel-feat::' directive does not
exist or the get_feat.pl script does not find any files to extract
features from, Sphinx will report the following error:
Sphinx parallel build error:
UnboundLocalError: local variable 'fname' referenced before assignment
make[2]: *** [Documentation/Makefile:102: htmldocs] Error 2
This is due to how I changed the script in c48a7c44a1 ("docs:
kernel_feat.py: fix potential command injection"). Before that, the
filename passed along to self.nestedParse() in this case was weirdly
just the whole get_feat.pl invocation.
We can fix it by doing what kernel_abi.py does -- just pass
self.arguments[0] as 'fname'.
Fixes: c48a7c44a1 ("docs: kernel_feat.py: fix potential command injection")
Cc: Justin Forbes <jforbes@fedoraproject.org>
Cc: Salvatore Bonaccorso <carnil@debian.org>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Link: https://lore.kernel.org/r/20240205175133.774271-2-vegard.nossum@oracle.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
When iocg_kick_delay() is called from a CPU different than the one which set
the delay, @now may be in the past of @iocg->delay_at leading to the
following warning:
UBSAN: shift-out-of-bounds in block/blk-iocost.c:1359:23
shift exponent 18446744073709 is too large for 64-bit type 'u64' (aka 'unsigned long long')
...
Call Trace:
<TASK>
dump_stack_lvl+0x79/0xc0
__ubsan_handle_shift_out_of_bounds+0x2ab/0x300
iocg_kick_delay+0x222/0x230
ioc_rqos_merge+0x1d7/0x2c0
__rq_qos_merge+0x2c/0x80
bio_attempt_back_merge+0x83/0x190
blk_attempt_plug_merge+0x101/0x150
blk_mq_submit_bio+0x2b1/0x720
submit_bio_noacct_nocheck+0x320/0x3e0
__swap_writepage+0x2ab/0x9d0
The underflow itself doesn't really affect the behavior in any meaningful
way; however, the past timestamp may exaggerate the delay amount calculated
later in the code, which shouldn't be a material problem given the nature of
the delay mechanism.
If @now is in the past, this CPU is racing another CPU which recently set up
the delay and there's nothing this CPU can contribute w.r.t. the delay.
Let's bail early from iocg_kick_delay() in such cases.
Reported-by: Breno Leitão <leitao@debian.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 5160a5a53c ("blk-iocost: implement delay adjustment hysteresis")
Link: https://lore.kernel.org/r/ZVvc9L_CYk5LO1fT@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Send query dir requests with an info level of
SMB_FIND_FILE_FULL_DIRECTORY_INFO rather than
SMB_FIND_FILE_DIRECTORY_INFO when the client is generating its own
inode numbers (e.g. noserverino) so that reparse tags still
can be parsed directly from the responses, but server won't
send UniqueId (server inode number)
Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Address static checker warning in cifs_ses_get_chan_index():
warn: variable dereferenced before check 'server'
To be consistent, and reduce risk, we should add another check
for null server pointer.
Fixes: 88675b22d3 ("cifs: do not search for channel if server is terminating")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Fix to show a parse error for bad type (non-string) for $comm/$COMM and
immediate-string. With this fix, error_log file shows appropriate error
message as below.
/sys/kernel/tracing # echo 'p vfs_read $comm:u32' >> kprobe_events
sh: write error: Invalid argument
/sys/kernel/tracing # echo 'p vfs_read \"hoge":u32' >> kprobe_events
sh: write error: Invalid argument
/sys/kernel/tracing # cat error_log
[ 30.144183] trace_kprobe: error: $comm and immediate-string only accepts string type
Command: p vfs_read $comm:u32
^
[ 62.618500] trace_kprobe: error: $comm and immediate-string only accepts string type
Command: p vfs_read \"hoge":u32
^
Link: https://lore.kernel.org/all/170602215411.215583.2238016352271091852.stgit@devnote2/
Fixes: 3dd1f7f24f ("tracing: probeevent: Fix to make the type of $comm string")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
With the change in the widget free logic to power down the cores only
when the scheduler widgets are freed, we need to ensure that the
scheduler widget is freed only after all the widgets associated with the
scheduler are freed. This is to ensure that the secondary core that the
scheduler is scheduled to run on is kept powered on until all widgets
that need them are in use. While this works well for dynamic pipelines,
in the case of static pipelines the current logic does not take this into
account and frees all widgets in the order they occur in the
widget_list. So, modify this to ensure that the scheduler widgets are freed
only after all other types of widgets in the widget_list are freed.
Link: https://github.com/thesofproject/linux/issues/4807
Fixes: 31ed8da1c8 ("ASoC: SOF: sof-audio: Modify logic for enabling/disabling topology cores")
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://lore.kernel.org/r/20240208133432.1688-1-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Rewrite the handling of ASP1 TX mixer mux initialization to prevent a
deadlock during component_remove().
The firmware can overwrite the ASP1 TX mixer registers with
system-specific settings. This is mainly for hardware that uses the
ASP as a chip-to-chip link controlled by the firmware. Because of this
the driver cannot know the starting state of the ASP1 mixer muxes until
the firmware has been downloaded and rebooted.
The original workaround for this was to queue a work function from the
dsp_work() job. This work then read the register values (populating the
regmap cache the first time around) and then called
snd_soc_dapm_mux_update_power(). The problem with this is that it was
ultimately triggered by cs35l56_component_probe() queueing dsp_work,
which meant that it would be running in parallel with the rest of the
ASoC component and card initialization. To prevent accessing DAPM before
it was fully initialized the work function took the card mutex. But this
would deadlock if cs35l56_component_remove() was called before the work job
had completed, because ASoC calls component_remove() with the card mutex
held.
This new version removes the work function. Instead the regmap cache and
DAPM mux widgets are initialized the first time any of the associated ALSA
controls is read or written.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 07f7d6e7a1 ("ASoC: cs35l56: Fix for initializing ASP1 mixer registers")
Link: https://lore.kernel.org/r/20240208123742.1278104-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
A DoS tool that injects loads of authentication frames made our AP
crash. The iwl_mvm_is_dup() function couldn't find the per-queue
dup_data which was not allocated.
The root cause for that is that we ran out of stations in the firmware
and we didn't really add the station to the firmware, yet we didn't
return an error to mac80211.
Mac80211 was thinking that we have the station and because of that,
sta_info::uploaded was set to 1. This allowed
ieee80211_find_sta_by_ifaddr() to return a valid station object, but
that ieee80211_sta didn't have any iwl_mvm_sta object initialized and
that caused the crash mentioned earlier when we got Rx on that station.
Cc: stable@vger.kernel.org
Fixes: 57974a55d9 ("wifi: iwlwifi: mvm: refactor iwl_mvm_mac_sta_state_common()")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240206175739.1f76c44b2486.I6a00955e2842f15f0a089db2f834adb9d10fbe35@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
As described in IEEE sta 802.11-2020, table 9-30 (Address
field contents), A-MSDU address 3 should contain the BSSID
address.
In TX_CMD we copy the MAC header from skb, and skb address 3
holds the destination address, but it may not be identical to
the BSSID.
Using the wrong destination address appears to work with (most)
receivers without MLO, but in MLO some devices are checking for
it carefully, perhaps as a consequence of link to MLD address
translation.
Replace address 3 in the TX_CMD MAC header with the correct
address while retaining the skb address 3 unchanged.
This ensures that skb address 3 will be utilized later for
constructing the A-MSDU subframes.
Note that we fill in the MLD address, but the firmware will do the
necessary translation to link address after encryption.
Signed-off-by: Daniel Gabay <daniel.gabay@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240204235836.4583a1bf9188.I3f8e7892bdf8f86b4daa28453771a8c9817b2416@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Narrow down target/match revision to u8 in nft_compat.
2) Bail out with unused flags in nft_compat.
3) Restrict layer 4 protocol to u16 in nft_compat.
4) Remove static in pipapo get command that slipped through when
reducing set memory footprint.
5) Follow up incremental fix for the ipset performance regression,
this includes the missing gc cancellation, from Jozsef Kadlecsik.
6) Allow to filter by zone 0 in ctnetlink, do not interpret zone 0
as no filtering, from Felix Huettner.
7) Reject direction for NFT_CT_ID.
8) Use timestamp to check for set element expiration while transaction
is handled to prevent garbage collection from removing set elements
that were just added by this transaction. Packet path and netlink
dump/get path still use current time to check for expiration.
9) Restore NF_REPEAT in nfnetlink_queue, from Florian Westphal.
10) map_index needs to be percpu and per-set, not just percpu.
At this time its possible for a pipapo set to fill the all-zero part
with ones and take the 'might have bits set' as 'start-from-zero' area.
From Florian Westphal. This includes three patches:
- Change scratchpad area to a structure that provides space for a
per-set-and-cpu toggle and uses it of the percpu one.
- Add a new free helper to prepare for the next patch.
- Remove the scratch_aligned pointer and makes AVX2 implementation
use the exact same memory addresses for read/store of the matching
state.
netfilter pull request 24-02-08
* tag 'nf-24-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_set_pipapo: remove scratch_aligned pointer
netfilter: nft_set_pipapo: add helper to release pcpu scratch area
netfilter: nft_set_pipapo: store index in scratch maps
netfilter: nft_set_rbtree: skip end interval element from gc
netfilter: nfnetlink_queue: un-break NF_REPEAT
netfilter: nf_tables: use timestamp to check for set element timeout
netfilter: nft_ct: reject direction for ct id
netfilter: ctnetlink: fix filtering for zone 0
netfilter: ipset: Missing gc cancellations fixed
netfilter: nft_set_pipapo: remove static in nft_pipapo_get()
netfilter: nft_compat: restrict match/target protocol to u16
netfilter: nft_compat: reject unused compat flag
netfilter: nft_compat: narrow down revision to unsigned 8-bits
====================
Link: https://lore.kernel.org/r/20240208112834.1433-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
use ->scratch for both avx2 and the generic implementation.
After previous change the scratch->map member is always aligned properly
for AVX2, so we can just use scratch->map in AVX2 too.
The alignoff delta is stored in the scratchpad so we can reconstruct
the correct address to free the area again.
Fixes: 7400b06396 ("nft_set_pipapo: Introduce AVX2-based lookup implementation")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
After next patch simple kfree() is not enough anymore, so add
a helper for it.
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pipapo needs a scratchpad area to keep state during matching.
This state can be large and thus cannot reside on stack.
Each set preallocates percpu areas for this.
On each match stage, one scratchpad half starts with all-zero and the other
is inited to all-ones.
At the end of each stage, the half that starts with all-ones is
always zero. Before next field is tested, pointers to the two halves
are swapped, i.e. resmap pointer turns into fill pointer and vice versa.
After the last field has been processed, pipapo stashes the
index toggle in a percpu variable, with assumption that next packet
will start with the all-zero half and sets all bits in the other to 1.
This isn't reliable.
There can be multiple sets and we can't be sure that the upper
and lower half of all set scratch map is always in sync (lookups
can be conditional), so one set might have swapped, but other might
not have been queried.
Thus we need to keep the index per-set-and-cpu, just like the
scratchpad.
Note that this bug fix is incomplete, there is a related issue.
avx2 and normal implementation might use slightly different areas of the
map array space due to the avx2 alignment requirements, so
m->scratch (generic/fallback implementation) and ->scratch_aligned
(avx) may partially overlap. scratch and scratch_aligned are not distinct
objects, the latter is just the aligned address of the former.
After this change, write to scratch_align->map_index may write to
scratch->map, so this issue becomes more prominent, we can set to 1
a bit in the supposedly-all-zero area of scratch->map[].
A followup patch will remove the scratch_aligned and makes generic and
avx code use the same (aligned) area.
Its done in a separate change to ease review.
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
rbtree lazy gc on insert might collect an end interval element that has
been just added in this transactions, skip end interval elements that
are not yet active.
Fixes: f718863aca ("netfilter: nft_set_rbtree: fix overlap expiration walk")
Cc: stable@vger.kernel.org
Reported-by: lonial con <kongln9170@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Only override userspace verdict if the ct hook returns something
other than ACCEPT.
Else, this replaces NF_REPEAT (run all hooks again) with NF_ACCEPT
(move to next hook).
Fixes: 6291b3a67a ("netfilter: conntrack: convert nf_conntrack_update to netfilter verdicts")
Reported-by: l.6diay@passmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a timestamp field at the beginning of the transaction, store it
in the nftables per-netns area.
Update set backend .insert, .deactivate and sync gc path to use the
timestamp, this avoids that an element expires while control plane
transaction is still unfinished.
.lookup and .update, which are used from packet path, still use the
current time to check if the element has expired. And .get path and dump
also since this runs lockless under rcu read size lock. Then, there is
async gc which also needs to check the current time since it runs
asynchronously from a workqueue.
Fixes: c3e1b005ed ("netfilter: nf_tables: add set element timeout support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Direction attribute is ignored, reject it in case this ever needs to be
supported
Fixes: 3087c3f7c2 ("netfilter: nft_ct: Add ct id support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Symptom:
In case of a bad cable connection (e.g. dirty optics) a fast sequence of
network DOWN-UP-DOWN-UP could happen. UP triggers recovery of the qeth
interface. In case of a second DOWN while recovery is still ongoing, it
can happen that the IP@ of a Layer3 qeth interface is lost and will not
be recovered by the second UP.
Problem:
When registration of IP addresses with Layer 3 qeth devices fails, (e.g.
because of bad address format) the respective IP address is deleted from
its hash-table in the driver. If registration fails because of a ENETDOWN
condition, the address should stay in the hashtable, so a subsequent
recovery can restore it.
3caa4af834 ("qeth: keep ip-address after LAN_OFFLINE failure")
fixes this for registration failures during normal operation, but not
during recovery.
Solution:
Keep L3-IP address in case of ENETDOWN in qeth_l3_recover_ip(). For
consistency with qeth_l3_add_ip() we also keep it in case of EADDRINUSE,
i.e. for some reason the card already/still has this address registered.
Fixes: 4a71df5004 ("qeth: new qeth device driver")
Cc: stable@vger.kernel.org
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20240206085849.2902775-1-wintera@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The patch fdb8e12cc2cc ("netfilter: ipset: fix performance regression
in swap operation") missed to add the calls to gc cancellations
at the error path of create operations and at module unload. Also,
because the half of the destroy operations now executed by a
function registered by call_rcu(), neither NFNL_SUBSYS_IPSET mutex
or rcu read lock is held and therefore the checking of them results
false warnings.
Fixes: 97f7cf1cd8 ("netfilter: ipset: fix performance regression in swap operation")
Reported-by: syzbot+52bbc0ad036f6f0d4a25@syzkaller.appspotmail.com
Reported-by: Brad Spengler <spender@grsecurity.net>
Reported-by: Стас Ничипорович <stasn77@gmail.com>
Tested-by: Brad Spengler <spender@grsecurity.net>
Tested-by: Стас Ничипорович <stasn77@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The below commit introduced a WARN when phy state is not in the states:
PHY_HALTED, PHY_READY and PHY_UP.
commit 744d23c71a ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
When cpsw resumes, there have port in PHY_NOLINK state, so the below
warning comes out. Set mac_managed_pm be true to tell mdio that the phy
resume/suspend is managed by the mac, to fix the following warning:
WARNING: CPU: 0 PID: 965 at drivers/net/phy/phy_device.c:326 mdio_bus_phy_resume+0x140/0x144
CPU: 0 PID: 965 Comm: sh Tainted: G O 6.1.46-g247b2535b2 #1
Hardware name: Generic AM33XX (Flattened Device Tree)
unwind_backtrace from show_stack+0x18/0x1c
show_stack from dump_stack_lvl+0x24/0x2c
dump_stack_lvl from __warn+0x84/0x15c
__warn from warn_slowpath_fmt+0x1a8/0x1c8
warn_slowpath_fmt from mdio_bus_phy_resume+0x140/0x144
mdio_bus_phy_resume from dpm_run_callback+0x3c/0x140
dpm_run_callback from device_resume+0xb8/0x2b8
device_resume from dpm_resume+0x144/0x314
dpm_resume from dpm_resume_end+0x14/0x20
dpm_resume_end from suspend_devices_and_enter+0xd0/0x924
suspend_devices_and_enter from pm_suspend+0x2e0/0x33c
pm_suspend from state_store+0x74/0xd0
state_store from kernfs_fop_write_iter+0x104/0x1ec
kernfs_fop_write_iter from vfs_write+0x1b8/0x358
vfs_write from ksys_write+0x78/0xf8
ksys_write from ret_fast_syscall+0x0/0x54
Exception stack(0xe094dfa8 to 0xe094dff0)
dfa0: 00000004 005c3fb8 00000001 005c3fb8 00000004 00000001
dfc0: 00000004 005c3fb8 b6f6bba0 00000004 00000004 0059edb8 00000000 00000000
dfe0: 00000004 bed918f0 b6f09bd3 b6e89a66
Cc: <stable@vger.kernel.org> # v6.0+
Fixes: 744d23c71a ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
Fixes: fba863b816 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
Signed-off-by: Sinthu Raja <sinthu.raja@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The below commit introduced a WARN when phy state is not in the states:
PHY_HALTED, PHY_READY and PHY_UP.
commit 744d23c71a ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
When cpsw_new resumes, there have port in PHY_NOLINK state, so the below
warning comes out. Set mac_managed_pm be true to tell mdio that the phy
resume/suspend is managed by the mac, to fix the following warning:
WARNING: CPU: 0 PID: 965 at drivers/net/phy/phy_device.c:326 mdio_bus_phy_resume+0x140/0x144
CPU: 0 PID: 965 Comm: sh Tainted: G O 6.1.46-g247b2535b2 #1
Hardware name: Generic AM33XX (Flattened Device Tree)
unwind_backtrace from show_stack+0x18/0x1c
show_stack from dump_stack_lvl+0x24/0x2c
dump_stack_lvl from __warn+0x84/0x15c
__warn from warn_slowpath_fmt+0x1a8/0x1c8
warn_slowpath_fmt from mdio_bus_phy_resume+0x140/0x144
mdio_bus_phy_resume from dpm_run_callback+0x3c/0x140
dpm_run_callback from device_resume+0xb8/0x2b8
device_resume from dpm_resume+0x144/0x314
dpm_resume from dpm_resume_end+0x14/0x20
dpm_resume_end from suspend_devices_and_enter+0xd0/0x924
suspend_devices_and_enter from pm_suspend+0x2e0/0x33c
pm_suspend from state_store+0x74/0xd0
state_store from kernfs_fop_write_iter+0x104/0x1ec
kernfs_fop_write_iter from vfs_write+0x1b8/0x358
vfs_write from ksys_write+0x78/0xf8
ksys_write from ret_fast_syscall+0x0/0x54
Exception stack(0xe094dfa8 to 0xe094dff0)
dfa0: 00000004 005c3fb8 00000001 005c3fb8 00000004 00000001
dfc0: 00000004 005c3fb8 b6f6bba0 00000004 00000004 0059edb8 00000000 00000000
dfe0: 00000004 bed918f0 b6f09bd3 b6e89a66
Cc: <stable@vger.kernel.org> # v6.0+
Fixes: 744d23c71a ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
Fixes: fba863b816 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
Signed-off-by: Sinthu Raja <sinthu.raja@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Since commit 48e1b4d369 ("gpiolib: remove the GPIO device from the list
when it's unregistered") we remove the GPIO device entry from the global
list (used to order devices by their GPIO ranges) when unregistering the
chip, not when releasing the device. It will not happen when the last
reference is put anymore. This means, we need to remove it in error path
in gpiochip_add_data_with_key() unconditionally, without checking if the
device's .release() callback is set.
Fixes: 48e1b4d369 ("gpiolib: remove the GPIO device from the list when it's unregistered")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
This has slipped through when reducing memory footprint for set
elements, remove it.
Fixes: 9dad402b89 ("netfilter: nf_tables: expose opaque set element as struct nft_elem_priv")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
intel_power_domains_init is called twice in xe_device_probe:
1) intel_power_domains_init()
xe_display_init_nommio()
xe_device_probe()
2) intel_power_domains_init()
intel_display_driver_probe_noirq()
xe_display_init_noirq()
xe_device_probe()
It needs remove one to avoid power_domains->power_wells double malloc.
unreferenced object 0xffff88811150ee00 (size 512):
comm "systemd-udevd", pid 506, jiffies 4294674198 (age 3605.560s)
hex dump (first 32 bytes):
10 b4 9d a0 ff ff ff ff ff ff ff ff ff ff ff ff ................
ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8134b901>] __kmem_cache_alloc_node+0x1c1/0x2b0
[<ffffffff812c98b2>] __kmalloc+0x52/0x150
[<ffffffffa08b0033>] __set_power_wells+0xc3/0x360 [xe]
[<ffffffffa08562fc>] xe_display_init_nommio+0x4c/0x70 [xe]
[<ffffffffa07f0d1c>] xe_device_probe+0x3c/0x5a0 [xe]
[<ffffffffa082e48f>] xe_pci_probe+0x33f/0x5a0 [xe]
[<ffffffff817f2187>] local_pci_probe+0x47/0xa0
[<ffffffff817f3db3>] pci_device_probe+0xc3/0x1f0
[<ffffffff8192f2a2>] really_probe+0x1a2/0x410
[<ffffffff8192f598>] __driver_probe_device+0x78/0x160
[<ffffffff8192f6ae>] driver_probe_device+0x1e/0x90
[<ffffffff8192f92a>] __driver_attach+0xda/0x1d0
[<ffffffff8192c95c>] bus_for_each_dev+0x7c/0xd0
[<ffffffff8192e159>] bus_add_driver+0x119/0x220
[<ffffffff81930d00>] driver_register+0x60/0x120
[<ffffffffa05e50a0>] 0xffffffffa05e50a0
The call to intel_power_domains_cleanup() needs to stay where it is for
now. The main issue is that while the init is called by the display
side, shared by i915 and xe, the cleanup is called by a non-shared code
path. Fixing that will be done as a separate commit.
Fixes: 44e694958b ("drm/xe/display: Implement display support")
Signed-off-by: Xiaoming Wang <xiaoming.wang@intel.com>
[ reword commit message and explain why the fini needs to stay
where it is ]
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240202215658.561298-1-lucas.demarchi@intel.com
(cherry picked from commit 86c99abb5f)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
gcc-13 warns about an array overflow that it sees but that is
prevented by the "asid % NUM_PF_QUEUE" calculation:
drivers/gpu/drm/xe/xe_gt_pagefault.c: In function 'xe_guc_pagefault_handler':
include/linux/fortify-string.h:57:33: error: writing 16 bytes into a region of size 0 [-Werror=stringop-overflow=]
include/linux/fortify-string.h:689:26: note: in expansion of macro '__fortify_memcpy_chk'
689 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \
| ^~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/xe/xe_gt_pagefault.c:341:17: note: in expansion of macro 'memcpy'
341 | memcpy(pf_queue->data + pf_queue->tail, msg, len * sizeof(u32));
| ^~~~~~
drivers/gpu/drm/xe/xe_gt_types.h:102:25: note: at offset [1144, 265324] into destination object 'tile' of size 8
I found that rewriting the assignment using pointer addition rather than the
equivalent array index calculation prevents the warning, so use that instead.
I sent a bug report against gcc for the false positive warning.
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113214
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240103114819.2913937-1-arnd@kernel.org
(cherry picked from commit 774ef5dfc9)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
submit_flushes
atomic_set(&mddev->flush_pending, 1);
rdev_for_each_rcu(rdev, mddev)
atomic_inc(&mddev->flush_pending);
bi->bi_end_io = md_end_flush
submit_bio(bi);
/* flush io is done first */
md_end_flush
if (atomic_dec_and_test(&mddev->flush_pending))
percpu_ref_put(&mddev->active_io)
-> active_io is not released
if (atomic_dec_and_test(&mddev->flush_pending))
-> missing release of active_io
For consequence, mddev_suspend() will wait for 'active_io' to be zero
forever.
Fix this problem by releasing 'active_io' in submit_flushes() if
'flush_pending' is decreased to zero.
Fixes: fa2bbff7b0 ("md: synchronize flush io with array reconfiguration")
Cc: stable@vger.kernel.org # v6.1+
Reported-by: Blazej Kucman <blazej.kucman@linux.intel.com>
Closes: https://lore.kernel.org/lkml/20240130172524.0000417b@linux.intel.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240201092559.910982-7-yukuai1@huaweicloud.com
Pull crypto fixes from Herbert Xu:
"Fix regressions in cbc and algif_hash, as well as an older
NULL-pointer dereference in ccp"
* tag 'v6.8-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: algif_hash - Remove bogus SGL free on zero-length error path
crypto: cbc - Ensure statesize is zero
crypto: ccp - Fix null pointer dereference in __sev_platform_shutdown_locked
Pull percpu fix from Dennis Zhou:
- fix riscv wrong size passed to local_flush_tlb_range_asid()
* tag 'percpu-for-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
riscv: Fix wrong size passed to local_flush_tlb_range_asid()
According to a syzbot report, end_buffer_async_write(), which handles the
completion of block device writes, may detect abnormal condition of the
buffer async_write flag and cause a BUG_ON failure when using nilfs2.
Nilfs2 itself does not use end_buffer_async_write(). But, the async_write
flag is now used as a marker by commit 7f42ec3941 ("nilfs2: fix issue
with race condition of competition between segments for dirty blocks") as
a means of resolving double list insertion of dirty blocks in
nilfs_lookup_dirty_data_buffers() and nilfs_lookup_node_buffers() and the
resulting crash.
This modification is safe as long as it is used for file data and b-tree
node blocks where the page caches are independent. However, it was
irrelevant and redundant to also introduce async_write for segment summary
and super root blocks that share buffers with the backing device. This
led to the possibility that the BUG_ON check in end_buffer_async_write
would fail as described above, if independent writebacks of the backing
device occurred in parallel.
The use of async_write for segment summary buffers has already been
removed in a previous change.
Fix this issue by removing the manipulation of the async_write flag for
the remaining super root block buffer.
Link: https://lkml.kernel.org/r/20240203161645.4992-1-konishi.ryusuke@gmail.com
Fixes: 7f42ec3941 ("nilfs2: fix issue with race condition of competition between segments for dirty blocks")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+5c04210f7c7f897c1e7f@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/00000000000019a97c05fd42f8c8@google.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
DAMON sysfs interface's update_schemes_tried_regions command has a timeout
of two apply intervals of the DAMOS scheme. Having zero value DAMOS
scheme apply interval means it will use the aggregation interval as the
value. However, the timeout setup logic is mistakenly using the sampling
interval insted of the aggregartion interval for the case. This could
cause earlier-than-expected timeout of the command. Fix it.
Link: https://lkml.kernel.org/r/20240202191956.88791-1-sj@kernel.org
Fixes: 7d6fa31a2f ("mm/damon/sysfs-schemes: add timeout for update_schemes_tried_regions")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.7.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Syzbot reported a hang issue in migrate_pages_batch() called by mbind()
and nilfs_lookup_dirty_data_buffers() called in the log writer of nilfs2.
While migrate_pages_batch() locks a folio and waits for the writeback to
complete, the log writer thread that should bring the writeback to
completion picks up the folio being written back in
nilfs_lookup_dirty_data_buffers() that it calls for subsequent log
creation and was trying to lock the folio. Thus causing a deadlock.
In the first place, it is unexpected that folios/pages in the middle of
writeback will be updated and become dirty. Nilfs2 adds a checksum to
verify the validity of the log being written and uses it for recovery at
mount, so data changes during writeback are suppressed. Since this is
broken, an unclean shutdown could potentially cause recovery to fail.
Investigation revealed that the root cause is that the wait for writeback
completion in nilfs_page_mkwrite() is conditional, and if the backing
device does not require stable writes, data may be modified without
waiting.
Fix these issues by making nilfs_page_mkwrite() wait for writeback to
finish regardless of the stable write requirement of the backing device.
Link: https://lkml.kernel.org/r/20240131145657.4209-1-konishi.ryusuke@gmail.com
Fixes: 1d1d1a7672 ("mm: only enforce stable page writes if the backing device requires it")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+ee2ae68da3b22d04cd8d@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/00000000000047d819061004ad6c@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
LRU_SKIP can only be returned if we don't ever dropped lru lock, or we
need to return LRU_RETRY to restart from the head of lru list.
Otherwise, the iteration might continue from a cursor position that was
freed while the locks were dropped.
Actually we may need to introduce another LRU_STOP to really terminate the
ongoing shrinking scan process, when we encounter a warm page already in
the swap cache. The current list_lru implementation doesn't have this
function to early break from __list_lru_walk_one.
Link: https://lkml.kernel.org/r/20240126-zswap-writeback-race-v2-1-b10479847099@bytedance.com
Fixes: b5ba474f3f ("zswap: shrink zswap pool based on memory pressure")
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Cc: Chris Li <chriscli@google.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In memcg_rstat_updated(), we iterate the memcg being updated and its
parents to update memcg->vmstats_percpu->stats_updates in the fast path
(i.e. no atomic updates). According to my math, this is 3 memory loads
(and potentially 3 cache misses) per memcg:
- Load the address of memcg->vmstats_percpu.
- Load vmstats_percpu->stats_updates (based on some percpu calculation).
- Load the address of the parent memcg.
Avoid most of the cache misses by caching a pointer from each struct
memcg_vmstats_percpu to its parent on the corresponding CPU. In this
case, for the first memcg we have 2 memory loads (same as above):
- Load the address of memcg->vmstats_percpu.
- Load vmstats_percpu->stats_updates (based on some percpu calculation).
Then for each additional memcg, we need a single load to get the
parent's stats_updates directly. This reduces the number of loads from
O(3N) to O(2+N) -- where N is the number of memcgs we need to iterate.
Additionally, stash a pointer to memcg->vmstats in each struct
memcg_vmstats_percpu such that we can access the atomic counter that all
CPUs fold into, memcg->vmstats->stats_updates.
memcg_should_flush_stats() is changed to memcg_vmstats_needs_flush() to
accept a struct memcg_vmstats pointer accordingly.
In struct memcg_vmstats_percpu, make sure both pointers together with
stats_updates live on the same cacheline. Finally, update
mem_cgroup_alloc() to take in a parent pointer and initialize the new
cache pointers on each CPU. The percpu loop in mem_cgroup_alloc() may
look concerning, but there are multiple similar loops in the cgroup
creation path (e.g. cgroup_rstat_init()), most of which are hidden
within alloc_percpu().
According to Oliver's testing [1], this fixes multiple 30-38%
regressions in vm-scalability, will-it-scale-tlb_flush2, and
will-it-scale-fallocate1. This comes at a cost of 2 more pointers per
CPU (<2KB on a machine with 128 CPUs).
[1] https://lore.kernel.org/lkml/ZbDJsfsZt2ITyo61@xsang-OptiPlex-9020/
[yosryahmed@google.com: fix struct memcg_vmstats_percpu size and alignment]
Link: https://lkml.kernel.org/r/20240203044612.1234216-1-yosryahmed@google.com
Link: https://lkml.kernel.org/r/20240124100023.660032-1-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Fixes: 8d59d2214c ("mm: memcg: make stats flushing threshold per-memcg")
Tested-by: kernel test robot <oliver.sang@intel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202401221624.cb53a8ca-oliver.sang@intel.com
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The helper function nilfs_recovery_copy_block() of
nilfs_recovery_dsync_blocks(), which recovers data from logs created by
data sync writes during a mount after an unclean shutdown, incorrectly
calculates the on-page offset when copying repair data to the file's page
cache. In environments where the block size is smaller than the page
size, this flaw can cause data corruption and leak uninitialized memory
bytes during the recovery process.
Fix these issues by correcting this byte offset calculation on the page.
Link: https://lkml.kernel.org/r/20240124121936.10575-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit c33c794828 ("mm: ptep_get() conversion") converted all (non-arch)
call sites to use ptep_get() instead of doing a direct dereference of the
pte. Full rationale can be found in that commit's log.
Since then, UFFDIO_MOVE has been implemented which does 7 direct pte
dereferences. Let's fix those up to use ptep_get().
I've asserted in the past that there is no reliable automated mechanism to
catch these; I'm relying on a combination of Coccinelle (which throws up a
lot of false positives) and some compiler magic to force a compiler error
on dereference. But given the frequency with which new issues are coming
up, I'll add it to my todo list to try to find an automated solution.
Link: https://lkml.kernel.org/r/20240123141755.3836179-1-ryan.roberts@arm.com
Fixes: adef440691 ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "fs/proc: do_task_stat: use sig->stats_".
do_task_stat() has the same problem as getrusage() had before "getrusage:
use sig->stats_lock rather than lock_task_sighand()": a hard lockup. If
NR_CPUS threads call lock_task_sighand() at the same time and the process
has NR_THREADS, spin_lock_irq will spin with irqs disabled O(NR_CPUS *
NR_THREADS) time.
This patch (of 3):
thread_group_cputime() does its own locking, we can safely shift
thread_group_cputime_adjusted() which does another for_each_thread loop
outside of ->siglock protected section.
Not only this removes for_each_thread() from the critical section with
irqs disabled, this removes another case when stats_lock is taken with
siglock held. We want to remove this dependency, then we can change the
users of stats_lock to not disable irqs.
Link: https://lkml.kernel.org/r/20240123153313.GA21832@redhat.com
Link: https://lkml.kernel.org/r/20240123153355.GA21854@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Dylan Hatch <dylanbhatch@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
lock_task_sighand() can trigger a hard lockup. If NR_CPUS threads call
getrusage() at the same time and the process has NR_THREADS, spin_lock_irq
will spin with irqs disabled O(NR_CPUS * NR_THREADS) time.
Change getrusage() to use sig->stats_lock, it was specifically designed
for this type of use. This way it runs lockless in the likely case.
TODO:
- Change do_task_stat() to use sig->stats_lock too, then we can
remove spin_lock_irq(siglock) in wait_task_zombie().
- Turn sig->stats_lock into seqcount_rwlock_t, this way the
readers in the slow mode won't exclude each other. See
https://lore.kernel.org/all/20230913154907.GA26210@redhat.com/
- stats_lock has to disable irqs because ->siglock can be taken
in irq context, it would be very nice to change __exit_signal()
to avoid the siglock->stats_lock dependency.
Link: https://lkml.kernel.org/r/20240122155053.GA26214@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Dylan Hatch <dylanbhatch@google.com>
Tested-by: Dylan Hatch <dylanbhatch@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "getrusage: use sig->stats_lock", v2.
This patch (of 2):
thread_group_cputime() does its own locking, we can safely shift
thread_group_cputime_adjusted() which does another for_each_thread loop
outside of ->siglock protected section.
This is also preparation for the next patch which changes getrusage() to
use stats_lock instead of siglock, thread_group_cputime() takes the same
lock. With the current implementation recursive read_seqbegin_or_lock()
is fine, thread_group_cputime() can't enter the slow mode if the caller
holds stats_lock, yet this looks more safe and better performance-wise.
Link: https://lkml.kernel.org/r/20240122155023.GA26169@redhat.com
Link: https://lkml.kernel.org/r/20240122155050.GA26205@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Dylan Hatch <dylanbhatch@google.com>
Tested-by: Dylan Hatch <dylanbhatch@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
For shared memory of type SHM_HUGETLB, hugetlb pages are reserved in
shmget() call. If SHM_NORESERVE flags is specified then the hugetlb pages
are not reserved. However when the shared memory is attached with the
shmat() call the hugetlb pages are getting reserved incorrectly for
SHM_HUGETLB shared memory created with SHM_NORESERVE which is a bug.
-------------------------------
Following test shows the issue.
$cat shmhtb.c
int main()
{
int shmflags = 0660 | IPC_CREAT | SHM_HUGETLB | SHM_NORESERVE;
int shmid;
shmid = shmget(SKEY, SHMSZ, shmflags);
if (shmid < 0)
{
printf("shmat: shmget() failed, %d\n", errno);
return 1;
}
printf("After shmget()\n");
system("cat /proc/meminfo | grep -i hugepages_");
shmat(shmid, NULL, 0);
printf("\nAfter shmat()\n");
system("cat /proc/meminfo | grep -i hugepages_");
shmctl(shmid, IPC_RMID, NULL);
return 0;
}
#sysctl -w vm.nr_hugepages=20
#./shmhtb
After shmget()
HugePages_Total: 20
HugePages_Free: 20
HugePages_Rsvd: 0
HugePages_Surp: 0
After shmat()
HugePages_Total: 20
HugePages_Free: 20
HugePages_Rsvd: 5 <--
HugePages_Surp: 0
--------------------------------
Fix is to ensure that hugetlb pages are not reserved for SHM_HUGETLB shared
memory in the shmat() call.
Link: https://lkml.kernel.org/r/1706040282-12388-1-git-send-email-prakash.sangappa@oracle.com
Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
Acked-by: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ksmbd_iov_pin_rsp_read() doesn't free the provided aux buffer if it
fails. Seems to be the caller's responsibility to clear the buffer in
error case.
Found by Linux Verification Center (linuxtesting.org).
Fixes: e2b76ab8b5 ("ksmbd: add support for read compound")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
The ksmbd_extract_sharename() function lacked a complete kernel-doc
comment. This patch adds parameter descriptions and detailed function
behavior to improve code readability and maintainability.
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Currently kernel supports 8 byte and 16 byte protection information.
So, use ns->head->pi_size instead of sizeof(struct t10_pi_tuple).
Signed-off-by: Francis Pravin <francis.p@samsung.com>
Signed-off-by: Sathyavathi M <sathya.m@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
'stream_enc_regs' array is an array of dcn10_stream_enc_registers
structures. The array is initialized with four elements, corresponding
to the four calls to stream_enc_regs() in the array initializer. This
means that valid indices for this array are 0, 1, 2, and 3.
The error message 'stream_enc_regs' 4 <= 5 below, is indicating that
there is an attempt to access this array with an index of 5, which is
out of bounds. This could lead to undefined behavior
Here, eng_id is used as an index to access the stream_enc_regs array. If
eng_id is 5, this would result in an out-of-bounds access on the
stream_enc_regs array.
Thus fixing Buffer overflow error in dcn301_stream_encoder_create
reported by Smatch:
drivers/gpu/drm/amd/amdgpu/../display/dc/resource/dcn301/dcn301_resource.c:1011 dcn301_stream_encoder_create() error: buffer overflow 'stream_enc_regs' 4 <= 5
Fixes: 3a83e4e64b ("drm/amd/display: Add dcn3.01 support to DC (v2)")
Cc: Roman Li <Roman.Li@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
After a recent change in LLVM, allmodconfig (which has CONFIG_KCSAN=y
and CONFIG_WERROR=y enabled) has a few new instances of
-Wframe-larger-than for the mode support and system configuration
functions:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20v2.c:3393:6: error: stack frame size (2144) exceeds limit (2048) in 'dml20v2_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
3393 | void dml20v2_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
| ^
1 error generated.
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn21/display_mode_vba_21.c:3520:6: error: stack frame size (2192) exceeds limit (2048) in 'dml21_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
3520 | void dml21_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
| ^
1 error generated.
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20.c:3286:6: error: stack frame size (2128) exceeds limit (2048) in 'dml20_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
3286 | void dml20_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
| ^
1 error generated.
Without the sanitizers enabled, there are no warnings.
This was the catalyst for commit 6740ec97bc ("drm/amd/display:
Increase frame warning limit with KASAN or KCSAN in dml2") and that same
change was made to dml in commit 5b750b2253 ("drm/amd/display:
Increase frame warning limit with KASAN or KCSAN in dml") but the
frame_warn_flag variable was not applied to all files. Do so now to
clear up the warnings and make all these files consistent.
Cc: stable@vger.kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issue/1990
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When dc_state_destruct() was refactored the new phantom_stream_count
and phantom_plane_count members weren't cleared.
Fixes: 012a04b1d6 ("drm/amd/display: Refactor phantom resource allocation")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Previously we would call apply_ctx_to_hw to enable and disable
phantom pipes. However, apply_ctx_to_hw can potentially update
non-phantom pipes as well which is undesired. Instead of calling
apply_ctx_to_hw as a whole, call the relevant helpers for each
phantom pipe when enabling / disabling which will avoid us modifying
hardware state for non-phantom pipes unknowingly.
The use case is for an FRL display where FRL_Update is requested
by the display. In this case link_state_valid flag is cleared in
a passive callback thread and should be handled in the next stream /
link update. However, due to the call to apply_ctx_to_hw for the
phantom pipes during a flip, the main pipes were modified outside
of the desired sequence (driver does not handle link_state_valid = 0
on flips).
Cc: stable@vger.kernel.org # 6.6+
Reviewed-by: Samson Tam <samson.tam@amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ta if invoke node buffer
|-------- ta type ----------|
|-------- ta id ----------|
|-------- cmd id ----------|
|------ shared buf len -----|
|------ shared buffer ------|
ta if invoke node buffer is as above, copy shared buffer data to correct location
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In the s3 suspend abort case some type of gfx9 power
rail not turn off from FCH side and this will put the
GPU in an unknown power status, so let's reset the gpu
to a known good power state before reinitialize gpu
device.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In the suspend abort cases, the gfx power rail doesn't turn off so
some GFXDEC registers/CSB can't reset to default value and at this
moment reinitialize GFXDEC/CSB will result in an unexpected error.
So let skip those program sequence for the suspend abort case.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
When populating dml pipes, odm combine policy should be assigned based
on the pipe topology of the context passed in. DML pipes could be
repopulated multiple times during single validate bandwidth attempt. We
need to make sure that whenever we repopulate the dml pipes it is always
aligned with the updated context. There is a case where DML pipes get
repopulated during FPO optimization after ODM combine policy is changed.
Since in the current code we reinitlaize ODM combine policy, even though
the current context has ODM combine enabled, we overwrite it despite the
pipes are already split. This causes DML to think that MPC combine is
used so we mistakenly enable MPC combine because we apply pipe split
with ODM combine policy reset. This issue doesn't impact non windowed
MPO with ODM case because the legacy policy has restricted use cases. We
don't encounter the case where both ODM and FPO optimizations are
enabled together. So we decide to leave it as is because it is about to
be replaced anyway.
Cc: stable@vger.kernel.org # 6.6+
Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com>
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
'panel_cntl' structure used to control the display panel could be null,
dereferencing it could lead to a null pointer access.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn21/dcn21_hwseq.c:269 dcn21_set_backlight_level() error: we previously assumed 'panel_cntl' could be null (see line 250)
Fixes: 474ac4a875 ("drm/amd/display: Implement some asic specific abm call backs.")
Cc: Yongqiang Sun <yongqiang.sun@amd.com>
Cc: Anthony Koo <Anthony.Koo@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
xt_check_{match,target} expects u16, but NFTA_RULE_COMPAT_PROTO is u32.
NLA_POLICY_MAX(NLA_BE32, 65535) cannot be used because .max in
nla_policy is s16, see 3e48be05f3 ("netlink: add attribute range
validation to policy").
Fixes: 0ca743a559 ("netfilter: nf_tables: add compatibility layer for x_tables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Flag (1 << 0) is ignored is set, never used, reject it it with EINVAL
instead.
Fixes: 0ca743a559 ("netfilter: nf_tables: add compatibility layer for x_tables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
xt_find_revision() expects u8, restrict it to this datatype.
Fixes: 0ca743a559 ("netfilter: nf_tables: add compatibility layer for x_tables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When we added mount_setattr() I added additional checks compared to the
legacy do_reconfigure_mnt() and do_change_type() helpers used by regular
mount(2). If that mount had a parent then verify that the caller and the
mount namespace the mount is attached to match and if not make sure that
it's an anonymous mount.
The real rootfs falls into neither category. It is neither an anoymous
mount because it is obviously attached to the initial mount namespace
but it also obviously doesn't have a parent mount. So that means legacy
mount(2) allows changing mount properties on the real rootfs but
mount_setattr(2) blocks this. I never thought much about this but of
course someone on this planet of earth changes properties on the real
rootfs as can be seen in [1].
Since util-linux finally switched to the new mount api in 2.39 not so
long ago it also relies on mount_setattr() and that surfaced this issue
when Fedora 39 finally switched to it. Fix this.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2256843
Link: https://lore.kernel.org/r/20240206-vfs-mount-rootfs-v1-1-19b335eee133@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reported-by: Karel Zak <kzak@redhat.com>
Cc: stable@vger.kernel.org # v5.12+
Signed-off-by: Christian Brauner <brauner@kernel.org>
The functions and the attribute listed in the comment doesn't exists in
the code, (ns->logging_enabled, nvme_passthru_err_log_enabled_store()
and nvme_passthru_err_log_enabled_show())
Update the comment with right function names and a comment
ns->head->passthru_err_log_enabled,
nvme_io_passthru_err_log_enabled_store() and
nvme_io_passthru_err_log_enabled_show().
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Alan Adamson <alan.adamson@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Kalle Valo says:
====================
wireless fixes for v6.8-rc4
This time we have unusually large wireless pull request. Several
functionality fixes to both stack and iwlwifi. Lots of fixes to
warnings, especially to MODULE_DESCRIPTION().
* tag 'wireless-2024-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (31 commits)
wifi: mt76: mt7996: fix fortify warning
wifi: brcmfmac: Adjust n_channels usage for __counted_by
wifi: iwlwifi: do not announce EPCS support
wifi: iwlwifi: exit eSR only after the FW does
wifi: iwlwifi: mvm: fix a battery life regression
wifi: mac80211: accept broadcast probe responses on 6 GHz
wifi: mac80211: adding missing drv_mgd_complete_tx() call
wifi: mac80211: fix waiting for beacons logic
wifi: mac80211: fix unsolicited broadcast probe config
wifi: mac80211: initialize SMPS mode correctly
wifi: mac80211: fix driver debugfs for vif type change
wifi: mac80211: set station RX-NSS on reconfig
wifi: mac80211: fix RCU use in TDLS fast-xmit
wifi: mac80211: improve CSA/ECSA connection refusal
wifi: cfg80211: detect stuck ECSA element in probe resp
wifi: iwlwifi: remove extra kernel-doc
wifi: fill in MODULE_DESCRIPTION()s for mt76 drivers
wifi: fill in MODULE_DESCRIPTION()s for wilc1000
wifi: fill in MODULE_DESCRIPTION()s for wl18xx
wifi: fill in MODULE_DESCRIPTION()s for p54spi
...
====================
Link: https://lore.kernel.org/r/20240206095722.CD9D2C433F1@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The namespace does not have attributes, but the head does. Move the new
logging attribute to that structure instead of dereferencing the wrong
type.
And while we're here, fix the reverse-tree coding style.
Fixes: 9f079dda14 ("nvme: allow passthru cmd error logging")
Reported-by: Tasmiya Nalatwad <tasmiya@linux.vnet.ibm.com>
Tested-by: Tasmiya Nalatwad <tasmiya@linux.vnet.ibm.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Alan Adamson <alan.adamson@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Pull LoongArch fixes from Huacai Chen:
"Fix acpi_core_pic[] array overflow, fix earlycon parameter if KASAN
enabled, disable UBSAN instrumentation for vDSO build, and two Kconfig
cleanups"
* tag 'loongarch-fixes-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: vDSO: Disable UBSAN instrumentation
LoongArch: Fix earlycon parameter if KASAN enabled
LoongArch: Change acpi_core_pic[NR_CPUS] to acpi_core_pic[MAX_CORE_PIC]
LoongArch: Select HAVE_ARCH_SECCOMP to use the common SECCOMP menu
LoongArch: Select ARCH_ENABLE_THP_MIGRATION instead of redefining it
The percpu area overflow_stacks is exported from arch/riscv/kernel/traps.c
for use in the entry code, but is not declared anywhere. Add the relevant
declaration to arch/riscv/include/asm/stacktrace.h to silence the following
sparse warning:
arch/riscv/kernel/traps.c:395:1: warning: symbol '__pcpu_scope_overflow_stack' was not declared. Should it be static?
We don't add the stackinfo_get_overflow() call as for some of the other
architectures as this doesn't seem to be used yet, so just silence the
warning.
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Fixes: be97d0db5f ("riscv: VMAP_STACK overflow detection thread-safe")
Link: https://lore.kernel.org/r/20231123134214.81481-1-ben.dooks@codethink.co.uk
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull kvm fixes from Paolo Bonzini:
"x86 guest:
- Avoid false positive for check that only matters on AMD processors
x86:
- Give a hint when Win2016 might fail to boot due to XSAVES &&
!XSAVEC configuration
- Do not allow creating an in-kernel PIT unless an IOAPIC already
exists
RISC-V:
- Allow ISA extensions that were enabled for bare metal in 6.8 (Zbc,
scalar and vector crypto, Zfh[min], Zihintntl, Zvfh[min], Zfa)
S390:
- fix CC for successful PQAP instruction
- fix a race when creating a shadow page"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
x86/coco: Define cc_vendor without CONFIG_ARCH_HAS_CC_PLATFORM
x86/kvm: Fix SEV check in sev_map_percpu_data()
KVM: x86: Give a hint when Win2016 might fail to boot due to XSAVES erratum
KVM: x86: Check irqchip mode before create PIT
KVM: riscv: selftests: Add Zfa extension to get-reg-list test
RISC-V: KVM: Allow Zfa extension for Guest/VM
KVM: riscv: selftests: Add Zvfh[min] extensions to get-reg-list test
RISC-V: KVM: Allow Zvfh[min] extensions for Guest/VM
KVM: riscv: selftests: Add Zihintntl extension to get-reg-list test
RISC-V: KVM: Allow Zihintntl extension for Guest/VM
KVM: riscv: selftests: Add Zfh[min] extensions to get-reg-list test
RISC-V: KVM: Allow Zfh[min] extensions for Guest/VM
KVM: riscv: selftests: Add vector crypto extensions to get-reg-list test
RISC-V: KVM: Allow vector crypto extensions for Guest/VM
KVM: riscv: selftests: Add scaler crypto extensions to get-reg-list test
RISC-V: KVM: Allow scalar crypto extensions for Guest/VM
KVM: riscv: selftests: Add Zbc extension to get-reg-list test
RISC-V: KVM: Allow Zbc extension for Guest/VM
KVM: s390: fix cc for successful PQAP
KVM: s390: vsie: fix race during shadow creation
Pull nfsd fix from Chuck Lever:
- Address a deadlock regression in RELEASE_LOCKOWNER
* tag 'nfsd-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: don't take fi_lock in nfsd_break_deleg_cb()
The kernel build regressions/improvements email contained a couple of
issues with old compilers (in fact all the reports were on different
architectures, but all gcc 5.5) and the FIELD_PREP() and FIELD_GET()
conversions. They're all because an integer #define that should have
been declared as unsigned, was shifted to the point that it could set
the sign bit.
The fix just involves making sure the defines use the "U" identifier on
the constants to make sure they're unsigned. Should make the checkers
happier.
Confirmed with objdump before/after that there is no change to the
binaries.
Issues were reported as follows:
./drivers/net/ethernet/intel/ice/ice_base.c:238:7: note: in expansion of macro 'FIELD_GET'
(FIELD_GET(GLINT_CTL_ITR_GRAN_25_M, regval) == ICE_ITR_GRAN_US))
^
./include/linux/compiler_types.h:435:38: error: call to '__compiletime_assert_1093' declared with attribute error: FIELD_GET: mask is not constant
drivers/net/ethernet/intel/ice/ice_nvm.c:709:16: note: in expansion of macro ‘FIELD_GET’
orom->major = FIELD_GET(ICE_OROM_VER_MASK, combo_ver);
^
./include/linux/compiler_types.h:435:38: error: call to ‘__compiletime_assert_796’ declared with attribute error: FIELD_GET: mask is not constant
drivers/net/ethernet/intel/ice/ice_common.c:945:18: note: in expansion of macro ‘FIELD_GET’
u8 max_agg_bw = FIELD_GET(GL_PWR_MODE_CTL_CAR_MAX_BW_M,
^
./include/linux/compiler_types.h:435:38: error: call to ‘__compiletime_assert_420’ declared with attribute error: FIELD_GET: mask is not constant
drivers/net/ethernet/intel/i40e/i40e_dcb.c:458:8: note: in expansion of macro ‘FIELD_GET’
oui = FIELD_GET(I40E_LLDP_TLV_OUI_MASK, ouisubtype);
^
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lore.kernel.org/lkml/d03e90ca-8485-4d1b-5ec1-c3398e0e8da@linux-m68k.org/ #i40e #ice
Fixes: 62589808d7 ("i40e: field get conversion")
Fixes: 5a259f8e0b ("ice: field get conversion")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Link: https://lore.kernel.org/r/20240206022906.2194214-1-jesse.brandeburg@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 64db3e8d7b ("regulator: dt-bindings: microchip,mcp16502: convert
to YAML") converts mcp16502-regulator.txt to microchip,mcp16502.yaml, but
misses to adjust its reference in MAINTAINERS.
Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a
broken reference.
Repair this file reference in MICROCHIP MCP16502 PMIC DRIVER.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20240207132231.16392-1-lukas.bulwahn@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The MDS will issue the 'Fr' caps for async dirop, while there is
buggy in kclient and it could miss releasing the async dirop caps,
which is 'Fsxr'. And then the MDS will complain with:
"[WRN] client.xxx isn't responding to mclientcaps(revoke) ..."
So when releasing the dirop async requests or when they fail we
should always make sure that being revoked caps could be released.
Link: https://tracker.ceph.com/issues/50223
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
In fs/ceph/caps.c, in encode_cap_msg(), "use after free" error was
caught by KASAN at this line - 'ceph_buffer_get(arg->xattr_buf);'. This
implies before the refcount could be increment here, it was freed.
In same file, in "handle_cap_grant()" refcount is decremented by this
line - 'ceph_buffer_put(ci->i_xattrs.blob);'. It appears that a race
occurred and resource was freed by the latter line before the former
line could increment it.
encode_cap_msg() is called by __send_cap() and __send_cap() is called by
ceph_check_caps() after calling __prep_cap(). __prep_cap() is where
arg->xattr_buf is assigned to ci->i_xattrs.blob. This is the spot where
the refcount must be increased to prevent "use after free" error.
Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/59259
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Georgi writes:
interconnect fixes for v6.8-rc
These are tiny fixes for reported issues in driver code for a few
platforms. One of them sorts out a hang issue, the other improves
the power consumption and the rest are fixing some bitmasks to make
sure the hardware does thing right.
- interconnect: qcom: sc8180x: Mark CO0 BCM keepalive
- interconnect: qcom: sm8550: Enable sync_state
- interconnect: qcom: sm8650: Use correct ACV enable_mask
- interconnect: qcom: x1e80100: Add missing ACV enable_mask
Signed-off-by: Georgi Djakov <djakov@kernel.org>
* tag 'icc-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc:
interconnect: qcom: x1e80100: Add missing ACV enable_mask
interconnect: qcom: sm8650: Use correct ACV enable_mask
interconnect: qcom: sm8550: Enable sync_state
interconnect: qcom: sc8180x: Mark CO0 BCM keepalive
The fscrypt code will use i_blkbits to setup ci_data_unit_bits when
allocating the new inode, but ceph will initiate i_blkbits ater when
filling the inode, which is too late. Since ci_data_unit_bits will only
be used by the fscrypt framework so initiating i_blkbits with
CEPH_FSCRYPT_BLOCK_SHIFT is safe.
Link: https://tracker.ceph.com/issues/64035
Fixes: 5b11888471 ("fscrypt: support crypto data unit size less than filesystem block size")
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
A short read may occur while reading the message footer from the
socket. Later, when the socket is ready for another read, the
messenger invokes all read_partial_*() handlers, including
read_partial_sparse_msg_data(). The expectation is that
read_partial_sparse_msg_data() would bail, allowing the messenger to
invoke read_partial() for the footer and pick up where it left off.
However read_partial_sparse_msg_data() violates that and ends up
calling into the state machine in the OSD client. The sparse-read
state machine assumes that it's a new op and interprets some piece of
the footer as the sparse-read header and returns bogus extents/data
length, etc.
To determine whether read_partial_sparse_msg_data() should bail, let's
reuse cursor->total_resid. Because once it reaches to zero that means
all the extents and data have been successfully received in last read,
else it could break out when partially reading any of the extents and
data. And then osd_sparse_read() could continue where it left off.
[ idryomov: changelog ]
Link: https://tracker.ceph.com/issues/63586
Fixes: d396f89db3 ("libceph: add sparse read support to msgr1")
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
These functions are supposed to behave like other read_partial_*()
handlers: the contract with messenger v1 is that the handler bails if
the area of the message it's responsible for is already processed.
This comes up when handling short reads from the socket.
[ idryomov: changelog ]
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
cmsg_ipv6 test requests tcpdump to capture 4 packets,
and sends until tcpdump quits. Only the first packet
is "real", however, and the rest are basic UDP packets.
So if tcpdump doesn't start in time it will miss
the real packet and only capture the UDP ones.
This makes the test fail on slow machine (no KVM or with
debug enabled) 100% of the time, while it passes in fast
environments.
Repeat the "real" / expected packet.
Fixes: 9657ad09e1 ("selftests: net: test IPV6_TCLASS")
Fixes: 05ae83d5a4 ("selftests: net: test IPV6_HOPLIMIT")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of incrementing the base of the global reg fields, which renders
the second instance of the repeater broken due to wrong offsets, use
regmap with base and offset. As for zeroing out the rest of the tuning
regs, avoid looping though the table and just use the table as is,
as it is already zero initialized.
Fixes: 99a517a582 ("phy: qualcomm: phy-qcom-eusb2-repeater: Zero out untouched tuning regs")
Tested-by: Elliot Berman <quic_eberman@quicinc.com> # sm8650-qrd
Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
Link: https://lore.kernel.org/r/20240201-phy-qcom-eusb2-repeater-fixes-v4-1-cf18c8cef6d7@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
As explained by a comment in <linux/u64_stats_sync.h>, write side of struct
u64_stats_sync must ensure mutual exclusion, or one seqcount update could
be lost on 32-bit platforms, thus blocking readers forever. Such lockups
have been observed in real world after stmmac_xmit() on one CPU raced with
stmmac_napi_poll_tx() on another CPU.
To fix the issue without introducing a new lock, split the statics into
three parts:
1. fields updated only under the tx queue lock,
2. fields updated only during NAPI poll,
3. fields updated only from interrupt context,
Updates to fields in the first two groups are already serialized through
other locks. It is sufficient to split the existing struct u64_stats_sync
so that each group has its own.
Note that tx_set_ic_bit is updated from both contexts. Split this counter
so that each context gets its own, and calculate their sum to get the total
value in stmmac_get_ethtool_stats().
For the third group, multiple interrupts may be processed by different CPUs
at the same time, but interrupts on the same CPU will not nest. Move fields
from this group to a newly created per-cpu struct stmmac_pcpu_stats.
Fixes: 133466c3bb ("net: stmmac: use per-queue 64 bit statistics where necessary")
Link: https://lore.kernel.org/netdev/Za173PhviYg-1qIn@torres.zugschlus.de/t/
Cc: stable@vger.kernel.org
Signed-off-by: Petr Tesarik <petr@tesarici.cz>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initialize the qDMA irqs after the registers are configured so that
interrupts that may have been pending from a primary kernel don't get
processed by the irq handler before it is ready to and cause panic with
the following trace:
Call trace:
fsl_qdma_queue_handler+0xf8/0x3e8
__handle_irq_event_percpu+0x78/0x2b0
handle_irq_event_percpu+0x1c/0x68
handle_irq_event+0x44/0x78
handle_fasteoi_irq+0xc8/0x178
generic_handle_irq+0x24/0x38
__handle_domain_irq+0x90/0x100
gic_handle_irq+0x5c/0xb8
el1_irq+0xb8/0x180
_raw_spin_unlock_irqrestore+0x14/0x40
__setup_irq+0x4bc/0x798
request_threaded_irq+0xd8/0x190
devm_request_threaded_irq+0x74/0xe8
fsl_qdma_probe+0x4d4/0xca8
platform_drv_probe+0x50/0xa0
really_probe+0xe0/0x3f8
driver_probe_device+0x64/0x130
device_driver_attach+0x6c/0x78
__driver_attach+0xbc/0x158
bus_for_each_dev+0x5c/0x98
driver_attach+0x20/0x28
bus_add_driver+0x158/0x220
driver_register+0x60/0x110
__platform_driver_register+0x44/0x50
fsl_qdma_driver_init+0x18/0x20
do_one_initcall+0x48/0x258
kernel_init_freeable+0x1a4/0x23c
kernel_init+0x10/0xf8
ret_from_fork+0x10/0x18
Cc: stable@vger.kernel.org
Fixes: b092529e0a ("dmaengine: fsl-qdma: Add qDMA controller driver for Layerscape SoCs")
Signed-off-by: Curtis Klein <curtis.klein@hpe.com>
Signed-off-by: Yi Zhao <yi.zhao@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20240201220406.440145-1-Frank.Li@nxp.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
There is chip (ls1028a) errata:
The SoC may hang on 16 byte unaligned read transactions by QDMA.
Unaligned read transactions initiated by QDMA may stall in the NOC
(Network On-Chip), causing a deadlock condition. Stalled transactions will
trigger completion timeouts in PCIe controller.
Workaround:
Enable prefetch by setting the source descriptor prefetchable bit
( SD[PF] = 1 ).
Implement this workaround.
Cc: stable@vger.kernel.org
Fixes: b092529e0a ("dmaengine: fsl-qdma: Add qDMA controller driver for Layerscape SoCs")
Signed-off-by: Peng Ma <peng.ma@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20240201215007.439503-1-Frank.Li@nxp.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The current check of ch_en enabled to know the maximum number of available
hardware channels is wrong as it check the number of ch_en register set
but all of them are unset at probe. This register is set at the
dw_hdma_v0_core_start function which is run lately before a DMA transfer.
The HDMA IP have no way to know the number of hardware channels available
like the eDMA IP, then let set it to maximum channels and let the platform
set the right number of channels.
Fixes: e74c39573d ("dmaengine: dw-edma: Add support for native HDMA")
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://lore.kernel.org/r/20240129-b4-feature_hdma_mainline-v7-1-8e8c1acb7a46@bootlin.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Pull btrfs fixes from David Sterba:
- two fixes preventing deletion and manual creation of subvolume qgroup
- unify error code returned for unknown send flags
- fix assertion during subvolume creation when anonymous device could
be allocated by other thread (e.g. due to backref walk)
* tag 'for-6.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: do not ASSERT() if the newly created subvolume already got read
btrfs: forbid deleting live subvol qgroup
btrfs: forbid creating subvol qgroups
btrfs: send: return EOPNOTSUPP on unknown flags
syzbot reported a warning [0] in __unix_gc() with a repro, which
creates a socketpair and sends one socket's fd to itself using the
peer.
socketpair(AF_UNIX, SOCK_STREAM, 0, [3, 4]) = 0
sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\360", iov_len=1}],
msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET,
cmsg_type=SCM_RIGHTS, cmsg_data=[3]}],
msg_controllen=24, msg_flags=0}, MSG_OOB|MSG_PROBE|MSG_DONTWAIT|MSG_ZEROCOPY) = 1
This forms a self-cyclic reference that GC should finally untangle
but does not due to lack of MSG_OOB handling, resulting in memory
leak.
Recently, commit 11498715f2 ("af_unix: Remove io_uring code for
GC.") removed io_uring's dead code in GC and revealed the problem.
The code was executed at the final stage of GC and unconditionally
moved all GC candidates from gc_candidates to gc_inflight_list.
That papered over the reported problem by always making the following
WARN_ON_ONCE(!list_empty(&gc_candidates)) false.
The problem has been there since commit 2aab4b9690 ("af_unix: fix
struct pid leaks in OOB support") added full scm support for MSG_OOB
while fixing another bug.
To fix this problem, we must call kfree_skb() for unix_sk(sk)->oob_skb
if the socket still exists in gc_candidates after purging collected skb.
Then, we need to set NULL to oob_skb before calling kfree_skb() because
it calls last fput() and triggers unix_release_sock(), where we call
duplicate kfree_skb(u->oob_skb) if not NULL.
Note that the leaked socket remained being linked to a global list, so
kmemleak also could not detect it. We need to check /proc/net/protocol
to notice the unfreed socket.
[0]:
WARNING: CPU: 0 PID: 2863 at net/unix/garbage.c:345 __unix_gc+0xc74/0xe80 net/unix/garbage.c:345
Modules linked in:
CPU: 0 PID: 2863 Comm: kworker/u4:11 Not tainted 6.8.0-rc1-syzkaller-00583-g1701940b1a02 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
Workqueue: events_unbound __unix_gc
RIP: 0010:__unix_gc+0xc74/0xe80 net/unix/garbage.c:345
Code: 8b 5c 24 50 e9 86 f8 ff ff e8 f8 e4 22 f8 31 d2 48 c7 c6 30 6a 69 89 4c 89 ef e8 97 ef ff ff e9 80 f9 ff ff e8 dd e4 22 f8 90 <0f> 0b 90 e9 7b fd ff ff 48 89 df e8 5c e7 7c f8 e9 d3 f8 ff ff e8
RSP: 0018:ffffc9000b03fba0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffffc9000b03fc10 RCX: ffffffff816c493e
RDX: ffff88802c02d940 RSI: ffffffff896982f3 RDI: ffffc9000b03fb30
RBP: ffffc9000b03fce0 R08: 0000000000000001 R09: fffff52001607f66
R10: 0000000000000003 R11: 0000000000000002 R12: dffffc0000000000
R13: ffffc9000b03fc10 R14: ffffc9000b03fc10 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005559c8677a60 CR3: 000000000d57a000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
process_one_work+0x889/0x15e0 kernel/workqueue.c:2633
process_scheduled_works kernel/workqueue.c:2706 [inline]
worker_thread+0x8b9/0x12a0 kernel/workqueue.c:2787
kthread+0x2c6/0x3b0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242
</TASK>
Reported-by: syzbot+fa3ef895554bdbfd1183@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=fa3ef895554bdbfd1183
Fixes: 2aab4b9690 ("af_unix: fix struct pid leaks in OOB support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240203183149.63573-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If KUnit is built as a module, and it's unloaded, the kunit_bus is not
unregistered. This causes an error if it's then re-loaded later, as we
try to re-register the bus.
Unregister the bus and root_device on shutdown, if it looks valid.
In addition, be more specific about the value of kunit_bus_device. It
is:
- a valid struct device* if the kunit_bus initialised correctly.
- an ERR_PTR if it failed to initialise.
- NULL before initialisation and after shutdown.
Fixes: d03c720e03 ("kunit: Add APIs for managing devices")
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Rae Moar <rmoar@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
If we are bus manager and the bus has inconsistent gap counts, send a
bus reset immediately instead of trying to read the root node's config
ROM first. Otherwise, we could spend a lot of time trying to read the
config ROM but never succeeding.
This eliminates a 50+ second delay before the FireWire bus is usable after
a newly connected device is powered on in certain circumstances.
The delay occurs if a gap count inconsistency occurs, we are not the root
node, and we become bus manager. One scenario that causes this is with a TI
XIO2213B OHCI, the first time a Sony DSR-25 is powered on after being
connected to the FireWire cable. In this configuration, the Linux box will
not receive the initial PHY configuration packet sent by the DSR-25 as IRM,
resulting in the DSR-25 having a gap count of 44 while the Linux box has a
gap count of 63.
FireWire devices have a gap count parameter, which is set to 63 on power-up
and can be changed with a PHY configuration packet. This determines the
duration of the subaction and arbitration gaps. For reliable communication,
all nodes on a FireWire bus must have the same gap count.
A node may have zero or more of the following roles: root node, bus manager
(BM), isochronous resource manager (IRM), and cycle master. Unless a root
node was forced with a PHY configuration packet, any node might become root
node after a bus reset. Only the root node can become cycle master. If the
root node is not cycle master capable, the BM or IRM should force a change
of root node.
After a bus reset, each node sends a self-ID packet, which contains its
current gap count. A single bus reset does not change the gap count, but
two bus resets in a row will set the gap count to 63. Because a consistent
gap count is required for reliable communication, IEEE 1394a-2000 requires
that the bus manager generate a bus reset if it detects that the gap count
is inconsistent.
When the gap count is inconsistent, build_tree() will notice this after the
self identification process. It will set card->gap_count to the invalid
value 0. If we become bus master, this will force bm_work() to send a bus
reset when it performs gap count optimization.
After a bus reset, there is no bus manager. We will almost always try to
become bus manager. Once we become bus manager, we will first determine
whether the root node is cycle master capable. Then, we will determine if
the gap count should be changed. If either the root node or the gap count
should be changed, we will generate a bus reset.
To determine if the root node is cycle master capable, we read its
configuration ROM. bm_work() will wait until we have finished trying to
read the configuration ROM.
However, an inconsistent gap count can make this take a long time.
read_config_rom() will read the first few quadlets from the config ROM. Due
to the gap count inconsistency, eventually one of the reads will time out.
When read_config_rom() fails, fw_device_init() calls it again until
MAX_RETRIES is reached. This takes 50+ seconds.
Once we give up trying to read the configuration ROM, bm_work() will wake
up, assume that the root node is not cycle master capable, and do a bus
reset. Hopefully, this will resolve the gap count inconsistency.
This change makes bm_work() check for an inconsistent gap count before
waiting for the root node's configuration ROM. If the gap count is
inconsistent, bm_work() will immediately do a bus reset. This eliminates
the 50+ second delay and rapidly brings the bus to a working state.
I considered that if the gap count is inconsistent, a PHY configuration
packet might not be successful, so it could be desirable to skip the PHY
configuration packet before the bus reset in this case. However, IEEE
1394a-2000 and IEEE 1394-2008 say that the bus manager may transmit a PHY
configuration packet before a bus reset when correcting a gap count error.
Since the standard endorses this, I decided it's safe to retain the PHY
configuration packet transmission.
Normally, after a topology change, we will reset the bus a maximum of 5
times to change the root node and perform gap count optimization. However,
if there is a gap count inconsistency, we must always generate a bus reset.
Otherwise the gap count inconsistency will persist and communication will
be unreliable. For that reason, if there is a gap count inconstency, we
generate a bus reset even if we already reached the 5 reset limit.
Signed-off-by: Adam Goldman <adamg@pobox.com>
Reference: https://sourceforge.net/p/linux1394/mailman/message/58727806/
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
This reverts commit 0921244f6f.
It broke CPU hotplugging because it modifies the __cpu_possible_mask
after bootup, so that it will be different than nr_cpu_ids, which
then effictively breaks the workqueue setup code and triggers crashes
when shutting down CPUs at runtime.
Guenter was the first who noticed the wrong values in __cpu_possible_mask,
since the cpumask Kunit tests were failig.
Reverting this commit fixes both issues, but sadly brings back this
uncritical runtime warning:
register_cpu_capacity_sysctl: too early to get CPU4 device!
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lkml.org/lkml/2024/2/4/146
Link: https://lore.kernel.org/lkml/Zb0mbHlIud_bqftx@slm.duckdns.org/t/
Cc: stable@vger.kernel.org # 6.0+
Drop dirty_log_page_splitting_test's assertion that the number of 4KiB
pages remains the same across dirty logging being enabled and disabled, as
the test doesn't guarantee that mappings outside of the memslots being
dirty logged are stable, e.g. KVM's mappings for code and pages in
memslot0 can be zapped by things like NUMA balancing.
To preserve the spirit of the check, assert that (a) the number of 4KiB
pages after splitting is _at least_ the number of 4KiB pages across all
memslots under test, and (b) the number of hugepages before splitting adds
up to the number of pages across all memslots under test. (b) is a little
tenuous as it relies on memslot0 being incompatible with transparent
hugepages, but that holds true for now as selftests explicitly madvise()
MADV_NOHUGEPAGE for memslot0 (__vm_create() unconditionally specifies the
backing type as VM_MEM_SRC_ANONYMOUS).
Reported-by: Yi Lai <yi1.lai@intel.com>
Reported-by: Tao Su <tao1.su@linux.intel.com>
Reviewed-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20240131222728.4100079-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
When finishing the final iteration of dirty_log_test testcase, set
host_quit _before_ the final "continue" so that the vCPU worker doesn't
run an extra iteration, and delete the hack-a-fix of an extra "continue"
from the dirty ring testcase. This fixes a bug where the extra post to
sem_vcpu_cont may not be consumed, which results in failures in subsequent
runs of the testcases. The bug likely was missed during development as
x86 supports only a single "guest mode", i.e. there aren't any subsequent
testcases after the dirty ring test, because for_each_guest_mode() only
runs a single iteration.
For the regular dirty log testcases, letting the vCPU run one extra
iteration is a non-issue as the vCPU worker waits on sem_vcpu_cont if and
only if the worker is explicitly told to stop (vcpu_sync_stop_requested).
But for the dirty ring test, which needs to periodically stop the vCPU to
reap the dirty ring, letting the vCPU resume the guest _after_ the last
iteration means the vCPU will get stuck without an extra "continue".
However, blindly firing off an post to sem_vcpu_cont isn't guaranteed to
be consumed, e.g. if the vCPU worker sees host_quit==true before resuming
the guest. This results in a dangling sem_vcpu_cont, which leads to
subsequent iterations getting out of sync, as the vCPU worker will
continue on before the main task is ready for it to resume the guest,
leading to a variety of asserts, e.g.
==== Test Assertion Failure ====
dirty_log_test.c:384: dirty_ring_vcpu_ring_full
pid=14854 tid=14854 errno=22 - Invalid argument
1 0x00000000004033eb: dirty_ring_collect_dirty_pages at dirty_log_test.c:384
2 0x0000000000402d27: log_mode_collect_dirty_pages at dirty_log_test.c:505
3 (inlined by) run_test at dirty_log_test.c:802
4 0x0000000000403dc7: for_each_guest_mode at guest_modes.c:100
5 0x0000000000401dff: main at dirty_log_test.c:941 (discriminator 3)
6 0x0000ffff9be173c7: ?? ??:0
7 0x0000ffff9be1749f: ?? ??:0
8 0x000000000040206f: _start at ??:?
Didn't continue vcpu even without ring full
Alternatively, the test could simply reset the semaphores before each
testcase, but papering over hacks with more hacks usually ends in tears.
Reported-by: Shaoqin Huang <shahuang@redhat.com>
Fixes: 84292e5659 ("KVM: selftests: Add dirty ring buffer test")
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Link: https://lore.kernel.org/r/20240202231831.354848-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
The detection of dirty-throttled tasks in blk-wbt has been subtly broken
since its beginning in 2016. Namely if we are doing cgroup writeback and
the throttled task is not in the root cgroup, balance_dirty_pages() will
set dirty_sleep for the non-root bdi_writeback structure. However
blk-wbt checks dirty_sleep only in the root cgroup bdi_writeback
structure. Thus detection of recently throttled tasks is not working in
this case (we noticed this when we switched to cgroup v2 and suddently
writeback was slow).
Since blk-wbt has no easy way to get to proper bdi_writeback and
furthermore its intention has always been to work on the whole device
rather than on individual cgroups, just move the dirty_sleep timestamp
from bdi_writeback to backing_dev_info. That fixes the checking for
recently throttled task and saves memory for everybody as a bonus.
CC: stable@vger.kernel.org
Fixes: b57d74aff9 ("writeback: track if we're sleeping on progress in balance_dirty_pages()")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240123175826.21452-1-jack@suse.cz
[axboe: fixup indentation errors]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
commit dfad37051a ("remap_range: move permission hooks out of
do_clone_file_range()") moved the permission hooks from
do_clone_file_range() out to its caller vfs_clone_file_range(),
but left all the fast sanity checks in do_clone_file_range().
This makes the expensive security hooks be called in situations
that they would not have been called before (e.g. fs does not support
clone).
The only reason for the do_clone_file_range() helper was that overlayfs
did not use to be able to call vfs_clone_file_range() from copy up
context with sb_writers lock held. However, since commit c63e56a4a6
("ovl: do not open/llseek lower file with upper sb_writers held"),
overlayfs just uses an open coded version of vfs_clone_file_range().
Merge_clone_file_range() into vfs_clone_file_range(), restoring the
original order of checks as it was before the regressing commit and adapt
the overlayfs code to call vfs_clone_file_range() before the permission
hooks that were added by commit ca7ab48240 ("ovl: add permission hooks
outside of do_splice_direct()").
Note that in the merge of do_clone_file_range(), the file_start_write()
context was reduced to cover ->remap_file_range() without holding it
over the permission hooks, which was the reason for doing the regressing
commit in the first place.
Reported-and-tested-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202401312229.eddeb9a6-oliver.sang@intel.com
Fixes: dfad37051a ("remap_range: move permission hooks out of do_clone_file_range()")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20240202102258.1582671-1-amir73il@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
For small bitmaps that aren't PAGE_SIZE aligned *and* that are less than
512 pages in bitmap length, use an extra page to be able to cover the
entire range e.g. [1M..3G] which would be iterated more efficiently in a
single iteration, rather than two.
Fixes: b058ea3ab5 ("vfio/iova_bitmap: refactor iova_bitmap_set() to better handle page boundaries")
Link: https://lore.kernel.org/r/20240202133415.23819-10-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Leverage previously added MOCK_FLAGS_DEVICE_HUGE_IOVA flag to create an
IOMMU domain with more than MOCK_IO_PAGE_SIZE supported.
Plumb the hugetlb backing memory for buffer allocation and change the
expected page size to MOCK_HUGE_PAGE_SIZE (1M) when hugepage variant test
cases are used. These so far are limited to 128M and 256M IOVA range tests
cases which is when 1M hugepages can be used.
Link: https://lore.kernel.org/r/20240202133415.23819-9-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add support to mock iommu hugepages of 1M (for a 2K mock io page size). To
avoid breaking test suite defaults, the way this is done is by explicitly
creating a iommu mock device which has hugepage support (i.e. through
MOCK_FLAGS_DEVICE_HUGE_IOVA).
The same scheme is maintained of mock base page index tracking in the
XArray, except that an extra bit is added to mark it as a hugepage. One
subpage containing the dirty bit, means that the whole hugepage is dirty
(similar to AMD IOMMU non-standard page sizes). For clearing, same thing
applies, and it must clear all dirty subpages.
This is in preparation for dirty tracking to mark mock hugepages as
dirty to exercise all the iova-bitmap fixes.
Link: https://lore.kernel.org/r/20240202133415.23819-8-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Move the clearing of the dirty bit of the mock domain into
mock_domain_test_and_clear_dirty() helper, simplifying the caller
function.
Additionally, rework the mock_domain_read_and_clear_dirty() loop to
iterate over a potentially variable IO page size. No functional change
intended with the loop refactor.
This is in preparation for dirty tracking support for IOMMU hugepage mock
domains.
Link: https://lore.kernel.org/r/20240202133415.23819-7-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Rework the functions that test and set the bitmaps to receive a new
parameter (the pte_page_size) that reflects the expected PTE size in the
page tables. The same scheme is still used i.e. even bits are dirty and
odd page indexes aren't dirty. Here it just refactors to consider the size
of the PTE rather than hardcoded to IOMMU mock base page assumptions.
While at it, refactor dirty bitmap tests to use the idev_id created by the
fixture instead of creating a new one.
This is in preparation for doing tests with IOMMU hugepages where multiple
bits set as part of recording a whole hugepage as dirty and thus the
pte_page_size will vary depending on io hugepages or io base pages.
Link: https://lore.kernel.org/r/20240202133415.23819-6-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
IOVA bitmap is a zero-copy scheme of recording dirty bits that iterate the
different bitmap user pages at chunks of a maximum of
PAGE_SIZE/sizeof(struct page*) pages.
When the iterations are split up into 64G, the end of the range may be
broken up in a way that's aligned with a non base page PTE size. This
leads to only part of the huge page being recorded in the bitmap. Note
that in pratice this is only a problem for IOMMU dirty tracking i.e. when
the backing PTEs are in IOMMU hugepages and the bitmap is in base page
granularity. So far this not something that affects VF dirty trackers
(which reports and records at the same granularity).
To fix that, if there is a remainder of bits left to set in which the
current IOVA bitmap doesn't cover, make a copy of the bitmap structure and
iterate-and-set the rest of the bits remaining. Finally, when advancing
the iterator, skip all the bits that were set ahead.
Link: https://lore.kernel.org/r/20240202133415.23819-5-joao.m.martins@oracle.com
Reported-by: Avihai Horon <avihaih@nvidia.com>
Fixes: f35f22cc76 ("iommu/vt-d: Access/Dirty bit support for SS domains")
Fixes: 421a511a29 ("iommu/amd: Access/Dirty bit support in IOPTEs")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Exercise the dirty tracking bitmaps with byte unaligned addresses in
addition to the PAGE_SIZE unaligned bitmaps, using a address towards the
end of the page boundary.
In doing so, increase the tailroom we allocate for the bitmap from
MOCK_PAGE_SIZE(2K) into PAGE_SIZE(4K), such that we can test end of bitmap
boundary.
Link: https://lore.kernel.org/r/20240202133415.23819-4-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
iova_bitmap_mapped_length() don't deal correctly with the small bitmaps
(< 2M bitmaps) when the starting address isn't u64 aligned, leading to
skipping a tiny part of the IOVA range. This is materialized as not
marking data dirty that should otherwise have been.
Fix that by using a u8 * in the internal state of IOVA bitmap. Most of the
data structures use the type of the bitmap to adjust its indexes, thus
changing the type of the bitmap decreases the granularity of the bitmap
indexes.
Fixes: b058ea3ab5 ("vfio/iova_bitmap: refactor iova_bitmap_set() to better handle page boundaries")
Link: https://lore.kernel.org/r/20240202133415.23819-3-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Dirty IOMMU hugepages reported on a base page page-size granularity can
lead to an attempt to set dirty pages in the bitmap beyond the limits that
are pinned.
Bounds check the page index of the array we are trying to access is within
the limits before we kmap() and return otherwise.
While it is also a defensive check, this is also in preparation to defer
setting bits (outside the mapped range) to the next iteration(s) when the
pages become available.
Fixes: b058ea3ab5 ("vfio/iova_bitmap: refactor iova_bitmap_set() to better handle page boundaries")
Link: https://lore.kernel.org/r/20240202133415.23819-2-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
If a input device is opened before hid_hw_start is called, events may
not be received from the hardware. In the case of USB-backed devices,
for example, the hid_hw_start function is responsible for filling in
the URB which is submitted when the input device is opened. If a device
is opened prematurely, polling will never start because the device will
not have been in the correct state to send the URB.
Because the wacom driver registers its input devices before calling
hid_hw_start, there is a window of time where a device can be opened
and end up in an inoperable state. Some ARM-based Chromebooks in particular
reliably trigger this bug.
This commit splits the wacom_register_inputs function into two pieces.
One which is responsible for setting up the allocated inputs (and runs
prior to hid_hw_start so that devices are ready for any input events
they may end up receiving) and another which only registers the devices
(and runs after hid_hw_start to ensure devices can be immediately opened
without issue). Note that the functions to initialize the LEDs and remotes
are also moved after hid_hw_start to maintain their own dependency chains.
Fixes: 7704ac9373 ("HID: wacom: implement generic HID handling for pen generic devices")
Cc: stable@vger.kernel.org # v3.18+
Suggested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Tested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Since commit 680ee411a9 ("HID: logitech-hidpp: Fix connect event race")
the following messages appear in the kernel log from time to time:
logitech-hidpp-device 0003:046D:408A.0005: HID++ 4.5 device connected.
logitech-hidpp-device 0003:046D:408A.0005: HID++ 4.5 device connected.
logitech-hidpp-device 0003:046D:4051.0006: Disconnected
logitech-hidpp-device 0003:046D:408A.0005: Disconnected
As discussed, print the first per-device "device connected" message
at info level, demoting subsequent messages to debug level. Also,
demote the "Disconnected message" to debug level unconditionally.
Link: https://lore.kernel.org/lkml/3277085.44csPzL39Z@natalenko.name/
Signed-off-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Mika writes:
thunderbolt: Fix for v6.8-rc4
This includes one USB4/Thunderbolt fix for v6.8-rc4:
- Correct the CNS (CM TBT3 Not Supported) bit setting for USB4
routers.
This has been in linux-next with no reported issues.
* tag 'thunderbolt-for-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
thunderbolt: Fix setting the CNS bit in ROUTER_CS_5
DDPP is copied from Synopsys Data book:
DDPP: Disable Data path Parity Protection.
When it is 0x0, Data path Parity Protection is enabled.
When it is 0x1, Data path Parity Protection is disabled.
The macro name should be XGMAC_DPP_DISABLE.
Fixes: 46eba193d0 ("net: stmmac: xgmac: fix handling of DPP safety error for DMA channels")
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://lore.kernel.org/r/20240203053133.1129236-1-0x1207@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
These functions are either used in only one file and can just be
made static or need an #include statement to avoid a warning:
arch/powerpc/platforms/85xx/mpc8536_ds.c:30:13: error: no previous prototype for 'mpc8536_ds_pic_init' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/p1010rdb.c:27:13: error: no previous prototype for 'p1010_rdb_pic_init' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/p1022_ds.c:373:6: error: no previous prototype for 'p1022ds_set_pixel_clock' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/p1022_ds.c:422:1: error: no previous prototype for 'p1022ds_valid_monitor_port' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/p1022_ds.c:435:13: error: no previous prototype for 'p1022_ds_pic_init' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/p1022_rdk.c:43:6: error: no previous prototype for 'p1022rdk_set_pixel_clock' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/p1022_rdk.c:92:1: error: no previous prototype for 'p1022rdk_valid_monitor_port' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/p1022_rdk.c:99:13: error: no previous prototype for 'p1022_rdk_pic_init' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/socrates_fpga_pic.c:273:13: error: no previous prototype for 'socrates_fpga_pic_init' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/xes_mpc85xx.c:40:13: error: no previous prototype for 'xes_mpc85xx_pic_init' [-Werror=missing-prototypes]
arch/powerpc/platforms/85xx/mvme2500.c:24:13: error: no previous prototype for 'mvme2500_pic_init' [-Werror=missing-prototypes]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240123125148.2004648-2-arnd@kernel.org
ppc64_book3e_allmodconfig has one more driver that triggeres a
few missing-prototypes warnings:
arch/powerpc/sysdev/udbg_memcons.c:44:6: error: no previous prototype for 'memcons_putc' [-Werror=missing-prototypes]
arch/powerpc/sysdev/udbg_memcons.c:57:5: error: no previous prototype for 'memcons_getc_poll' [-Werror=missing-prototypes]
arch/powerpc/sysdev/udbg_memcons.c:80:5: error: no previous prototype for 'memcons_getc' [-Werror=missing-prototypes]
Mark all these function static as there are no other users.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240123125148.2004648-1-arnd@kernel.org
The hrtimers migration on CPU-down hotplug process has been moved
earlier, before the CPU actually goes to die. This leaves a small window
of opportunity to queue an hrtimer in a blind spot, leaving it ignored.
For example a practical case has been reported with RCU waking up a
SCHED_FIFO task right before the CPUHP_AP_IDLE_DEAD stage, queuing that
way a sched/rt timer to the local offline CPU.
Make sure such situations never go unnoticed and warn when that happens.
Fixes: 5c0930ccaa ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240129235646.3171983-4-boqun.feng@gmail.com
Selftests here check not only that connect()/accept() for
TCP-AO/TCP-MD5/non-signed-TCP combinations do/don't establish
connections, but also counters: those are per-AO-key, per-socket and
per-netns.
The counters are checked on the server's side, as the server listener
has TCP-AO/TCP-MD5/no keys for different peers. All tests run in
the same namespaces with the same veth pair, created in test_init().
After close() in both client and server, the sides go through
the regular FIN/ACK + FIN/ACK sequence, which goes in the background.
If the selftest has already started a new testing scenario, read
per-netns counters - it may fail in the end iff it doesn't expect
the TCPAOGood per-netns counters go up during the test.
Let's just kill both TCP-AO sides - that will avoid any asynchronous
background TCP-AO segments going to either sides.
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/all/20240201132153.4d68f45e@kernel.org/T/#u
Fixes: 6f0c472a68 ("selftests/net: Add TCP-AO + TCP-MD5 + no sign listen socket tests")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20240202-unsigned-md5-netns-counters-v1-1-8b90c37c0566@arista.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
After commit a9ef277488 ("x86/kvm: Fix SEV check in
sev_map_percpu_data()"), there is a build error when building
x86_64_defconfig with GCOV using LLVM:
ld.lld: error: undefined symbol: cc_vendor
>>> referenced by kvm.c
>>> arch/x86/kernel/kvm.o:(kvm_smp_prepare_boot_cpu) in archive vmlinux.a
which corresponds to
if (cc_vendor != CC_VENDOR_AMD ||
!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
return;
Without GCOV, clang is able to eliminate the use of cc_vendor because
cc_platform_has() evaluates to false when CONFIG_ARCH_HAS_CC_PLATFORM is
not set, meaning that if statement will be true no matter what value
cc_vendor has.
With GCOV, the instrumentation keeps the use of cc_vendor around for
code coverage purposes but cc_vendor is only declared, not defined,
without CONFIG_ARCH_HAS_CC_PLATFORM, leading to the build error above.
Provide a macro definition of cc_vendor when CONFIG_ARCH_HAS_CC_PLATFORM
is not set with a value of CC_VENDOR_NONE, so that the first condition
can always be evaluated/eliminated at compile time, avoiding the build
error altogether. This is very similar to the situation prior to
commit da86eb9611 ("x86/coco: Get rid of accessor functions").
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Message-Id: <20240202-provide-cc_vendor-without-arch_has_cc_platform-v1-1-09ad5f2a3099@kernel.org>
Fixes: a9ef277488 ("x86/kvm: Fix SEV check in sev_map_percpu_data()", 2024-01-31)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
syzbot reported the following general protection fault [1]:
general protection fault, probably for non-canonical address 0xdffffc0000000010: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000080-0x0000000000000087]
...
RIP: 0010:tipc_udp_is_known_peer+0x9c/0x250 net/tipc/udp_media.c:291
...
Call Trace:
<TASK>
tipc_udp_nl_bearer_add+0x212/0x2f0 net/tipc/udp_media.c:646
tipc_nl_bearer_add+0x21e/0x360 net/tipc/bearer.c:1089
genl_family_rcv_msg_doit+0x1fc/0x2e0 net/netlink/genetlink.c:972
genl_family_rcv_msg net/netlink/genetlink.c:1052 [inline]
genl_rcv_msg+0x561/0x800 net/netlink/genetlink.c:1067
netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2544
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
netlink_unicast+0x53b/0x810 net/netlink/af_netlink.c:1367
netlink_sendmsg+0x8b7/0xd70 net/netlink/af_netlink.c:1909
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0xd5/0x180 net/socket.c:745
____sys_sendmsg+0x6ac/0x940 net/socket.c:2584
___sys_sendmsg+0x135/0x1d0 net/socket.c:2638
__sys_sendmsg+0x117/0x1e0 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
The cause of this issue is that when tipc_nl_bearer_add() is called with
the TIPC_NLA_BEARER_UDP_OPTS attribute, tipc_udp_nl_bearer_add() is called
even if the bearer is not UDP.
tipc_udp_is_known_peer() called by tipc_udp_nl_bearer_add() assumes that
the media_ptr field of the tipc_bearer has an udp_bearer type object, so
the function goes crazy for non-UDP bearers.
This patch fixes the issue by checking the bearer type before calling
tipc_udp_nl_bearer_add() in tipc_nl_bearer_add().
Fixes: ef20cd4dd1 ("tipc: introduce UDP replicast")
Reported-and-tested-by: syzbot+5142b87a9abc510e14fa@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=5142b87a9abc510e14fa [1]
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/20240131152310.4089541-1-syoshida@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull bcachefs fixes from Kent Overstreet:
"Two serious ones here that we'll want to backport to stable: a fix for
a race in the thread_with_file code, and another locking fixup in the
subvolume deletion path"
* tag 'bcachefs-2024-02-05' of https://evilpiepirate.org/git/bcachefs:
bcachefs: time_stats: Check for last_event == 0 when updating freq stats
bcachefs: install fd later to avoid race with close
bcachefs: unlock parent dir if entry is not found in subvolume deletion
bcachefs: Fix build on parisc by avoiding __multi3()
When the range parsing was open-coded the number of u32 entries to
parse had to be a multiple of 4 and the driver checks this. With
the range parsing converted to the range parser the counting changes
from individual u32 entries to a complete range, so the check must
not reject counts not divisible by 4.
Fixes: 2a88e4792c ("bus: imx-weim: Remove open coded "ranges" parsing")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The earlycon parameter is based on fixmap, and fixmap addresses are not
supposed to be shadowed by KASAN. So return the kasan_early_shadow_page
in kasan_mem_to_shadow() if the input address is above FIXADDR_START.
Otherwise earlycon cannot work after kasan_init().
Cc: stable@vger.kernel.org
Fixes: 5aa4ac64e6 ("LoongArch: Add KASAN (Kernel Address Sanitizer) support")
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
LoongArch missed the refactoring made by commit 282a181b1a ("seccomp:
Move config option SECCOMP to arch/Kconfig") because LoongArch was not
mainlined at that time.
The 'depends on PROC_FS' statement is stale as described in that commit.
Select HAVE_ARCH_SECCOMP, and remove the duplicated config entry.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
ARCH_ENABLE_THP_MIGRATION is supposed to be selected by arch Kconfig.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
This reverts commit ca10d851b9.
The commit allowed workqueue_apply_unbound_cpumask() to clear __WQ_ORDERED
on now removed implicitly ordered workqueues. This was incorrect in that
system-wide config change shouldn't break ordering properties of all
workqueues. The reason why apply_workqueue_attrs() path was allowed to do so
was because it was targeting the specific workqueue - either the workqueue
had WQ_SYSFS set or the workqueue user specifically tried to change
max_active, both of which indicate that the workqueue doesn't need to be
ordered.
The implicitly ordered workqueue promotion was removed by the previous
commit 3bc1e711c2 ("workqueue: Don't implicitly make UNBOUND workqueues w/
@max_active==1 ordered"). However, it didn't update this path and broke
build. Let's revert the commit which was incorrect in the first place which
also fixes build.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 3bc1e711c2 ("workqueue: Don't implicitly make UNBOUND workqueues w/ @max_active==1 ordered")
Fixes: ca10d851b9 ("workqueue: Override implicit ordered attribute in workqueue_apply_unbound_cpumask()")
Cc: stable@vger.kernel.org # v6.6+
Signed-off-by: Tejun Heo <tj@kernel.org>
LUNs going into "failed ready running" state observed on >1T and on even
numbers of size (2T, 4T, 6T, 8T and 10T). The issue occurs when DIF is
enabled at the host.
The kernel logs:
Cannot setup S/G List for HBAIO segs 1/1 SGL 512 SCSI 256: 3 0
The host lpfc driver is failing to setup scatter/gather list (protection
data) for the I/Os.
The return type lpfc_bg_setup_sgl()/lpfc_bg_setup_sgl_prot() causes the
compiler to remove the most significant bit. Use an unsigned type instead.
Signed-off-by: Hannes Reinecke <hare@suse.de>
[dwagner: added commit message]
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Link: https://lore.kernel.org/r/20231220162658.12392-1-dwagner@suse.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 4373534a98 ("scsi: core: Move scsi_host_busy() out of host lock
for waking up EH handler") intended to fix a hard lockup issue triggered by
EH. The core idea was to move scsi_host_busy() out of the host lock when
processing individual commands for EH. However, a suggested style change
inadvertently caused scsi_host_busy() to remain under the host lock. Fix
this by calling scsi_host_busy() outside the lock.
Fixes: 4373534a98 ("scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler")
Cc: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240203024521.2006455-1-ming.lei@redhat.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 6abe9c1386 ("KVM: X86: Move ignore_msrs handling upper the
stack") changed the 'ignore_msrs' handling, including sanitizing return
values to the caller. This was fine until commit 12bc2132b1 ("KVM:
X86: Do the same ignore_msrs check for feature msrs") which allowed
non-existing feature MSRs to be ignored, i.e. to not generate an error
on the ioctl() level. It even tried to preserve the sanitization of the
return value. However, the logic is flawed, as '*data' will be
overwritten again with the uninitialized stack value of msr.data.
Fix this by simplifying the logic and always initializing msr.data,
vanishing the need for an additional error exit path.
Fixes: 12bc2132b1 ("KVM: X86: Do the same ignore_msrs check for feature msrs")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20240203124522.592778-2-minipli@grsecurity.net
Signed-off-by: Sean Christopherson <seanjc@google.com>
Nouveau manages GSP-RM DMA buffers with nvkm_gsp_mem objects. Several of
these buffers are never dealloced. Some of them can be deallocated
right after GSP-RM is initialized, but the rest need to stay until the
driver unloads.
Also futher bullet-proof these objects by poisoning the buffer and
clearing the nvkm_gsp_mem object when it is deallocated. Poisoning
the buffer should trigger an error (or crash) from GSP-RM if it tries
to access the buffer after we've deallocated it, because we were wrong
about when it is safe to deallocate.
Finally, change the mem->size field to a size_t because that's the same
type that dma_alloc_coherent expects.
Cc: <stable@vger.kernel.org> # v6.7
Fixes: 176fdcbddf ("drm/nouveau/gsp/r535: add support for booting GSP-RM")
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240202230608.1981026-1-ttabi@nvidia.com
The example for PCI devices has some addressing errors. 'reg' is written
as if the parent bus is PCI, but the default bus for examples is 1
address and size cell. 'ranges' is defining config space with a
size of 0. Generally, config space should not be defined in
'ranges', only PCI memory and I/O spaces. Fix these issues by updating
the values with made-up, but valid values.
This was uncovered with recent dtschema changes.
Link: https://lore.kernel.org/r/20240122173514.935742-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
The check in nand_base.c, nand_scan_tail() : has the following code:
(ecc->steps * ecc->size != mtd->writesize) which fails for some NAND chips.
Remove ECC entries in this driver which are not integral multiplications,
and adjust the number of chunks for entries which fails the above
calculation so it will calculate correctly (this was previously done
automatically before the check and was removed in a later commit).
Fixes: 68c18dae68 ("mtd: rawnand: marvell: add missing layouts")
Cc: stable@vger.kernel.org
Signed-off-by: Elad Nachman <enachman@marvell.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
A recent change to check_for_locks() changed it to take ->flc_lock while
holding ->fi_lock. This creates a lock inversion (reported by lockdep)
because there is a case where ->fi_lock is taken while holding
->flc_lock.
->flc_lock is held across ->fl_lmops callbacks, and
nfsd_break_deleg_cb() is one of those and does take ->fi_lock. However
it doesn't need to.
Prior to v4.17-rc1~110^2~22 ("nfsd: create a separate lease for each
delegation") nfsd_break_deleg_cb() would walk the ->fi_delegations list
and so needed the lock. Since then it doesn't walk the list and doesn't
need the lock.
Two actions are performed under the lock. One is to call
nfsd_break_one_deleg which calls nfsd4_run_cb(). These doesn't act on
the nfs4_file at all, so don't need the lock.
The other is to set ->fi_had_conflict which is in the nfs4_file.
This field is only ever set here (except when initialised to false)
so there is no possible problem will multiple threads racing when
setting it.
The field is tested twice in nfs4_set_delegation(). The first test does
not hold a lock and is documented as an opportunistic optimisation, so
it doesn't impose any need to hold ->fi_lock while setting
->fi_had_conflict.
The second test in nfs4_set_delegation() *is* make under ->fi_lock, so
removing the locking when ->fi_had_conflict is set could make a change.
The change could only be interesting if ->fi_had_conflict tested as
false even though nfsd_break_one_deleg() ran before ->fi_lock was
unlocked. i.e. while hash_delegation_locked() was running.
As hash_delegation_lock() doesn't interact in any way with nfs4_run_cb()
there can be no importance to this interaction.
So this patch removes the locking from nfsd_break_one_deleg() and moves
the final test on ->fi_had_conflict out of the locked region to make it
clear that locking isn't important to the test. It is still tested
*after* vfs_setlease() has succeeded. This might be significant and as
vfs_setlease() takes ->flc_lock, and nfsd_break_one_deleg() is called
under ->flc_lock this "after" is a true ordering provided by a spinlock.
Fixes: edcf972515 ("nfsd: fix RELEASE_LOCKOWNER")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
For DMA mode, the bus width of the DMA is equal to the size of data
word, so burst length should be configured as bits per word.
For CPU mode, because of the spi transfer len is in byte, so calculate
the total number of words according to spi transfer len and bits per
word, burst length should be configured as total data bits.
Signed-off-by: Carlos Song <carlos.song@nxp.com>
Reviewed-by: Clark Wang <xiaoning.wang@nxp.com>
Fixes: e9b220aeac ("spi: spi-imx: correctly configure burst length when using dma")
Fixes: 5f66db08cb ("spi: imx: Take in account bits per word instead of assuming 8-bits")
Link: https://lore.kernel.org/r/20240204091912.36488-1-carlos.song@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The tascodec_init() of the snd-soc-tas2781-comlib module is called from
snd-soc-tas2781-i2c and snd-hda-scodec-tas2781-i2c modules. It calls
request_firmware_nowait() with parameter THIS_MODULE and a cont/callback
from the latter modules.
The latter modules can be removed while their callbacks are running,
resulting in a general protection failure.
Add module parameter to tascodec_init() so request_firmware_nowait() can
be called with the module of the callback.
Fixes: ef3bcde75d ("ASoC: tas2781: Add tas2781 driver")
CC: stable@vger.kernel.org
Signed-off-by: Gergo Koteles <soyer@irl.hu>
Link: https://lore.kernel.org/r/118dad922cef50525e5aab09badef2fa0eb796e5.1707076603.git.soyer@irl.hu
Signed-off-by: Mark Brown <broonie@kernel.org>
In very slow environments, most big TCP cases including
segmentation and reassembly of big TCP packets have a good
chance to fail: by default the TCP client uses write size
well below 64K. If the host is low enough autocorking is
unable to build real big TCP packets.
Address the issue using much larger write operations.
Note that is hard to observe the issue without an extremely
slow and/or overloaded environment; reduce the TCP transfer
time to allow for much easier/faster reproducibility.
Fixes: 6bb382bcf7 ("selftests: add a selftest for big tcp")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells says:
====================
rxrpc: Miscellaneous fixes
Here some miscellaneous fixes for AF_RXRPC:
(1) The zero serial number has a special meaning in an ACK packet serial
reference, so skip it when assigning serial numbers to transmitted
packets.
(2) Don't set the reference serial number in a delayed ACK as the ACK
cannot be used for RTT calculation.
(3) Don't emit a DUP ACK response to a PING RESPONSE ACK coming back to a
call that completed in the meantime.
(4) Fix the counting of acks and nacks in ACK packet to better drive
congestion management. We want to know if there have been new
acks/nacks since the last ACK packet, not that there are still
acks/nacks. This is more complicated as we have to save the old SACK
table and compare it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the counting of new acks and nacks when parsing a packet - something
that is used in congestion control.
As the code stands, it merely notes if there are any nacks whereas what we
really should do is compare the previous SACK table to the new one,
assuming we get two successive ACK packets with nacks in them. However, we
really don't want to do that if we can avoid it as the tables might not
correspond directly as one may be shifted from the other - something that
will only get harder to deal with once extended ACK tables come into full
use (with a capacity of up to 8192).
Instead, count the number of nacks shifted out of the old SACK, the number
of nacks retained in the portion still active and the number of new acks
and nacks in the new table then calculate what we need.
Note this ends up a bit of an estimate as the Rx protocol allows acks to be
withdrawn by the receiver and packets requested to be retransmitted.
Fixes: d57a3a1516 ("rxrpc: Save last ACK's SACK table rather than marking txbufs")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
In the Rx protocol, every packet generated is marked with a per-connection
monotonically increasing serial number. This number can be referenced in
an ACK packet generated in response to an incoming packet - thereby
allowing the sender to use this for RTT determination, amongst other
things.
However, if the reference field in the ACK is zero, it doesn't refer to any
incoming packet (it could be a ping to find out if a packet got lost, for
example) - so we shouldn't generate zero serial numbers.
Fix the generation of serial numbers to retry if it comes up with a zero.
Furthermore, since the serial numbers are only ever allocated within the
I/O thread this connection is bound to, there's no need for atomics so
remove that too.
Fixes: 17926a7932 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
In kasan_init_region, when k_start is not page aligned, at the begin of
for loop, k_cur = k_start & PAGE_MASK is less than k_start, and then
`va = block + k_cur - k_start` is less than block, the addr va is invalid,
because the memory address space from va to block is not alloced by
memblock_alloc, which will not be reserved by memblock_reserve later, it
will be used by other places.
As a result, memory overwriting occurs.
for example:
int __init __weak kasan_init_region(void *start, size_t size)
{
[...]
/* if say block(dcd97000) k_start(feef7400) k_end(feeff3fe) */
block = memblock_alloc(k_end - k_start, PAGE_SIZE);
[...]
for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) {
/* at the begin of for loop
* block(dcd97000) va(dcd96c00) k_cur(feef7000) k_start(feef7400)
* va(dcd96c00) is less than block(dcd97000), va is invalid
*/
void *va = block + k_cur - k_start;
[...]
}
[...]
}
Therefore, page alignment is performed on k_start before
memblock_alloc() to ensure the validity of the VA address.
Fixes: 663c0c9496 ("powerpc/kasan: Fix shadow area set up for modules.")
Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/1705974359-43790-1-git-send-email-xiaojiangfeng@huawei.com
MMU_FTR_USE_HIGH_BATS is set for G2_LE cores and derivatives like e300cX,
but the high BATs need to be enabled in HID2 to work. Add register
definitions and add the needed setup to __setup_cpu_603.
This fixes boot on CPUs like the MPC5200B with STRICT_KERNEL_RWX enabled
on systems where the flag has not been set by the bootloader already.
Fixes: e4d6654ebe ("powerpc/mm/32s: rework mmu_mapin_ram()")
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240124103838.43675-1-matthias.schiffer@ew.tq-group.com
Calling get_system_loc_code before checking devfd and errno fails the test
when the device is not available, the expected behaviour is a SKIP.
Change the order of 'SKIP_IF_MSG' to correctly SKIP when the /dev/
papr-vpd device is not available.
Test output before:
Test FAILED on line 271
Test output after:
[SKIP] Test skipped on line 266: /dev/papr-vpd not present
Signed-off-by: R Nageswara Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240131130859.14968-1-rnsastry@linux.ibm.com
Nysal reported that userspace backtraces are missing in offcputime bcc
tool. As an example:
$ sudo ./bcc/tools/offcputime.py -uU
Tracing off-CPU time (us) of user threads by user stack... Hit Ctrl-C to end.
^C
write
- python (9107)
8
write
- sudo (9105)
9
mmap
- python (9107)
16
clock_nanosleep
- multipathd (697)
3001604
The offcputime bcc tool attaches a bpf program to a kprobe on
finish_task_switch(), which is usually hit on a syscall from userspace.
With the switch to system call vectored, we started setting
pt_regs->link to zero. This is because system call vectored behaves like
a function call with LR pointing to the system call return address, and
with no modification to SRR0/SRR1. The LR value does indicate our next
instruction, so it is being saved as pt_regs->nip, and pt_regs->link is
being set to zero. This is not a problem by itself, but BPF uses perf
callchain infrastructure for capturing stack traces, and that stores LR
as the second entry in the stack trace. perf has code to cope with the
second entry being zero, and skips over it. However, generic userspace
unwinders assume that a zero entry indicates end of the stack trace,
resulting in a truncated userspace stack trace.
Rather than fixing all userspace unwinders to ignore/skip past the
second entry, store the real LR value in pt_regs->link so that there
continues to be a valid, though duplicate entry in the stack trace.
With this change:
$ sudo ./bcc/tools/offcputime.py -uU
Tracing off-CPU time (us) of user threads by user stack... Hit Ctrl-C to end.
^C
write
write
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
PyObject_VectorcallMethod
[unknown]
[unknown]
PyObject_CallOneArg
PyFile_WriteObject
PyFile_WriteString
[unknown]
[unknown]
PyObject_Vectorcall
_PyEval_EvalFrameDefault
PyEval_EvalCode
[unknown]
[unknown]
[unknown]
_PyRun_SimpleFileObject
_PyRun_AnyFileObject
Py_RunMain
[unknown]
Py_BytesMain
[unknown]
__libc_start_main
- python (1293)
7
write
write
[unknown]
sudo_ev_loop_v1
sudo_ev_dispatch_v1
[unknown]
[unknown]
[unknown]
[unknown]
__libc_start_main
- sudo (1291)
7
syscall
syscall
bpf_open_perf_buffer_opts
[unknown]
[unknown]
[unknown]
[unknown]
_PyObject_MakeTpCall
PyObject_Vectorcall
_PyEval_EvalFrameDefault
PyEval_EvalCode
[unknown]
[unknown]
[unknown]
_PyRun_SimpleFileObject
_PyRun_AnyFileObject
Py_RunMain
[unknown]
Py_BytesMain
[unknown]
__libc_start_main
- python (1293)
11
clock_nanosleep
clock_nanosleep
nanosleep
sleep
[unknown]
[unknown]
__clone
- multipathd (698)
3001661
Fixes: 7fa95f9ada ("powerpc/64s: system call support for scv/rfscv instructions")
Cc: stable@vger.kernel.org
Reported-by: "Nysal Jan K.A" <nysal@linux.ibm.com>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240202154316.395276-1-naveen@kernel.org
Louis Peens says:
====================
nfp: a few simple driver fixes
This is combining a few unrelated one-liner fixes which have been
floating around internally into a single series. I'm not sure what is
the least amount of overhead for reviewers, this or a separate
submission per-patch? I guess it probably depends on personal
preference, but please let me know if there is a strong preference to
rather split these in the future.
Summary:
Patch1: Fixes an old issue which was hidden because 0 just so happens to
be the correct value.
Patch2: Fixes a corner case for flower offloading with bond ports
Patch3: Re-enables the 'NETDEV_XDP_ACT_REDIRECT', which was accidentally
disabled after a previous refactor.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable previously excluded xdp feature flag for NFD3 devices. This
feature flag is required in order to bind nfp interfaces to an xdp
socket and the nfp driver does in fact support the feature.
Fixes: 66c0e13ad2 ("drivers: net: turn on XDP features")
Cc: stable@vger.kernel.org # 6.3+
Signed-off-by: James Hershaw <james.hershaw@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When physical ports are reset (either through link failure or manually
toggled down and up again) that are slaved to a Linux bond with a tunnel
endpoint IP address on the bond device, not all tunnel packets arriving
on the bond port are decapped as expected.
The bond dev assigns the same MAC address to itself and each of its
slaves. When toggling a slave device, the same MAC address is therefore
offloaded to the NFP multiple times with different indexes.
The issue only occurs when re-adding the shared mac. The
nfp_tunnel_add_shared_mac() function has a conditional check early on
that checks if a mac entry already exists and if that mac entry is
global: (entry && nfp_tunnel_is_mac_idx_global(entry->index)). In the
case of a bonded device (For example br-ex), the mac index is obtained,
and no new index is assigned.
We therefore modify the conditional in nfp_tunnel_add_shared_mac() to
check if the port belongs to the LAG along with the existing checks to
prevent a new global mac index from being re-assigned to the slave port.
Fixes: 20cce88650 ("nfp: flower: enable MAC address sharing for offloadable devs")
CC: stable@vger.kernel.org # 5.1+
Signed-off-by: Daniel de Villiers <daniel.devilliers@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 1st and 2nd expansion BAR configuration registers are configured,
when the driver starts up, in variables 'barcfg_msix_general' and
'barcfg_msix_xpb', respectively. The 'LengthSelect' field is ORed in
from bit 0, which is incorrect. The 'LengthSelect' field should
start from bit 27.
This has largely gone un-noticed because
NFP_PCIE_BAR_PCIE2CPP_LengthSelect_32BIT happens to be 0.
Fixes: 4cb584e0ee ("nfp: add CPP access core")
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Daniel Basilio <daniel.basilio@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The .compat section is a dummy PE section that contains the address of
the 32-bit entrypoint of the 64-bit kernel image if it is bootable from
32-bit firmware (i.e., CONFIG_EFI_MIXED=y)
This section is only 8 bytes in size and is only referenced from the
loader, and so it is placed at the end of the memory view of the image,
to avoid the need for padding it to 4k, which is required for sections
appearing in the middle of the image.
Unfortunately, this violates the PE/COFF spec, and even if most EFI
loaders will work correctly (including the Tianocore reference
implementation), PE loaders do exist that reject such images, on the
basis that both the file and memory views of the file contents should be
described by the section headers in a monotonically increasing manner
without leaving any gaps.
So reorganize the sections to avoid this issue. This results in a slight
padding overhead (< 4k) which can be avoided if desired by disabling
CONFIG_EFI_MIXED (which is only needed in rare cases these days)
Fixes: 3e3eabe26d ("x86/boot: Increase section and file alignment to 4k/512")
Reported-by: Mike Beaton <mjsbeaton@gmail.com>
Link: https://lkml.kernel.org/r/CAHzAAWQ6srV6LVNdmfbJhOwhBw5ZzxxZZ07aHt9oKkfYAdvuQQ%40mail.gmail.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
This reverts commit 095b96b2b8.
Marek reported the similar change in imx8mp-dhcom-pdk3.dts
caused a regression:
"This patch breaks USB-C port on this board.
If I plug in USB-C storage device, it is not detected. If I revert this
patch, it is detected.
Please drop this patch for now."
Revert it here as well.
Fixes: 095b96b2b8 ("arm64: dts: imx8mn-var-som-symphony: Describe the USB-C connector")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
IOVDD is supplied by 1.8V, fix the referenced regulator.
Fixes: d8f9d81265 ("arm64: dts: imx8mp: Add analog audio output on i.MX8MP TQMa8MPxL/MBa8MPxL")
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Make loading ib_srpt with this parameter set work. The current behavior is
that setting that parameter while loading the ib_srpt kernel module
triggers the following kernel crash:
BUG: kernel NULL pointer dereference, address: 0000000000000000
Call Trace:
<TASK>
parse_one+0x18c/0x1d0
parse_args+0xe1/0x230
load_module+0x8de/0xa60
init_module_from_file+0x8b/0xd0
idempotent_init_module+0x181/0x240
__x64_sys_finit_module+0x5a/0xb0
do_syscall_64+0x5f/0xe0
entry_SYSCALL_64_after_hwframe+0x6e/0x76
Cc: LiHonggang <honggangli@163.com>
Reported-by: LiHonggang <honggangli@163.com>
Fixes: a42d985bd5 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240205004207.17031-1-bvanassche@acm.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Calling fd_install() makes a file reachable for userland, including the
possibility to close the file descriptor, which leads to calling its
'release' hook. If that happens before the code had a chance to bump the
reference of the newly created task struct, the release callback will
call put_task_struct() too early, leading to the premature destruction
of the kernel thread.
Avoid that race by calling fd_install() later, after all the setup is
done.
Fixes: 1c6fdbd8f2 ("bcachefs: Initial commit")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Some PAPR system parameter values are formatted by firmware as
nul-terminated strings (e.g. LPAR name, shared processor attributes).
But the values returned for other parameters, such as processor module
info and TLB block invalidate characteristics, are binary data with
parameter-specific layouts. So char[] isn't the appropriate type for
the general case. Use u8/__u8.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: 905b9e4878 ("powerpc/pseries/papr-sysparm: Expose character device to user space")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240202-papr-sysparm-ioblock-data-use-u8-v1-1-f5c6c89f65ec@linux.ibm.com
inet_recv_error() is called without holding the socket lock.
IPv6 socket could mutate to IPv4 with IPV6_ADDRFORM
socket option and trigger a KCSAN warning.
Fixes: f4713a3dfa ("net-timestamp: make tcp_recvmsg call ipv6_recv_error for AF_INET6 socks")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, coretemp driver supports only 128 cores per package.
This loses some core temperature information on systems that have more
than 128 cores per package.
[ 58.685033] coretemp coretemp.0: Adding Core 128 failed
[ 58.692009] coretemp coretemp.0: Adding Core 129 failed
...
Enlarge the limitation to 512 because there are platforms with more than
256 cores per package.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20240202092144.71180-4-rui.zhang@intel.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Before commit 7108b80a54 ("hwmon/coretemp: Handle large core ID
value"), there is a fixed mapping between
1. cpu_core_id
2. the index in pdata->core_data[] array
3. the sysfs attr name, aka "tempX_"
The later two always equal cpu_core_id + 2.
After the commit, pdata->core_data[] index is got from ida so that it
can handle sparse core ids and support more cores within a package.
However, the commit erroneously maps the sysfs attr name to
pdata->core_data[] index instead of cpu_core_id + 2.
As a result, the code is not aligned with the comments, and brings user
visible changes in hwmon sysfs on systems with sparse core id.
For example, before commit 7108b80a54 ("hwmon/coretemp: Handle large
core ID value"),
/sys/class/hwmon/hwmon2/temp2_label:Core 0
/sys/class/hwmon/hwmon2/temp3_label:Core 1
/sys/class/hwmon/hwmon2/temp4_label:Core 2
/sys/class/hwmon/hwmon2/temp5_label:Core 3
/sys/class/hwmon/hwmon2/temp6_label:Core 4
/sys/class/hwmon/hwmon3/temp10_label:Core 8
/sys/class/hwmon/hwmon3/temp11_label:Core 9
after commit,
/sys/class/hwmon/hwmon2/temp2_label:Core 0
/sys/class/hwmon/hwmon2/temp3_label:Core 1
/sys/class/hwmon/hwmon2/temp4_label:Core 2
/sys/class/hwmon/hwmon2/temp5_label:Core 3
/sys/class/hwmon/hwmon2/temp6_label:Core 4
/sys/class/hwmon/hwmon2/temp7_label:Core 8
/sys/class/hwmon/hwmon2/temp8_label:Core 9
Restore the previous behavior and rework the code, comments and variable
names to avoid future confusions.
Fixes: 7108b80a54 ("hwmon/coretemp: Handle large core ID value")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20240202092144.71180-3-rui.zhang@intel.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
If hv_netvsc driver is unloaded and reloaded, the NET_DEVICE_REGISTER
handler cannot perform VF register successfully as the register call
is received before netvsc_probe is finished. This is because we
register register_netdevice_notifier() very early( even before
vmbus_driver_register()).
To fix this, we try to register each such matching VF( if it is visible
as a netdevice) at the end of netvsc_probe.
Cc: stable@vger.kernel.org
Fixes: 8552085646 ("hv_netvsc: Fix race of register_netdevice_notifier and VF register")
Suggested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For ARCH=arm64, virt/lib/Kconfig is sourced twice,
from arch/arm64/kvm/Kconfig and from drivers/vfio/Kconfig.
There is no good reason to parse virt/lib/Kconfig twice.
Commit 2412405b31 ("KVM: arm/arm64: register irq bypass consumer
on ARM/ARM64") should not have added this 'source' directive.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240204074305.31492-1-masahiroy@kernel.org
Pull ext4 fixes from Ted Ts'o:
"Miscellaneous bug fixes and cleanups in ext4's multi-block allocator
and extent handling code"
* tag 'for-linus-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
ext4: make ext4_set_iomap() recognize IOMAP_DELALLOC map type
ext4: make ext4_map_blocks() distinguish delalloc only extent
ext4: add a hole extent entry in cache after punch
ext4: correct the hole length returned by ext4_map_blocks()
ext4: convert to exclusive lock while inserting delalloc extents
ext4: refactor ext4_da_map_blocks()
ext4: remove 'needed' in trace_ext4_discard_preallocations
ext4: remove unnecessary parameter "needed" in ext4_discard_preallocations
ext4: remove unused return value of ext4_mb_release_group_pa
ext4: remove unused return value of ext4_mb_release_inode_pa
ext4: remove unused return value of ext4_mb_release
ext4: remove unused ext4_allocation_context::ac_groups_considered
ext4: remove unneeded return value of ext4_mb_release_context
ext4: remove unused parameter ngroup in ext4_mb_choose_next_group_*()
ext4: remove unused return value of __mb_check_buddy
ext4: mark the group block bitmap as corrupted before reporting an error
ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal()
ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found()
ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt
ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
...
Pull smb client fixes from Steve French:
"Five smb3 client fixes, mostly multichannel related:
- four multichannel fixes including fix for channel allocation when
multiple inactive channels, fix for unneeded race in channel
deallocation, correct redundant channel scaling, and redundant
multichannel disabling scenarios
- add warning if max compound requests reached"
* tag 'v6.8-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: increase number of PDUs allowed in a compound request
cifs: failure to add channel on iface should bump up weight
cifs: do not search for channel if server is terminating
cifs: avoid redundant calls to disable multichannel
cifs: make sure that channel scaling is done only once
Pull xfs fixes from Chandan Babu:
- Clear XFS_ATTR_INCOMPLETE filter on removing xattr from a node format
attribute fork
- Remove conditional compilation of realtime geometry validator
functions to prevent confusing error messages from being printed on
the console during the mount operation
* tag 'xfs-6.8-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: remove conditional building of rt geometry validator functions
xfs: reset XFS_ATTR_INCOMPLETE filter on node removal
Pull char/misc driver fixes from Greg KH:
"Here are three tiny driver fixes for 6.8-rc3. They include:
- Android binder long-term bug with epoll finally being fixed
- fastrpc driver shutdown bugfix
- open-dice lockdep fix
All of these have been in linux-next this week with no reported
issues"
* tag 'char-misc-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
binder: signal epoll threads of self-work
misc: open-dice: Fix spurious lockdep warning
misc: fastrpc: Mark all sessions as invalid in cb_remove
Pull tty and serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for 6.8-rc3 that
resolve a number of reported issues. Included in here are:
- rs485 flag definition fix that affected the user/kernel abi in -rc1
- max310x driver fixes
- 8250_pci1xxxx driver off-by-one fix
- uart_tiocmget locking race fix
All of these have been in linux-next for over a week with no reported
issues"
* tag 'tty-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: max310x: prevent infinite while() loop in port startup
serial: max310x: fail probe if clock crystal is unstable
serial: max310x: improve crystal stable clock detection
serial: max310x: set default value when reading clock ready bit
serial: core: Fix atomicity violation in uart_tiocmget
serial: 8250_pci1xxxx: fix off by one in pci1xxxx_process_read_data()
tty: serial: Fix bit order in RS485 flag definitions
Pull USB driver fixes from Greg KH:
"Here are a bunch of small USB driver fixes for 6.8-rc3. Included in
here are:
- new usb-serial driver ids
- new dwc3 driver id added
- typec driver change revert
- ncm gadget driver endian bugfix
- xhci bugfixes for a number of reported issues
- usb hub bugfix for alternate settings
- ulpi driver debugfs memory leak fix
- chipidea driver bugfix
- usb gadget driver fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (24 commits)
USB: serial: option: add Fibocom FM101-GL variant
USB: serial: qcserial: add new usb-id for Dell Wireless DW5826e
USB: serial: cp210x: add ID for IMST iM871A-USB
usb: typec: tcpm: fix the PD disabled case
usb: ucsi_acpi: Quirk to ack a connector change ack cmd
usb: ucsi_acpi: Fix command completion handling
usb: ucsi: Add missing ppm_lock
usb: ulpi: Fix debugfs directory leak
Revert "usb: typec: tcpm: fix cc role at port reset"
usb: gadget: pch_udc: fix an Excess kernel-doc warning
usb: f_mass_storage: forbid async queue when shutdown happen
USB: hub: check for alternate port before enabling A_ALT_HNP_SUPPORT
usb: chipidea: core: handle power lost in workqueue
usb: dwc3: gadget: Fix NULL pointer dereference in dwc3_gadget_suspend
usb: dwc3: pci: add support for the Intel Arrow Lake-H
usb: core: Prevent null pointer dereference in update_port_device_state
xhci: handle isoc Babble and Buffer Overrun events properly
xhci: process isoc TD properly when there was a transaction error mid TD.
xhci: fix off by one check when adding a secondary interrupter.
xhci: fix possible null pointer dereference at secondary interrupter removal
...
Pull i2c fixlet from Wolfram Sang:
"MAINTAINERS update to point people to the new tree for i2c host driver
changes"
* tag 'i2c-for-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
MAINTAINERS: Update i2c host drivers repository
Pull dmaengine fixes from Vinod Koul:
"Core:
- fix return value of is_slave_direction() for D2D dma
Driver fixes for:
- Documentaion fixes to resolve warnings for at_hdmac driver
- bunch of fsl driver fixes for memory leaks, and useless kfree
- TI edma and k3 fixes for packet error and null pointer checks"
* tag 'dmaengine-fix-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: at_hdmac: add missing kernel-doc style description
dmaengine: fix is_slave_direction() return false when DMA_DEV_TO_DEV
dmaengine: fsl-qdma: Remove a useless devm_kfree()
dmaengine: fsl-qdma: Fix a memory leak related to the queue command DMA
dmaengine: fsl-qdma: Fix a memory leak related to the status queue DMA
dmaengine: ti: k3-udma: Report short packet errors
dmaengine: ti: edma: Add some null pointer checks to the edma_probe
dmaengine: fsl-dpaa2-qdma: Fix the size of dma pools
dmaengine: at_hdmac: fix some kernel-doc warnings
Jonathan reports that CXL CPER events dump an extra generic error
message.
{1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
{1}[Hardware Error]: event severity: recoverable
{1}[Hardware Error]: Error 0, type: recoverable
{1}[Hardware Error]: section type: unknown, fbcd0a77-c260-417f-85a9-088b1621eba6
{1}[Hardware Error]: section length: 0x90
{1}[Hardware Error]: 00000000: 00000090 00000007 00000000 0d938086 ................
{1}[Hardware Error]: 00000010: 00100000 00000000 00040000 00000000 ................
...
CXL events were rerouted though the CXL subsystem for additional
processing. However, when that work was done it was missed that
cper_estatus_print_section() continued with a generic error message
which is confusing.
Teach CPER print code to ignore printing details of some section types.
Assign the CXL event GUIDs to this set to prevent confusing unknown
prints.
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
UART4 is used as CM7 coprocessor debug UART and may not be accessible from
Linux in case it is protected by RDC. The RDC protection is set up by the
platform firmware. UART4 is not used on this platform by Linux. Disable
UART4 by default to prevent boot hangs, which occur when the RDC protection
is in place.
Fixes: 562d222f23 ("arm64: dts: imx8mp: Add support for Data Modul i.MX8M Plus eDM SBC")
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Pull perf tools fixes from Arnaldo Carvalho de Melo:
"Vendor events:
- Intel Alderlake/Sapphire Rapids metric fixes, the CPU type
("cpu_atom", "cpu_core") needs to be used as a prefix to be
considered on a metric formula, detected via one of the 'perf test'
entries.
'perf test' fixes:
- Fix the creation of event selector lists on 'perf test' entries, by
initializing the sample ID flag, which is done by 'perf record', so
this fix affects only the tests, the common case isn't affected
- Make 'perf list' respect debug settings (-v) to fix its 'perf test'
entry
- Fix 'perf script' test when python support isn't enabled
- Special case 'perf script' tests on s390, where only DWARF call
graphs are supported and only on software events
- Make 'perf daemon' signal test less racy
Compiler warnings/errors:
- Remove needless malloc(0) call in 'perf top' that triggers
-Walloc-size
- Fix calloc() argument order to address error introduced in gcc-14
Build:
- Make minimal shellcheck version to v0.6.0, avoiding the build to
fail with older versions
Sync kernel header copies:
- stat.h to pick STATX_MNT_ID_UNIQUE
- msr-index.h to pick IA32_MKTME_KEYID_PARTITIONING
- drm.h to pick DRM_IOCTL_MODE_CLOSEFB
- unistd.h to pick {list,stat}mount,
lsm_{[gs]et_self_attr,list_modules} syscall numbers
- x86 cpufeatures to pick TDX, Zen, APIC MSR fence changes
- x86's mem{cpy,set}_64.S used in 'perf bench'
- Also, without tooling effects: asm-generic/unaligned.h, mount.h,
fcntl.h, kvm headers"
* tag 'perf-tools-fixes-for-v6.8-1-2024-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (21 commits)
perf tools headers: update the asm-generic/unaligned.h copy with the kernel sources
tools include UAPI: Sync linux/mount.h copy with the kernel sources
perf evlist: Fix evlist__new_default() for > 1 core PMU
tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench'
tools headers x86 cpufeatures: Sync with the kernel sources to pick TDX, Zen, APIC MSR fence changes
tools headers UAPI: Sync unistd.h to pick {list,stat}mount, lsm_{[gs]et_self_attr,list_modules} syscall numbers
perf vendor events intel: Alderlake/sapphirerapids metric fixes
tools headers UAPI: Sync kvm headers with the kernel sources
perf tools: Fix calloc() arguments to address error introduced in gcc-14
perf top: Remove needless malloc(0) call that triggers -Walloc-size
perf build: Make minimal shellcheck version to v0.6.0
tools headers UAPI: Update tools's copy of drm.h headers to pick DRM_IOCTL_MODE_CLOSEFB
perf test shell daemon: Make signal test less racy
perf test shell script: Fix test for python being disabled
perf test: Workaround debug output in list test
perf list: Add output file option
perf list: Switch error message to pr_err() to respect debug settings (-v)
perf test: Fix 'perf script' tests on s390
tools headers UAPI: Sync linux/fcntl.h with the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources to pick IA32_MKTME_KEYID_PARTITIONING
...
When qmem_alloc and pfvf->hw_ops->sq_aq_init fails, sq->sg should be
freed to prevent memleak.
Fixes: c9c12d339d ("octeontx2-pf: Add support for PTP clock")
Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When alloc_scq fails, card->vcs[0] (i.e. vc) should be freed. Otherwise,
in the following call chain:
idt77252_init_one
|-> idt77252_dev_open
|-> open_card_ubr0
|-> alloc_scq [failed]
|-> deinit_card
|-> vfree(card->vcs);
card->vcs is freed and card->vcs[0] is leaked.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the ICMPv6 error is built from a non-linear skb we get the following
splat,
BUG: KASAN: slab-out-of-bounds in do_csum+0x220/0x240
Read of size 4 at addr ffff88811d402c80 by task netperf/820
CPU: 0 PID: 820 Comm: netperf Not tainted 6.8.0-rc1+ #543
...
kasan_report+0xd8/0x110
do_csum+0x220/0x240
csum_partial+0xc/0x20
skb_tunnel_check_pmtu+0xeb9/0x3280
vxlan_xmit_one+0x14c2/0x4080
vxlan_xmit+0xf61/0x5c00
dev_hard_start_xmit+0xfb/0x510
__dev_queue_xmit+0x7cd/0x32a0
br_dev_queue_push_xmit+0x39d/0x6a0
Use skb_checksum instead of csum_partial who cannot deal with non-linear
SKBs.
Fixes: 4cb47a8644 ("tunnels: PMTU discovery support for directly bridged IP packets")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For XDP_TX action xdp_buff is converted to xdp_frame. The conversion is
done by xdp_convert_buff_to_frame(). The memory type of the resulting
xdp_frame depends on the memory type of the xdp_buff. For page pool
based xdp_buff it produces xdp_frame with memory type
MEM_TYPE_PAGE_POOL. For zero copy XSK pool based xdp_buff it produces
xdp_frame with memory type MEM_TYPE_PAGE_ORDER0.
tsnep_xdp_xmit_back() is not prepared for that and uses always the page
pool buffer type TSNEP_TX_TYPE_XDP_TX. This leads to invalid mappings
and the transmission of undefined data.
Improve tsnep_xdp_xmit_back() to use the generic buffer type
TSNEP_TX_TYPE_XDP_NDO for zero copy XDP_TX.
Fixes: 3fc2333933 ("tsnep: Add XDP socket zero-copy RX support")
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni says:
====================
selftests: net: more fixes
Another small bunch of fixes, addressing issues outlined by the
netdev CI.
The first 2 patches are just rebased.
The following 2 are new fixes, for even more problems that surfaced
meanwhile.
====================
Link: https://lore.kernel.org/r/cover.1706812005.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Using hard-coded constant timeout to wait for some expected
event is deemed to fail sooner or later, especially in slow
env.
Our CI has spotted another of such race:
# TEST: ipv6: cleanup of cached exceptions - nexthop objects [FAIL]
# can't delete veth device in a timely manner, PMTU dst likely leaked
Replace the crude sleep with a loop looking for the expected condition
at low interval for a much longer range.
Fixes: b3cc4f8a8a ("selftests: pmtu: add explicit tests for PMTU exceptions cleanup")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/fd5c745e9bb665b724473af6a9373a8c2a62b247.1706812005.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The pmtu.sh test uses a few TCP listener in a problematic way:
It hard-codes a constant timeout to wait for the listener starting-up
in background. That introduces unneeded latency and on very slow and
busy host it can fail.
Additionally the test starts again the same listener in the same
namespace on the same port, just after the previous connection
completed. Fast host can attempt starting the new server before the
old one really closed the socket.
Address the issues using the wait_local_port_listen helper and
explicitly waiting for the background listener process exit.
Fixes: 136a1b434b ("selftests: net: test vxlan pmtu exceptions with tcp")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/f8e8f6d44427d8c45e9f6a71ee1a321047452087.1706812005.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The udpgro_fwd.sh self-tests are somewhat unstable. There are
a few timing constraints the we struggle to meet on very slow
environments.
Instead of skipping the whole tests in such envs, increase the
test resilience WRT very slow hosts: increase the inter-packets
timeouts, avoid resetting the counters every second and finally
disable reduce the background traffic noise.
Tested with:
for I in $(seq 1 100); do
./tools/testing/selftests/kselftest_install/run_kselftest.sh \
-t net:udpgro_fwd.sh || exit -1
done
in a slow environment.
Fixes: a062260a9d ("selftests: net: add UDP GRO forwarding self-tests")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/f4b6b11064a0d39182a9ae6a853abae3e9b4426a.1706812005.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull tracing and eventfs fixes from Steven Rostedt:
- Fix the return code for ring_buffer_poll_wait()
It was returing a -EINVAL instead of EPOLLERR.
- Zero out the tracefs_inode so that all fields are initialized.
The ti->private could have had stale data, but instead of just
initializing it to NULL, clear out the entire structure when it is
allocated.
- Fix a crash in timerlat
The hrtimer was initialized at read and not open, but is canceled at
close. If the file was opened and never read the close will pass a
NULL pointer to hrtime_cancel().
- Rewrite of eventfs.
Linus wrote a patch series to remove the dentry references in the
eventfs_inode and to use ref counting and more of proper VFS
interfaces to make it work.
- Add warning to put_ei() if ei is not set to free. That means
something is about to free it when it shouldn't.
- Restructure the eventfs_inode to make it more compact, and remove the
unused llist field.
- Remove the fsnotify*() funtions for when the inodes were being
created in the lookup code. It doesn't make sense to notify about
creation just because something is being looked up.
- The inode hard link count was not accurate.
It was being updated when a file was looked up. The inodes of
directories were updating their parent inode hard link count every
time the inode was created. That means if memory reclaim cleaned a
stale directory inode and the inode was lookup up again, it would
increment the parent inode again as well. Al Viro said to just have
all eventfs directories have a hard link count of 1. That tells user
space not to trust it.
* tag 'trace-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Keep all directory links at 1
eventfs: Remove fsnotify*() functions from lookup()
eventfs: Restructure eventfs_inode structure to be more condensed
eventfs: Warn if an eventfs_inode is freed without is_freed being set
tracing/timerlat: Move hrtimer_init to timerlat_fd open()
eventfs: Get rid of dentry pointers without refcounts
eventfs: Clean up dentry ops and add revalidate function
eventfs: Remove unused d_parent pointer field
tracefs: dentry lookup crapectomy
tracefs: Avoid using the ei->dentry pointer unnecessarily
eventfs: Initialize the tracefs inode properly
tracefs: Zero out the tracefs_inode when allocating it
ring-buffer: Clean ring_buffer_poll_wait() error return
Pull gfs2 revert from Andreas Gruenbacher:
"It turns out that the commit to use GL_NOBLOCK flag for non-blocking
lookups has several issues, and not all of them have a simple fix"
* tag 'gfs2-v6.8-rc2-revert' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
Revert "gfs2: Use GL_NOBLOCK flag for non-blocking lookups"
Use a u64 instead of a u8 when taking a snapshot of pmu->fixed_ctr_ctrl
when reprogramming fixed counters, as truncating the value results in KVM
thinking fixed counter 2 is already disabled (the bug also affects fixed
counters 3+, but KVM doesn't yet support those). As a result, if the
guest disables fixed counter 2, KVM will get a false negative and fail to
reprogram/disable emulation of the counter, which can leads to incorrect
counts and spurious PMIs in the guest.
Fixes: 76d287b234 ("KVM: x86/pmu: Drop "u8 ctrl, int idx" for reprogram_fixed_counter()")
Cc: stable@vger.kernel.org
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/r/20240123221220.3911317-1-mizhang@google.com
[sean: rewrite changelog to call out the effects of the bug]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Pull pci fixes from Bjorn Helgaas:
- Fix a potential deadlock that was reintroduced by an ASPM revert
merged for v6.8 (Johan Hovold)
- Add Manivannan Sadhasivam as PCI Endpoint maintainer (Lorenzo
Pieralisi)
* tag 'pci-v6.8-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
MAINTAINERS: Add Manivannan Sadhasivam as PCI Endpoint maintainer
PCI/ASPM: Fix deadlock when enabling ASPM
Pul drm fixes from Dave Airlie:
"Regular weekly fixes, mostly amdgpu and xe. One nouveau fix is a
better fix for the deadlock and also helps with a sync race we were
seeing.
dma-buf:
- heaps CMA page accounting fix
virtio-gpu:
- fix segment size
xe:
- A crash fix
- A fix for an assert due to missing mem_acces ref
- Only allow a single user-fence per exec / bind.
- Some sparse warning fixes
- Two fixes for compilation failures on various odd combinations of
gcc / arch pointed out on LKML.
- Fix a fragile partial allocation pointed out on LKML.
- A sysfs ABI documentation warning fix
amdgpu:
- Fix reboot issue seen on some 7000 series dGPUs
- Fix client init order for KFD
- Misc display fixes
- USB-C fix
- DCN 3.5 fixes
- Fix issues with GPU scheduler and GPU reset
- GPU firmware loading fix
- Misc fixes
- GC 11.5 fix
- VCN 4.0.5 fix
- IH overflow fix
amdkfd:
- SVM fixes
- Trap handler fix
- Fix device permission lookup
- Properly reserve BO before validating it
nouveau:
- fence/irq lock deadlock fix (second attempt)
- gsp command size fix
* tag 'drm-fixes-2024-02-03' of git://anongit.freedesktop.org/drm/drm: (35 commits)
nouveau: offload fence uevents work to workqueue
nouveau/gsp: use correct size for registry rpc.
drm/amdgpu/pm: Use inline function for IP version check
drm/hwmon: Fix abi doc warnings
drm/xe: Make all GuC ABI shift values unsigned
drm/xe/vm: Subclass userptr vmas
drm/xe: Use LRC prefix rather than CTX prefix in lrc desc defines
drm/xe: Don't use __user error pointers
drm/xe: Annotate mcr_[un]lock()
drm/xe: Only allow 1 ufence per exec / bind IOCTL
drm/xe: Grab mem_access when disabling C6 on skip_guc_pc platforms
drm/xe: Fix crash in trace_dma_fence_init()
drm/amdgpu: Reset IH OVERFLOW_CLEAR bit
drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspend
drm/amdgpu: drm/amdgpu: remove golden setting for gfx 11.5.0
drm/amdkfd: reserve the BO before validating it
drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'
drm/amd/display: Fix buffer overflow in 'get_host_router_total_dp_tunnel_bw()'
drm/amd/display: Add NULL check for kzalloc in 'amdgpu_dm_atomic_commit_tail()'
drm/amd: Don't init MEC2 firmware when it fails to load
...
Pull input fixes from Dmitry Torokhov:
- a fix for the fix to deal with newer laptops which get confused by
the "GET ID" command when probing for PS/2 keyboards
- a couple of tweaks to i8042 to handle Clevo NS70PU and Lifebook U728
laptops
- a change to bcm5974 to validate that the device has appropriate
endpoints
- an addition of new product ID to xpad driver to recognize Lenovo
Legion Go controllers
- a quirk to Goodix controller to deal with extra GPIO described in
ACPI tables on some devices.
* tag 'input-for-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - add Fujitsu Lifebook U728 to i8042 quirk table
Input: i8042 - fix strange behavior of touchpad on Clevo NS70PU
Input: atkbd - do not skip atkbd_deactivate() when skipping ATKBD_CMD_GETID
Input: atkbd - skip ATKBD_CMD_SETLEDS when skipping ATKBD_CMD_GETID
Input: bcm5974 - check endpoint type before starting traffic
Input: xpad - add Lenovo Legion Go controllers
Input: goodix - accept ACPI resources with gpio_count == 3 && gpio_int_idx == 0
Pull sound fixes from Takashi Iwai:
"A collection of fixes, mostly device-specific ones:
- Minor PCM core fix for name strings
- ASoC Qualcomm fixes, including DAI support extensions
- ASoC AMD platform updates
- ASoC Allwinner platform updates
- Various ASoC codec fixes for WSA, WCD, ES8326 drivers
- Various HD-audio and USB-audio fixes and quirks
- A series of fixes for Cirrus CS35L56 codecs"
* tag 'sound-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (63 commits)
ALSA: usb-audio: Ignore clock selector errors for single connection
ALSA: hda/realtek: Enable headset mic on Vaio VJFE-ADL
ALSA: hda: cs35l56: Remove unused test stub function
ALSA: hda: cs35l56: Firmware file must match the version of preloaded firmware
ALSA: hda: cs35l56: Fix filename string field layout
ALSA: hda: cs35l56: Fix order of searching for firmware files
ASoC: cs35l56: Allow more time for firmware to boot
ASoC: cs35l56: Load tunings for the correct speaker models
ASoC: cs35l56: Firmware file must match the version of preloaded firmware
ASoC: cs35l56: Fix misuse of wm_adsp 'part' string for silicon revision
ASoC: cs35l56: Fix for initializing ASP1 mixer registers
ALSA: hda: cs35l56: Initialize all ASP1 registers
ASoC: cs35l56: Fix default SDW TX mixer registers
ASoC: cs35l56: Fix to ensure ASP1 registers match cache
ASoC: cs35l56: Remove buggy checks from cs35l56_is_fw_reload_needed()
ASoC: cs35l56: Don't add the same register patch multiple times
ASoC: cs35l56: cs35l56_component_remove() must clean up wm_adsp
ASoC: cs35l56: cs35l56_component_remove() must clear cs35l56->component
ASoC: wm_adsp: Don't overwrite fwf_name with the default
ASoC: wm_adsp: Fix firmware file search order
...
Pull power supply fix from Sebastian Reichel:
- qcom_battmgr: revert broken fix
* tag 'for-v6.8-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
Revert "power: supply: qcom_battmgr: Register the power supplies after PDR is up"
Pul iommu fixes from Joerg Roedel:
- Make iommu_ops->default_domain work without CONFIG_IOMMU_DMA to fix
initialization of FSL-PAMU devices
- Fix for Tegra fbdev initialization failure
- Fix for a VFIO device unbinding failure on PowerPC
* tag 'iommu-fixes-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
powerpc: iommu: Bring back table group release_ownership() call
drm/tegra: Do not assume that a NULL domain means no DMA IOMMU
iommu: Allow ops->default_domain to work when !CONFIG_IOMMU_DMA
Pull device mapper fixes from Mike Snitzer:
- Fix DM ioctl interface to avoid INT_MAX overflow warnings from
kvmalloc by limiting the number of targets and parameter size area.
- Fix DM stats to avoid INT_MAX overflow warnings from kvmalloc by
limiting the number of entries supported.
- Fix DM writecache to support mapping devices larger than 1 TiB by
switching from using kvmalloc_array to vmalloc_array -- which avoids
INT_MAX overflow in kvmalloc_node and associated warnings.
- Remove the (ab)use of tasklets from both the DM crypt and verity
targets. They will be converted to use BH workqueue in future.
* tag 'for-6.8/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-crypt, dm-verity: disable tasklets
dm writecache: allow allocations larger than 2GiB
dm stats: limit the number of entries
dm: limit the number of targets and parameter size area
Pull ata fix from Niklas Cassel:
- Following up on last week's ASMedia ASM1061 43-bit dma_mask quirk, we
sent an email to ASMedia developers that have previously been active
on the mailing list, asking exactly which SATA controllers that are
affected by this hardware limitation.
We got a reply that it affects all the SATA controllers in the
ASM106x family, thus extend the existing 43-bit dma_mask quirk to
apply to all the affected ASMedia SATA controllers.
* tag 'ata-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ahci: Extend ASM1061 43-bit DMA address quirk to other ASM106x parts
Another Fujitsu-related patch.
In the initial boot stage the integrated keyboard of Fujitsu Lifebook U728
refuses to work and it's not possible to type for example a dm-crypt
passphrase without the help of an external keyboard.
i8042.nomux kernel parameter resolves this issue but using that a PS/2
mouse is detected. This input device is unused even when the i2c-hid-acpi
kernel module is blacklisted making the integrated ELAN touchpad
(04F3:3092) not working at all.
So this notebook uses a hid-over-i2c touchpad which is managed by the
i2c_designware input driver. Since you can't find a PS/2 mouse port on this
computer and you can't connect a PS/2 mouse to it even with an official
port replicator I think it's safe to not use the PS/2 mouse port at all.
Signed-off-by: Szilard Fabian <szfabian@bluemarch.art>
Link: https://lore.kernel.org/r/20240103014717.127307-2-szfabian@bluemarch.art
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Remove duplicated enums (Guixen)
- Use appropriate controller state accessors (Keith)
- Retryable authentication (Hannes)
- Add missing module descriptions (Chaitanya)
- Fibre-channel fixes for blktests (Daniel)
- Various type correctness updates (Caleb)
- Improve fabrics connection debugging prints (Nitin)
- Passthrough command verbose error logging (Adam)
- Fix for where we set IO priority in the bio for drivers that use
fops->submit_bio() to queue IO, like md/dm etc.
* tag 'block-6.8-2024-02-01' of git://git.kernel.dk/linux: (32 commits)
block: Fix where bio IO priority gets set
nvme: allow passthru cmd error logging
nvme-fc: show hostnqn when connecting to fc target
nvme-rdma: show hostnqn when connecting to rdma target
nvme-tcp: show hostnqn when connecting to tcp target
nvmet-fc: use RCU list iterator for assoc_list
nvmet-fc: take ref count on tgtport before delete assoc
nvmet-fc: avoid deadlock on delete association path
nvmet-fc: abort command when there is no binding
nvmet-fc: do not tack refs on tgtports from assoc
nvmet-fc: remove null hostport pointer check
nvmet-fc: hold reference on hostport match
nvmet-fc: free queue and assoc directly
nvmet-fc: defer cleanup using RCU properly
nvmet-fc: release reference on target port
nvmet-fcloop: swap the list_add_tail arguments
nvme-fc: do not wait in vain when unloading module
nvme-fc: log human-readable opcode on timeout
nvme: split out fabrics version of nvme_opcode_str()
nvme: take const cmd pointer in read-only helpers
...
Pull io_uring fixes from Jens Axboe:
- Fix for missing retry for read multishot.
If we trigger the execution of it and there's more than one buffer to
be read, then we don't always read more than the first one. As it's
edge triggered, this can lead to stalls.
- Limit inline receive multishot retries for fairness reasons.
If we have a very bursty socket receiving data, we still need to
ensure we process other requests as well. This is really two minor
cleanups, then adding a way for poll reissue to trigger a requeue,
and then finally having multishot receive utilize that.
- Fix for a weird corner case for non-multishot receive with
MSG_WAITALL, using provided buffers, and setting the length to
zero (to let the buffer dictate the receive size).
* tag 'io_uring-6.8-2024-02-01' of git://git.kernel.dk/linux:
io_uring/net: fix sr->len for IORING_OP_RECV with MSG_WAITALL and buffers
io_uring/net: limit inline multishot retries
io_uring/poll: add requeue return code from poll multishot handling
io_uring/net: un-indent mshot retry path in io_recv_finish()
io_uring/poll: move poll execution helpers higher up
io_uring/rw: ensure poll based multishot read retries appropriately
Pull arm64 fixes from Will Deacon:
"Two small fixes.
The first one is an alternative fix for the SCS patching problem we
thought we'd fixed in -rc1; it turned out not to be robust with all
toolchains/configs, so this is a revert+retry which has seen some more
testing.
The other one simply removes an unused header file, but I couldn't
resist the negative diffstat.
- Really fix shadow call stack patching with LTO=full
- Remove unused (empty) header file generated from the compat vDSO"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: vdso32: Remove unused vdso32-offsets.h
arm64: scs: Disable LTO for SCS patching code
arm64: Revert "scs: Work around full LTO issue with dynamic SCS"
Adding memblocks for soft-reserved regions prevents them from later being
hotplugged in by dax_kmem.
Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
md_size will have been narrowed if we have >= 4GB worth of pages in a
soft-reserved region.
Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
When closing the laptop lid with an external screen connected, the mouse
pointer has a constant movement to the lower right corner. Opening the
lid again stops this movement, but after that the touchpad does no longer
register clicks.
The touchpad is connected both via i2c-hid and PS/2, the predecessor of
this device (NS70MU) has the same layout in this regard and also strange
behaviour caused by the psmouse and the i2c-hid driver fighting over
touchpad control. This fix is reusing the same workaround by just
disabling the PS/2 aux port, that is only used by the touchpad, to give the
i2c-hid driver the lone control over the touchpad.
v2: Rebased on current master
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231205163602.16106-1-wse@tuxedocomputers.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Tasklets have an inherent problem with memory corruption. The function
tasklet_action_common calls tasklet_trylock, then it calls the tasklet
callback and then it calls tasklet_unlock. If the tasklet callback frees
the structure that contains the tasklet or if it calls some code that may
free it, tasklet_unlock will write into free memory.
The commits 8e14f61015 and d9a02e016a try to fix it for dm-crypt, but
it is not a sufficient fix and the data corruption can still happen [1].
There is no fix for dm-verity and dm-verity will write into free memory
with every tasklet-processed bio.
There will be atomic workqueues implemented in the kernel 6.9 [2]. They
will have better interface and they will not suffer from the memory
corruption problem.
But we need something that stops the memory corruption now and that can be
backported to the stable kernels. So, I'm proposing this commit that
disables tasklets in both dm-crypt and dm-verity. This commit doesn't
remove the tasklet support, because the tasklet code will be reused when
atomic workqueues will be implemented.
[1] https://lore.kernel.org/all/d390d7ee-f142-44d3-822a-87949e14608b@suse.de/T/
[2] https://lore.kernel.org/lkml/20240130091300.2968534-1-tj@kernel.org/
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 39d42fa96b ("dm crypt: add flags to optionally bypass kcryptd workqueues")
Fixes: 5721d4e5a9 ("dm verity: Add optional "try_verify_in_tasklet" feature")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Johan writes:
USB-serial device ids for 6.8-rc3
Here are some new device ids for 6.8-rc3.
All have been in linux-next with no reported issues.
* tag 'usb-serial-6.8-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add Fibocom FM101-GL variant
USB: serial: qcserial: add new usb-id for Dell Wireless DW5826e
USB: serial: cp210x: add ID for IMST iM871A-USB
We get following warning with W=1:
drivers/dma/at_hdmac.c:243: warning: Function parameter or struct member 'boundary' not described in 'at_desc'
drivers/dma/at_hdmac.c:243: warning: Function parameter or struct member 'dst_hole' not described in 'at_desc'
drivers/dma/at_hdmac.c:243: warning: Function parameter or struct member 'src_hole' not described in 'at_desc'
drivers/dma/at_hdmac.c:243: warning: Function parameter or struct member 'memset_buffer' not described in 'at_desc'
drivers/dma/at_hdmac.c:243: warning: Function parameter or struct member 'memset_paddr' not described in 'at_desc'
drivers/dma/at_hdmac.c:243: warning: Function parameter or struct member 'memset_vaddr' not described in 'at_desc'
drivers/dma/at_hdmac.c:255: warning: Enum value 'ATC_IS_PAUSED' not described in enum 'atc_status'
drivers/dma/at_hdmac.c:255: warning: Enum value 'ATC_IS_CYCLIC' not described in enum 'atc_status'
drivers/dma/at_hdmac.c:287: warning: Function parameter or struct member 'cyclic' not described in 'at_dma_chan'
drivers/dma/at_hdmac.c:350: warning: Function parameter or struct member 'memset_pool' not described in 'at_dma'
Fix this by adding the required description and also drop unused struct
member 'cyclic' in 'at_dma_chan'
Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20240130163216.633034-1-vkoul@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The links in a cycle are not all logged in a consistent manner or not
logged at all. Make them consistent by adding a "cycle:" string and log all
the link in the cycles (even the child ==> parent dependency) so that it's
easier to debug cycle detection code. Also, mark the start and end of a
cycle so it's easy to tell when multiple cycles are logged back to back.
Signed-off-by: Saravana Kannan <saravanak@google.com>
Tested-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240202095636.868578-4-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fw_devlink can detect most overlapping/intersecting cycles. However it was
missing a few corner cases because of an incorrect optimization logic that
tries to avoid repeating cycle detection for devices that are already
marked as part of a cycle.
Here's an example provided by Xu Yang (edited for clarity):
usb
+-----+
tcpc | |
+-----+ | +--|
| |----------->|EP|
|--+ | | +--|
|EP|<-----------| |
|--+ | | B |
| | +-----+
| A | |
+-----+ |
^ +-----+ |
| | | |
+-----| C |<--+
| |
+-----+
usb-phy
Node A (tcpc) will be populated as device 1-0050.
Node B (usb) will be populated as device 38100000.usb.
Node C (usb-phy) will be populated as device 381f0040.usb-phy.
The description below uses the notation:
consumer --> supplier
child ==> parent
1. Node C is populated as device C. No cycles detected because cycle
detection is only run when a fwnode link is converted to a device link.
2. Node B is populated as device B. As we convert B --> C into a device
link we run cycle detection and find and mark the device link/fwnode
link cycle:
C--> A --> B.EP ==> B --> C
3. Node A is populated as device A. As we convert C --> A into a device
link, we see it's already part of a cycle (from step 2) and don't run
cycle detection. Thus we miss detecting the cycle:
A --> B.EP ==> B --> A.EP ==> A
Looking at it another way, A depends on B in one way:
A --> B.EP ==> B
But B depends on A in two ways and we only detect the first:
B --> C --> A
B --> A.EP ==> A
To detect both of these, we remove the incorrect optimization attempt in
step 3 and run cycle detection even if the fwnode link from which the
device link is being created has already been marked as part of a cycle.
Reported-by: Xu Yang <xu.yang_2@nxp.com>
Closes: https://lore.kernel.org/lkml/DU2PR04MB8822693748725F85DC0CB86C8C792@DU2PR04MB8822.eurprd04.prod.outlook.com/
Fixes: 3fb16866b5 ("driver core: fw_devlink: Make cycle detection more robust")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Tested-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240202095636.868578-3-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
device_link_flag_is_sync_state_only() correctly returns true on the flags
of an existing device link that only implements sync_state() functionality.
However, it incorrectly and confusingly returns false if it's called with
DL_FLAG_SYNC_STATE_ONLY.
This bug doesn't manifest in any of the existing calls to this function,
but fix this confusing behavior to avoid future bugs.
Fixes: 67cad5c670 ("driver core: fw_devlink: Add DL_FLAG_CYCLE support to device links")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Tested-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240202095636.868578-2-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently the driver exits eSR by calling
iwl_mvm_esr_mode_inactive() before updating the FW
(by deactivating one of the links), and therefore before
sending the EML frame notifying that we are no longer in eSR.
This is wrong for several reasons:
1. The driver sends SMPS activation frames when we are still in eSR
and SMPS should be disabled when in eSR
2. The driver restores RLC configuration as it was before eSR
entering, and RLC command shouldn't be sent in eSR
Fix this by calling iwl_mvm_esr_mode_inactive() after FW update
Fixes: 12bacfc2c0 ("wifi: iwlwifi: handle eSR transitions")
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://msgid.link/20240201155157.d8d9dc277d4e.Ib5aee0fd05e35b1da7f18753eb3c8fa0a3f872f3@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If a driver implements the change_interface() method, we switch
interface type without taking the interface down, but still will
recreate the debugfs for it since it's a new type. As such, we
should use the ieee80211_debugfs_recreate_netdev() function here
to also recreate the driver's files, if it is indeed from a type
change while up.
Link: https://msgid.link/20240129155402.7311a36ffeeb.I18df02bbeb685d4250911de5ffbaf090f60c3803@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We recently added some validation that we don't try to
connect to an AP that is currently in a channel switch
process, since that might want the channel to be quiet
or we might not be able to connect in time to hear the
switching in a beacon. This was in commit c09c4f3199
("wifi: mac80211: don't connect to an AP while it's in
a CSA process").
However, we promptly got a report that this caused new
connection failures, and it turns out that the AP that
we now cannot connect to is permanently advertising an
extended channel switch announcement, even with quiet.
The AP in question was an Asus RT-AC53, with firmware
3.0.0.4.380_10760-g21a5898.
As a first step, attempt to detect that we're dealing
with such a situation, so mac80211 can use this later.
Reported-by: coldolt <andypalmadi@gmail.com>
Closes: https://lore.kernel.org/linux-wireless/CAJvGw+DQhBk_mHXeu6RTOds5iramMW2FbMB01VbKRA4YbHHDTA@mail.gmail.com/
Fixes: c09c4f3199 ("wifi: mac80211: don't connect to an AP while it's in a CSA process")
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240129131413.246972c8775e.Ibf834d7f52f9951a353b6872383da710a7358338@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit 56e58d6c8a ("net: stmmac: Implement Safety Features in
XGMAC core") checks and reports safety errors, but leaves the
Data Path Parity Errors for each channel in DMA unhandled at all, lead to
a storm of interrupt.
Fix it by checking and clearing the DMA_DPP_Interrupt_Status register.
Fixes: 56e58d6c8a ("net: stmmac: Implement Safety Features in XGMAC core")
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
CAP_NET_ADMIN allows to configure network interfaces, not CAP_SYS_ADMIN
which only allows to call unshare(2). Without this change, running
network tests as a non-root user but with all capabilities would fail at
the setup_loopback() step with "RTNETLINK answers: Operation not
permitted".
The issue is only visible when running tests with non-root users (i.e.
only relying on ambient capabilities). Indeed, when configuring the
network interface, the "ip" command is called, which may lead to the
special handling of capabilities for the root user by execve(2). If
root is the caller, then the inherited, permitted and effective
capabilities are all reset, which then includes CAP_NET_ADMIN. However,
if a non-root user is the caller, then ambient capabilities are masked
by the inherited ones, which were explicitly dropped.
To make execution deterministic whatever users are running the tests,
set the noroot secure bit for each test, and set the inheritable and
ambient capabilities to CAP_NET_ADMIN, the only capability that may be
required after an execve(2).
Factor out _effective_cap() into _change_cap(), and use it to manage
ambient capabilities with the new set_ambient_cap() and
clear_ambient_cap() helpers.
This makes it possible to run all Landlock tests with check-linux.sh
from https://github.com/landlock-lsm/landlock-test-tools
Cc: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Fixes: a549d055a2 ("selftests/landlock: Add network tests")
Link: https://lore.kernel.org/r/20240125153230.3817165-2-mic@digikod.net
[mic: Make sure SECBIT_NOROOT_LOCKED is set]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
The commit 2ad56efa80 ("powerpc/iommu: Setup a default domain and
remove set_platform_dma_ops") refactored the code removing the
set_platform_dma_ops(). It missed out the table group
release_ownership() call which would have got called otherwise
during the guest shutdown via vfio_group_detach_container(). On
PPC64, this particular call actually sets up the 32-bit TCE table,
and enables the 64-bit DMA bypass etc. Now after guest shutdown,
the subsequent host driver (e.g megaraid-sas) probe post unbind
from vfio-pci fails like,
megaraid_sas 0031:01:00.0: Warning: IOMMU dma not supported: mask 0x7fffffffffffffff, table unavailable
megaraid_sas 0031:01:00.0: Warning: IOMMU dma not supported: mask 0xffffffff, table unavailable
megaraid_sas 0031:01:00.0: Failed to set DMA mask
megaraid_sas 0031:01:00.0: Failed from megasas_init_fw 6539
The patch brings back the call to table_group release_ownership()
call when switching back to PLATFORM domain from BLOCKED, while
also separates the domain_ops for both.
Fixes: 2ad56efa80 ("powerpc/iommu: Setup a default domain and remove set_platform_dma_ops")
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/170628173462.3742.18330000394415935845.stgit@ltcd48-lp2.aus.stglab.ibm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Timur pointed this out before, and it just slipped my mind,
but this might help some things work better, around pcie power
management.
Fixes: 8d55b0a940 ("nouveau/gsp: add some basic registry entries.")
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/576336/
After commit 936e4d49ec ("Input: atkbd - skip ATKBD_CMD_GETID in
translated mode") not only the getid command is skipped, but also
the de-activating of the keyboard at the end of atkbd_probe(), potentially
re-introducing the problem fixed by commit be2d7e4233 ("Input: atkbd -
fix multi-byte scancode handling on reconnect").
Make sure multi-byte scancode handling on reconnect is still handled
correctly by not skipping the atkbd_deactivate() call.
Fixes: 936e4d49ec ("Input: atkbd - skip ATKBD_CMD_GETID in translated mode")
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240126160724.13278-3-hdegoede@redhat.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
In ext4_map_blocks(), if we can't find a range of mapping in the
extents cache, we are calling ext4_ext_map_blocks() to search the real
path and ext4_ext_determine_hole() to determine the hole range. But if
the querying range was partially or completely overlaped by a delalloc
extent, we can't find it in the real extent path, so the returned hole
length could be incorrect.
Fortunately, ext4_ext_put_gap_in_cache() have already handle delalloc
extent, but it searches start from the expanded hole_start, doesn't
start from the querying range, so the delalloc extent found could not be
the one that overlaped the querying range, plus, it also didn't adjust
the hole length. Let's just remove ext4_ext_put_gap_in_cache(), handle
delalloc and insert adjusted hole extent in ext4_ext_determine_hole().
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240127015825.1608160-4-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ext4_da_map_blocks() only hold i_data_sem in shared mode and i_rwsem
when inserting delalloc extents, it could be raced by another querying
path of ext4_map_blocks() without i_rwsem, .e.g buffered read path.
Suppose we buffered read a file containing just a hole, and without any
cached extents tree, then it is raced by another delayed buffered write
to the same area or the near area belongs to the same hole, and the new
delalloc extent could be overwritten to a hole extent.
pread() pwrite()
filemap_read_folio()
ext4_mpage_readpages()
ext4_map_blocks()
down_read(i_data_sem)
ext4_ext_determine_hole()
//find hole
ext4_ext_put_gap_in_cache()
ext4_es_find_extent_range()
//no delalloc extent
ext4_da_map_blocks()
down_read(i_data_sem)
ext4_insert_delayed_block()
//insert delalloc extent
ext4_es_insert_extent()
//overwrite delalloc extent to hole
This race could lead to inconsistent delalloc extents tree and
incorrect reserved space counter. Fix this by converting to hold
i_data_sem in exclusive mode when adding a new delalloc extent in
ext4_da_map_blocks().
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240127015825.1608160-3-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
UAPI Changes:
- Only allow a single user-fence per exec / bind.
The reason for this clarification fix is a limitation in the implementation
which can be lifted moving forward, if needed.
Driver Changes:
- A crash fix
- A fix for an assert due to missing mem_acces ref
- Only allow a single user-fence per exec / bind.
- Some sparse warning fixes
- Two fixes for compilation failures on various odd
combinations of gcc / arch pointed out on LKML.
- Fix a fragile partial allocation pointed out on LKML.
Cross-driver Change:
- A sysfs ABI documentation warning fix
This also touches i915 and is acked by i915 maintainers.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZbuCYdMDVK-kAWC5@fedora
This reverts commit abe2023b4c.
Changing the locking order means that scheduler/msm_job_run() can race
with the recovery kthread worker, with the result that the GPU gets an
extra runpm get when we are trying to power it off. Leaving the GPU in
an unrecovered state.
I'll need to come up with a different scheme for appeasing lockdep.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/573835/
During the testing of Gnome on Qualcomm Robotics platform screen
corruption has been observed. Lowering GPU's highest_bank_bit from 14 to
13 seems to fix the screen corruption.
Note, the MDSS and DPU drivers use HBB=1 (which maps to the
highest_bank_bit = 14). So this change merely works around the UBWC
swizzling issue on this platform until the real cause is found.
Fixes: e7fc9398e6 ("drm/msm/a6xx: Add A610 support")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/573838/
Signed-off-by: Rob Clark <robdclark@chromium.org>
Since commit 79e2cf2e7a ("drm/gem: Take reservation lock for vmap/vunmap
operations"), the resv lock is already held in the prime vmap path, so
don't try to grab it again.
v2: This applies to vunmap path as well
v3: Fix fixes commit
Fixes: 79e2cf2e7a ("drm/gem: Take reservation lock for vmap/vunmap operations")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Christian König <christian.koenig@amd.com>
Patchwork: https://patchwork.freedesktop.org/patch/576642/
It turns out it was never just gcc-11 that was broken. Apparently it
just happens to work on x86-64 with other gcc versions.
On arm64, I see warnings with gcc version 13.2.1, and the kernel test
robot reports the same problem on s390 with gcc 13.2.0.
Admittedly it seems to be just the new Xe drm driver, but this is
keeping me from doing my normal arm64 build testing. So it gets
reverted until somebody figures out what causes the problem (and why it
doesn't show on x86-64, which is what makes me suspect it was never just
about gcc-11, and more about just random happenstance).
This also changes the Kconfig naming a bit - just make the "disable this
for GCC" conditional be one simple Kconfig entry, and we can put the gcc
version dependencies in that entry once we figure out what the correct
rules are.
The version dependency _may_ still end up being "gcc version larger than
11" if the issue is purely in the Xe driver, but even if that ends up
the case, let's make that all part of the "GCC_NO_STRINGOP_OVERFLOW"
logic.
For now, we just disable it for all gcc versions while the exact cause
is unknown.
Link: https://lore.kernel.org/all/202401161031.hjGJHMiJ-lkp@intel.com/T/
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Ghiti <alexghiti@rivosinc.com> says:
While merging riscv napot and arm64 contpte support, I noticed we did
not abide by the specification which states that we should clear a
napot mapping before setting a new one, called "break before make" in
arm64 (patch 1). And also that we did not add the new hugetlb page size
added by napot in hugetlb_mask_last_page() (patch 2).
* b4-shazam-merge:
riscv: Fix hugetlb_mask_last_page() when NAPOT is enabled
riscv: Fix set_huge_pte_at() for NAPOT mapping
Link: https://lore.kernel.org/r/20240117195741.1926459-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter.
As Paolo promised we continue to hammer out issues in our selftests.
This is not the end but probably the peak.
Current release - regressions:
- smc: fix incorrect SMC-D link group matching logic
Current release - new code bugs:
- eth: bnxt: silence WARN() when device skips a timestamp, it happens
Previous releases - regressions:
- ipmr: fix null-deref when forwarding mcast packets
- conntrack: evaluate window negotiation only for packets in the
REPLY direction, otherwise SYN retransmissions trigger incorrect
window scale negotiation
- ipset: fix performance regression in swap operation
Previous releases - always broken:
- tcp: add sanity checks to types of pages getting into the rx
zerocopy path, we only support basic NIC -> user, no page cache
pages etc.
- ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()
- nt_tables: more input sanitization changes
- dsa: mt7530: fix 10M/100M speed on MediaTek MT7988 switch
- bridge: mcast: fix loss of snooping after long uptime, jiffies do
wrap on 32bit
- xen-netback: properly sync TX responses, protect with locking
- phy: mediatek-ge-soc: sync calibration values with MediaTek SDK,
increase connection stability
- eth: pds: fixes for various teardown, and reset races
Misc:
- hsr: silence WARN() if we can't alloc supervision frame, it
happens"
* tag 'net-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (82 commits)
doc/netlink/specs: Add missing attr in rt_link spec
idpf: avoid compiler padding in virtchnl2_ptype struct
selftests: mptcp: join: stop transfer when check is done (part 2)
selftests: mptcp: join: stop transfer when check is done (part 1)
selftests: mptcp: allow changing subtests prefix
selftests: mptcp: decrease BW in simult flows
selftests: mptcp: increase timeout to 30 min
selftests: mptcp: add missing kconfig for NF Mangle
selftests: mptcp: add missing kconfig for NF Filter in v6
selftests: mptcp: add missing kconfig for NF Filter
mptcp: fix data re-injection from stale subflow
selftests: net: enable some more knobs
selftests: net: add missing config for NF_TARGET_TTL
selftests: forwarding: List helper scripts in TEST_FILES Makefile variable
selftests: net: List helper scripts in TEST_FILES Makefile variable
selftests: net: Remove executable bits from library scripts
selftests: bonding: Check initial state
selftests: team: Add missing config options
hv_netvsc: Fix race condition between netvsc_probe and netvsc_remove
xen-netback: properly sync TX responses
...
Pull parisc architecture fixes from Helge Deller:
"The current exception handler, which helps on kernel accesses to
userspace, may exhibit data corruption. The problem is that it is not
guaranteed that the compiler will use the processor register we
specified in the source code, but may choose another register which
then will lead to silent register- and data corruption. To fix this
issue we now use another strategy to help the exception handler to
always find and set the error code into the correct CPU register.
The other fixes are small: fixing CPU hotplug bringup, fix the page
alignment of the RO_DATA section, added a check for the calculated
cache stride and fix possible hangups when printing longer output at
bootup when running on serial console.
Most of the patches are tagged for stable series.
- Fix random data corruption triggered by exception handler
- Fix crash when setting up BTLB at CPU bringup
- Prevent hung tasks when printing inventory on serial console
- Make RO_DATA page aligned in vmlinux.lds.S
- Add check for valid cache stride size"
* tag 'parisc-for-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: BTLB: Fix crash when setting up BTLB at CPU bringup
parisc: Fix random data corruption from exception handler
parisc: Drop unneeded semicolon in parse_tree_node()
parisc: Prevent hung tasks when printing inventory on serial console
parisc: Check for valid stride size for cache flushes
parisc: Make RO_DATA page aligned in vmlinux.lds.S
Pull Kbuild fixes from Masahiro Yamada:
- Fix UML build with clang-18 and newer
- Avoid using the alias attribute in host programs
- Replace tabs with spaces when followed by conditionals for future GNU
Make versions
- Fix rpm-pkg for the systemd-provided kernel-install tool
- Fix the undefined behavior in Kconfig for a 'int' symbol used in a
conditional
* tag 'kbuild-fixes-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: initialize sym->curr.tri to 'no' for all symbol types again
kbuild: rpm-pkg: simplify installkernel %post
kbuild: Replace tabs with spaces when followed by conditionals
modpost: avoid using the alias attribute
kbuild: fix W= flags in the help message
modpost: Add '.ltext' and '.ltext.*' to TEXT_SECTIONS
um: Fix adding '-no-pie' for clang
kbuild: defconf: use SRCARCH to find merged configs
Pull nfsd fix from Chuck Lever:
- Fix a recent backchannel timeout fix
* tag 'nfsd-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSv4.1: Assign the right value for initval and retries for rpc timeout
Pull exfat fix from Namjae Jeon:
- Fix BUG in iov_iter_revert reported from syzbot
* tag 'exfat-for-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: fix zero the unwritten part for dio read
ASoC: Fixes for v6.8
This pull request adds Richard Fitzgerald's series with extensive fixes
for the CS35L56, he said:
These patches fix various things that were undocumented, unknown or
uncertain when the original driver code was written. And also a few
things that were just bugs.
Pull HID fixes from Benjamin Tissoires:
- cleanups in the error path in hid-steam (Dan Carpenter)
- fixes for Wacom tablets selftests that sneaked in while the CI was
taking a break during the year end holidays (Benjamin Tissoires)
- null pointer check in nvidia-shield (Kunwu Chan)
- memory leak fix in hidraw (Su Hui)
- another null pointer fix in i2c-hid-of (Johan Hovold)
- another memory leak fix in HID-BPF this time, as well as a double
fdget() fix reported by Dan Carpenter (Benjamin Tissoires)
- fix for Cirque touchpad when they go on suspend (Kai-Heng Feng)
- new device ID in hid-logitech-hidpp: "Logitech G Pro X SuperLight 2"
(Jiri Kosina)
* tag 'hid-for-linus-2024020101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: bpf: use __bpf_kfunc instead of noinline
HID: bpf: actually free hdev memory after attaching a HID-BPF program
HID: bpf: remove double fdget()
HID: i2c-hid-of: fix NULL-deref on failed power up
HID: hidraw: fix a problem of memory leak in hidraw_release()
HID: i2c-hid: Skip SET_POWER SLEEP for Cirque touchpad on system suspend
HID: nvidia-shield: Add missing null pointer checks to LED initialization
HID: logitech-hidpp: add support for Logitech G Pro X Superlight 2
selftests/hid: wacom: fix confidence tests
HID: hid-steam: Fix cleanup in probe()
HID: hid-steam: remove pointless error message
With the introduction of SMB2_OP_QUERY_WSL_EA, the client may now send
5 commands in a single compound request in order to query xattrs from
potential WSL reparse points, which should be fine as we currently
allow up to 5 PDUs in a single compound request. However, if
encryption is enabled (e.g. 'seal' mount option) or enforced by the
server, current MAX_COMPOUND(5) won't be enough as we require an extra
PDU for the transform header.
Fix this by increasing MAX_COMPOUND to 7 and, while we're at it, add
an WARN_ON_ONCE() and return -EIO instead of -ENOMEM in case we
attempt to send a compound request that couldn't include the extra
transform header.
Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
After the interface selection policy change to do a weighted
round robin, each iface maintains a weight_fulfilled. When the
weight_fulfilled reaches the total weight for the iface, we know
that the weights can be reset and ifaces can be allocated from
scratch again.
During channel allocation failures on a particular channel,
weight_fulfilled is not incremented. If a few interfaces are
inactive, we could end up in a situation where the active
interfaces are all allocated for the total_weight, and inactive
ones are all that remain. This can cause a situation where
no more channels can be allocated further.
This change fixes it by increasing weight_fulfilled, even when
channel allocation failure happens. This could mean that if
there are temporary failures in channel allocation, the iface
weights may not strictly be adhered to. But that's still okay.
Fixes: a6d8fb54a5 ("cifs: distribute channels across interfaces based on speed")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull firewire fixes from Takashi Sakamoto:
"FireWire subsystem now supports the legacy layout of configuration
ROM, while it appears that some of DV devices in the early 2000's have
the legacy layout with a quirk. This includes some changes to handle
the quirk"
* tag 'firewire-fixes-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: core: search descriptor leaf just after vendor directory entry in root directory
firewire: core: correct documentation of fw_csr_string() kernel API
In order to scale down the channels, the following sequence
of operations happen:
1. server struct is marked for terminate
2. the channel is deallocated in the ses->chans array
3. at a later point the cifsd thread actually terminates the server
Between 2 and 3, there can be calls to find the channel for
a server struct. When that happens, there can be an ugly warning
that's logged. But this is expected.
So this change does two things:
1. in cifs_ses_get_chan_index, if server->terminate is set, return
2. always make sure server->terminate is set with chan_lock held
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When the server reports query network interface info call
as unsupported following a tree connect, it means that
multichannel is unsupported, even if the server capabilities
report otherwise.
When this happens, cifs_chan_skip_or_disable is called to
disable multichannel on the client. However, we only need
to call this when multichannel is currently setup.
Fixes: f591062bdb ("cifs: handle servers that still advertise multichannel after disabling")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull spi fix from Mark Brown:
"One simple fix for a minor but valid issue with constants overflowing
identified via cppcheck"
* tag 'spi-fix-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: sh-msiof: avoid integer overflow in constants
Pull regulator fixes from Mark Brown:
"The main set of fixes here are for the PWM regulator, fixing
bootstrapping issues on some platforms where the hardware setup looked
like it was out of spec for the constraints we have for the regulator
causing us to make spurious and unhelpful changes to try to bring
things in line with the constraints.
There's also a couple of other driver specific fixes"
* tag 'regulator-fix-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator (max5970): Fix IRQ handler
regulator: ti-abb: don't use devm_platform_ioremap_resource_byname for shared interrupt register
regulator: pwm-regulator: Manage boot-on with disabled PWM channels
regulator: pwm-regulator: Calculate the output voltage for disabled PWMs
regulator: pwm-regulator: Add validity checks in continuous .get_voltage
Pull crypto fixes from Herbert Xu:
"Fix regressions in caam and qat"
* tag 'v6.8-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: caam - fix asynchronous hash
crypto: qat - fix arbiter mapping generation algorithm for QAT 402xx
Pull lsm fixes from Paul Moore:
"Two small patches to fix some problems relating to LSM hook return
values and how the individual LSMs interact"
* tag 'lsm-pr-20240131' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lsm: fix default return value of the socket_getpeersec_*() hooks
lsm: fix the logic in security_inode_getsecctx()
Commit 82b74cac28 ("blk-ioprio: Convert from rqos policy to direct
call") pushed setting bio I/O priority down into blk_mq_submit_bio()
-- which is too low within block core's submit_bio() because it
skips setting I/O priority for block drivers that implement
fops->submit_bio() (e.g. DM, MD, etc).
Fix this by moving bio_set_ioprio() up from blk-mq.c to blk-core.c and
call it from submit_bio(). This ensures all block drivers call
bio_set_ioprio() during initial bio submission.
Fixes: a78418e6a0 ("block: Always initialize bio IO priority on submit")
Co-developed-by: Yibin Ding <yibin.ding@unisoc.com>
Signed-off-by: Yibin Ding <yibin.ding@unisoc.com>
Signed-off-by: Hongyu Jin <hongyu.jin@unisoc.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
[snitzer: revised commit header]
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240130202638.62600-2-snitzer@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge series from Richard Fitzgerald <rf@opensource.cirrus.com>:
These patches fixe various things that were undocumented, unknown or
uncertain when the original driver code was written. And also a few
things that were just bugs.
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- fix a timeout issue and a memory leak in batman-adv multicast,
by Linus Lüssing (2 patches)
* tag 'batadv-net-pullrequest-20240201' of git://git.open-mesh.org/linux-merge:
batman-adv: mcast: fix memory leak on deleting a batman-adv interface
batman-adv: mcast: fix mcast packet type counter on timeouted nodes
====================
Link: https://lore.kernel.org/r/20240201110110.29129-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The original code didn't update the firmware version if the
"next slot" of the AFI register isn't zero or if the
"current slot" field is zero; in those cases it assumed
that a reset was needed.
However, the NVMe specification doesn't exclude the possibility that
the "next slot" value is equal to the "current slot" value,
meaning that the same firmware slot will be activated after performing
a controller level reset; in this case a reset is clearly not
necessary and we can safely update the firmware version.
Modify the code so the kernel will report that a Controller Level Reset
is needed only in the following cases:
1) If the "current slot" field is zero. This is invalid and means that
something is wrong, a reset is needed.
or
2) if the "next slot" field isn't zero AND it's not equal to the
"current slot" value. This means that at the next reset a different
firmware slot will be activated.
Fixes: 983a338b96 ("nvme: update firmware version after commit")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) TCP conntrack now only evaluates window negotiation for packets in
the REPLY direction, from Ryan Schaefer. Otherwise SYN retransmissions
trigger incorrect window scale negotiation. From Ryan Schaefer.
2) Restrict tunnel objects to NFPROTO_NETDEV which is where it makes sense
to use this object type.
3) Fix conntrack pick up from the middle of SCTP_CID_SHUTDOWN_ACK packets.
From Xin Long.
4) Another attempt from Jozsef Kadlecsik to address the slow down of the
swap command in ipset.
5) Replace a BUG_ON by WARN_ON_ONCE in nf_log, and consolidate check for
the case that the logger is NULL from the read side lock section.
6) Address lack of sanitization for custom expectations. Restrict layer 3
and 4 families to what it is supported by userspace.
* tag 'nf-24-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations
netfilter: nf_log: replace BUG_ON by WARN_ON_ONCE when putting logger
netfilter: ipset: fix performance regression in swap operation
netfilter: conntrack: check SCTP_CID_SHUTDOWN_ACK for vtag setting in sctp_new
netfilter: nf_tables: restrict tunnel object to NFPROTO_NETDEV
netfilter: conntrack: correct window scaling with retransmitted SYN
====================
Link: https://lore.kernel.org/r/20240131225943.7536-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the arm random config file, kconfig option 'CONFIG_AEABI' is
disabled which results in adding the compiler flag '-mabi=apcs-gnu'.
This causes the compiler to add padding in virtchnl2_ptype
structure to align it to 8 bytes, resulting in the following
size check failure:
include/linux/build_bug.h:78:41: error: static assertion failed: "(6) == sizeof(struct virtchnl2_ptype)"
78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
| ^~~~~~~~~~~~~~
include/linux/build_bug.h:77:34: note: in expansion of macro '__static_assert'
77 | #define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr)
| ^~~~~~~~~~~~~~~
drivers/net/ethernet/intel/idpf/virtchnl2.h:26:9: note: in expansion of macro 'static_assert'
26 | static_assert((n) == sizeof(struct X))
| ^~~~~~~~~~~~~
drivers/net/ethernet/intel/idpf/virtchnl2.h:982:1: note: in expansion of macro 'VIRTCHNL2_CHECK_STRUCT_LEN'
982 | VIRTCHNL2_CHECK_STRUCT_LEN(6, virtchnl2_ptype);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
Avoid the compiler padding by using "__packed" structure
attribute for the virtchnl2_ptype struct. Also align the
structure by using "__aligned(2)" for better code optimization.
Fixes: 0d7502a9b4 ("virtchnl: add virtchnl version 2 ops")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312220250.ufEm8doQ-lkp@intel.com
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240131222241.2087516-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts says:
====================
mptcp: fixes for recent issues reported by CI's
This series of 9 patches fixes issues mostly identified by CI's not
managed by the MPTCP maintainers. Thank you Linero (LKFT) and Netdev
maintainers (NIPA) for running our kunit and selftests tests!
For the first patch, it took a bit of time to identify the root cause.
Some MPTCP Join selftest subtests have been "flaky", mostly in slow
environments. It appears to be due to the use of a TCP-specific helper
on an MPTCP socket. A fix for kernels >= v5.15.
Patches 2 to 4 add missing kernel config to support NetFilter tables
needed for IPTables commands. These kconfigs are usually enabled in
default configurations, but apparently not for all architectures.
Patches 2 and 3 can be backported up to v5.11 and the 4th one up to
v5.19.
Patch 5 increases the time limit for MPTCP selftests. It appears that
many CI's execute tests in a VM without acceleration supports, e.g. QEmu
without KVM. As a result, the tests take longer. Plus, there are more
and more tests. This patch modifies the timeout added in v5.18.
Patch 6 reduces the maximum rate and delay of the different links in
some Simult Flows selftest subtests. The goal is to let slow VMs reach
the maximum speed. The original rate was introduced in v5.11.
Patch 7 lets CI changing the prefix of the subtests titles, to be able
to run the same selftest multiple times with different parameters. With
different titles, tests will be considered as different and not override
previous results as it is the case with some CI envs. Subtests have been
introduced in v6.6.
Patch 8 and 9 make some MPTCP Join selftest subtests quicker by stopping
the transfer when the expected events have been seen. Patch 8 can be
backported up to v6.5.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
====================
Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-0-4c1c11e571ff@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the "Fixes" commits mentioned below, the newly added "userspace
pm" subtests of mptcp_join selftests are launching the whole transfer in
the background, do the required checks, then wait for the end of
transfer.
There is no need to wait longer, especially because the checks at the
end of the transfer are ignored (which is fine). This saves quite a few
seconds on slow environments.
While at it, use 'mptcp_lib_kill_wait()' helper everywhere, instead of
on a specific one with 'kill_tests_wait()'.
Fixes: b2e2248f36 ("selftests: mptcp: userspace pm create id 0 subflow")
Fixes: e3b47e460b ("selftests: mptcp: userspace pm remove initial subflow")
Fixes: b9fb176081 ("selftests: mptcp: userspace pm send RM_ADDR for ID 0")
Cc: stable@vger.kernel.org
Reviewed-and-tested-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-9-4c1c11e571ff@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the "Fixes" commit mentioned below, "userspace pm" subtests of
mptcp_join selftests introduced in v6.5 are launching the whole transfer
in the background, do the required checks, then wait for the end of
transfer.
There is no need to wait longer, especially because the checks at the
end of the transfer are ignored (which is fine). This saves quite a few
seconds in slow environments.
Note that old versions will need commit bdbef0a6ff ("selftests: mptcp:
add mptcp_lib_kill_wait") as well to get 'mptcp_lib_kill_wait()' helper.
Fixes: 4369c198e5 ("selftests: mptcp: test userspace pm out of transfer")
Cc: stable@vger.kernel.org # 6.5.x: bdbef0a6ff: selftests: mptcp: add mptcp_lib_kill_wait
Cc: stable@vger.kernel.org # 6.5.x
Reviewed-and-tested-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-8-4c1c11e571ff@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If a CI executes the same selftest multiple times with different
options, all results from the same subtests will have the same title,
which confuse the CI. With the same title printed in TAP, the tests are
considered as the same ones.
Now, it is possible to override this prefix by using MPTCP_LIB_KSFT_TEST
env var, and have a different title.
While at it, use 'basename' to remove the suffix as well instead of
using an extra 'sed'.
Fixes: c4192967e6 ("selftests: mptcp: lib: format subtests results in TAP")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-7-4c1c11e571ff@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When running the simult_flow selftest in slow environments -- e.g. QEmu
without KVM support --, the results can be unstable. This selftest
checks if the aggregated bandwidth is (almost) fully used as expected.
To help improving the stability while still keeping the same validation
in place, the BW and the delay are reduced to lower the pressure on the
CPU.
Fixes: 1a418cb8e8 ("mptcp: simult flow self-tests")
Fixes: 219d04992b ("mptcp: push pending frames when subflow has free space")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-6-4c1c11e571ff@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On very slow environments -- e.g. when QEmu is used without KVM --,
mptcp_join.sh selftest can take a bit more than 20 minutes. Bump the
default timeout by 50% as it seems normal to take that long on some
environments.
When a debug kernel config is used, this selftest will take even longer,
but that's certainly not a common test env to consider for the timeout.
The Fixes tag that has been picked here is there simply to help having
this patch backported to older stable versions. It is difficult to point
to the exact commit that made some env reaching the timeout from time to
time.
Fixes: d17b968b98 ("selftests: mptcp: increase timeout to 20 minutes")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-5-4c1c11e571ff@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the commit mentioned below, 'mptcp_join' selftests is using
IPTables to add rules to the Filter table.
It is then required to have IP_NF_FILTER KConfig.
This KConfig is usually enabled by default in many defconfig, but we
recently noticed that some CI were running our selftests without them
enabled.
Fixes: 8d014eaa92 ("selftests: mptcp: add ADD_ADDR timeout test case")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the MPTCP PM detects that a subflow is stale, all the packet
scheduler must re-inject all the mptcp-level unacked data. To avoid
acquiring unneeded locks, it first try to check if any unacked data
is present at all in the RTX queue, but such check is currently
broken, as it uses TCP-specific helper on an MPTCP socket.
Funnily enough fuzzers and static checkers are happy, as the accessed
memory still belongs to the mptcp_sock struct, and even from a
functional perspective the recovery completed successfully, as
the short-cut test always failed.
A recent unrelated TCP change - commit d5fed5addb ("tcp: reorganize
tcp_sock fast path variables") - exposed the issue, as the tcp field
reorganization makes the mptcp code always skip the re-inection.
Fix the issue dropping the bogus call: we are on a slow path, the early
optimization proved once again to be evil.
Fixes: 1e1d9d6f11 ("mptcp: handle pending data on closed subflow")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/468
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-1-4c1c11e571ff@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
gtod_is_based_on_tsc() is boolean in nature, i.e. it returns '1' for good
clocksources and '0' otherwise. Moreover, its result is used raw by
kvm_get_time_and_clockread()/kvm_get_walltime_and_clockread() which are
'bool'.
No functional change intended.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20240109141121.1619463-6-vkuznets@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM sets up Hyper-V TSC page clocksource for its guests when system
clocksource is 'based on TSC' (see gtod_is_based_on_tsc()), running
hyperv_clock with any other clocksource leads to imminent failure.
Add the missing requirement to make the test skip gracefully.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20240109141121.1619463-5-vkuznets@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM's 'gtod_is_based_on_tsc()' recognizes two clocksources: 'tsc' and
'hyperv_clocksource_tsc_page' and enables kvmclock in 'masterclock'
mode when either is in use. Transform 'sys_clocksource_is_tsc()' into
'sys_clocksource_is_based_on_tsc()' to support the later. This affects
two tests: kvm_clock_test and vmx_nested_tsc_scaling_test, both seem
to work well when system clocksource is 'hyperv_clocksource_tsc_page'.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20240109141121.1619463-4-vkuznets@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Several existing x86 selftests need to check that the underlying system
clocksource is TSC or based on TSC but every test implements its own
check. As a first step towards unification, extract check_clocksource()
from kvm_clock_test and split it into two functions: arch-neutral
'sys_get_cur_clocksource()' and x86-specific 'sys_clocksource_is_tsc()'.
Fix a couple of pre-existing issues in kvm_clock_test: memory leakage in
check_clocksource() and using TEST_ASSERT() instead of TEST_REQUIRE().
The change also makes the test fail when system clocksource can't be read
from sysfs.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20240109141121.1619463-2-vkuznets@redhat.com
[sean: eliminate if-elif pattern just to set a bool true]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Benjamin Poirier says:
====================
selftests: net: More small fixes
Some small fixes for net selftests which follow from these recent commits:
dd2d40acdb ("selftests: bonding: Add more missing config options")
49078c1b80 ("selftests: forwarding: Remove executable bits from lib.sh")
====================
Link: https://lore.kernel.org/r/20240131140848.360618-1-bpoirier@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some scripts are not tests themselves; they contain utility functions used
by other tests. According to Documentation/dev-tools/kselftest.rst, such
files should be listed in TEST_FILES. Currently they are incorrectly listed
in TEST_PROGS_EXTENDED so rename the variable.
Fixes: c085dbfb1c ("selftests/net/forwarding: define libs as TEST_PROGS_EXTENDED")
Suggested-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Link: https://lore.kernel.org/r/20240131140848.360618-6-bpoirier@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some scripts are not tests themselves; they contain utility functions used
by other tests. According to Documentation/dev-tools/kselftest.rst, such
files should be listed in TEST_FILES. Move those utility scripts to
TEST_FILES.
Fixes: 1751eb42dd ("selftests: net: use TEST_PROGS_EXTENDED")
Fixes: 25ae948b44 ("selftests/net: add lib.sh")
Fixes: b99ac18411 ("kselftests/net: add missed setup_loopback.sh/setup_veth.sh to Makefile")
Fixes: f5173fe3e1 ("selftests: net: included needed helper in the install targets")
Suggested-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Link: https://lore.kernel.org/r/20240131140848.360618-5-bpoirier@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
setup_loopback.sh and net_helper.sh are meant to be sourced from other
scripts, not executed directly. Therefore, remove the executable bits from
those files' permissions.
This change is similar to commit 49078c1b80 ("selftests: forwarding:
Remove executable bits from lib.sh")
Fixes: 7d1575014a ("selftests/net: GRO coalesce test")
Fixes: 3bdd9fd29c ("selftests/net: synchronize udpgro tests' tx and rx connection")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Link: https://lore.kernel.org/r/20240131140848.360618-4-bpoirier@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The purpose of the test_LAG_cleanup() function is to check that some
hardware addresses are removed from underlying devices after they have been
unenslaved. The test function simply checks that those addresses are not
present at the end. However, if the addresses were never added to begin
with due to some error in device setup, the test function currently passes.
This is a false positive since in that situation the test did not actually
exercise the intended functionality.
Add a check that the expected addresses are indeed present after device
setup. This makes the test function more robust.
I noticed this problem when running the team/dev_addr_lists.sh test on a
system without support for dummy and ipv6:
tools/testing/selftests/drivers/net/team# ./dev_addr_lists.sh
Error: Unknown device type.
Error: Unknown device type.
This program is not intended to be run as root.
RTNETLINK answers: Operation not supported
TEST: team cleanup mode lacp [ OK ]
Fixes: bbb774d921 ("net: Add tests for bonding and team address list management")
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Link: https://lore.kernel.org/r/20240131140848.360618-3-bpoirier@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Similar to commit dd2d40acdb ("selftests: bonding: Add more missing
config options"), add more networking-specific config options which are
needed for team device tests.
For testing, I used the minimal config generated by virtme-ng and I added
the options in the config file. Afterwards, the team device test passed.
Fixes: bbb774d921 ("net: Add tests for bonding and team address list management")
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Link: https://lore.kernel.org/r/20240131140848.360618-2-bpoirier@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In commit ac50476717 ("hv_netvsc: Disable NAPI before closing the
VMBus channel"), napi_disable was getting called for all channels,
including all subchannels without confirming if they are enabled or not.
This caused hv_netvsc getting hung at napi_disable, when netvsc_probe()
has finished running but nvdev->subchan_work has not started yet.
netvsc_subchan_work() -> rndis_set_subchannel() has not created the
sub-channels and because of that netvsc_sc_open() is not running.
netvsc_remove() calls cancel_work_sync(&nvdev->subchan_work), for which
netvsc_subchan_work did not run.
netif_napi_add() sets the bit NAPI_STATE_SCHED because it ensures NAPI
cannot be scheduled. Then netvsc_sc_open() -> napi_enable will clear the
NAPIF_STATE_SCHED bit, so it can be scheduled. napi_disable() does the
opposite.
Now during netvsc_device_remove(), when napi_disable is called for those
subchannels, napi_disable gets stuck on infinite msleep.
This fix addresses this problem by ensuring that napi_disable() is not
getting called for non-enabled NAPI struct.
But netif_napi_del() is still necessary for these non-enabled NAPI struct
for cleanup purpose.
Call trace:
[ 654.559417] task:modprobe state:D stack: 0 pid: 2321 ppid: 1091 flags:0x00004002
[ 654.568030] Call Trace:
[ 654.571221] <TASK>
[ 654.573790] __schedule+0x2d6/0x960
[ 654.577733] schedule+0x69/0xf0
[ 654.581214] schedule_timeout+0x87/0x140
[ 654.585463] ? __bpf_trace_tick_stop+0x20/0x20
[ 654.590291] msleep+0x2d/0x40
[ 654.593625] napi_disable+0x2b/0x80
[ 654.597437] netvsc_device_remove+0x8a/0x1f0 [hv_netvsc]
[ 654.603935] rndis_filter_device_remove+0x194/0x1c0 [hv_netvsc]
[ 654.611101] ? do_wait_intr+0xb0/0xb0
[ 654.615753] netvsc_remove+0x7c/0x120 [hv_netvsc]
[ 654.621675] vmbus_remove+0x27/0x40 [hv_vmbus]
Cc: stable@vger.kernel.org
Fixes: ac50476717 ("hv_netvsc: Disable NAPI before closing the VMBus channel")
Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/1706686551-28510-1-git-send-email-schakrabarti@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Invoking the make_tx_response() / push_tx_responses() pair with no lock
held would be acceptable only if all such invocations happened from the
same context (NAPI instance or dealloc thread). Since this isn't the
case, and since the interface "spec" also doesn't demand that multicast
operations may only be performed with no in-flight transmits,
MCAST_{ADD,DEL} processing also needs to acquire the response lock
around the invocations.
To prevent similar mistakes going forward, "downgrade" the present
functions to private helpers of just the two remaining ones using them
directly, with no forward declarations anymore. This involves renaming
what so far was make_tx_response(), for the new function of that name
to serve the new (wrapper) purpose.
While there,
- constify the txp parameters,
- correct xenvif_idx_release()'s status parameter's type,
- rename {,_}make_tx_response()'s status parameters for consistency with
xenvif_idx_release()'s.
Fixes: 210c34dcd8 ("xen-netback: add support for multicast control")
Cc: stable@vger.kernel.org
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/980c6c3d-e10e-4459-8565-e8fbde122f00@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The documentation is pointing to the wrong path for the interface.
Documentation is pointing to /sys/class/<iface>, instead of
/sys/class/net/<iface>.
Fix it by adding the `net/` directory before the interface.
Fixes: 1a02ef76ac ("net: sysfs: add documentation entries for /sys/class/<iface>/queues")
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240131102150.728960-2-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull NVMe fixes from Keith:
"nvme fixes for Linux 6.8
- Remove duplicated enums (Guixen)
- Use appropriate controller state accessors (Keith)
- Retryable authentication (Hannes)
- Add missing module descriptions (Chaitanya)
- Fibre-channel fixes for blktests (Daniel)
- Various type correctness updates (Caleb)
- Improve fabrics connection debugging prints (Nitin)
- Passthrough command verbose error logging (Adam)"
* tag 'nvme-6.8-2024-02-01' of git://git.infradead.org/nvme: (31 commits)
nvme: allow passthru cmd error logging
nvme-fc: show hostnqn when connecting to fc target
nvme-rdma: show hostnqn when connecting to rdma target
nvme-tcp: show hostnqn when connecting to tcp target
nvmet-fc: use RCU list iterator for assoc_list
nvmet-fc: take ref count on tgtport before delete assoc
nvmet-fc: avoid deadlock on delete association path
nvmet-fc: abort command when there is no binding
nvmet-fc: do not tack refs on tgtports from assoc
nvmet-fc: remove null hostport pointer check
nvmet-fc: hold reference on hostport match
nvmet-fc: free queue and assoc directly
nvmet-fc: defer cleanup using RCU properly
nvmet-fc: release reference on target port
nvmet-fcloop: swap the list_add_tail arguments
nvme-fc: do not wait in vain when unloading module
nvme-fc: log human-readable opcode on timeout
nvme: split out fabrics version of nvme_opcode_str()
nvme: take const cmd pointer in read-only helpers
nvme: remove redundant status mask
...
Commit d7ac8dca93 ("nvme: quiet user passthrough command errors")
disabled error logging for user passthrough commands. This commit
adds the ability to opt-in to passthrough admin error logging. IO
commands initiated as passthrough will always be logged.
The logging output for passthrough commands (Admin and IO) has been
changed to include CDWXX fields.
nvme0n1: Read(0x2), LBA Out of Range (sct 0x0 / sc 0x80) DNR cdw10=0x0 cdw11=0x1
cdw12=0x70000 cdw13=0x0 cdw14=0x0 cdw15=0x0
Add a helper function nvme_log_err_passthru() which allows us to log
error for passthru commands by decoding cdw10-cdw15 values of nvme
command.
Add a new sysfs attr passthru_err_log_enabled that allows user to conditionally
enable passthrough command logging for either passthrough Admin commands sent to
the controller or passthrough IO commands sent to a namespace.
By default, passthrough error logging is disabled.
To enable passthrough admin error logging:
echo 1 > /sys/class/nvme/nvme0/passthru_err_log_enabled
To disable passthrough admin error logging:
echo 0 > /sys/class/nvme/nvme0/passthru_err_log_enabled
To enable passthrough io error logging:
echo 1 > /sys/class/nvme/nvme0/nvme0n1/passthru_err_log_enabled
To disable passthrough io error logging:
echo 0 > /sys/class/nvme/nvme0/nvme0n1/passthru_err_log_enabled
Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Log hostnqn when connecting to nvme target.
As hostnqn could be changed, logging this information
in syslog at appropriate time may help in troubleshooting.
Signed-off-by: Nitin U. Yewale <nyewale@redhat.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Log hostnqn when connecting to nvme target.
As hostnqn could be changed, logging this information
in syslog at appropriate time may help in troubleshooting.
Signed-off-by: Nitin U. Yewale <nyewale@redhat.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Log hostnqn when connecting to nvme target.
As hostnqn could be changed, logging this information
in syslog at appropriate time may help in troubleshooting.
Signed-off-by: Nitin U. Yewale <nyewale@redhat.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The assoc_list is a RCU protected list, thus use the RCU flavor of list
functions.
Let's use this opportunity and refactor this code and move the lookup
into a helper and give it a descriptive name.
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
We have to ensure that the tgtport is not going away
before be have remove all the associations.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
When deleting an association the shutdown path is deadlocking because we
try to flush the nvmet_wq nested. Avoid this by deadlock by deferring
the put work into its own work item.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
When the target port has not active port binding, there is no point in
trying to process the command as it has to fail anyway. Instead adding
checks to all commands abort the command early.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The association life time is tied to the life time of the target port.
That means we should not take extra a refcount when creating a
association.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
An association has always a valid hostport pointer. Remove useless
null pointer check.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The hostport data structure is shared between the association, this why
we keep track of the users via a refcount. So we should not decrement
the refcount on a match and free the hostport several times.
Reported by KASAN.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Neither struct nvmet_fc_tgt_queue nor struct nvmet_fc_tgt_assoc are data
structure which are used in a RCU context. So there is no reason to
delay the free operation.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
When the target executes a disconnect and the host triggers a reconnect
immediately, the reconnect command still finds an existing association.
The reconnect crashes later on because nvmet_fc_delete_target_assoc
blindly removes resources while the reconnect code wants to use it.
To address this, nvmet_fc_find_target_assoc should not be able to
lookup an association which is being removed. The association list
is already under RCU lifetime management, so let's properly use it
and remove the association from the list and wait for a grace period
before cleaning up all. This means we also can drop the RCU management
on the queues, because this is now handled via the association itself.
A second step split the execution context so that the initial disconnect
command can complete without running the reconnect code in the same
context. As usual, this is done by deferring the ->done to a workqueue.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
In case we return early out of __nvmet_fc_finish_ls_req() we still have
to release the reference on the target port.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The first argument of list_add_tail function is the new element which
should be added to the list which is the second argument. Swap the
arguments to allow processing more than one element at a time.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The module exit path has race between deleting all controllers and
freeing 'left over IDs'. To prevent double free a synchronization
between nvme_delete_ctrl and ida_destroy has been added by the initial
commit.
There is some logic around trying to prevent from hanging forever in
wait_for_completion, though it does not handling all cases. E.g.
blktests is able to reproduce the situation where the module unload
hangs forever.
If we completely rely on the cleanup code executed from the
nvme_delete_ctrl path, all IDs will be freed eventually. This makes
calling ida_destroy unnecessary. We only have to ensure that all
nvme_delete_ctrl code has been executed before we leave
nvme_fc_exit_module. This is done by flushing the nvme_delete_wq
workqueue.
While at it, remove the unused nvme_fc_wq workqueue too.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The eventfs inode had pointers to dentries (and child dentries) without
actually holding a refcount on said pointer. That is fundamentally
broken, and while eventfs tried to then maintain coherence with dentries
going away by hooking into the '.d_iput' callback, that doesn't actually
work since it's not ordered wrt lookups.
There were two reasonms why eventfs tried to keep a pointer to a dentry:
- the creation of a 'events' directory would actually have a stable
dentry pointer that it created with tracefs_start_creating().
And it needed that dentry when tearing it all down again in
eventfs_remove_events_dir().
This use is actually ok, because the special top-level events
directory dentries are actually stable, not just a temporary cache of
the eventfs data structures.
- the 'eventfs_inode' (aka ei) needs to stay around as long as there
are dentries that refer to it.
It then used these dentry pointers as a replacement for doing
reference counting: it would try to make sure that there was only
ever one dentry associated with an event_inode, and keep a child
dentry array around to see which dentries might still refer to the
parent ei.
This gets rid of the invalid dentry pointer use, and renames the one
valid case to a different name to make it clear that it's not just any
random dentry.
The magic child dentry array that is kind of a "reverse reference list"
is simply replaced by having child dentries take a ref to the ei. As
does the directory dentries. That makes the broken use case go away.
Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240131185513.280463000@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: c1504e5102 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
In order for the dentries to stay up-to-date with the eventfs changes,
just add a 'd_revalidate' function that checks the 'is_freed' bit.
Also, clean up the dentry release to actually use d_release() rather
than the slightly odd d_iput() function. We don't care about the inode,
all we want to do is to get rid of the refcount to the eventfs data
added by dentry->d_fsdata.
It would probably be cleaner to make eventfs its own filesystem, or at
least set its own dentry ops when looking up eventfs files. But as it
is, only eventfs dentries use d_fsdata, so we don't really need to split
these things up by use.
Another thing that might be worth doing is to make all eventfs lookups
mark their dentries as not worth caching. We could do that with
d_delete(), but the DCACHE_DONTCACHE flag would likely be even better.
As it is, the dentries are all freeable, but they only tend to get freed
at memory pressure rather than more proactively. But that's a separate
issue.
Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240131185513.124644253@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: c1504e5102 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
For devices with multiple clock sources connected to a selector, we need
to check what a clock selector control request has returned. This is
needed to ensure that a requested clock source is indeed selected and for
autoclock feature to work.
For devices with single clock source connected, if we get an error there
is nothing else we can do about it. We can't skip clock selector setup as
it is required by some devices. So lets just ignore error in this case.
This should fix various buggy Mackie devices:
[ 649.109785] usb 1-1.3: parse_audio_format_rates_v2v3(): unable to find clock source (clock -32)
[ 649.111946] usb 1-1.3: parse_audio_format_rates_v2v3(): unable to find clock source (clock -32)
[ 649.113822] usb 1-1.3: parse_audio_format_rates_v2v3(): unable to find clock source (clock -32)
There is also interesting info from the Windows documentation [1] (this
is probably why manufacturers dont't even test this feature):
"The USB Audio 2.0 driver doesn't support clock selection. The driver
uses the Clock Source Entity, which is selected by default and never
issues a Clock Selector Control SET CUR request."
Link: https://learn.microsoft.com/en-us/windows-hardware/drivers/audio/usb-2-0-audio-drivers [1]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217314
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218175
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218342
Signed-off-by: Alexander Tsoy <alexander@tsoy.me>
Link: https://lore.kernel.org/r/20240201115308.17838-1-alexander@tsoy.me
Signed-off-by: Takashi Iwai <tiwai@suse.de>
XDP queues are created/destroyed when a XDP program
is attached/detached. In current driver xdp_queues are not
getting destroyed on program exit due to incorrect xdp_queue
and tot_tx_queue count values.
This patch fixes the issue by setting tot_tx_queue and xdp_queue
count to correct values. It also fixes xdp.data_hard_start address.
Fixes: 06059a1a9a ("octeontx2-pf: Add XDP support to netdev PF")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Link: https://lore.kernel.org/r/20240130120610.16673-1-gakula@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
If we use IORING_OP_RECV with provided buffers and pass in '0' as the
length of the request, the length is retrieved from the selected buffer.
If MSG_WAITALL is also set and we get a short receive, then we may hit
the retry path which decrements sr->len and increments the buffer for
a retry. However, the length is still zero at this point, which means
that sr->len now becomes huge and import_ubuf() will cap it to
MAX_RW_COUNT and subsequently return -EFAULT for the range as a whole.
Fix this by always assigning sr->len once the buffer has been selected.
Cc: stable@vger.kernel.org
Fixes: 7ba89d2af1 ("io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Check whether the firmware is already patched. If so, include the
firmware version in the firmware file name.
If the firmware has already been patched by the BIOS the driver
can only replace it if it has control of hard RESET.
If the driver cannot replace the firmware, it can still load a wmfw
(for ALSA control definitions) and/or a bin (for additional tunings).
But these must match the version of firmware that is running on the
CS35L56.
The firmware is pre-patched if either:
- FIRMWARE_MISSING == 0, or
- it is a secured CS35L56 (which implies that is was already patched),
cs35l56_hw_init() will set preloaded_fw_ver to the (non-zero)
firmware version if either of these conditions is true.
Normal (unpatched or replaceable firmware):
cs35l56-rev-dsp1-misc[-system_name].[wmfw|bin]
Preloaded firmware:
cs35l56-rev[-s]-VVVVVV-dsp1-misc[-system_name].[wmfw|bin]
Where:
[-s] is an optional -s added into the name for a secured CS35L56
VVVVVV is the 24-bit firmware version in hexadecimal.
Backport note:
This won't apply to kernel versions older than v6.6.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 73cfbfa9ca ("ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier")
Link: https://msgid.link/r/20240129162737.497-18-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Change the filename field layout to:
cs35l56-rev[-s]-dsp1-misc[-sub].[wmfw|bin]
This is to keep the same firmware file naming scheme as the
CS35L56 ASoC driver.
This is not a compatibility break because no firmware files have
been published.
The original field layout matched the ASoC driver, but the way the
ASoC driver used the wm_adsp driver config to form this filename
was bugged. Fixing the ASoC driver to use the correct wm_adsp config
strings means that the 's' flag (to indicate a secured part) has to
move to somewhere after the first '-'.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 73cfbfa9ca ("ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier")
Link: https://msgid.link/r/20240129162737.497-17-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Check for the cases of system-specific bin file without a
wmfw before falling back to looking for a generic wmfw.
All system-specific options should be tried before falling
back to loading a generic wmfw/bin. With the original code,
the presence of a fallback generic wmfw on the filesystem
would prevent using a system-specific tuning with a ROM
firmware.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 73cfbfa9ca ("ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier")
Link: https://msgid.link/r/20240129162737.497-16-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
If the "spk-id-gpios" property is present it points to GPIOs whose
value must be used to select the correct bin file to match the
speakers.
Some manufacturers use multiple sources of speakers, which need
different tunings for best performance. On these models the type of
speaker fitted is indicated by the values of one or more GPIOs. The
number formed by the GPIOs identifies the tuning required.
The speaker ID must be used in combination with the subsystem ID
(either from PCI SSID or cirrus,firmware-uid property), because the
GPIOs can only indicate variants of a specific model.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 1a1c3d794e ("ASoC: cs35l56: Use PCI SSID as the firmware UID")
Link: https://msgid.link/r/20240129162737.497-14-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Check during initialization whether the firmware is already patched.
If so, include the firmware version in the wm_adsp fwf_name string.
If the firmware has already been patched by the BIOS the driver
can only replace it if it has control of hard RESET.
If the driver cannot replace the firmware, it can still load a wmfw
(for ALSA control definitions) and/or a bin (for additional tunings).
But these must match the version of firmware that is running on the
CS35L56.
The firmware is pre-patched if FIRMWARE_MISSING == 0.
Including the firmware version in the fwf_name string will
qualify the firmware file name:
Normal (unpatched or replaceable firmware):
cs35l56-rev-dsp1-misc[-system_name].[wmfw|bin]
Preloaded firmware:
cs35l56-rev[-s]-VVVVVV-dsp1-misc[-system_name].[wmfw|bin]
Where:
[-s] is an optional -s added into the name for a secured CS35L56
VVVVVV is the 24-bit firmware version in hexadecimal.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 608f1b0dbd ("ASoC: cs35l56: Move DSP part string generation so that it is done only once")
Link: https://msgid.link/r/20240129162737.497-13-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Put the silicon revision and secured flag in the wm_adsp fwf_name
string instead of including them in the part string.
This changes the format of the firmware name string from
cs35l56[s]-rev-misc[-system_name]
to
cs35l56-rev[-s]-misc[-system_name]
No firmware files have been published, so this doesn't cause a
compatibility break.
Silicon revision and secured flag are included in the firmware
filename to pick a firmware compatible with the part. These strings
were being added to the part string, but that is a misuse of the
string. The correct place for these is the fwf_name string, which
is specifically intended to select between multiple firmware files
for the same part.
Backport note:
This won't apply to kernels older than v6.6.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 608f1b0dbd ("ASoC: cs35l56: Move DSP part string generation so that it is done only once")
Link: https://msgid.link/r/20240129162737.497-12-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Defer initializing the state of the ASP1 mixer registers until
the firmware has been downloaded and rebooted.
On a SoundWire system the ASP is free for use as a chip-to-chip
interconnect. This can be either for the firmware on multiple
CS35L56 to share reference audio; or as a bridge to another
device. If it is a firmware interconnect it is owned by the
firmware and the Linux driver should avoid writing the registers.
However, if it is a bridge then Linux may take over and handle
it as a normal codec-to-codec link. Even if the ASP is used
as a firmware-firmware interconnect it is useful to have
ALSA controls for the ASP mixer. They are at least useful for
debugging.
CS35L56 is designed for SDCA and a generic SDCA driver would
know nothing about these chip-specific registers. So if the
ASP is being used on a SoundWire system the firmware sets up the
ASP mixer registers. This means that we can't assume the default
state of these registers. But we don't know the initial state
that the firmware set them to until after the firmware has been
downloaded and booted, which can take several seconds when
downloading multiple amps.
DAPM normally reads the initial state of mux registers during
probe() but this would mean blocking probe() for several seconds
until the firmware has initialized them. To avoid this, the
mixer muxes are set SND_SOC_NOPM to prevent DAPM trying to read
the register state. Custom get/set callbacks are implemented for
ALSA control access, and these can safely block waiting for the
firmware download.
After the firmware download has completed, the state of the
mux registers is known so a work job is queued to call
snd_soc_dapm_mux_update_power() on each of the mux widgets.
Backport note:
This won't apply cleanly to kernels older than v6.6.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: e496112529 ("ASoC: cs35l56: Add driver for Cirrus Logic CS35L56")
Link: https://msgid.link/r/20240129162737.497-11-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Add ASP1_FRAME_CONTROL1, ASP1_FRAME_CONTROL5 and the ASP1_TX?_INPUT
registers to the sequence used to initialize the ASP configuration.
Write this sequence to the cache and directly to the registers to
ensure that they match.
A system-specific firmware can patch these registers to values that are
not the silicon default, so that the CS35L56 boots already in the
configuration used by Windows or by "driverless" Windows setups such
as factory tuning.
These may not match how Linux is configuring the HDA codec. And anyway
on Linux the ALSA controls are used to configure routing options.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 73cfbfa9ca ("ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier")
Link: https://msgid.link/r/20240129162737.497-10-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Patch the SDW TX mixer registers to silicon defaults.
CS35L56 is designed for SDCA and a generic SDCA driver would
know nothing about these chip-specific registers. So the
firmware sets up the SDW TX mixer registers to whatever audio
is relevant on a specific system.
This means that the driver cannot assume the initial values
of these registers. But Linux has ALSA controls to configure
routing, so the registers can be patched to silicon default and
the ALSA controls used to select what audio to feed back to the
host capture path.
Backport note:
This won't apply to kernels older than v6.6.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: e496112529 ("ASoC: cs35l56: Add driver for Cirrus Logic CS35L56")
Link: https://msgid.link/r/20240129162737.497-9-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Add a dummy SUPPLY widget connected to the ASP that forces the
chip registers to match the regmap cache when the ASP is
powered-up.
On a SoundWire system the ASP is free for use as a chip-to-chip
interconnect. This can be either for the firmware on multiple
CS35L56 to share reference audio; or as a bridge to another
device. If it is a firmware interconnect it is owned by the
firmware and the Linux driver should avoid writing the registers.
However. If it is a bridge then Linux may take over and handle
it as a normal codec-to-codec link.
CS35L56 is designed for SDCA and a generic SDCA driver would
know nothing about these chip-specific registers. So if the
ASP is being used on a SoundWire system the firmware sets up the
ASP registers. This means that we can't assume the default
state of the ASP registers. But we don't know the initial state
that the firmware set them to until after the firmware has been
downloaded and booted, which can take several seconds when
downloading multiple amps.
To avoid blocking probe() for several seconds waiting for the
firmware, the silicon defaults are assumed. This allows the machine
driver to setup the ASP configuration during probe() without being
blocked. If the ASP is hooked up and used, the SUPPLY widget
ensures that the chip registers match what was configured in the
regmap cache.
If the machine driver does not hook up the ASP, it is assumed that
it won't call any functions to configure the ASP DAI. Therefore
the regmap cache will be clean for these registers so a
regcache_sync() will not overwrite the chip registers. If the
DAI is not hooked up, the dummy SUPPLY widget will not be
invoked so it will never force-overwrite the chip registers.
Backport note:
This won't apply cleanly to kernels older than v6.6.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: e496112529 ("ASoC: cs35l56: Add driver for Cirrus Logic CS35L56")
Link: https://msgid.link/r/20240129162737.497-8-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Remove the check of fw_patched from cs35l56_is_fw_reload_needed().
Also remove the redundant check for control of the reset GPIO.
The fw_patched flag is set when cs35l56_dsp_work() has completed its
steps to download firmware and power-up wm_adsp. There was a check in
cs35l56_is_fw_reload_needed() to make a quick exit of 'false' if
!fw_patched. The original idea was that the system might be suspended
before the driver has ever made any attempt to download firmware, and
in that case the driver doesn't need to return to a patched state
because it was never in a patched state.
This check of fw_patched is buggy because it prevented ever recovering
from a failed patch. If a previous attempt to patch and reboot the
silicon had failed it would leave fw_patched==false. This would mean
the driver never attempted another download even though the fault may
have been cleared (by a hard reset, for example).
It is also a redundant check because the calling code already makes
a quick exit if cs35l56_component_probe() has not been called, which
deals with the original intent of this check but in a safer way.
The check for reset GPIO is redundant: if the silicon was hard-reset
the FIRMWARE_MISSING flag will be 1. But this check created an
expectation that the suspend/resume code toggles reset. This can't
easily be protected against accidental code breakage. The only reason
for the check was to skip runtime-resuming the driver to read the
PROTECTION_STATUS register when it already knows it reset the silicon.
But in that case the driver will have to be runtime-resumed to do
the firmware download. So it created an assumption for no benefit.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 8a731fd37f ("ASoC: cs35l56: Move utility functions to shared file")
Link: https://msgid.link/r/20240129162737.497-7-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Move the call to cs35l56_set_patch() earlier in cs35l56_init() so
that it only adds the register patch on first-time initialization.
The call was after the post_soft_reset label, so every time this
function was run to re-initialize the hardware after a reset it would
call regmap_register_patch() and add the same reg_sequence again.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 898673b905 ("ASoC: cs35l56: Move shared data into a common data structure")
Link: https://msgid.link/r/20240129162737.497-6-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The cs35l56->component pointer is used by the suspend-resume handling to
know whether the driver is fully instantiated. This is to prevent it
queuing dsp_work which would result in calling wm_adsp when the driver
is not an instantiated ASoC component. So this pointer must be cleared
by cs35l56_component_remove().
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: e496112529 ("ASoC: cs35l56: Add driver for Cirrus Logic CS35L56")
Link: https://msgid.link/r/20240129162737.497-4-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
There's no need to overwrite fwf_name with a kstrdup() of the cs_dsp part
name. It is trivial to select either fwf_name or cs_dsp.part as the string
to use when building the filename in wm_adsp_request_firmware_file().
This leaves fwf_name entirely owned by the codec driver.
It also avoids problems with freeing the pointer. With the original code
fwf_name was either a pointer owned by the codec driver, or a kstrdup()
created by wm_adsp. This meant wm_adsp must free it if it set it, but not
if the codec driver set it. The code was handling this by using
devm_kstrdup().
But there is no absolute requirement that wm_adsp_common_init() must be
called from probe(), so this was a pseudo-memory leak - each new call to
wm_adsp_common_init() would allocate another block of memory but these
would only be freed if the owning codec driver was removed.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://msgid.link/r/20240129162737.497-3-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Check for the cases of system-specific bin file without a
wmfw before falling back to looking for a generic wmfw.
All system-specific options should be tried before falling
back to loading a generic wmfw/bin. With the original code,
the presence of a fallback generic wmfw on the filesystem
would prevent using a system-specific tuning with a ROM
firmware.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 0e7d82cbea ("ASoC: wm_adsp: Add support for loading bin files without wmfw")
Link: https://msgid.link/r/20240129162737.497-2-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
ASoC: Fixes for v6.8
Quite a lot of fixes that came in since the merge window, a large
portion for for Qualcomm and ES8326.
The 8 DAI support for Qualcomm is just raising a constant to allow for
devies that otherwise only need DTs, and there's a few other device ID
updates for sunxi (Allwinner) and AMD platforms.
Since commit 363d0e5628 ("media: pwm-ir-tx: Trigger edges from
hrtimer interrupt context"), pwm-ir-tx uses high resolution timers
for IR signal generation when the pwm can be used from atomic context.
Ensure they are available.
Fixes: 363d0e5628 ("media: pwm-ir-tx: Trigger edges from hrtimer interrupt context")
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
When irtoy_command fails, buf should be freed since it is allocated by
irtoy_tx, or there is a memleak.
Fixes: 4114978dcd ("media: ir_toy: prevent device from hanging during transmit")
Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
The atomisp driver emulates a standard v4l2 device, which also works
for non media-controller aware applications.
Part of this requires making try_fmt calls on the sensor when
a normal v4l2 app is making try_fmt calls on the /dev/video# mode.
With the recent v4l2_subdev_state handling changes in 6.8 this no longer
works, fixing this requires 2 changes:
1. The atomisp code was using its own internal v4l2_subdev_pad_config
for this. Replace the internal v4l2_subdev_pad_config with allocating
a full v4l2_subdev_state for storing the full try_fmt state.
2. The paths actually setting the fmt or crop selection now need to be
passed the v4l2_subdev's active state, so that sensor drivers which
are using the v4l2_subdev's active state to store their state keep
working.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
This fixes warnings in xe, i915 hwmon docs:
Warning: /sys/devices/.../hwmon/hwmon<i>/curr1_crit is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:35 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:52
Warning: /sys/devices/.../hwmon/hwmon<i>/energy1_input is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:54 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:65
Warning: /sys/devices/.../hwmon/hwmon<i>/in0_input is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:46 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:0
Warning: /sys/devices/.../hwmon/hwmon<i>/power1_crit is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:22 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:39
Warning: /sys/devices/.../hwmon/hwmon<i>/power1_max is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:0 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:8
Warning: /sys/devices/.../hwmon/hwmon<i>/power1_max_interval is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:62 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:30
Warning: /sys/devices/.../hwmon/hwmon<i>/power1_rated_max is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:14 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:22
Use a path containing the driver name to differentiate the documentation
of each entry.
Fixes: fb1b70607f ("drm/xe/hwmon: Expose power attributes")
Fixes: 92d44a422d ("drm/xe/hwmon: Expose card reactive critical power")
Fixes: fbcdc9d3bf ("drm/xe/hwmon: Expose input voltage attribute")
Fixes: 71d0a32524 ("drm/xe/hwmon: Expose hwmon energy attribute")
Fixes: 4446fcf220 ("drm/xe/hwmon: Expose power1_max_interval")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/all/20240125113345.291118ff@canb.auug.org.au/
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240127165040.2348009-1-badal.nilawar@intel.com
(cherry picked from commit 20485e3a81)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
The driver requests the interrupts as IRQF_SHARED, so the interrupt
handlers can be called at any time. If such a call happens while the ISP
is powered down, the SoC will hang as the driver tries to access the
ISP registers.
This can be reproduced even without the platform sharing the IRQ line:
Enable CONFIG_DEBUG_SHIRQ and unload the driver, and the board will
hang.
Fix this by adding a new field, 'irqs_enabled', which is used to bail
out from the interrupt handler when the ISP is not operational.
Link: https://lore.kernel.org/r/20231218-rkisp-shirq-fix-v1-2-173007628248@ideasonboard.com
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Paolo Abeni says:
====================
selftests: net: a few pmtu.sh fixes
This series try to address CI failures for the pmtu.sh tests. It
does _not_ attempt to enable all the currently skipped cases, to
avoid adding more entropy.
Tested with:
make -C tools/testing/selftests/ TARGETS=net install
vng --build --config tools/testing/selftests/net/config
vng --run . --user root -- \
./tools/testing/selftests/kselftest_install/run_kselftest.sh \
-t net:pmtu.sh
====================
Link: https://lore.kernel.org/r/cover.1706635101.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When running the pmtu.sh via the kselftest infra, accessing
/dev/stdout gives unexpected results:
# dd: failed to open '/dev/stdout': Device or resource busy
# TEST: IPv4, bridged vxlan4: PMTU exceptions [FAIL]
Let dd use directly the standard output to fix the above:
# TEST: IPv4, bridged vxlan4: PMTU exceptions - nexthop objects [ OK ]
Fixes: 136a1b434b ("selftests: net: test vxlan pmtu exceptions with tcp")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/23d7592c5d77d75cff9b34f15c227f92e911c2ae.1706635101.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The pmtu.sh test tries to detect the tunnel protocols available
in the running kernel and properly skip the unsupported cases.
In a few more complex setup, such detection is unsuccessful, as
the script currently ignores some intermediate error code at
setup time.
Before:
# which: no nettest in (/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin)
# TEST: vti6: PMTU exceptions (ESP-in-UDP) [FAIL]
# PMTU exception wasn't created after creating tunnel exceeding link layer MTU
# ./pmtu.sh: line 931: kill: (7543) - No such process
# ./pmtu.sh: line 931: kill: (7544) - No such process
After:
# xfrm4 not supported
# TEST: vti4: PMTU exceptions [SKIP]
Fixes: ece1278a9b ("selftests: net: add ESP-in-UDP PMTU test")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/cab10e75fda618e6fff8c595b632f47db58b9309.1706635101.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The mentioned test uses a few Kconfig still missing the
net config, add them.
Before:
# Error: Specified qdisc kind is unknown.
# Error: Specified qdisc kind is unknown.
# Error: Qdisc not classful.
# We have an error talking to the kernel
# Error: Qdisc not classful.
# We have an error talking to the kernel
# policy_routing not supported
# TEST: ICMPv4 with DSCP and ECN: PMTU exceptions [SKIP]
After:
# TEST: ICMPv4 with DSCP and ECN: PMTU exceptions [ OK ]
Fixes: ec730c3e1f ("selftest: net: Test IPv4 PMTU exceptions with DSCP and ECN")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/8d27bf6762a5c7b3acc457d6e6872c533040f9c1.1706635101.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Brett Creeley says:
====================
pds_core: Various fixes
This series includes the following changes:
There can be many users of the pds_core's adminq. This includes
pds_core's uses and any clients that depend on it. When the pds_core
device goes through a reset for any reason the adminq is freed
and reconfigured. There are some gaps in the current implementation
that will cause crashes during reset if any of the previously mentioned
users of the adminq attempt to use it after it's been freed.
Issues around how resets are handled, specifically regarding the driver's
error handlers.
Originally these patches were aimed at net-next, but it was requested to
push the fixes patches to net. The original patches can be found here:
https://lore.kernel.org/netdev/20240126174255.17052-1-brett.creeley@amd.com/
Also, the Reviewed-by tags were left in place from net-next reviews as the
patches didn't change.
====================
Link: https://lore.kernel.org/r/20240129234035.69802-1-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently the teardown/setup flow for driver probe/remove is quite
a bit different from the reset flows in pdsc_fw_down()/pdsc_fw_up().
One key piece that's missing are the calls to pci_alloc_irq_vectors()
and pci_free_irq_vectors(). The pcie reset case is calling
pci_free_irq_vectors() on reset_prepare, but not calling the
corresponding pci_alloc_irq_vectors() on reset_done. This is causing
unexpected/unwanted interrupt behavior due to the adminq interrupt
being accidentally put into legacy interrupt mode. Also, the
pci_alloc_irq_vectors()/pci_free_irq_vectors() functions are being
called directly in probe/remove respectively.
Fix this inconsistency by making the following changes:
1. Always call pdsc_dev_init() in pdsc_setup(), which calls
pci_alloc_irq_vectors() and get rid of the now unused
pds_dev_reinit().
2. Always free/clear the pdsc->intr_info in pdsc_teardown()
since this structure will get re-alloced in pdsc_setup().
3. Move the calls of pci_free_irq_vectors() to pdsc_teardown()
since pci_alloc_irq_vectors() will always be called in
pdsc_setup()->pdsc_dev_init() for both the probe/remove and
reset flows.
4. Make sure to only create the debugfs "identity" entry when it
doesn't already exist, which it will in the reset case because
it's already been created in the initial call to pdsc_dev_init().
Fixes: ffa5585833 ("pds_core: implement pci reset handlers")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240129234035.69802-7-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are multiple paths that can result in using the pdsc's
adminq.
[1] pdsc_adminq_isr and the resulting work from queue_work(),
i.e. pdsc_work_thread()->pdsc_process_adminq()
[2] pdsc_adminq_post()
When the device goes through reset via PCIe reset and/or
a fw_down/fw_up cycle due to bad PCIe state or bad device
state the adminq is destroyed and recreated.
A NULL pointer dereference can happen if [1] or [2] happens
after the adminq is already destroyed.
In order to fix this, add some further state checks and
implement reference counting for adminq uses. Reference
counting was used because multiple threads can attempt to
access the adminq at the same time via [1] or [2]. Additionally,
multiple clients (i.e. pds-vfio-pci) can be using [2]
at the same time.
The adminq_refcnt is initialized to 1 when the adminq has been
allocated and is ready to use. Users/clients of the adminq
(i.e. [1] and [2]) will increment the refcnt when they are using
the adminq. When the driver goes into a fw_down cycle it will
set the PDSC_S_FW_DEAD bit and then wait for the adminq_refcnt
to hit 1. Setting the PDSC_S_FW_DEAD before waiting will prevent
any further adminq_refcnt increments. Waiting for the
adminq_refcnt to hit 1 allows for any current users of the adminq
to finish before the driver frees the adminq. Once the
adminq_refcnt hits 1 the driver clears the refcnt to signify that
the adminq is deleted and cannot be used. On the fw_up cycle the
driver will once again initialize the adminq_refcnt to 1 allowing
the adminq to be used again.
Fixes: 01ba61b55b ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-5-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The initial design for the adminq interrupt was done based
on client drivers having their own adminq and adminq
interrupt. So, each client driver's adminq isr would use
their specific adminqcq for the private data struct. For the
time being the design has changed to only use a single
adminq for all clients. So, instead use the struct pdsc for
the private data to simplify things a bit.
This also has the benefit of not dereferencing the adminqcq
to access the pdsc struct when the PDSC_S_STOPPING_DRIVER bit
is set and the adminqcq has actually been cleared/freed.
Fixes: 01ba61b55b ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-4-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a small window where pdsc_work_thread()
calls pdsc_process_adminq() and pdsc_process_adminq()
passes the PDSC_S_STOPPING_DRIVER check and starts
to process adminq/notifyq work and then the driver
starts a fw_down cycle. This could cause some
undefined behavior if the notifyqcq/adminqcq are
free'd while pdsc_process_adminq() is running. Use
cancel_work_sync() on the adminqcq's work struct
to make sure any pending work items are cancelled
and any in progress work items are completed.
Also, make sure to not call cancel_work_sync() if
the work item has not be initialized. Without this,
traces will happen in cases where a reset fails and
teardown is called again or if reset fails and the
driver is removed.
Fixes: 01ba61b55b ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-3-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The PCIe reset handlers can run at the same time as the
health thread. This can cause the health thread to
stomp on the PCIe reset. Fix this by preventing the
health thread from running while a PCIe reset is happening.
As part of this use timer_shutdown_sync() during reset and
remove to make sure the timer doesn't ever get rearmed.
Fixes: ffa5585833 ("pds_core: implement pci reset handlers")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-2-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The fc transport logs the opcode and fctype on command timeout.
This is sufficient information to identify the command issued,
but not very human-readable. Use the nvme_fabrics_opcode_str()
helper to also log the name of the command, as rdma and tcp already do.
Signed-off-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
nvme_opcode_str() currently supports admin, IO, and fabrics commands.
However, fabrics commands aren't allowed for the pci transport.
Currently the pci caller passes 0 as the fctype,
which means any fabrics command would be displayed as "Property Set".
Move fabrics command support into a function nvme_fabrics_opcode_str()
and remove the fctype argument to nvme_opcode_str().
This way, a fabrics command will display as "Unknown" for pci.
Convert the rdma and tcp transports to use nvme_fabrics_opcode_str().
Signed-off-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Currently, the test is racy and seems to not pass anymore.
In order to rectify it, aim on TCP_TW_RST.
Doesn't seem way too good with this sleep() part, but it seems as
a reasonable compromise for the test. There is a plan in-line comment on
how-to improve it, going to do it on the top, at this moment I want it
to run on netdev/patchwork selftests dashboard.
It also slightly changes tcp_ao-lib in order to get SO_ERROR propagated
to test_client_verify() return value.
Fixes: c6df7b2361 ("selftests/net: Add TCP-AO RST test")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20240130-tcp-ao-test-key-mgmt-v2-3-d190430a6c60@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As the names of (struct test_key) members didn't reflect whether the key
was used for TX or RX, the verification for the counters was done
incorrectly for asymmetrical selftests.
Rename these with _tx appendix and fix checks in verify_counters().
While at it, as the checks are now correct, introduce skip_counters_checks,
which is intended for tests where it's expected that a key that was set
with setsockopt(sk, IPPROTO_TCP, TCP_AO_INFO, ...) might had no chance
of getting used on the wire.
Fixes the following failures, exposed by the previous commit:
> not ok 51 server: Check current != rnext keys set before connect(): Counter pkt_good was expected to increase 0 => 0 for key 132:5
> not ok 52 server: Check current != rnext keys set before connect(): Counter pkt_good was not expected to increase 0 => 21 for key 137:10
>
> not ok 63 server: Check current flapping back on peer's RnextKey request: Counter pkt_good was expected to increase 0 => 0 for key 132:5
> not ok 64 server: Check current flapping back on peer's RnextKey request: Counter pkt_good was not expected to increase 0 => 40 for key 137:10
Cc: Mohammad Nassiri <mnassiri@ciena.com>
Fixes: 3c3ead5556 ("selftests/net: Add TCP-AO key-management test")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20240130-tcp-ao-test-key-mgmt-v2-2-d190430a6c60@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The end_server() function only operates in the server thread
and always takes an accept socket instead of a listen socket as
its input argument. To align with this, invert the boolean values
used when calling verify_counters() within the end_server() function.
As a result of this typo, the test didn't correctly check for
the non-symmetrical scenario, where i.e. peer-A uses a key <100:200>
to send data, but peer-B uses another key <105:205> to send its data.
So, in simple words, different keys for TX and RX.
Fixes: 3c3ead5556 ("selftests/net: Add TCP-AO key-management test")
Signed-off-by: Mohammad Nassiri <mnassiri@ciena.com>
Link: https://lore.kernel.org/all/934627c5-eebb-4626-be23-cfb134c01d1a@arista.com/
[amended 'Fixes' tag, added the issue description and carried-over to lkml]
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20240130-tcp-ao-test-key-mgmt-v2-1-d190430a6c60@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
nvme_is_fabrics() and nvme_is_write() only read struct nvme_command,
so take it by const pointer. This allows callers to pass a const pointer
and communicates that these functions don't modify the command.
Signed-off-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
In nvme_get_error_status_str(), the status code is already masked
with 0x7ff at the beginning of the function.
Don't bother masking it again when indexing nvme_statuses.
Signed-off-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The functions in drivers/nvme/host/constants.c returning human-readable
status and opcode strings currently use type "const unsigned char *".
Typically string constants use type "const char *",
so remove "unsigned" from the return types.
This is a purely cosmetic change to clarify that the functions
return text strings instead of an array of bytes, for example.
Signed-off-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Add MODULE_DESCRIPTION() in order to remove warnings & get clean build:-
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvme/common/nvme-auth.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvme/common/nvme-keyring.o
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Authentication commands might trigger a lengthy computation on the
controller or even a callout to an external entity.
In these cases the controller might return a status without the DNR
bit set, indicating that the command should be retried.
This patch enables retries for authentication commands by setting
NVME_SUBMIT_RETRY for __nvme_submit_sync_cmd().
Reported-by: Martin George <marting@netapp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Combine the two arguments 'flags' and 'at_head' from __nvme_submit_sync_cmd()
into a single 'flags' argument and use function-specific values to indicate
what should be set within the function.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
No point in having macros just for a single function nvme_auth_submit().
Open-code them into the caller.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Not all mv88e6xxx device support C45 read/write operations. Those
which do not return -EOPNOTSUPP. However, when phylib scans the bus,
it considers this fatal, and the probe of the MDIO bus fails, which in
term causes the mv88e6xxx probe as a whole to fail.
When there is no device on the bus for a given address, the pull up
resistor on the data line results in the read returning 0xffff. The
phylib core code understands this when scanning for devices on the
bus. C45 allows multiple devices to be supported at one address, so
phylib will perform a few reads at each address, so although thought
not the most efficient solution, it is a way to avoid fatal
errors. Make use of this as a minimal fix for stable to fix the
probing problems.
Follow up patches will rework how C45 operates to make it similar to
C22 which considers -ENODEV as a none-fatal, and swap mv88e6xxx to
using this.
Cc: stable@vger.kernel.org
Fixes: 743a19e38d ("net: dsa: mv88e6xxx: Separate C22 and C45 transactions")
Reported-by: Tim Menninger <tmenninger@purestorage.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240129224948.1531452-1-andrew@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Following a successful cifs_tree_connect, we have the code
to scale up/down the number of channels in the session.
However, it is not protected by a lock today.
As a result, this code can be executed by several processes
that select the same channel. The core functions handle this
well, as they pick chan_lock. However, we've seen cases where
smb2_reconnect throws some warnings.
To fix that, this change introduces a flags bitmap inside the
cifs_ses structure. A new flag type is used to ensure that
only one process enters this section at any time.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The error message buffer overflow 'dc->links' 12 <= 12 suggests that the
code is trying to access an element of the dc->links array that is
beyond its bounds. In C, arrays are zero-indexed, so an array with 12
elements has valid indices from 0 to 11. Trying to access dc->links[12]
would be an attempt to access the 13th element of a 12-element array,
which is a buffer overflow.
To fix this, ensure that the loop does not go beyond the last valid
index when accessing dc->links[i + 1] by subtracting 1 from the loop
condition.
This would ensure that i + 1 is always a valid index in the array.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_dpia_bw.c:208 get_host_router_total_dp_tunnel_bw() error: buffer overflow 'dc->links' 12 <= 12
Fixes: 59f1622a5f ("drm/amd/display: Add dpia display mode validation logic")
Cc: PeiChen Huang <peichen.huang@amd.com>
Cc: Aric Cyr <aric.cyr@amd.com>
Cc: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix the warning info below during mode1 reset.
[ +0.000004] Call Trace:
[ +0.000004] <TASK>
[ +0.000006] ? show_regs+0x6e/0x80
[ +0.000011] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000005] ? __warn+0x91/0x150
[ +0.000009] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000006] ? report_bug+0x19d/0x1b0
[ +0.000013] ? handle_bug+0x46/0x80
[ +0.000012] ? exc_invalid_op+0x1d/0x80
[ +0.000011] ? asm_exc_invalid_op+0x1f/0x30
[ +0.000014] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000007] ? __flush_work.isra.0+0x208/0x390
[ +0.000007] ? _prb_read_valid+0x216/0x290
[ +0.000008] __cancel_work_timer+0x11d/0x1a0
[ +0.000007] ? try_to_grab_pending+0xe8/0x190
[ +0.000012] cancel_work_sync+0x14/0x20
[ +0.000008] amddrm_sched_stop+0x3c/0x1d0 [amd_sched]
[ +0.000032] amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu]
This warning info was printed after applying the patch
"drm/sched: Convert drm scheduler to use a work queue rather than kthread".
The root cause is that amdgpu driver tries to use the uninitialized
work_struct in the struct drm_gpu_scheduler
v2:
- Rename the function to amdgpu_ring_sched_ready and move it to
amdgpu_ring.c (Alex)
v3:
- Fix a few more checks based on Vitaly's patch (Alex)
v4:
- squash in fix noticed by Bert in
https://gitlab.freedesktop.org/drm/amd/-/issues/3139
Fixes: 11b3b9f461 ("drm/sched: Check scheduler ready before calling timeout handling")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
On 14us for exit latency time causes underflow for 8K monitor with HDR on.
Increasing the latency to 28us fixes the underflow.
[How]
Increase the latency to 28us. This workaround should be sufficient
before we figure out why SR exit so long.
Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Nicholas Susanto <nicholas.susanto@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
Originally, PMFW said min FCLK is 300Mhz, but min DCFCLK can be increased
to 400Mhz because min FCLK is now 600Mhz so FCLK >= 1.5 * DCFCLK hardware
requirement will still be satisfied. Increasing min DCFCLK addresses
underflow issues (underflow occurs when phantom pipe is turned on for some
Sub-Viewport configs).
[how]
Increasing DCFCLK by raising the min_dcfclk_mhz
Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com>
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Sohaib Nadeem <sohaib.nadeem@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Partial migration to system memory should use migrate.addr, not
prange->start as virtual address to allocate system memory page.
Fixes: a546a27684 ("drm/amdkfd: Use partial migrations/mapping for GPU/CPU page faults in SVM")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
- Disallow families other than NFPROTO_{IPV4,IPV6,INET}.
- Disallow layer 4 protocol with no ports, since destination port is a
mandatory attribute for this object.
Fixes: 857b46027d ("netfilter: nft_ct: add ct expectations support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Module reference is bumped for each user, this should not ever happen.
But BUG_ON check should use rcu_access_pointer() instead.
If this ever happens, do WARN_ON_ONCE() instead of BUG_ON() and
consolidate pointer check under the rcu read side lock section.
Fixes: fab4085f4e ("netfilter: log: nf_log_packet() as real unified interface")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The patch "netfilter: ipset: fix race condition between swap/destroy
and kernel side add/del/test", commit 28628fa9 fixes a race condition.
But the synchronize_rcu() added to the swap function unnecessarily slows
it down: it can safely be moved to destroy and use call_rcu() instead.
Eric Dumazet pointed out that simply calling the destroy functions as
rcu callback does not work: sets with timeout use garbage collectors
which need cancelling at destroy which can wait. Therefore the destroy
functions are split into two: cancelling garbage collectors safely at
executing the command received by netlink and moving the remaining
part only into the rcu callback.
Link: https://lore.kernel.org/lkml/C0829B10-EAA6-4809-874E-E1E9C05A8D84@automattic.com/
Fixes: 28628fa952 ("netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test")
Reported-by: Ale Crismani <ale.crismani@automattic.com>
Reported-by: David Wang <00107082@163.com>
Tested-by: David Wang <00107082@163.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The annotation says in sctp_new(): "If it is a shutdown ack OOTB packet, we
expect a return shutdown complete, otherwise an ABORT Sec 8.4 (5) and (8)".
However, it does not check SCTP_CID_SHUTDOWN_ACK before setting vtag[REPLY]
in the conntrack entry(ct).
Because of that, if the ct in Router disappears for some reason in [1]
with the packet sequence like below:
Client > Server: sctp (1) [INIT] [init tag: 3201533963]
Server > Client: sctp (1) [INIT ACK] [init tag: 972498433]
Client > Server: sctp (1) [COOKIE ECHO]
Server > Client: sctp (1) [COOKIE ACK]
Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057809]
Server > Client: sctp (1) [SACK] [cum ack 3075057809]
Server > Client: sctp (1) [HB REQ]
(the ct in Router disappears somehow) <-------- [1]
Client > Server: sctp (1) [HB ACK]
Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057810]
Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057810]
Client > Server: sctp (1) [HB REQ]
Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057810]
Client > Server: sctp (1) [HB REQ]
Client > Server: sctp (1) [ABORT]
when processing HB ACK packet in Router it calls sctp_new() to initialize
the new ct with vtag[REPLY] set to HB_ACK packet's vtag.
Later when sending DATA from Client, all the SACKs from Server will get
dropped in Router, as the SACK packet's vtag does not match vtag[REPLY]
in the ct. The worst thing is the vtag in this ct will never get fixed
by the upcoming packets from Server.
This patch fixes it by checking SCTP_CID_SHUTDOWN_ACK before setting
vtag[REPLY] in the ct in sctp_new() as the annotation says. With this
fix, it will leave vtag[REPLY] in ct to 0 in the case above, and the
next HB REQ/ACK from Server is able to fix the vtag as its value is 0
in nf_conntrack_sctp_packet().
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In (e)poll mode, threads often depend on I/O events to determine when
data is ready for consumption. Within binder, a thread may initiate a
command via BINDER_WRITE_READ without a read buffer and then make use
of epoll_wait() or similar to consume any responses afterwards.
It is then crucial that epoll threads are signaled via wakeup when they
queue their own work. Otherwise, they risk waiting indefinitely for an
event leaving their work unhandled. What is worse, subsequent commands
won't trigger a wakeup either as the thread has pending work.
Fixes: 457b9a6f09 ("Staging: android: add binder driver")
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Martijn Coenen <maco@android.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Steven Moreland <smoreland@google.com>
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20240131215347.1808751-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bail out on using the tunnel dst template from other than netdev family.
Add the infrastructure to check for the family in objects.
Fixes: af308b94a2 ("netfilter: nf_tables: add tunnel support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
commit c7aab4f170 ("netfilter: nf_conntrack_tcp: re-init for syn packets
only") introduces a bug where SYNs in ORIGINAL direction on reused 5-tuple
result in incorrect window scale negotiation. This commit merged the SYN
re-initialization and simultaneous open or SYN retransmits cases. Merging
this block added the logic in tcp_init_sender() that performed window scale
negotiation to the retransmitted syn case. Previously. this would only
result in updating the sender's scale and flags. After the merge the
additional logic results in improperly clearing the scale in ORIGINAL
direction before any packets in the REPLY direction are received. This
results in packets incorrectly being marked invalid for being
out-of-window.
This can be reproduced with the following trace:
Packet Sequence:
> Flags [S], seq 1687765604, win 62727, options [.. wscale 7], length 0
> Flags [S], seq 1944817196, win 62727, options [.. wscale 7], length 0
In order to fix the issue, only evaluate window negotiation for packets
in the REPLY direction. This was tested with simultaneous open, fast
open, and the above reproduction.
Fixes: c7aab4f170 ("netfilter: nf_conntrack_tcp: re-init for syn packets only")
Signed-off-by: Ryan Schaefer <ryanschf@amazon.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The function sev_map_percpu_data() checks if it is running on an SEV
platform by checking the CC_ATTR_GUEST_MEM_ENCRYPT attribute. However,
this attribute is also defined for TDX.
To avoid false positives, add a cc_vendor check.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: 4d96f91091 ("x86/sev: Replace occurrences of sev_active() with cc_platform_has()")
Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: David Rientjes <rientjes@google.com>
Message-Id: <20240124130317.495519-1-kirill.shutemov@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since commit b0563468ee ("x86/CPU/AMD: Disable XSAVES on AMD family 0x17")
kernel unconditionally clears the XSAVES CPU feature bit on Zen1/2 CPUs.
Because KVM CPU caps are initialized from the kernel boot CPU features this
makes the XSAVES feature also unavailable for KVM guests in this case.
At the same time the XSAVEC feature is left enabled.
Unfortunately, having XSAVEC but no XSAVES in CPUID breaks Hyper-V enabled
Windows Server 2016 VMs that have more than one vCPU.
Let's at least give users hint in the kernel log what could be wrong since
these VMs currently simply hang at boot with a black screen - giving no
clue what suddenly broke them and how to make them work again.
Trigger the kernel message hint based on the particular guest ID written to
the Guest OS Identity Hyper-V MSR implemented by KVM.
Defer this check to when the L1 Hyper-V hypervisor enables SVM in EFER
since we want to limit this message to Hyper-V enabled Windows guests only
(Windows session running nested as L2) but the actual Guest OS Identity MSR
write is done by L1 and happens before it enables SVM.
Fixes: b0563468ee ("x86/CPU/AMD: Disable XSAVES on AMD family 0x17")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <b83ab45c5e239e5d148b0ae7750133a67ac9575c.1706127425.git.maciej.szmigiero@oracle.com>
[Move some checks before mutex_lock(), rename function. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The spare_init() calls memmap_populate() many times to create VA to PA
mapping for the VMEMMAP area, where all "struct page" are located once
CONFIG_SPARSEMEM_VMEMMAP is defined. These "struct page" are later
initialized in the zone_sizes_init() function. However, during this
process, no sfence.vma instruction is executed for this VMEMMAP area.
This omission may cause the hart to fail to perform page table walk
because some data related to the address translation is invisible to the
hart. To solve this issue, the local_flush_tlb_kernel_range() is called
right after the sparse_init() to execute a sfence.vma instruction for this
VMEMMAP area, ensuring that all data related to the address translation
is visible to the hart.
Fixes: d95f1a542c ("RISC-V: Implement sparsemem")
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240117140333.2479667-1-vincent.chen@sifive.com
Fixes: 7a92fc8b4d ("mm: Introduce flush_cache_vmap_early()")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This patch is to eliminate interrupt warning below:
"[drm] Fence fallback timer expired on ring sdma0.0".
An early vm pt clearing job is sent to SDMA ahead of interrupt enabled.
And re-locating the drm client creation following after drm_dev_register
looks like a more proper flow.
v2: wrap the drm client creation
Fixes: 1819200166 ("drm/amdkfd: Export DMABufs from KFD using GEM handles")
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The eventfs_find_events() code tries to walk up the tree to find the
event directory that a dentry belongs to, in order to then find the
eventfs inode that is associated with that event directory.
However, it uses an odd combination of walking the dentry parent,
looking up the eventfs inode associated with that, and then looking up
the dentry from there. Repeat.
But the code shouldn't have back-pointers to dentries in the first
place, and it should just walk the dentry parenthood chain directly.
Similarly, 'set_top_events_ownership()' looks up the dentry from the
eventfs inode, but the only reason it wants a dentry is to look up the
superblock in order to look up the root dentry.
But it already has the real filesystem inode, which has that same
superblock pointer. So just pass in the superblock pointer using the
information that's already there, instead of looking up extraneous data
that is irrelevant.
Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.638645365@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: c1504e5102 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The return type for ring_buffer_poll_wait() is __poll_t. This is behind
the scenes an unsigned where we can set event bits. In case of a
non-allocated CPU, we do return instead -EINVAL (0xffffffea). Lucky us,
this ends up setting few error bits (EPOLLERR | EPOLLHUP | EPOLLNVAL), so
user-space at least is aware something went wrong.
Nonetheless, this is an incorrect code. Replace that -EINVAL with a
proper EPOLLERR to clean that output. As this doesn't change the
behaviour, there's no need to treat this change as a bug fix.
Link: https://lore.kernel.org/linux-trace-kernel/20240131140955.3322792-1-vdonnefort@google.com
Cc: stable@vger.kernel.org
Fixes: 6721cb6002 ("ring-buffer: Do not poll non allocated cpu buffers")
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
syzbot has found a type mismatch between a USB pipe and the transfer
endpoint, which is triggered by the bcm5974 driver[1].
This driver expects the device to provide input interrupt endpoints and
if that is not the case, the driver registration should terminate.
Repros are available to reproduce this issue with a certain setup for
the dummy_hcd, leading to an interrupt/bulk mismatch which is caught in
the USB core after calling usb_submit_urb() with the following message:
"BOGUS urb xfer, pipe 1 != type 3"
Some other device drivers (like the appletouch driver bcm5974 is mainly
based on) provide some checking mechanism to make sure that an IN
interrupt endpoint is available. In this particular case the endpoint
addresses are provided by a config table, so the checking can be
targeted to the provided endpoints.
Add some basic checking to guarantee that the endpoints available match
the expected type for both the trackpad and button endpoints.
This issue was only found for the trackpad endpoint, but the checking
has been added to the button endpoint as well for the same reasons.
Given that there was never a check for the endpoint type, this bug has
been there since the first implementation of the driver (f89bd95c5c).
[1] https://syzkaller.appspot.com/bug?extid=348331f63b034f89b622
Fixes: f89bd95c5c ("Input: bcm5974 - add driver for Macbook Air and Pro Penryn touchpads")
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Reported-and-tested-by: syzbot+348331f63b034f89b622@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20231007-topic-bcm5974_bulk-v3-1-d0f38b9d2935@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Pull SCSI fixes from James Bottomley:
"Six small fixes. Five are obvious and in drivers. The last one is a
core fix to remove the host lock acquisition and release, caused by a
dynamic check of host_busy, in the error handling loop which has been
reported to cause lockups"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: storvsc: Fix ring buffer size calculation
scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler
scsi: MAINTAINERS: Update ibmvscsi_tgt maintainer
scsi: initio: Remove redundant variable 'rb'
scsi: virtio_scsi: Remove duplicate check if queue is broken
scsi: isci: Fix an error code problem in isci_io_request_build()
l2_tos_ttl_inherit.sh verifies the inheritance of tos and ttl
for GRETAP, VXLAN and GENEVE.
Before testing it checks if the required module is available
and if not skips the tests accordingly.
Currently only GRETAP and VXLAN are tested because the GENEVE
module is missing.
Fixes: b690842d12 ("selftests/net: test l2 tunnel TOS/TTL inheriting")
Signed-off-by: Matthias May <matthias.may@westermo.com>
Link: https://lore.kernel.org/r/20240130101157.196006-1-matthias.may@westermo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
kvm_vcpu_ioctl_x86_set_vcpu_events() routine makes 'KVM_REQ_NMI'
request for a vcpu even when its 'events->nmi.pending' is zero.
Ex:
qemu_thread_start
kvm_vcpu_thread_fn
qemu_wait_io_event
qemu_wait_io_event_common
process_queued_cpu_work
do_kvm_cpu_synchronize_post_init/_reset
kvm_arch_put_registers
kvm_put_vcpu_events (cpu, level=[2|3])
This leads vCPU threads in QEMU to constantly acquire & release the
global mutex lock, delaying the guest boot due to lock contention.
Add check to make KVM_REQ_NMI request only if vcpu has NMI pending.
Fixes: bdedff2631 ("KVM: x86: Route pending NMIs from userspace through process_nmi()")
Cc: stable@vger.kernel.org
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
Link: https://lore.kernel.org/r/20240103075343.549293-1-ppandit@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
The i2c host patches are now set to be merged into the following
repository:
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux.git
Cc: Wolfram Sang <wsa@kernel.org>
Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
A last minute revert in 6.7-final introduced a potential deadlock when
enabling ASPM during probe of Qualcomm PCIe controllers as reported by
lockdep:
============================================
WARNING: possible recursive locking detected
6.7.0 #40 Not tainted
--------------------------------------------
kworker/u16:5/90 is trying to acquire lock:
ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc
but task is already holding lock:
ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(pci_bus_sem);
lock(pci_bus_sem);
*** DEADLOCK ***
Call trace:
print_deadlock_bug+0x25c/0x348
__lock_acquire+0x10a4/0x2064
lock_acquire+0x1e8/0x318
down_read+0x60/0x184
pcie_aspm_pm_state_change+0x58/0xdc
pci_set_full_power_state+0xa8/0x114
pci_set_power_state+0xc4/0x120
qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom]
pci_walk_bus+0x64/0xbc
qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom]
The deadlock can easily be reproduced on machines like the Lenovo ThinkPad
X13s by adding a delay to increase the race window during asynchronous
probe where another thread can take a write lock.
Add a new pci_set_power_state_locked() and associated helper functions that
can be called with the PCI bus semaphore held to avoid taking the read lock
twice.
Link: https://lore.kernel.org/r/ZZu0qx2cmn7IwTyQ@hovoldconsulting.com
Link: https://lore.kernel.org/r/20240130100243.11011-1-johan+linaro@kernel.org
Fixes: f93e71aea6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: <stable@vger.kernel.org> # 6.7
Geert Uytterhoeven reported that commit 4e244c10ea ("kconfig: remove
unneeded symbol_empty variable") changed the default value of
CONFIG_LOG_CPU_MAX_BUF_SHIFT from 12 to 0.
As it turned out, this is an undefined behavior because sym_calc_value()
stopped setting the sym->curr.tri field for 'int', 'hex', and 'string'
symbols.
This commit restores the original behavior, where 'int', 'hex', 'string'
symbols are interpreted as false if used in boolean contexts.
CONFIG_LOG_CPU_MAX_BUF_SHIFT will default to 12 again, irrespective
of CONFIG_BASE_SMALL. Presumably, this is not the intended behavior,
as already reported [1], but this is another issue that should be
addressed by a separate patch.
[1]: https://lore.kernel.org/all/f6856be8-54b7-0fa0-1d17-39632bf29ada@oracle.com/
Fixes: 4e244c10ea ("kconfig: remove unneeded symbol_empty variable")
Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Closes: https://lore.kernel.org/all/CAMuHMdWm6u1wX7efZQf=2XUAHascps76YQac6rdnQGhc8nop_Q@mail.gmail.com/
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
The new installkernel application that is now included in systemd-udev
package allows installation although destination files are already present
in the boot directory of the kernel package, but is failing with the
implemented workaround for the old installkernel application from grubby
package.
For the new installkernel application, as Davide says:
<<The %post currently does a shuffling dance before calling installkernel.
This isn't actually necessary afaict, and the current implementation
ends up triggering downstream issues such as
https://github.com/systemd/systemd/issues/29568
This commit simplifies the logic to remove the shuffling. For reference,
the original logic was added in commit 3c9c7a14b627("rpm-pkg: add %post
section to create initramfs and grub hooks").>>
But we need to keep the old behavior as well, because the old installkernel
application from grubby package, does not allow this simplification and
we need to be backward compatible to avoid issues with the different
packages.
Mimic Fedora shipping process and store vmlinuz, config amd System.map
in the module directory instead of the boot directory. In this way, we will
avoid the commented problem for all the cases, because the new destination
files are not going to exist in the boot directory of the kernel package.
Replace installkernel tool with kernel-install tool, because the latter is
more complete.
Besides, after installkernel tool execution, check to complete if the
correct package files vmlinuz, System.map and config files are present
in /boot directory, and if necessary, copy manually for install operation.
In this way, take into account if files were not previously copied from
/usr/lib/kernel/install.d/* scripts and if the suitable files for the
requested package are present (it could be others if the rpm files were
replace with a new pacakge with the same release and a different build).
Tested with Fedora 38, Fedora 39, RHEL 9, Oracle Linux 9.3,
openSUSE Tumbleweed and openMandrive ROME, using dnf/zypper and rpm tools.
cc: stable@vger.kernel.org
Co-Developed-by: Davide Cavalca <dcavalca@meta.com>
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Aiden Leong reported modpost fails to build on macOS since commit
16a473f60e ("modpost: inform compilers that fatal() never returns"):
scripts/mod/modpost.c:93:21: error: aliases are not supported on darwin
Nathan's research indicates that Darwin seems to support weak aliases
at least [1]. Although the situation might be improved in future Clang
versions, we can achieve a similar outcome without relying on it.
This commit makes fatal() a macro of error() + exit(1) in modpost.h, as
compilers recognize that exit() never returns.
[1]: https://github.com/llvm/llvm-project/issues/71001
Fixes: 16a473f60e ("modpost: inform compilers that fatal() never returns")
Reported-by: Aiden Leong <aiden.leong@aibsd.com>
Closes: https://lore.kernel.org/all/d9ac2960-6644-4a87-b5e4-4bfb6e0364a8@aibsd.com/
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
The constraints for 'audio-ports' don't match the description. There can
be 1 or 2 DAI entries and each entry is exactly 2 values. Also, the
values' sizes are 32-bits, not 8-bits. Move the size constraints to the
outer dimension (number of DAIs) and add constraints on inner array
values.
Link: https://lore.kernel.org/r/20240122204959.1665970-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
There currently exists two thinkpad headset jack fixups:
ALC285_FIXUP_THINKPAD_NO_BASS_SPK_HEADSET_JACK
ALC285_FIXUP_THINKPAD_HEADSET_JACK
The latter is applied to alc285 and alc287 thinkpads which contain
bass speakers.
However, the former was only being applied to alc285 thinkpads,
leaving non-bass alc287 thinkpads with no headset button controls.
This patch fixes that by adding ALC285_FIXUP_THINKPAD_NO_BASS_SPK_HEADSET_JACK
to the alc287 chains, allowing the detection of headset buttons.
Signed-off-by: José Relvas <josemonsantorelvas@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240131113407.34698-3-josemonsantorelvas@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
When using hotplug and bringing up a 32-bit CPU, ask the firmware about the
BTLB information to set up the static (block) TLB entries.
For that write access to the static btlb_info struct is needed, but
since it is marked __ro_after_init the kernel segfaults with missing
write permissions.
Fix the crash by dropping the __ro_after_init annotation.
Fixes: e5ef93d02d ("parisc: BTLB: Initialize BTLB tables at CPU startup")
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v6.6+
ASMedia have confirmed that all ASM106x parts currently listed in
ahci_pci_tbl[] suffer from the 43-bit DMA address limitation that we ran
into on the ASM1061, and therefore, we need to apply the quirk added by
commit 20730e9b27 ("ahci: add 43-bit DMA address quirk for ASMedia
ASM1061 controllers") to the other supported ASM106x parts as well.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-ide/ZbopwKZJAKQRA4Xv@x1-carbon/
Signed-off-by: Lennert Buytenhek <kernel@wantstofly.org>
[cassel: add link to ASMedia confirmation email]
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Follow the docs at Documentation/bpf/kfuncs.rst:
- declare the function with `__bpf_kfunc`
- disables missing prototype warnings, which allows to remove them from
include/linux/hid-bpf.h
Removing the prototypes is not an issue because we currently have to
redeclare them when writing the BPF program. They will eventually be
generated by bpftool directly AFAIU.
Link: https://lore.kernel.org/r/20240124-b4-hid-bpf-fixes-v2-3-052520b1e5e6@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Relax DEVX access upon modify commands to be UVERBS_ACCESS_READ.
The kernel doesn't need to protect what firmware protects, or what
causes no damage to anyone but the user.
As firmware needs to protect itself from parallel access to the same
object, don't block parallel modify/query commands on the same object in
the kernel side.
This change will allow user space application to run parallel updates to
different entries in the same bulk object.
Tested-by: Tamar Mashiah <tmashiah@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://lore.kernel.org/r/7407d5ed35dc427c1097699e12b49c01e1073406.1706433934.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
debugfs entries for RRoCE general CC parameters must be exposed only when
they are supported, otherwise when accessing them there may be a syndrome
error in kernel log, for example:
$ cat /sys/kernel/debug/mlx5/0000:08:00.1/cc_params/rtt_resp_dscp
cat: '/sys/kernel/debug/mlx5/0000:08:00.1/cc_params/rtt_resp_dscp': Invalid argument
$ dmesg
mlx5_core 0000:08:00.1: mlx5_cmd_out_err:805:(pid 1253): QUERY_CONG_PARAMS(0x824) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x325a82), err(-22)
Fixes: 66fb1d5df6 ("IB/mlx5: Extend debug control for CC parameters")
Reviewed-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/e7ade70bad52b7468bdb1de4d41d5fad70c8b71c.1706433934.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
[BUG]
There is a syzbot crash, triggered by the ASSERT() during subvolume
creation:
assertion failed: !anon_dev, in fs/btrfs/disk-io.c:1319
------------[ cut here ]------------
kernel BUG at fs/btrfs/disk-io.c:1319!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
RIP: 0010:btrfs_get_root_ref.part.0+0x9aa/0xa60
<TASK>
btrfs_get_new_fs_root+0xd3/0xf0
create_subvol+0xd02/0x1650
btrfs_mksubvol+0xe95/0x12b0
__btrfs_ioctl_snap_create+0x2f9/0x4f0
btrfs_ioctl_snap_create+0x16b/0x200
btrfs_ioctl+0x35f0/0x5cf0
__x64_sys_ioctl+0x19d/0x210
do_syscall_64+0x3f/0xe0
entry_SYSCALL_64_after_hwframe+0x63/0x6b
---[ end trace 0000000000000000 ]---
[CAUSE]
During create_subvol(), after inserting root item for the newly created
subvolume, we would trigger btrfs_get_new_fs_root() to get the
btrfs_root of that subvolume.
The idea here is, we have preallocated an anonymous device number for
the subvolume, thus we can assign it to the new subvolume.
But there is really nothing preventing things like backref walk to read
the new subvolume.
If that happens before we call btrfs_get_new_fs_root(), the subvolume
would be read out, with a new anonymous device number assigned already.
In that case, we would trigger ASSERT(), as we really expect no one to
read out that subvolume (which is not yet accessible from the fs).
But things like backref walk is still possible to trigger the read on
the subvolume.
Thus our assumption on the ASSERT() is not correct in the first place.
[FIX]
Fix it by removing the ASSERT(), and just free the @anon_dev, reset it
to 0, and continue.
If the subvolume tree is read out by something else, it should have
already get a new anon_dev assigned thus we only need to free the
preallocated one.
Reported-by: Chenyuan Yang <chenyuan0y@gmail.com>
Fixes: 2dfb1e43f5 ("btrfs: preallocate anon block device at first phase of snapshot creation")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If a subvolume still exists, forbid deleting its qgroup 0/subvolid.
This behavior generally leads to incorrect behavior in squotas and
doesn't have a legitimate purpose.
Fixes: cecbb533b5 ("btrfs: record simple quota deltas in delayed refs")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Creating a qgroup 0/subvolid leads to various races and it isn't
helpful, because you can't specify a subvol id when creating a subvol,
so you can't be sure it will be the right one. Any requirements on the
automatic subvol can be gratified by using a higher level qgroup and the
inheritance parameters of subvol creation.
Fixes: cecbb533b5 ("btrfs: record simple quota deltas in delayed refs")
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull erofs fixes from Gao Xiang:
- fix an infinite loop issue of sub-page compressed data support found
with lengthy stress tests on a 64k-page arm64 VM
- optimize the temporary buffer allocation for low-memory scenarios,
which can reduce 20.21% on average under a heavy multi-app launch
benchmark workload
- get rid of unnecessary GFP_NOFS
* tag 'erofs-for-6.8-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: relaxed temporary buffers allocation on readahead
erofs: fix infinite loop due to a race of filling compressed_bvecs
erofs: get rid of unneeded GFP_NOFS
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-01-29 (e1000e, ixgbe)
This series contains updates to e1000e and ixgbe drivers.
Jake corrects values used for maximum frequency adjustment for e1000e.
Christophe Jaillet adjusts error handling path so that semaphore is
released on ixgbe.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ixgbe: Fix an error handling path in ixgbe_read_iosf_sb_reg_x550()
e1000e: correct maximum frequency adjustment values
====================
Link: https://lore.kernel.org/r/20240129185240.787397-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When port function state change is requested, and when the driver
does not support it, it refers to the hw address attribute instead
of state attribute. Seems like a copy paste error.
Fix it by referring to the port function state attribute.
Fixes: c0bea69d1c ("devlink: Validate port function request")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240129191059.129030-1-parav@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The original idea of the delay_time check was to not apply multicast
snooping too early when an MLD querier appears. And to instead wait at
least for MLD reports to arrive before switching from flooding to group
based, MLD snooped forwarding, to avoid temporary packet loss.
However in a batman-adv mesh network it was noticed that after 248 days of
uptime 32bit MIPS based devices would start to signal that they had
stopped applying multicast snooping due to missing queriers - even though
they were the elected querier and still sending MLD queries themselves.
While time_is_before_jiffies() generally is safe against jiffies
wrap-arounds, like the code comments in jiffies.h explain, it won't
be able to track a difference larger than ULONG_MAX/2. With a 32bit
large jiffies and one jiffies tick every 10ms (CONFIG_HZ=100) on these MIPS
devices running OpenWrt this would result in a difference larger than
ULONG_MAX/2 after 248 (= 2^32/100/60/60/24/2) days and
time_is_before_jiffies() would then start to return false instead of
true. Leading to multicast snooping not being applied to multicast
packets anymore.
Fix this issue by using a proper timer_list object which won't have this
ULONG_MAX/2 difference limitation.
Fixes: b00589af3b ("bridge: disable snooping if there is no querier")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240127175033.9640-1-linus.luessing@c0d3.blue
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
One of the test cases in the test_bridge_backup_port.sh selftest relies
on a matchall classifier to drop unrelated traffic so that the Tx drop
counter on the VXLAN device will only be incremented as a result of
traffic generated by the test.
However, the configuration option for the matchall classifier is
missing from the configuration file which might explain the failures we
see in the netdev CI [1].
Fix by adding CONFIG_NET_CLS_MATCHALL to the configuration file.
[1]
# Backup nexthop ID - invalid IDs
# -------------------------------
[...]
# TEST: Forwarding out of vx0 [ OK ]
# TEST: No forwarding using backup nexthop ID [ OK ]
# TEST: Tx drop increased [FAIL]
# TEST: IPv6 address family nexthop as backup nexthop [ OK ]
# TEST: No forwarding out of swp1 [ OK ]
# TEST: Forwarding out of vx0 [ OK ]
# TEST: No forwarding using backup nexthop ID [ OK ]
# TEST: Tx drop increased [FAIL]
[...]
Fixes: b408453053 ("selftests: net: Add bridge backup port and backup nexthop ID test")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240129123703.1857843-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When probing the open-dice driver with PROVE_LOCKING=y, lockdep
complains that the mutex in 'drvdata->lock' has a non-static key:
| INFO: trying to register non-static key.
| The code is fine but needs lockdep annotation, or maybe
| you didn't initialize this object before use?
| turning off the locking correctness validator.
Fix the problem by initialising the mutex memory with mutex_init()
instead of __MUTEX_INITIALIZER().
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Brazdil <dbrazdil@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240126152410.10148-1-will@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In remoteproc shutdown sequence, rpmsg_remove will get called which
would depopulate all the child nodes that have been created during
rpmsg_probe. This would result in cb_remove call for all the context
banks for the remoteproc. In cb_remove function, session 0 is
getting skipped which is not correct as session 0 will never become
available again. Add changes to mark session 0 also as invalid.
Fixes: f6f9279f2b ("misc: fastrpc: Add Qualcomm fastrpc basic driver model")
Cc: stable <stable@kernel.org>
Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
Link: https://lore.kernel.org/r/20240108114833.20480-1-quic_ekangupt@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull kunit fixes from Shuah Khan:
"NULL vs IS_ERR() bug fixes, documentation update, MAINTAINERS file
update to add Rae Moar as a reviewer, and a fix to run test suites
only after module initialization completes"
* tag 'linux_kselftest-kunit-fixes-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
Documentation: KUnit: Update the instructions on how to test static functions
kunit: run test suites only after module initialization completes
MAINTAINERS: kunit: Add Rae Moar as a reviewer
kunit: device: Fix a NULL vs IS_ERR() check in init()
kunit: Fix a NULL vs IS_ERR() bug
Pull kselftest fixes from Shuah Khan:
"Three fixes to livepatch, rseq, and seccomp tests"
* tag 'linux_kselftest-fixes-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kselftest/seccomp: Report each expectation we assert as a KTAP test
kselftest/seccomp: Use kselftest output functions for benchmark
selftests/livepatch: fix and refactor new dmesg message code
selftests/rseq: Do not skip !allowed_cpus for mm_cid
The Lenovo Legion Go is a handheld gaming system, similar to a Steam Deck.
It has a gamepad (including rear paddles), 3 gyroscopes, a trackpad,
volume buttons, a power button, and 2 LED ring lights.
The Legion Go firmware presents these controls as a USB hub with various
devices attached. In its default state, the gamepad is presented as an
Xbox controller connected to this hub. (By holding a combination of
buttons, it can be changed to use the older DirectInput API.)
This patch teaches the existing Xbox controller module `xpad` to bind to
the controller in the Legion Go, which enables support for the:
- directional pad,
- analog sticks (including clicks),
- X, Y, A, B,
- start and select (or menu and capture),
- shoulder buttons, and
- rumble.
The trackpad, touchscreen, volume controls, and power button are already
supported via existing kernel modules. Two of the face buttons, the
gyroscopes, rear paddles, and LEDs are not.
After this patch lands, the Legion Go will be mostly functional in Linux,
out-of-the-box. The various components of the USB hub can be synthesized
into a single logical controller (including the additional buttons) in
userspace with [Handheld Daemon](https://github.com/hhd-dev/hhd), which
makes the Go fully functional.
Signed-off-by: Brenton Simpson <appsforartists@google.com>
Link: https://lore.kernel.org/r/20240118183546.418064-1-appsforartists@google.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
For these hooks the true "neutral" value is -EOPNOTSUPP, which is
currently what is returned when no LSM provides this hook and what LSMs
return when there is no security context set on the socket. Correct the
value in <linux/lsm_hooks.h> and adjust the dispatch functions in
security/security.c to avoid issues when the BPF LSM is enabled.
Cc: stable@vger.kernel.org
Fixes: 98e828a065 ("security: Refactor declaration of LSM hooks")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
The rule inside kvm enforces that the vcpu->mutex is taken *inside*
kvm->lock. The rule is violated by the pkvm_create_hyp_vm() which acquires
the kvm->lock while already holding the vcpu->mutex lock from
kvm_vcpu_ioctl(). Avoid the circular locking dependency altogether by
protecting the hyp vm handle with the config_lock, much like we already
do for other forms of VM-scoped data.
Signed-off-by: Sebastian Ene <sebastianene@google.com>
Cc: stable@vger.kernel.org
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240124091027.1477174-2-sebastianene@google.com
Add the description of @memory_type to silence the warning:
drivers/firmware/efi/libstub/alignedmem.c:27: warning: Function parameter or struct member 'memory_type' not described in 'efi_allocate_pages_aligned'
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
[ardb: tweak comment]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The EFI stub's kernel placement logic randomizes the physical placement
of the kernel by taking all available memory into account, and picking a
region at random, based on a random seed.
When KASLR is disabled, this seed is set to 0x0, and this results in the
lowest available region of memory to be selected for loading the kernel,
even if this is below LOAD_PHYSICAL_ADDR. Some of this memory is
typically reserved for the GFP_DMA region, to accommodate masters that
can only access the first 16 MiB of system memory.
Even if such devices are rare these days, we may still end up with a
warning in the kernel log, as reported by Tom:
swapper/0: page allocation failure: order:10, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
Fix this by tweaking the random allocation logic to accept a low bound
on the placement, and set it to LOAD_PHYSICAL_ADDR.
Fixes: a1b87d54f4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot")
Reported-by: Tom Englund <tomenglund26@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218404
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
open_path_or_exit() is used for '/dev/kvm', '/dev/sev', and
'/sys/module/%s/parameters/%s' and skipping test when the entry is missing
is completely reasonable. Other errors, however, may indicate a real issue
which is easy to miss. E.g. when 'hyperv_features' test was entering an
infinite loop the output was:
./hyperv_features
Testing access to Hyper-V specific MSRs
1..0 # SKIP - /dev/kvm not available (errno: 24)
and this can easily get overlooked.
Keep ENOENT case 'special' for skipping tests and fail when open() results
in any other errno.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20240129085847.2674082-2-vkuznets@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
When X86_FEATURE_INVTSC is missing, guest_test_msrs_access() was supposed
to skip testing dependent Hyper-V invariant TSC feature. Unfortunately,
'continue' does not lead to that as stage is not incremented. Moreover,
'vm' allocated with vm_create_with_one_vcpu() is not freed and the test
runs out of available file descriptors very quickly.
Fixes: bd827bd775 ("KVM: selftests: Test Hyper-V invariant TSC control")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20240129085847.2674082-1-vkuznets@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Delete the AMX's tests "stage" counter, as the counter is no longer used,
which makes clang unhappy:
x86_64/amx_test.c:224:6: error: variable 'stage' set but not used
int stage, ret;
^
1 error generated.
Note, "stage" was never really used, it just happened to be dumped out by
a (failed) assertion on run->exit_reason, i.e. the AMX test has no concept
of stages, the code was likely copy+pasted from a different test.
Fixes: c96f57b080 ("KVM: selftests: Make vCPU exit reason test assertion common")
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20240109220302.399296-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
In an entirely unrelated discussion where I pointed out a stupid thinko
of mine, Rasmus piped up and noted that that obvious mistake already
existed elsewhere in the kernel tree.
An "error pointer" is the negative error value encoded as a pointer,
making the whole "return error or valid pointer" use-case simple and
straightforward. We use it all over the kernel.
But the key here is that errors are _negative_ error numbers, not the
horrid UNIX user-level model of "-1 and the value of 'errno'".
The Apple mailbox driver used the positive error values, and thus just
returned invalid normal pointers instead of actual errors.
Of course, the reason nobody ever noticed is that the errors presumably
never actually happen, so this is fixing a conceptual bug rather than an
actual one.
Reported-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Link: https://lore.kernel.org/all/5c30afe0-f9fb-45d5-9333-dd914a1ea93a@prevas.dk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The function kvmalloc_node limits the allocation size to INT_MAX. This
limit will be overflowed if dm-writecache attempts to map a device with
1TiB or larger length. This commit changes kvmalloc_array to vmalloc_array
to avoid the limit.
The commit also changes vmalloc(array_size()) to vmalloc_array().
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
The kvmalloc function fails with a warning if the size is larger than
INT_MAX. Linus said that there should be limits that prevent this warning
from being hit. This commit adds the limits to the dm-stats subsystem
in DM core.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
The kvmalloc function fails with a warning if the size is larger than
INT_MAX. The warning was triggered by a syscall testing robot.
In order to avoid the warning, this commit limits the number of targets to
1048576 and the size of the parameter area to 1073741824.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
If the external phy working together with phy-omap-usb2 does not implement
send_srp(), we may still attempt to call it. This can happen on an idle
Ethernet gadget triggering a wakeup for example:
configfs-gadget.g1 gadget.0: ECM Suspend
configfs-gadget.g1 gadget.0: Port suspended. Triggering wakeup
...
Unable to handle kernel NULL pointer dereference at virtual address
00000000 when execute
...
PC is at 0x0
LR is at musb_gadget_wakeup+0x1d4/0x254 [musb_hdrc]
...
musb_gadget_wakeup [musb_hdrc] from usb_gadget_wakeup+0x1c/0x3c [udc_core]
usb_gadget_wakeup [udc_core] from eth_start_xmit+0x3b0/0x3d4 [u_ether]
eth_start_xmit [u_ether] from dev_hard_start_xmit+0x94/0x24c
dev_hard_start_xmit from sch_direct_xmit+0x104/0x2e4
sch_direct_xmit from __dev_queue_xmit+0x334/0xd88
__dev_queue_xmit from arp_solicit+0xf0/0x268
arp_solicit from neigh_probe+0x54/0x7c
neigh_probe from __neigh_event_send+0x22c/0x47c
__neigh_event_send from neigh_resolve_output+0x14c/0x1c0
neigh_resolve_output from ip_finish_output2+0x1c8/0x628
ip_finish_output2 from ip_send_skb+0x40/0xd8
ip_send_skb from udp_send_skb+0x124/0x340
udp_send_skb from udp_sendmsg+0x780/0x984
udp_sendmsg from __sys_sendto+0xd8/0x158
__sys_sendto from ret_fast_syscall+0x0/0x58
Let's fix the issue by checking for send_srp() and set_vbus() before
calling them. For USB peripheral only cases these both could be NULL.
Fixes: 657b306a7b ("usb: phy: add a new driver for omap usb2 phy")
Signed-off-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/r/20240128120556.8848-1-tony@atomide.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The current exception handler implementation, which assists when accessing
user space memory, may exhibit random data corruption if the compiler decides
to use a different register than the specified register %r29 (defined in
ASM_EXCEPTIONTABLE_REG) for the error code. If the compiler choose another
register, the fault handler will nevertheless store -EFAULT into %r29 and thus
trash whatever this register is used for.
Looking at the assembly I found that this happens sometimes in emulate_ldd().
To solve the issue, the easiest solution would be if it somehow is
possible to tell the fault handler which register is used to hold the error
code. Using %0 or %1 in the inline assembly is not posssible as it will show
up as e.g. %r29 (with the "%r" prefix), which the GNU assembler can not
convert to an integer.
This patch takes another, better and more flexible approach:
We extend the __ex_table (which is out of the execution path) by one 32-word.
In this word we tell the compiler to insert the assembler instruction
"or %r0,%r0,%reg", where %reg references the register which the compiler
choosed for the error return code.
In case of an access failure, the fault handler finds the __ex_table entry and
can examine the opcode. The used register is encoded in the lowest 5 bits, and
the fault handler can then store -EFAULT into this register.
Since we extend the __ex_table to 3 words we can't use the BUILDTIME_TABLE_SORT
config option any longer.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v6.0+
The seccomp benchmark test makes a number of checks on the performance it
measures and logs them to the output but does so in a custom format which
none of the automated test runners understand meaning that the chances that
anyone is paying attention are slim. Let's additionally log each result in
KTAP format so that automated systems parsing the test output will see each
comparison as a test case. The original logs are left in place since they
provide the actual numbers for analysis.
As part of this rework the flow for the main program so that when we skip
tests we still log all the tests we skip, this is because the standard KTAP
headers and footers include counts of the number of expected and run tests.
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
In preparation for trying to output the test results themselves in TAP
format rework all the prints in the benchmark to use the kselftest output
functions. The uses of system() all produce single line output so we can
avoid having to deal with fully managing the child process and continue to
use system() by simply printing an empty message before we invoke system().
We also leave one printf() used to complete a line of output in place.
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The livepatching kselftests rely on comparing expected vs. observed
dmesg output. After each test, new dmesg entries are determined by the
'comm' utility comparing a saved, pre-test copy of dmesg to post-test
dmesg output.
Alexander reports that the 'comm --nocheck-order -13' invocation used by
the tests can be confused when dmesg entry timestamps vary in magnitude
(ie, "[ 98.820331]" vs. "[ 100.031067]"), in which case, additional
messages are reported as new. The unexpected entries then spoil the
test results.
Instead of relying on 'comm' or 'diff' to determine new testing dmesg
entries, refactor the code:
- pre-test : log a unique canary dmesg entry
- test : run tests, log messages
- post-test : filter dmesg starting from pre-test message
Reported-by: Alexander Gordeev <agordeev@linux.ibm.com>
Closes: https://lore.kernel.org/live-patching/ZYAimyPYhxVA9wKg@li-008a6a4c-3549-11b2-a85c-c5cc2836eea2.ibm.com/
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Tested-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Before this change, the expected size of the user space buffer was
taken from fx_sw->xstate_size. fx_sw->xstate_size can be changed
from user-space, so it is possible construct a sigreturn frame where:
* fx_sw->xstate_size is smaller than the size required by valid bits in
fx_sw->xfeatures.
* user-space unmaps parts of the sigrame fpu buffer so that not all of
the buffer required by xrstor is accessible.
In this case, xrstor tries to restore and accesses the unmapped area
which results in a fault. But fault_in_readable succeeds because buf +
fx_sw->xstate_size is within the still mapped area, so it goes back and
tries xrstor again. It will spin in this loop forever.
Instead, fault in the maximum size which can be touched by XRSTOR (taken
from fpstate->user_size).
[ dhansen: tweak subject / changelog ]
Fixes: fcb3635f50 ("x86/fpu/signal: Handle #PF in the direct restore path")
Reported-by: Konstantin Bogomolov <bogomolov@google.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240130063603.3392627-1-avagin%40google.com
The device IMST USB-Stick for Smart Meter is a rebranded IMST iM871A-USB
Wireless M-Bus USB-adapter. It is used to read wireless water, gas and
electricity meters.
Signed-off-by: Leonard Dallmayr <leonard.dallmayr@mailbox.org>
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
To pick the changes from:
35e27a5744 ("fs: keep struct mnt_id_req extensible")
b4c2bea8ce ("add listmount(2) syscall")
46eae99ef7 ("add statmount(2) syscall")
That doesn't change anything in tools this time as nothing that is
harvested by the beauty scripts got changed:
$ ls -1 tools/perf/trace/beauty/*mount*sh
tools/perf/trace/beauty/fsmount.sh
tools/perf/trace/beauty/mount_flags.sh
tools/perf/trace/beauty/move_mount_flags.sh
$
This addresses this perf build warning.
Warning: Kernel ABI header differences:
diff -u tools/include/uapi/linux/mount.h include/uapi/linux/mount.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZbkMiB7ZcOsLP2V5@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'Session topology' test currently fails with this message when
evlist__new_default() opens more than one event:
32: Session topology :
--- start ---
templ file: /tmp/perf-test-vv5YzZ
Using CPUID 0x00000000410fd070
Opening: unknown-hardware:HG
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xb00000000
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 4
Opening: unknown-hardware:HG
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xa00000000
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 5
non matching sample_type
FAILED tests/topology.c:73 can't get session
---- end ----
Session topology: FAILED!
This is because when re-opening the file and parsing the header, Perf
expects that any file that has more than one event has the sample ID
flag set. Perf record already sets the flag in a similar way when there
is more than one event, so add the same logic to evlist__new_default().
evlist__new_default() is only currently used in tests, so I don't
expect this change to have any other side effects. The other tests that
use it don't save and re-open the file so don't hit this issue.
The session topology test has been failing on Arm big.LITTLE platforms
since commit 251aa04024 ("perf parse-events: Wildcard most
"numeric" events") when evlist__new_default() started opening multiple
events for 'cycles'.
Fixes: 251aa04024 ("perf parse-events: Wildcard most "numeric" events")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@arm.com>
[ This was failing as well on a Rocket Lake Refresh/14700k Intel hybrid system - Arnaldo ]
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yang Jihong <yangjihong1@huawei.com>
Closes: https://lore.kernel.org/lkml/CAP-5=fWVQ-7ijjK3-w1q+k2WYVNHbAcejb-xY0ptbjRw476VKA@mail.gmail.com/
Link: https://lore.kernel.org/r/20240124094358.489372-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
1e536e1068 ("x86/cpu: Detect TDX partial write machine check erratum")
765a0542fd ("x86/virt/tdx: Detect TDX during kernel boot")
30fa92832f ("x86/CPU/AMD: Add ZenX generations flags")
04c3024560 ("x86/barrier: Do not serialize MSR accesses on AMD")
This causes these perf files to be rebuilt and brings some X86_FEATURE
that will be used when updating the copies of
tools/arch/x86/lib/mem{cpy,set}_64.S with the kernel sources:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses this perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Esben Haabendal says:
====================
net: stmmac: dwmac-imx: Time Based Scheduling support
This small patch series allows using TBS support of the i.MX Ethernet QOS
controller for etf qdisc offload.
It achieves this in a similar manner that it is done in dwmac-intel.c,
dwmac-mediatek.c and stmmac_pci.c.
Changes since v1:
- Simplified for loop by starting at index 1.
- Fixed problem with indentation.
====================
Link: https://lore.kernel.org/r/cover.1706256158.git.esben@geanix.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
TSO and TBS cannot coexist. For now we set i.MX Ethernet QOS controller to
use the first TX queue with TSO and the rest for TBS.
TX queues with TBS can support etf qdisc hw offload.
Signed-off-by: Esben Haabendal <esben@geanix.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
With the dma conf being reallocated on each call to stmmac_open(), any
information in there is lost, unless we specifically handle it.
The STMMAC_TBS_EN bit is set when adding an etf qdisc, and the etf qdisc
therefore would stop working when link was set down and then back up.
Fixes: ba39b344e9 ("net: ethernet: stmicro: stmmac: generate stmmac dma conf before open")
Cc: stable@vger.kernel.org
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Full LTO takes the '-mbranch-protection=none' passed to the compiler
when generating the dynamic shadow call stack patching code as a hint to
stop emitting PAC instructions altogether. (Thin LTO appears unaffected
by this)
Work around this by disabling LTO for the compilation unit, which
appears to convince the linker that it should still use PAC in the rest
of the kernel..
Fixes: 3b619e22c4 ("arm64: implement dynamic shadow call stack for Clang")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20240123133052.1417449-6-ardb+git@google.com
Signed-off-by: Will Deacon <will@kernel.org>
This reverts commit 8c5a19cb17 ("arm64: scs: Work around full LTO
issue with dynamic SCS"), which did not quite fix the issue as intended.
Apparently, -fno-unwind-tables is ignored for the final full LTO link
when it is set on any of the objects, resulting in an early boot crash
due to the SCS patching code patching itself, and attempting to pop the
return address from the shadow stack while the associated push was still
a PACIASP instruction when it executed.
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20240123133052.1417449-5-ardb+git@google.com
Signed-off-by: Will Deacon <will@kernel.org>
On a parisc64 kernel I sometimes notice this kernel warning:
Kernel unaligned access to 0x40ff8814 at ndisc_send_skb+0xc0/0x4d8
The address 0x40ff8814 points to the in6addr_linklocal_allrouters
variable and the warning simply means that some ipv6 function tries to
read a 64-bit word directly from the not-64-bit aligned
in6addr_linklocal_allrouters variable.
Unaligned accesses are non-critical as the architecture or exception
handlers usually will fix it up at runtime. Nevertheless it may trigger
a performance penality for some architectures. For details read the
"unaligned-memory-access" kernel documentation.
The patch below ensures that the ipv6 loopback and router addresses will
always be naturally aligned. This prevents the unaligned accesses for
all architectures.
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: 034dfc5df9 ("ipv6: export in6addr_loopback to modules")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/ZbNuFM1bFqoH-UoY@p100
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
I mistakenly turned off CONFIG_XFS_RT in the Kconfig file for arm64
variant of the djwong-wtf git branch. Unfortunately, it took me a good
hour to figure out that RT wasn't built because this is what got printed
to dmesg:
XFS (sda2): realtime geometry sanity check failed
XFS (sda2): Metadata corruption detected at xfs_sb_read_verify+0x170/0x190 [xfs], xfs_sb block 0x0
Whereas I would have expected:
XFS (sda2): Not built with CONFIG_XFS_RT
XFS (sda2): RT mount failed
The root cause of these problems is the conditional compilation of the
new functions xfs_validate_rtextents and xfs_compute_rextslog that I
introduced in the two commits listed below. The !RT versions of these
functions return false and 0, respectively, which causes primary
superblock validation to fail, which explains the first message.
Move the two functions to other parts of libxfs that are not
conditionally defined by CONFIG_XFS_RT and remove the broken stubs so
that validation works again.
Fixes: e14293803f ("xfs: don't allow overly small or large realtime volumes")
Fixes: a6a38f309a ("xfs: make rextslog computation consistent with mkfs")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Pull jfs fix from David Kleikamp:
"Revert a bad sanity check"
* tag 'jfs-6.8-rc3' of github.com:kleikamp/linux-shaggy:
Revert "jfs: fix shift-out-of-bounds in dbJoin"
Pull tracing fixes from Steven Rostedt:
"Two small fixes for tracefs and eventfs:
- Fix register_snapshot_trigger() on allocation error
If the snapshot fails to allocate, the register_snapshot_trigger()
can still return success. If the call to
tracing_alloc_snapshot_instance() returned anything but 0, it
returned 0, but it should have been returning the error code from
that allocation function.
- Remove leftover code from tracefs doing a dentry walk on remount.
The update_gid() function was called by the tracefs code on remount
to update the gid of eventfs, but that is no longer the case, but
that code wasn't deleted. Nothing calls it. Remove it"
* tag 'trace-v6.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracefs: remove stale 'update_gid' code
tracing/trigger: Fix to return error if failed to alloc snapshot
Modern OSes use iptables implementation with nf_tables as a backend,
e.g.:
$ iptables -V
iptables v1.8.8 (nf_tables)
Pablo points out that we need CONFIG_NFT_COMPAT to make that work,
otherwise we see a lot of:
Warning: Extension DNAT revision 0 not supported, missing kernel module?
with DNAT being just an example here, other modules we need
include udp, TTL, length etc.
Link: https://lore.kernel.org/r/20240126201308.2903602-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When working with GPIO, its direction must be set either when the GPIO is
requested by gpiod_get*() or later on by one of the gpiod_direction_*()
functions. Neither of this is done here which results in undefined
behavior on some systems.
As the reset GPIO is used right after it is requested here, it makes sense
to configure it as GPIOD_OUT_HIGH right away. With that, the following
gpiod_set_value_cansleep(1) becomes redundant and can be safely
removed.
Fixes: a653f2f538 ("net: dsa: qca8k: introduce reset via gpio feature")
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/1706266175-3408-1-git-send-email-michal.vokac@ysoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull misc fixes from Andrew Morton:
"22 hotfixes. 11 are cc:stable and the remainder address post-6.7
issues or aren't considered appropriate for backporting"
* tag 'mm-hotfixes-stable-2024-01-28-23-21' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits)
mm: thp_get_unmapped_area must honour topdown preference
mm: huge_memory: don't force huge page alignment on 32 bit
userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb
selftests/mm: ksm_tests should only MADV_HUGEPAGE valid memory
scs: add CONFIG_MMU dependency for vfree_atomic()
mm/memory: fix folio_set_dirty() vs. folio_mark_dirty() in zap_pte_range()
mm/huge_memory: fix folio_set_dirty() vs. folio_mark_dirty()
selftests/mm: Update va_high_addr_switch.sh to check CPU for la57 flag
selftests: mm: fix map_hugetlb failure on 64K page size systems
MAINTAINERS: supplement of zswap maintainers update
stackdepot: make fast paths lock-less again
stackdepot: add stats counters exported via debugfs
mm, kmsan: fix infinite recursion due to RCU critical section
mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again
selftests/mm: switch to bash from sh
MAINTAINERS: add man-pages git trees
mm: memcontrol: don't throttle dying tasks on memory.high
mm: mmap: map MAP_STACK to VM_NOHUGEPAGE
uprobes: use pagesize-aligned virtual address when replacing pages
selftests/mm: mremap_test: fix build warning
...
Pull Samsung clk fixes from Krzysztof Kozlowski:
Google GS101: Correct the input clock names to CMU MISC clock controller
to match received review. The review was initially missed and CMU MISC
clock controller bindings, driver and DTS was merged into v6.8-rc1 with
different names. Nothing was released so far, so the bindings and
driver can be still corrected to match review.
* tag 'samsung-clk-fixes-6.8' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
clk: samsung: clk-gs101: comply with the new dt cmu_misc clock names
dt-bindings: clock: gs101: rename cmu_misc clock-names
The PCI AER model is an awkward fit for CXL error handling. While the
expectation is that a PCI device can escalate to link reset to recover
from an AER event, the same reset on CXL amounts to a surprise memory
hotplug of massive amounts of memory.
At present, the CXL error handler attempts some optimistic error
handling to unbind the device from the cxl_mem driver after reaping some
RAS register values. This results in a "hopeful" attempt to unplug the
memory, but there is no guarantee that will succeed.
A subsequent AER notification after the memdev unbind event can no
longer assume the registers are mapped. Check for memdev bind before
reaping status register values to avoid crashes of the form:
BUG: unable to handle page fault for address: ffa00000195e9100
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
[...]
RIP: 0010:__cxl_handle_ras+0x30/0x110 [cxl_core]
[...]
Call Trace:
<TASK>
? __die+0x24/0x70
? page_fault_oops+0x82/0x160
? kernelmode_fixup_or_oops+0x84/0x110
? exc_page_fault+0x113/0x170
? asm_exc_page_fault+0x26/0x30
? __pfx_dpc_reset_link+0x10/0x10
? __cxl_handle_ras+0x30/0x110 [cxl_core]
? find_cxl_port+0x59/0x80 [cxl_core]
cxl_handle_rp_ras+0xbc/0xd0 [cxl_core]
cxl_error_detected+0x6c/0xf0 [cxl_core]
report_error_detected+0xc7/0x1c0
pci_walk_bus+0x73/0x90
pcie_do_recovery+0x23f/0x330
Longer term, the unbind and PCI_ERS_RESULT_DISCONNECT behavior might
need to be replaced with a new PCI_ERS_RESULT_PANIC.
Fixes: 6ac07883db ("cxl/pci: Add RCH downstream port error logging")
Cc: stable@vger.kernel.org
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Li Ming <ming4.li@intel.com>
Link: https://lore.kernel.org/r/20240129131856.2458980-1-ming4.li@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If we have multiple clients and some/all are flooding the receives to
such an extent that we can retry a LOT handling multishot receives, then
we can be starving some clients and hence serving traffic in an
imbalanced fashion.
Limit multishot retry attempts to some arbitrary value, whose only
purpose serves to ensure that we don't keep serving a single connection
for way too long. We default to 32 retries, which should be more than
enough to provide fairness, yet not so small that we'll spend too much
time requeuing rather than handling traffic.
Cc: stable@vger.kernel.org
Depends-on: 704ea888d6 ("io_uring/poll: add requeue return code from poll multishot handling")
Depends-on: 1e5d765a82f ("io_uring/net: un-indent mshot retry path in io_recv_finish()")
Depends-on: e84b01a880 ("io_uring/poll: move poll execution helpers higher up")
Fixes: b3fdea6ecb ("io_uring: multishot recv")
Fixes: 9bb66906f2 ("io_uring: support multishot in recvmsg")
Link: https://github.com/axboe/liburing/issues/1043
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since our poll handling is edge triggered, multishot handlers retry
internally until they know that no more data is available. In
preparation for limiting these retries, add an internal return code,
IOU_REQUEUE, which can be used to inform the poll backend about the
handler wanting to retry, but that this should happen through a normal
task_work requeue rather than keep hammering on the issue side for this
one request.
No functional changes in this patch, nobody is using this return code
just yet.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In preparation for putting some retry logic in there, have the done
path just skip straight to the end rather than have too much nesting
in here.
No functional changes in this patch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In preparation for calling __io_poll_execute() higher up, move the
functions to avoid forward declarations.
No functional changes in this patch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Previous commit that added support for Huawei MateBook D16 2021
with Ryzen 4600H (HVY-WXX9 M1010) was incomplete.
To activate support for this laptop, the DMI table in
acp3x-es83xx machine driver must also be updated.
Fixes: b5338b1b90 ("ASoC: amd: acp: Add support for a new Huawei Matebook laptop")
Signed-off-by: Marian Postevca <posteuca@mutex.one>
Link: https://msgid.link/r/20240128172229.657142-1-posteuca@mutex.one
Signed-off-by: Mark Brown <broonie@kernel.org>
Merge series from Chen-Yu Tsai <wens@kernel.org>:
This series adds SPDIF controllers for the H616 and H618.
There's also a fix for SPDIF on H6: the controller also has a
receiver that was not correctly modeled.
response_msg is a pointer to an unsigned int (u32). So passing just
response_msg to sizeof would not print the size of the variable. To get
the size of response_msg we need to pass it as a pointer variable.
Fixes: ec5b0f1193 ("firmware: microchip: add PolarFire SoC Auto Update support")
Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Make sure the rpc timeout was assigned with the correct value for
initial timeout and max number of retries.
Fixes: 57331a59ac ("NFSv4.1: Use the nfs_client's rpc timeouts for backchannel")
Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
All error handling paths, except this one, go to 'out' where
release_swfw_sync() is called.
This call balances the acquire_swfw_sync() call done at the beginning of
the function.
Branch to the error handling path in order to correctly release some
resources in case of error.
Fixes: ae14a1d8e1 ("ixgbe: Fix IOSF SB access issues")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The e1000e driver supports hardware with a variety of different clock
speeds, and thus a variety of different increment values used for
programming its PTP hardware clock.
The values currently programmed in e1000e_ptp_init are incorrect. In
particular, only two maximum adjustments are used: 24000000 - 1, and
600000000 - 1. These were originally intended to be used with the 96 MHz
clock and the 25 MHz clock.
Both of these values are actually slightly too high. For the 96 MHz clock,
the actual maximum value that can safely be programmed is 23,999,938. For
the 25 MHz clock, the maximum value is 599,999,904.
Worse, several devices use a 24 MHz clock or a 38.4 MHz clock. These parts
are incorrectly assigned one of either the 24million or 600million values.
For the 24 MHz clock, this is not a significant issue: its current
increment value can support an adjustment up to 7billion in the positive
direction. However, the 38.4 KHz clock uses an increment value which can
only support up to 230,769,157 before it starts overflowing.
To understand where these values come from, consider that frequency
adjustments have the form of:
new_incval = base_incval + (base_incval * adjustment) / (unit of adjustment)
The maximum adjustment is reported in terms of parts per billion:
new_incval = base_incval + (base_incval * adjustment) / 1 billion
The largest possible adjustment is thus given by the following:
max_incval = base_incval + (base_incval * max_adj) / 1 billion
Re-arranging to solve for max_adj:
max_adj = (max_incval - base_incval) * 1 billion / base_incval
We also need to ensure that negative adjustments cannot underflow. This can
be achieved simply by ensuring max_adj is always less than 1 billion.
Introduce new macros in e1000.h codifying the maximum adjustment in PPB for
each frequency given its associated increment values. Also clarify where
these values come from by commenting about the above equations.
Replace the switch statement in e1000e_ptp_init with one which mirrors the
increment value switch statement from e1000e_get_base_timinica. For each
device, assign the appropriate maximum adjustment based on its frequency.
Some parts can have one of two frequency modes as determined by
E1000_TSYNCRXCTL_SYSCFI.
Since the new flow directly matches the assignments in
e1000e_get_base_timinca, and uses well defined macro names, it is much
easier to verify that the resulting maximum adjustments are correct. It
also avoids difficult to parse construction such as the "hw->mac.type <
e1000_phc_lpt", and the use of fallthrough which was especially confusing
when combined with a conditional block.
Note that I believe the current increment value configuration used for
24MHz clocks is sub-par, as it leaves at least 3 extra bits available in
the INCVALUE register. However, fixing that requires more careful review of
the clock rate and associated values.
Reported-by: Trey Harrison <harrisondigitalmedia@gmail.com>
Fixes: 68fe1d5da5 ("e1000e: Add Support for 38.4MHZ frequency")
Fixes: d89777bf0e ("e1000e: add support for IEEE-1588 PTP")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When the H6 was added to the bindings, only the TX DMA channel was
added. As the hardware supports both transmit and receive functions,
the binding is missing the RX DMA channel and is thus incorrect.
Also, the reset control was not made mandatory.
Add the RX DMA channel for SPDIF on H6 by removing the compatible from
the list of compatibles that should only have a TX DMA channel. And add
the H6 compatible to the list of compatibles that require the reset
control to be present.
Fixes: b204530314 ("dt-bindings: sound: sun4i-spdif: Add Allwinner H6 compatible")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://msgid.link/r/20240127163247.384439-2-wens@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Rework the NX hugepage test's skip message regarding the magic token to
provide all of the necessary magic, and to very explicitly recommended
using the wrapper shell script.
Opportunistically remove an overzealous newline; splitting the
recommendation message across two lines of ~45 characters makes it much
harder to read than running out a single line to 98 characters.
Link: https://lore.kernel.org/r/20231129224042.530798-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Many devices with a single alternate setting do not have a Valid
Alternate Setting Control and validation performed by
validate_sample_rate_table_v2v3() doesn't work on them and is not
really needed. So check the presense of control before sending
altsetting validation requests.
MOTU Microbook IIc is suffering the most without this check. It
takes up to 40 seconds to bootup due to how slow it switches
sampling rates:
[ 2659.164824] usb 3-2: New USB device found, idVendor=07fd, idProduct=0004, bcdDevice= 0.60
[ 2659.164827] usb 3-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 2659.164829] usb 3-2: Product: MicroBook IIc
[ 2659.164830] usb 3-2: Manufacturer: MOTU
[ 2659.166204] usb 3-2: Found last interface = 3
[ 2679.322298] usb 3-2: No valid sample rate available for 1:1, assuming a firmware bug
[ 2679.322306] usb 3-2: 1:1: add audio endpoint 0x3
[ 2679.322321] usb 3-2: Creating new data endpoint #3
[ 2679.322552] usb 3-2: 1:1 Set sample rate 96000, clock 1
[ 2684.362250] usb 3-2: 2:1: cannot get freq (v2/v3): err -110
[ 2694.444700] usb 3-2: No valid sample rate available for 2:1, assuming a firmware bug
[ 2694.444707] usb 3-2: 2:1: add audio endpoint 0x84
[ 2694.444721] usb 3-2: Creating new data endpoint #84
[ 2699.482103] usb 3-2: 2:1 Set sample rate 96000, clock 1
Signed-off-by: Alexander Tsoy <alexander@tsoy.me>
Link: https://lore.kernel.org/r/20240129121254.3454481-1-alexander@tsoy.me
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The ctrl->state value is updated in another thread using WRITE_ONCE, so
ensure all the readers use the appropriate accessor.
Reviewed-by: Sagi Grimberg <sagi@grmberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
This reverts commit cca974daeb.
The added sanity check is incorrect. BUDMIN is not the wrong value and
is too small.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
TCP rx zerocopy intent is to map pages initially allocated
from NIC drivers, not pages owned by a fs.
This patch adds to can_map_frag() these additional checks:
- Page must not be a compound one.
- page->mapping must be NULL.
This fixes the panic reported by ZhangPeng.
syzbot was able to loopback packets built with sendfile(),
mapping pages owned by an ext4 file to TCP rx zerocopy.
r3 = socket$inet_tcp(0x2, 0x1, 0x0)
mmap(&(0x7f0000ff9000/0x4000)=nil, 0x4000, 0x0, 0x12, r3, 0x0)
r4 = socket$inet_tcp(0x2, 0x1, 0x0)
bind$inet(r4, &(0x7f0000000000)={0x2, 0x4e24, @multicast1}, 0x10)
connect$inet(r4, &(0x7f00000006c0)={0x2, 0x4e24, @empty}, 0x10)
r5 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
0x181e42, 0x0)
fallocate(r5, 0x0, 0x0, 0x85b8)
sendfile(r4, r5, 0x0, 0x8ba0)
getsockopt$inet_tcp_TCP_ZEROCOPY_RECEIVE(r4, 0x6, 0x23,
&(0x7f00000001c0)={&(0x7f0000ffb000/0x3000)=nil, 0x3000, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0}, &(0x7f0000000440)=0x40)
r6 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
0x181e42, 0x0)
Fixes: 93ab6cc691 ("tcp: implement mmap() for zero copy receive")
Link: https://lore.kernel.org/netdev/5106a58e-04da-372a-b836-9d3d0bd2507b@huawei.com/T/
Reported-and-bisected-by: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
rx_data_reassembly skb is stored during NCI data exchange for processing
fragmented packets. It is dropped only when the last fragment is processed
or when an NTF packet with NCI_OP_RF_DEACTIVATE_NTF opcode is received.
However, the NCI device may be deallocated before that which leads to skb
leak.
As by design the rx_data_reassembly skb is bound to the NCI device and
nothing prevents the device to be freed before the skb is processed in
some way and cleaned, free it on the NCI device cleanup.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 6a2968aaf5 ("NFC: basic NCI protocol implementation")
Cc: stable@vger.kernel.org
Reported-by: syzbot+6b7c68d9c21e4ee4251b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/000000000000f43987060043da7b@google.com/
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Syzkaller reported [1] hitting a warning after failing to allocate
resources for skb in hsr_init_skb(). Since a WARN_ONCE() call will
not help much in this case, it might be prudent to switch to
netdev_warn_once(). At the very least it will suppress syzkaller
reports such as [1].
Just in case, use netdev_warn_once() in send_prp_supervision_frame()
for similar reasons.
[1]
HSR: Could not send supervision frame
WARNING: CPU: 1 PID: 85 at net/hsr/hsr_device.c:294 send_hsr_supervision_frame+0x60a/0x810 net/hsr/hsr_device.c:294
RIP: 0010:send_hsr_supervision_frame+0x60a/0x810 net/hsr/hsr_device.c:294
...
Call Trace:
<IRQ>
hsr_announce+0x114/0x370 net/hsr/hsr_device.c:382
call_timer_fn+0x193/0x590 kernel/time/timer.c:1700
expire_timers kernel/time/timer.c:1751 [inline]
__run_timers+0x764/0xb20 kernel/time/timer.c:2022
run_timer_softirq+0x58/0xd0 kernel/time/timer.c:2035
__do_softirq+0x21a/0x8de kernel/softirq.c:553
invoke_softirq kernel/softirq.c:427 [inline]
__irq_exit_rcu kernel/softirq.c:632 [inline]
irq_exit_rcu+0xb7/0x120 kernel/softirq.c:644
sysvec_apic_timer_interrupt+0x95/0xb0 arch/x86/kernel/apic/apic.c:1076
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:649
...
This issue is also found in older kernels (at least up to 5.10).
Cc: stable@vger.kernel.org
Reported-by: syzbot+3ae0a3f42c84074b7c8e@syzkaller.appspotmail.com
Fixes: 121c33b07b ("net: hsr: introduce common code for skb initialization")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
During memory error injection test on kernels >= v6.4, the kernel panics
like below. However, this issue couldn't be reproduced on kernels <= v6.3.
mce: [Hardware Error]: CPU 296: Machine Check Exception: f Bank 1: bd80000000100134
mce: [Hardware Error]: RIP 10:<ffffffff821b9776> {__get_user_nocheck_4+0x6/0x20}
mce: [Hardware Error]: TSC 411a93533ed ADDR 346a8730040 MISC 86
mce: [Hardware Error]: PROCESSOR 0:a06d0 TIME 1706000767 SOCKET 1 APIC 211 microcode 80001490
mce: [Hardware Error]: Run the above through 'mcelog --ascii'
mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
Kernel panic - not syncing: Fatal local machine check
The MCA code can recover from an in-kernel #MC if the fixup type is
EX_TYPE_UACCESS, explicitly indicating that the kernel is attempting to
access userspace memory. However, if the fixup type is EX_TYPE_DEFAULT
the only thing that is raised for an in-kernel #MC is a panic.
ex_handler_uaccess() would warn if users gave a non-canonical addresses
(with bit 63 clear) to {get, put}_user(), which was unexpected.
Therefore, commit
b19b74bc99 ("x86/mm: Rework address range check in get_user() and put_user()")
replaced _ASM_EXTABLE_UA() with _ASM_EXTABLE() for {get, put}_user()
fixups. However, the new fixup type EX_TYPE_DEFAULT results in a panic.
Commit
6014bc2756 ("x86-64: make access_ok() independent of LAM")
added the check gp_fault_address_ok() right before the WARN_ONCE() in
ex_handler_uaccess() to not warn about non-canonical user addresses due
to LAM.
With that in place, revert back to _ASM_EXTABLE_UA() for {get,put}_user()
exception fixups in order to be able to handle in-kernel MCEs correctly
again.
[ bp: Massage commit message. ]
Fixes: b19b74bc99 ("x86/mm: Rework address range check in get_user() and put_user()")
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20240129063842.61584-1-qiuxu.zhuo@intel.com
The size passed to snprintf() includes the space for the trailing space.
So there is no reason here not to use all the available space.
So remove the -1 when computing 'name_len'.
While at it, use the size of the array directly instead of the intermediate
'name_len' variable.
snprintf() also guaranties that the buffer if NULL terminated, so there is
no need to write an additional trailing NULL "To be sure".
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
While ntfs3 supports discards, FITRIM ioctl() command has defined
only for regular files. This may confuse users trying to invoke
`fstrim` utility with the directory argument (for example, call
`fstrim <mountpoint>` which is the common practice). In this case,
ioctl() returns -ENOTTY without any error messages in kernel ring
buffer, this may be easily interpreted as no support for discards
in ntfs3 driver.
Currently only FITRIM command implemented in ntfs_ioctl() and
passed inode used only for dereferencing NTFS superblock, so
no need for separate ioctl() handler for directories, just add
existing ntfs_ioctl() handler to ntfs_dir_operations.
Signed-off-by: Nekun <nekokun@firemail.cc>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
The issue here is when this is called from ntfs_load_attr_list(). The
"size" comes from le32_to_cpu(attr->res.data_size) so it can't overflow
on a 64bit systems but on 32bit systems the "+ 1023" can overflow and
the result is zero. This means that the kmalloc will succeed by
returning the ZERO_SIZE_PTR and then the memcpy() will crash with an
Oops on the next line.
Fixes: be71b5cba2 ("fs/ntfs3: Add attrib operations")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
local_flush_tlb_range_asid() takes the size as argument, not the end of
the range to flush, so fix this by computing the size from the end and
the start of the range.
Fixes: 7a92fc8b4d ("mm: Introduce flush_cache_vmap_early()")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
In XFS_DAS_NODE_REMOVE_ATTR case, xfs_attr_mode_remove_attr() sets
filter to XFS_ATTR_INCOMPLETE. The filter is then reset in
xfs_attr_complete_op() if XFS_DA_OP_REPLACE operation is performed.
The filter is not reset though if XFS just removes the attribute
(args->value == NULL) with xfs_attr_defer_remove(). attr code goes
to XFS_DAS_DONE state.
Fix this by always resetting XFS_ATTR_INCOMPLETE filter. The replace
operation already resets this filter in anyway and others are
completed at this step hence don't need it.
Fixes: fdaf1bb3ca ("xfs: ATTR_REPLACE algorithm with LARP enabled needs rework")
Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
This reverts commit 67794f882a.
We need to explicitly set up the clock selector to workaround a problem
with the Behringer mixers. This was originally done in d2e8f64125
("ALSA: usb-audio: Explicitly set up the clock selector")
The problem with MOTU M Series mentioned in commit message was fixed in
a different way by checking control capabilities of clock selectors.
Signed-off-by: Alexander Tsoy <alexander@tsoy.me>
Link: https://lore.kernel.org/r/20240128132338.819273-1-alexander@tsoy.me
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The bit 23, CM TBT3 Not Supported (CNS), in ROUTER_CS_5 indicates
whether a USB4 Connection Manager is TBT3-Compatible and should be:
0b for TBT3-Compatible
1b for Not TBT3-Compatible
Fixes: b04079837b ("thunderbolt: Add initial support for USB4")
Cc: stable@vger.kernel.org
Signed-off-by: Mohammad Rahimi <rahimi.mhmmd@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Since the buffer cache for ntfs3 metadata is not released until the file
system is unmounted, allocating from the movable zone may result in cma
allocation failures. This is due to the page still being used by ntfs3,
leading to migration failures.
To address this, this commit use sb_bread_umovable() instead of
sb_bread(). This change prevents allocation from the movable zone,
ensuring compatibility with scenarios where the buffer head is not
released until unmount. This patch is inspired by commit
a8ac900b8163("ext4: use non-movable memory for the ext4 superblock").
The issue is found when playing video files stored in NTFS on the
Android TV platform. During this process, the media parser reads the
video file, causing ntfs3 to allocate buffer cache from the CMA area.
Subsequently, the hardware decoder attempts to allocate memory from the
same CMA area. However, the page is still in use by ntfs3, resulting in
a migrate failure in alloc_contig_range().
The pinned page and allocating stacktrace reported by page owner shows
below:
page:ffffffff00b68880 refcount:3 mapcount:0 mapping:ffffff80046aa828
index:0xc0040 pfn:0x20fa4
aops:def_blk_aops ino:0
flags: 0x2020(active|private)
page dumped because: migration failure
page last allocated via order 0, migratetype Movable,
gfp_mask 0x108c48
(GFP_NOFS|__GFP_NOFAIL|__GFP_HARDWALL|__GFP_MOVABLE),
page_owner tracks the page as allocated
prep_new_page
get_page_from_freelist
__alloc_pages_nodemask
pagecache_get_page
__getblk_gfp
__bread_gfp
ntfs_read_run_nb
ntfs_read_bh
mi_read
ntfs_iget5
dir_search_u
ntfs_lookup
__lookup_slow
lookup_slow
walk_component
path_lookupat
Signed-off-by: Ism Hong <ism.hong@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
io_read_mshot() always relies on poll triggering retries, and this works
fine as long as we do a retry per size of the buffer being read. The
buffer size is given by the size of the buffer(s) in the given buffer
group ID.
But if we're reading less than what is available, then we don't always
get to read everything that is available. For example, if the buffers
available are 32 bytes and we have 64 bytes to read, then we'll
correctly read the first 32 bytes and then wait for another poll trigger
before we attempt the next read. This next poll trigger may never
happen, in which case we just sit forever and never make progress, or it
may trigger at some point in the future, and now we're just delivering
the available data much later than we should have.
io_read_mshot() could do retries itself, but that is wasteful as we'll
be going through all of __io_read() again, and most likely in vain.
Rather than do that, bump our poll reference count and have
io_poll_check_events() do one more loop and check with vfs_poll() if we
have more data to read. If we do, io_read_mshot() will get invoked again
directly and we'll read the next chunk.
io_poll_multishot_retry() must only get called from inside
io_poll_issue(), which is our multishot retry handler, as we know we
already "own" the request at this point.
Cc: stable@vger.kernel.org
Link: https://github.com/axboe/liburing/issues/1041
Fixes: fc68fcda04 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Parent dir is locked by user_path_locked_at() before validating the
required dentry. It should be unlocked if we can not perform the
deletion.
This fixes the problem:
$ bcachefs subvolume delete not-exist-entry
BCH_IOCTL_SUBVOLUME_DESTROY ioctl error: No such file or directory
$ bcachefs subvolume delete not-exist-entry
the second will stuck because the parent dir is locked in the previous
deletion.
Signed-off-by: Guoyu Ou <benogy@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The gcc compiler on paric does support the __int128 type, although the
architecture does not have native 128-bit support.
The effect is, that the bcachefs u128_square() function will pull in the
libgcc __multi3() helper, which breaks the kernel build when bcachefs is
built as module since this function isn't currently exported in
arch/parisc/kernel/parisc_ksyms.c.
The build failure can be seen in the latest debian kernel build at:
https://buildd.debian.org/status/fetch.php?pkg=linux&arch=hppa&ver=6.7.1-1%7Eexp1&stamp=1706132569&raw=0
We prefer to not export that symbol, so fall back to the optional 64-bit
implementation provided by bcachefs and thus avoid usage of __multi3().
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Pull cxl fixes from Dan Williams:
"A build regression fix, a device compatibility fix, and an original
bug preventing creation of large (16 device) interleave sets:
- Fix unit test build regression fallout from global
"missing-prototypes" change
- Fix compatibility with devices that do not support interrupts
- Fix overflow when calculating the capacity of large interleave sets"
* tag 'cxl-fixes-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/region:Fix overflow issue in alloc_hpa()
cxl/pci: Skip irq features if MSI/MSI-X are not supported
tools/testing/nvdimm: Disable "missing prototypes / declarations" warnings
tools/testing/cxl: Disable "missing prototypes / declarations" warnings
Pull locking fix from Borislav Petkov:
- Prevent an inconsistent futex operation leading to stale state
exposure
* tag 'locking_urgent_for_v6.8_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Prevent the reuse of stale pi_state
Pull irq fix from Borislav Petkov:
- Initialize the resend node of each IRQ descriptor, not only the first
one
* tag 'irq_urgent_for_v6.8_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Initialize resend_node hlist for all interrupt descriptors
Pull timer fixes from Borislav Petkov:
- Preserve the number of idle calls and sleep entries across CPU
hotplug events in order to be able to compute correct averages
- Limit the duration of the clocksource watchdog checking interval as
too long intervals lead to wrongly marking the TSC as unstable
* tag 'timers_urgent_for_v6.8_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/sched: Preserve number of idle sleeps across CPU hotplug events
clocksource: Skip watchdog check for large watchdog intervals
Pull x86 fixes from Borislav Petkov:
- Make sure 32-bit syscall registers are properly sign-extended
- Add detection for AMD's Zen5 generation CPUs and Intel's Clearwater
Forest CPU model number
- Make a stub function export non-GPL because it is part of the
paravirt alternatives and that can be used by non-GPL code
* tag 'x86_urgent_for_v6.8_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/CPU/AMD: Add more models to X86_FEATURE_ZEN5
x86/entry/ia32: Ensure s32 is sign extended to s64
x86/cpu: Add model number for Intel Clearwater Forest processor
x86/CPU/AMD: Add X86_FEATURE_ZEN5
x86/paravirt: Make BUG_func() usable by non-GPL modules
Pull memblock fix from Mike Rapoport:
"Fix crash when reserved memory is not added to memory.
When CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, the initialization
of reserved pages may cause access of NODE_DATA() with invalid nid and
crash.
Add a fall back to early_pfn_to_nid() in memmap_init_reserved_pages()
to ensure a valid node id is always passed to init_reserved_page()"
* tag 'fixes-2024-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock: fix crash when reserved memory is not added to memory
Printing the inventory on a serial console can be quite slow and thus may
trigger the hung task detector (CONFIG_DETECT_HUNG_TASK=y) and possibly
reboot the machine. Adding a cond_resched() prevents this.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v6.0+
Report if the calculated cache stride size is zero, otherwise the cache
flushing routine will never finish and hang the machine.
This can be reproduced with a testcase in qemu, where the firmware reports
wrong cache values.
Signed-off-by: Helge Deller <deller@gmx.de>
The rodata_test program for CONFIG_DEBUG_RODATA_TEST=y complains if
read-only data does not start at page boundary.
Signed-off-by: Helge Deller <deller@gmx.de>
In uart_tiocmget():
result = uport->mctrl;
uart_port_lock_irq(uport);
result |= uport->ops->get_mctrl(uport);
uart_port_unlock_irq(uport);
...
return result;
In uart_update_mctrl():
uart_port_lock_irqsave(port, &flags);
...
port->mctrl = (old & ~clear) | set;
...
port->ops->set_mctrl(port, port->mctrl);
...
uart_port_unlock_irqrestore(port, flags);
An atomicity violation is identified due to the concurrent execution of
uart_tiocmget() and uart_update_mctrl(). After assigning
result = uport->mctrl, the mctrl value may change in uart_update_mctrl(),
leading to a mismatch between the value returned by
uport->ops->get_mctrl(uport) and the mctrl value previously read.
This can result in uart_tiocmget() returning an incorrect value.
This possible bug is found by an experimental static analysis tool
developed by our team, BassCheck[1]. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations. The above
possible bug is reported when our tool analyzes the source code of
Linux 5.17.
To address this issue, it is suggested to move the line
result = uport->mctrl inside the uart_port_lock block to ensure atomicity
and prevent the mctrl value from being altered during the execution of
uart_tiocmget(). With this patch applied, our tool no longer reports the
bug, with the kernel configuration allyesconfig for x86_64. Due to the
absence of the requisite hardware, we are unable to conduct runtime
testing of the patch. Therefore, our verification is solely based on code
logic analysis.
[1] https://sites.google.com/view/basscheck/
Fixes: c5f4644e6c ("[PATCH] Serial: Adjust serial locking")
Cc: stable@vger.kernel.org
Signed-off-by: Gui-Dong Han <2045gemini@gmail.com>
Link: https://lore.kernel.org/r/20240112113624.17048-1-2045gemini@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
These > comparisons should be >= to prevent writing one element beyond
the end of the rx_buff[] array. The rx_buff[] buffer has RX_BUF_SIZE
elements. Fix the buffer overflow.
Fixes: aba8290f36 ("8250: microchip: pci1xxxx: Add Burst mode reception support in uart driver for writing into FIFO")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/ZZ7vIfj7Jgh-pJn8@moroto
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the PD is disabled for the port, port->pds will be left as NULL,
which causes the following crash during caps intilisation. Fix the
crash.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Call trace:
tcpm_register_port+0xaec/0xc44
qcom_pmic_typec_probe+0x1a4/0x254
platform_probe+0x68/0xc0
really_probe+0x148/0x2ac
__driver_probe_device+0x78/0x12c
driver_probe_device+0xd8/0x160
Bluetooth: hci0: QCA Product ID :0x0000000a
__device_attach_driver+0xb8/0x138
bus_for_each_drv+0x80/0xdc
Bluetooth: hci0: QCA SOC Version :0x40020150
__device_attach+0x9c/0x188
device_initial_probe+0x14/0x20
bus_probe_device+0xac/0xb0
deferred_probe_work_func+0x8c/0xc8
process_one_work+0x1ec/0x51c
worker_thread+0x1ec/0x3e4
kthread+0x120/0x124
ret_from_fork+0x10/0x20
Fixes: cd099cde4e ("usb: typec: tcpm: Support multiple capabilities")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20240113-pmi632-typec-v2-5-182d9aa0a5b3@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The PPM on some Dell laptops seems to expect that the ACK_CC_CI
command to clear the connector change notification is in turn
followed by another ACK_CC_CI to acknowledge the ACK_CC_CI command
itself. This is in violation of the UCSI spec that states:
"The only notification that is not acknowledged by the OPM is
the command completion notification for the ACK_CC_CI or the
PPM_RESET command."
Add a quirk to send this ack anyway.
Apply the quirk to all Dell systems.
On the first command that acks a connector change send a dummy
command to determine if it runs into a timeout. Only activate
the quirk if it does. This ensure that we do not break Dell
systems that do not need the quirk.
Signed-off-by: "Christian A. Ehrhardt" <lk@c--e.de>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20240121204123.275441-4-lk@c--e.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case of a spurious or otherwise delayed notification it is
possible that CCI still reports the previous completion. The
UCSI spec is aware of this and provides two completion bits in
CCI, one for normal commands and one for acks. As acks and commands
alternate the notification handler can determine if the completion
bit is from the current command.
The initial UCSI code correctly handled this but the distinction
between the two completion bits was lost with the introduction of
the new API.
To fix this revive the ACK_PENDING bit for ucsi_acpi and only complete
commands if the completion bit matches.
Fixes: f56de278e8 ("usb: typec: ucsi: acpi: Move to the new API")
Cc: stable@vger.kernel.org
Signed-off-by: "Christian A. Ehrhardt" <lk@c--e.de>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20240121204123.275441-3-lk@c--e.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Calling ->sync_write must be done while holding the PPM lock as
the mailbox logic does not support concurrent commands.
At least since the addition of partner task this means that
ucsi_acknowledge_connector_change should be called with the
PPM lock held as it calls ->sync_write.
Thus protect the only call to ucsi_acknowledge_connector_change
with the PPM. All other calls to ->sync_write already happen
under the PPM lock.
Fixes: b9aa02ca39 ("usb: typec: ucsi: Add polling mechanism for partner tasks like alt mode checking")
Cc: stable@vger.kernel.org
Signed-off-by: "Christian A. Ehrhardt" <lk@c--e.de>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20240121204123.275441-2-lk@c--e.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 1e35f07439.
Given that ERROR_RECOVERY calls into PORT_RESET for Hi-Zing
the CC pins, setting CC pins to default state during PORT_RESET
breaks error recovery.
4.5.2.2.2.1 ErrorRecovery State Requirements
The port shall not drive VBUS or VCONN, and shall present a
high-impedance to ground (above zOPEN) on its CC1 and CC2 pins.
Hi-Zing the CC pins is the inteded behavior for PORT_RESET.
CC pins are set to default state after tErrorRecovery in
PORT_RESET_WAIT_OFF.
4.5.2.2.2.2 Exiting From ErrorRecovery State
A Sink shall transition to Unattached.SNK after tErrorRecovery.
A Source shall transition to Unattached.SRC after tErrorRecovery.
Cc: stable@vger.kernel.org
Cc: Frank Wang <frank.wang@rock-chips.com>
Fixes: 1e35f07439 ("usb: typec: tcpm: fix cc role at port reset")
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240117114742.2587779-1-badhri@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When write UDC to empty and unbind gadget driver from gadget device, it is
possible that there are many queue failures for mass storage function.
The root cause is mass storage main thread alaways try to queue request to
receive a command from host if running flag is on, on platform like dwc3,
if pull down called, it will not queue request again and return
-ESHUTDOWN, but it not affect running flag of mass storage function.
Check return code from mass storage function and clear running flag if it
is -ESHUTDOWN, also indicate start in/out transfer failure to break loops.
Cc: stable <stable@kernel.org>
Signed-off-by: yuan linyu <yuanlinyu@hihonor.com>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20240123034829.3848409-1-yuanlinyu@hihonor.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The OTG 1.3 spec has the feature A_ALT_HNP_SUPPORT, which tells
a device that it is connected to the wrong port. Some devices
refuse to operate if you enable that feature, because it indicates
to them that they ought to request to be connected to another port.
According to the spec this feature may be used based only the following
three conditions:
6.5.3 a_alt_hnp_support
Setting this feature indicates to the B-device that it is connected to
an A-device port that is not capable of HNP, but that the A-device does
have an alternate port that is capable of HNP.
The A-device is required to set this feature under the following conditions:
• the A-device has multiple receptacles
• the A-device port that connects to the B-device does not support HNP
• the A-device has another port that does support HNP
A check for the third and first condition is missing. Add it.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Cc: stable <stable@kernel.org>
Fixes: 7d2d641c44 ("usb: otg: don't set a_alt_hnp_support feature for OTG 2.0 device")
Link: https://lore.kernel.org/r/20240122153545.12284-1-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When power is recycled in usb controller during system power management,
the controller will recognize it and switch role if role has been changed
during power lost. In current design, it will be completed in resume()
function. However, this may bring issues since usb class devices have
their pm operations too and these device's resume() functions are still
not being called at this point. When usb controller recognized host role
should be stopped, these usb class devices will be removed at this point.
But these usb class devices can't be removed in some cases, such as scsi
devices. Since scsi driver may sync data to U-disk, however it will block
there because scsi drvier can only handle pm request when is in suspended
state. Therefore, there may exist a dependency between ci_resume() and usb
class device's resume(). To break this potential dependency, we need to
handle power lost work in a workqueue.
Fixes: 74494b3321 ("usb: chipidea: core: add controller resume support when controller is powered off")
cc: stable@vger.kernel.org
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240119123537.3614838-1-xu.yang_2@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In current scenario if Plug-out and Plug-In performed continuously
there could be a chance while checking for dwc->gadget_driver in
dwc3_gadget_suspend, a NULL pointer dereference may occur.
Call Stack:
CPU1: CPU2:
gadget_unbind_driver dwc3_suspend_common
dwc3_gadget_stop dwc3_gadget_suspend
dwc3_disconnect_gadget
CPU1 basically clears the variable and CPU2 checks the variable.
Consider CPU1 is running and right before gadget_driver is cleared
and in parallel CPU2 executes dwc3_gadget_suspend where it finds
dwc->gadget_driver which is not NULL and resumes execution and then
CPU1 completes execution. CPU2 executes dwc3_disconnect_gadget where
it checks dwc->gadget_driver is already NULL because of which the
NULL pointer deference occur.
Cc: stable@vger.kernel.org
Fixes: 9772b47a4c ("usb: dwc3: gadget: Fix suspend/resume during device mode")
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Signed-off-by: Uttkarsh Aggarwal <quic_uaggarwa@quicinc.com>
Link: https://lore.kernel.org/r/20240119094825.26530-1-quic_uaggarwa@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xHCI 4.9 explicitly forbids assuming that the xHC has released its
ownership of a multi-TRB TD when it reports an error on one of the
early TRBs. Yet the driver makes such assumption and releases the TD,
allowing the remaining TRBs to be freed or overwritten by new TDs.
The xHC should also report completion of the final TRB due to its IOC
flag being set by us, regardless of prior errors. This event cannot
be recognized if the TD has already been freed earlier, resulting in
"Transfer event TRB DMA ptr not part of current TD" error message.
Fix this by reusing the logic for processing isoc Transaction Errors.
This also handles hosts which fail to report the final completion.
Fix transfer length reporting on Babble errors. They may be caused by
device malfunction, no guarantee that the buffer has been filled.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20240125152737.2983959-5-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The last TRB of a isoc TD might not trigger an event if there was
an error event for a TRB mid TD. This is seen on a NEC Corporation
uPD720200 USB 3.0 Host
After an error mid a multi-TRB TD the xHC should according to xhci 4.9.1
generate events for passed TRBs with IOC flag set if it proceeds to the
next TD. This event is either a copy of the original error, or a
"success" transfer event.
If that event is missing then the driver and xHC host get out of sync as
the driver is still expecting a transfer event for that first TD, while
xHC host is already sending events for the next TD in the list.
This leads to
"Transfer event TRB DMA ptr not part of current TD" messages.
As a solution we tag the isoc TDs that get error events mid TD.
If an event doesn't match the first TD, then check if the tag is
set, and event points to the next TD.
In that case give back the fist TD and process the next TD normally
Make sure TD status and transferred length stay valid in both cases
with and without final TD completion event.
Reported-by: Michał Pecio <michal.pecio@gmail.com>
Closes: https://lore.kernel.org/linux-usb/20240112235205.1259f60c@foxbook/
Tested-by: Michał Pecio <michal.pecio@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20240125152737.2983959-4-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Recent commit [1] added support for changing max segment size of the NCM
interface via configfs. But the value of segment size value stored in
ncm_opts need to be converted to little endian before saving it in
ecm_desc. Also while initialising the value of segment size in opts during
instance allocation, the value ETH_FRAME_LEN needs to be assigned directly
without any conversion as ETH_FRAME_LEN and the variable max_segment_size
are native endian. The current implementaion modifies it into little endian
thus breaking things for big endian targets.
Fix endianness while assigning these variables.
While at it, fix up some stray spaces in comments added in code.
[1]: https://lore.kernel.org/all/20231221153216.18657-1-quic_kriskura@quicinc.com/
Fixes: 1900daeefd ("usb: gadget: ncm: Add support to update wMaxSegmentSize via configfs")
Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Link: https://lore.kernel.org/r/20240118154910.8765-1-quic_kriskura@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upstream commit bac1ec5514 ("usb: xhci: Set quirk for
XHCI_SG_TRB_CACHE_SIZE_QUIRK") introduced a new quirk in XHCI
which fixes XHC timeout, which was seen on synopsys XHCs while
using SG buffers. Currently this quirk can only be set using
xhci private data. But there are some drivers like dwc3/host.c
which adds adds quirks using software node for xhci device.
Hence set this xhci quirk by iterating over device properties.
Cc: stable@vger.kernel.org # 5.11
Fixes: bac1ec5514 ("usb: xhci: Set quirk for XHCI_SG_TRB_CACHE_SIZE_QUIRK")
Signed-off-by: Prashanth K <quic_prashk@quicinc.com>
Link: https://lore.kernel.org/r/20240116055816.1169821-3-quic_prashk@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upstream commit bac1ec5514 ("usb: xhci: Set quirk for
XHCI_SG_TRB_CACHE_SIZE_QUIRK") introduced a new quirk in XHCI
which fixes XHC timeout, which was seen on synopsys XHCs while
using SG buffers. But the support for this quirk isn't present
in the DWC3 layer.
We will encounter this XHCI timeout/hung issue if we run iperf
loopback tests using RTL8156 ethernet adaptor on DWC3 targets
with scatter-gather enabled. This gets resolved after enabling
the XHCI_SG_TRB_CACHE_SIZE_QUIRK. This patch enables it using
the xhci device property since its needed for DWC3 controller.
In Synopsys DWC3 databook,
Table 9-3: xHCI Debug Capability Limitations
Chained TRBs greater than TRB cache size: The debug capability
driver must not create a multi-TRB TD that describes smaller
than a 1K packet that spreads across 8 or more TRBs on either
the IN TR or the OUT TR.
Cc: stable@vger.kernel.org #5.11
Signed-off-by: Prashanth K <quic_prashk@quicinc.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Link: https://lore.kernel.org/r/20240116055816.1169821-2-quic_prashk@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull x86 platform driver fixes from Hans de Goede:
- WMI bus driver fixes
- Second attempt (previously reverted) at P2SB PCI rescan deadlock fix
- AMD PMF driver improvements
- MAINTAINERS updates
- Misc other small fixes and hw-id additions
* tag 'platform-drivers-x86-v6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: touchscreen_dmi: Add info for the TECLAST X16 Plus tablet
platform/x86/intel/ifs: Call release_firmware() when handling errors.
platform/x86/amd/pmf: Fix memory leak in amd_pmf_get_pb_data()
platform/x86/amd/pmf: Get ambient light information from AMD SFH driver
platform/x86/amd/pmf: Get Human presence information from AMD SFH driver
platform/mellanox: mlxbf-pmc: Fix offset calculation for crspace events
platform/mellanox: mlxbf-tmfifo: Drop Tx network packet when Tx TmFIFO is full
MAINTAINERS: remove defunct acpi4asus project info from asus notebooks section
MAINTAINERS: add Luke Jones as maintainer for asus notebooks
MAINTAINERS: Remove Perry Yuan as DELL WMI HARDWARE PRIVACY SUPPORT maintainer
platform/x86: silicom-platform: Add missing "Description:" for power_cycle sysfs attr
platform/x86: intel-wmi-sbl-fw-update: Fix function name in error message
platform/x86: p2sb: Use pci_resource_n() in p2sb_read_bar0()
platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe
platform/x86: intel-uncore-freq: Fix types in sysfs callbacks
platform/x86: wmi: Fix wmi_dev_probe()
platform/x86: wmi: Fix notify callback locking
platform/x86: wmi: Decouple legacy WMI notify handlers from wmi_block_list
platform/x86: wmi: Return immediately if an suitable WMI event is found
platform/x86: wmi: Fix error handling in legacy WMI notify handler functions
Pull LoongArch fixes from Huacai Chen:
"Fix boot failure on machines with more than 8 nodes, and fix two build
errors about KVM"
* tag 'loongarch-fixes-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: KVM: Add returns to SIMD stubs
LoongArch: KVM: Fix build due to API changes
LoongArch/smp: Call rcutree_report_cpu_starting() at tlb_init()
Pull xfs fix from Chandan Babu:
- Fix read only mounts when using fsopen mount API
* tag 'xfs-6.8-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: read only mounts with fsopen mount API are busted
Pull bcachefs fixes from Kent Overstreet:
- fix for REQ_OP_FLUSH usage; this fixes filesystems going read only
with -EOPNOTSUPP from the block layer.
(this really should have gone in with the block layer patch causing
the -EOPNOTSUPP, or should have gone in before).
- fix an allocation in non-sleepable context
- fix one source of srcu lock latency, on devices with terrible discard
latency
- fix a reattach_inode() issue in fsck
* tag 'bcachefs-2024-01-26' of https://evilpiepirate.org/git/bcachefs:
bcachefs: __lookup_dirent() works in snapshot, not subvol
bcachefs: discard path uses unlock_long()
bcachefs: fix incorrect usage of REQ_OP_FLUSH
bcachefs: Add gfp flags param to bch2_prt_task_backtrace()
Pull smb server fixes from Steve French:
- Fix netlink OOB
- Minor kernel doc fix
* tag '6.8-rc2-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix global oob in ksmbd_nl_policy
smb: Fix some kernel-doc comments
Pull smb client fixes from Steve French:
"Nine cifs/smb client fixes
- Four network error fixes (three relating to replays of requests
that need to be retried, and one fixing some places where we were
returning the wrong rc up the stack on network errors)
- Two multichannel fixes including locking fix and case where subset
of channels need reconnect
- netfs integration fixup: share remote i_size with netfslib
- Two small cleanups (one for addressing a clang warning)"
* tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix stray unlock in cifs_chan_skip_or_disable
cifs: set replay flag for retries of write command
cifs: commands that are retried should have replay flag set
cifs: helper function to check replayable error codes
cifs: translate network errors on send to -ECONNABORTED
cifs: cifs_pick_channel should try selecting active channels
cifs: Share server EOF pos with netfslib
smb: Work around Clang __bdos() type confusion
smb: client: delete "true", "false" defines
After the linked LLVM change, building ARCH=um defconfig results in a
segmentation fault in modpost. Prior to commit a23e7584ec ("modpost:
unify 'sym' and 'to' in default_mismatch_handler()"), there was a
warning:
WARNING: modpost: vmlinux.o(__ex_table+0x88): Section mismatch in reference to the .ltext:(unknown)
WARNING: modpost: The relocation at __ex_table+0x88 references
section ".ltext" which is not in the list of
authorized sections. If you're adding a new section
and/or if this reference is valid, add ".ltext" to the
list of authorized sections to jump to on fault.
This can be achieved by adding ".ltext" to
OTHER_TEXT_SECTIONS in scripts/mod/modpost.c.
The linked LLVM change moves global objects to the '.ltext' (and
'.ltext.*' with '-ffunction-sections') sections with '-mcmodel=large',
which ARCH=um uses. These sections should be handled just as '.text'
and '.text.*' are, so add them to TEXT_SECTIONS.
Cc: stable@vger.kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/1981
Link: 4bf8a68895
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
The kernel builds with -fno-PIE, so commit 883354afbc ("um: link
vmlinux with -no-pie") added the compiler linker flag '-no-pie' via
cc-option because '-no-pie' was only supported in GCC 6.1.0 and newer.
While this works for GCC, this does not work for clang because cc-option
uses '-c', which stops the pipeline right before linking, so '-no-pie'
is unconsumed and clang warns, causing cc-option to fail just as it
would if the option was entirely unsupported:
$ clang -Werror -no-pie -c -o /dev/null -x c /dev/null
clang-16: error: argument unused during compilation: '-no-pie' [-Werror,-Wunused-command-line-argument]
A recent version of clang exposes this because it generates a relocation
under '-mcmodel=large' that is not supported in PIE mode:
/usr/sbin/ld: init/main.o: relocation R_X86_64_32 against symbol `saved_command_line' can not be used when making a PIE object; recompile with -fPIE
/usr/sbin/ld: failed to set dynamic section sizes: bad value
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Remove the cc-option check altogether. It is wasteful to invoke the
compiler to check for '-no-pie' because only one supported compiler
version does not support it, GCC 5.x (as it is supported with the
minimum version of clang and GCC 6.1.0+). Use a combination of the
gcc-min-version macro and CONFIG_CC_IS_CLANG to unconditionally add
'-no-pie' with CONFIG_LD_SCRIPT_DYN=y, so that it is enabled with all
compilers that support this. Furthermore, using gcc-min-version can help
turn this back into
LINK-$(CONFIG_LD_SCRIPT_DYN) += -no-pie
when the minimum version of GCC is bumped past 6.1.0.
Cc: stable@vger.kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/1982
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
For some ARCH values, SRCARCH, which should be used for finding arch/
subdirectory, is different from ARCH.
Signed-off-by: Zhang Bingwu <xtexchooser@duck.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
The current driver code no longer perfrom internal conversion from
VID to direct. Instead it configures READ_VOUT using MFR_DC_LOOP_CTRL.
Correct the comment message inside the 'mp2975_read_byte_data'
function to match the driver logic.
Signed-off-by: Konstantin Aladyshev <aladyshev22@gmail.com>
Fixes: c60fe56c16 ("hwmon: (pmbus/mp2975) Fix driver initialization for MP2975 device")
Link: https://lore.kernel.org/r/20240127154844.989-1-aladyshev22@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Commit 4c3577db3e ("Staging: iio: impedance-analyzer: Fix sparse
warning") fixed a compiler warning, but introduced a bug that resulted
in one of the two 16 bit IIO channels always being zero (when both are
enabled).
This is because int is 32 bits wide on most architectures and in the
case of a little-endian machine the two most significant bytes would
occupy the buffer for the second channel as 'val' is being passed as a
void pointer to 'iio_push_to_buffers()'.
Fix by defining 'val' as u16. Tested working on ARM64.
Fixes: 4c3577db3e ("Staging: iio: impedance-analyzer: Fix sparse warning")
Signed-off-by: David Schiller <david.schiller@jku.at>
Link: https://lore.kernel.org/r/20240122134916.2137957-1-david.schiller@jku.at
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
If we still own the FPU after initializing fcr31, when we are preempted
the dirty value in the FPU will be read out and stored into fcr31,
clobbering our setting. This can cause an improper floating-point
environment after execve(). For example:
zsh% cat measure.c
#include <fenv.h>
int main() { return fetestexcept(FE_INEXACT); }
zsh% cc measure.c -o measure -lm
zsh% echo $((1.0/3)) # raising FE_INEXACT
0.33333333333333331
zsh% while ./measure; do ; done
(stopped in seconds)
Call lose_fpu(0) before setting fcr31 to prevent this.
Closes: https://lore.kernel.org/linux-mips/7a6aa1bbdbbe2e63ae96ff163fab0349f58f1b9e.camel@xry111.site/
Fixes: 9b26616c8d ("MIPS: Respect the ISA level in FCSR handling")
Cc: stable@vger.kernel.org
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()") reveals
that reserved memblock regions have no valid node id set, just set it
right since loongson64 firmware makes it clear in memory layout info.
This works around booting failure on 3A1000+ since commit 61167ad5fe
("mm: pass nid to reserve_bootmem_region()") under
CONFIG_DEFERRED_STRUCT_PAGE_INIT.
Signed-off-by: Huang Pei <huangpei@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
In case the interface between the MAC and the PHY is SGMII, then the bit
GIGA_MODE on the MAC side needs to be set regardless of the speed at
which it is running.
Fixes: d28d6d2e37 ("net: lan966x: add port module support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 46cf789b68 ("connector: Move maintainence under networking
drivers umbrella.") moved the connector maintenance but did not include
the connector header files.
It seems that it has always been implied that these headers were
maintained along with the rest of the connector code, both before and
after the cited commit. Make this explicit.
Signed-off-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The batman-adv multicast tracker TVLV handler is registered for the
new batman-adv multicast packet type upon creating a batman-adv interface,
but not unregistered again upon the interface's deletion, leading to a
memory leak.
Fix this memory leak by calling the according TVLV handler unregister
routine for the multicast tracker TVLV upon batman-adv interface
deletion.
Fixes: 07afe1ba28 ("batman-adv: mcast: implement multicast packet reception and forwarding")
Reported-and-tested-by: syzbot+ebe64cc5950868e77358@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000beadc4060f0cbc23@google.com/
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
When a node which does not have the new batman-adv multicast packet type
capability vanishes then the according, global counter erroneously would
not be reduced in response on other nodes. Which in turn leads to the mesh
never switching back to sending with the new multicast packet type.
Fix this by reducing the according counter when such a node times out.
Fixes: 9003913322 ("batman-adv: mcast: implement multicast packet generation")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Even with inplace decompression, sometimes very few temporary buffers
may be still needed for a single decompression shot (e.g. 16 pages for
64k sliding window or 4 pages for 16k sliding window). In low-memory
scenarios, it would be better to try to allocate with GFP_NOWAIT on
readahead first. That can help reduce the time spent on page allocation
under durative memory pressure.
Here are detailed performance numbers under multi-app launch benchmark
workload [1] on ARM64 Android devices (8-core CPU and 8GB of memory)
running a 5.15 LTS kernel with EROFS of 4k pclusters:
+----------------------------------------------+
| LZ4 | vanilla | patched | diff |
|----------------+---------+---------+---------|
| Average (ms) | 3364 | 2684 | -20.21% | [64k sliding window]
|----------------+---------+---------+---------|
| Average (ms) | 2079 | 1610 | -22.56% | [16k sliding window]
+----------------------------------------------+
The total size of system images for 4k pclusters is almost unchanged:
(64k sliding window) 9,117,044 KB
(16k sliding window) 9,113,096 KB
Therefore, in addition to switch the sliding window from 64k to 16k,
after applying this patch, it can eventually save 52.14% (3364 -> 1610)
on average with no memory reservation. That is particularly useful for
embedded devices with limited resources.
[1] https://lore.kernel.org/r/20240109074143.4138783-1-guochunhai@vivo.com
Suggested-by: Gao Xiang <xiang@kernel.org>
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Link: https://lore.kernel.org/r/20240126140142.201718-1-hsiangkao@linux.alibaba.com
Pull ata updates from Niklas Cassel:
- Fix an incorrect link_power_management_policy sysfs attribute value.
We were previously using the same attribute value for two different
LPM policies (me)
- Add a ASMedia ASM1166 quirk.
The SATA host controller always reports that it has 32 ports, even
though it only has six ports. Add a quirk that overrides the value
reported by the controller (Conrad)
- Add a ASMedia ASM1061 quirk.
The SATA host controller completely ignores the upper 21 bits of the
DMA address. This causes IOMMU error events when a (valid) DMA
address actually has any of the upper 21 bits set. Add a quirk that
limits the dma_mask to 43-bits (Lennert)
* tag 'ata-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ahci: add 43-bit DMA address quirk for ASMedia ASM1061 controllers
ahci: asm1166: correct count of reported ports
ata: libata-sata: improve sysfs description for ATA_LPM_UNKNOWN
Pull block fixes from Jens Axboe:
- RCU warning fix for md (Mikulas)
- Fix for an aoe issue that lockdep rightfully complained about
(Maksim)
- Fix for an error code change in partitioning that caused a regression
with some tools (Li)
- Fix for a data direction warning with bi-direction commands
(Christian)
* tag 'block-6.8-2024-01-26' of git://git.kernel.dk/linux:
md: fix a suspicious RCU usage warning
aoe: avoid potential deadlock at set_capacity
block: Fix WARNING in _copy_from_iter
block: Move checking GENHD_FL_NO_PART to bdev_add_partition()
Pull io_uring fix from Jens Axboe:
"Just a single tweak to the newly added IORING_OP_FIXED_FD_INSTALL from
Paul, ensuring it goes via the audit path and playing it safe by
excluding it from using registered creds"
* tag 'io_uring-6.8-2024-01-26' of git://git.kernel.dk/linux:
io_uring: enable audit and restrict cred override for IORING_OP_FIXED_FD_INSTALL
Pull thermal control update from Rafael Wysocki:
"Remove some dead code from the Intel powerclamp thermal control driver
(Srinivas Pandruvada)"
* tag 'thermal-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel: powerclamp: Remove dead code for target mwait value
Pull power management fixes from Rafael Wysocki:
"These fix two cpufreq drivers and the cpupower utility.
Specifics:
- Fix the handling of scaling_max/min_freq sysfs attributes in the
AMD P-state cpufreq driver (Mario Limonciello)
- Make the intel_pstate cpufreq driver avoid unnecessary computation
of the HWP performance level corresponding to a given frequency in
the cases when it is known already, which also helps to avoid
reducing the maximum CPU capacity artificially on some systems
(Rafael J. Wysocki)
- Fix compilation of the cpupower utility when CFLAGS is passed as a
make argument for cpupower, but it does not take effect as expected
due to mishandling (Stanley Chan)"
* tag 'pm-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq/amd-pstate: Fix setting scaling max/min freq values
cpufreq: intel_pstate: Refine computation of P-state for given frequency
tools cpupower bench: Override CFLAGS assignments
Pull documentation fixes from Jonathan Corbet:
"A handful of relatively boring documentation fixes"
* tag 'docs-6.8-fixes' of git://git.lwn.net/linux:
docs: admin-guide: remove obsolete advice related to SLAB allocator
doc: admin-guide/kernel-parameters: remove useless comment
docs/accel: correct links to mailing list archives
docs/sphinx: Fix TOC scroll hack for the home page
The inode_getsecctx LSM hook has previously been corrected to have
-EOPNOTSUPP instead of 0 as the default return value to fix BPF LSM
behavior. However, the call_int_hook()-generated loop in
security_inode_getsecctx() was left treating 0 as the neutral value, so
after an LSM returns 0, the loop continues to try other LSMs, and if one
of them returns a non-zero value, the function immediately returns with
said value. So in a situation where SELinux and the BPF LSMs registered
this hook, -EOPNOTSUPP would be incorrectly returned whenever SELinux
returned 0.
Fix this by open-coding the call_int_hook() loop and making it use the
correct LSM_RET_DEFAULT() value as the neutral one, similar to what
other hooks do.
Cc: stable@vger.kernel.org
Reported-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Link: https://lore.kernel.org/selinux/CAEjxPJ4ev-pasUwGx48fDhnmjBnq_Wh90jYPwRQRAqXxmOKD4Q@mail.gmail.com/
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2257983
Fixes: b36995b860 ("lsm: fix default return value for inode_getsecctx")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
The gro.sh test-case relay on the gro_flush_timeout to ensure
that all the segments belonging to any given batch are properly
aggregated.
The other end, the sender is a user-space program transmitting
each packet with a separate write syscall. A busy host and/or
stracing the sender program can make the relevant segments reach
the GRO engine after the flush timeout triggers.
Give the GRO flush timeout more slack, to avoid sporadic self-tests
failures.
Fixes: 9af771d2ec ("selftests/net: allow GRO coalesce test on veth")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Tested-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/bffec2beab3a5672dd13ecabe4fad81d2155b367.1706206101.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
the udpgro_fraglist self-test uses the BPF classifiers, but the
current net self-test configuration does not include it, causing
CI failures:
# selftests: net: udpgro_frglist.sh
# ipv6
# tcp - over veth touching data
# -l 4 -6 -D 2001:db8::1 -t rx -4 -t
# Error: TC classifier not found.
# We have an error talking to the kernel
# Error: TC classifier not found.
# We have an error talking to the kernel
Add the missing knob.
Fixes: edae34a3ed ("selftests net: add UDP GRO fraglist + bpf self-tests")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/7c3643763b331e9a400e1874fe089193c99a1c3f.1706170897.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit 056bce63c4 ("bnxt_en: Make PTP TX timestamp HWRM query silent")
changed a netdev_err() to netdev_WARN_ONCE().
netdev_WARN_ONCE() is it generates a kernel WARNING, which is bad, for
the following reasons:
* You do not a kernel warning if the firmware queries are late
* In busy networks, timestamp query failures fairly regularly
* A WARNING message doesn't bring much value, since the code path
is clear.
(This was discussed in-depth in [1])
Transform the netdev_WARN_ONCE() into a netdev_warn_once(), and print a
more well-behaved message, instead of a full WARN().
bnxt_en 0000:67:00.0 eth0: TS query for TX timer failed rc = fffffff5
[1] Link: https://lore.kernel.org/all/ZbDj%2FFI4EJezcfd1@gmail.com/
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Fixes: 056bce63c4 ("bnxt_en: Make PTP TX timestamp HWRM query silent")
Link: https://lore.kernel.org/r/20240125134104.2045573-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull drm fixes from Dave Airlie:
"Lots going on for rc2, ivpu has a bunch of stabilisation and debugging
work, then amdgpu and xe are the main fixes. i915, exynos have a few,
then some misc panel and bridge fixes.
Worth mentioning are three regressions. One of the nouveau fixes in
6.7 for a serious deadlock had side effects, so I guess we will bring
back the deadlock until I can figure out what should be done properly.
There was a scheduler regression vs amdgpu which was reported in a few
places and is now fixed. There was an i915 vs simpledrm problem
resulting in black screens, that is reverted also.
I'll be working on a proper nouveau fix, it kinda looks like one of
those cases where someone tried to use an atomic where they should
have probably used a lock, but I'll see.
fb:
- fix simpledrm/i915 regression by reverting change
scheduler:
- fix regression affecting amdgpu users due to sched draining
nouveau:
- revert 6.7 deadlock fix as it has side effects
dp:
- fix documentation warning
ttm:
- fix dummy page read on some platforms
bridge:
- anx7625 suspend fix
- sii902x: fix probing and audio registration
- parade-ps8640: fix suspend of bridge, aux fixes
- samsung-dsim: avoid using FORCE_STOP_STATE
panel:
- simple add missing bus flags
- fix samsung-s6d7aa0 flags
amdgpu:
- AC/DC power supply tracking fix
- Don't show invalid vram vendor data
- SMU 13.0.x fixes
- GART fix for umr on systems without VRAM
- GFX 10/11 UNORD_DISPATCH fixes
- IPS display fixes (required for S0ix on some platforms)
- Misc fixes
i915:
- DSI sequence revert to fix GitLab #10071 and DP test-pattern fix
- Drop -Wstringop-overflow (broken on GCC11)
ivpu:
- fix recovery/reset support
- improve submit ioctl stability
- fix dev open/close races on unbind
- PLL disable reset fix
- deprecate context priority param
- improve debug buffer logging
- disable buffer sharing across VPU contexts
- free buffer sgt on unbind
- fix missing lock around shmem vmap
- add better boot diagnostics
- add more debug prints around mapping
- dump MMU events in case of timeout
v3d:
- NULL ptr dereference fix
exynos:
- fix stack usage
- fix incorrect type
- fix dt typo
- fix gsc runtime resume
xe:
- Make an ops struct static
- Fix an implicit 0 to NULL conversion
- A couple of 32-bit fixes
- A migration coherency fix for Lunar Lake.
- An error path vm id leak fix
- Remove PVC references in kunit tests"
* tag 'drm-fixes-2024-01-27' of git://anongit.freedesktop.org/drm/drm: (66 commits)
Revert "nouveau: push event block/allowing out of the fence context"
drm: bridge: samsung-dsim: Don't use FORCE_STOP_STATE
drm/sched: Drain all entities in DRM sched run job worker
drm/amd/display: "Enable IPS by default"
drm/amd: Add a DC debug mask for IPS
drm/amd/display: Disable ips before dc interrupt setting
drm/amd/display: Replay + IPS + ABM in Full Screen VPB
drm/amd/display: Add IPS checks before dcn register access
drm/amd/display: Add Replay IPS register for DMUB command table
drm/amd/display: Allow IPS2 during Replay
drm/amdgpu/gfx11: set UNORD_DISPATCH in compute MQDs
drm/amdgpu/gfx10: set UNORD_DISPATCH in compute MQDs
drm/amd/amdgpu: Assign GART pages to AMD device mapping
drm/amd/pm: Fetch current power limit from FW
drm/amdgpu: Fix null pointer dereference
drm/amdgpu: Show vram vendor only if available
drm/amd/pm: update the power cap setting
drm/amdgpu: Avoid fetching vram vendor information
drm/amdgpu/pm: Fix the power source flag error
drm/amd/display: Fix uninitialized variable usage in core_link_ 'read_dpcd() & write_dpcd()' functions
...
This reverts commit b43f7ddc2b.
The offending commit deferred power-supply class device registration
until the service-started notification is received.
This triggers a NULL pointer dereference during boot of the Lenovo
ThinkPad X13s and SC8280XP CRD as battery status notifications can be
received before the service-start notification:
Unable to handle kernel NULL pointer dereference at virtual address 00000000000005c0
...
Call trace:
_acquire+0x338/0x2064
acquire+0x1e8/0x318
spin_lock_irqsave+0x60/0x88
_supply_changed+0x2c/0xa4
battmgr_callback+0x1d4/0x60c [qcom_battmgr]
pmic_glink_rpmsg_callback+0x5c/0xa4 [pmic_glink]
qcom_glink_native_rx+0x58c/0x7e8
qcom_glink_smem_intr+0x14/0x24 [qcom_glink_smem]
__handle_irq_event_percpu+0xb0/0x2d4
handle_irq_event+0x4c/0xb8
As trying to serialise this is non-trivial and risks missing
notifications, let's revert to registration during probe so that the
driver data is all set up once the service goes live.
The warning message during resume in case the aDSP firmware is not
running that motivated the change can be considered a feature and should
not be suppressed.
Fixes: b43f7ddc2b ("power: supply: qcom_battmgr: Register the power supplies after PDR is up")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Link: https://lore.kernel.org/r/20240123160053.18331-1-johan+linaro@kernel.org
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Pull asm-generic update from Arnd Bergmann:
"Just one patch this time, adding Andreas Larsson as co-maintainer for
arch/sparc. He is volunteering to help since David Miller has become
much less active over the past few years.
In turn, I'm helping Andreas get set up as a new maintainer, starting
with the entry in the MAINTAINERS file"
* tag 'asm-generic-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
MAINTAINERS: Add Andreas Larsson as co-maintainer for arch/sparc
Pull arm SoC fixes from Arnd Bergmann:
"There are a couple of devicetree fixes for samsung, riscv/sophgo, and
for TPM device nodes on a couple of platforms.
Both the Arm FF-A and the SCMI firmware drivers get a number of code
fixes, addressing minor implementation bugs and compatibility with
firmware implementations. Most of these bugs relate to the usage of
xarray and rwlock structures and are fixed by Cristian Marussi"
* tag 'arm-fixes-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
riscv: dts: sophgo: separate sg2042 mtime and mtimecmp to fit aclint format
arm64: dts: Fix TPM schema violations
ARM: dts: Fix TPM schema violations
ARM: dts: exynos4212-tab3: add samsung,invert-vclk flag to fimd
arm64: dts: exynos: gs101: comply with the new cmu_misc clock names
firmware: arm_ffa: Handle partitions setup failures
firmware: arm_ffa: Use xa_insert() and check for result
firmware: arm_ffa: Simplify ffa_partitions_cleanup()
firmware: arm_ffa: Check xa_load() return value
firmware: arm_ffa: Add missing rwlock_init() for the driver partition
firmware: arm_ffa: Add missing rwlock_init() in ffa_setup_partitions()
firmware: arm_scmi: Fix the clock protocol supported version
firmware: arm_scmi: Fix the clock protocol version for v3.2
firmware: arm_scmi: Use xa_insert() when saving raw queues
firmware: arm_scmi: Use xa_insert() to store opps
firmware: arm_scmi: Replace asm-generic/bug.h with linux/bug.h
firmware: arm_scmi: Check mailbox/SMT channel for consistency
The commit 1feb31e810 ("hwmon: (pmbus/mp2975) Simplify VOUT code")
has introduced a bug that makes it impossible to initialize MP2975
device:
"""
mp2975 5-0020: Failed to identify chip capabilities
i2c i2c-5: new_device: Instantiated device mp2975 at 0x20
i2c i2c-5: delete_device: Deleting device mp2975 at 0x20
"""
Since the 'read_byte_data' function was removed from the
'pmbus_driver_info ' structure the driver no longer reports correctly
that VOUT mode is direct. Therefore 'pmbus_identify_common' fails
with error, making it impossible to initialize the device.
Restore 'read_byte_data' function to fix the issue.
Tested:
- before: it is not possible to initialize MP2975 device with the
'mp2975' driver,
- after: 'mp2975' correctly initializes MP2975 device and all sensor
data is correct.
Fixes: 1feb31e810 ("hwmon: (pmbus/mp2975) Simplify VOUT code")
Signed-off-by: Konstantin Aladyshev <aladyshev22@gmail.com>
Link: https://lore.kernel.org/r/20240126205714.2363-1-aladyshev22@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Pull spi fixes from Mark Brown:
"As well as a few device IDs and the usual scattering of driver
specific fixes this contains a couple of core things.
One is a missed case in error handling, the other patch is a change
from me raising the number of chip selects allowed by the newly added
multi chip select support patches to resolve problems seen on several
systems that exceeded the limit.
This is not a real solution to the issue but rather just a change to
avoid disruption to users, one of the options I am considering is just
sending a revert of those changes if we can't come up with something
sensible"
* tag 'spi-fix-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: fix finalize message on error return
spi: cs42l43: Handle error from devm_pm_runtime_enable
spi: Raise limit on number of chip selects
spi: hisi-sfc-v3xx: Return IRQ_NONE if no interrupts were detected
spi: spi-cadence: Reverse the order of interleaved write and read operations
spi: spi-imx: Use dev_err_probe for failed DMA channel requests
spi: bcm-qspi: fix SFDP BFPT read by usig mspi read
spi: intel-pci: Add support for Arrow Lake SPI serial flash
spi: intel-pci: Remove Meteor Lake-S SoC PCI ID from the list
Pull gpio fixes from Bartosz Golaszewski:
- add a quirk to GPIO ACPI handling to ignore touchpad wakeups on GPD
G1619-04
- clear interrupt status bits (that may have been set before enabling
the interrupts) after setting the interrupt type in gpio-eic-sprd
* tag 'gpio-fixes-for-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: eic-sprd: Clear interrupt after set the interrupt type
gpiolib: acpi: Ignore touchpad wakeup on GPD G1619-04
Pull media fixes from Mauro Carvalho Chehab:
- remove K3 DT prefix from wave5
- vb2 core: fix missing caps on VIDIO_CREATE_BUFS under certain
circumstances
- videobuf2: Stop direct calls to queue num_buffers field
* tag 'media/v6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: vb2: refactor setting flags and caps, fix missing cap
media: media videobuf2: Stop direct calls to queue num_buffers field
media: chips-media: wave5: Remove K3 References
dt-bindings: media: Remove K3 Family Prefix from Compatible
The big_tcp test-case requires a few kernel knobs currently
not specified in the net selftests config, causing the
following failure:
# selftests: net: big_tcp.sh
# Error: Failed to load TC action module.
# We have an error talking to the kernel
...
# Testing for BIG TCP:
# CLI GSO | GW GRO | GW GSO | SER GRO
# ./big_tcp.sh: line 107: test: !=: unary operator expected
...
# on on on on : [FAIL_on_link1]
Add the missing configs
Fixes: 6bb382bcf7 ("selftests: add a selftest for big tcp")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/all/21630ecea872fea13f071342ac64ef52a991a9b5.1706282943.git.pabeni@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge cpufreq fixes for 6.8-rc2:
- Fix the handling of scaling_max/min_freq sysfs attributes in the AMD
P-state cpufreq driver (Mario Limonciello).
- Make the intel_pstate cpufreq driver avoid unnecessary computation of
the HWP performance level corresponding to a given frequency in the
cases when it is known already, which also helps to avoid reducing
the maximum CPU capacity artificially on some systems (Rafael J.
Wysocki).
* pm-cpufreq:
cpufreq/amd-pstate: Fix setting scaling max/min freq values
cpufreq: intel_pstate: Refine computation of P-state for given frequency
This reverts commit eacabb5462.
This commit causes some regressions in desktop usage, this will
reintroduce the original deadlock in DRI_PRIME situations, I've
got an idea to fix it by offloading to a workqueue in a different
spot, however this code has a race condition where we sometimes
miss interrupts so I'd like to fix that as well.
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
KVM/riscv changes for 6.8 part #2
- Zbc extension support for Guest/VM
- Scalar crypto extensions support for Guest/VM
- Vector crypto extensions support for Guest/VM
- Zfh[min] extensions support for Guest/VM
- Zihintntl extension support for Guest/VM
- Zvfh[min] extensions support for Guest/VM
- Zfa extension support for Guest/VM
A while back the I2C HID implementation was split in an ACPI and OF
part, but the new OF driver never initialises the client pointer which
is dereferenced on power-up failures.
Fixes: b33752c300 ("HID: i2c-hid: Reorganize so ACPI and OF are separate modules")
Cc: stable@vger.kernel.org # 5.12
Cc: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
The nvmet_tcp_queue_ida should be destroy when the nvmet-tcp module
exit.
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The recently introduced EFI memory attributes protocol should be used
if it exists to ensure that the memory allocation created for the kernel
permits execution. This is needed for compatibility with tightened
requirements related to Windows logo certification for x86 PCs.
Currently, we simply strip the execute protect (XP) attribute from the
entire range, but this might be rejected under some firmware security
policies, and so in a subsequent patch, this will be changed to only
strip XP from the executable region that runs early, and make it
read-only (RO) as well.
In order to catch any issues early, ensure that the memory attribute
protocol works as intended, and give up if it produces spurious errors.
Note that the DXE services based fallback was always based on best
effort, so don't propagate any errors returned by that API.
Fixes: a1b87d54f4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Picking the changes from:
8570c27932 ("drm/syncobj: Add deadline support for syncobj waits")
9724ed6c1b ("drm: Introduce DRM_CLIENT_CAP_CURSOR_PLANE_HOTSPOT")
e4d983acff ("drm: introduce DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP")
d208d87566 ("drm: introduce CLOSEFB IOCTL")
afa5cf3175 ("drm/i915/uapi: fix typos/spellos and punctuation")
Addressing these perf build warnings:
Warning: Kernel ABI header differences:
Now 'perf trace' and other code that might use the
tools/perf/trace/beauty autogenerated tables will be able to translate
this new ioctl command into a string:
$ tools/perf/trace/beauty/drm_ioctl.sh > before
$ cp include/uapi/drm/drm.h tools/include/uapi/drm/drm.h
$ tools/perf/trace/beauty/drm_ioctl.sh > after
$ diff -u before after
--- before 2024-01-26 10:54:23.486381862 -0300
+++ after 2024-01-26 10:54:35.767902442 -0300
@@ -109,6 +109,7 @@
[0xCD] = "SYNCOBJ_TIMELINE_SIGNAL",
[0xCE] = "MODE_GETFB2",
[0xCF] = "SYNCOBJ_EVENTFD",
+ [0xD0] = "MODE_CLOSEFB",
[DRM_COMMAND_BASE + 0x00] = "I915_INIT",
[DRM_COMMAND_BASE + 0x01] = "I915_FLUSH",
[DRM_COMMAND_BASE + 0x02] = "I915_FLIP",
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rob Clark <robdclark@chromium.org>
Cc: Simon Ser <contact@emersion.fr>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Zack Rusin <zack.rusin@broadcom.com>
Link: https://lore.kernel.org/lkml/ZbPIN9Dcc5AM0uxo@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes in:
765a0542fd ("x86/virt/tdx: Detect TDX during kernel boot")
Addressing this tools/perf build warning:
Warning: Kernel ABI header differences:
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
That makes the beautification scripts to pick some new entries:
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
$ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
$ diff -u before after
--- before 2024-01-25 11:08:12.363223880 -0300
+++ after 2024-01-25 11:08:24.839307699 -0300
@@ -21,6 +21,7 @@
[0x0000004f] = "PPIN",
[0x00000060] = "LBR_CORE_TO",
[0x00000079] = "IA32_UCODE_WRITE",
+ [0x00000087] = "IA32_MKTME_KEYID_PARTITIONING",
[0x0000008b] = "AMD64_PATCH_LEVEL",
[0x0000008C] = "IA32_SGXLEPUBKEYHASH0",
[0x0000008D] = "IA32_SGXLEPUBKEYHASH1",
$
Now one can trace systemwide asking to see backtraces to where that MSR
is being read/written, see this example with a previous update:
# perf trace -e msr:*_msr/max-stack=32/ --filter="msr==IA32_MKTME_KEYID_PARTITIONING"
^C#
If we use -v (verbose mode) we can see what it does behind the scenes:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_MKTME_KEYID_PARTITIONING"
Using CPUID GenuineIntel-6-8E-A
0x87
New filter for msr:read_msr: (msr==0x87) && (common_pid != 58627 && common_pid != 3792)
0x87
New filter for msr:write_msr: (msr==0x87) && (common_pid != 58627 && common_pid != 3792)
mmap size 528384B
^C#
Example with a frequent msr:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
Using CPUID AuthenticAMD-25-21-0
0x48
New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
0x48
New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
mmap size 528384B
Looking at the vmlinux_path (8 entries long)
symsrc__init: build id mismatch for vmlinux.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule ([kernel.kallsyms])
futex_wait_queue_me ([kernel.kallsyms])
futex_wait ([kernel.kallsyms])
do_futex ([kernel.kallsyms])
__x64_sys_futex ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
__futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule_idle ([kernel.kallsyms])
do_idle ([kernel.kallsyms])
cpu_startup_entry ([kernel.kallsyms])
secondary_startup_64_no_verify ([kernel.kallsyms])
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZbJt27rjkQVU1YoP@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The FORCE_STOP_STATE bit is unsuitable to force the DSI link into LP-11
mode. It seems the bridge internally queues DSI packets and when the
FORCE_STOP_STATE bit is cleared, they are sent in close succession
without any useful timing (this also means that the DSI lanes won't go
into LP-11 mode). The length of this gibberish varies between 1ms and
5ms. This sometimes breaks an attached bridge (TI SN65DSI84 in this
case). In our case, the bridge will fail in about 1 per 500 reboots.
The FORCE_STOP_STATE handling was introduced to have the DSI lanes in
LP-11 state during the .pre_enable phase. But as it turns out, none of
this is needed at all. Between samsung_dsim_init() and
samsung_dsim_set_display_enable() the lanes are already in LP-11 mode.
The code as it was before commit 20c827683d ("drm: bridge:
samsung-dsim: Fix init during host transfer") and 0c14d31306 ("drm:
bridge: samsung-dsim: Fix i.MX8M enable flow to meet spec") was correct
in this regard.
This patch basically reverts both commits. It was tested on an i.MX8M
SoC with an SN65DSI84 bridge. The signals were probed and the DSI
packets were decoded during initialization and link start-up. After this
patch the first DSI packet on the link is a VSYNC packet and the timing
is correct.
Command mode between .pre_enable and .enable was also briefly tested by
a quick hack. There was no DSI link partner which would have responded,
but it was made sure the DSI packet was send on the link. As a side
note, the command mode seems to just work in HS mode. I couldn't find
that the bridge will handle commands in LP mode.
Fixes: 20c827683d ("drm: bridge: samsung-dsim: Fix init during host transfer")
Fixes: 0c14d31306 ("drm: bridge: samsung-dsim: Fix i.MX8M enable flow to meet spec")
Signed-off-by: Michael Walle <mwalle@kernel.org>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231113164344.1612602-1-mwalle@kernel.org
I encountered a race issue after lengthy (~594647 secs) stress tests on
a 64k-page arm64 VM with several 4k-block EROFS images. The timing
is like below:
z_erofs_try_inplace_io z_erofs_fill_bio_vec
cmpxchg(&compressed_bvecs[].page,
NULL, ..)
[access bufvec]
compressed_bvecs[] = *bvec;
Previously, z_erofs_submit_queue() just accessed bufvec->page only, so
other fields in bufvec didn't matter. After the subpage block support
is landed, .offset and .end can be used too, but filling bufvec isn't
an atomic operation which can cause inconsistency.
Let's use a spinlock to keep the atomicity of each bufvec. More
specifically, just reuse the existing spinlock `pcl->obj.lockref.lock`
since it's rarely used (also it takes a short time if even used) as long
as the pcluster has a reference.
Fixes: 192351616a ("erofs: support I/O submission for sub-page compressed blocks")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Sandeep Dhavale <dhavale@google.com>
Link: https://lore.kernel.org/r/20240125120039.3228103-1-hsiangkao@linux.alibaba.com
When a wiphy work is queued with timer, and then again
without a delay, it's started immediately but *also*
started again after the timer expires. This can lead,
for example, to warnings in mac80211's offchannel code
as reported by Jouni. Running the same work twice isn't
expected, of course. Fix this by deleting the timer at
this point, when queuing immediately due to delay=0.
Cc: stable@vger.kernel.org
Reported-by: Jouni Malinen <j@w1.fi>
Fixes: a3ee4dc84c ("wifi: cfg80211: add a work abstraction with special semantics")
Link: https://msgid.link/20240125095108.2feb0eaaa446.I4617f3210ed0e7f252290d5970dac6a876aa595b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Lantiq uses a common kernel config for devices with 24Kc and 34Kc cores.
The changes made previously to add support for interrupts on all cores
work on 24Kc platforms with SMP disabled and 34Kc platforms with SMP
enabled. This patch fixes boot issues on Danube (single core 24Kc) with
SMP enabled.
Fixes: 730320fd77 ("MIPS: lantiq: enable all hardware interrupts on second VPE")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()") reveals
that reserved memblock regions have no valid node id set, just set it
right since loongson64 firmware makes it clear in memory layout info.
This works around booting failure on 3A1000+ since commit 61167ad5fe
("mm: pass nid to reserve_bootmem_region()") under
CONFIG_DEFERRED_STRUCT_PAGE_INIT.
Signed-off-by: Huang Pei <huangpei@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
"cpu_probe" is called both by BP and APs, but reserving exception vector
(like 0x0-0x1000) called by "cpu_probe" need once and calling on APs is
too late since memblock is unavailable at that time.
So, reserve exception vector ONLY by BP.
Suggested-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Huang Pei <huangpei@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Most of the symbols for which we do not have a prototype can actually be
made static and for the few that cannot, there is already a declaration
in a header for it.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The addition of commit efa7df3e3b ("mm: align larger anonymous mappings
on THP boundaries") caused the "virtual_address_range" mm selftest to
start failing on arm64. Let's fix that regression.
There were 2 visible problems when running the test; 1) it takes much
longer to execute, and 2) the test fails. Both are related:
The (first part of the) test allocates as many 1GB anonymous blocks as it
can in the low 256TB of address space, passing NULL as the addr hint to
mmap. Before the faulty patch, all allocations were abutted and contained
in a single, merged VMA. However, after this patch, each allocation is in
its own VMA, and there is a 2M gap between each VMA. This causes the 2
problems in the test: 1) mmap becomes MUCH slower because there are so
many VMAs to check to find a new 1G gap. 2) mmap fails once it hits the
VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then causes a
subsequent calloc() to fail, which causes the test to fail.
The problem is that arm64 (unlike x86) selects
ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area()
allocates len+2M then always aligns to the bottom of the discovered gap.
That causes the 2M hole.
Fix this by detecting cases where we can still achive the alignment goal
when moved to the top of the allocated area, if configured to prefer
top-down allocation.
While we are at it, fix thp_get_unmapped_area's use of pgoff, which should
always be zero for anonymous mappings. Prior to the faulty change, while
it was possible for user space to pass in pgoff!=0, the old
mm->get_unmapped_area() handler would not use it. thp_get_unmapped_area()
does use it, so let's explicitly zero it before calling the handler. This
should also be the correct behavior for arches that define their own
get_unmapped_area() handler.
Link: https://lkml.kernel.org/r/20240123171420.3970220-1-ryan.roberts@arm.com
Fixes: efa7df3e3b ("mm: align larger anonymous mappings on THP boundaries")
Closes: https://lore.kernel.org/linux-mm/1e8f5ac7-54ce-433a-ae53-81522b2320e1@arm.com/
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ahash_alg->setkey is updated to ahash_nosetkey in ahash.c
so checking setkey() function to determine hmac algorithm is not valid.
to fix this added is_hmac variable in structure caam_hash_alg to determine
whether the algorithm is hmac or not.
Fixes: 2f1f34c1bf ("crypto: ahash - optimize performance when wrapping shash")
Signed-off-by: Gaurav Jain <gaurav.jain@nxp.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The commit "crypto: qat - generate dynamically arbiter mappings"
introduced a regression on qat_402xx devices.
This is reported when the driver probes the device, as indicated by
the following error messages:
4xxx 0000:0b:00.0: enabling device (0140 -> 0142)
4xxx 0000:0b:00.0: Generate of the thread to arbiter map failed
4xxx 0000:0b:00.0: Direct firmware load for qat_402xx_mmp.bin failed with error -2
The root cause of this issue was the omission of a necessary function
pointer required by the mapping algorithm during the implementation.
Fix it by adding the missing function pointer.
Fixes: 5da6a2d535 ("crypto: qat - generate dynamically arbiter mappings")
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The stubs for kvm_own/lsx()/kvm_own_lasx() when CONFIG_CPU_HAS_LSX or
CONFIG_CPU_HAS_LASX is not defined should have a return value since they
return an int, so add "return -EINVAL;" to the stubs.
Fixes the build error:
In file included from ../arch/loongarch/include/asm/kvm_csr.h:12,
from ../arch/loongarch/kvm/interrupt.c:8:
../arch/loongarch/include/asm/kvm_vcpu.h: In function 'kvm_own_lasx':
../arch/loongarch/include/asm/kvm_vcpu.h:73:39: error: no return statement in function returning non-void [-Werror=return-type]
73 | static inline int kvm_own_lasx(struct kvm_vcpu *vcpu) { }
Fixes: db1ecca22e ("LoongArch: KVM: Add LSX (128bit SIMD) support")
Fixes: 118e10cd89 ("LoongArch: KVM: Add LASX (256bit SIMD) support")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Commit 8569992d64 ("KVM: Use gfn instead of hva for
mmu_notifier_retry") replaces mmu_invalidate_retry_hva() usage with
mmu_invalidate_retry_gfn() for X86, LoongArch also need similar changes
to fix build.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Machines which have more than 8 nodes fail to boot SMP after commit
a2ccf46333 ("LoongArch/smp: Call rcutree_report_cpu_starting()
earlier"). Because such machines use tlb-based per-cpu base address
rather than dmw-based per-cpu base address, resulting per-cpu variables
can only be accessed after tlb_init(). But rcutree_report_cpu_starting()
is now called before tlb_init() and accesses per-cpu variables indeed.
Since the original patch want to avoid the lockdep warning caused by
page allocation in tlb_init(), we can move rcutree_report_cpu_starting()
to tlb_init() where after tlb exception configuration but before page
allocation.
Fixes: a2ccf46333 ("LoongArch/smp: Call rcutree_report_cpu_starting() earlier")
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
ksm_tests was previously mmapping a region of memory, aligning the
returned pointer to a PMD boundary, then setting MADV_HUGEPAGE, but was
setting it past the end of the mmapped area due to not taking the pointer
alignment into consideration. Fix this behaviour.
Up until commit efa7df3e3b ("mm: align larger anonymous mappings on THP
boundaries"), this buggy behavior was (usually) masked because the
alignment difference was always less than PMD-size. But since the
mentioned commit, `ksm_tests -H -s 100` started failing.
Link: https://lkml.kernel.org/r/20240122120554.3108022-1-ryan.roberts@arm.com
Fixes: 3252548996 ("selftests: vm: add KSM huge pages merging time test")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The correct folio replacement for "set_page_dirty()" is
"folio_mark_dirty()", not "folio_set_dirty()". Using the latter won't
properly inform the FS using the dirty_folio() callback.
This has been found by code inspection, but likely this can result in some
real trouble.
Link: https://lkml.kernel.org/r/20240122175407.307992-1-david@redhat.com
Fixes: a8e61d584e ("mm/huge_memory: page_remove_rmap() -> folio_remove_rmap_pmd()")
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In order for the page table level 5 to be in use, the CPU must have the
setting enabled in addition to the CONFIG option. Check for the flag to be
set to avoid false test failures on systems that do not have this cpu flag
set.
The test does a series of mmap calls including three using the
MAP_FIXED flag and specifying an address that is 1<<47 or 1<<48. These
addresses are only available if you are using level 5 page tables,
which requires both the CPU to have the capabiltiy (la57 flag) and the
kernel to be configured. Currently the test only checks for the kernel
configuration option, so this test can still report a false positive.
Here are the three failing lines:
$ ./va_high_addr_switch | grep FAILED
mmap(ADDR_SWITCH_HINT, 2 * PAGE_SIZE, MAP_FIXED): 0xffffffffffffffff - FAILED
mmap(HIGH_ADDR, MAP_FIXED): 0xffffffffffffffff - FAILED
mmap(ADDR_SWITCH_HINT, 2 * PAGE_SIZE, MAP_FIXED): 0xffffffffffffffff - FAILED
I thought (for about a second) refactoring the test so that these three
mmap calls will only be run on systems with the level 5 page tables
available, but the whole point of the test is to check the level 5
feature...
Link: https://lkml.kernel.org/r/20240119205801.62769-1-audra@redhat.com
Fixes: 4f2930c671 ("selftests/vm: only run 128TBswitch with 5-level paging")
Signed-off-by: Audra Mitchell <audra@redhat.com>
Cc: Rafael Aquini <raquini@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Adam Sindelar <adam@wowsignal.io>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
With the introduction of the pool_rwlock (reader-writer lock), several
fast paths end up taking the pool_rwlock as readers. Furthermore,
stack_depot_put() unconditionally takes the pool_rwlock as a writer.
Despite allowing readers to make forward-progress concurrently,
reader-writer locks have inherent cache contention issues, which does not
scale well on systems with large CPU counts.
Rework the synchronization story of stack depot to again avoid taking any
locks in the fast paths. This is done by relying on RCU-protected list
traversal, and the NMI-safe subset of RCU to delay reuse of freed stack
records. See code comments for more details.
Along with the performance issues, this also fixes incorrect nesting of
rwlock within a raw_spinlock, given that stack depot should still be
usable from anywhere:
| [ BUG: Invalid wait context ]
| -----------------------------
| swapper/0/1 is trying to lock:
| ffffffff89869be8 (pool_rwlock){..--}-{3:3}, at: stack_depot_save_flags
| other info that might help us debug this:
| context-{5:5}
| 2 locks held by swapper/0/1:
| #0: ffffffff89632440 (rcu_read_lock){....}-{1:3}, at: __queue_work
| #1: ffff888100092018 (&pool->lock){-.-.}-{2:2}, at: __queue_work <-- raw_spin_lock
Stack depot usage stats are similar to the previous version after a KASAN
kernel boot:
$ cat /sys/kernel/debug/stackdepot/stats
pools: 838
allocations: 29865
frees: 6604
in_use: 23261
freelist_size: 1879
The number of pools is the same as previously. The freelist size is
minimally larger, but this may also be due to variance across system
boots. This shows that even though we do not eagerly wait for the next
RCU grace period (such as with synchronize_rcu() or call_rcu()) after
freeing a stack record - requiring depot_pop_free() to "poll" if an entry
may be used - new allocations are very likely to happen in later RCU grace
periods.
Link: https://lkml.kernel.org/r/20240118110216.2539519-2-elver@google.com
Fixes: 108be8def4 ("lib/stackdepot: allow users to evict stack traces")
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a few basic stats counters for stack depot that can be used to derive
if stack depot is working as intended. This is a snapshot of the new
stats after booting a system with a KASAN-enabled kernel:
$ cat /sys/kernel/debug/stackdepot/stats
pools: 838
allocations: 29861
frees: 6561
in_use: 23300
freelist_size: 1840
Generally, "pools" should be well below the max; once the system is
booted, "in_use" should remain relatively steady.
Link: https://lkml.kernel.org/r/20240118110216.2539519-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alexander Potapenko writes in [1]: "For every memory access in the code
instrumented by KMSAN we call kmsan_get_metadata() to obtain the metadata
for the memory being accessed. For virtual memory the metadata pointers
are stored in the corresponding `struct page`, therefore we need to call
virt_to_page() to get them.
According to the comment in arch/x86/include/asm/page.h,
virt_to_page(kaddr) returns a valid pointer iff virt_addr_valid(kaddr) is
true, so KMSAN needs to call virt_addr_valid() as well.
To avoid recursion, kmsan_get_metadata() must not call instrumented code,
therefore ./arch/x86/include/asm/kmsan.h forks parts of
arch/x86/mm/physaddr.c to check whether a virtual address is valid or not.
But the introduction of rcu_read_lock() to pfn_valid() added instrumented
RCU API calls to virt_to_page_or_null(), which is called by
kmsan_get_metadata(), so there is an infinite recursion now. I do not
think it is correct to stop that recursion by doing
kmsan_enter_runtime()/kmsan_exit_runtime() in kmsan_get_metadata(): that
would prevent instrumented functions called from within the runtime from
tracking the shadow values, which might introduce false positives."
Fix the issue by switching pfn_valid() to the _sched() variant of
rcu_read_lock/unlock(), which does not require calling into RCU. Given
the critical section in pfn_valid() is very small, this is a reasonable
trade-off (with preemptible RCU).
KMSAN further needs to be careful to suppress calls into the scheduler,
which would be another source of recursion. This can be done by wrapping
the call to pfn_valid() into preempt_disable/enable_no_resched(). The
downside is that this sacrifices breaking scheduling guarantees; however,
a kernel compiled with KMSAN has already given up any performance
guarantees due to being heavily instrumented.
Note, KMSAN code already disables tracing via Makefile, and since mmzone.h
is included, it is not necessary to use the notrace variant, which is
generally preferred in all other cases.
Link: https://lkml.kernel.org/r/20240115184430.2710652-1-glider@google.com [1]
Link: https://lkml.kernel.org/r/20240118110022.2538350-1-elver@google.com
Fixes: 5ec8e8ea8b ("mm/sparsemem: fix race in accessing memory_section->usage")
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Alexander Potapenko <glider@google.com>
Reported-by: syzbot+93a9e8a3dea8d6085e12@syzkaller.appspotmail.com
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Cc: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(struct dirty_throttle_control *)->thresh is an unsigned long, but is
passed as the u32 divisor argument to div_u64(). On architectures where
unsigned long is 64 bytes, the argument will be implicitly truncated.
Use div64_u64() instead of div_u64() so that the value used in the "is
this a safe division" check is the same as the divisor.
Also, remove redundant cast of the numerator to u64, as that should happen
implicitly.
This would be difficult to exploit in memcg domain, given the ratio-based
arithmetic domain_drity_limits() uses, but is much easier in global
writeback domain with a BDI_CAP_STRICTLIMIT-backing device, using e.g.
vm.dirty_bytes=(1<<32)*PAGE_SIZE so that dtc->thresh == (1<<32)
Link: https://lkml.kernel.org/r/20240118181954.1415197-1-zokeefe@google.com
Fixes: f6789593d5 ("mm/page-writeback.c: fix divide by zero in bdi_dirty_limits()")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Maxim Patlasov <MPatlasov@parallels.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
While investigating hosts with high cgroup memory pressures, Tejun
found culprit zombie tasks that had were holding on to a lot of
memory, had SIGKILL pending, but were stuck in memory.high reclaim.
In the past, we used to always force-charge allocations from tasks
that were exiting in order to accelerate them dying and freeing up
their rss. This changed for memory.max in a4ebf1b6ca ("memcg:
prohibit unconditional exceeding the limit of dying tasks"); it noted
that this can cause (userspace inducable) containment failures, so it
added a mandatory reclaim and OOM kill cycle before forcing charges.
At the time, memory.high enforcement was handled in the userspace
return path, which isn't reached by dying tasks, and so memory.high
was still never enforced by dying tasks.
When c9afe31ec4 ("memcg: synchronously enforce memory.high for large
overcharges") added synchronous reclaim for memory.high, it added
unconditional memory.high enforcement for dying tasks as well. The
callstack shows that this path is where the zombie is stuck in.
We need to accelerate dying tasks getting past memory.high, but we
cannot do it quite the same way as we do for memory.max: memory.max is
enforced strictly, and tasks aren't allowed to move past it without
FIRST reclaiming and OOM killing if necessary. This ensures very small
levels of excess. With memory.high, though, enforcement happens lazily
after the charge, and OOM killing is never triggered. A lot of
concurrent threads could have pushed, or could actively be pushing,
the cgroup into excess. The dying task will enter reclaim on every
allocation attempt, with little hope of restoring balance.
To fix this, skip synchronous memory.high enforcement on dying tasks
altogether again. Update memory.high path documentation while at it.
[hannes@cmpxchg.org: also handle tasks are being killed during the reclaim]
Link: https://lkml.kernel.org/r/20240111192807.GA424308@cmpxchg.org
Link: https://lkml.kernel.org/r/20240111132902.389862-1-hannes@cmpxchg.org
Fixes: c9afe31ec4 ("memcg: synchronously enforce memory.high for large overcharges")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
commit efa7df3e3b ("mm: align larger anonymous mappings on THP
boundaries") incured regression for stress-ng pthread benchmark [1]. It
is because THP get allocated to pthread's stack area much more possible
than before. Pthread's stack area is allocated by mmap without
VM_GROWSDOWN or VM_GROWSUP flag, so kernel can't tell whether it is a
stack area or not.
The MAP_STACK flag is used to mark the stack area, but it is a no-op on
Linux. Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating THP
for such stack area.
With this change the stack area looks like:
fffd18e10000-fffd19610000 rw-p 00000000 00:00 0
Size: 8192 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 12 kB
Pss: 12 kB
Pss_Dirty: 12 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 12 kB
Referenced: 12 kB
Anonymous: 12 kB
KSM: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 0
VmFlags: rd wr mr mw me ac nh
The "nh" flag is set.
[1] https://lore.kernel.org/linux-mm/202312192310.56367035-oliver.sang@intel.com/
Link: https://lkml.kernel.org/r/20231221065943.2803551-2-shy828301@gmail.com
Fixes: efa7df3e3b ("mm: align larger anonymous mappings on THP boundaries")
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: <stable@vger.kerenl.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
uprobes passes an unaligned page mapping address to
folio_add_new_anon_rmap(), which ends up triggering a VM_BUG_ON() we
recently extended in commit 372cbd4d5a ("mm: non-pmd-mappable, large
folios for folio_add_new_anon_rmap()").
Arguably, this is uprobes code doing something wrong; however, for the
time being it would have likely worked in rmap code because
__folio_set_anon() would set folio->index to the same value.
Looking at __replace_page(), we'd also pass slightly wrong values to
mmu_notifier_range_init(), page_vma_mapped_walk(), flush_cache_page(),
ptep_clear_flush() and set_pte_at_notify(). I suspect most of them are
fine, but let's just mark the introducing commit as the one needed fixing.
I don't think CC stable is warranted.
We'll add more sanity checks in rmap code separately, to make sure that we
always get properly aligned addresses.
Link: https://lkml.kernel.org/r/20240115100731.91007-1-david@redhat.com
Fixes: c517ee744b ("uprobes: __replace_page() should not use page_address_in_vma()")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Olsa <jolsa@kernel.org>
Closes: https://lkml.kernel.org/r/ZaMR2EWN-HvlCfUl@krava
Tested-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ra_alloc_folio() marks a page that should trigger next round of async
readahead. However it rounds up computed index to the order of page being
allocated. This can however lead to multiple consecutive pages being
marked with readahead flag. Consider situation with index == 1, mark ==
1, order == 0. We insert order 0 page at index 1 and mark it. Then we
bump order to 1, index to 2, mark (still == 1) is rounded up to 2 so page
at index 2 is marked as well. Then we bump order to 2, index is
incremented to 4, mark gets rounded to 4 so page at index 4 is marked as
well. The fact that multiple pages get marked within a single readahead
window confuses the readahead logic and results in readahead window being
trimmed back to 1. This situation is triggered in particular when maximum
readahead window size is not a power of two (in the observed case it was
768 KB) and as a result sequential read throughput suffers.
Fix the problem by rounding 'mark' down instead of up. Because the index
is naturally aligned to 'order', we are guaranteed 'rounded mark' == index
iff 'mark' is within the page we are allocating at 'index' and thus
exactly one page is marked with readahead flag as required by the
readahead code and sequential read performance is restored.
This effectively reverts part of commit b9ff43dd27 ("mm/readahead: Fix
readahead with large folios"). The commit changed the rounding with the
rationale:
"... we were setting the readahead flag on the folio which contains the
last byte read from the block. This is wrong because we will trigger
readahead at the end of the read without waiting to see if a subsequent
read is going to use the pages we just read."
Although this is true, the fact is this was always the case with read
sizes not aligned to folio boundaries and large folios in the page cache
just make the situation more obvious (and frequent). Also for sequential
read workloads it is better to trigger the readahead earlier rather than
later. It is true that the difference in the rounding and thus earlier
triggering of the readahead can result in reading more for semi-random
workloads. However workloads really suffering from this seem to be rare.
In particular I have verified that the workload described in commit
b9ff43dd27 ("mm/readahead: Fix readahead with large folios") of reading
random 100k blocks from a file like:
[reader]
bs=100k
rw=randread
numjobs=1
size=64g
runtime=60s
is not impacted by the rounding change and achieves ~70MB/s in both cases.
[jack@suse.cz: fix one more place where mark rounding was done as well]
Link: https://lkml.kernel.org/r/20240123153254.5206-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240104085839.21029-1-jack@suse.cz
Fixes: b9ff43dd27 ("mm/readahead: Fix readahead with large folios")
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Guo Xuenan <guoxuenan@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Louis Peens says:
====================
nfp: flower: a few small conntrack offload fixes
This small series addresses two bugs in the nfp conntrack offloading
code.
The first patch is a check to prevent offloading for a case which is
currently not supported by the nfp.
The second patch fixes up parsing of layer4 mangling code so it can be
correctly offloaded. Since the masks are an inverse mask and we are
shifting it so it can be packed together with the destination we
effectively need to 'clear' the lower bits of the mask by setting it to
0xFFFF.
====================
Link: https://lore.kernel.org/r/20240124151909.31603-1-louis.peens@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The nfp driver will merge the tp source port and tp destination port
into one dword which the offset must be zero to do hardware offload.
However, the mangle action for the tp source port and tp destination
port is separated for tc ct action. Modify the mangle action for the
FLOW_ACT_MANGLE_HDR_TYPE_TCP and FLOW_ACT_MANGLE_HDR_TYPE_UDP to
satisfy the nfp driver offload check for the tp port.
The mangle action provides a 4B value for source, and a 4B value for
the destination, but only 2B of each contains the useful information.
For offload the 2B of each is combined into a single 4B word. Since the
incoming mask for the source is '0xFFFF<mask>' the shift-left will
throw away the 0xFFFF part. When this gets combined together in the
offload it will clear the destination field. Fix this by setting the
lower bits back to 0xFFFF, effectively doing a rotate-left operation on
the mask.
Fixes: 5cee92c6f5 ("nfp: flower: support hw offload for ct nat action")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Hui Zhou <hui.zhou@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Link: https://lore.kernel.org/r/20240124151909.31603-3-louis.peens@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The nfp offload flow pay will not allocate a mask id when the out port
is openvswitch internal port. This is because these flows are used to
configure the pre_tun table and are never actually send to the firmware
as an add-flow message. When a tc rule which action contains ct and
the post ct entry's out port is openvswitch internal port, the merge
offload flow pay with the wrong mask id of 0 will be send to the
firmware. Actually, the nfp can not support hardware offload for this
situation, so return EOPNOTSUPP.
Fixes: bd0fe7f96a ("nfp: flower-ct: add zone table entry when handling pre/post_ct flows")
CC: stable@vger.kernel.org # 5.14+
Signed-off-by: Hui Zhou <hui.zhou@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Link: https://lore.kernel.org/r/20240124151909.31603-2-louis.peens@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The busywait timeout value is a millisecond, not a second. So the
current setting 2 is too small. On slow/busy host (or VMs) the
current timeout can expire even on "correct" execution, causing random
failures. Let's copy the WAIT_TIMEOUT from forwarding/lib.sh and set
BUSYWAIT_TIMEOUT here.
Fixes: 25ae948b44 ("selftests/net: add lib.sh")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240124061344.1864484-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a new helper, bch2_hash_lookup_in_snapshot(), for when we're not
operating in a subvolume and already have a snapshot ID, and then use it
in lookup_lostfound() -> __lookup_dirent().
This is a bugfix - lookup_lostfound() doesn't take a subvolume ID, we
were passing a nonsense subvolume ID before, and don't have one to pass
since we may be operating in an interior snapshot node that doesn't have
a subvolume ID.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Paolo Abeni says:
====================
selftests: net: a few fixes
This series address self-tests failures for udp gro-related tests.
The first patch addresses the main problem I observe locally - the XDP
program required by such tests, xdp_dummy, is currently build in the
ebpf self-tests directory, not available if/when the user targets net
only. Arguably is more a refactor than a fix, but still targeting net
to hopefully
The second patch fixes the integration of such tests with the build
system.
Patch 3/3 fixes sporadic failures due to races.
Tested with:
make -C tools/testing/selftests/ TARGETS=net install
./tools/testing/selftests/kselftest_install/run_kselftest.sh \
-t "net:udpgro_bench.sh net:udpgro.sh net:udpgro_fwd.sh \
net:udpgro_frglist.sh net:veth.sh"
no failures.
====================
Link: https://lore.kernel.org/r/cover.1706131762.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The blamed commit below introduce a dependency in some net self-tests
towards a newly introduce helper script.
Such script is currently not included into the TEST_PROGS_EXTENDED list
and thus is not installed, causing failure for the relevant tests when
executed from the install dir.
Fix the issue updating the install targets.
Fixes: 3bdd9fd29c ("selftests/net: synchronize udpgro tests' tx and rx connection")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/076e8758e21ff2061cc9f81640e7858df775f0a9.1706131762.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Still a bit unclear whether each directory should have its own
config file, but assuming they should lets add one for tcp_ao.
The following tests still fail with this config in place:
- rst_ipv4,
- rst_ipv6,
- bench-lookups_ipv6.
other 21 pass.
Fixes: d11301f659 ("selftests/net: Add TCP-AO ICMPs accept test")
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://lore.kernel.org/r/20240124192550.1865743-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In __spi_pump_transfer_message(), the message was not finalized in the
first error return as it is in the other error return paths. Not
finalizing the message could cause anything waiting on the message to
complete to hang forever.
This adds the missing call to spi_finalize_current_message().
Fixes: ae7d2346dc ("spi: Don't use the message queue if possible in spi_sync")
Signed-off-by: David Lechner <dlechner@baylibre.com>
Link: https://msgid.link/r/20240125205312.3458541-2-dlechner@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
[Why]
IPS was temporary disabled due to instability.
It was fixed in dmub firmware and with:
- "drm/amd/display: Add IPS checks before dcn register access"
- "drm/amd/display: Disable ips before dc interrupt setting"
[How]
Enable IPS by default.
Disable IPS if 0x800 bit set in amdgpu.dcdebugmask module params
Signed-off-by: Roman Li <Roman.Li@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
While in IPS2 an access to dcn registers is not allowed.
If interrupt results in dc call, we should disable IPS.
[How]
Safeguard register access in IPS2 by disabling idle optimization
before calling dc interrupt setting api.
Signed-off-by: Roman Li <Roman.Li@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Because ABM will wait VStart to start getting histogram data,
it will cause we can't enter IPS while full screnn video playing.
[How]
Modify the panel refresh rate to the maximun multiple of current
refresh rate.
Reviewed-by: Dennis Chan <dennis.chan@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: ChunTao Tso <chuntao.tso@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
With IPS enabled a system hangs once PSR is active.
PSR active triggers transition to IPS2 state.
While in IPS2 an access to dcn registers results in hard hang.
Existing check doesn't cover for PSR sequence.
[How]
Safeguard register access by disabling idle optimization in atomic commit
and crtc scanout. It will be re-enabled on next vblank.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why & How]
Add regkey to block video playback in IPS2 by default
Allow idle optimizations in the same spot we allow Replay for
video playback usecases.
Avoid sending it when there's an external display connected by
modifying the allow idle checks to check for active non-eDP screens.
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This needs to be set to 1 to avoid a potential deadlock in
the GC 10.x and newer. On GC 9.x and older, this needs
to be set to 0. This can lead to hangs in some mixed
graphics and compute workloads. Updated firmware is also
required for AQL.
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
This needs to be set to 1 to avoid a potential deadlock in
the GC 10.x and newer. On GC 9.x and older, this needs
to be set to 0. This can lead to hangs in some mixed
graphics and compute workloads. Updated firmware is also
required for AQL.
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
The power source flag should be updated when
[1] System receives an interrupt indicating that the power source
has changed.
[2] System resumes from suspend or runtime suspend
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
The 'status' variable in 'core_link_read_dpcd()' &
'core_link_write_dpcd()' was uninitialized.
Thus, initializing 'status' variable to 'DC_ERROR_UNEXPECTED' by default.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dpcd.c:226 core_link_read_dpcd() error: uninitialized symbol 'status'.
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dpcd.c:248 core_link_write_dpcd() error: uninitialized symbol 'status'.
Cc: stable@vger.kernel.org
Cc: Jerry Zuo <jerry.zuo@amd.com>
Cc: Jun Lei <Jun.Lei@amd.com>
Cc: Wayne Lin <Wayne.Lin@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf, netfilter and WiFi.
Jakub is doing a lot of work to include the self-tests in our CI, as a
result a significant amount of self-tests related fixes is flowing in
(and will likely continue in the next few weeks).
Current release - regressions:
- bpf: fix a kernel crash for the riscv 64 JIT
- bnxt_en: fix memory leak in bnxt_hwrm_get_rings()
- revert "net: macsec: use skb_ensure_writable_head_tail to expand
the skb"
Previous releases - regressions:
- core: fix removing a namespace with conflicting altnames
- tc/flower: fix chain template offload memory leak
- tcp:
- make sure init the accept_queue's spinlocks once
- fix autocork on CPUs with weak memory model
- udp: fix busy polling
- mlx5e:
- fix out-of-bound read in port timestamping
- fix peer flow lists corruption
- iwlwifi: fix a memory corruption
Previous releases - always broken:
- netfilter:
- nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress
basechain
- nft_limit: reject configurations that cause integer overflow
- bpf: fix bpf_xdp_adjust_tail() with XSK zero-copy mbuf, avoiding a
NULL pointer dereference upon shrinking
- llc: make llc_ui_sendmsg() more robust against bonding changes
- smc: fix illegal rmb_desc access in SMC-D connection dump
- dpll: fix pin dump crash for rebound module
- bnxt_en: fix possible crash after creating sw mqprio TCs
- hv_netvsc: calculate correct ring size when PAGE_SIZE is not 4kB
Misc:
- several self-tests fixes for better integration with the netdev CI
- added several missing modules descriptions"
* tag 'net-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
tsnep: Fix XDP_RING_NEED_WAKEUP for empty fill ring
tsnep: Remove FCS for XDP data path
net: fec: fix the unhandled context fault from smmu
selftests: bonding: do not test arp/ns target with mode balance-alb/tlb
fjes: fix memleaks in fjes_hw_setup
i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue
i40e: set xdp_rxq_info::frag_size
xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL
ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue
intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers
ice: remove redundant xdp_rxq_info registration
i40e: handle multi-buffer packets that are shrunk by xdp prog
ice: work on pre-XDP prog frag count
xsk: fix usage of multi-buffer BPF helpers for ZC XDP
xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags
xsk: recycle buffer in case Rx queue was full
net: fill in MODULE_DESCRIPTION()s for rvu_mbox
net: fill in MODULE_DESCRIPTION()s for litex
net: fill in MODULE_DESCRIPTION()s for fsl_pq_mdio
net: fill in MODULE_DESCRIPTION()s for fec
...
Pull overlayfs fix from Amir Goldstein:
"Change the on-disk format for the new "xwhiteouts" feature introduced
in v6.7
The change reduces unneeded overhead of an extra getxattr per readdir.
The only user of the "xwhiteout" feature is the external composefs
tool, which has been updated to support the new on-disk format.
This change is also designated for 6.7.y"
* tag 'ovl-fixes-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: mark xwhiteouts directory with overlay.opaque='x'
Pull netfs fixes from Christian Brauner:
"This contains various fixes for the netfs work merged earlier this
cycle:
afs:
- Fix locking imbalance in afs_proc_addr_prefs_show()
- Remove afs_dynroot_d_revalidate() which is redundant
- Fix error handling during lookup
- Hide sillyrenames from userspace. This fixes a race between
silly-rename files being created/removed and userspace iterating
over directory entries
- Don't use unnecessary folio_*() functions
cifs:
- Don't use unnecessary folio_*() functions
cachefiles:
- erofs: Fix Null dereference when cachefiles are not doing
ondemand-mode
- Update mailing list
netfs library:
- Add Jeff Layton as reviewer
- Update mailing list
- Fix a error checking in netfs_perform_write()
- fscache: Check error before dereferencing
- Don't use unnecessary folio_*() functions"
* tag 'vfs-6.8-rc2.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
afs: Fix missing/incorrect unlocking of RCU read lock
afs: Remove afs_dynroot_d_revalidate() as it is redundant
afs: Fix error handling with lookup via FS.InlineBulkStatus
afs: Hide silly-rename files from userspace
cachefiles, erofs: Fix NULL deref in when cachefiles is not doing ondemand-mode
netfs: Fix a NULL vs IS_ERR() check in netfs_perform_write()
netfs, fscache: Prevent Oops in fscache_put_cache()
cifs: Don't use certain unnecessary folio_*() functions
afs: Don't use certain unnecessary folio_*() functions
netfs: Don't use certain unnecessary folio_*() functions
netfs: Add Jeff Layton as reviewer
netfs, cachefiles: Change mailing list
Pull nfsd fixes from Chuck Lever:
- Fix in-kernel RPC UDP transport
- Fix NFSv4.0 RELEASE_LOCKOWNER
* tag 'nfsd-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix RELEASE_LOCKOWNER
SUNRPC: use request size to initialize bio_vec in svc_udp_sendto()
Pull RCU fix from Neeraj Upadhyay:
"This fixes RCU grace period stalls, which are observed when an
outgoing CPU's quiescent state reporting results in wakeup of one of
the grace period kthreads, to complete the grace period.
If those kthreads have SCHED_FIFO policy, the wake up can indirectly
arm the RT bandwith timer to the local offline CPU.
Earlier migration of the hrtimers from the CPU introduced in commit
5c0930ccaa ("hrtimers: Push pending hrtimers away from outgoing CPU
earlier") results in this timer getting ignored.
If the RCU grace period kthreads are waiting for RT bandwidth to be
available, they may never be actually scheduled, resulting in RCU
stall warnings"
* tag 'urgent-rcu.2024.01.24a' of https://github.com/neeraju/linux:
rcu: Defer RCU kthreads wakeup when CPU is dying
Samsung fixes for v6.8
1. Google GS101: Correct the input clock names to CMU MISC clock
controller to match received review. The review was initially missed
and CMU MISC clock controller bindings, driver and DTS was merged
into v6.8-rc1 with different names. Nothing was released so far, so
the bindings and driver can be still corrected to match review.
2. Samsung Galaxy Tab3: Fix display by using correct vclk polarity in
display node.
* tag 'samsung-fixes-6.8' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
ARM: dts: exynos4212-tab3: add samsung,invert-vclk flag to fimd
arm64: dts: exynos: gs101: comply with the new cmu_misc clock names
Link: https://lore.kernel.org/r/20240125082400.163935-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arm FF-A fixes for v6.8
Quite a few fixes addressing issues around missing RW lock initialisation
in ffa_setup_partitions(), missing check for xa_load() return value,
use of xa_insert instead of xa_store to flag case of duplicate insertion.
It also simplifies ffa_partitions_cleanup() with xa_for_each() and xa_erase()
instead of xa_extract() and kfree(). Finally it includes fixes around
handling of partitions setup failures during initialisation.
* tag 'ffa-fixes-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
firmware: arm_ffa: Handle partitions setup failures
firmware: arm_ffa: Use xa_insert() and check for result
firmware: arm_ffa: Simplify ffa_partitions_cleanup()
firmware: arm_ffa: Check xa_load() return value
firmware: arm_ffa: Add missing rwlock_init() for the driver partition
firmware: arm_ffa: Add missing rwlock_init() in ffa_setup_partitions()
Link: https://lore.kernel.org/r/20240122161652.3551159-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arm SCMI fixes for v6.8
Few fixes addressing the below issues:
1. A spurious IRQ related to the late reply can get wrongly associated
with the new enqueued request resulting in misinterpretation of data
in shared memory. This race-condition can be detected by looking at
the channel status bits which the platform must set to the channel
free before triggering the completion IRQ. Adding a consistency check
to validate such condition will fix the issue.
2. Incorrect use of asm-generic/bug.h instead of generic linux/bUg.h
3. xa_store() can't check for possible duplication insertion, use
xa_insert() instead
4. Fix the SCMI clock protocol version in the v3.2 SCMI specification
5. Incorrect upgrade of highest supported clock protocol version from
v2.0 to v3.0
* tag 'scmi-fixes-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
firmware: arm_scmi: Fix the clock protocol supported version
firmware: arm_scmi: Fix the clock protocol version for v3.2
firmware: arm_scmi: Use xa_insert() when saving raw queues
firmware: arm_scmi: Use xa_insert() to store opps
firmware: arm_scmi: Replace asm-generic/bug.h with linux/bug.h
firmware: arm_scmi: Check mailbox/SMT channel for consistency
Link: https://lore.kernel.org/r/20240122161640.3551085-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
With one of the on-board ASM1061 AHCI controllers (1b21:0612) on an
ASUSTeK Pro WS WRX80E-SAGE SE WIFI mainboard, a controller hang was
observed that was immediately preceded by the following kernel
messages:
ahci 0000:28:00.0: Using 64-bit DMA addresses
ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00000 flags=0x0000]
ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00300 flags=0x0000]
ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00380 flags=0x0000]
ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00400 flags=0x0000]
ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00680 flags=0x0000]
ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00700 flags=0x0000]
The first message is produced by code in drivers/iommu/dma-iommu.c
which is accompanied by the following comment that seems to apply:
/*
* Try to use all the 32-bit PCI addresses first. The original SAC vs.
* DAC reasoning loses relevance with PCIe, but enough hardware and
* firmware bugs are still lurking out there that it's safest not to
* venture into the 64-bit space until necessary.
*
* If your device goes wrong after seeing the notice then likely either
* its driver is not setting DMA masks accurately, the hardware has
* some inherent bug in handling >32-bit addresses, or not all the
* expected address bits are wired up between the device and the IOMMU.
*/
Asking the ASM1061 on a discrete PCIe card to DMA from I/O virtual
address 0xffffffff00000000 produces the following I/O page faults:
vfio-pci 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0x7ff00000000 flags=0x0010]
vfio-pci 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0x7ff00000500 flags=0x0010]
Note that the upper 21 bits of the logged DMA address are zero. (When
asking a different PCIe device in the same PCIe slot to DMA to the
same I/O virtual address, we do see all the upper 32 bits of the DMA
address as 1, so this is not an issue with the chipset or IOMMU
configuration on the test system.)
Also, hacking libahci to always set the upper 21 bits of all DMA
addresses to 1 produces no discernible effect on the behavior of the
ASM1061, and mkfs/mount/scrub/etc work as without this hack.
This all strongly suggests that the ASM1061 has a 43 bit DMA address
limit, and this commit therefore adds a quirk to deal with this limit.
This issue probably applies to (some of) the other supported ASMedia
parts as well, but we limit it to the PCI IDs known to refer to
ASM1061 parts, as that's the only part we know for sure to be affected
by this issue at this point.
Link: https://lore.kernel.org/linux-ide/ZaZ2PIpEId-rl6jv@wantstofly.org/
Signed-off-by: Lennert Buytenhek <kernel@wantstofly.org>
[cassel: drop date from error messages in commit log]
Signed-off-by: Niklas Cassel <cassel@kernel.org>
One issue comes up while building selftest/landlock/fs_test on my side
(gcc 7.3/glibc-2.28/kernel-4.19).
gcc -Wall -O2 -isystem fs_test.c -lcap -o selftests/landlock/fs_test
fs_test.c:4575:9: error: initializer element is not constant
.mnt = mnt_tmp,
^~~~~~~
Signed-off-by: Hu Yadi <hu.yadi@h3c.com>
Suggested-by: Jiao <jiaoxupo@h3c.com>
Reviewed-by: Berlin <berlin@h3c.com>
Link: https://lore.kernel.org/r/20240124022908.42100-1-hu.yadi@h3c.com
Fixes: 04f9070e99 ("selftests/landlock: Add tests for pseudo filesystems")
[mic: Factor out mount's data string and make mnt_tmp static]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
One issue comes up while building selftest/landlock/net_test on my side
(gcc 7.3/glibc-2.28/kernel-4.19).
net_test.c: In function ‘set_service’:
net_test.c:91:45: warning: implicit declaration of function ‘gettid’; [-Wimplicit-function-declaration]
"_selftests-landlock-net-tid%d-index%d", gettid(),
^~~~~~
getgid
net_test.c:(.text+0x4e0): undefined reference to `gettid'
Signed-off-by: Hu Yadi <hu.yadi@h3c.com>
Suggested-by: Jiao <jiaoxupo@h3c.com>
Reviewed-by: Berlin <berlin@h3c.com>
Fixes: a549d055a2 ("selftests/landlock: Add network tests")
Link: https://lore.kernel.org/r/20240123062621.25082-1-hu.yadi@h3c.com
[mic: Cosmetic fixes]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
The fill ring of the XDP socket may contain not enough buffers to
completey fill the RX queue during socket creation. In this case the
flag XDP_RING_NEED_WAKEUP is not set as this flag is only set if the RX
queue is not completely filled during polling.
Set XDP_RING_NEED_WAKEUP flag also if RX queue is not completely filled
during XDP socket creation.
Fixes: 3fc2333933 ("tsnep: Add XDP socket zero-copy RX support")
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The RX data buffer includes the FCS. The FCS is already stripped for the
normal data path. But for the XDP data path the FCS is included and
acts like additional/useless data.
Remove the FCS from the RX data buffer also for XDP.
Fixes: 65b28c8100 ("tsnep: Add XDP RX support")
Fixes: 3fc2333933 ("tsnep: Add XDP socket zero-copy RX support")
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Saeed Mahameed says:
====================
mlx5 fixes 2024-01-24
This series provides bug fixes to mlx5 driver.
Please pull and let me know if there is any problem.
* tag 'mlx5-fixes-2024-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: fix a potential double-free in fs_any_create_groups
net/mlx5e: fix a double-free in arfs_create_groups
net/mlx5e: Ignore IPsec replay window values on sender side
net/mlx5e: Allow software parsing when IPsec crypto is enabled
net/mlx5: Use mlx5 device constant for selecting CQ period mode for ASO
net/mlx5: DR, Can't go to uplink vport on RX rule
net/mlx5: DR, Use the right GVMI number for drop action
net/mlx5: Bridge, fix multicast packets sent to uplink
net/mlx5: Fix a WARN upon a callback command failure
net/mlx5e: Fix peer flow lists handling
net/mlx5e: Fix inconsistent hairpin RQT sizes
net/mlx5e: Fix operation precedence bug in port timestamping napi_poll context
net/mlx5: Fix query of sd_group field
net/mlx5e: Use the correct lag ports number when creating TISes
====================
Link: https://lore.kernel.org/r/20240124081855.115410-1-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Borkmann says:
====================
pull-request: bpf 2024-01-25
The following pull-request contains BPF updates for your *net* tree.
We've added 12 non-merge commits during the last 2 day(s) which contain
a total of 13 files changed, 190 insertions(+), 91 deletions(-).
The main changes are:
1) Fix bpf_xdp_adjust_tail() in context of XSK zero-copy drivers which
support XDP multi-buffer. The former triggered a NULL pointer
dereference upon shrinking, from Maciej Fijalkowski & Tirthendu Sarkar.
2) Fix a bug in riscv64 BPF JIT which emitted a wrong prologue and
epilogue for struct_ops programs, from Pu Lehui.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue
i40e: set xdp_rxq_info::frag_size
xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL
ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue
intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers
ice: remove redundant xdp_rxq_info registration
i40e: handle multi-buffer packets that are shrunk by xdp prog
ice: work on pre-XDP prog frag count
xsk: fix usage of multi-buffer BPF helpers for ZC XDP
xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags
xsk: recycle buffer in case Rx queue was full
riscv, bpf: Fix unpredictable kernel crash about RV64 struct_ops
====================
Link: https://lore.kernel.org/r/20240125084416.10876-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When repeatedly changing the interface link speed using the command below:
ethtool -s eth0 speed 100 duplex full
ethtool -s eth0 speed 1000 duplex full
The following errors may sometimes be reported by the ARM SMMU driver:
[ 5395.035364] fec 5b040000.ethernet eth0: Link is Down
[ 5395.039255] arm-smmu 51400000.iommu: Unhandled context fault:
fsr=0x402, iova=0x00000000, fsynr=0x100001, cbfrsynra=0x852, cb=2
[ 5398.108460] fec 5b040000.ethernet eth0: Link is Up - 100Mbps/Full -
flow control off
It is identified that the FEC driver does not properly stop the TX queue
during the link speed transitions, and this results in the invalid virtual
I/O address translations from the SMMU and causes the context faults.
Fixes: dbc64a8ea2 ("net: fec: move calls to quiesce/resume packet processing out of fec_restart()")
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Link: https://lore.kernel.org/r/20240123165141.2008104-1-shenwei.wang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
On HSW non-ULT (or at least on Dell Latitude E6540) external displays
start to flicker when we enable PSR on the eDP. We observe a much higher
SR and PC6 residency than should be possible with an external display,
and indeen much higher than what we observe with eDP disabled and
only the external display enabled. Looks like the hardware is somehow
ignoring the fact that the external display is active during PSR.
I wasn't able to redproduce this on my HSW ULT machine, or BDW.
So either there's something specific about this particular laptop
(eg. some unknown firmware thing) or the issue is limited to just
non-ULT HSW systems. All known registers that could affect this
look perfectly reasonable on the affected machine.
As a workaround let's unmask the LPSP event to prevent PSR entry
except while in LPSP mode (only pipe A + eDP active). This
will prevent PSR entry entirely when multiple pipes are active.
The one slight downside is that we now also prevent PSR entry
when driving eDP with pipe B or C, but I think that's a reasonable
tradeoff to avoid having to implement a more complex workaround.
Cc: stable@vger.kernel.org
Fixes: 783d8b8087 ("drm/i915/psr: Re-enable PSR1 on hsw/bdw")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10092
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240118212131.31868-1-ville.syrjala@linux.intel.com
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
(cherry picked from commit 94501c3ca6)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
There have been reports of the watchdog marking clocksources unstable on
machines with 8 NUMA nodes:
clocksource: timekeeping watchdog on CPU373:
Marking clocksource 'tsc' as unstable because the skew is too large:
clocksource: 'hpet' wd_nsec: 14523447520
clocksource: 'tsc' cs_nsec: 14524115132
The measured clocksource skew - the absolute difference between cs_nsec
and wd_nsec - was 668 microseconds:
cs_nsec - wd_nsec = 14524115132 - 14523447520 = 667612
The kernel used 200 microseconds for the uncertainty_margin of both the
clocksource and watchdog, resulting in a threshold of 400 microseconds (the
md variable). Both the cs_nsec and the wd_nsec value indicate that the
readout interval was circa 14.5 seconds. The observed behaviour is that
watchdog checks failed for large readout intervals on 8 NUMA node
machines. This indicates that the size of the skew was directly proportinal
to the length of the readout interval on those machines. The measured
clocksource skew, 668 microseconds, was evaluated against a threshold (the
md variable) that is suited for readout intervals of roughly
WATCHDOG_INTERVAL, i.e. HZ >> 1, which is 0.5 second.
The intention of 2e27e793e2 ("clocksource: Reduce clocksource-skew
threshold") was to tighten the threshold for evaluating skew and set the
lower bound for the uncertainty_margin of clocksources to twice
WATCHDOG_MAX_SKEW. Later in c37e85c135 ("clocksource: Loosen clocksource
watchdog constraints"), the WATCHDOG_MAX_SKEW constant was increased to
125 microseconds to fit the limit of NTP, which is able to use a
clocksource that suffers from up to 500 microseconds of skew per second.
Both the TSC and the HPET use default uncertainty_margin. When the
readout interval gets stretched the default uncertainty_margin is no
longer a suitable lower bound for evaluating skew - it imposes a limit
that is far stricter than the skew with which NTP can deal.
The root causes of the skew being directly proportinal to the length of
the readout interval are:
* the inaccuracy of the shift/mult pairs of clocksources and the watchdog
* the conversion to nanoseconds is imprecise for large readout intervals
Prevent this by skipping the current watchdog check if the readout
interval exceeds 2 * WATCHDOG_INTERVAL. Considering the maximum readout
interval of 2 * WATCHDOG_INTERVAL, the current default uncertainty margin
(of the TSC and HPET) corresponds to a limit on clocksource skew of 250
ppm (microseconds of skew per second). To keep the limit imposed by NTP
(500 microseconds of skew per second) for all possible readout intervals,
the margins would have to be scaled so that the threshold value is
proportional to the length of the actual readout interval.
As for why the readout interval may get stretched: Since the watchdog is
executed in softirq context the expiration of the watchdog timer can get
severely delayed on account of a ksoftirqd thread not getting to run in a
timely manner. Surely, a system with such belated softirq execution is not
working well and the scheduling issue should be looked into but the
clocksource watchdog should be able to deal with it accordingly.
Fixes: 2e27e793e2 ("clocksource: Reduce clocksource-skew threshold")
Suggested-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Jiri Wiesner <jwiesner@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Feng Tang <feng.tang@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240122172350.GA740@incl
'struct hidraw_list' is a circular queue whose head can be smaller than
tail. Using 'list->tail != list->head' to release all memory that should
be released.
Fixes: a5623a203c ("HID: hidraw: fix memory leak in hidraw_release()")
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
RCU protection was removed in the commit 2d32777d60 ("raid1: remove rcu
protection to access rdev from conf").
However, the code in fix_read_error does rcu_dereference outside
rcu_read_lock - this triggers the following warning. The warning is
triggered by a LVM2 test shell/integrity-caching.sh.
This commit removes rcu_dereference.
=============================
WARNING: suspicious RCU usage
6.7.0 #2 Not tainted
-----------------------------
drivers/md/raid1.c:2265 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
no locks held by mdX_raid1/1859.
stack backtrace:
CPU: 2 PID: 1859 Comm: mdX_raid1 Not tainted 6.7.0 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x60/0x70
lockdep_rcu_suspicious+0x153/0x1b0
raid1d+0x1732/0x1750 [raid1]
? lock_acquire+0x9f/0x270
? finish_wait+0x3d/0x80
? md_thread+0xf7/0x130 [md_mod]
? lock_release+0xaa/0x230
? md_register_thread+0xd0/0xd0 [md_mod]
md_thread+0xa0/0x130 [md_mod]
? housekeeping_test_cpu+0x30/0x30
kthread+0xdc/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x28/0x40
? kthread_complete_and_exit+0x20/0x20
ret_from_fork_asm+0x11/0x20
</TASK>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: ca294b34aa ("md/raid1: support read error check")
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/51539879-e1ca-fde3-b8b4-8934ddedcbc@redhat.com
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Update nf_tables kdoc to keep it in sync with the code, from George Guo.
2) Handle NETDEV_UNREGISTER event for inet/ingress basechain.
3) Reject configuration that cause nft_limit to overflow,
from Florian Westphal.
4) Restrict anonymous set/map names to 16 bytes, from Florian Westphal.
5) Disallow to encode queue number and error in verdicts. This reverts
a patch which seems to have introduced an early attempt to support for
nfqueue maps, which is these days supported via nft_queue expression.
6) Sanitize family via .validate for expressions that explicitly refer
to NF_INET_* hooks.
* tag 'nf-24-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: validate NFPROTO_* family
netfilter: nf_tables: reject QUEUE/DROP verdict parameters
netfilter: nf_tables: restrict anonymous set and map names to 16 bytes
netfilter: nft_limit: reject configurations that cause integer overflow
netfilter: nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress basechain
netfilter: nf_tables: cleanup documentation
====================
Link: https://lore.kernel.org/r/20240124191248.75463-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Creating a region with 16 memory devices caused a problem. The div_u64_rem
function, used for dividing an unsigned 64-bit number by a 32-bit one,
faced an issue when SZ_256M * p->interleave_ways. The result surpassed
the maximum limit of the 32-bit divisor (4G), leading to an overflow
and a remainder of 0.
note: At this point, p->interleave_ways is 16, meaning 16 * 256M = 4G
To fix this issue, I replaced the div_u64_rem function with div64_u64_rem
and adjusted the type of the remainder.
Signed-off-by: Quanquan Cao <caoqq@fujitsu.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Fixes: 23a22cd1c9 ("cxl/region: Allocate HPA capacity to regions")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Several fixups
- Minor fix in `drm/exynos: gsc: gsc_runtime_resume`
. The patch ensures `clk_disable_unprepare()` is called on the first
element of `ctx->clocks` array.
This issue was identified by the Linux Verification Center.
- Fix excessive stack usage in `fimd_win_set_pixfmt()` in `drm/exynos`
. The issue, highlighted by gcc, involved an unnecessary on-stack copy of
the large `exynos_drm_plane` structure, now replaced with a pointer.
- Fix an incorrect type issue in `exynos_drm_fimd.c` module
. Addresses an incorrect type issue in `fimd_commit()` within the
`exynos_drm_fimd.c` The problem was reported by the kernel test robot[1].
[1] https://lore.kernel.org/oe-kbuild-all/202312140930.Me9yWf8F-lkp@intel.com/
- Fix a typo in the dt-bindings for `samsung,exynos-mixer`
. Changes 'regs' to the correct property name 'reg' in the dt-bindings
documentation for `samsung,exynos-mixer`
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Inki Dae <inki.dae@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240122072407.39546-1-inki.dae@samsung.com
In fjes_hw_setup, it allocates several memory and delay the deallocation
to the fjes_hw_exit in fjes_probe through the following call chain:
fjes_probe
|-> fjes_hw_init
|-> fjes_hw_setup
|-> fjes_hw_exit
However, when fjes_hw_setup fails, fjes_hw_exit won't be called and thus
all the resources allocated in fjes_hw_setup will be leaked. In this
patch, we free those resources in fjes_hw_setup and prevents such leaks.
Fixes: 2fcbca6877 ("fjes: platform_driver's .probe and .remove routine")
Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240122172445.3841883-1-alexious@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull ceph fixes from Ilya Dryomov:
"A fix to avoid triggering an assert in some cases where RBD exclusive
mappings are involved and a deprecated API cleanup"
* tag 'ceph-for-6.8-rc2' of https://github.com/ceph/ceph-client:
rbd: don't move requests to the running list on errors
rbd: remove usage of the deprecated ida_simple_*() API
Pull integrity fix from Mimi Zohar:
"Revert patch that required user-provided key data, since keys can be
created from kernel-generated random numbers"
* tag 'integrity-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
Revert "KEYS: encrypted: Add check for strsep"
Maciej Fijalkowski says:
====================
net: bpf_xdp_adjust_tail() and Intel mbuf fixes
Hey,
after a break followed by dealing with sickness, here is a v6 that makes
bpf_xdp_adjust_tail() actually usable for ZC drivers that support XDP
multi-buffer. Since v4 I tried also using bpf_xdp_adjust_tail() with
positive offset which exposed yet another issues, which can be observed
by increased commit count when compared to v3.
John, in the end I think we should remove handling
MEM_TYPE_XSK_BUFF_POOL from __xdp_return(), but it is out of the scope
for fixes set, IMHO.
Thanks,
Maciej
v6:
- add acks [Magnus]
- fix spelling mistakes [Magnus]
- avoid touching xdp_buff in xp_alloc_{reused,new_from_fq}() [Magnus]
- s/shrink_data/bpf_xdp_shrink_data [Jakub]
- remove __shrink_data() [Jakub]
- check retvals from __xdp_rxq_info_reg() [Magnus]
v5:
- pick correct version of patch 5 [Simon]
- elaborate a bit more on what patch 2 fixes
v4:
- do not clear frags flag when deleting tail; xsk_buff_pool now does
that
- skip some NULL tests for xsk_buff_get_tail [Martin, John]
- address problems around registering xdp_rxq_info
- fix bpf_xdp_frags_increase_tail() for ZC mbuf
v3:
- add acks
- s/xsk_buff_tail_del/xsk_buff_del_tail
- address i40e as well (thanks Tirthendu)
v2:
- fix !CONFIG_XDP_SOCKETS builds
- add reviewed-by tag to patch 3
====================
Link: https://lore.kernel.org/r/20240124191602.566724-1-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
i40e support XDP multi-buffer so it is supposed to use
__xdp_rxq_info_reg() instead of xdp_rxq_info_reg() and set the
frag_size. It can not be simply converted at existing callsite because
rx_buf_len could be un-initialized, so let us register xdp_rxq_info
within i40e_configure_rx_ring(), which happen to be called with already
initialized rx_buf_len value.
Commit 5180ff1364 ("i40e: use int for i40e_status") converted 'err' to
int, so two variables to deal with return codes are not needed within
i40e_configure_rx_ring(). Remove 'ret' and use 'err' to handle status
from xdp_rxq_info registration.
Fixes: e213ced19b ("i40e: add support for XDP multi-buffer Rx")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-11-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
XSK ZC Rx path calculates the size of data that will be posted to XSK Rx
queue via subtracting xdp_buff::data_end from xdp_buff::data.
In bpf_xdp_frags_increase_tail(), when underlying memory type of
xdp_rxq_info is MEM_TYPE_XSK_BUFF_POOL, add offset to data_end in tail
fragment, so that later on user space will be able to take into account
the amount of bytes added by XDP program.
Fixes: 24ea50127e ("xsk: support mbuf on ZC RX")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-10-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Now that ice driver correctly sets up frag_size in xdp_rxq_info, let us
make it work for ZC multi-buffer as well. ice_rx_ring::rx_buf_len for ZC
is being set via xsk_pool_get_rx_frame_size() and this needs to be
propagated up to xdp_rxq_info.
Use a bigger hammer and instead of unregistering only xdp_rxq_info's
memory model, unregister it altogether and register it again and have
xdp_rxq_info with correct frag_size value.
Fixes: 1bbc04de60 ("ice: xsk: add RX multi-buffer support")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-9-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Ice and i40e ZC drivers currently set offset of a frag within
skb_shared_info to 0, which is incorrect. xdp_buffs that come from
xsk_buff_pool always have 256 bytes of a headroom, so they need to be
taken into account to retrieve xdp_buff::data via skb_frag_address().
Otherwise, bpf_xdp_frags_increase_tail() would be starting its job from
xdp_buff::data_hard_start which would result in overwriting existing
payload.
Fixes: 1c9ba9c146 ("i40e: xsk: add RX multi-buffer support")
Fixes: 1bbc04de60 ("ice: xsk: add RX multi-buffer support")
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-8-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
xdp_rxq_info struct can be registered by drivers via two functions -
xdp_rxq_info_reg() and __xdp_rxq_info_reg(). The latter one allows
drivers that support XDP multi-buffer to set up xdp_rxq_info::frag_size
which in turn will make it possible to grow the packet via
bpf_xdp_adjust_tail() BPF helper.
Currently, ice registers xdp_rxq_info in two spots:
1) ice_setup_rx_ring() // via xdp_rxq_info_reg(), BUG
2) ice_vsi_cfg_rxq() // via __xdp_rxq_info_reg(), OK
Cited commit under fixes tag took care of setting up frag_size and
updated registration scheme in 2) but it did not help as
1) is called before 2) and as shown above it uses old registration
function. This means that 2) sees that xdp_rxq_info is already
registered and never calls __xdp_rxq_info_reg() which leaves us with
xdp_rxq_info::frag_size being set to 0.
To fix this misbehavior, simply remove xdp_rxq_info_reg() call from
ice_setup_rx_ring().
Fixes: 2fba7dc515 ("ice: Add support for XDP multi-buffer on Rx side")
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-7-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
XDP programs can shrink packets by calling the bpf_xdp_adjust_tail()
helper function. For multi-buffer packets this may lead to reduction of
frag count stored in skb_shared_info area of the xdp_buff struct. This
results in issues with the current handling of XDP_PASS and XDP_DROP
cases.
For XDP_PASS, currently skb is being built using frag count of
xdp_buffer before it was processed by XDP prog and thus will result in
an inconsistent skb when frag count gets reduced by XDP prog. To fix
this, get correct frag count while building the skb instead of using
pre-obtained frag count.
For XDP_DROP, current page recycling logic will not reuse the page but
instead will adjust the pagecnt_bias so that the page can be freed. This
again results in inconsistent behavior as the page refcnt has already
been changed by the helper while freeing the frag(s) as part of
shrinking the packet. To fix this, only adjust pagecnt_bias for buffers
that are stillpart of the packet post-xdp prog run.
Fixes: e213ced19b ("i40e: add support for XDP multi-buffer Rx")
Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-6-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fix an OOM panic in XDP_DRV mode when a XDP program shrinks a
multi-buffer packet by 4k bytes and then redirects it to an AF_XDP
socket.
Since support for handling multi-buffer frames was added to XDP, usage
of bpf_xdp_adjust_tail() helper within XDP program can free the page
that given fragment occupies and in turn decrease the fragment count
within skb_shared_info that is embedded in xdp_buff struct. In current
ice driver codebase, it can become problematic when page recycling logic
decides not to reuse the page. In such case, __page_frag_cache_drain()
is used with ice_rx_buf::pagecnt_bias that was not adjusted after
refcount of page was changed by XDP prog which in turn does not drain
the refcount to 0 and page is never freed.
To address this, let us store the count of frags before the XDP program
was executed on Rx ring struct. This will be used to compare with
current frag count from skb_shared_info embedded in xdp_buff. A smaller
value in the latter indicates that XDP prog freed frag(s). Then, for
given delta decrement pagecnt_bias for XDP_DROP verdict.
While at it, let us also handle the EOP frag within
ice_set_rx_bufs_act() to make our life easier, so all of the adjustments
needed to be applied against freed frags are performed in the single
place.
Fixes: 2fba7dc515 ("ice: Add support for XDP multi-buffer on Rx side")
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-5-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
XDP multi-buffer support introduced XDP_FLAGS_HAS_FRAGS flag that is
used by drivers to notify data path whether xdp_buff contains fragments
or not. Data path looks up mentioned flag on first buffer that occupies
the linear part of xdp_buff, so drivers only modify it there. This is
sufficient for SKB and XDP_DRV modes as usually xdp_buff is allocated on
stack or it resides within struct representing driver's queue and
fragments are carried via skb_frag_t structs. IOW, we are dealing with
only one xdp_buff.
ZC mode though relies on list of xdp_buff structs that is carried via
xsk_buff_pool::xskb_list, so ZC data path has to make sure that
fragments do *not* have XDP_FLAGS_HAS_FRAGS set. Otherwise,
xsk_buff_free() could misbehave if it would be executed against xdp_buff
that carries a frag with XDP_FLAGS_HAS_FRAGS flag set. Such scenario can
take place when within supplied XDP program bpf_xdp_adjust_tail() is
used with negative offset that would in turn release the tail fragment
from multi-buffer frame.
Calling xsk_buff_free() on tail fragment with XDP_FLAGS_HAS_FRAGS would
result in releasing all the nodes from xskb_list that were produced by
driver before XDP program execution, which is not what is intended -
only tail fragment should be deleted from xskb_list and then it should
be put onto xsk_buff_pool::free_list. Such multi-buffer frame will never
make it up to user space, so from AF_XDP application POV there would be
no traffic running, however due to free_list getting constantly new
nodes, driver will be able to feed HW Rx queue with recycled buffers.
Bottom line is that instead of traffic being redirected to user space,
it would be continuously dropped.
To fix this, let us clear the mentioned flag on xsk_buff_pool side
during xdp_buff initialization, which is what should have been done
right from the start of XSK multi-buffer support.
Fixes: 1bbc04de60 ("ice: xsk: add RX multi-buffer support")
Fixes: 1c9ba9c146 ("i40e: xsk: add RX multi-buffer support")
Fixes: 24ea50127e ("xsk: support mbuf on ZC RX")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-3-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Breno Leitao says:
====================
Fix MODULE_DESCRIPTION() for net (p2)
There are hundreds of network modules that misses MODULE_DESCRIPTION(),
causing a warnning when compiling with W=1. Example:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/com90io.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/arc-rimi.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/com20020.o
This part2 of the patchset focus on the drivers/net/ethernet drivers.
There are still some missing warnings in drivers/net/ethernet that will
be fixed in an upcoming patchset.
v1: https://lore.kernel.org/all/20240122184543.2501493-2-leitao@debian.org/
====================
Link: https://lore.kernel.org/r/20240123190332.677489-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some (bad) devices can have really terrible discard latency; we don't
want them blocking memory reclaim and causing warnings.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
MSA MISC0 bit 1 to 7 contains Colorimetry Indicator Field.
dp_link_get_colorimetry_config() returns wrong colorimetry value
in the DP_TEST_DYNAMIC_RANGE_CEA case in the current implementation.
Hence fix this problem by having dp_link_get_colorimetry_config()
return defined CEA RGB colorimetry value in the case of
DP_TEST_DYNAMIC_RANGE_CEA.
Changes in V2:
-- drop retrieving colorimetry from colorspace
-- drop dr = link->dp_link.test_video.test_dyn_range assignment
Changes in V3:
-- move defined MISCr0a Colorimetry vale to dp_reg.h
-- rewording commit title
-- rewording commit text to more precise describe this patch
Fixes: c943b4948b ("drm/msm/dp: add displayPort driver support")
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/574888/
Link: https://lore.kernel.org/r/1705526010-597-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Since the value of DP_TEST_BIT_DEPTH_8 is already left shifted, in the
BPC unknown case, the additional shift causes spill over to the other
bits of the [DP_CONFIGURATION_CTRL] register.
Fix this by changing the return value of dp_link_get_test_bits_depth()
in the BPC unknown case to (DP_TEST_BIT_DEPTH_8 >> DP_TEST_BIT_DEPTH_SHIFT).
Fixes: c943b4948b ("drm/msm/dp: add displayPort driver support")
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/573989/
Link: https://lore.kernel.org/r/1704917931-30133-1-git-send-email-quic_khsieh@quicinc.com
[quic_abhinavk@quicinc.com: fix minor checkpatch warning to align with opening braces]
Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
If there is more than 32 cpus the bitmask will start to contain
commas, leading to:
./rps_default_mask.sh: line 36: [: 00000000,00000000: integer expression expected
Remove the commas, bash doesn't interpret leading zeroes as oct
so that should be good enough. Switch to bash, Simon reports that
not all shells support this type of substitution.
Fixes: c12e0d5f26 ("self-tests: introduce self-tests for RPS default mask")
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240122195815.638997-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jann Horn points out that uselib() really shouldn't trigger the new
FMODE_EXEC logic introduced by commit 4759ff71f2 ("exec: __FMODE_EXEC
instead of in_execve for LSMs").
In fact, it shouldn't even have ever triggered the old pre-existing
logic for __FMODE_EXEC (like the NFS code that makes executables not
need read permissions). Unlike a real execve(), that can work even with
files that are purely executable by the user (not readable), uselib()
has that MAY_READ requirement becasue it's really just a convenience
wrapper around mmap() for legacy shared libraries.
The whole FMODE_EXEC bit was originally introduced by commit
b500531e6f ("[PATCH] Introduce FMODE_EXEC file flag"), primarily to
give ETXTBUSY error returns for distributed filesystems.
It has since grown a few other warts (like that NFS thing), but there
really isn't any reason to use it for uselib(), and now that we are
trying to use it to replace the horrid 'tsk->in_execve' flag, it's
actively wrong.
Of course, as Jann Horn also points out, nobody should be enabling
CONFIG_USELIB in the first place in this day and age, but that's a
different discussion entirely.
Reported-by: Jann Horn <jannh@google.com>
Fixes: 4759ff71f2 ("exec: __FMODE_EXEC instead of in_execve for LSMs")
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit b4af096b5d.
New encrypted keys are created either from kernel-generated random
numbers or user-provided decrypted data. Revert the change requiring
user-provided decrypted data.
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
otherwise the synopsys_id value may be read out wrong,
because the GMAC_VERSION register might still be in reset
state, for at least 1 us after the reset is de-asserted.
Add a wait for 10 us before continuing to be on the safe side.
> From what have you got that delay value?
Just try and error, with very old linux versions and old gcc versions
the synopsys_id was read out correctly most of the time (but not always),
with recent linux versions and recnet gcc versions it was read out
wrongly most of the time, but again not always.
I don't have access to the VHDL code in question, so I cannot
tell why it takes so long to get the correct values, I also do not
have more than a few hardware samples, so I cannot tell how long
this timeout must be in worst case.
Experimentally I can tell that the register is read several times
as zero immediately after the reset is de-asserted, also adding several
no-ops is not enough, adding a printk is enough, also udelay(1) seems to
be enough but I tried that not very often, and I have not access to many
hardware samples to be 100% sure about the necessary delay.
And since the udelay here is only executed once per device instance,
it seems acceptable to delay the boot for 10 us.
BTW: my hardware's synopsys id is 0x37.
Fixes: c5e4ddbdfa ("net: stmmac: Add support for optional reset control")
Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://lore.kernel.org/r/AS8P193MB1285A810BD78C111E7F6AA34E4752@AS8P193MB1285.EURP193.PROD.OUTLOOK.COM
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make 'git status' quietly happy again after a full allmodconfig build.
Fixes: 60433a9d03 ("samples: introduce new samples subdir for cgroup")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Several expressions explicitly refer to NF_INET_* hook definitions
from expr->ops->validate, however, family is not validated.
Bail out with EOPNOTSUPP in case they are used from unsupported
families.
Fixes: 0ca743a559 ("netfilter: nf_tables: add compatibility layer for x_tables")
Fixes: a3c90f7a23 ("netfilter: nf_tables: flow offload expression")
Fixes: 2fa841938c ("netfilter: nf_tables: introduce routing expression")
Fixes: 554ced0a6e ("netfilter: nf_tables: add support for native socket matching")
Fixes: ad49d86e07 ("netfilter: nf_tables: Add synproxy support")
Fixes: 4ed8eb6570 ("netfilter: nf_tables: Add native tproxy support")
Fixes: 6c47260250 ("netfilter: nf_tables: add xfrm expression")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This reverts commit e0abdadcc6.
core.c:nf_hook_slow assumes that the upper 16 bits of NF_DROP
verdicts contain a valid errno, i.e. -EPERM, -EHOSTUNREACH or similar,
or 0.
Due to the reverted commit, its possible to provide a positive
value, e.g. NF_ACCEPT (1), which results in use-after-free.
Its not clear to me why this commit was made.
NF_QUEUE is not used by nftables; "queue" rules in nftables
will result in use of "nft_queue" expression.
If we later need to allow specifiying errno values from userspace
(do not know why), this has to call NF_DROP_GETERR and check that
"err <= 0" holds true.
Fixes: e0abdadcc6 ("netfilter: nf_tables: accept QUEUE/DROP verdict parameters")
Cc: stable@vger.kernel.org
Reported-by: Notselwyn <notselwyn@pwning.tech>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nftables has two types of sets/maps, one where userspace defines the
name, and anonymous sets/maps, where userspace defines a template name.
For the latter, kernel requires presence of exactly one "%d".
nftables uses "__set%d" and "__map%d" for this. The kernel will
expand the format specifier and replaces it with the smallest unused
number.
As-is, userspace could define a template name that allows to move
the set name past the 256 bytes upperlimit (post-expansion).
I don't see how this could be a problem, but I would prefer if userspace
cannot do this, so add a limit of 16 bytes for the '%d' template name.
16 bytes is the old total upper limit for set names that existed when
nf_tables was merged initially.
Fixes: 387454901b ("netfilter: nf_tables: Allow set names of up to 255 chars")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reject bogus configs where internal token counter wraps around.
This only occurs with very very large requests, such as 17gbyte/s.
Its better to reject this rather than having incorrect ratelimit.
Fixes: d2168e849e ("netfilter: nft_limit: add per-byte limiting")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Remove netdevice from inet/ingress basechain in case NETDEV_UNREGISTER
event is reported, otherwise a stale reference to netdevice remains in
the hook list.
Fixes: 60a3815da7 ("netfilter: add inet ingress support")
Cc: stable@vger.kernel.org
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
- Correct comments for nlpid, family, udlen and udata in struct nft_table,
and afinfo is no longer a member of enum nft_set_class.
- Add comment for data in struct nft_set_elem.
- Add comment for flags in struct nft_ctx.
- Add comments for timeout in struct nft_set_iter, and flags is not a
member of struct nft_set_iter, remove the comment for it.
- Add comments for commit, abort, estimate and gc_init in struct
nft_set_ops.
- Add comments for pending_update, num_exprs, exprs and catchall_list
in struct nft_set.
- Add comment for ext_len in struct nft_set_ext_tmpl.
- Add comment for inner_ops in struct nft_expr_type.
- Add comments for clone, destroy_clone, reduce, gc, offload,
offload_action, offload_stats in struct nft_expr_ops.
- Add comments for blob_gen_0, blob_gen_1, bound, genmask, udlen, udata,
blob_next in struct nft_chain.
- Add comment for flags in struct nft_base_chain.
- Add comments for udlen, udata in struct nft_object.
- Add comment for type in struct nft_object_ops.
- Add comment for hook_list in struct nft_flowtable, and remove comments
for dev_name and ops which are not members of struct nft_flowtable.
Signed-off-by: George Guo <guodongtai@kylinos.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When the CPU goes idle for the last time during the CPU down hotplug
process, RCU reports a final quiescent state for the current CPU. If
this quiescent state propagates up to the top, some tasks may then be
woken up to complete the grace period: the main grace period kthread
and/or the expedited main workqueue (or kworker).
If those kthreads have a SCHED_FIFO policy, the wake up can indirectly
arm the RT bandwith timer to the local offline CPU. Since this happens
after hrtimers have been migrated at CPUHP_AP_HRTIMERS_DYING stage, the
timer gets ignored. Therefore if the RCU kthreads are waiting for RT
bandwidth to be available, they may never be actually scheduled.
This triggers TREE03 rcutorture hangs:
rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 4-...!: (1 GPs behind) idle=9874/1/0x4000000000000000 softirq=0/0 fqs=20 rcuc=21071 jiffies(starved)
rcu: (t=21035 jiffies g=938281 q=40787 ncpus=6)
rcu: rcu_preempt kthread starved for 20964 jiffies! g938281 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task stack:14896 pid:14 tgid:14 ppid:2 flags:0x00004000
Call Trace:
<TASK>
__schedule+0x2eb/0xa80
schedule+0x1f/0x90
schedule_timeout+0x163/0x270
? __pfx_process_timeout+0x10/0x10
rcu_gp_fqs_loop+0x37c/0x5b0
? __pfx_rcu_gp_kthread+0x10/0x10
rcu_gp_kthread+0x17c/0x200
kthread+0xde/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2b/0x40
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
The situation can't be solved with just unpinning the timer. The hrtimer
infrastructure and the nohz heuristics involved in finding the best
remote target for an unpinned timer would then also need to handle
enqueues from an offline CPU in the most horrendous way.
So fix this on the RCU side instead and defer the wake up to an online
CPU if it's too late for the local one.
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Fixes: 5c0930ccaa ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
Pull fbdev fixes and cleanups from Helge Deller:
"A crash fix in stifb which was missed to be included in the drm-misc
tree, two checks to prevent wrong userspace input in sisfb and
savagefb and two trivial printk cleanups:
- stifb: Fix crash in stifb_blank()
- savage/sis: Error out if pixclock equals zero
- minor trivial cleanups"
* tag 'fbdev-for-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: stifb: Fix crash in stifb_blank()
fbcon: Fix incorrect printed function name in fbcon_prepare_logo()
fbdev: sis: Error out if pixclock equals zero
fbdev: savage: Error out if pixclock equals zero
fbdev: vt8500lcdfb: Remove unnecessary print function dev_err()
Several functions implementing VIDIOC_REQBUFS and _CREATE_BUFS all use
almost the same code to fill in the flags and capability fields. Refactor
this into a new vb2_set_flags_and_caps() function that replaces the old
fill_buf_caps() and validate_memory_flags() functions.
This also fixes a bug where vb2_ioctl_create_bufs() would not set the
V4L2_BUF_CAP_SUPPORTS_MAX_NUM_BUFFERS cap and also not fill in the
max_num_buffers field.
Fixes: d055a76c00 ("media: core: Report the maximum possible number of buffers for the queue")
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Acked-by: Tomasz Figa <tfiga@chromium.org>
Use vb2_get_num_buffers() to avoid using queue num_buffers field directly.
This allows us to change how the number of buffers is computed in the
future.
Fixes: d055a76c00 ("media: core: Report the maximum possible number of buffers for the queue")
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Acked-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
The quirk table entries should be put in the USB ID order, but some
entries have been put in random places. Re-sort them.
Fixes: bf990c1023 ("ALSA: usb-audio: add quirk to fix Hamedal C20 disconnect issue")
Fixes: fd28941cff ("ALSA: usb-audio: Add new quirk FIXED_RATE for JBL Quantum810 Wireless")
Fixes: dfd5fe19db ("ALSA: usb-audio: Add FIXED_RATE quirk for JBL Quantum610 Wireless")
Fixes: 4a63e68a29 ("ALSA: usb-audio: Fix microphone sound on Nexigo webcam.")
Fixes: 7822baa844 ("ALSA: usb-audio: add quirk for RODE NT-USB+")
Fixes: 4fb7c24f69 ("ALSA: usb-audio: Add quirk for Fiero SC-01")
Fixes: 2307a0e1ca ("ALSA: usb-audio: Add quirk for Fiero SC-01 (fw v1.0.0)")
Link: https://lore.kernel.org/r/20240124155307.16996-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Move set_capacity() outside of the section procected by (&d->lock).
To avoid possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
[1] lock(&bdev->bd_size_lock);
local_irq_disable();
[2] lock(&d->lock);
[3] lock(&bdev->bd_size_lock);
<Interrupt>
[4] lock(&d->lock);
*** DEADLOCK ***
Where [1](&bdev->bd_size_lock) hold by zram_add()->set_capacity().
[2]lock(&d->lock) hold by aoeblk_gdalloc(). And aoeblk_gdalloc()
is trying to acquire [3](&bdev->bd_size_lock) at set_capacity() call.
In this situation an attempt to acquire [4]lock(&d->lock) from
aoecmd_cfg_rsp() will lead to deadlock.
So the simplest solution is breaking lock dependency
[2](&d->lock) -> [3](&bdev->bd_size_lock) by moving set_capacity()
outside.
Signed-off-by: Maksim Kiselev <bigunclemax@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240124072436.3745720-2-bigunclemax@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When the block layer doesn't generate/verify metadata, the SG length is
smaller than the transfer length. This is because the SG length doesn't
include the metadata length that is added by the HW on the wire. The
target failes those commands with "Data SGL Length Invalid" by comparing
the transfer length and the SG length. Fix it by adding the metadata
length to the transfer length when there is no metadata SGL. The bug
reproduces when setting read_verify/write_generate configs to 0 at the
child multipath device or at the primary device when NVMe multipath is
disabled.
Note that setting those configs to 0 on the multipath device (ns_head)
doesn't have any impact on the I/Os.
Fixes: 5ec5d3bddc ("nvme-rdma: add metadata/T10-PI support")
Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The RODE NT-USB+ is marketed as a professional usb microphone, however the
usb audio interface is a mess:
[ 1.130977] usb 1-5: new full-speed USB device number 2 using xhci_hcd
[ 1.503906] usb 1-5: config 1 has an invalid interface number: 5 but max is 4
[ 1.503912] usb 1-5: config 1 has no interface number 4
[ 1.519689] usb 1-5: New USB device found, idVendor=19f7, idProduct=0035, bcdDevice= 1.09
[ 1.519695] usb 1-5: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1.519697] usb 1-5: Product: RØDE NT-USB+
[ 1.519699] usb 1-5: Manufacturer: RØDE
[ 1.519700] usb 1-5: SerialNumber: 1D773A1A
[ 8.327495] usb 1-5: 1:1: cannot get freq at ep 0x82
[ 8.344500] usb 1-5: 1:2: cannot get freq at ep 0x82
[ 8.365499] usb 1-5: 2:1: cannot get freq at ep 0x2
Add QUIRK_FLAG_GET_SAMPLE_RATE to work around the broken sample rate get.
I have asked Rode support to fix it, but they show no interest.
Signed-off-by: Sean Young <sean@mess.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240124151524.23314-1-sean@mess.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The test on so_count in nfsd4_release_lockowner() is nonsense and
harmful. Revert to using check_for_locks(), changing that to not sleep.
First: harmful.
As is documented in the kdoc comment for nfsd4_release_lockowner(), the
test on so_count can transiently return a false positive resulting in a
return of NFS4ERR_LOCKS_HELD when in fact no locks are held. This is
clearly a protocol violation and with the Linux NFS client it can cause
incorrect behaviour.
If RELEASE_LOCKOWNER is sent while some other thread is still
processing a LOCK request which failed because, at the time that request
was received, the given owner held a conflicting lock, then the nfsd
thread processing that LOCK request can hold a reference (conflock) to
the lock owner that causes nfsd4_release_lockowner() to return an
incorrect error.
The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it
never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so
it knows that the error is impossible. It assumes the lock owner was in
fact released so it feels free to use the same lock owner identifier in
some later locking request.
When it does reuse a lock owner identifier for which a previous RELEASE
failed, it will naturally use a lock_seqid of zero. However the server,
which didn't release the lock owner, will expect a larger lock_seqid and
so will respond with NFS4ERR_BAD_SEQID.
So clearly it is harmful to allow a false positive, which testing
so_count allows.
The test is nonsense because ... well... it doesn't mean anything.
so_count is the sum of three different counts.
1/ the set of states listed on so_stateids
2/ the set of active vfs locks owned by any of those states
3/ various transient counts such as for conflicting locks.
When it is tested against '2' it is clear that one of these is the
transient reference obtained by find_lockowner_str_locked(). It is not
clear what the other one is expected to be.
In practice, the count is often 2 because there is precisely one state
on so_stateids. If there were more, this would fail.
In my testing I see two circumstances when RELEASE_LOCKOWNER is called.
In one case, CLOSE is called before RELEASE_LOCKOWNER. That results in
all the lock states being removed, and so the lockowner being discarded
(it is removed when there are no more references which usually happens
when the lock state is discarded). When nfsd4_release_lockowner() finds
that the lock owner doesn't exist, it returns success.
The other case shows an so_count of '2' and precisely one state listed
in so_stateid. It appears that the Linux client uses a separate lock
owner for each file resulting in one lock state per lock owner, so this
test on '2' is safe. For another client it might not be safe.
So this patch changes check_for_locks() to use the (newish)
find_any_file_locked() so that it doesn't take a reference on the
nfs4_file and so never calls nfsd_file_put(), and so never sleeps. With
this check is it safe to restore the use of check_for_locks() rather
than testing so_count against the mysterious '2'.
Fixes: ce3c4ad7f4 ("NFSD: Fix possible sleep during nfsd4_release_lockowner()")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org # v6.2+
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Presently ia32 registers stored in ptregs are unconditionally cast to
unsigned int by the ia32 stub. They are then cast to long when passed to
__se_sys*, but will not be sign extended.
This takes the sign of the syscall argument into account in the ia32
stub. It still casts to unsigned int to avoid implementation specific
behavior. However then casts to int or unsigned int as necessary. So that
the following cast to long sign extends the value.
This fixes the io_pgetevents02 LTP test when compiled with -m32. Presently
the systemcall io_pgetevents_time64() unexpectedly accepts -1 for the
maximum number of events.
It doesn't appear other systemcalls with signed arguments are effected
because they all have compat variants defined and wired up.
Fixes: ebeb8c82ff ("syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com>
Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240110130122.3836513-1-nik.borisov@suse.com
Link: https://lore.kernel.org/ltp/20210921130127.24131-1-rpalethorpe@suse.com/
Since the PCI IDs for PVC weren't added to the xe driver, the xe_wa
tests should not try to create a fake PVC device since they can't find
the right PCI ID. Fix bugs when running kunit:
# xe_wa_gt: ASSERTION FAILED at drivers/gpu/drm/xe/tests/xe_wa_test.c:111
Expected ret == 0, but
ret == -19 (0xffffffffffffffed)
[FAILED] PVC (B0)
# xe_wa_gt: ASSERTION FAILED at drivers/gpu/drm/xe/tests/xe_wa_test.c:111
Expected ret == 0, but
ret == -19 (0xffffffffffffffed)
[FAILED] PVC (B1)
# xe_wa_gt: ASSERTION FAILED at drivers/gpu/drm/xe/tests/xe_wa_test.c:111
Expected ret == 0, but
ret == -19 (0xffffffffffffffed)
[FAILED] PVC (C0)
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240123031242.3548724-1-lucas.demarchi@intel.com
(cherry picked from commit ab5ae65fb2)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
readq() is not available in 32bits and i915_gem_object_read_from_page()
is supposed to allow reading arbitrary sizes determined by the `size`
argument. Currently the only caller only passes a size == 8 so the
second problem is not that big. Migrate to calling
memcpy()/memcpy_fromio() to allow possible changes in the display side
and to fix the build on 32b architectures.
v2: Use memcpy/memcpy_fromio directly rather than using iosys-map with
the same size == 8 bytes restriction (Matt Roper)
Fixes: 44e694958b ("drm/xe/display: Implement display support")
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240119001612.2991381-4-lucas.demarchi@intel.com
(cherry picked from commit 406663f777)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
When kcalloc() for ft->g succeeds but kvzalloc() for in fails,
fs_any_create_groups() will free ft->g. However, its caller
fs_any_create_table() will free ft->g again through calling
mlx5e_destroy_flow_table(), which will lead to a double-free.
Fix this by setting ft->g to NULL in fs_any_create_groups().
Fixes: 0f575c20bf ("net/mlx5e: Introduce Flow Steering ANY API")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When `in` allocated by kvzalloc fails, arfs_create_groups will free
ft->g and return an error. However, arfs_create_table, the only caller of
arfs_create_groups, will hold this error and call to
mlx5e_destroy_flow_table, in which the ft->g will be freed again.
Fixes: 1cabe6b096 ("net/mlx5e: Create aRFS flow tables")
Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
XFRM stack doesn't prevent from users to configure replay window
in TX side and strongswan sets replay_window to be 1. It causes
to failures in validation logic when trying to offload the SA.
Replay window is not relevant in TX side and should be ignored.
Fixes: cded6d8012 ("net/mlx5e: Store replay window in XFRM attributes")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
All ConnectX devices have software parsing capability enabled, but it is
more correct to set allow_swp only if capability exists, which for IPsec
means that crypto offload is supported.
Fixes: 2451da081a ("net/mlx5: Unify device IPsec capabilities check")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5 devices have specific constants for choosing the CQ period mode. These
constants do not have to match the constants used by the kernel software
API for DIM period mode selection.
Fixes: cdd04f4d4d ("net/mlx5: Add support to create SQ and CQ for ASO")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Go-To-Vport action on RX is not allowed when the vport is uplink.
In such case, the packet should be dropped.
Fixes: 9db810ed2d ("net/mlx5: DR, Expose steering action functionality")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When FW provides ICM addresses for drop RX/TX, the provided capability
is 64 bits that contain its GVMI as well as the ICM address itself.
In case of TX DROP this GVMI is different from the GVMI that the
domain is operating on.
This patch fixes the action to use these GVMI IDs, as provided by FW.
Fixes: 9db810ed2d ("net/mlx5: DR, Expose steering action functionality")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
To enable multicast packets which are offloaded in bridge multicast
offload mode to be sent also to uplink, FTE bit uplink_hairpin_en should
be set. Add this bit to FTE for the bridge multicast offload rules.
Fixes: 18c2916cee ("net/mlx5: Bridge, snoop igmp/mld packets")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The processing of traffic in hairpin queues occurs in HW/FW and does not
involve the cpus, hence the upper bound on max num channels does not
apply to them. Using this bound for the hairpin RQT max_table_size is
wrong. It could be too small, and cause the error below [1]. As the
RQT size provided on init does not get modified later, use the same
value for both actual and max table sizes.
[1]
mlx5_core 0000:08:00.1: mlx5_cmd_out_err:805:(pid 1200): CREATE_RQT(0x916) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x538faf), err(-22)
Fixes: 74a8dadac1 ("net/mlx5e: Preparations for supporting larger number of channels")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Indirection (*) is of lower precedence than postfix increment (++). Logic
in napi_poll context would cause an out-of-bound read by first increment
the pointer address by byte address space and then dereference the value.
Rather, the intended logic was to dereference first and then increment the
underlying value.
Fixes: 92214be597 ("net/mlx5e: Update doorbell for port timestamping CQ before the software counter")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The sd_group field moved in the HW spec from the MPIR register
to the vport context.
Align the query accordingly.
Fixes: f5e9563299 ("net/mlx5: Expose Management PCIe Index Register (MPIR)")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commit moved the code of mlx5e_create_tises() and changed the
loop to create TISes over MLX5_MAX_PORTS constant value, instead of
getting the correct lag ports supported by the device, which can cause
FW errors on devices with less than MLX5_MAX_PORTS ports.
Change that back to mlx5e_get_num_lag_ports(mdev).
Also IPoIB interfaces create there own TISes, they don't use the eth
TISes, pass a flag to indicate that.
This fixes the following errors that might appear in kernel log:
mlx5_cmd_out_err:808:(pid 650): CREATE_TIS(0x912) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x595b5d), err(-22)
mlx5e_create_mdev_resources:174:(pid 650): alloc tises failed, -22
Fixes: b25bd37c85 ("net/mlx5: Move TISes from priv to mdev HW resources")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The ASM1166 SATA host controller always reports wrongly,
that it has 32 ports. But in reality, it only has six ports.
This seems to be a hardware issue, as all tested ASM1166
SATA host controllers reports such high count of ports.
Example output: ahci 0000:09:00.0: AHCI 0001.0301
32 slots 32 ports 6 Gbps 0xffffff3f impl SATA mode.
By adjusting the port_map, the count is limited to six ports.
New output: ahci 0000:09:00.0: AHCI 0001.0301
32 slots 32 ports 6 Gbps 0x3f impl SATA mode.
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=211873
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218346
Signed-off-by: Conrad Kostecki <conikost@gentoo.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
According to the datasheet(Table 3-2: Port configuration) the serdes 2
(SD2) can be configured to run QSGMII or SGMII mode. Already the QSGMII
mode is supported in the serdes_muxes list but was missing the SGMII mode.
In this mode the serdes is connected to the port 4.
Therefore add this entry in the list.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20240108205140.1701770-1-horatiu.vultur@microchip.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Commit 2be22aae6b ("phy: qcom-qmp-usb: populate offsets configuration")
introduced register offsets to the driver but for ipq8074/ipq6018 they do
not match what was in the old style device tree. Example from old
ipq6018.dtsi:
<0x00078200 0x130>, /* Tx */
<0x00078400 0x200>, /* Rx */
<0x00078800 0x1f8>, /* PCS */
<0x00078600 0x044>; /* PCS misc */
which would translate to:
{.., .pcs = 0x800, .pcs_misc = 0x600, .tx = 0x200, .rx = 0x400 }
but was translated to:
{.., .pcs = 0x600, .tx = 0x200, .rx = 0x400 }
So split usb_offsets and fix USB initialization for IPQ8074 and IPQ6018.
Tested only on IPQ6018
Fixes: 2be22aae6b ("phy: qcom-qmp-usb: populate offsets configuration")
Signed-off-by: Mantas Pucka <mantas@8devices.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/1706026160-17520-2-git-send-email-mantas@8devices.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Current code uses the specified ring buffer size (either the default of 128
Kbytes or a module parameter specified value) to encompass the one page
ring buffer header plus the actual ring itself. When the page size is 4K,
carving off one page for the header isn't significant. But when the page
size is 64K on ARM64, only half of the default 128 Kbytes is left for the
actual ring. While this doesn't break anything, the smaller ring size
could be a performance bottleneck.
Fix this by applying the VMBUS_RING_SIZE macro to the specified ring buffer
size. This macro adds a page for the header, and rounds up the size to a
page boundary, using the page size for which the kernel is built. Use this
new size for subsequent ring buffer calculations. For example, on ARM64
with 64K page size and the default ring size, this results in the actual
ring being 128 Kbytes, which is intended.
Cc: stable@vger.kernel.org # 5.15.x
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20240122170956.496436-1-mhklinux@outlook.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A recent change moved the code that decides to skip
a channel or disable multichannel entirely, into a
helper function.
During this, a mutex_unlock of the session_mutex
should have been removed. Doing that here.
Fixes: f591062bdb ("cifs: handle servers that still advertise multichannel after disabling")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Similar to the rest of the commands, this is a change
to add replay flags on retry. This one does not add a
back-off, considering that we may want to flush a write
ASAP to the server. Considering that this will be a
flush of cached pages, the retrans value is also not
honoured.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
MS-SMB2 states that the header flag SMB2_FLAGS_REPLAY_OPERATION
needs to be set when a command needs to be retried, so that
the server is aware that this is a replay for an operation that
appeared before.
This can be very important, for example, for state changing
operations and opens which get retried following a reconnect;
since the client maybe unaware of the status of the previous
open.
This is particularly important for multichannel scenario, since
disconnection of one connection does not mean that the session
is lost. The requests can be replayed on another channel.
This change also makes use of exponential back-off before replays
and also limits the number of retries to "retrans" mount option
value.
Also, this change does not modify the read/write codepath.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The code to check for replay is not just -EAGAIN. In some
cases, the send request or receive response may result in
network errors, which we're now mapping to -ECONNABORTED.
This change introduces a helper function which checks
if the error returned in one of the above two errors.
And all checks for replays will now use this helper.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When the network stack returns various errors, we today bubble
up the error to the user (in case of soft mounts).
This change translates all network errors except -EINTR and
-EAGAIN to -ECONNABORTED. A similar approach is taken when
we receive network errors when reading from the socket.
The change also forces the cifsd thread to reconnect during
it's next activity.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Inside scsi_eh_wakeup(), scsi_host_busy() is called & checked with host
lock every time for deciding if error handler kthread needs to be waken up.
This can be too heavy in case of recovery, such as:
- N hardware queues
- queue depth is M for each hardware queue
- each scsi_host_busy() iterates over (N * M) tag/requests
If recovery is triggered in case that all requests are in-flight, each
scsi_eh_wakeup() is strictly serialized, when scsi_eh_wakeup() is called
for the last in-flight request, scsi_host_busy() has been run for (N * M -
1) times, and request has been iterated for (N*M - 1) * (N * M) times.
If both N and M are big enough, hard lockup can be triggered on acquiring
host lock, and it is observed on mpi3mr(128 hw queues, queue depth 8169).
Fix the issue by calling scsi_host_busy() outside the host lock. We don't
need the host lock for getting busy count because host the lock never
covers that.
[mkp: Drop unnecessary 'busy' variables pointed out by Bart]
Cc: Ewan Milne <emilne@redhat.com>
Fixes: 6eb045e092 ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240112070000.4161982-1-ming.lei@redhat.com
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com>
Tested-by: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When a qdisc is deleted from a net device the stack instructs the
underlying driver to remove its flow offload callback from the
associated filter block using the 'FLOW_BLOCK_UNBIND' command. The stack
then continues to replay the removal of the filters in the block for
this driver by iterating over the chains in the block and invoking the
'reoffload' operation of the classifier being used. In turn, the
classifier in its 'reoffload' operation prepares and emits a
'FLOW_CLS_DESTROY' command for each filter.
However, the stack does not do the same for chain templates and the
underlying driver never receives a 'FLOW_CLS_TMPLT_DESTROY' command when
a qdisc is deleted. This results in a memory leak [1] which can be
reproduced using [2].
Fix by introducing a 'tmplt_reoffload' operation and have the stack
invoke it with the appropriate arguments as part of the replay.
Implement the operation in the sole classifier that supports chain
templates (flower) by emitting the 'FLOW_CLS_TMPLT_{CREATE,DESTROY}'
command based on whether a flow offload callback is being bound to a
filter block or being unbound from one.
As far as I can tell, the issue happens since cited commit which
reordered tcf_block_offload_unbind() before tcf_block_flush_all_chains()
in __tcf_block_put(). The order cannot be reversed as the filter block
is expected to be freed after flushing all the chains.
[1]
unreferenced object 0xffff888107e28800 (size 2048):
comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s)
hex dump (first 32 bytes):
b1 a6 7c 11 81 88 ff ff e0 5b b3 10 81 88 ff ff ..|......[......
01 00 00 00 00 00 00 00 e0 aa b0 84 ff ff ff ff ................
backtrace:
[<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320
[<ffffffff81ab374e>] __kmalloc+0x4e/0x90
[<ffffffff832aec6d>] mlxsw_sp_acl_ruleset_get+0x34d/0x7a0
[<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180
[<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280
[<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340
[<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0
[<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170
[<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0
[<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440
[<ffffffff83ac6270>] netlink_unicast+0x540/0x820
[<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0
[<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80
[<ffffffff8379d29a>] ___sys_sendmsg+0x13a/0x1e0
[<ffffffff8379d50c>] __sys_sendmsg+0x11c/0x1f0
[<ffffffff843b9ce0>] do_syscall_64+0x40/0xe0
unreferenced object 0xffff88816d2c0400 (size 1024):
comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s)
hex dump (first 32 bytes):
40 00 00 00 00 00 00 00 57 f6 38 be 00 00 00 00 @.......W.8.....
10 04 2c 6d 81 88 ff ff 10 04 2c 6d 81 88 ff ff ..,m......,m....
backtrace:
[<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320
[<ffffffff81ab36c1>] __kmalloc_node+0x51/0x90
[<ffffffff81a8ed96>] kvmalloc_node+0xa6/0x1f0
[<ffffffff82827d03>] bucket_table_alloc.isra.0+0x83/0x460
[<ffffffff82828d2b>] rhashtable_init+0x43b/0x7c0
[<ffffffff832aed48>] mlxsw_sp_acl_ruleset_get+0x428/0x7a0
[<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180
[<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280
[<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340
[<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0
[<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170
[<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0
[<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440
[<ffffffff83ac6270>] netlink_unicast+0x540/0x820
[<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0
[<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80
[2]
# tc qdisc add dev swp1 clsact
# tc chain add dev swp1 ingress proto ip chain 1 flower dst_ip 0.0.0.0/32
# tc qdisc del dev swp1 clsact
# devlink dev reload pci/0000:06:00.0
Fixes: bbf73830cd ("net: sched: traverse chains in block with tcf_get_next_chain()")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current code in netvsc_drv_init() incorrectly assumes that PAGE_SIZE
is 4 Kbytes, which is wrong on ARM64 with 16K or 64K page size. As a
result, the default VMBus ring buffer size on ARM64 with 64K page size
is 8 Mbytes instead of the expected 512 Kbytes. While this doesn't break
anything, a typical VM with 8 vCPUs and 8 netvsc channels wastes 120
Mbytes (8 channels * 2 ring buffers/channel * 7.5 Mbytes/ring buffer).
Unfortunately, the module parameter specifying the ring buffer size
is in units of 4 Kbyte pages. Ideally, it should be in units that
are independent of PAGE_SIZE, but backwards compatibility prevents
changing that now.
Fix this by having netvsc_drv_init() hardcode 4096 instead of using
PAGE_SIZE when calculating the ring buffer size in bytes. Also
use the VMBUS_RING_SIZE macro to ensure proper alignment when running
with page size larger than 4K.
Cc: <stable@vger.kernel.org> # 5.15.x
Fixes: 7aff79e297 ("Drivers: hv: Enable Hyper-V code to be built on ARM64")
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20240122162028.348885-1-mhklinux@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
cifs_pick_channel today just selects a channel based
on the policy of least loaded channel. However, it
does not take into account if the channel needs
reconnect. As a result, we can have failures in send
that can be completely avoided.
This change doesn't make a channel a candidate for
this selection if it needs reconnect.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Recent versions of Clang gets confused about the possible size of the
"user" allocation, and CONFIG_FORTIFY_SOURCE ends up emitting a
warning[1]:
repro.c:126:4: warning: call to '__write_overflow_field' declared with 'warning' attribute: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
126 | __write_overflow_field(p_size_field, size);
| ^
for this memset():
int len;
__le16 *user;
...
len = ses->user_name ? strlen(ses->user_name) : 0;
user = kmalloc(2 + (len * 2), GFP_KERNEL);
...
if (len) {
...
} else {
memset(user, '\0', 2);
}
While Clang works on this bug[2], switch to using a direct assignment,
which avoids memset() entirely which both simplifies the code and silences
the false positive warning. (Making "len" size_t also silences the
warning, but the direct assignment seems better.)
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://github.com/ClangBuiltLinux/linux/issues/1966 [1]
Link: https://github.com/llvm/llvm-project/issues/77813 [2]
Cc: Steve French <sfrench@samba.org>
Cc: Paulo Alcantara <pc@manguebit.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Shyam Prasad N <sprasad@microsoft.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: llvm@lists.linux.dev
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull tracing and eventfs fixes from Steven Rostedt:
- Fix histogram tracing_map insertion.
The tracing_map_insert copies the value into the elt variable and
then assigns the elt to the entry value. But it is possible that the
entry value becomes visible on other CPUs before the elt is fully
initialized. This is fixed by adding a wmb() between the
initialization of the elt variable and assigning it.
- Have eventfs directory have unique inode numbers.
Having them be all the same proved to be a failure as the 'find'
application will think that the directories are causing loops, as it
checks for directory loops via their inodes. Have the evenfs dir
entries get their inodes assigned when they are referenced and then
save them in the eventfs_inode structure.
* tag 'trace-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Save directory inodes in the eventfs_inode structure
tracing: Ensure visibility when inserting an element into tracing_map
We need to correct some aspects of the IORING_OP_FIXED_FD_INSTALL
command to take into account the security implications of making an
io_uring-private file descriptor generally accessible to a userspace
task.
The first change in this patch is to enable auditing of the FD_INSTALL
operation as installing a file descriptor into a task's file descriptor
table is a security relevant operation and something that admins/users
may want to audit.
The second change is to disable the io_uring credential override
functionality, also known as io_uring "personalities", in the
FD_INSTALL command. The credential override in FD_INSTALL is
particularly problematic as it affects the credentials used in the
security_file_receive() LSM hook. If a task were to request a
credential override via REQ_F_CREDS on a FD_INSTALL operation, the LSM
would incorrectly check to see if the overridden credentials of the
io_uring were able to "receive" the file as opposed to the task's
credentials. After discussions upstream, it's difficult to imagine a
use case where we would want to allow a credential override on a
FD_INSTALL operation so we are simply going to block REQ_F_CREDS on
IORING_OP_FIXED_FD_INSTALL operations.
Fixes: dc18b89ab1 ("io_uring/openclose: add support for IORING_OP_FIXED_FD_INSTALL")
Signed-off-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20240123215501.289566-2-paul@paul-moore.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We encountered a kernel crash triggered by the bpf_tcp_ca testcase as
show below:
Unable to handle kernel paging request at virtual address ff60000088554500
Oops [#1]
...
CPU: 3 PID: 458 Comm: test_progs Tainted: G OE 6.8.0-rc1-kselftest_plain #1
Hardware name: riscv-virtio,qemu (DT)
epc : 0xff60000088554500
ra : tcp_ack+0x288/0x1232
epc : ff60000088554500 ra : ffffffff80cc7166 sp : ff2000000117ba50
gp : ffffffff82587b60 tp : ff60000087be0040 t0 : ff60000088554500
t1 : ffffffff801ed24e t2 : 0000000000000000 s0 : ff2000000117bbc0
s1 : 0000000000000500 a0 : ff20000000691000 a1 : 0000000000000018
a2 : 0000000000000001 a3 : ff60000087be03a0 a4 : 0000000000000000
a5 : 0000000000000000 a6 : 0000000000000021 a7 : ffffffff8263f880
s2 : 000000004ac3c13b s3 : 000000004ac3c13a s4 : 0000000000008200
s5 : 0000000000000001 s6 : 0000000000000104 s7 : ff2000000117bb00
s8 : ff600000885544c0 s9 : 0000000000000000 s10: ff60000086ff0b80
s11: 000055557983a9c0 t3 : 0000000000000000 t4 : 000000000000ffc4
t5 : ffffffff8154f170 t6 : 0000000000000030
status: 0000000200000120 badaddr: ff60000088554500 cause: 000000000000000c
Code: c796 67d7 0000 0000 0052 0002 c13b 4ac3 0000 0000 (0001) 0000
---[ end trace 0000000000000000 ]---
The reason is that commit 2cd3e3772e ("x86/cfi,bpf: Fix bpf_struct_ops
CFI") changes the func_addr of arch_prepare_bpf_trampoline in struct_ops
from NULL to non-NULL, while we use func_addr on RV64 to differentiate
between struct_ops and regular trampoline. When the struct_ops testcase
is triggered, it emits wrong prologue and epilogue, and lead to
unpredictable issues. After commit 2cd3e3772e, we can use
BPF_TRAMP_F_INDIRECT to distinguish them as it always be set in
struct_ops.
Fixes: 2cd3e3772e ("x86/cfi,bpf: Fix bpf_struct_ops CFI")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20240123023207.1917284-1-pulehui@huaweicloud.com
Commit 1db9d06aaa55 ("mm/slab: remove CONFIG_SLAB from all Kconfig and
Makefile") removes the config SLAB and makes the SLUB allocator the only
default allocator in the kernel. Hence, the advice on reducing OS jitter
due to kworker kernel threads to build with CONFIG_SLUB instead of
CONFIG_SLAB is obsolete.
Remove the obsolete advice to build with SLUB instead of SLAB.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20231130095515.21586-1-lukas.bulwahn@gmail.com
Merge cpupower utility update for 6.8-rc2 from Shuah Khan:
"This cpupower fixes update for Linux 6.8-rc2 consists of one single fix
to an issue where CFLAGS is passed as a make argument for cpupower, but
bench makefile does not inherit and append to them."
This fixes the problem so the user specified CFLAGS are honored.
* tag 'linux-cpupower-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
tools cpupower bench: Override CFLAGS assignments
Kernel has its own official true/false definitions.
The defines aren't even used in this file.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Kalle Valo says:
====================
wireless fixes for v6.8-rc2
The most visible fix here is the ath11k crash fix which was introduced
in v6.7. We also have a fix for iwlwifi memory corruption and few
smaller fixes in the stack.
* tag 'wireless-2024-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: fix race condition on enabling fast-xmit
wifi: iwlwifi: fix a memory corruption
wifi: mac80211: fix potential sta-link leak
wifi: cfg80211/mac80211: remove dependency on non-existing option
wifi: cfg80211: fix missing interfaces when dumping
wifi: ath11k: rely on mac80211 debugfs handling for vif
wifi: p54: fix GCC format truncation warning with wiphy->fw_version
====================
Link: https://lore.kernel.org/r/20240122153434.E0254C433C7@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The host and target use two definition of aer type, unify
them into a single one.
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Return IRQ_NONE from the interrupt handler when no interrupt was
detected. Because an empty interrupt will cause a null pointer error:
Unable to handle kernel NULL pointer dereference at virtual
address 0000000000000008
Call trace:
complete+0x54/0x100
hisi_sfc_v3xx_isr+0x2c/0x40 [spi_hisi_sfc_v3xx]
__handle_irq_event_percpu+0x64/0x1e0
handle_irq_event+0x7c/0x1cc
Signed-off-by: Devyn Liu <liudingyuan@huawei.com>
Link: https://msgid.link/r/20240123071149.917678-1-liudingyuan@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
We can't use devm_platform_ioremap_resource_byname() to remap the
interrupt register that can be shared between
regulator-abb-{ivahd,dspeve,gpu} drivers instances.
The combined helper introduce a call to devm_request_mem_region() that
creates a new busy resource region on PRM_IRQSTATUS_MPU register
(0x4ae06010). The first devm_request_mem_region() call succeeds for
regulator-abb-ivahd but fails for the two other regulator-abb-dspeve
and regulator-abb-gpu.
# cat /proc/iomem | grep -i 4ae06
4ae06010-4ae06013 : 4ae07e34.regulator-abb-ivahd int-address
4ae06014-4ae06017 : 4ae07ddc.regulator-abb-mpu int-address
regulator-abb-dspeve and regulator-abb-gpu are missing due to
devm_request_mem_region() failure (EBUSY):
[ 1.326660] ti_abb 4ae07e30.regulator-abb-dspeve: can't request region for resource [mem 0x4ae06010-0x4ae06013]
[ 1.326660] ti_abb: probe of 4ae07e30.regulator-abb-dspeve failed with error -16
[ 1.327239] ti_abb 4ae07de4.regulator-abb-gpu: can't request region for resource [mem 0x4ae06010-0x4ae06013]
[ 1.327270] ti_abb: probe of 4ae07de4.regulator-abb-gpu failed with error -16
>From arm/boot/dts/dra7.dtsi:
The abb_mpu is the only instance using its own interrupt register:
(0x4ae06014) PRM_IRQSTATUS_MPU_2, ABB_MPU_DONE_ST (bit 7)
The other tree instances (abb_ivahd, abb_dspeve, abb_gpu) share
PRM_IRQSTATUS_MPU register (0x4ae06010) but use different bits
ABB_IVA_DONE_ST (bit 30), ABB_DSPEVE_DONE_ST( bit 29) and
ABB_GPU_DONE_ST (but 28).
The commit b36c6b1887 ("regulator: ti-abb: Make use of the helper
function devm_ioremap related") overlooked the following comment
implicitly explaining why devm_ioremap() is used in this case:
/*
* We may have shared interrupt register offsets which are
* write-1-to-clear between domains ensuring exclusivity.
*/
Fixes and partially reverts commit b36c6b1887 ("regulator: ti-abb:
Make use of the helper function devm_ioremap related").
Improve the existing comment to avoid further conversion to
devm_platform_ioremap_resource_byname().
Fixes: b36c6b1887 ("regulator: ti-abb: Make use of the helper function devm_ioremap related")
Signed-off-by: Romain Naour <romain.naour@skf.com>
Reviewed-by: Yoann Congal <yoann.congal@smile.fr>
Link: https://msgid.link/r/20240123111456.739381-1-romain.naour@smile.fr
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull netfs fixes from David Howells:
* 'netfs-fixes' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix missing/incorrect unlocking of RCU read lock
afs: Remove afs_dynroot_d_revalidate() as it is redundant
afs: Fix error handling with lookup via FS.InlineBulkStatus
afs: Hide silly-rename files from userspace
cachefiles, erofs: Fix NULL deref in when cachefiles is not doing ondemand-mode
netfs: Fix a NULL vs IS_ERR() check in netfs_perform_write()
netfs, fscache: Prevent Oops in fscache_put_cache()
cifs: Don't use certain unnecessary folio_*() functions
afs: Don't use certain unnecessary folio_*() functions
netfs: Don't use certain unnecessary folio_*() functions
Signed-off-by: Christian Brauner <brauner@kernel.org>
The eventfs inodes and directories are allocated when referenced. But this
leaves the issue of keeping consistent inode numbers and the number is
only saved in the inode structure itself. When the inode is no longer
referenced, it can be freed. When the file that the inode was representing
is referenced again, the inode is once again created, but the inode number
needs to be the same as it was before.
Just making the inode numbers the same for all files is fine, but that
does not work with directories. The find command will check for loops via
the inode number and having the same inode number for directories triggers:
# find /sys/kernel/tracing
find: File system loop detected;
'/sys/kernel/debug/tracing/events/initcall/initcall_finish' is part of the same file system loop as
'/sys/kernel/debug/tracing/events/initcall'.
[..]
Linus pointed out that the eventfs_inode structure ends with a single
32bit int, and on 64 bit machines, there's likely a 4 byte hole due to
alignment. We can use this hole to store the inode number for the
eventfs_inode. All directories in eventfs are represented by an
eventfs_inode and that data structure can hold its inode number.
That last int was also purposely placed at the end of the structure to
prevent holes from within. Now that there's a 4 byte number to hold the
inode, both the inode number and the last integer can be moved up in the
structure for better cache locality, where the llist and rcu fields can be
moved to the end as they are only used when the eventfs_inode is being
deleted.
Link: https://lore.kernel.org/all/CAMuHMdXKiorg-jiuKoZpfZyDJ3Ynrfb8=X+c7x0Eewxn-YRdCA@mail.gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240122152748.46897388@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Fixes: 53c41052ba ("eventfs: Have the inodes all for files and directories all be the same")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Since commit 086b957cc1 ("ALSA: usb-audio: Skip the clock selector
inquiry for single connections") we are already skipping clock selector
inquiry if only one clock source is connected, but we are still sending
a set request. Lets skip that too.
This should fix errors when setting a sample rate on devices that don't
have any controls present within the clock selector. An example of such
device is the new revision of MOTU M Series (07fd:000b):
AudioControl Interface Descriptor:
bLength 8
bDescriptorType 36
bDescriptorSubtype 11 (CLOCK_SELECTOR)
bClockID 1
bNrInPins 1
baCSourceID(0) 2
bmControls 0x00
iClockSelector 0
Perhaps we also should check if clock selectors are readable and writeable
like we already do for clock sources, but this is out of scope of this
patch.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217601
Signed-off-by: Alexander Tsoy <alexander@tsoy.me>
Link: https://lore.kernel.org/r/20240123134635.54026-1-alexander@tsoy.me
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The EFI stub makefile contains logic to ensure that the objects that
make up the stub do not contain relocations that require runtime fixups
(typically to account for the runtime load address of the executable)
On RISC-V, we also avoid GP based relocations, as they require that GP
is assigned the correct base in the startup code, which is not
implemented in the EFI stub.
So add these relocation types to the grep expression that is used to
carry out this check.
Link: https://lkml.kernel.org/r/42c63cb9-87d0-49db-9af8-95771b186684%40siemens.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The cflags for the RISC-V efistub were missing -mno-relax, thus were
under the risk that the compiler could use GP-relative addressing. That
happened for _edata with binutils-2.41 and kernel 6.1, causing the
relocation to fail due to an invalid kernel_size in handle_kernel_image.
It was not yet observed with newer versions, but that may just be luck.
Cc: <stable@vger.kernel.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
In the existing implementation, when executing interleaved write and read
operations in the ISR for a transfer length greater than the FIFO size,
the TXFIFO write precedes the RXFIFO read. Consequently, the initially
received data in the RXFIFO is pushed out and lost, leading to a failure
in data integrity. To address this issue, reverse the order of interleaved
operations and conduct the RXFIFO read followed by the TXFIFO write.
Fixes: 6afe2ae8dc ("spi: spi-cadence: Interleave write of TX and read of RX FIFO")
Signed-off-by: Amit Kumar Mahapatra <amit.kumar-mahapatra@amd.com>
Link: https://msgid.link/r/20231218090652.18403-1-amit.kumar-mahapatra@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
If the power domains are registered first with genpd and *after that*
the driver attempts to power them on in the probe sequence, then it is
possible that a race condition occurs if genpd tries to power them on
in the same time.
The same is valid for powering them off before unregistering them
from genpd.
Attempt to fix race conditions by first removing the domains from genpd
and *after that* powering down domains.
Also first power up the domains and *after that* register them
to genpd.
Fixes: 59b644b01c ("soc: mediatek: Add MediaTek SCPSYS power domains")
Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231225133615.78993-1-eugen.hristev@collabora.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
An opaque directory cannot have xwhiteouts, so instead of marking an
xwhiteouts directory with a new xattr, overload overlay.opaque xattr
for marking both opaque dir ('y') and xwhiteouts dir ('x').
This is more efficient as the overlay.opaque xattr is checked during
lookup of directory anyway.
This also prevents unnecessary checking the xattr when reading a
directory without xwhiteouts, i.e. most of the time.
Note that the xwhiteouts marker is not checked on the upper layer and
on the last layer in lowerstack, where xwhiteouts are not expected.
Fixes: bc8df7a3dc ("ovl: Add an alternative type of whiteout")
Cc: <stable@vger.kernel.org> # v6.7
Reviewed-by: Alexander Larsson <alexl@redhat.com>
Tested-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
There's a Cirque touchpad that wakes system up without anything touched
the touchpad. The input report is empty when this happens.
The reason is stated in HID over I2C spec, 7.2.8.2:
"If the DEVICE wishes to wake the HOST from its low power state, it can
issue a wake by asserting the interrupt."
This is fine if OS can put system back to suspend by identifying input
wakeup count stays the same on resume, like Chrome OS Dark Resume [0].
But for regular distro such policy is lacking.
Though the change doesn't bring any impact on power consumption for
touchpad is minimal, other i2c-hid device may depends on SLEEP control
power. So use a quirk to limit the change scope.
[0] https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/power_manager/docs/dark_resume.md
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
devm_kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure. Ensure the allocation was successful
by checking the pointer validity.
[jkosina@suse.com: tweak changelog a bit]
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
I analyze the potential sleeping issue of the following processes:
Thread A Thread B
... netlink_create //ref = 1
do_mq_notify ...
sock = netlink_getsockbyfilp ... //ref = 2
info->notify_sock = sock; ...
... netlink_sendmsg
... skb = netlink_alloc_large_skb //skb->head is vmalloced
... netlink_unicast
... sk = netlink_getsockbyportid //ref = 3
... netlink_sendskb
... __netlink_sendskb
... skb_queue_tail //put skb to sk_receive_queue
... sock_put //ref = 2
... ...
... netlink_release
... deferred_put_nlk_sk //ref = 1
mqueue_flush_file
spin_lock
remove_notification
netlink_sendskb
sock_put //ref = 0
sk_free
...
__sk_destruct
netlink_sock_destruct
skb_queue_purge //get skb from sk_receive_queue
...
__skb_queue_purge_reason
kfree_skb_reason
__kfree_skb
...
skb_release_all
skb_release_head_state
netlink_skb_destructor
vfree(skb->head) //sleeping while holding spinlock
In netlink_sendmsg, if the memory pointed to by skb->head is allocated by
vmalloc, and is put to sk_receive_queue queue, also the skb is not freed.
When the mqueue executes flush, the sleeping bug will occur. Use
vfree_atomic instead of vfree in netlink_skb_destructor to solve the issue.
Fixes: c05cdb1b86 ("netlink: allow large data transfers from user-space")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20240122011807.2110357-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub reported that ASSERT_EQ(cpu, i) in so_incoming_cpu.c seems to
fire somewhat randomly.
# # RUN so_incoming_cpu.before_reuseport.test3 ...
# # so_incoming_cpu.c:191:test3:Expected cpu (32) == i (0)
# # test3: Test terminated by assertion
# # FAIL so_incoming_cpu.before_reuseport.test3
# not ok 3 so_incoming_cpu.before_reuseport.test3
When the test failed, not-yet-accepted CLOSE_WAIT sockets received
SYN with a "challenging" SEQ number, which was sent from an unexpected
CPU that did not create the receiver.
The test basically does:
1. for each cpu:
1-1. create a server
1-2. set SO_INCOMING_CPU
2. for each cpu:
2-1. set cpu affinity
2-2. create some clients
2-3. let clients connect() to the server on the same cpu
2-4. close() clients
3. for each server:
3-1. accept() all child sockets
3-2. check if all children have the same SO_INCOMING_CPU with the server
The root cause was the close() in 2-4. and net.ipv4.tcp_tw_reuse.
In a loop of 2., close() changed the client state to FIN_WAIT_2, and
the peer transitioned to CLOSE_WAIT.
In another loop of 2., connect() happened to select the same port of
the FIN_WAIT_2 socket, and it was reused as the default value of
net.ipv4.tcp_tw_reuse is 2.
As a result, the new client sent SYN to the CLOSE_WAIT socket from
a different CPU, and the receiver's sk_incoming_cpu was overwritten
with unexpected CPU ID.
Also, the SYN had a different SEQ number, so the CLOSE_WAIT socket
responded with Challenge ACK. The new client properly returned RST
and effectively killed the CLOSE_WAIT socket.
This way, all clients were created successfully, but the error was
detected later by 3-2., ASSERT_EQ(cpu, i).
To avoid the failure, let's make sure that (i) the number of clients
is less than the number of available ports and (ii) such reuse never
happens.
Fixes: 6df96146b2 ("selftest: Add test for SO_INCOMING_CPU.")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Tested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240120031642.67014-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
On CPUs with weak memory models, reads and updates performed by tcp_push
to the sk variables can get reordered leaving the socket throttled when
it should not. The tasklet running tcp_wfree() may also not observe the
memory updates in time and will skip flushing any packets throttled by
tcp_push(), delaying the sending. This can pathologically cause 40ms
extra latency due to bad interactions with delayed acks.
Adding a memory barrier in tcp_push removes the bug, similarly to the
previous commit bf06200e73 ("tcp: tsq: fix nonagle handling").
smp_mb__after_atomic() is used to not incur in unnecessary overhead
on x86 since not affected.
Patch has been tested using an AWS c7g.2xlarge instance with Ubuntu
22.04 and Apache Tomcat 9.0.83 running the basic servlet below:
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
public class HelloWorldServlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response)
throws ServletException, IOException {
response.setContentType("text/html;charset=utf-8");
OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(),"UTF-8");
String s = "a".repeat(3096);
osw.write(s,0,s.length());
osw.flush();
}
}
Load was applied using wrk2 (https://github.com/kinvolk/wrk2) from an AWS
c6i.8xlarge instance. Before the patch an additional 40ms latency from P99.99+
values is observed while, with the patch, the extra latency disappears.
No patch and tcp_autocorking=1
./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.60.173:8080/hello/hello
...
50.000% 0.91ms
75.000% 1.13ms
90.000% 1.46ms
99.000% 1.74ms
99.900% 1.89ms
99.990% 41.95ms <<< 40+ ms extra latency
99.999% 48.32ms
100.000% 48.96ms
With patch and tcp_autocorking=1
./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.60.173:8080/hello/hello
...
50.000% 0.90ms
75.000% 1.13ms
90.000% 1.45ms
99.000% 1.72ms
99.900% 1.83ms
99.990% 2.11ms <<< no 40+ ms extra latency
99.999% 2.53ms
100.000% 2.62ms
Patch has been also tested on x86 (m7i.2xlarge instance) which it is not
affected by this issue and the patch doesn't introduce any additional
delay.
Fixes: 7aa5470c2c ("tcp: tsq: move tsq_flags close to sk_wmem_alloc")
Signed-off-by: Salvatore Dipietro <dipiets@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240119190133.43698-1-dipiets@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Avoid a kernel crash in stifb by providing the correct pointer to the fb_info
struct. Prior to commit e2e0b838a1 ("video/sticore: Remove info field from
STI struct") the fb_info struct was at the beginning of the fb struct.
Fixes: e2e0b838a1 ("video/sticore: Remove info field from STI struct")
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Apollo Lake seems to also suffer from IRQ timing issues. After being up for ~4
minutes, a Pentium N4200 system ends up falling back to workqueue-based IRQ
handling:
[ 208.019906] snd_hda_intel 0000:00:0e.0: IRQ timing workaround is activated
for card #0. Suggest a bigger bdl_pos_adj.
Unfortunately, the Baytrail and Braswell workaround value of 32 samples isn't
enough to fix the issue here. Default to 64 samples.
Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
Reviewed-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
Link: https://lore.kernel.org/r/20240122114512.55808-3-rsalvaterra@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The QXL driver doesn't use any device for DMA mappings or allocations so
dev_to_node() will panic inside ttm_device_init() on NUMA systems:
general protection fault, probably for non-canonical address 0xdffffc000000007a: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x00000000000003d0-0x00000000000003d7]
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.7.0+ #9
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
RIP: 0010:ttm_device_init+0x10e/0x340
Call Trace:
qxl_ttm_init+0xaa/0x310
qxl_device_init+0x1071/0x2000
qxl_pci_probe+0x167/0x3f0
local_pci_probe+0xe1/0x1b0
pci_device_probe+0x29d/0x790
really_probe+0x251/0x910
__driver_probe_device+0x1ea/0x390
driver_probe_device+0x4e/0x2e0
__driver_attach+0x1e3/0x600
bus_for_each_dev+0x12d/0x1c0
bus_add_driver+0x25a/0x590
driver_register+0x15c/0x4b0
qxl_pci_driver_init+0x67/0x80
do_one_initcall+0xf5/0x5d0
kernel_init_freeable+0x637/0xb10
kernel_init+0x1c/0x2e0
ret_from_fork+0x48/0x80
ret_from_fork_asm+0x1b/0x30
RIP: 0010:ttm_device_init+0x10e/0x340
Fall back to NUMA_NO_NODE if there is no device for DMA.
Found by Linux Verification Center (linuxtesting.org).
Fixes: b0a7ce53d4 ("drm/ttm: Schedule delayed_delete worker closer")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When afs does a lookup, it tries to use FS.InlineBulkStatus to preemptively
look up a bunch of files in the parent directory and cache this locally, on
the basis that we might want to look at them too (for example if someone
does an ls on a directory, they may want want to then stat every file
listed).
FS.InlineBulkStatus can be considered a compound op with the normal abort
code applying to the compound as a whole. Each status fetch within the
compound is then given its own individual abort code - but assuming no
error that prevents the bulk fetch from returning the compound result will
be 0, even if all the constituent status fetches failed.
At the conclusion of afs_do_lookup(), we should use the abort code from the
appropriate status to determine the error to return, if any - but instead
it is assumed that we were successful if the op as a whole succeeded and we
return an incompletely initialised inode, resulting in ENOENT, no matter
the actual reason. In the particular instance reported, a vnode with no
permission granted to be accessed is being given a UAEACCES abort code
which should be reported as EACCES, but is instead being reported as
ENOENT.
Fix this by abandoning the inode (which will be cleaned up with the op) if
file[1] has an abort code indicated and turn that abort code into an error
instead.
Whilst we're at it, add a tracepoint so that the abort codes of the
individual subrequests of FS.InlineBulkStatus can be logged. At the moment
only the container abort code can be 0.
Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
There appears to be a race between silly-rename files being created/removed
and various userspace tools iterating over the contents of a directory,
leading to such errors as:
find: './kernel/.tmp_cpio_dir/include/dt-bindings/reset/.__afs2080': No such file or directory
tar: ./include/linux/greybus/.__afs3C95: File removed before we read it
when building a kernel.
Fix afs_readdir() so that it doesn't return .__afsXXXX silly-rename files
to userspace. This doesn't stop them being looked up directly by name as
we need to be able to look them up from within the kernel as part of the
silly-rename algorithm.
Fixes: 79ddbfa500 ("afs: Implement sillyrename for unlink and rename")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cachefiles_ondemand_init_object() as called from cachefiles_open_file() and
cachefiles_create_tmpfile() does not check if object->ondemand is set
before dereferencing it, leading to an oops something like:
RIP: 0010:cachefiles_ondemand_init_object+0x9/0x41
...
Call Trace:
<TASK>
cachefiles_open_file+0xc9/0x187
cachefiles_lookup_cookie+0x122/0x2be
fscache_cookie_state_machine+0xbe/0x32b
fscache_cookie_worker+0x1f/0x2d
process_one_work+0x136/0x208
process_scheduled_works+0x3a/0x41
worker_thread+0x1a2/0x1f6
kthread+0xca/0xd2
ret_from_fork+0x21/0x33
Fix this by making cachefiles_ondemand_init_object() return immediately if
cachefiles->ondemand is NULL.
Fixes: 3c5ecfe16e ("cachefiles: extract ondemand info field from cachefiles_object")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Gao Xiang <xiang@kernel.org>
cc: Chao Yu <chao@kernel.org>
cc: Yue Hu <huyue2@coolpad.com>
cc: Jeffle Xu <jefflexu@linux.alibaba.com>
cc: linux-erofs@lists.ozlabs.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Running the following two commands in parallel on a multi-processor
AArch64 machine can sporadically produce an unexpected warning about
duplicate histogram entries:
$ while true; do
echo hist:key=id.syscall:val=hitcount > \
/sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
sleep 0.001
done
$ stress-ng --sysbadaddr $(nproc)
The warning looks as follows:
[ 2911.172474] ------------[ cut here ]------------
[ 2911.173111] Duplicates detected: 1
[ 2911.173574] WARNING: CPU: 2 PID: 12247 at kernel/trace/tracing_map.c:983 tracing_map_sort_entries+0x3e0/0x408
[ 2911.174702] Modules linked in: iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) af_packet(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) ena(E) tiny_power_button(E) qemu_fw_cfg(E) button(E) fuse(E) efi_pstore(E) ip_tables(E) x_tables(E) xfs(E) libcrc32c(E) aes_ce_blk(E) aes_ce_cipher(E) crct10dif_ce(E) polyval_ce(E) polyval_generic(E) ghash_ce(E) gf128mul(E) sm4_ce_gcm(E) sm4_ce_ccm(E) sm4_ce(E) sm4_ce_cipher(E) sm4(E) sm3_ce(E) sm3(E) sha3_ce(E) sha512_ce(E) sha512_arm64(E) sha2_ce(E) sha256_arm64(E) nvme(E) sha1_ce(E) nvme_core(E) nvme_auth(E) t10_pi(E) sg(E) scsi_mod(E) scsi_common(E) efivarfs(E)
[ 2911.174738] Unloaded tainted modules: cppc_cpufreq(E):1
[ 2911.180985] CPU: 2 PID: 12247 Comm: cat Kdump: loaded Tainted: G E 6.7.0-default #2 1b58bbb22c97e4399dc09f92d309344f69c44a01
[ 2911.182398] Hardware name: Amazon EC2 c7g.8xlarge/, BIOS 1.0 11/1/2018
[ 2911.183208] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 2911.184038] pc : tracing_map_sort_entries+0x3e0/0x408
[ 2911.184667] lr : tracing_map_sort_entries+0x3e0/0x408
[ 2911.185310] sp : ffff8000a1513900
[ 2911.185750] x29: ffff8000a1513900 x28: ffff0003f272fe80 x27: 0000000000000001
[ 2911.186600] x26: ffff0003f272fe80 x25: 0000000000000030 x24: 0000000000000008
[ 2911.187458] x23: ffff0003c5788000 x22: ffff0003c16710c8 x21: ffff80008017f180
[ 2911.188310] x20: ffff80008017f000 x19: ffff80008017f180 x18: ffffffffffffffff
[ 2911.189160] x17: 0000000000000000 x16: 0000000000000000 x15: ffff8000a15134b8
[ 2911.190015] x14: 0000000000000000 x13: 205d373432323154 x12: 5b5d313131333731
[ 2911.190844] x11: 00000000fffeffff x10: 00000000fffeffff x9 : ffffd1b78274a13c
[ 2911.191716] x8 : 000000000017ffe8 x7 : c0000000fffeffff x6 : 000000000057ffa8
[ 2911.192554] x5 : ffff0012f6c24ec0 x4 : 0000000000000000 x3 : ffff2e5b72b5d000
[ 2911.193404] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0003ff254480
[ 2911.194259] Call trace:
[ 2911.194626] tracing_map_sort_entries+0x3e0/0x408
[ 2911.195220] hist_show+0x124/0x800
[ 2911.195692] seq_read_iter+0x1d4/0x4e8
[ 2911.196193] seq_read+0xe8/0x138
[ 2911.196638] vfs_read+0xc8/0x300
[ 2911.197078] ksys_read+0x70/0x108
[ 2911.197534] __arm64_sys_read+0x24/0x38
[ 2911.198046] invoke_syscall+0x78/0x108
[ 2911.198553] el0_svc_common.constprop.0+0xd0/0xf8
[ 2911.199157] do_el0_svc+0x28/0x40
[ 2911.199613] el0_svc+0x40/0x178
[ 2911.200048] el0t_64_sync_handler+0x13c/0x158
[ 2911.200621] el0t_64_sync+0x1a8/0x1b0
[ 2911.201115] ---[ end trace 0000000000000000 ]---
The problem appears to be caused by CPU reordering of writes issued from
__tracing_map_insert().
The check for the presence of an element with a given key in this
function is:
val = READ_ONCE(entry->val);
if (val && keys_match(key, val->key, map->key_size)) ...
The write of a new entry is:
elt = get_free_elt(map);
memcpy(elt->key, key, map->key_size);
entry->val = elt;
The "memcpy(elt->key, key, map->key_size);" and "entry->val = elt;"
stores may become visible in the reversed order on another CPU. This
second CPU might then incorrectly determine that a new key doesn't match
an already present val->key and subsequently insert a new element,
resulting in a duplicate.
Fix the problem by adding a write barrier between
"memcpy(elt->key, key, map->key_size);" and "entry->val = elt;", and for
good measure, also use WRITE_ONCE(entry->val, elt) for publishing the
element. The sequence pairs with the mentioned "READ_ONCE(entry->val);"
and the "val->key" check which has an address dependency.
The barrier is placed on a path executed when adding an element for
a new key. Subsequent updates targeting the same key remain unaffected.
From the user's perspective, the issue was introduced by commit
c193707dde ("tracing: Remove code which merges duplicates"), which
followed commit cbf4100efb ("tracing: Add support to detect and avoid
duplicates"). The previous code operated differently; it inherently
expected potential races which result in duplicates but merged them
later when they occurred.
Link: https://lore.kernel.org/linux-trace-kernel/20240122150928.27725-1-petr.pavlu@suse.com
Fixes: c193707dde ("tracing: Remove code which merges duplicates")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Filesystems should use folio->index and folio->mapping, instead of
folio_index(folio), folio_mapping() and folio_file_mapping() since
they know that it's in the pagecache.
Change this automagically with:
perl -p -i -e 's/folio_mapping[(]([^)]*)[)]/\1->mapping/g' fs/smb/client/*.c
perl -p -i -e 's/folio_file_mapping[(]([^)]*)[)]/\1->mapping/g' fs/smb/client/*.c
perl -p -i -e 's/folio_index[(]([^)]*)[)]/\1->index/g' fs/smb/client/*.c
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Ronnie Sahlberg <lsahlber@redhat.com>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: Tom Talpey <tom@talpey.com>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Filesystems should use folio->index and folio->mapping, instead of
folio_index(folio), folio_mapping() and folio_file_mapping() since
they know that it's in the pagecache.
Change this automagically with:
perl -p -i -e 's/folio_mapping[(]([^)]*)[)]/\1->mapping/g' fs/afs/*.c
perl -p -i -e 's/folio_file_mapping[(]([^)]*)[)]/\1->mapping/g' fs/afs/*.c
perl -p -i -e 's/folio_index[(]([^)]*)[)]/\1->index/g' fs/afs/*.c
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Merge series from Johan Hovold <johan+linaro@kernel.org>:
To reduce the risk of speaker damage the PA gain needs to be limited on
machines like the Lenovo Thinkpad X13s until we have active speaker
protection in place.
Limit the gain to the current default setting provided by the UCM
configuration which most user have so far been using (due to a bug in
the configuration files which prevented hardware volume control [1]).
The wsa883x PA volume control also turned out to be broken, which meant
that the default setting used by UCM configuration is actually the
lowest level (-3 dB). With the codec driver fixed, hardware volume
control also works as expected.
Note that the new wsa884x driver most likely suffers from a similar bug,
I'll send a fix for that once I've got that confirmed.
Included is also a related fix for the LPASS WSA macro driver, which
was changing the digital gain setting behind the back of user space and
which can result in excessive (or too low) digital gain.
There are further Qualcomm codec drivers that similarly appear to
manipulate various gain settings, but on closer inspection it turns out
that they only write back the current settings. Tests reveal that these
writes are indeed needed for any prior updates to take effect (at least
for the WSA and RX macros).
[1] https://github.com/alsa-project/alsa-ucm-conf/pull/382
If the boot logo does not fit, a message is printed, including a wrong
function name prefix. Instead of correcting the function name (or using
__func__), just use "fbcon", like is done in several other messages.
While at it, modernize the call by switching to pr_info().
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Helge Deller <deller@gmx.de>
Pull btrfs fixes from David Sterba:
- zoned mode fixes:
- fix slowdown when writing large file sequentially by looking up
block groups with enough space faster
- locking fixes when activating a zone
- new mount API fixes:
- preserve mount options for a ro/rw mount of the same subvolume
- scrub fixes:
- fix use-after-free in case the chunk length is not aligned to
64K, this does not happen normally but has been reported on
images converted from ext4
- similar alignment check was missing with raid-stripe-tree
- subvolume deletion fixes:
- prevent calling ioctl on already deleted subvolume
- properly track flag tracking a deleted subvolume
- in subpage mode, fix decompression of an inline extent (zlib, lzo,
zstd)
- fix crash when starting writeback on a folio, after integration with
recent MM changes this needs to be started conditionally
- reject unknown flags in defrag ioctl
- error handling, API fixes, minor warning fixes
* tag 'for-6.8-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: scrub: limit RST scrub to chunk boundary
btrfs: scrub: avoid use-after-free when chunk length is not 64K aligned
btrfs: don't unconditionally call folio_start_writeback in subpage
btrfs: use the original mount's mount options for the legacy reconfigure
btrfs: don't warn if discard range is not aligned to sector
btrfs: tree-checker: fix inline ref size in error messages
btrfs: zstd: fix and simplify the inline extent decompression
btrfs: lzo: fix and simplify the inline extent decompression
btrfs: zlib: fix and simplify the inline extent decompression
btrfs: defrag: reject unknown flags of btrfs_ioctl_defrag_range_args
btrfs: avoid copying BTRFS_ROOT_SUBVOL_DEAD flag to snapshot of subvolume being deleted
btrfs: don't abort filesystem when attempting to snapshot deleted subvolume
btrfs: zoned: fix lock ordering in btrfs_zone_activate()
btrfs: fix unbalanced unlock of mapping_tree_lock
btrfs: ref-verify: free ref cache before clearing mount opt
btrfs: fix kvcalloc() arguments order in btrfs_ioctl_send()
btrfs: zoned: optimize hint byte for zoned allocator
btrfs: zoned: factor out prepare_allocation_zoned()
Currently, both ATA_LPM_UNKNOWN (0) and ATA_LPM_MAX_POWER (1) displays
as "max_performance" in sysfs.
This is quite misleading as they are not the same.
For ATA_LPM_UNKNOWN, ata_eh_set_lpm() will not be called at all,
leaving the configuration in unknown state.
For ATA_LPM_MAX_POWER, ata_eh_set_lpm() is called, and setting the
policy to ATA_LPM_MAX_POWER.
This also matches the description of the SATA_MOBILE_LPM_POLICY Kconfig:
0 => Keep firmware settings
1 => Maximum performance
Thus, update the sysfs description for ATA_LPM_UNKNOWN to match reality.
While at it, update libata.h to mention that the ascii descriptions
are in libata-sata.c and not in libata-scsi.c.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Let logitech-hidpp driver claim Logitech G Pro X Superlight 2.
Reported-by: Marcus Rückert <darix@opensu.se>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
The UCM configuration for the Lenovo ThinkPad X13s has up until now
been setting the speaker PA volume to the minimum -3 dB when enabling
the speakers, but this does not prevent the user from increasing the
volume further.
Limit the digital gain and PA volumes to a combined -3 dB in the machine
driver to reduce the risk of speaker damage until we have active speaker
protection in place (or higher safe levels have been established).
Note that the PA volume limit cannot be set lower than 0 dB or
PulseAudio gets confused when the first 16 levels all map to -3 dB.
Also note that this will probably need to be generalised using
machine-specific limits, but a common limit should do for now.
Cc: <stable@vger.kernel.org> # 6.5
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://msgid.link/r/20240122181819.4038-3-johan+linaro@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Something when wrong when applying the original patch and only the
c file made it in. Here the rest of the changes are applied.
Fixes: c9180b8e39 ("iio: humidity: Add driver for ti HDC302x humidity sensors")
Reported-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Li peiyu <579lpy@gmail.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Lars-Peter Clausen <lars@metafoo.de>
There are a ton of build errors when REGMAP is not set, so select
REGMAP to fix all of them.
Examples (not all of them):
../drivers/iio/imu/bno055/bno055_ser_core.c:495:15: error: variable 'bno055_ser_regmap_bus' has initializer but incomplete type
495 | static struct regmap_bus bno055_ser_regmap_bus = {
../drivers/iio/imu/bno055/bno055_ser_core.c:496:10: error: 'struct regmap_bus' has no member named 'write'
496 | .write = bno055_ser_write_reg,
../drivers/iio/imu/bno055/bno055_ser_core.c:497:10: error: 'struct regmap_bus' has no member named 'read'
497 | .read = bno055_ser_read_reg,
../drivers/iio/imu/bno055/bno055_ser_core.c: In function 'bno055_ser_probe':
../drivers/iio/imu/bno055/bno055_ser_core.c:532:18: error: implicit declaration of function 'devm_regmap_init'; did you mean 'vmem_map_init'? [-Werror=implicit-function-declaration]
532 | regmap = devm_regmap_init(&serdev->dev, &bno055_ser_regmap_bus,
../drivers/iio/imu/bno055/bno055_ser_core.c:532:16: warning: assignment to 'struct regmap *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
532 | regmap = devm_regmap_init(&serdev->dev, &bno055_ser_regmap_bus,
../drivers/iio/imu/bno055/bno055_ser_core.c: At top level:
../drivers/iio/imu/bno055/bno055_ser_core.c:495:26: error: storage size of 'bno055_ser_regmap_bus' isn't known
495 | static struct regmap_bus bno055_ser_regmap_bus = {
Fixes: 2eef5a9cc6 ("iio: imu: add BNO055 serdev driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrea Merello <andrea.merello@iit.it>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: linux-iio@vger.kernel.org
Cc: <Stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240110185611.19723-1-rdunlap@infradead.org
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
"bmp085" is missing in bmp280_spi_id[] table, which leads to the next
warning in dmesg:
SPI driver bmp280 has no spi_device_id for bosch,bmp085
Add "bmp085" to bmp280_spi_id[] by mimicking its existing description in
bmp280_of_spi_match[] table to fix the above warning.
Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org>
Fixes: b26b4e9170 ("iio: pressure: bmp280: add SPI interface driver")
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
CXL 3.1 Section 3.1.1 states:
"A Function on a CXL device must not generate INTx messages if
that Function participates in CXL.cache protocol or CXL.mem
protocols."
The generic CXL memory driver only supports devices which use the
CXL.mem protocol. The current driver attempts to allocate MSI/MSI-X
vectors in anticipation of their need for mailbox interrupts or event
processing. However, the above requirement does not require a device to
support interrupts, only that they use MSI/MSI-X. For example, a device
may disable mailbox interrupts and either be configured for firmware
first or skip event processing and function.
Dave Larsen reported that the following Intel / Agilex card does not
support interrupts on function 0.
CXL: Intel Corporation Device 0ddb (rev 01) (prog-if 10 [CXL Memory Device (CXL 2.x)])
Rather than fail device probe if interrupts are not supported; flag that
irqs are not enabled and avoid features which require interrupts.
Emit messages appropriate for the situation to aid in debugging should
device behavior be unexpected due to a failure to allocate vectors.
Note that it is possible for a device to have host based event
processing through polling. However, the driver does not support
polling and it is not anticipated to be generally required. Leave that
functionality to a future patch if such a device comes along.
Reported-by: Dave Larsen <davelarsen58@gmail.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-and-tested-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20240117-dont-fail-irq-v2-1-f33f26b0e365@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Prevent warnings of the form:
tools/testing/nvdimm/config_check.c:4:6: error: no previous prototype
for ‘check’ [-Werror=missing-prototypes]
...by locally disabling some warnings.
It turns out that:
Commit 0fcb70851f ("Makefile.extrawarn: turn on missing-prototypes globally")
...in addition to expanding in-tree coverage, also impacts out-of-tree
module builds like those in tools/testing/nvdimm/.
Filter out the warning options on unit test code that does not effect
mainline builds.
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/170543984331.460832.1780246477583036191.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Prevent warnings of the form:
tools/testing/cxl/test/mock.c:44:6: error: no previous prototype for
‘__wrap_is_acpi_device_node’ [-Werror=missing-prototypes]
tools/testing/cxl/test/mock.c:63:5: error: no previous prototype for
‘__wrap_acpi_table_parse_cedt’ [-Werror=missing-prototypes]
tools/testing/cxl/test/mock.c:81:13: error: no previous prototype for
‘__wrap_acpi_evaluate_integer’ [-Werror=missing-prototypes]
...by locally disabling some warnings.
It turns out that:
Commit 0fcb70851f ("Makefile.extrawarn: turn on missing-prototypes globally")
...in addition to expanding in-tree coverage, also impacts out-of-tree
module builds like those in tools/testing/cxl/.
Filter out the warning options on unit test code that does not effect
mainline builds.
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/170543983780.460832.10920261849128601697.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Indexing with mm_cid is incompatible with skipping disallowed cpumask,
because concurrency IDs are based on a virtual ID allocation which is
unrelated to the physical CPU mask.
These issues can be reproduced by running the rseq selftests under a
taskset which excludes CPU 0, e.g.
taskset -c 10-20 ./run_param_test.sh
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Pull stringop-overflow warning update from Gustavo A. R. Silva:
"Enable -Wstringop-overflow globally.
I waited for the release of -rc1 to run a final build-test on top of
it before sending this pull request. Fortunatelly, after building 358
kernels overnight (basically all supported archs with a wide variety
of configs), no more warnings have surfaced! :)
Thus, we are in a good position to enable this compiler option for all
versions of GCC that support it, with the exception of GCC-11, which
appears to have some issues with this option [1]"
Link: https://lore.kernel.org/lkml/b3c99290-40bc-426f-b3d2-1aa903f95c4e@embeddedor.com/ [1]
* tag 'Wstringop-overflow-for-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
init: Kconfig: Disable -Wstringop-overflow for GCC-11
Makefile: Enable -Wstringop-overflow globally
Pull xen netback fix from Juergen Gross:
"Transmit requests in Xen's virtual network protocol can consist of
multiple parts. While not really useful, except for the initial part
any of them may be of zero length, i.e. carry no data at all.
Besides a certain initial portion of the to be transferred data, these
parts are directly translated into what Linux calls SKB fragments.
Such converted request parts can, when for a particular SKB they are
all of length zero, lead to a de-reference of NULL in core networking
code"
* tag 'xsa448-6.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen-netback: don't produce zero-size SKB frags
REQ_OP_FLUSH is only for internal use in the blk-mq and request based
drivers. File systems and other block layer consumers must use
REQ_OP_WRITE | REQ_PREFLUSH as documented in
Documentation/block/writeback_cache_control.rst.
While REQ_OP_FLUSH appears to work for blk-mq drivers it does not
get the proper flush state machine handling, and completely fails
for any bio based drivers, including all the stacking drivers. The
block layer will also get a check in 6.8 to reject this use case
entirely.
[Note: completely untested, but as this never got fixed since the
original bug report in November:
https://bugzilla.kernel.org/show_bug.cgi?id=218184
and the the discussion in December:
https://lore.kernel.org/all/20231221053016.72cqcfg46vxwohcj@moria.home.lan/T/
this seems to be best way to force it]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The symbol's prompt should be a one-line description, instead of just
duplicating the symbol name.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
When walking directory trees, instead of looking for specific files and
running dirname to get the parent folder, traverse all folders and
ignore the ones not containing the desired files. This avoids the need
to call dirname inside the loop, which drastically decreases run time:
Running locally on a mt8192-asurada-spherion, which reports 160 test
cases, has gone from 5.5s to 2.9s, while running remotely with an
nfsroot has gone from 13.5s to 5.5s.
This change has a side-effect, which is that the root DT node now
also shows in the output, even though it isn't expected to bind to a
driver. However there shouldn't be a matching driver for the board
compatible, so the end result will be just an extra skipped test:
ok 1 / # SKIP
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/310391e8-fdf2-4c2f-a680-7744eb685177@sirena.org.uk
Fixes: 14571ab1ad ("kselftest: Add new test for detecting unprobed Devicetree devices")
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Link: https://lore.kernel.org/r/20240122-dt-kselftest-dirname-perf-fix-v2-1-f1630532fd38@collabora.com
Signed-off-by: Rob Herring <robh@kernel.org>
Odroid-C1 uses a Monolithic Power Systems MP2161 controlled via PWM for
the VDDEE voltage supply of the Meson8b SoC. Commit 6b9352f3f8 ("pwm:
meson: modify and simplify calculation in meson_pwm_get_state") results
in my Odroid-C1 crashing with memory corruption in many different places
(seemingly at random). It turns out that this is due to a currently not
supported corner case.
The VDDEE regulator can generate between 860mV (duty cycle of ~91%) and
1140mV (duty cycle of 0%). We consider it to be enabled by the bootloader
(which is why it has the regulator-boot-on flag in .dts) as well as
being always-on (which is why it has the regulator-always-on flag in
.dts) because the VDDEE voltage is generally required for the Meson8b
SoC to work. The public S805 datasheet [0] states on page 17 (where "A5"
refers to the Cortex-A5 CPU cores):
[...] So if EE domains is shut off, A5 memory is also shut off. That
does not matter. Before EE power domain is shut off, A5 should be shut
off at first.
It turns out that at least some bootloader versions are keeping the PWM
output disabled. This is not a problem due to the specific design of the
regulator: when the PWM output is disabled the output pin is pulled LOW,
effectively achieving a 0% duty cycle (which in return means that VDDEE
voltage is at 1140mV).
The problem comes when the pwm-regulator driver tries to initialize the
PWM output. To do so it reads the current state from the hardware, which
is:
period: 3666ns
duty cycle: 3333ns (= ~91%)
enabled: false
Then those values are translated using the continuous voltage range to
860mV.
Later, when the regulator is being enabled (either by the regulator core
due to the always-on flag or first consumer - in this case the lima
driver for the Mali-450 GPU) the pwm-regulator driver tries to keep the
voltage (at 860mV) and just enable the PWM output. This is when things
start to go wrong as the typical voltage used for VDDEE is 1100mV.
Commit 6b9352f3f8 ("pwm: meson: modify and simplify calculation in
meson_pwm_get_state") triggers above condition as before that change
period and duty cycle were both at 0. Since the change to the pwm-meson
driver is considered correct the solution is to be found in the
pwm-regulator driver. Update the duty cycle during driver probe if the
regulator is flagged as boot-on so that a call to pwm_regulator_enable()
(by the regulator core during initialization of a regulator flagged with
boot-on) without any preceding call to pwm_regulator_set_voltage() does
not change the output voltage.
[0] https://dn.odroid.com/S805/Datasheet/S805_Datasheet%20V0.8%2020150126.pdf
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://msgid.link/r/20240113224628.377993-4-martin.blumenstingl@googlemail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The vendor driver appears to be modifying the gain settings behind the
back of user space but these hacks never made it upstream except for
some essentially dead code that adds a constant zero to the current gain
setting on DAPM events.
Note that the volume registers still need to be written after enabling
clocks in order for any prior updates to take effect.
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://msgid.link/r/20240119112420.7446-5-johan+linaro@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
The LPASS WSA macro codec driver is updating the digital gain settings
behind the back of user space on DAPM events if companding has been
enabled.
As compander control is exported to user space, this can result in the
digital gain setting being incremented (or decremented) every time the
sound server is started and the codec suspended depending on what the
UCM configuration looks like.
Soon enough playback will become distorted (or too quiet).
This is specifically a problem on the Lenovo ThinkPad X13s as this
bypasses the limit for the digital gain setting that has been set by the
machine driver.
Fix this by simply dropping the compander gain offset hack. If someone
cares about modelling the impact of the compander setting this can
possibly be done by exporting it as a volume control later.
Note that the volume registers still need to be written after enabling
clocks in order for any prior updates to take effect.
Fixes: 2c4066e5d4 ("ASoC: codecs: lpass-wsa-macro: add dapm widgets and route")
Cc: stable@vger.kernel.org # 5.11
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://msgid.link/r/20240119112420.7446-4-johan+linaro@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
The PA gain can be set in steps of 1.5 dB from -3 dB to 18 dB, that is,
in 15 levels.
Fix the dB values for the PA volume control as experiments using wsa8835
show that the first 16 levels all map to the same lowest gain while the
last three map to the highest gain.
These values specifically need to be correct for the sound server to
provide proper volume control.
Note that level 0 (-3 dB) does not mute the PA so the mute flag should
also not be set.
Fixes: cdb09e6231 ("ASoC: codecs: wsa883x: add control, dapm widgets and map")
Cc: stable@vger.kernel.org # 6.0
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://msgid.link/r/20240119112420.7446-2-johan+linaro@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
We get a noise issue during the startup of recording. We update the
register setting and dapm widgets to fix this issue.
we change callback type of es8326_mute function to mute_stream.
ES8326_ADC_MUTE is moved to es8326_mute function so it can
be turned on at last and turned off at first.
Signed-off-by: Zhu Ning <zhuning0077@gmail.com>
Link: https://msgid.link/r/20240120101240.12496-6-zhuning0077@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
We update the values of some registers in the initialization
sequence in es8326_resume function to improve THD+N performance.
THD+N performance decreases if the output level on headphone is
close to full scale. So we change the register setting in
es8326_jack_detect_handler function to improve THD+N performance
if headphone pulgged. Also, the register setting should be restored
when the headset is unplugged
Signed-off-by: Zhu Ning <zhuning0077@gmail.com>
Link: https://msgid.link/r/20240120101240.12496-3-zhuning0077@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
We change the crosstalk parameter in es8326_resume function
to improve crosstalk performance.
Adding crosstalk kcontrol to enhance the flexibility of crosstalk
debugging in machine.
Adding ES8326_DAC_CROSSTALK macro to declare the crosstalk register.
Signed-off-by: Zhu Ning <zhuning0077@gmail.com>
Link: https://msgid.link/r/20240120101240.12496-2-zhuning0077@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Now that we have the VISIBLE_IF_KUNIT and EXPORT_SYMBOL_IF_KUNIT macros,
update the instructions to recommend this way of testing static
functions.
Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Commit 2810c1e998 ("kunit: Fix wild-memory-access bug in
kunit_free_suite_set()") fixed a wild-memory-access bug that could have
happened during the loading phase of test suites built and executed as
loadable modules. However, it also introduced a problematic side effect
that causes test suites modules to crash when they attempt to register
fake devices.
When a module is loaded, it traverses the MODULE_STATE_UNFORMED and
MODULE_STATE_COMING states before reaching the normal operating state
MODULE_STATE_LIVE. Finally, when the module is removed, it moves to
MODULE_STATE_GOING before being released. However, if the loading
function load_module() fails between complete_formation() and
do_init_module(), the module goes directly from MODULE_STATE_COMING to
MODULE_STATE_GOING without passing through MODULE_STATE_LIVE.
This behavior was causing kunit_module_exit() to be called without
having first executed kunit_module_init(). Since kunit_module_exit() is
responsible for freeing the memory allocated by kunit_module_init()
through kunit_filter_suites(), this behavior was resulting in a
wild-memory-access bug.
Commit 2810c1e998 ("kunit: Fix wild-memory-access bug in
kunit_free_suite_set()") fixed this issue by running the tests when the
module is still in MODULE_STATE_COMING. However, modules in that state
are not fully initialized, lacking sysfs kobjects. Therefore, if a test
module attempts to register a fake device, it will inevitably crash.
This patch proposes a different approach to fix the original
wild-memory-access bug while restoring the normal module execution flow
by making kunit_module_exit() able to detect if kunit_module_init() has
previously initialized the tests suite set. In this way, test modules
can once again register fake devices without crashing.
This behavior is achieved by checking whether mod->kunit_suites is a
virtual or direct mapping address. If it is a virtual address, then
kunit_module_init() has allocated the suite_set in kunit_filter_suites()
using kmalloc_array(). On the contrary, if mod->kunit_suites is still
pointing to the original address that was set when looking up the
.kunit_test_suites section of the module, then the loading phase has
failed and there's no memory to be freed.
v4:
- rebased on 6.8
- noted that kunit_filter_suites() must return a virtual address
v3:
- add a comment to clarify why the start address is checked
v2:
- add include <linux/mm.h>
Fixes: 2810c1e998 ("kunit: Fix wild-memory-access bug in kunit_free_suite_set()")
Reviewed-by: David Gow <davidgow@google.com>
Tested-by: Rae Moar <rmoar@google.com>
Tested-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Marco Pagani <marpagan@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Rae has been shouldering a lot of the KUnit review burden for the last
year, and will continue to do so in the future. Thanks!
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Rae Moar <rmoar@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The kunit_device_register() function doesn't return NULL, it returns
error pointers. Change the KUNIT_ASSERT_NOT_NULL() to check for
ERR_OR_NULL().
Fixes: d03c720e03 ("kunit: Add APIs for managing devices")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Several inlined functions subject to paravirt patching are referencing
BUG_func() after the recent switch to the alternative patching
mechanism.
As those functions can legally be used by non-GPL modules, BUG_func()
must be usable by those modules, too. So use EXPORT_SYMBOL() when
exporting BUG_func().
Fixes: 9824b00c2b ("x86/paravirt: Move some functions and defines to alternative.c")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240109082232.22657-1-jgross@suse.com
On systems using HWP, if a given frequency is equal to the maximum turbo
frequency or the maximum non-turbo frequency, the HWP performance level
corresponding to it is already known and can be used directly without
any computation.
Accordingly, adjust the code to use the known HWP performance levels in
the cases mentioned above.
This also helps to avoid limiting CPU capacity artificially in some
cases when the BIOS produces the HWP_CAP numbers using a different
E-core-to-P-core performance scaling factor than expected by the kernel.
Fixes: f5c8cf2a49 ("cpufreq: intel_pstate: hybrid: Use known scaling factor for P-cores")
Cc: 6.1+ <stable@vger.kernel.org> # 6.1+
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Syzcaller UBSAN crash occurs in rds_cmsg_recv(),
which reads inc->i_rx_lat_trace[j + 1] with index 4 (3 + 1),
but with array size of 4 (RDS_RX_MAX_TRACES).
Here 'j' is assigned from rs->rs_rx_trace[i] and in-turn from
trace.rx_trace_pos[i] in rds_recv_track_latency(),
with both arrays sized 3 (RDS_MSG_RX_DGRAM_TRACE_MAX). So fix the
off-by-one bounds check in rds_recv_track_latency() to prevent
a potential crash in rds_cmsg_recv().
Found by syzcaller:
=================================================================
UBSAN: array-index-out-of-bounds in net/rds/recv.c:585:39
index 4 is out of range for type 'u64 [4]'
CPU: 1 PID: 8058 Comm: syz-executor228 Not tainted 6.6.0-gd2f51b3516da #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x136/0x150 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:217 [inline]
__ubsan_handle_out_of_bounds+0xd5/0x130 lib/ubsan.c:348
rds_cmsg_recv+0x60d/0x700 net/rds/recv.c:585
rds_recvmsg+0x3fb/0x1610 net/rds/recv.c:716
sock_recvmsg_nosec net/socket.c:1044 [inline]
sock_recvmsg+0xe2/0x160 net/socket.c:1066
__sys_recvfrom+0x1b6/0x2f0 net/socket.c:2246
__do_sys_recvfrom net/socket.c:2264 [inline]
__se_sys_recvfrom net/socket.c:2260 [inline]
__x64_sys_recvfrom+0xe0/0x1b0 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b
==================================================================
Fixes: 3289025aed ("RDS: add receive message trace used by application")
Reported-by: Chenyuan Yang <chenyuan0y@gmail.com>
Closes: https://lore.kernel.org/linux-rdma/CALGdzuoVdq-wtQ4Az9iottBqC5cv9ZhcE5q8N7LfYFvkRsOVcw@mail.gmail.com/
Signed-off-by: Sharath Srinivasan <sharath.srinivasan@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Starting from Linux 5.16 kernel, Tx timeout mechanism was added
in the virtio_net driver which prints the "Tx timeout" warning
message when a packet stays in Tx queue for too long. Below is an
example of the reported message:
"[494105.316739] virtio_net virtio1 tmfifo_net0: TX timeout on
queue: 0, sq: output.0, vq: 0×1, name: output.0, usecs since
last trans: 3079892256".
This issue could happen when external host driver which drains the
FIFO is restared, stopped or upgraded. To avoid such confusing
"Tx timeout" messages, this commit adds logic to drop the outstanding
Tx packet if it's not able to transmit in two seconds due to Tx FIFO
full, which can be considered as congestion or out-of-resource drop.
This commit also handles the special case that the packet is half-
transmitted into the Tx FIFO. In such case, the packet is discarded
with remaining length stored in vring->rem_padding. So paddings with
zeros can be sent out when Tx space is available to maintain the
integrity of the packet format. The padded packet will be dropped on
the receiving side.
Signed-off-by: Liming Sun <limings@nvidia.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20240111173106.96958-1-limings@nvidia.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
The HW has the capability to check each frame if it is a PTP frame,
which domain it is, which ptp frame type it is, different ip address in
the frame. And if one of these checks fail then the frame is not
timestamp. Most of these checks were disabled except checking the field
minorVersionPTP inside the PTP header. Meaning that once a partner sends
a frame compliant to 8021AS which has minorVersionPTP set to 1, then the
frame was not timestamp because the HW expected by default a value of 0
in minorVersionPTP. This is exactly the same issue as on lan8841.
Fix this issue by removing this check so the userspace can decide on this.
Fixes: ece1950283 ("net: phy: micrel: 1588 support for LAN8814 phy")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Divya Koppera <divya.koppera@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of multiple kernel module instances using the same dpll device:
if only one registers dpll device, then only that one can register
directly connected pins with a dpll device. When unregistered parent is
responsible for determining if the muxed pin can be registered with it
or not, the drivers need to be loaded in serialized order to work
correctly - first the driver instance which registers the direct pins
needs to be loaded, then the other instances could register muxed type
pins.
Allow registration of a pin with a parent even if the parent was not
yet registered, thus allow ability for unserialized driver instance
load order.
Do not WARN_ON notification for unregistered pin, which can be invoked
for described case, instead just return error.
Fixes: 9431063ad3 ("dpll: core: Add DPLL framework base functions")
Fixes: 9d71b54b65 ("dpll: netlink: Add DPLL framework base functions")
Reviewed-by: Jan Glaza <jan.glaza@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If parent pin was unregistered but child pin was not, the userspace
would see the "zombie" pins - the ones that were registered with
a parent pin (dpll_pin_on_pin_register(..)).
Technically those are not available - as there is no dpll device in the
system. Do not dump those pins and prevent userspace from any
interaction with them. Provide a unified function to determine if the
pin is available and use it before acting/responding for user requests.
Fixes: 9d71b54b65 ("dpll: netlink: Add DPLL framework base functions")
Reviewed-by: Jan Glaza <jan.glaza@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a kernel module is unbound but the pin resources were not entirely
freed (other kernel module instance of the same PCI device have had kept
the reference to that pin), and kernel module is again bound, the pin
properties would not be updated (the properties are only assigned when
memory for the pin is allocated), prop pointer still points to the
kernel module memory of the kernel module which was deallocated on the
unbind.
If the pin dump is invoked in this state, the result is a kernel crash.
Prevent the crash by storing persistent pin properties in dpll subsystem,
copy the content from the kernel module when pin is allocated, instead of
using memory of the kernel module.
Fixes: 9431063ad3 ("dpll: core: Add DPLL framework base functions")
Fixes: 9d71b54b65 ("dpll: netlink: Add DPLL framework base functions")
Reviewed-by: Jan Glaza <jan.glaza@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If pin type is not expected, or pin properities failed to allocate
memory, the unwind error path shall not destroy pin's xarrays, which
were not yet initialized.
Add new goto label and use it to fix broken error path.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After conversion of this driver to use powercap idle_inject core, this
driver doesn't use target_mwait value. So remove dead code.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Yunjian Wang says:
====================
fixes for tun
There are few places on the receive path where packet receives and
packet drops were not accounted for. This patchset fixes that issue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The TUN can be used as vhost-net backend, and it is necessary to
count the packets transmitted from TUN to vhost-net/virtio-net.
However, there are some places in the receive path that were not
taken into account when using XDP. It would be beneficial to also
include new accounting for successfully received bytes using
dev_sw_netstats_rx_add.
Fixes: 761876c857 ("tap: XDP support")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 8ae1aff0b3 ("tuntap: split out XDP logic") includes
dropped counter for XDP_DROP, XDP_ABORTED, and invalid XDP actions.
Unfortunately, that commit missed the dropped counter when error
occurs during XDP_TX and XDP_REDIRECT actions. This patch fixes
this issue.
Fixes: 8ae1aff0b3 ("tuntap: split out XDP logic")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The raw interrupt status of eic maybe set before the interrupt is enabled,
since the eic interrupt has a latch function, which would trigger the
interrupt event once enabled it from user side. To solve this problem,
interrupts generated before setting the interrupt trigger type are ignored.
Fixes: 25518e024e ("gpio: Add Spreadtrum EIC driver support")
Acked-by: Chunyan Zhang <zhang.lyra@gmail.com>
Signed-off-by: Wenhua Lin <Wenhua.Lin@unisoc.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
p2sb_bar() unhides P2SB device to get resources from the device. It
guards the operation by locking pci_rescan_remove_lock so that parallel
rescans do not find the P2SB device. However, this lock causes deadlock
when PCI bus rescan is triggered by /sys/bus/pci/rescan. The rescan
locks pci_rescan_remove_lock and probes PCI devices. When PCI devices
call p2sb_bar() during probe, it locks pci_rescan_remove_lock again.
Hence the deadlock.
To avoid the deadlock, do not lock pci_rescan_remove_lock in p2sb_bar().
Instead, do the lock at fs_initcall. Introduce p2sb_cache_resources()
for fs_initcall which gets and caches the P2SB resources. At p2sb_bar(),
refer the cache and return to the caller.
Before operating the device at P2SB DEVFN for resource cache, check
that its device class is PCI_CLASS_MEMORY_OTHER 0x0580 that PCH
specifications define. This avoids unexpected operation to other devices
at the same DEVFN.
Link: https://lore.kernel.org/linux-pci/6xb24fjmptxxn5js2fjrrddjae6twex5bjaftwqsuawuqqqydx@7cl3uik5ef6j/
Fixes: 9745fb0747 ("platform/x86/intel: Add Primary to Sideband (P2SB) bridge support")
Cc: stable@vger.kernel.org
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20240108062059.3583028-2-shinichiro.kawasaki@wdc.com
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by Klara Modin <klarasmodin@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
When booting a kernel with CONFIG_CFI_CLANG, there is a CFI failure when
accessing any of the values under
/sys/devices/system/cpu/intel_uncore_frequency/package_00_die_00:
$ cat /sys/devices/system/cpu/intel_uncore_frequency/package_00_die_00/max_freq_khz
fish: Job 1, 'cat /sys/devices/system/cpu/int…' terminated by signal SIGSEGV (Address boundary error)
$ sudo dmesg &| grep 'CFI failure'
[ 170.953925] CFI failure at kobj_attr_show+0x19/0x30 (target: show_max_freq_khz+0x0/0xc0 [intel_uncore_frequency_common]; expected type: 0xd34078c5
The sysfs callback functions such as show_domain_id() are written as if
they are going to be called by dev_attr_show() but as the above message
shows, they are instead called by kobj_attr_show(). kCFI checks that the
destination of an indirect jump has the exact same type as the prototype
of the function pointer it is called through and fails when they do not.
These callbacks are called through kobj_attr_show() because
uncore_root_kobj was initialized with kobject_create_and_add(), which
means uncore_root_kobj has a ->sysfs_ops of kobj_sysfs_ops from
kobject_create(), which uses kobj_attr_show() as its ->show() value.
The only reason there has not been a more noticeable problem until this
point is that 'struct kobj_attribute' and 'struct device_attribute' have
the same layout, so getting the callback from container_of() works the
same with either value.
Change all the callbacks and their uses to be compatible with
kobj_attr_show() and kobj_attr_store(), which resolves the kCFI failure
and allows the sysfs files to work properly.
Closes: https://github.com/ClangBuiltLinux/linux/issues/1974
Fixes: ae7b2ce578 ("platform/x86/intel/uncore-freq: Use sysfs API to create attributes")
Cc: stable@vger.kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20240104-intel-uncore-freq-kcfi-fix-v1-1-bf1e8939af40@kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
In case of long format of qDMA command descriptor, there are one frame
descriptor, three entries in the frame list and two data entries. So the
size of dma_pool_create for these three fields should be the same with
the total size of entries respectively, or the contents may be overwritten
by the next allocated descriptor.
Fixes: 7fdf9b05c7 ("dmaengine: fsl-dpaa2-qdma: Add NXP dpaa2 qDMA controller driver for Layerscape SoCs")
Signed-off-by: Guanhua Gao <guanhua.gao@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20240118162917.2951450-1-Frank.Li@nxp.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
When an legacy WMI event handler is removed, an WMI event could
have called the handler just before it was removed, meaning the
handler could still be running after wmi_remove_notify_handler()
returns.
Something similar could also happens when using the WMI bus, as
the WMI core might still call the notify() callback from an WMI
driver even if its remove() callback was just called.
Fix this by introducing a rw semaphore which ensures that the
event state of a WMI device does not change while the WMI core
is handling an event for it.
Tested on a Dell Inspiron 3505 and a Acer Aspire E1-731.
Fixes: 1686f54445 ("platform/x86: wmi: Incorporate acpi_install_notify_handler")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240103192707.115512-5-W_Armin@gmx.de
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Until now, legacy WMI notify handler functions where using the
wmi_block_list, which did no refcounting on the returned WMI device.
This meant that the WMI device could disappear at any moment,
potentially leading to various errors.
Fix this by using bus_find_device() which returns an actual
reference to the found WMI device.
Tested on a Dell Inspiron 3505 and a Acer Aspire E1-731.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240103192707.115512-4-W_Armin@gmx.de
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Commit 58f6425eb9 ("WMI: Cater for multiple events with same GUID")
allowed legacy WMI notify handlers to be installed for multiple WMI
devices with the same GUID.
However this is useless since the legacy GUID-based interface is
blacklisted from seeing WMI devices with duplicated GUIDs.
Return immediately if a suitable WMI event is found in
wmi_install/remove_notify_handler() since searching for other suitable
events is pointless.
Tested on a Dell Inspiron 3505 and a Acer Aspire E1-731.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240103192707.115512-3-W_Armin@gmx.de
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
When wmi_install_notify_handler()/wmi_remove_notify_handler() are
unable to enable/disable the WMI device, they unconditionally return
an error to the caller.
When registering legacy WMI notify handlers, this means that the
callback remains registered despite wmi_install_notify_handler()
having returned an error.
When removing legacy WMI notify handlers, this means that the
callback is removed despite wmi_remove_notify_handler() having
returned an error.
Fix this by only warning when the WMI device could not be enabled.
This behaviour matches the bus-based WMI interface.
Tested on a Dell Inspiron 3505 and a Acer Aspire E1-731.
Fixes: 58f6425eb9 ("WMI: Cater for multiple events with same GUID")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240103192707.115512-2-W_Armin@gmx.de
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Fix interrupt function prototypes, move all prototypes into a new
file ip32-common.h and include it where needed.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Make ArcGetMemoryDescriptor() static since it's only needed internally.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Fix missing prototypes by making not shared functions static and
adding others to ip27-common.h. Also drop ip27-hubio.c as it's
not used for a long time.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
While adding new partitions descriptors to the XArray the outcome of the
stores should be checked and, in particular, it has also to be ensured
that an existing entry with the same index was not already present, since
partitions IDs are expected to be unique.
Use xa_insert() instead of xa_store() since it returns -EBUSY when the
index is already in use and log an error when that happens.
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20240108-ffa_fixes_6-8-v1-5-75bf7035bc50@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Add the missing rwlock initialization for the FF-A partition associated
the driver in ffa_setup_partitions(). It will the primary scheduler
partition in the host or the VM partition in the virtualised environment.
IOW, it corresponds to the partition with VM ID == drv_info->vm_id.
Fixes: 1b6bf41b7a ("firmware: arm_ffa: Add notification handling mechanism")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20240108-ffa_fixes_6-8-v1-2-75bf7035bc50@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
The clock protocol version as per the SCMI v3.2 specification is 0x30000.
Enable the v3.0 clock protocol features only when clock protocol version
equals 0x30000.
The previous beta version of the spec had this value set to 0x20001 and
th same value trickled down from the initial development. The version
update were missed in the driver.
Fixes: e49e314a2c ("firmware: arm_scmi: Add clock v3.2 CONFIG_SET support")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20240109150106.2066739-1-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
On reception of a completion interrupt the shared memory area is accessed
to retrieve the message header at first and then, if the message sequence
number identifies a transaction which is still pending, the related
payload is fetched too.
When an SCMI command times out the channel ownership remains with the
platform until eventually a late reply is received and, as a consequence,
any further transmission attempt remains pending, waiting for the channel
to be relinquished by the platform.
Once that late reply is received the channel ownership is given back
to the agent and any pending request is then allowed to proceed and
overwrite the SMT area of the just delivered late reply; then the wait
for the reply to the new request starts.
It has been observed that the spurious IRQ related to the late reply can
be wrongly associated with the freshly enqueued request: when that happens
the SCMI stack in-flight lookup procedure is fooled by the fact that the
message header now present in the SMT area is related to the new pending
transaction, even though the real reply has still to arrive.
This race-condition on the A2P channel can be detected by looking at the
channel status bits: a genuine reply from the platform will have set the
channel free bit before triggering the completion IRQ.
Add a consistency check to validate such condition in the A2P ISR.
Reported-by: Xinglong Yang <xinglong.yang@cixtech.com>
Closes: https://lore.kernel.org/all/PUZPR06MB54981E6FA00D82BFDBB864FBF08DA@PUZPR06MB5498.apcprd06.prod.outlook.com/
Fixes: 5c8a47a5a9 ("firmware: arm_scmi: Make scmi core independent of the transport type")
Cc: stable@vger.kernel.org # 5.15+
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Tested-by: Xinglong Yang <xinglong.yang@cixtech.com>
Link: https://lore.kernel.org/r/20231220172112.763539-1-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
We have a number of missing prototypes warnings for board_setup(),
alchemy_set_lpj() and prom_init_cmdline(), prom_getenv() and
prom_get_ethernet_addr(). Fix those by providing definitions for the
first two functions in au1000.h which is included everywhere relevant,
and including prom.h for the last three.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Fix missing prototypes warnings for cobalt_machine_halt() and
cobalt_machine_restart() by moving their prototypes to cobalt.h which is
included by setup.c.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
This was not supported properly. A buffer was imported to another VPU
context as a separate buffer object with duplicated sgt.
Both exported and imported buffers could be DMA mapped causing a double
mapping on the same device.
Buffers imported from another VPU context will now just increase
reference count, leaving only a single sgt, fixing the problem above.
Buffers still can't be shared among VPU contexts because each has its
own MMU mapping and ivpu_bo only supports single MMU mappings.
The solution would be to use a mapping list as in panfrost or etnaviv
drivers and it will be implemented in future if required.
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240115134434.493839-8-jacek.lawrynowicz@linux.intel.com
Recently xfs/513 started failing on my test machines testing "-o
ro,norecovery" mount options. This was being emitted in dmesg:
[ 9906.932724] XFS (pmem0): no-recovery mounts must be read-only.
Turns out, readonly mounts with the fsopen()/fsconfig() mount API
have been busted since day zero. It's only taken 5 years for debian
unstable to start using this "new" mount API, and shortly after this
I noticed xfs/513 had started to fail as per above.
The syscall trace is:
fsopen("xfs", FSOPEN_CLOEXEC) = 3
mount_setattr(-1, NULL, 0, NULL, 0) = -1 EINVAL (Invalid argument)
.....
fsconfig(3, FSCONFIG_SET_STRING, "source", "/dev/pmem0", 0) = 0
fsconfig(3, FSCONFIG_SET_FLAG, "ro", NULL, 0) = 0
fsconfig(3, FSCONFIG_SET_FLAG, "norecovery", NULL, 0) = 0
fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = -1 EINVAL (Invalid argument)
close(3) = 0
Showing that the actual mount instantiation (FSCONFIG_CMD_CREATE) is
what threw out the error.
During mount instantiation, we call xfs_fs_validate_params() which
does:
/* No recovery flag requires a read-only mount */
if (xfs_has_norecovery(mp) && !xfs_is_readonly(mp)) {
xfs_warn(mp, "no-recovery mounts must be read-only.");
return -EINVAL;
}
and xfs_is_readonly() checks internal mount flags for read only
state. This state is set in xfs_init_fs_context() from the
context superblock flag state:
/*
* Copy binary VFS mount flags we are interested in.
*/
if (fc->sb_flags & SB_RDONLY)
set_bit(XFS_OPSTATE_READONLY, &mp->m_opstate);
With the old mount API, all of the VFS specific superblock flags
had already been parsed and set before xfs_init_fs_context() is
called, so this all works fine.
However, in the brave new fsopen/fsconfig world,
xfs_init_fs_context() is called from fsopen() context, before any
VFS superblock have been set or parsed. Hence if we use fsopen(),
the internal XFS readonly state is *never set*. Hence anything that
depends on xfs_is_readonly() actually returning true for read only
mounts is broken if fsopen() has been used to mount the filesystem.
Fix this by moving this internal state initialisation to
xfs_fs_fill_super() before we attempt to validate the parameters
that have been set prior to the FSCONFIG_CMD_CREATE call being made.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Fixes: 73e5fff98b ("xfs: switch to use the new mount-api")
cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Do not forget to call clk_disable_unprepare() on the first element of
ctx->clocks array.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 8b7d3ec83a ("drm/exynos: gsc: Convert driver to IPP v2 core API")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
gcc rightfully complains about excessive stack usage in the fimd_win_set_pixfmt()
function:
drivers/gpu/drm/exynos/exynos_drm_fimd.c: In function 'fimd_win_set_pixfmt':
drivers/gpu/drm/exynos/exynos_drm_fimd.c:750:1: error: the frame size of 1032 bytes is larger than 1024 byte
drivers/gpu/drm/exynos/exynos5433_drm_decon.c: In function 'decon_win_set_pixfmt':
drivers/gpu/drm/exynos/exynos5433_drm_decon.c:381:1: error: the frame size of 1032 bytes is larger than 1024 bytes
There is really no reason to copy the large exynos_drm_plane
structure to the stack before using one of its members, so just
use a pointer instead.
Fixes: 6f8ee5c217 ("drm/exynos: fimd: Make plane alpha configurable")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Mark reports a BUG() when a net namespace is removed.
kernel BUG at net/core/dev.c:11520!
Physical interfaces moved outside of init_net get "refunded"
to init_net when that namespace disappears. The main interface
name may get overwritten in the process if it would have
conflicted. We need to also discard all conflicting altnames.
Recent fixes addressed ensuring that altnames get moved
with the main interface, which surfaced this problem.
Reported-by: Марк Коренберг <socketpair@gmail.com>
Link: https://lore.kernel.org/all/CAEmTpZFZ4Sv3KwqFOY2WKDHeZYdi0O7N5H1nTvcGp=SAEavtDg@mail.gmail.com/
Fixes: 7663d52209 ("net: check for altname conflicts when changing netdev's netns")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow user to specify outside CFLAGS values as make argument
Corrects an issue where CFLAGS is passed as a make argument for
cpupower, but bench's makefile does not inherit and append to them.
Signed-off-by: Stanley Chan <schan@cloudflare.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
It seems that we have finished addressing all the remaining
issues regarding -Wstringop-overflow. So, we are now in good
shape to enable this compiler option globally.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
The running list is supposed to contain requests that are pinning the
exclusive lock, i.e. those that must be flushed before exclusive lock
is released. When wake_lock_waiters() is called to handle an error,
requests on the acquiring list are failed with that error and no
flushing takes place. Briefly moving them to the running list is not
only pointless but also harmful: if exclusive lock gets acquired
before all of their state machines are scheduled and go through
rbd_lock_del_request(), we trigger
rbd_assert(list_empty(&rbd_dev->running_list));
in rbd_try_acquire_lock().
Cc: stable@vger.kernel.org
Fixes: 637cd06053 ("rbd: new exclusive lock wait/wake code")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
ida_alloc() and ida_free() should be preferred to the deprecated
ida_simple_get() and ida_simple_remove().
Note that the upper limit of ida_simple_get() is exclusive, while that
of ida_alloc_max() is inclusive, so 1 has been subtracted.
[ idryomov: tweak changelog ]
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Fix some kernel-doc comments to silence the warnings:
fs/smb/server/transport_tcp.c:374: warning: Function parameter or struct member 'max_retries' not described in 'ksmbd_tcp_read'
fs/smb/server/transport_tcp.c:423: warning: Function parameter or struct member 'iface' not described in 'create_socket'
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
idpf registers multiple netdevs (virtual ports) for one PCI function,
but it does not provide a way for userspace to distinguish them with
sysfs attributes. Per Documentation/ABI/testing/sysfs-class-net, it is
a bug not to set dev_port for independent ports on the same PCI bus,
device and function.
Without dev_port set, systemd-udevd's default naming policy attempts
to assign the same name ("ens2f0") to all four idpf netdevs on my test
system and obviously fails, leaving three of them with the initial
eth<N> name.
With this patch, systemd-udevd is able to assign unique names to the
netdevs (e.g. "ens2f0", "ens2f0d1", "ens2f0d2", "ens2f0d3").
The Intel-provided out-of-tree idpf driver already sets dev_port. In
this patch I chose to do it in the same place in the idpf_cfg_netdev
function.
Fixes: 0fe45467a1 ("idpf: add create vport and netdev configuration")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Generic sk_busy_loop_end() only looks at sk->sk_receive_queue
for presence of packets.
Problem is that for UDP sockets after blamed commit, some packets
could be present in another queue: udp_sk(sk)->reader_queue
In some cases, a busy poller could spin until timeout expiration,
even if some packets are available in udp_sk(sk)->reader_queue.
v3: - make sk_busy_loop_end() nicer (Willem)
v2: - add a READ_ONCE(sk->sk_family) in sk_is_inet() to avoid KCSAN splats.
- add a sk_is_inet() check in sk_is_udp() (Willem feedback)
- add a sk_is_inet() check in sk_is_tcp().
Fixes: 2276f58ac5 ("udp: use a separate rx queue for packet reception")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The userspace program could pass any values to the driver through
ioctl() interface. If the driver doesn't check the value of pixclock,
it may cause divide-by-zero error.
In sisfb_check_var(), var->pixclock is used as a divisor to caculate
drate before it is checked against zero. Fix this by checking it
at the beginning.
This is similar to CVE-2022-3061 in i740fb which was fixed by
commit 15cf0b8.
Signed-off-by: Fullway Wang <fullwaywang@outlook.com>
Signed-off-by: Helge Deller <deller@gmx.de>
The userspace program could pass any values to the driver through
ioctl() interface. If the driver doesn't check the value of pixclock,
it may cause divide-by-zero error.
Although pixclock is checked in savagefb_decode_var(), but it is not
checked properly in savagefb_probe(). Fix this by checking whether
pixclock is zero in the function savagefb_check_var() before
info->var.pixclock is used as the divisor.
This is similar to CVE-2022-3061 in i740fb which was fixed by
commit 15cf0b8.
Signed-off-by: Fullway Wang <fullwaywang@outlook.com>
Signed-off-by: Helge Deller <deller@gmx.de>
syzbot reported an uninit-value bug below. [0]
llc supports ETH_P_802_2 (0x0004) and used to support ETH_P_TR_802_2
(0x0011), and syzbot abused the latter to trigger the bug.
write$tun(r0, &(0x7f0000000040)={@val={0x0, 0x11}, @val, @mpls={[], @llc={@snap={0xaa, 0x1, ')', "90e5dd"}}}}, 0x16)
llc_conn_handler() initialises local variables {saddr,daddr}.mac
based on skb in llc_pdu_decode_sa()/llc_pdu_decode_da() and passes
them to __llc_lookup().
However, the initialisation is done only when skb->protocol is
htons(ETH_P_802_2), otherwise, __llc_lookup_established() and
__llc_lookup_listener() will read garbage.
The missing initialisation existed prior to commit 211ed86510
("net: delete all instances of special processing for token ring").
It removed the part to kick out the token ring stuff but forgot to
close the door allowing ETH_P_TR_802_2 packets to sneak into llc_rcv().
Let's remove llc_tr_packet_type and complete the deprecation.
[0]:
BUG: KMSAN: uninit-value in __llc_lookup_established+0xe9d/0xf90
__llc_lookup_established+0xe9d/0xf90
__llc_lookup net/llc/llc_conn.c:611 [inline]
llc_conn_handler+0x4bd/0x1360 net/llc/llc_conn.c:791
llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206
__netif_receive_skb_one_core net/core/dev.c:5527 [inline]
__netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5641
netif_receive_skb_internal net/core/dev.c:5727 [inline]
netif_receive_skb+0x58/0x660 net/core/dev.c:5786
tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1555
tun_get_user+0x53af/0x66d0 drivers/net/tun.c:2002
tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
call_write_iter include/linux/fs.h:2020 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x8ef/0x1490 fs/read_write.c:584
ksys_write+0x20f/0x4c0 fs/read_write.c:637
__do_sys_write fs/read_write.c:649 [inline]
__se_sys_write fs/read_write.c:646 [inline]
__x64_sys_write+0x93/0xd0 fs/read_write.c:646
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b
Local variable daddr created at:
llc_conn_handler+0x53/0x1360 net/llc/llc_conn.c:783
llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206
CPU: 1 PID: 5004 Comm: syz-executor994 Not tainted 6.6.0-syzkaller-14500-g1c41041124bd #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
Fixes: 211ed86510 ("net: delete all instances of special processing for token ring")
Reported-by: syzbot+b5ad66046b913bc04c6f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b5ad66046b913bc04c6f
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240119015515.61898-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the vlan_changelink function, a loop is used to parse the nested
attributes IFLA_VLAN_EGRESS_QOS and IFLA_VLAN_INGRESS_QOS in order to
obtain the struct ifla_vlan_qos_mapping. These two nested attributes are
checked in the vlan_validate_qos_map function, which calls
nla_validate_nested_deprecated with the vlan_map_policy.
However, this deprecated validator applies a LIBERAL strictness, allowing
the presence of an attribute with the type IFLA_VLAN_QOS_UNSPEC.
Consequently, the loop in vlan_changelink may parse an attribute of type
IFLA_VLAN_QOS_UNSPEC and believe it carries a payload of
struct ifla_vlan_qos_mapping, which is not necessarily true.
To address this issue and ensure compatibility, this patch introduces two
type checks that skip attributes whose type is not IFLA_VLAN_QOS_MAPPING.
Fixes: 07b5b17e15 ("[VLAN]: Use rtnl_link API")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240118130306.1644001-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan says:
====================
bnxt_en: Bug fixes
This series contains 5 miscellaneous fixes. The fixes include adding
delay for FLR, buffer memory leak, RSS table size calculation,
ethtool self test kernel warning, and mqprio crash.
====================
Link: https://lore.kernel.org/r/20240117234515.226944-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We call bnxt_half_open_nic() to setup the chip partially to run
loopback tests. The rings and buffers are initialized normally
so that we can transmit and receive packets in loopback mode.
That means page pool buffers are allocated for the aggregation ring
just like the normal case. NAPI is not needed because we are just
polling for the loopback packets.
When we're done with the loopback tests, we call bnxt_half_close_nic()
to clean up. When freeing the page pools, we hit a WARN_ON()
in page_pool_unlink_napi() because the NAPI state linked to the
page pool is uninitialized.
The simplest way to avoid this warning is just to initialize the
NAPIs during half open and delete the NAPIs during half close.
Trying to skip the page pool initialization or skip linking of
NAPI during half open will be more complicated.
This fix avoids this warning:
WARNING: CPU: 4 PID: 46967 at net/core/page_pool.c:946 page_pool_unlink_napi+0x1f/0x30
CPU: 4 PID: 46967 Comm: ethtool Tainted: G S W 6.7.0-rc5+ #22
Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.3.8 08/31/2021
RIP: 0010:page_pool_unlink_napi+0x1f/0x30
Code: 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 8b 47 18 48 85 c0 74 1b 48 8b 50 10 83 e2 01 74 08 8b 40 34 83 f8 ff 74 02 <0f> 0b 48 c7 47 18 00 00 00 00 c3 cc cc cc cc 66 90 90 90 90 90 90
RSP: 0018:ffa000003d0dfbe8 EFLAGS: 00010246
RAX: ff110003607ce640 RBX: ff110010baf5d000 RCX: 0000000000000008
RDX: 0000000000000000 RSI: ff110001e5e522c0 RDI: ff110010baf5d000
RBP: ff11000145539b40 R08: 0000000000000001 R09: ffffffffc063f641
R10: ff110001361eddb8 R11: 000000000040000f R12: 0000000000000001
R13: 000000000000001c R14: ff1100014553a080 R15: 0000000000003fc0
FS: 00007f9301c4f740(0000) GS:ff1100103fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f91344fa8f0 CR3: 00000003527cc005 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __warn+0x81/0x140
? page_pool_unlink_napi+0x1f/0x30
? report_bug+0x102/0x200
? handle_bug+0x44/0x70
? exc_invalid_op+0x13/0x60
? asm_exc_invalid_op+0x16/0x20
? bnxt_free_ring.isra.123+0xb1/0xd0 [bnxt_en]
? page_pool_unlink_napi+0x1f/0x30
page_pool_destroy+0x3e/0x150
bnxt_free_mem+0x441/0x5e0 [bnxt_en]
bnxt_half_close_nic+0x2a/0x40 [bnxt_en]
bnxt_self_test+0x21d/0x450 [bnxt_en]
__dev_ethtool+0xeda/0x2e30
? native_queued_spin_lock_slowpath+0x17f/0x2b0
? __link_object+0xa1/0x160
? _raw_spin_unlock_irqrestore+0x23/0x40
? __create_object+0x5f/0x90
? __kmem_cache_alloc_node+0x317/0x3c0
? dev_ethtool+0x59/0x170
dev_ethtool+0xa7/0x170
dev_ioctl+0xc3/0x530
sock_do_ioctl+0xa8/0xf0
sock_ioctl+0x270/0x310
__x64_sys_ioctl+0x8c/0xc0
do_syscall_64+0x3e/0xf0
entry_SYSCALL_64_after_hwframe+0x6e/0x76
Fixes: 294e39e0d0 ("bnxt: hook NAPIs to page pools")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240117234515.226944-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The existing formula used in the driver to calculate the number of RSS
table entries is to round up the number of RX rings to the next integer
multiples of 64 (e.g. 64, 128, 192, ..). This is incorrect. The valid
values supported by the chip are 64, 128, 256, 512 only (power of 2
starting from 64). When the number of RX rings is greater than 128, the
entry size will likely be wrong. Firmware will round down the invalid
value (e.g. 192 rounded down to 128) provided by the driver, causing some
RSS rings to not receive any packets.
We already have an existing function bnxt_calc_nr_ring_pages() to
do this calculation. Use it in bnxt_get_nr_rss_ctxs() to calculate the
number of RSS contexts correctly for P5_PLUS chips.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Fixes: 7b3af4f75b ("bnxt_en: Add RSS support for 57500 chips.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240117234515.226944-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When tests are run by runner.sh, bond_options.sh gets killed before
it can complete:
make -C tools/testing/selftests run_tests TARGETS="drivers/net/bonding"
[...]
# timeout set to 120
# selftests: drivers/net/bonding: bond_options.sh
# TEST: prio (active-backup miimon primary_reselect 0) [ OK ]
# TEST: prio (active-backup miimon primary_reselect 1) [ OK ]
# TEST: prio (active-backup miimon primary_reselect 2) [ OK ]
# TEST: prio (active-backup arp_ip_target primary_reselect 0) [ OK ]
# TEST: prio (active-backup arp_ip_target primary_reselect 1) [ OK ]
# TEST: prio (active-backup arp_ip_target primary_reselect 2) [ OK ]
#
not ok 7 selftests: drivers/net/bonding: bond_options.sh # TIMEOUT 120 seconds
This test includes many sleep statements, at least some of which are
related to timers in the operation of the bonding driver itself. Increase
the test timeout to allow the test to complete.
I ran the test in slightly different VMs (including one without HW
virtualization support) and got runtimes of 13m39.760s, 13m31.238s, and
13m2.956s. Use a ~1.5x "safety factor" and set the timeout to 1200s.
Fixes: 42a8d4aaea ("selftests: bonding: add bonding prio option test")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20240116104402.1203850a@kernel.org/#t
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240118001233.304759-1-bpoirier@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A crash was found when dumping SMC-D connections. It can be reproduced
by following steps:
- run nginx/wrk test:
smc_run nginx
smc_run wrk -t 16 -c 1000 -d <duration> -H 'Connection: Close' <URL>
- continuously dump SMC-D connections in parallel:
watch -n 1 'smcss -D'
BUG: kernel NULL pointer dereference, address: 0000000000000030
CPU: 2 PID: 7204 Comm: smcss Kdump: loaded Tainted: G E 6.7.0+ #55
RIP: 0010:__smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag]
Call Trace:
<TASK>
? __die+0x24/0x70
? page_fault_oops+0x66/0x150
? exc_page_fault+0x69/0x140
? asm_exc_page_fault+0x26/0x30
? __smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag]
? __kmalloc_node_track_caller+0x35d/0x430
? __alloc_skb+0x77/0x170
smc_diag_dump_proto+0xd0/0xf0 [smc_diag]
smc_diag_dump+0x26/0x60 [smc_diag]
netlink_dump+0x19f/0x320
__netlink_dump_start+0x1dc/0x300
smc_diag_handler_dump+0x6a/0x80 [smc_diag]
? __pfx_smc_diag_dump+0x10/0x10 [smc_diag]
sock_diag_rcv_msg+0x121/0x140
? __pfx_sock_diag_rcv_msg+0x10/0x10
netlink_rcv_skb+0x5a/0x110
sock_diag_rcv+0x28/0x40
netlink_unicast+0x22a/0x330
netlink_sendmsg+0x1f8/0x420
__sock_sendmsg+0xb0/0xc0
____sys_sendmsg+0x24e/0x300
? copy_msghdr_from_user+0x62/0x80
___sys_sendmsg+0x7c/0xd0
? __do_fault+0x34/0x160
? do_read_fault+0x5f/0x100
? do_fault+0xb0/0x110
? __handle_mm_fault+0x2b0/0x6c0
__sys_sendmsg+0x4d/0x80
do_syscall_64+0x69/0x180
entry_SYSCALL_64_after_hwframe+0x6e/0x76
It is possible that the connection is in process of being established
when we dump it. Assumed that the connection has been registered in a
link group by smc_conn_create() but the rmb_desc has not yet been
initialized by smc_buf_create(), thus causing the illegal access to
conn->rmb_desc. So fix it by checking before dump.
Fixes: 4b1b7d3b30 ("net/smc: add SMC-D diag support")
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Slaby reported a futex state inconsistency resulting in -EINVAL during
a lock operation for a PI futex. It requires that the a lock process is
interrupted by a timeout or signal:
T1 Owns the futex in user space.
T2 Tries to acquire the futex in kernel (futex_lock_pi()). Allocates a
pi_state and attaches itself to it.
T2 Times out and removes its rt_waiter from the rt_mutex. Drops the
rtmutex lock and tries to acquire the hash bucket lock to remove
the futex_q. The lock is contended and T2 schedules out.
T1 Unlocks the futex (futex_unlock_pi()). Finds a futex_q but no
rt_waiter. Unlocks the futex (do_uncontended) and makes it available
to user space.
T3 Acquires the futex in user space.
T4 Tries to acquire the futex in kernel (futex_lock_pi()). Finds the
existing futex_q of T2 and tries to attach itself to the existing
pi_state. This (attach_to_pi_state()) fails with -EINVAL because uval
contains the TID of T3 but pi_state points to T1.
It's incorrect to unlock the futex and make it available for user space to
acquire as long as there is still an existing state attached to it in the
kernel.
T1 cannot hand over the futex to T2 because T2 already gave up and started
to clean up and is blocked on the hash bucket lock, so T2's futex_q with
the pi_state pointing to T1 is still queued.
T2 observes the futex_q, but ignores it as there is no waiter on the
corresponding rt_mutex and takes the uncontended path which allows the
subsequent caller of futex_lock_pi() (T4) to observe that stale state.
To prevent this the unlock path must dequeue all futex_q entries which
point to the same pi_state when there is no waiter on the rt mutex. This
requires obviously to make the dequeue conditional in the locking path to
prevent a double dequeue. With that it's guaranteed that user space cannot
observe an uncontended futex which has kernel state attached.
Fixes: fbeb558b0d ("futex/pi: Fix recursive rt_mutex waiter state")
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20240118115451.0TkD_ZhB@linutronix.de
Closes: https://lore.kernel.org/all/4611bcf2-44d0-4c34-9b84-17406f881003@kernel.org
After commit 61167ad5fe ("mm: pass nid to reserve_bootmem_region()")
nid of a reserved region is used by init_reserved_page() (with
CONFIG_DEFERRED_STRUCT_PAGE_INIT=y) to access node strucure.
In many cases the nid of the reserved memory is not set and this causes
a crash.
When the nid of a reserved region is not set, fall back to
early_pfn_to_nid(), so that nid of the first_online_node will be passed
to init_reserved_page().
Fixes: 61167ad5fe ("mm: pass nid to reserve_bootmem_region()")
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Link: https://lore.kernel.org/r/20240118061853.2652295-1-yajun.deng@linux.dev
[rppt: massaged the commit message]
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Some devices list 3 Gpio resources in the ACPI resource list for
the touchscreen:
1. GpioInt resource pointing to the GPIO used for the interrupt
2. GpioIo resource pointing to the reset GPIO
3. GpioIo resource pointing to the GPIO used for the interrupt
Note how the third extra GpioIo resource really is a duplicate
of the GpioInt provided info.
Ignore this extra GPIO, treating this setup the same as gpio_count == 2 &&
gpio_int_idx == 0 fixes the touchscreen not working on the Thunderbook
Colossus W803 rugged tablet and likely also on the CyberBook_T116K.
Reported-by: Maarten van der Schrieck
Closes: https://gitlab.com/AdyaAdya/goodix-touchscreen-linux-driver/-/issues/22
Suggested-by: Maarten van der Schrieck
Tested-by: Maarten van der Schrieck
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20231223141650.10679-1-hdegoede@redhat.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
We extend the KVM ISA extension ONE_REG interface to allow KVM
user space to detect and enable Zfa extension for Guest/VM.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
The KVM RISC-V allows Zvfh[min] extensions for Guest/VM so let us
add these extensions to get-reg-list test.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
We extend the KVM ISA extension ONE_REG interface to allow KVM
user space to detect and enable Zvfh[min] extensions for Guest/VM.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
We extend the KVM ISA extension ONE_REG interface to allow KVM
user space to detect and enable Zihintntl extension for Guest/VM.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
We extend the KVM ISA extension ONE_REG interface to allow KVM
user space to detect and enable Zfh[min] extensions for Guest/VM.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
The KVM RISC-V allows vector crypto extensions for Guest/VM so let us
add these extensions to get-reg-list test. This includes extensions
Zvbb, Zvbc, Zvkb, Zvkg, Zvkned, Zvknha, Zvknhb, Zvksed, Zvksh, and Zvkt.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
We extend the KVM ISA extension ONE_REG interface to allow KVM
user space to detect and enable vector crypto extensions for
Guest/VM. This includes extensions Zvbb, Zvbc, Zvkb, Zvkg,
Zvkned, Zvknha, Zvknhb, Zvksed, Zvksh, and Zvkt.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
The KVM RISC-V allows scaler crypto extensions for Guest/VM so let us
add these extensions to get-reg-list test. This includes extensions
Zbkb, Zbkc, Zbkx, Zknd, Zkne, Zknh, Zkr, Zksed, Zksh, and Zkt.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
We extend the KVM ISA extension ONE_REG interface to allow KVM
user space to detect and enable scalar crypto extensions for
Guest/VM. This includes extensions Zbkb, Zbkc, Zbkx, Zknd, Zkne,
Zknh, Zkr, Zksed, Zksh, and Zkt.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
We extend the KVM ISA extension ONE_REG interface to allow KVM
user space to detect and enable Zbc extension for Guest/VM.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
[BUG]
If there is an extent beyond chunk boundary, currently RST scrub would
error out.
[CAUSE]
In scrub_submit_extent_sector_read(), we completely rely on
extent_sector_bitmap, which is populated using extent tree.
The extent tree can be corrupted that there is an extent item beyond a
chunk.
In that case, RST scrub would fail and error out.
[FIX]
Despite the extent_sector_bitmap usage, also limit the read to chunk
boundary.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report that, on a ext4-converted btrfs, scrub leads to
various problems, including:
- "unable to find chunk map" errors
BTRFS info (device vdb): scrub: started on devid 1
BTRFS critical (device vdb): unable to find chunk map for logical 2214744064 length 4096
BTRFS critical (device vdb): unable to find chunk map for logical 2214744064 length 45056
This would lead to unrepariable errors.
- Use-after-free KASAN reports:
==================================================================
BUG: KASAN: slab-use-after-free in __blk_rq_map_sg+0x18f/0x7c0
Read of size 8 at addr ffff8881013c9040 by task btrfs/909
CPU: 0 PID: 909 Comm: btrfs Not tainted 6.7.0-x64v3-dbg #11 c50636e9419a8354555555245df535e380563b2b
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2023.11-2 12/24/2023
Call Trace:
<TASK>
dump_stack_lvl+0x43/0x60
print_report+0xcf/0x640
kasan_report+0xa6/0xd0
__blk_rq_map_sg+0x18f/0x7c0
virtblk_prep_rq.isra.0+0x215/0x6a0 [virtio_blk 19a65eeee9ae6fcf02edfad39bb9ddee07dcdaff]
virtio_queue_rqs+0xc4/0x310 [virtio_blk 19a65eeee9ae6fcf02edfad39bb9ddee07dcdaff]
blk_mq_flush_plug_list.part.0+0x780/0x860
__blk_flush_plug+0x1ba/0x220
blk_finish_plug+0x3b/0x60
submit_initial_group_read+0x10a/0x290 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965]
flush_scrub_stripes+0x38e/0x430 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965]
scrub_stripe+0x82a/0xae0 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965]
scrub_chunk+0x178/0x200 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965]
scrub_enumerate_chunks+0x4bc/0xa30 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965]
btrfs_scrub_dev+0x398/0x810 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965]
btrfs_ioctl+0x4b9/0x3020 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965]
__x64_sys_ioctl+0xbd/0x100
do_syscall_64+0x5d/0xe0
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f47e5e0952b
- Crash, mostly due to above use-after-free
[CAUSE]
The converted fs has the following data chunk layout:
item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 2214658048) itemoff 16025 itemsize 80
length 86016 owner 2 stripe_len 65536 type DATA|single
For above logical bytenr 2214744064, it's at the chunk end
(2214658048 + 86016 = 2214744064).
This means btrfs_submit_bio() would split the bio, and trigger endio
function for both of the two halves.
However scrub_submit_initial_read() would only expect the endio function
to be called once, not any more.
This means the first endio function would already free the bbio::bio,
leaving the bvec freed, thus the 2nd endio call would lead to
use-after-free.
[FIX]
- Make sure scrub_read_endio() only updates bits in its range
Since we may read less than 64K at the end of the chunk, we should not
touch the bits beyond chunk boundary.
- Make sure scrub_submit_initial_read() only to read the chunk range
This is done by calculating the real number of sectors we need to
read, and add sector-by-sector to the bio.
Thankfully the scrub read repair path won't need extra fixes:
- scrub_stripe_submit_repair_read()
With above fixes, we won't update error bit for range beyond chunk,
thus scrub_stripe_submit_repair_read() should never submit any read
beyond the chunk.
Reported-by: Rongrong <i@rong.moe>
Fixes: e02ee89baa ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure")
Tested-by: Rongrong <i@rong.moe>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the normal case we check if a page is under writeback and skip it
before we attempt to begin writeback.
The exception is subpage metadata writes, where we know we don't have an
eb under writeback and we're doing it one eb at a time. Since
b5612c3686 ("mm: return void from folio_start_writeback() and related
functions") we now will BUG_ON() if we call folio_start_writeback()
on a folio that's already under writeback. Previously
folio_start_writeback() would bail if writeback was already started.
Fix this in the subpage code by checking if we have writeback set and
skipping it if we do. This fixes the panic we were seeing on subpage.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs/330, which tests our old trick to allow
mount -o ro,subvol=/x /dev/sda1 /foo
mount -o rw,subvol=/y /dev/sda1 /bar
fails on the block group tree. This is because we aren't preserving the
mount options for what is essentially a remount, and thus we're ending
up without the FREE_SPACE_TREE mount option, which triggers our free
space tree delete codepath. This isn't possible with the block group
tree and thus it falls over.
Fix this by making sure we copy the existing mount options for the
existing fs mount over in this case.
Fixes: f044b31867 ("btrfs: handle the ro->rw transition for mounting different subvolumes")
Reviewed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
If we have a filesystem with 4k sectorsize, and an inlined compressed
extent created like this:
item 4 key (257 INODE_ITEM 0) itemoff 15863 itemsize 160
generation 8 transid 8 size 4096 nbytes 4096
block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
sequence 1 flags 0x0(none)
item 5 key (257 INODE_REF 256) itemoff 15839 itemsize 24
index 2 namelen 14 name: source_inlined
item 6 key (257 EXTENT_DATA 0) itemoff 15770 itemsize 69
generation 8 type 0 (inline)
inline extent data size 48 ram_bytes 4096 compression 3 (zstd)
Then trying to reflink that extent in an aarch64 system with 64K page
size, the reflink would just fail:
# xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
XFS_IOC_CLONE_RANGE: Input/output error
[CAUSE]
In zstd_decompress(), we didn't treat @start_byte as just a page offset,
but also use it as an indicator on whether we should error out, without
any proper explanation (this is copied from other decompression code).
In reality, for subpage cases, although @start_byte can be non-zero,
we should never switch input/output buffer nor error out, since the whole
input/output buffer should never exceed one sector, thus we should not
need to do any buffer switch.
Thus the current code using @start_byte as a condition to switch
input/output buffer or finish the decompression is completely incorrect.
[FIX]
The fix involves several modification:
- Rename @start_byte to @dest_pgoff to properly express its meaning
- Use @sectorsize other than PAGE_SIZE to properly initialize the
output buffer size
- Use correct destination offset inside the destination page
- Simplify the main loop
Since the input/output buffer should never switch, we only need one
zstd_decompress_stream() call.
- Consider early end as an error
After the fix, even on 64K page sized aarch64, above reflink now
works as expected:
# xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
linked 4096/4096 bytes at offset 61440
And results the correct file layout:
item 9 key (258 INODE_ITEM 0) itemoff 15542 itemsize 160
generation 10 transid 10 size 65536 nbytes 4096
block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
sequence 1 flags 0x0(none)
item 10 key (258 INODE_REF 256) itemoff 15528 itemsize 14
index 3 namelen 4 name: dest
item 11 key (258 XATTR_ITEM 3817753667) itemoff 15445 itemsize 83
location key (0 UNKNOWN.0 0) type XATTR
transid 10 data_len 37 name_len 16
name: security.selinux
data unconfined_u:object_r:unlabeled_t:s0
item 12 key (258 EXTENT_DATA 61440) itemoff 15392 itemsize 53
generation 10 type 1 (regular)
extent data disk byte 13631488 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0 (none)
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
If we have a filesystem with 4k sectorsize, and an inlined compressed
extent created like this:
item 4 key (257 INODE_ITEM 0) itemoff 15863 itemsize 160
generation 8 transid 8 size 4096 nbytes 4096
block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
sequence 1 flags 0x0(none)
item 5 key (257 INODE_REF 256) itemoff 15839 itemsize 24
index 2 namelen 14 name: source_inlined
item 6 key (257 EXTENT_DATA 0) itemoff 15770 itemsize 69
generation 8 type 0 (inline)
inline extent data size 48 ram_bytes 4096 compression 2 (lzo)
Then trying to reflink that extent in an aarch64 system with 64K page
size, the reflink would just fail:
# xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
XFS_IOC_CLONE_RANGE: Input/output error
[CAUSE]
In zlib_decompress(), we didn't treat @start_byte as just a page offset,
but also use it as an indicator on whether we should error out, without
any proper explanation (this is from the very beginning of btrfs).
In reality, for subpage cases, although @start_byte can be non-zero,
we should never switch input/output buffer nor error out, since the whole
input/output buffer should never exceed one sector.
Note: The above assumption is only not true if we're going to support
multi-page sectorsize.
Thus the current code using @start_byte as a condition to switch
input/output buffer or finish the decompression is completely incorrect.
[FIX]
The fix involves several modifications:
- Rename @start_byte to @dest_pgoff to properly express its meaning
- Use @sectorsize other than PAGE_SIZE to properly initialize the
output buffer size
- Use correct destination offset inside the destination page
- Use memcpy_to_page() to copy the contents to the destination page
- Use memzero_page() to zero out the tailing part
- Consider early end as an error
After the fix, even on 64K page sized aarch64, above reflink now
works as expected:
# xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
linked 4096/4096 bytes at offset 61440
And results the correct file layout:
item 9 key (258 INODE_ITEM 0) itemoff 15542 itemsize 160
generation 10 transid 10 size 65536 nbytes 4096
block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
sequence 1 flags 0x0(none)
item 10 key (258 INODE_REF 256) itemoff 15528 itemsize 14
index 3 namelen 4 name: dest
item 11 key (258 XATTR_ITEM 3817753667) itemoff 15445 itemsize 83
location key (0 UNKNOWN.0 0) type XATTR
transid 10 data_len 37 name_len 16
name: security.selinux
data unconfined_u:object_r:unlabeled_t:s0
item 12 key (258 EXTENT_DATA 61440) itemoff 15392 itemsize 53
generation 10 type 1 (regular)
extent data disk byte 13631488 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0 (none)
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
If we have a filesystem with 4k sectorsize, and an inlined compressed
extent created like this:
item 4 key (257 INODE_ITEM 0) itemoff 15863 itemsize 160
generation 8 transid 8 size 4096 nbytes 4096
block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
sequence 1 flags 0x0(none)
item 5 key (257 INODE_REF 256) itemoff 15839 itemsize 24
index 2 namelen 14 name: source_inlined
item 6 key (257 EXTENT_DATA 0) itemoff 15770 itemsize 69
generation 8 type 0 (inline)
inline extent data size 48 ram_bytes 4096 compression 1 (zlib)
Which has an inline compressed extent at file offset 0, and its
decompressed size is 4K, allowing us to reflink that 4K range to another
location (which will not be compressed).
If we do such reflink on a subpage system, it would fail like this:
# xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
XFS_IOC_CLONE_RANGE: Input/output error
[CAUSE]
In zlib_decompress(), we didn't treat @start_byte as just a page offset,
but also use it as an indicator on whether we should switch our output
buffer.
In reality, for subpage cases, although @start_byte can be non-zero,
we should never switch input/output buffer, since the whole input/output
buffer should never exceed one sector.
Note: The above assumption is only not true if we're going to support
multi-page sectorsize.
Thus the current code using @start_byte as a condition to switch
input/output buffer or finish the decompression is completely incorrect.
[FIX]
The fix involves several modifications:
- Rename @start_byte to @dest_pgoff to properly express its meaning
- Add an extra ASSERT() inside btrfs_decompress() to make sure the
input/output size never exceeds one sector.
- Use Z_FINISH flag to make sure the decompression happens in one go
- Remove the loop needed to switch input/output buffers
- Use correct destination offset inside the destination page
- Consider early end as an error
After the fix, even on 64K page sized aarch64, above reflink now
works as expected:
# xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
linked 4096/4096 bytes at offset 61440
And resulted a correct file layout:
item 9 key (258 INODE_ITEM 0) itemoff 15542 itemsize 160
generation 10 transid 10 size 65536 nbytes 4096
block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
sequence 1 flags 0x0(none)
item 10 key (258 INODE_REF 256) itemoff 15528 itemsize 14
index 3 namelen 4 name: dest
item 11 key (258 XATTR_ITEM 3817753667) itemoff 15445 itemsize 83
location key (0 UNKNOWN.0 0) type XATTR
transid 10 data_len 37 name_len 16
name: security.selinux
data unconfined_u:object_r:unlabeled_t:s0
item 12 key (258 EXTENT_DATA 61440) itemoff 15392 itemsize 53
generation 10 type 1 (regular)
extent data disk byte 13631488 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0 (none)
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The "needed" controls the number of ext4_prealloc_space to discard in
ext4_discard_preallocations. Function ext4_discard_preallocations is
supposed to discard all non-used preallocated blocks when "needed"
is 0 and now ext4_discard_preallocations is always called with "needed"
= 0. Remove unnecessary parameter "needed" and remove all non-used
preallocated spaces in ext4_discard_preallocations to simplify the
code.
Note: If count of non-used preallocated spaces could be more than
UINT_MAX, there was a memory leak as some non-used preallocated
spaces are left ununsed and this commit will fix it. Otherwise,
there is no behavior change.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240105092102.496631-9-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Determine if the group block bitmap is corrupted before using ac_b_ex in
ext4_mb_try_best_found() to avoid allocating blocks from a group with a
corrupted block bitmap in the following concurrency and making the
situation worse.
ext4_mb_regular_allocator
ext4_lock_group(sb, group)
ext4_mb_good_group
// check if the group bbitmap is corrupted
ext4_mb_complex_scan_group
// Scan group gets ac_b_ex but doesn't use it
ext4_unlock_group(sb, group)
ext4_mark_group_bitmap_corrupted(group)
// The block bitmap was corrupted during
// the group unlock gap.
ext4_mb_try_best_found
ext4_lock_group(ac->ac_sb, group)
ext4_mb_use_best_found
mb_mark_used
// Allocating blocks in block bitmap corrupted group
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240104142040.2835097-7-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
After updating bb_free in mb_free_blocks, it is possible to return without
updating bb_fragments because the block being freed is found to have
already been freed, which leads to inconsistency between bb_free and
bb_fragments.
Since the group may be unlocked in ext4_grp_locked_error(), this can lead
to problems such as dividing by zero when calculating the average fragment
length. Hence move the update of bb_free to after the block double-free
check guarantees that the corresponding statistics are updated only after
the core block bitmap is modified.
Fixes: eabe0444df ("ext4: speed-up releasing blocks on commit")
CC: <stable@vger.kernel.org> # 3.10
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240104142040.2835097-5-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This mostly reverts commit 6bd97bf273 ("ext4: remove redundant
mb_regenerate_buddy()") and reintroduces mb_regenerate_buddy(). Based on
code in mb_free_blocks(), fast commit replay can end up marking as free
blocks that are already marked as such. This causes corruption of the
buddy bitmap so we need to regenerate it in that case.
Reported-by: Jan Kara <jack@suse.cz>
Fixes: 6bd97bf273 ("ext4: remove redundant mb_regenerate_buddy()")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240104142040.2835097-4-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
In ext4_move_extents(), moved_len is only updated when all moves are
successfully executed, and only discards orig_inode and donor_inode
preallocations when moved_len is not zero. When the loop fails to exit
after successfully moving some extents, moved_len is not updated and
remains at 0, so it does not discard the preallocations.
If the moved extents overlap with the preallocated extents, the
overlapped extents are freed twice in ext4_mb_release_inode_pa() and
ext4_process_freed_data() (as described in commit 94d7c16cbb ("ext4:
Fix double-free of blocks with EXT4_IOC_MOVE_EXT")), and bb_free is
incremented twice. Hence when trim is executed, a zero-division bug is
triggered in mb_update_avg_fragment_size() because bb_free is not zero
and bb_fragments is zero.
Therefore, update move_len after each extent move to avoid the issue.
Reported-by: Wei Chen <harperchen1110@gmail.com>
Reported-by: xingwei lee <xrivendell7@gmail.com>
Closes: https://lore.kernel.org/r/CAO4mrferzqBUnCag8R3m2zf897ts9UEuhjFQGPtODT92rYyR2Q@mail.gmail.com
Fixes: fcf6b1b729 ("ext4: refactor ext4_move_extents code base")
CC: <stable@vger.kernel.org> # 3.18
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240104142040.2835097-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
For dio read, bio will be leave in flight when a successful partial
aio read have been setup, blockdev_direct_IO() will return
-EIOCBQUEUED. In the case, iter->iov_offset will be not advanced,
the oops reported by syzbot will occur if revert iter->iov_offset
with iov_iter_revert(). The unwritten part had been zeroed by aio
read, so there is no need to zero it in dio read.
Reported-by: syzbot+fd404f6b03a58e8bc403@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=fd404f6b03a58e8bc403
Fixes: 11a347fb6c ("exfat: change to get file size from DataLength")
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
fast-xmit must only be enabled after the sta has been uploaded to the driver,
otherwise it could end up passing the not-yet-uploaded sta via drv_tx calls
to the driver, leading to potential crashes because of uninitialized drv_priv
data.
Add a missing sta->uploaded check and re-check fast xmit after inserting a sta.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://msgid.link/20240104181059.84032-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit ffbd0c8c1e ("wifi: mac80211: add an element parsing unit test")
and commit 730eeb17bb ("wifi: cfg80211: add first kunit tests, for
element defrag") add new configs that depend on !KERNEL_6_2, but the config
option KERNEL_6_2 does not exist in the tree. This dependency is used for
handling backporting to restrict the option to certain kernels but this
really should not be carried around the mainline kernel tree.
Clean up this needless dependency on the non-existing option KERNEL_6_2.
Link: https://lore.kernel.org/lkml/CAKXUXMyfrM6amOR7Ysim3WNQ-Ckf9HJDqRhAoYmLXujo1UV+yA@mail.gmail.com/
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The nl80211_dump_interface() supports resumption
in case nl80211_send_iface() doesn't have the
resources to complete its work.
The logic would store the progress as iteration
offsets for rdev and wdev loops.
However the logic did not properly handle
resumption for non-last rdev. Assuming a system
with 2 rdevs, with 2 wdevs each, this could
happen:
dump(cb=[0, 0]):
if_start=cb[1] (=0)
send rdev0.wdev0 -> ok
send rdev0.wdev1 -> yield
cb[1] = 1
dump(cb=[0, 1]):
if_start=cb[1] (=1)
send rdev0.wdev1 -> ok
// since if_start=1 the rdev0.wdev0 got skipped
// through if_idx < if_start
send rdev1.wdev1 -> ok
The if_start needs to be reset back to 0 upon wdev
loop end.
The problem is actually hard to hit on a desktop,
and even on most routers. The prerequisites for
this manifesting was:
- more than 1 wiphy
- a few handful of interfaces
- dump without rdev or wdev filter
I was seeing this with 4 wiphys 9 interfaces each.
It'd miss 6 interfaces from the last wiphy
reported to userspace.
Signed-off-by: Michal Kazior <michal@plume.com>
Link: https://msgid.link/20240116142340.89678-1-kazikcz@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This reverts commit 88b065943c.
Lenovo 82TQ is unhappy if we do the display on sequence this
late. The display output shows severe corruption.
It's unclear if this is a failure on our part (perhaps
something to do with sending commands in LP mode after HS
/video mode transmission has been started? Though the backlight
on command at least seems to work) or simply that there are
some commands in the sequence that are needed to be done
earlier (eg. could be some DSC init stuff?). If the latter
then I don't think the current Windows code would work
either, but maybe this was originally tested with an older
driver, who knows.
Root causing this fully would likely require a lot of
experimentation which isn't really feasible without direct
access to the machine, so let's just accept failure and
go back to the original sequence.
Cc: stable@vger.kernel.org
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10071
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240116210821.30194-1-ville.syrjala@linux.intel.com
Acked-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit dc524d0597)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Use the proper size when setting up the bio_vec, as otherwise only
zero-length UDP packets will be sent.
Fixes: baabf59c24 ("SUNRPC: Convert svc_udp_sendto() to use the per-socket bio_vec array")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The variable 'rb' is being assigned a value but it isn't being read
afterwards. The assignment is redundant and so 'rb' can be removed.
Cleans up clang scan build warning:
warning: Although the value stored to 'rb' is used in the enclosing
expression, the value is never actually read from 'rb'[deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20240116112606.2263738-1-colin.i.king@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The ps8640 bridge seems to expect everything to be power cycled at the
disable process, but sometimes ps8640_aux_transfer() holds the runtime
PM reference and prevents the bridge from suspend.
Prevent that by introducing a mutex lock between ps8640_aux_transfer()
and .post_disable() to make sure the bridge is really powered off.
Fixes: 826cff3f7e ("drm/bridge: parade-ps8640: Enable runtime power management")
Signed-off-by: Pin-yen Lin <treapking@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240109120528.1292601-1-treapking@chromium.org
Commit 23e7b1bfed ("xfrm: Don't accidentally set RTO_ONLINK in
decode_session4()") fixed a problem where decode_session4() could
erroneously set the RTO_ONLINK flag for IPv4 route lookups. This
problem was reintroduced when decode_session4() was modified to
use the flow dissector.
Fix this by clearing again the two low order bits of ->flowi4_tos.
Found by code inspection, compile tested only.
Fixes: 7a0207094f ("xfrm: policy: replace session decode with flow dissector")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
A null pointer dereference crash has been observed rarely on TI
platforms using sii9022 bridge:
[ 53.271356] sii902x_get_edid+0x34/0x70 [sii902x]
[ 53.276066] sii902x_bridge_get_edid+0x14/0x20 [sii902x]
[ 53.281381] drm_bridge_get_edid+0x20/0x34 [drm]
[ 53.286305] drm_bridge_connector_get_modes+0x8c/0xcc [drm_kms_helper]
[ 53.292955] drm_helper_probe_single_connector_modes+0x190/0x538 [drm_kms_helper]
[ 53.300510] drm_client_modeset_probe+0x1f0/0xbd4 [drm]
[ 53.305958] __drm_fb_helper_initial_config_and_unlock+0x50/0x510 [drm_kms_helper]
[ 53.313611] drm_fb_helper_initial_config+0x48/0x58 [drm_kms_helper]
[ 53.320039] drm_fbdev_dma_client_hotplug+0x84/0xd4 [drm_dma_helper]
[ 53.326401] drm_client_register+0x5c/0xa0 [drm]
[ 53.331216] drm_fbdev_dma_setup+0xc8/0x13c [drm_dma_helper]
[ 53.336881] tidss_probe+0x128/0x264 [tidss]
[ 53.341174] platform_probe+0x68/0xc4
[ 53.344841] really_probe+0x188/0x3c4
[ 53.348501] __driver_probe_device+0x7c/0x16c
[ 53.352854] driver_probe_device+0x3c/0x10c
[ 53.357033] __device_attach_driver+0xbc/0x158
[ 53.361472] bus_for_each_drv+0x88/0xe8
[ 53.365303] __device_attach+0xa0/0x1b4
[ 53.369135] device_initial_probe+0x14/0x20
[ 53.373314] bus_probe_device+0xb0/0xb4
[ 53.377145] deferred_probe_work_func+0xcc/0x124
[ 53.381757] process_one_work+0x1f0/0x518
[ 53.385770] worker_thread+0x1e8/0x3dc
[ 53.389519] kthread+0x11c/0x120
[ 53.392750] ret_from_fork+0x10/0x20
The issue here is as follows:
- tidss probes, but is deferred as sii902x is still missing.
- sii902x starts probing and enters sii902x_init().
- sii902x calls drm_bridge_add(). Now the sii902x bridge is ready from
DRM's perspective.
- sii902x calls sii902x_audio_codec_init() and
platform_device_register_data()
- The registration of the audio platform device causes probing of the
deferred devices.
- tidss probes, which eventually causes sii902x_bridge_get_edid() to be
called.
- sii902x_bridge_get_edid() tries to use the i2c to read the edid.
However, the sii902x driver has not set up the i2c part yet, leading
to the crash.
Fix this by moving the drm_bridge_add() to the end of the
sii902x_init(), which is also at the very end of sii902x_probe().
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Fixes: 21d808405f ("drm/bridge/sii902x: Fix EDID readback")
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20240103-si902x-fixes-v1-1-b9fd3e448411@ideasonboard.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240103-si902x-fixes-v1-1-b9fd3e448411@ideasonboard.com
QXL driver doesn't use any device for DMA mappings or allocations so
dev_to_node() will panic inside ttm_device_init() on NUMA systems:
general protection fault, probably for non-canonical address 0xdffffc000000007a: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x00000000000003d0-0x00000000000003d7]
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.7.0+ #9
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
RIP: 0010:ttm_device_init+0x10e/0x340
Call Trace:
<TASK>
qxl_ttm_init+0xaa/0x310
qxl_device_init+0x1071/0x2000
qxl_pci_probe+0x167/0x3f0
local_pci_probe+0xe1/0x1b0
pci_device_probe+0x29d/0x790
really_probe+0x251/0x910
__driver_probe_device+0x1ea/0x390
driver_probe_device+0x4e/0x2e0
__driver_attach+0x1e3/0x600
bus_for_each_dev+0x12d/0x1c0
bus_add_driver+0x25a/0x590
driver_register+0x15c/0x4b0
qxl_pci_driver_init+0x67/0x80
do_one_initcall+0xf5/0x5d0
kernel_init_freeable+0x637/0xb10
kernel_init+0x1c/0x2e0
ret_from_fork+0x48/0x80
ret_from_fork_asm+0x1b/0x30
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:ttm_device_init+0x10e/0x340
Fall back to NUMA_NO_NODE if there is no device for DMA.
Found by Linux Verification Center (linuxtesting.org).
Fixes: b0a7ce53d4 ("drm/ttm: Schedule delayed_delete worker closer")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Link: https://patchwork.freedesktop.org/patch/msgid/20240113213347.9562-1-pchelkin@ispras.ru
Signed-off-by: Christian König <christian.koenig@amd.com>
There are a number of issues in this code. First of all if
steam_create_client_hid() fails then it leads to an error pointer
dereference when we call hid_destroy_device(steam->client_hdev).
Also there are a number of leaks. hid_hw_stop() is not called if
hid_hw_open() fails for example. And it doesn't call steam_unregister()
or hid_hw_close().
Fixes: 691ead124a ("HID: hid-steam: Clean up locking")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Vicki Pfau <vi@endrift.com>
Link: https://lore.kernel.org/r/1fd87904-dabf-4879-bb89-72d13ebfc91e@moroto.mountain
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Add extra sanity check for btrfs_ioctl_defrag_range_args::flags.
This is not really to enhance fuzzing tests, but as a preparation for
future expansion on btrfs_ioctl_defrag_range_args.
In the future we're going to add new members, allowing more fine tuning
for btrfs defrag. Without the -ENONOTSUPP error, there would be no way
to detect if the kernel supports those new defrag features.
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Sweet Tea spotted a race between subvolume deletion and snapshotting
that can result in the root item for the snapshot having the
BTRFS_ROOT_SUBVOL_DEAD flag set. The race is:
Thread 1 | Thread 2
----------------------------------------------|----------
btrfs_delete_subvolume |
btrfs_set_root_flags(BTRFS_ROOT_SUBVOL_DEAD)|
|btrfs_mksubvol
| down_read(subvol_sem)
| create_snapshot
| ...
| create_pending_snapshot
| copy root item from source
down_write(subvol_sem) |
This flag is only checked in send and swap activate, which this would
cause to fail mysteriously.
create_snapshot() now checks the root refs to reject a deleted
subvolume, so we can fix this by locking subvol_sem earlier so that the
BTRFS_ROOT_SUBVOL_DEAD flag and the root refs are updated atomically.
CC: stable@vger.kernel.org # 4.14+
Reported-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The btrfs CI reported a lockdep warning as follows by running generic
generic/129.
WARNING: possible circular locking dependency detected
6.7.0-rc5+ #1 Not tainted
------------------------------------------------------
kworker/u5:5/793427 is trying to acquire lock:
ffff88813256d028 (&cache->lock){+.+.}-{2:2}, at: btrfs_zone_finish_one_bg+0x5e/0x130
but task is already holding lock:
ffff88810a23a318 (&fs_info->zone_active_bgs_lock){+.+.}-{2:2}, at: btrfs_zone_finish_one_bg+0x34/0x130
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&fs_info->zone_active_bgs_lock){+.+.}-{2:2}:
...
-> #0 (&cache->lock){+.+.}-{2:2}:
...
This is because we take fs_info->zone_active_bgs_lock after a block_group's
lock in btrfs_zone_activate() while doing the opposite in other places.
Fix the issue by expanding the fs_info->zone_active_bgs_lock's critical
section and taking it before a block_group's lock.
Fixes: a7e1ac7bdc ("btrfs: zoned: reserve zones for an active metadata/system block group")
CC: stable@vger.kernel.org # 6.6
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The error path of btrfs_get_chunk_map() releases
fs_info->mapping_tree_lock. But, it is taken and released in
btrfs_find_chunk_map(). So, there is no need to do so.
Fixes: 7dc66abb5a ("btrfs: use a dedicated data structure for chunk maps")
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When compiling with gcc version 14.0.0 20231220 (experimental)
and W=1, I've noticed the following warning:
fs/btrfs/send.c: In function 'btrfs_ioctl_send':
fs/btrfs/send.c:8208:44: warning: 'kvcalloc' sizes specified with 'sizeof'
in the earlier argument and not in the later argument [-Wcalloc-transposed-args]
8208 | sctx->clone_roots = kvcalloc(sizeof(*sctx->clone_roots),
| ^
Since 'n' and 'size' arguments of 'kvcalloc()' are multiplied to
calculate the final size, their actual order doesn't affect the result
and so this is not a bug. But it's still worth to fix it.
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Writing sequentially to a huge file on btrfs on a SMR HDD revealed a
decline of the performance (220 MiB/s to 30 MiB/s after 500 minutes).
The performance goes down because of increased latency of the extent
allocation, which is induced by a traversing of a lot of full block groups.
So, this patch optimizes the ffe_ctl->hint_byte by choosing a block group
with sufficient size from the active block group list, which does not
contain full block groups.
After applying the patch, the performance is maintained well.
Fixes: 2eda57089e ("btrfs: zoned: implement sequential extent allocation")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
GCC 13.2 warns:
drivers/net/wireless/intersil/p54/fwio.c:128:34: warning: '%s' directive output may be truncated writing up to 39 bytes into a region of size 32 [-Wformat-truncation=]
drivers/net/wireless/intersil/p54/fwio.c:128:33: note: directive argument in the range [0, 16777215]
drivers/net/wireless/intersil/p54/fwio.c:128:33: note: directive argument in the range [0, 255]
drivers/net/wireless/intersil/p54/fwio.c:127:17: note: 'snprintf' output between 7 and 52 bytes into a destination of size 32
The issue here is that wiphy->fw_version is 32 bytes and in theory the string
we try to place there can be 39 bytes. wiphy->fw_version is used for providing
the firmware version to user space via ethtool, so not really important.
fw_version in theory can be 24 bytes but in practise it's shorter, so even if
print only 19 bytes via ethtool there should not be any practical difference.
I did consider removing fw_var from the string altogether or making the maximum
length for fw_version 19 bytes, but chose this approach as it was the least
intrusive.
Compile tested only.
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Acked-by: Christian Lamparter <chunkeey@gmail.com> # Tested with Dell 1450 USB
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20231219162516.898205-1-kvalo@kernel.org
The correct property name is 'reg' not 'regs'.
Fixes: 68e89bb36d ("dt-bindings: display: samsung,exynos-mixer: convert to dtschema")
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Unlike what is claimed in commit f5aa7d46b0 ("drm/bridge:
parade-ps8640: Provide wait_hpd_asserted() in struct drm_dp_aux"), if
someone manually tries to do an AUX transfer (like via `i2cdump ${bus}
0x50 i`) while the panel is off we don't just get a simple transfer
error. Instead, the whole ps8640 gets thrown for a loop and goes into
a bad state.
Let's put the function to wait for the HPD (and the magical 50 ms
after first reset) back in when we're doing an AUX transfer. This
shouldn't actually make things much slower (assuming the panel is on)
because we should immediately poll and see the HPD high. Mostly this
is just an extra i2c transfer to the bridge.
Fixes: f5aa7d46b0 ("drm/bridge: parade-ps8640: Provide wait_hpd_asserted() in struct drm_dp_aux")
Tested-by: Pin-yen Lin <treapking@chromium.org>
Reviewed-by: Pin-yen Lin <treapking@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20231221135548.1.I10f326a9305d57ad32cee7f8d9c60518c8be20fb@changeid
While frontends may submit zero-size requests (wasting a precious slot),
core networking code as of at least 3ece782693 ("sock: skb_copy_ubufs
support for compound pages") can't deal with SKBs when they have all
zero-size fragments. Respond to empty requests right when populating
fragments; all further processing is fragment based and hence won't
encounter these empty requests anymore.
In a way this should have been that way from the beginning: When no data
is to be transferred for a particular request, there's not even a point
in validating the respective grant ref. That's no different from e.g.
passing NULL into memcpy() when at the same time the size is 0.
This is XSA-448 / CVE-2023-46838.
Cc: stable@vger.kernel.org
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Right now it is possible to see gmap->private being zero in
kvm_s390_vsie_gmap_notifier resulting in a crash. This is due to the
fact that we add gmap->private == kvm after creation:
static int acquire_gmap_shadow(struct kvm_vcpu *vcpu,
struct vsie_page *vsie_page)
{
[...]
gmap = gmap_shadow(vcpu->arch.gmap, asce, edat);
if (IS_ERR(gmap))
return PTR_ERR(gmap);
gmap->private = vcpu->kvm;
Let children inherit the private field of the parent.
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Fixes: a3508fbe9d ("KVM: s390: vsie: initial support for nested virtualization")
Cc: <stable@vger.kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20231220125317.4258-1-borntraeger@linux.ibm.com
It is preferable to exit through the out: label because
internal debugging functions are located there.
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Unfortunately reparse attribute is used for many purposes (several dozens).
It is not possible here to know is this name symlink or not.
To get exactly the type of name we should to open inode (read mft).
getattr for opened file (fstat) correctly returns symlink.
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
*`Initial discussion on the New subsystem for acceleration devices <https://lkml.org/lkml/2022/7/31/83>`_ - Oded Gabbay (2022)
*`patch-set to add the new subsystem <https://lkml.org/lkml/2022/10/22/544>`_ - Oded Gabbay (2022)
*`Initial discussion on the New subsystem for acceleration devices <https://lore.kernel.org/lkml/CAFCwf11=9qpNAepL7NL+YAV_QO=Wv6pnWPhKHKAepK3fNn+2Dg@mail.gmail.com/>`_ - Oded Gabbay (2022)
*`patch-set to add the new subsystem <https://lore.kernel.org/lkml/20221022214622.18042-1-ogabbay@kernel.org/>`_ - Oded Gabbay (2022)
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.