Pull driver core updates from Danilo Krummrich:
"Arch Topology:
- Move parse_acpi_topology() from arm64 to common code for reuse in
RISC-V
CPU:
- Expose housekeeping CPUs through /sys/devices/system/cpu/housekeeping
- Print a newline (or 0x0A) instead of '(null)' reading
/sys/devices/system/cpu/nohz_full when nohz_full= is not set
debugfs
- Remove (broken) 'no-mount' mode
- Remove redundant access mode checks in debugfs_get_tree() and
debugfs_create_*() functions
Devres:
- Remove unused devm_free_percpu() helper
- Move devm_alloc_percpu() from device.h to devres.h
Firmware Loader:
- Replace simple_strtol() with kstrtoint()
- Do not call cancel_store() when no upload is in progress
kernfs:
- Increase struct super_block::maxbytes to MAX_LFS_FILESIZE
- Fix a missing unwind path in __kernfs_new_node()
Misc:
- Increase the name size in struct auxiliary_device_id to 40
characters
- Replace system_unbound_wq with system_dfl_wq and add WQ_PERCPU to
alloc_workqueue()
Platform:
- Replace ERR_PTR() with IOMEM_ERR_PTR() in platform ioremap
functions
Rust:
- Auxiliary:
- Unregister auxiliary device on parent device unbind
- Move parent() to impl Device; implement device context aware
parent() for Device<Bound>
- Illustrate how to safely obtain a driver's device private data
when calling from an auxiliary driver into the parant device
driver
- DebugFs:
- Implement support for binary large objects
- Device:
- Let probe() return the driver's device private data as pinned
initializer, i.e. impl PinInit<Self, Error>
- Implement safe accessor for a driver's device private data for
Device<Bound> (returned reference can't out-live driver binding
and guarantees the correct private data type)
- Implement AsBusDevice trait, to be used by class device
abstractions to derive the bus device type of the parent device
- DMA:
- Store raw pointer of allocation as NonNull
- Use start_ptr() and start_ptr_mut() to inherit correct
mutability of self
- FS:
- Add file::Offset type alias
- I2C:
- Add abstractions for I2C device / driver infrastructure
- Implement abstractions for manual I2C device registrations
- I/O:
- Use "kernel vertical" style for imports
- Define ResourceSize as resource_size_t
- Move ResourceSize to top-level I/O module
- Add type alias for phys_addr_t
- Implement Rust version of read_poll_timeout_atomic()
- PCI:
- Use "kernel vertical" style for imports
- Move I/O and IRQ infrastructure to separate files
- Add support for PCI interrupt vectors
- Implement TryInto<IrqRequest<'a>> for IrqVector<'a> to convert
an IrqVector bound to specific pci::Device into an IrqRequest
bound to the same pci::Device's parent Device
- Leverage pin_init_scope() to get rid of redundant Result in IRQ
methods
- PinInit:
- Add {pin_}init_scope() to execute code before creating an
initializer
- Platform:
- Leverage pin_init_scope() to get rid of redundant Result in IRQ
methods
- Timekeeping:
- Implement abstraction of udelay()
- Uaccess:
- Implement read_slice_partial() and read_slice_file() for
UserSliceReader
- Implement write_slice_partial() and write_slice_file() for
UserSliceWriter
sysfs:
- Prepare the constification of struct attribute"
* tag 'driver-core-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (75 commits)
rust: pci: fix build failure when CONFIG_PCI_MSI is disabled
debugfs: Fix default access mode config check
debugfs: Remove broken no-mount mode
debugfs: Remove redundant access mode checks
driver core: Check drivers_autoprobe for all added devices
driver core: WQ_PERCPU added to alloc_workqueue users
driver core: replace use of system_unbound_wq with system_dfl_wq
tick/nohz: Expose housekeeping CPUs in sysfs
tick/nohz: avoid showing '(null)' if nohz_full= not set
sysfs/cpu: Use DEVICE_ATTR_RO for nohz_full attribute
kernfs: fix memory leak of kernfs_iattrs in __kernfs_new_node
fs/kernfs: raise sb->maxbytes to MAX_LFS_FILESIZE
mod_devicetable: Bump auxiliary_device_id name size
sysfs: simplify attribute definition macros
samples/kobject: constify 'struct foo_attribute'
samples/kobject: add is_visible() callback to attribute group
sysfs: attribute_group: enable const variants of is_visible()
sysfs: introduce __SYSFS_FUNCTION_ALTERNATIVE()
sysfs: transparently handle const pointers in ATTRIBUTE_GROUPS()
sysfs: attribute_group: allow registration of const attribute
...
Pull IPMI updates from Corey Minyard:
"Minor IPMI fixes:
- Some device tree cleanups and a maintainer add
- Fix a race when handling channel updates that could result in
errors being reported to the user in some cases"
* tag 'for-linus-6.19-1' of https://github.com/cminyard/linux-ipmi:
MAINTAINERS: Add entry on Loongson-2K IPMI driver
dt-bindings: ipmi: Convert aspeed,ast2400-ibt-bmc to DT schema
dt-bindings: ipmi: Convert nuvoton,npcm750-kcs-bmc to DT schema
ipmi: Skip channel scan if channels are already marked ready
ipmi: Fix __scan_channels() failing to rescan channels
ipmi: Fix the race between __scan_channels() and deliver_response()
Pull ata fix from Niklas Cassel:
- The DELLBOSS VD SATA controller times out when sending I/Os of size
4096 KiB or larger, even though it claims LBA48 support, which per
the ACS standard requires support for a maximum command size of
65535 sectors, i.e. 32 MiB - 512. Thus, quirk the device so that it
sets a lower maximum command size (me)
* tag 'ata-6.19-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-core: Quirk DELLBOSS VD max_sectors
ata: libata: Move quirk flags to their own enum
Pull more tpm updates from Jarkko Sakkinen:
"This is targeted for tpm2-sessions updates.
There's two bug fixes and two more cosmetic tweaks for HMAC protected
sessions. They provide a baseine for further improvements to be
implemented during the the course of the release cycle"
* tag 'tpmdd-sessions-next-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm2-sessions: Open code tpm_buf_append_hmac_session()
tpm2-sessions: Remove 'attributes' parameter from tpm_buf_append_auth
tpm2-sessions: Fix tpm2_read_public range checks
tpm2-sessions: Fix out of range indexing in name_size
Pull media kernel-doc fix from Mauro Carvalho Chehab:
"A fix to shut up a kernel-doc warning on c3-isp driver"
* tag 'media/v6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: uapi: c3-isp: Fix documentation warning
Pull SCSI updates from James Bottomley:
"Usual driver updates (ufs, lpfc, target, qla2xxx) plus assorted
cleanups and fixes including the WQ_PERCPU series.
The biggest core change is the new allocation of pseudo-devices which
allow the sending of internal commands to a given SCSI target"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (147 commits)
scsi: MAINTAINERS: Add the UFS include directory
scsi: scsi_debug: Support injecting unaligned write errors
scsi: qla2xxx: Fix improper freeing of purex item
scsi: ufs: rockchip: Fix compile error without CONFIG_GPIOLIB
scsi: ufs: rockchip: Reset controller on PRE_CHANGE of hce enable notify
scsi: ufs: core: Use scsi_device_busy()
scsi: ufs: core: Fix single doorbell mode support
scsi: pm80xx: Add WQ_PERCPU to alloc_workqueue() users
scsi: target: Add WQ_PERCPU to alloc_workqueue() users
scsi: qedi: Add WQ_PERCPU to alloc_workqueue() users
scsi: target: ibmvscsi: Add WQ_PERCPU to alloc_workqueue() users
scsi: qedf: Add WQ_PERCPU to alloc_workqueue() users
scsi: bnx2fc: Add WQ_PERCPU to alloc_workqueue() users
scsi: be2iscsi: Add WQ_PERCPU to alloc_workqueue() users
scsi: message: fusion: Add WQ_PERCPU to alloc_workqueue() users
scsi: lpfc: WQ_PERCPU added to alloc_workqueue() users
scsi: scsi_transport_fc: WQ_PERCPU added to alloc_workqueue users()
scsi: scsi_dh_alua: WQ_PERCPU added to alloc_workqueue() users
scsi: qla2xxx: WQ_PERCPU added to alloc_workqueue() users
scsi: target: sbp: Replace use of system_unbound_wq with system_dfl_wq
...
Open code 'tpm_buf_append_hmac_session_opt' to the call site, as it only
masks a call sequence and does otherwise nothing particularly useful.
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@opinsys.com>
Reviewed-by: Jonathan McDowell <noodles@meta.com>
Remove 'attributes' parameter from 'tpm_buf_append_auth', as it is not used
by the function.
Fixes: 27184f8905 ("tpm: Opt-in in disable PCR integrity protection")
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@opinsys.com>
Reviewed-by: Jonathan McDowell <noodles@meta.com>
tpm2_read_public() has some rudimentary range checks but the function does
not ensure that the response buffer has enough bytes for the full TPMT_HA
payload.
Re-implement the function with necessary checks and validation, and return
name and name size for all handle types back to the caller.
Cc: stable@vger.kernel.org # v6.10+
Fixes: d0a25bb961 ("tpm: Add HMAC session name/handle append")
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Jonathan McDowell <noodles@meta.com>
'name_size' does not have any range checks, and it just directly indexes
with TPM_ALG_ID, which could lead into memory corruption at worst.
Address the issue by only processing known values and returning -EINVAL for
unrecognized values.
Make also 'tpm_buf_append_name' and 'tpm_buf_fill_hmac_session' fallible so
that errors are detected before causing any spurious TPM traffic.
End also the authorization session on failure in both of the functions, as
the session state would be then by definition corrupted.
Cc: stable@vger.kernel.org # v6.10+
Fixes: 1085b8276b ("tpm: Add the rest of the session HMAC API")
Reviewed-by: Jonathan McDowell <noodles@meta.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
When CONFIG_PCI_MSI is disabled pci_alloc_irq_vectors() and
pci_free_irq_vectors() are defined as inline functions and hence require
a Rust helper.
error[E0425]: cannot find function `pci_alloc_irq_vectors` in crate `bindings`
--> rust/kernel/pci/irq.rs:144:23
|
144 | ...s::pci_alloc_irq_vectors(dev.as_raw(), min_vecs, max_vecs, irq_types.as_raw())
| ^^^^^^^^^^^^^^^^^^^^^ help: a function with a similar name exists: `pci_irq_vector`
|
::: .../rust/bindings/bindings_helpers_generated.rs:1197:5
|
1197 | pub fn pci_irq_vector(pdev: *mut pci_dev, nvec: ffi::c_uint) -> ffi::c_int;
| --------------------------------------------------------------------------- similarly named function `pci_irq_vector` defined here
error[E0425]: cannot find function `pci_free_irq_vectors` in crate `bindings`
--> rust/kernel/pci/irq.rs:170:28
|
170 | unsafe { bindings::pci_free_irq_vectors(self.dev.as_raw()) };
| ^^^^^^^^^^^^^^^^^^^^ help: a function with a similar name exists: `pci_irq_vector`
|
::: .../rust/bindings/bindings_helpers_generated.rs:1197:5
|
1197 | pub fn pci_irq_vector(pdev: *mut pci_dev, nvec: ffi::c_uint) -> ffi::c_int;
| --------------------------------------------------------------------------- similarly named function `pci_irq_vector` defined here
error: aborting due to 2 previous errors
Fix this by adding the corresponding helpers.
Fixes: 340ccc9735 ("rust: pci: Allocate and manage PCI interrupt vectors")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512012238.YgVvRRUx-lkp@intel.com/
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Link: https://patch.msgid.link/20251202210501.40998-1-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Building htmldocs generates a warning:
WARNING: include/uapi/linux/media/amlogic/c3-isp-config.h:199
error: Cannot parse struct or union!
Which correctly highlights that the c3_isp_params_block_header symbol
is wrongly documented as a struct while it's a plain #define instead.
Fix this by removing the 'struct' identifier from the documentation of
the c3_isp_params_block_header symbol.
[ribalda: Add Closes:]
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/all/20251127131425.4b5b6644@canb.auug.org.au/
Fixes: 4566208285 ("media: uapi: Convert Amlogic C3 to V4L2 extensible params")
Cc: stable@vger.kernel.org
Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Commit 9b8b84879d ("block: Increase BLK_DEF_MAX_SECTORS_CAP") increased
the default max_sectors_kb from 1280 KiB to 4096 KiB.
DELLBOSS VD with FW rev MV.R00-0 times out when sending I/Os of size
4096 KiB.
Enable ATA_QUIRK_MAX_SEC, with value 8191 (sectors) for this device,
since any I/O with more sectors than that lead to I/O timeouts.
With this, the DELLBOSS VD SATA controller is usable again.
Cc: stable+noautosel@kernel.org # depends on Move quirk flags to their own enum
Fixes: 9b8b84879d ("block: Increase BLK_DEF_MAX_SECTORS_CAP")
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
The anonymous enum in include/linux/libata.h that is used to store
various global constants can currently be backed by type int.
(It contains both negative and positive constants.)
__ATA_QUIRK_MAX is currently 31.
The quirk flags in the various global constants enum are defined as
"1U << quirk_flag_bit".
Thus if we simply add an additional quirk, the quirk flag will be 1 << 31,
which is a value that is too large to be represented by a signed int.
The various global constants enum will thus therefore be backed by type
long.
This will lead to error prints like e.g.:
ata_port_err(ap, "EH pending after %d tries, giving up\n",
ATA_EH_MAX_TRIES);
now failing to build, with build error:
error: format ‘%d’ expects argument of type ‘int’, but argument 4 has type ‘long int’ [-Werror=format=]
This is because all constants in the various global constants enum now
has to be printed as a long, as that is now the backing type of the enum.
Since the compiler will use the smallest possible backing type for an
enum, it is good practice to not mix unrelated things in a single enum.
Move the quirk flags to a separate enum, so that we don't need to change
the printf specifier for all other constants in the "various global
constants" enum when adding an additional quirk.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
debugfs access modes were added in Linux 5.10 (Dec 2020) [1], but the
no-mount mode has behaved effectively the same as the off mode since
Linux 5.12 (Apr 2021) [2]. The only difference is the specific error
code returned by the debugfs_create_* functions, which is -ENOENT in
no-mount mode and -EPERM in off mode.
Given that no-mount hasn't worked for several years with no complaints,
just remove it.
[1] a24c6f7bc9 ("debugfs: Add access restriction option")
[2] bc6de804d3 ("debugfs: be more robust at handling improper input in debugfs_lookup()")
56348560d4 ("debugfs: do not attempt to create a new file before the filesystem is initalized")
Signed-off-by: Aaron Thompson <dev@aaront.org>
Link: https://patch.msgid.link/20251120102222.18371-3-dev@null.aaront.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
debugfs_get_tree() can only be called if debugfs itself calls
simple_pin_fs() or register_filesystem(), and those call paths also
check the access mode.
debugfs_start_creating() checks the access mode so the checks in the
debugfs_create_* functions are unnecessary.
An upcoming change will affect debugfs_allow, so doing this cleanup
first will make that change simpler.
Signed-off-by: Aaron Thompson <dev@aaront.org>
Link: https://patch.msgid.link/20251120102222.18371-2-dev@null.aaront.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a device is hot-plugged, the drivers_autoprobe sysfs attribute is
not checked (at least for PCI devices). This means that
drivers_autoprobe is not working as intended, e.g. hot-plugged PCI
devices will still be autoprobed and bound to drivers even with
drivers_autoprobe disabled.
The problem likely started when device_add() was removed from
pci_bus_add_device() in commit 4f535093cf ("PCI: Put pci_dev in device
tree as early as possible") which means that the check for
drivers_autoprobe which used to happen in bus_probe_device() is no
longer present (previously bus_add_device() calls bus_probe_device()).
Conveniently, in commit 9170304169 ("PCI: Allow built-in drivers to
use async initial probing") device_attach() was replaced with
device_initial_probe() which faciliates this change to push the check
for drivers_autoprobe into device_initial_probe().
Make sure all devices check drivers_autoprobe by pushing the
drivers_autoprobe check into device_initial_probe(). This will only
affect devices on the PCI bus for now as device_initial_probe() is only
called by pci_bus_add_device() and bus_probe_device(), but
bus_probe_device() already checks for autoprobe, so callers of
bus_probe_device() should not observe changes on autoprobing.
Note also that pushing this check into device_initial_probe() rather
than device_attach() makes it only affect automatic probing of
drivers (e.g. when a device is hot-plugged), userspace can still choose
to manually bind a driver by writing to drivers_probe sysfs attribute,
even with autoprobe disabled.
Any future callers of device_initial_probe() will respect the
drivers_autoprobe sysfs attribute, which is the intended purpose of
drivers_autoprobe.
Signed-off-by: Vincent Liu <vincent.liu@nutanix.com>
Link: https://patch.msgid.link/20251022120740.2476482-1-vincent.liu@nutanix.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:
commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251114141618.172154-3-marco.crivellari@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistentcy cannot be addressed without refactoring the API.
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:
commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")
Switch to using system_dfl_wq because system_unbound_wq is going away as part of
a workqueue restructuring.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251114141618.172154-2-marco.crivellari@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Expose the current system-defined list of housekeeping CPUs in a new
sysfs file: /sys/devices/system/cpu/housekeeping.
This provides userspace performance tuning tools and resource managers
with a canonical, reliable method to accurately identify the cores
responsible for essential kernel maintenance workloads (RCU, timer
callbacks, and unbound workqueues). Currently, tooling must manually
calculate the housekeeping set by parsing complex kernel boot parameters
(like isolcpus= and nohz_full=) and system topology, which is prone to
error. This dedicated file simplifies the configuration of low-latency
workloads.
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Link: https://patch.msgid.link/20251011012853.7539-2-atomlin@atomlin.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the context of CONFIG_NO_HZ_FULL=y, tick_nohz_full_mask (of type
cpumask_var_t) is initialised to 0. Memory is only allocated to the cpumask
data structure, in tick_nohz_full_setup(), when Linux kernel boot-time
parameter "nohz_full=" is correctly specified (see housekeeping_setup()).
If "nohz_full=" is not set and an attempt is made to read
/sys/devices/system/cpu/nohz_full, '(null)' can be displayed:
❯ cat /sys/devices/system/cpu/nohz_full
(null)
This patch changes the output to print a newline (or 0x0A) instead of
'(null)', making it consistent with print_cpus_isolated() behaviour.
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Link: https://patch.msgid.link/20251011011830.6670-3-atomlin@atomlin.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The /sys/devices/system/cpu/nohz_full file is a read-only attribute that
reports the CPUs configured for tickless operation (CONFIG_NO_HZ_FULL=y).
The current definition uses the generic DEVICE_ATTR macro, which
unnecessarily requires specifying the file mode (0444) and a NULL
store operation pointer.
This patch converts the definition to use the dedicated DEVICE_ATTR_RO
macro. This correctly expresses the read-only nature of the attribute,
removes the redundant mode field, and simplifies the code. As a related
cleanup, rename the show function from print_cpus_nohz_full() to the
standard nohz_full_show() for consistency with common sysfs attribute
naming conventions.
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Link: https://patch.msgid.link/20251011011830.6670-2-atomlin@atomlin.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There exists a memory leak of kernfs_iattrs contained as an element
of kernfs_node allocated in __kernfs_new_node(). __kernfs_setattr()
allocates kernfs_iattrs as a sub-object, and the LSM security check
incorrectly errors out and does not free the kernfs_iattrs sub-object.
Make an additional error out case that properly frees kernfs_iattrs if
security_kernfs_init_security() fails.
Fixes: e19dfdc83b ("kernfs: initialize security of newly created nodes")
Co-developed-by: Oliver Rosenberg <olrose55@gmail.com>
Signed-off-by: Oliver Rosenberg <olrose55@gmail.com>
Signed-off-by: Will Rosenberg <whrosenb@asu.edu>
Link: https://patch.msgid.link/20251125151332.2010687-1-whrosenb@asu.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On an ARM64 A1 system, it's possible to have physical memory span
up to the 64T boundary, like below
$ lsmem -b -r -n -o range,size
0x0000000080000000-0x00000000bfffffff 1073741824
0x0000080000000000-0x000008007fffffff 2147483648
0x00000800c0000000-0x0000087fffffffff 546534588416
0x0000400000000000-0x00004000bfffffff 3221225472
0x0000400100000000-0x0000407fffffffff 545460846592
So it's time to extend /sys/kernel/mm/page_idle/bitmap to be able
to account for >2G number of pages, by raising the kernfs file size
limit.
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Link: https://patch.msgid.link/20251111202606.1505437-1-jane.chu@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Define the macros in terms of each other.
This makes them easier to understand and also will make it easier to
implement the transition machinery for 'const struct attribute'.
__ATTR_RO_MODE() can't be implemented in terms of __ATTR() as not all
attributes have a .store callback. The same issue theoretically exists
for __ATTR_WO(), but practically that does not occur today.
Reorder __ATTR_RO() below __ATTR_RO_MODE() to keep the order of the
macro definition consistent with respect to each other.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251029-sysfs-const-attr-prep-v5-7-ea7d745acff4@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For the constification phase of 'struct attribute' various callback
struct members will need to exist in both const and non-const variants.
Keeping both members in a union avoids memory and CPU overhead but will
be detected and trapped by Control Flow Integrity (CFI). By deciding
between a struct and a union depending whether CFI is enabled, most
configurations can avoid this overhead. Code using these callbacks will
still need to be updated to handle both members explicitly.
In the union case the compiler will recognize that testing for one union
member is enough and optimize away the code for the other one.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251029-sysfs-const-attr-prep-v5-3-ea7d745acff4@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To be able to constify instances of struct attribute it has to be
possible to add them to struct attribute_group. The current type of the
attrs member however is not compatible with that. Introduce a union that
allows registration of both const and non-const attributes to enable a
piecewise transition. As both union member types are compatible no
logic needs to be adapted.
Technically it is now possible register a const struct attribute and
receive it as mutable pointer in the callbacks. This is a soundness
issue. But this same soundness issue already exists today in
sysfs_create_file(). Also the struct definition and callback
implementation are always closely linked and are meant to be moved to
const in lockstep.
Similar to commit 906c508afd ("sysfs: attribute_group: allow
registration of const bin_attribute")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20251029-sysfs-const-attr-prep-v5-1-ea7d745acff4@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The rust-next tree contains commit:
3b83f5d5e7 ("rust: replace `CStr` with `core::ffi::CStr`")
which, when merged together with commits:
57c5bd9aee ("rust: i2c: add basic I2C device and driver abstractions")
f3cc26a417 ("rust: i2c: add manual I2C device creation abstractions")
from this tree (driver-core), produces errors like the following:
error[E0599]: no method named `len_with_nul` found for reference `&'static ffi::CStr` in the current scope
--> rust/kernel/i2c.rs:48:16
|
48 | id.len_with_nul() <= Self::I2C_NAME_SIZE,
| ^^^^^^^^^^^^ method not found in `&CStr`
error[E0599]: no method named `as_bytes_with_nul` found for reference `&'static ffi::CStr` in the current scope
--> rust/kernel/i2c.rs:51:22
|
51 | let src = id.as_bytes_with_nul();
| ^^^^^^^^^^^^^^^^^
|
help: there is a method `to_bytes_with_nul` with a similar name
|
51 | let src = id.to_bytes_with_nul();
| ~~~~~~~~~~~~~~~~~
which were detected in linux-next by Stephen [1].
The `i2c` code can be independently prepared to be ready for the change,
thus do so.
The change is similar to the one done by Tamir in commit 657403637f
("rust: acpi: use `core::ffi::CStr` method names").
Link: https://lore.kernel.org/all/20251120181111.65ce75a0@canb.auug.org.au/ [1]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://patch.msgid.link/20251123163536.1771801-1-ojeda@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Bean Huo <beanhuo@iokpp.de> says:
This patch series introduces OP-TEE based RPMB (Replay Protected
Memory Block) support for UFS devices, extending the kernel-level
secure storage capabilities that are currently available for eMMC
devices.
Previously, OP-TEE required a userspace supplicant to access RPMB
partitions, which created complex dependencies and reliability issues,
especially during early boot scenarios. Recent work by Linaro has
moved core supplicant functionality directly into the Linux kernel for
eMMC devices, eliminating userspace dependencies and enabling
immediate secure storage access. This series extends the same approach
to UFS devices, which are used in enterprise and mobile applications
that require secure storage capabilities.
Benefits:
- Eliminates dependency on userspace supplicant for UFS RPMB access
- Enables early boot secure storage access (e.g., fTPM, secure UEFI
variables)
- Provides kernel-level RPMB access as soon as UFS driver is
initialized
- Removes complex initramfs dependencies and boot ordering
requirements
- Ensures reliable and deterministic secure storage operations
- Supports both built-in and modular fTPM configurations.
Prerequisites:
--------------
This patch series depends on commit 7e8242405b ("rpmb: move struct
rpmb_frame to common header") which has been merged into mainline
v6.18-rc2.
Link: https://patch.msgid.link/20251107230518.4060231-1-beanhuo@iokpp.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In qla2xxx_process_purls_iocb(), an item is allocated via
qla27xx_copy_multiple_pkt(), which internally calls
qla24xx_alloc_purex_item().
The qla24xx_alloc_purex_item() function may return a pre-allocated item
from a per-adapter pool for small allocations, instead of dynamically
allocating memory with kzalloc().
An error handling path in qla2xxx_process_purls_iocb() incorrectly uses
kfree() to release the item. If the item was from the pre-allocated
pool, calling kfree() on it is a bug that can lead to memory corruption.
Fix this by using the correct deallocation function,
qla24xx_free_purex_item(), which properly handles both dynamically
allocated and pre-allocated items.
Fixes: 875386b988 ("scsi: qla2xxx: Add Unsolicited LS Request and Response Support for NVMe")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com>
Link: https://patch.msgid.link/20251113151246.762510-1-zilin@seu.edu.cn
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This fixes the dme-reset failed when doing recovery. Because device
reset is not enough, we could occasionally see the error below:
ufshcd-rockchip 2a2d0000.ufs: uic cmd 0x14 with arg3 0x0 completion timeout
ufshcd-rockchip 2a2d0000.ufs: dme-reset: error code -110
ufshcd-rockchip 2a2d0000.ufs: DME_RESET failed
ufshcd-rockchip 2a2d0000.ufs: ufshcd_host_reset_and_restore: Host init failed -110
Fix this by resetting the controller on PRE_CHANGE stage of hce enable
notify.
Fixes: d3cbe455d6 ("scsi: ufs: rockchip: Initial support for UFS")
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Link: https://patch.msgid.link/1763009575-237552-1-git-send-email-shawn.lin@rock-chips.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Implement the `AsBusDevice` trait for converting a `Device` reference to a
bus device reference for all bus devices.
The `AsBusDevice` trait allows abstractions to provide the bus device in
class device callbacks. It must not be used by drivers and is intended for
bus and class device abstractions only.
Signed-off-by: Markus Probst <markus.probst@posteo.de>
Link: https://patch.msgid.link/20251027200547.1038967-2-markus.probst@posteo.de
[ * Remove unused import.
* Change visibility of AsBusDevice to public.
* Fix build for USB.
* Add impl for I2cClient.
- Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
In addition to the basic I2C device support, add rust abstractions
upon `i2c_new_client_device`/`i2c_unregister_device` C functions.
Implement the core abstractions needed for manual creation/deletion
of I2C devices, including:
* `i2c::Registration` — a NonNull pointer created by the function
`i2c_new_client_device`
* `i2c::I2cAdapter` — a ref counted wrapper around `struct i2c_adapter`
* `i2c::I2cBoardInfo` — a safe wrapper around `struct i2c_board_info`
Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Igor Korotin <igor.korotin.linux@gmail.com>
Link: https://patch.msgid.link/20251116162154.171493-1-igor.korotin.linux@gmail.com
[ Remove unnecessary safety comment. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Resource sizes are a general concept for dealing with physical
addresses, and not specific to the Resource type, which is just one way
to access physical addresses. Thus, move the typedef to the io module.
Still keep a re-export under resource. This avoids this commit from
being a flag-day, but I also think it's a useful re-export in general so
that you can import
use kernel::io::resource::{Resource, ResourceSize};
instead of having to write
use kernel::io::{
resource::Resource,
ResourceSize,
};
in the specific cases where you need ResourceSize because you are using
the Resource type. Therefore I think it makes sense to keep this
re-export indefinitely and it is *not* intended as a temporary re-export
for migration purposes.
Cc: stable@vger.kernel.org # for v6.18 [1]
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20251112-resource-phys-typedefs-v2-2-538307384f82@google.com
Link: https://lore.kernel.org/all/20251112-resource-phys-typedefs-v2-0-538307384f82@google.com/ [1]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Marco Crivellari <marco.crivellari@suse.com> says:
Hi,
=== Current situation: problems ===
Let's consider a nohz_full system with isolated CPUs: wq_unbound_cpumask is
set to the housekeeping CPUs, for !WQ_UNBOUND the local CPU is selected.
This leads to different scenarios if a work item is scheduled on an
isolated CPU where "delay" value is 0 or greater then 0:
schedule_delayed_work(, 0);
This will be handled by __queue_work() that will queue the work item on the
current local (isolated) CPU, while:
schedule_delayed_work(, 1);
Will move the timer on an housekeeping CPU, and schedule the work there.
Currently if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
=== Recent changes to the WQ API ===
The following, address the recent changes in the Workqueue API:
- commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
- commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")
The old workqueues will be removed in a future release cycle.
=== Introduced Changes by this series ===
1) [P 1] Replace uses of system_wq and system_unbound_wq
system_unbound_wq is to be used when locality is not required.
Because of that, system_unbound_wq has been replaced with
system_dfl_wq, to make clear it should be used when locality
is not required.
2) [P 2-3-4] WQ_PERCPU added to alloc_workqueue()
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
Thanks!
Link: https://patch.msgid.link/20251031095643.74246-1-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently if a user enqueues a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed
without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:
commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")
This adds a new WQ_PERCPU flag to explicitly request to alloc_workqueue()
to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251107155257.316728-1-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently if a user enqueues a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed
without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:
commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251107154008.304127-1-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently if a user enqueues a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed
without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:
commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251107151618.281250-1-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently if a user enqueues a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed
without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:
commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251107150542.271229-1-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently if a user enqueues a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed
without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:
commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251107150155.267651-3-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently if a user enqueues a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed
without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:
commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251107150155.267651-2-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently if a user enqueues a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed
without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:
commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251107144949.256894-1-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently if a user enqueues a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed
without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:
commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251107141458.225119-1-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently if a user enqueue a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be
addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
This patch continues the effort to refactor worqueue APIs, which has
begun with the change introducing new workqueues and a new
alloc_workqueue flag:
commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251104110808.123424-1-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistentcy cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251031095643.74246-5-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistentcy cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251031095643.74246-4-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistentcy cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251031095643.74246-3-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently if a user enqueue a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistentcy cannot be addressed without refactoring the
API.
This patch continues the effort to refactor worqueue APIs, which has
begun with the change introducing new workqueues and a new
alloc_workqueue flag:
commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")
system_dfl_wq should be the default workqueue so as not to enforce
locality constraints for random work whenever it's not required.
The old system_unbound_wq will be kept for a few release cycles.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251104104518.102130-1-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently if a user enqueue a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the
API.
system_unbound_wq should be the default workqueue so as not to enforce
locality constraints for random work whenever it's not required.
Adding system_dfl_wq to encourage its use when unbound work should be
used.
The old system_unbound_wq will be kept for a few release cycles.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251031095643.74246-2-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently if a user enqueue a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the
API.
system_unbound_wq should be the default workqueue so as not to enforce
locality constraints for random work whenever it's not required.
Adding system_dfl_wq to encourage its use when unbound work should be
used.
The old system_unbound_wq will be kept for a few release cycles.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251031095643.74246-2-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The !payload check tests the address of a member of a data structure. We
know that the start address of this data structure (job) is not NULL
since the 'job' pointer has already been dereferenced. Hence, the
!payload check is superfluous. Remove this test. This was reported by
the CodeSonar static analyzer.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251111184802.125111-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver calls ioport_map() to map I/O ports in sim710_probe_common()
but never calls ioport_unmap() to release the mapping. This causes
resource leaks in both the error path when request_irq() fails and in
the normal device removal path via sim710_device_remove().
Add ioport_unmap() calls in the out_release error path and in
sim710_device_remove().
Fixes: 56fece2008 ("[PATCH] finally fix 53c700 to use the generic iomem infrastructure")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Link: https://patch.msgid.link/20251029032555.1476-1-vulab@iscas.ac.cn
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Tony Battersby <tonyb@cybernetics.com> says:
This patch series improves the qla2xxx FC driver in target mode. I
developed these patches using the out-of-tree SCST target-mode
subsystem (https://scst.sourceforge.net/), although most of the
improvements will also apply to the other target-mode subsystems such
as the in-tree LIO. Unfortunately qla2xxx+LIO does not pass all of my
tests, but my patches do not make it any worse (results below). These
patches have been well-tested at my employer with qla2xxx+SCST in both
initiator mode and target mode and with a variety of FC HBAs and
initiators. Since SCST is out-of-tree, some of the patches have parts
that apply in-tree and other parts that apply out-of-tree to SCST. The
SCST patches can be found in the v2 posting linked above.
All patches apply to linux 6.17 and SCST 3.10 master branch.
Summary of patches:
- bugfixes
- cleanups
- improve handling of aborts and task management requests
- improve log message
- add back SLER / SRR support (removed in 2017)
Some of these patches improve handling of aborts and task management
requests. This is some of the testing that I did:
Test 1: Use /dev/sg to queue random disk I/O with short timeouts; make
sure cmds are aborted successfully.
Test 2: Queue lots of disk I/O, then use "sg_reset -N -d /dev/sg" on
initiator to reset logical unit.
Test 3: Queue lots of disk I/O, then use "sg_reset -N -t /dev/sg" on
initiator to reset target.
Test 4: Queue lots of disk I/O, then use "sg_reset -N -b /dev/sg" on
initiator to reset bus.
Test 5: Queue lots of disk I/O, then use "sg_reset -N -H /dev/sg" on
initiator to reset host.
Test 6: Use fiber channel attenuator to trigger SRR during
write/read/compare test; check data integrity.
With my patches, SCST passes all of these tests.
Results with in-tree LIO target-mode subsystem:
Test 1: Seems to abort the same cmd multiple times (both
qlt_24xx_retry_term_exchange() and __qlt_send_term_exchange()). But
cmds get aborted, so give it a pass?
Test 2: Seems to work; cmds are aborted.
Test 3: Target reset doesn't seem to abort cmds, instead, a few seconds
later:
qla2xxx [0000:04:00.0]-f058:9: qla_target(0): tag 1314312, op 2a: CTIO
with TIMEOUT status 0xb received (state 1, port 51:40:2e:c0:18:1d:9f:cc,
LUN 0)
Tests 4 and 5: The initiator is unable to log back in to the target; the
following messages are repeated over and over on the target:
qla2xxx [0000:04:00.0]-e01c:9: Sending TERM ELS CTIO (ha=00000000f8811390)
qla2xxx [0000:04:00.0]-f097:9: Linking sess 000000008df5aba8 [0] wwn
51:40:2e:c0:18:1d:9f:cc with PLOGI ACK to wwn 51:40:2e:c0:18:1d:9f:cc
s_id 00:00:01, ref=2 pla 00000000835a9271 link 0
Test 6: passes with my patches; SRR not supported previously.
So qla2xxx+LIO seems a bit flaky when handling exceptions, but my
patches do not make it any worse. Perhaps someone who is more familiar
with LIO can look at the difference between LIO and SCST and figure out
how to improve it.
Tony Battersby
https://www.cybernetics.com/
v1: https://lore.kernel.org/r/f8977250-638c-4d7d-ac0c-65f742b8d535@cybernetics.com/
v2: https://lore.kernel.org/linux-scsi/e95ee7d0-3580-4124-b854-7f73ca3a3a84@cybernetics.com/
Link: https://patch.msgid.link/aaea0ab0-da8b-4153-9369-60db7507ff7a@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver associates two different structs with numeric handles and
passes the handles to the hardware. When the hardware passes the handle
back to the driver, the driver consults a table of void * to convert the
handle back to the struct without checking the type of struct. This can
lead to type confusion if the HBA firmware misbehaves (and some firmware
versions do). So verify the type of struct is what is expected before
using it.
But we can also do better than that. Also verify that the exchange
address of the message sent from the hardware matches the exchange
address of the command being returned. This adds an extra guard against
buggy HBA firmware that returns duplicate messages multiple times (which
has also been seen) in case the driver has reused the handle for a
different command of the same type.
These problems were seen on a QLE2694L with firmware 9.08.02 when
testing SLER / SRR support. The SRR caused the HBA to flood the
response queue with hundreds of bogus entries.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/7c7cb574-fe62-42ae-b800-d136d8dd89ca@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Background: loading qla2xxx with "ql2xtgt_tape_enable=1" enables
Sequence Level Error Recovery (SLER), which is most commonly used for
tape drives. With SLER enabled, if there is a recoverable I/O error
during a SCSI command, a Sequence Retransmission Request (SRR) will be
used to retry the I/O at a low-level completely within the driver
without propagating the error to the upper levels of the SCSI stack.
SRR support was removed in 2017 by commit 2c39b5ca2a ("qla2xxx: Remove
SRR code"). Add it back, new and improved.
The old removed SRR code used sequence numbers to correlate the SRR
CTIOs with SRR immediate notify messages. I don't see how that would
work reliably with MSI-X interrupts and multiple queues. So instead use
the exchange address to find the command associated with the immediate
notify (qlt_srr_to_cmd).
The old removed SRR code had a function qlt_check_srr_debug() to
simulate a SRR, but it didn't work for me. Instead I just used fiber
optic attenuators attached to the FC cable to reduce the strength of the
signal and induce errors. Unfortunately this only worked for inducing
SRRs on Data-Out (write) commands, so that is all I was able to test.
The code to build a new scatterlist for a SRR with nonzero offset has
been improved to reduce memory requirements and has been well-tested.
However it does not support protection information.
When a single cmd gets multiple SRRs, the old removed SRR code would
restore the data buffer from the values in cmd->se_cmd before processing
the new SRR. That might be needed if the offset for the new SRR was
lower than the offset for the previous SRR, but I am not sure if that
can happen. In my testing, when a single cmd gets multiple SRRs, the
SRR offset always increases or stays the same. But in case it can
decrease, I added the function qlt_restore_orig_sg(). If this is not
supposed to happen then qlt_restore_orig_sg() can be removed to simplify
the code.
I ran into some HBA firmware bugs with QLE269x, QLE27xx, and QLE28xx
firmware 9.05.xx - 9.08.xx where a SRR would cause the HBA to misbehave
badly. Since SRRs are rare and therefore difficult to test, I figured
it would be worth checking for the buggy firmware and disabling SLER
with a warning instead of letting others run into the same problem on
the rare occasion that they get a SRR. This turned out to be difficult
because the firmware version isn't known in the normal NVRAM config
routine, so I added a second NVRAM config routine that is called after
the firmware version is known.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/654b7181-b79e-40ed-a15b-6d6e441a5d5f@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
- Add the command tag to various messages so that different messages
about the same command can be correlated.
- For CTIO errors (i.e. when the HW reports an error about a cmd),
print the cmd tag, opcode, state, initiator WWPN, and LUN. This info
helps an administrator determine what is going wrong.
- When a command experiences a transport error, log a message when it
is freed. This makes debugging exceptions easier.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/c579987d-5658-41ae-9653-f0e58c9d1880@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
struct atio7_fcp_cmnd is a variable-length data structure because of
add_cdb_len, but it is embedded in struct atio_from_isp and copied
around like a fixed-length data structure. For big CDBs > 16 bytes,
get_datalen_for_atio() called on a fixed-length copy of the atio will
access invalid memory.
In some cases this can be fixed by moving the atio to the end of the
data structure and using a variable-length allocation. In other cases
such as allocating struct qla_tgt_cmd, the fixed-length data structures
are preallocated for speed, so in the case that add_cdb_len != 0,
allocate a separate buffer for the CDB. Also add memcpy_atio() as a
safeguard against invalid memory accesses.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/306a9d0b-3c89-42fc-a69c-eebca8171347@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(target mode)
If handle_tmr() fails:
- The code for QLA_TGT_ABTS results in memory-use-after-free and
double-free:
qlt_do_tmr_work()
qlt_build_abts_resp_iocb()
qpair->req->outstanding_cmds[h] = (srb_t *)mcmd;
mempool_free(mcmd, qla_tgt_mgmt_cmd_mempool); FIRST FREE
qlt_handle_abts_completion()
mcmd = qlt_ctio_to_cmd()
cmd = req->outstanding_cmds[h];
return cmd;
vha = mcmd->vha; USE-AFTER-FREE
ha->tgt.tgt_ops->free_mcmd(mcmd); SECOND FREE
- qlt_send_busy() makes no sense because it sends a SCSI command
response instead of a TMR response.
Instead just call qlt_xmit_tm_rsp() to send a TMR failed response, since
that code is well-tested and handles a number of corner cases. But it
would be incorrect to call ha->tgt.tgt_ops->free_mcmd() after
handle_tmr() failed, so add a flag to mcmd indicating the proper way to
free the mcmd so that qlt_xmit_tm_rsp() can be used for both cases.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/09a1ff3d-6738-4953-a31b-10e89c540462@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Similar fixes to both functions:
qlt_xmit_response:
- If the cmd cannot be processed, remember to call ->free_cmd() to
prevent the target-mode midlevel from seeing a cmd lockup.
- Do not try to send the response if the exchange has been terminated.
- Check for chip reset once after lock instead of both before and after
lock.
- Give errors from qlt_pre_xmit_response() a lower priority to
compensate for removing the first check for chip reset.
qlt_rdy_to_xfer:
- Check for chip reset after lock instead of before lock to avoid
races.
- Do not try to receive data if the exchange has been terminated.
- Give errors from qlt_pci_map_calc_cnt() a lower priority to
compensate for moving the check for chip reset.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/cd6ccd31-33fa-4454-be36-507bf578a546@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
cmd->cmd_lock only protects cmd->aborted, but when deciding how to
process a cmd, it is necessary to consider other factors such as
cmd->state and if the chip has been reset, which are protected by
qpair->qp_lock_ptr. So replace cmd_lock with qp_lock_ptr, whick makes
it possible to check additional values and make decisions about what to
do without racing with the CTIO handler and other code.
- Lock cmd->qpair->qp_lock_ptr when aborting a cmd.
- Eliminate cmd->cmd_lock and change cmd->aborted to a bitfield since
it is now protected by qp_lock_ptr just like all the other flags.
- Add another command state QLA_TGT_STATE_DONE to avoid any possible
races between qlt_abort_cmd() and tgt_ops->free_cmd().
- Add the cmd->sent_term_exchg flag to indicate if
qlt_send_term_exchange() has already been called.
- Export qlt_send_term_exchange() for SCST so that it can be called
directly instead of trying to make qlt_abort_cmd() work for both TMR
abort and HW timeout.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/2c8d03e4-308b-4d5a-a418-a334be23f815@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit aefed3e554 ("scsi: qla2xxx: target: Fix offline port handling
and host reset handling") caused two problems:
1. Commands sent to FW, after chip reset got stuck and never freed as FW
is not going to respond to them anymore.
2. BUG_ON(cmd->sg_mapped) in qlt_free_cmd(). Commit 26f9ce5381
("scsi: qla2xxx: Fix missed DMA unmap for aborted commands")
attempted to fix this, but introduced another bug under different
circumstances when two different CPUs were racing to call
qlt_unmap_sg() at the same time: BUG_ON(!valid_dma_direction(dir)) in
dma_unmap_sg_attrs().
So revert "scsi: qla2xxx: Fix missed DMA unmap for aborted commands" and
partially revert "scsi: qla2xxx: target: Fix offline port handling and
host reset handling" at __qla2x00_abort_all_cmds.
Fixes: aefed3e554 ("scsi: qla2xxx: target: Fix offline port handling and host reset handling")
Fixes: 26f9ce5381 ("scsi: qla2xxx: Fix missed DMA unmap for aborted commands")
Co-developed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/0e7e5d26-e7a0-42d1-8235-40eeb27f3e98@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche <bvanassche@acm.org> says:
Hi Martin,
This patch series optimizes the hot path of the UFS driver by making
struct scsi_cmnd and struct ufshcd_lrb adjacent. Making these two data
structures adjacent is realized as follows:
@@ -9040,6 +9046,7 @@ static const struct scsi_host_template ufshcd_driver_template = {
.name = UFSHCD,
.proc_name = UFSHCD,
.map_queues = ufshcd_map_queues,
+ .cmd_size = sizeof(struct ufshcd_lrb),
.init_cmd_priv = ufshcd_init_cmd_priv,
.queuecommand = ufshcd_queuecommand,
.mq_poll = ufshcd_poll,
The following changes had to be made prior to making these two data
structures adjacent:
* Add support for driver-internal and reserved commands in the SCSI core.
* Instead of making the reserved command slot (hba->reserved_slot)
invisible to the SCSI core, let the SCSI core allocate a reserved command.
* Remove all UFS data structure members that are no longer needed
because struct scsi_cmnd and struct ufshcd_lrb are now adjacent
* Call ufshcd_init_lrb() from inside the code for queueing a command instead of
calling this function before I/O starts. This is necessary because
ufshcd_memory_alloc() allocates fewer instances than the block layer
allocates requests. See also the following code in the block layer
core:
if (blk_mq_init_request(set, hctx->fq->flush_rq, hctx_idx,
hctx->numa_node))
Although the UFS driver could be modified such that ufshcd_init_lrb()
is called from ufshcd_init_cmd_priv(), realizing this would require
moving the memory allocations that happen from inside
ufshcd_memory_alloc() into ufshcd_init_cmd_priv(). That would make
this patch series even larger. Although ufshcd_init_lrb() is called for each
command, the benefits of reduced indirection and better cache efficiency
outweigh the small overhead of per-command lrb initialization.
* ufshcd_add_scsi_host() happens now before any device management
commands are submitted. This change is necessary because this patch
makes device management command allocation happen when the SCSI host
is allocated.
* Allocate as many command slots as the host controller supports. Decrease
host->cmds_per_lun if necessary once it is clear whether or not the UFS
device supports less command slots than the host controller.
Please consider this patch series for the next merge window.
Thanks,
Bart.
Link: https://patch.msgid.link/20251031204029.2883185-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Properly set the nport_handle field of the terminate exchange message.
Previously when this field was not set properly, the term exchange would
fail when cmd_sent_to_fw == 1 but work when cmd_sent_to_fw == 0 (i.e. it
would fail when the HW was actively transferring data or status for the
cmd but work when the HW was idle). With this change, term exchange
works in any cmd state, which now makes it possible to abort a command
that is locked up in the HW.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/1a221699-969b-4f28-8ea4-395d2f7a7c0a@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When qla2xxx is loaded with qlini_mode=disabled,
ha->flags.disable_msix_handshake is used before it is set, resulting in
the wrong interrupt handler being used on certain HBAs
(qla2xxx_msix_rsp_q_hs() is used when qla2xxx_msix_rsp_q() should be
used). The only difference between these two interrupt handlers is that
the _hs() version writes to a register to clear the "RISC" interrupt,
whereas the other version does not. So this bug results in the RISC
interrupt being cleared when it should not be. This occasionally causes
a different interrupt handler qla24xx_msix_default() for a different
vector to see ((stat & HSRX_RISC_INT) == 0) and ignore its interrupt,
which then causes problems like:
qla2xxx [0000:02:00.0]-d04c:6: MBX Command timeout for cmd 20,
iocontrol=8 jiffies=1090c0300 mb[0-3]=[0x4000 0x0 0x40 0xda] mb7 0x500
host_status 0x40000010 hccr 0x3f00
qla2xxx [0000:02:00.0]-101e:6: Mailbox cmd timeout occurred, cmd=0x20,
mb[0]=0x20. Scheduling ISP abort
(the cmd varies; sometimes it is 0x20, 0x22, 0x54, 0x5a, 0x5d, or 0x6a)
This problem can be reproduced with a 16 or 32 Gbps HBA by loading
qla2xxx with qlini_mode=disabled and running a high IOPS test while
triggering frequent RSCN database change events.
While analyzing the problem I discovered that even with
disable_msix_handshake forced to 0, it is not necessary to clear the
RISC interrupt from qla2xxx_msix_rsp_q_hs() (more below). So just
completely remove qla2xxx_msix_rsp_q_hs() and the logic for selecting
it, which also fixes the bug with qlini_mode=disabled.
The test below describes the justification for not needing
qla2xxx_msix_rsp_q_hs():
Force disable_msix_handshake to 0:
qla24xx_config_rings():
if (0 && (ha->fw_attributes & BIT_6) && (IS_MSIX_NACK_CAPABLE(ha)) &&
(ha->flags.msix_enabled)) {
In qla24xx_msix_rsp_q() and qla2xxx_msix_rsp_q_hs(), check:
(rd_reg_dword(®->host_status) & HSRX_RISC_INT)
Count the number of calls to each function with HSRX_RISC_INT set and
the number with HSRX_RISC_INT not set while performing some I/O.
If qla2xxx_msix_rsp_q_hs() clears the RISC interrupt (original code):
qla24xx_msix_rsp_q: 50% of calls have HSRX_RISC_INT set
qla2xxx_msix_rsp_q_hs: 5% of calls have HSRX_RISC_INT set
(# of qla2xxx_msix_rsp_q_hs interrupts) =
(# of qla24xx_msix_rsp_q interrupts) * 3
If qla2xxx_msix_rsp_q_hs() does not clear the RISC interrupt (patched
code):
qla24xx_msix_rsp_q: 100% of calls have HSRX_RISC_INT set
qla2xxx_msix_rsp_q_hs: 9% of calls have HSRX_RISC_INT set
(# of qla2xxx_msix_rsp_q_hs interrupts) =
(# of qla24xx_msix_rsp_q interrupts) * 3
In the case of the original code, qla24xx_msix_rsp_q() was seeing
HSRX_RISC_INT set only 50% of the time because qla2xxx_msix_rsp_q_hs()
was clearing it when it shouldn't have been. In the patched code,
qla24xx_msix_rsp_q() sees HSRX_RISC_INT set 100% of the time, which
makes sense if that interrupt handler needs to clear the RISC interrupt
(which it does). qla2xxx_msix_rsp_q_hs() sees HSRX_RISC_INT only 9% of
the time, which is just overlap from the other interrupt during the
high IOPS test.
Tested with SCST on:
QLE2742 FW:v9.08.02 (32 Gbps 2-port)
QLE2694L FW:v9.10.11 (16 Gbps 4-port)
QLE2694L FW:v9.08.02 (16 Gbps 4-port)
QLE2672 FW:v8.07.12 (16 Gbps 2-port)
both initiator and target mode
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/56d378eb-14ad-49c7-bae9-c649b6c7691e@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When given the module parameter qlini_mode=exclusive, qla2xxx in
initiator mode is initially unable to successfully send SCSI commands to
devices it finds while scanning, resulting in an escalating series of
resets until an adapter reset clears the issue. Fix by checking the
active mode instead of the module parameter.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/1715ec14-ba9a-45dc-9cf2-d41aa6b81b5e@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Instead of storing the tag of the reserved command in hba->reserved_slot,
use scsi_get_internal_cmd() and scsi_put_internal_cmd() to allocate the
tag for the reserved command dynamically. Add
ufshcd_queue_reserved_command() for submitting reserved commands. Add
support in ufshcd_abort() for device management commands. Use
blk_execute_rq() for submitting reserved commands. Remove the code and
data structures that became superfluous. This includes
ufshcd_wait_for_dev_cmd(), hba->reserved_slot and ufs_dev_cmd.complete.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251031204029.2883185-29-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The ufshcd_dev_cmd_completion() call is useful for some but not for all
ufshcd_wait_for_dev_cmd() callers. Hence, remove the
ufshcd_dev_cmd_completion() call from ufshcd_wait_for_dev_cmd() and move
it past the ufshcd_issue_dev_cmd() calls where appropriate. This makes
it easier to detect timeout errors for UPIU frames submitted through the
BSG interface.
Reviewed-by: Avri Altman <avri.altman@sandisk.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251031204029.2883185-28-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Set .cmd_size in the SCSI host template such that the SCSI core makes
struct scsi_cmnd and struct ufshcd_lrb adjacent. Convert the cmd->lrbp
and lrbp->cmd memory loads into pointer offset calculations. Remove the
data structure members that became superfluous, namely ufshcd_lrb.cmd
and ufs_hba.lrb. Since ufshcd_lrb.cmd is removed, this pointer cannot be
used anymore to test whether or not a command is a SCSI command.
Introduce a new function for this purpose, namely ufshcd_is_scsi_cmd().
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251031204029.2883185-24-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Instead of letting the SCSI core allocate hba->nutrs - 1 commands, let
the SCSI core allocate hba->nutrs commands, set the number of reserved
tags to 1 and use the reserved tag for device management commands. This
patch changes the 'reserved slot' from hba->nutrs - 1 into 0 because the
block layer reserves the smallest tags for reserved commands.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251031204029.2883185-22-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reserved commands will be used by SCSI LLDs for submitting internal
commands. Since the SCSI host, target and device limits do not apply to
the reserved command use cases, bypass the SCSI host limit checks for
reserved commands. Introduce the .queue_reserved_command() callback for
reserved commands. Additionally, do not activate the SCSI error handler
if a reserved command fails such that reserved commands can be submitted
from inside the SCSI error handler.
[ bvanassche: modified patch title and patch description. Renamed
.reserved_queuecommand() into .queue_reserved_command(). Changed
the second argument of __blk_mq_end_request() from 0 into error
code in the completion path if cmd->result != 0. Rewrote the
scsi_queue_rq() changes. See also
https://lore.kernel.org/linux-scsi/1666693096-180008-5-git-send-email-john.garry@huawei.com/ ]
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://patch.msgid.link/20251031204029.2883185-6-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Allocate a pseudo SCSI device if 'nr_reserved_cmds' has been set. Pseudo
SCSI devices have the SCSI ID <max_id>:U64_MAX so they won't clash with
any devices the LLD might create. Pseudo SCSI devices are excluded from
scanning and will not show up in sysfs. Additionally, pseudo SCSI
devices are skipped by shost_for_each_device(). This prevents that the
SCSI error handler tries to submit a reset to a non-existent logical
unit.
Do not allocate a budget map for pseudo SCSI devices since the
cmd_per_lun limit does not apply to pseudo SCSI devices.
Do not perform queue depth ramp up / ramp down for pseudo SCSI devices.
Pseudo SCSI devices will be used to send internal commands to a storage
device.
[ bvanassche: edited patch description / renamed host_sdev into
pseudo_sdev / unexported scsi_get_host_dev() / modified error path in
scsi_get_pseudo_dev() / skip pseudo devices in __scsi_iterate_devices()
and also when calling sdev_init(), sdev_configure() and sdev_destroy().
See also
https://lore.kernel.org/linux-scsi/20211125151048.103910-2-hare@suse.de/ ]
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251031204029.2883185-5-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quite some drivers are using management commands internally. These
commands typically use the same tag pool as regular SCSI commands. Tags
for these management commands are set aside before allocating the
block-mq tag bitmap for regular SCSI commands. The block layer already
supports this via the reserved tag mechanism. Add a new field
'nr_reserved_cmds' to the SCSI host template to instruct the block layer
to set aside a tag space for these management commands by using reserved
tags. Exclude reserved commands from .can_queue because .can_queue is
visible in sysfs.
[ bvanassche: modified patch title and patch description. Left out the
following statements: "if (sht->nr_reserved_cmds)" and also
"if (sdev->host->nr_reserved_cmds) flags |= BLK_MQ_REQ_RESERVED;". Moved
nr_reserved_cmds declarations and statements close to the
corresponding can_queue declarations and statements. See also
https://lore.kernel.org/linux-scsi/20210503150333.130310-11-hare@suse.de/ ]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251031204029.2883185-2-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Using start_ptr() and start_ptr_mut() has the advantage that we inherit
the requirements the a mutable or immutable reference from those
methods.
Hence, use them instead of self.cpu_addr.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20251103190655.2326191-1-dakr@kernel.org
[ Keep using self.cpu_addr in item_from_index(). - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Add OP-TEE based RPMB support for UFS devices. This enables secure RPMB
operations on UFS devices through OP-TEE, providing the same
functionality available for eMMC devices and extending kernel-based
secure storage support to UFS-based systems.
Benefits of OP-TEE based RPMB implementation:
- Eliminates dependency on userspace supplicant for RPMB access
- Enables early boot secure storage access (e.g., fTPM, secure UEFI
variables)
- Provides kernel-level RPMB access as soon as UFS driver is
initialized
- Removes complex initramfs dependencies and boot ordering requirements
- Ensures reliable and deterministic secure storage operations
- Supports both built-in and modular fTPM configurations
[mkp: make this build as a module]
Co-developed-by: Can Guo <can.guo@oss.qualcomm.com>
Signed-off-by: Can Guo <can.guo@oss.qualcomm.com>
Reviewed-by: Avri Altman <avri.altman@sandisk.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Link: https://patch.msgid.link/20251107230518.4060231-4-beanhuo@iokpp.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The function ufshcd_read_string_desc() was duplicating memory starting
from the beginning of struct uc_string_id, which included the length and
type fields. As a result, the allocated buffer contained unwanted
metadata in addition to the string itself.
The correct behavior is to duplicate only the Unicode character array in
the structure. Update the code so that only the actual string content is
copied into the new buffer.
Fixes: 5f57704dbc ("scsi: ufs: Use kmemdup in ufshcd_read_string_desc()")
Reviewed-by: Avri Altman <avri.altman@sandisk.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Link: https://patch.msgid.link/20251107230518.4060231-3-beanhuo@iokpp.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Convert SD_ASCII_STD and SD_RAW from boolean macros to enum values for
improved code readability. This makes ufshcd_read_string_desc() calls
self-documenting by using explicit enum values instead of true/false.
Move the ufshcd_read_string_desc() declaration from include/ufs/ufshcd.h
to drivers/ufs/core/ufshcd-priv.h since this function is not exported.
Co-developed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Suggested-by: Avri Altman <Avri.Altman@sandisk.com>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Link: https://patch.msgid.link/20251107230518.4060231-2-beanhuo@iokpp.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Markus Probst <markus.probst@posteo.de> says:
This series adds support for power resources defined in acpi on ata
ports/devices. A device can define a power resource in an ata port/device,
which then gets powered on right before the port is probed. This can be
useful for devices, which have sata power connectors that are:
a: powered down by default
b: can be individually powered on
like in some synology nas devices. If thats the case it will be assumed,
that the power resource won't survive reboots and therefore the disk will
be stopped.
Link: https://patch.msgid.link/20251104142413.322347-1-markus.probst@posteo.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some embedded devices have the ability to control whether power is
provided to the disks via the SATA power connector or not. ACPI power
resources are usually off by default, thus making it unclear if the
specific power resource will retain its state after a restart. If power
resources are defined on ATA ports / devices in ACPI, we should stop the
disk on SYSTEM_RESTART, to ensure the disk will not lose power while
active.
Add a new function, ata_acpi_dev_manage_restart(), that will be used to
determine if a disk should be stopped before restarting the system. If a
usable ACPI power resource has been found, it is assumed that the disk
will lose power after a restart and should be stopped to avoid unclean
shutdown due to power loss.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Markus Probst <markus.probst@posteo.de>
Link: https://patch.msgid.link/20251104142413.322347-4-markus.probst@posteo.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some embedded devices have the ability to control whether power is
provided to the disks via the SATA power connector or not. If power
resources are defined on ATA ports / devices in ACPI, we should try to
set the power state to D0 before probing the disk to ensure that any
power supply or power gate that may exist is providing power to the
disk.
An example for such devices would be newer synology NAS devices. Every
disk slot has its own SATA power connector. Whether the connector is
providing power is controlled via an gpio, which is *off by default*.
Also the disk loses power on reboots.
Add a new function, ata_acpi_port_power_on(), that will be used to power
on the SATA power connector if usable ACPI power resources on the
associated ATA port / device are found. It will be called right before
probing the port, therefore the disk will be powered on just in time.
Signed-off-by: Markus Probst <markus.probst@posteo.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://patch.msgid.link/20251104142413.322347-3-markus.probst@posteo.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In addition to the already existing manage_shutdown,
manage_system_start_stop and manage_runtime_start_stop device scsi_disk
attributes, add manage_restart, which allows the high-level device
driver (sd) to manage the device power state for SYSTEM_RESTART if set
to 1.
This attribute is necessary for the following commit "ata: stop disk on
restart if ACPI power resources are found" to avoid a potential disk
power failure in the case the SATA power connector does not retain the
power state after a restart.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Markus Probst <markus.probst@posteo.de>
Link: https://patch.msgid.link/20251104142413.322347-2-markus.probst@posteo.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Justin Tee <justin.tee@broadcom.com> says:
Update lpfc to revision 14.4.0.12
This patch set contains updates to log messaging, revision of outdated
comment descriptions, fixes to kref accounting, support for BB credit
recovery in point-to-point mode, and introduction of registering unique
platform name identifiers with fabrics.
The patches were cut against Martin's 6.19/scsi-queue tree.
Link: https://patch.msgid.link/20251106224639.139176-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
FC-LS and FC-GS specifications outline fields for registering a platform
name identifier (PNI) to the fabric. The PNI assists fabrics with
identifying the physical server source of frames in the fabric.
lpfc generates a PNI based partially on the uuid specific for the
system. Initial attempts to extract a uuid are made from SMBIOS's
System Information 08h uuid entry. If SMBIOS DMI does not exist, a PNI
is not generated and PNI registration with the fabric is skipped.
The PNI is submitted in FLOGI and FDISC frames. After successful fabric
login, the RSPNI_PNI CT frame is submitted to the fabric to register the
OS host name tying it to the PNI.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251106224639.139176-10-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, there is a kref put in the lpfc_cleanup() routine that takes
care of outstanding references on fabric controller ndlps in UNUSED
state. While typically there is a state change from UNUSED -> REGLOGIN
when the ndlp successfully logs into the fabric, there may be cases when
FLOGI is unsuccessful and the ndlp will remain in UNUSED state without a
registered rpi, yet the ndlp incorrectly has a kref count of one.
To address this, handling of Fabric Controller ndlps are moved into the
routines: lpfc_issue_els_scr(), lpfc_issue_els_rdf(),
lpfc_cmpl_els_disc_cmd().
In both lpfc_issue_els_scr() and lpfc_issue_els_rdf(), if there does not
exist a previously created fabric controller ndlp, an ndlp will be
created. Otherwise, we can reuse the pre-existing ndlp object.
In lpfc_cmpl_els_disc_cmd(), if the SCR or RDF are not successfully
issued, the initial reference on the ndlp that is not registered with
upper layers will be decremented with a kref_put().
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251106224639.139176-7-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In point-to-point topology, the driver sometimes defers the unsolicited
FLOGI LS_ACC until it sends its FLOGI to the remote port. When this
happens, lpfc neglects to release the ndlp allocated for the unsolicited
FLOGI. This patch adds code to release the ndlp for the deferred
unsolicited FLOGI LS_ACC.
An NLP_FLOGI_DFR_ACC flag is introduced to facilitate identifying an
ndlp with an expected deferred FLOGI LS_ACC completion. When
lpfc_cmpl_els_rsp() detects the correct qualifiers, it releases the
initial reference on the ndlp object. And when lpfc_cmpl_els_rsp()
exits, the remaining put for the deferred action is executed and the
ndlp is released.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251106224639.139176-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Unregistration of an rpi object should be done when a PLOGI is received
as PLOGI receipt implies an implicit LOGO. Previously, the driver would
continue using the same, already registered, rpi and ACC the received
PLOGI.
Replace the ACC and early return statement with break to execute the
rest of the lpfc_rcv_plogi logic outside the switch case statement.
This ensures unregistration and reregistration of an rpi after PLOGI_ACC
completion.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251106224639.139176-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don Brace <don.brace@microchip.com> says:
These patches are based on Martin Petersen's 6.19/scsi-queue tree
https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
6.19/scsi-queue
This patch series includes four patches, with two main functional changes:
1. smartpqi-Add-timeout-value-to-RAID-path-requests-to-physical-devices
Sets a timeout value for requests sent to physical devices via the
RAID path. This prevents the controller firmware from waiting
indefinitely and allows it to time out requests a few seconds before
the OS issues Target Management Function (TMF) commands.
The timeout value sent to the firmware is set to 3 seconds less than
the OS timeout, which significantly reduces TMF invocations.
2. smartpqi-fix-Device-resources-accessed-after-device-removal
Fixes a race condition during device removal by:
* Checking for device removal in the reset handler and canceling any
pending reset work if the device is no longer present.
* Canceling outstanding TMF work items in the .sdev_destroy handler.
Together, these changes eliminate races between reset operations
and device removal.
The other two patches:
3. smartpqi-add-new-Hurray-Data-pci-device
Adds support for new Hurray Data PCI device.
No functional changes.
4. smartpqi-update-driver-version-to-2.1.36-026
Updates the driver version string.
No functional changes.
Link: https://patch.msgid.link/20251106163823.786828-1-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Correct possible race conditions during device removal.
Previously, a scheduled work item to reset a LUN could still execute
after the device was removed, leading to use-after-free and other
resource access issues.
This race condition occurs because the abort handler may schedule a LUN
reset concurrently with device removal via sdev_destroy(), leading to
use-after-free and improper access to freed resources.
- Check in the device reset handler if the device is still present in
the controller's SCSI device list before running; if not, the reset
is skipped.
- Cancel any pending TMF work that has not started in sdev_destroy().
- Ensure device freeing in sdev_destroy() is done while holding the
LUN reset mutex to avoid races with ongoing resets.
Fixes: 2d80f4054f ("scsi: smartpqi: Update deleting a LUN via sysfs")
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Signed-off-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://patch.msgid.link/20251106163823.786828-3-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add a timeout value to requests sent to physical devices via the RAID
path.
A timeout value of zero means wait indefinitely, which may cause the OS
to issue Target Management Function (TMF) commands if the device does
not respond.
For input timeouts of 8 seconds or greater, the value sent to firmware
is reduced by 3 seconds to provide an earlier firmware timeout and allow
the OS additional time before timing out.
This change improves timeout handling between the driver, firmware, and
OS, helping to better manage device responsiveness and avoid indefinite
waits.
Reviewed-by: David Strahan <david.strahan@microchip.com>
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Signed-off-by: Mike McGowen <Mike.McGowen@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://patch.msgid.link/20251106163823.786828-2-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently if a user enqueues a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be
addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
Continue the effort to refactor workqueue APIs, which has begun with the
change introducing new workqueues and a new alloc_workqueue flag:
commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")
Adds a new WQ_PERCPU flag to explicitly request alloc_workqueue() to be
per-CPU when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://patch.msgid.link/20251105150336.244079-1-marco.crivellari@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
With commit 9604eea5bd ("scsi: st: Add third party poweron reset
handling") some customer tape applications fail from being unable to
complete ioctls to verify ID information for the device when there has
been any type of reset event to their tape devices.
The st driver currently will fail all standard SCSI ioctls if a call to
flush_buffer() fails in st_ioctl(). This causes ioctls which otherwise
have no effect on tape state to succeed or fail based on events
unrelated to the requested ioctl.
This makes SCSI information ioctls unreliable after a reset even if no
buffering is in use. With a reset setting the pos_unknown field,
flush_buffer() will report failure and fail all ioctls. So any
application expecting to use ioctls to check the identify the device
will be unable to do so in such a state.
For SCSI information ioctls, avoid the need for a buffer flush and allow
the ioctls to execute regardless of buffer state.
Signed-off-by: David Jeffery <djeffery@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Kai Mäkisara <kai.makisara@kolumbus.fi>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Tested-by: John Meneghini <jmeneghi@redhat.com>
Link: https://patch.msgid.link/20251104154709.6436-2-djeffery@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In stex_probe(), register_reboot_notifier() is called at the beginning,
but if any subsequent initialization step fails, the function returns
without unregistering the notifier, resulting in a resource leak.
Add unregister_reboot_notifier() in the out_disable error path to ensure
proper cleanup on all failure paths.
Fixes: 61b745fa63 ("scsi: stex: Add S6 support")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Link: https://patch.msgid.link/20251104094847.270-1-vulab@iscas.ac.cn
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently request_irq_by_index() returns
Result<impl PinInit<irq::Registration<T>, Error> + 'a>
which may carry an error in the Result or the initializer; the same is
true for the other IRQ methods.
Use pin_init::pin_init_scope() to get rid of this redundancy.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20251103203053.2348783-2-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Currently request_irq() returns
Result<impl PinInit<irq::Registration<T>, Error> + 'a>
which may carry an error in the Result or the initializer; the same is
true for request_threaded_irq().
Use pin_init::pin_init_scope() to get rid of this redundancy.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20251103203053.2348783-1-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Some older (yet supported) versions of clippy throw a false positive
warning for missing a safety comment when the safety comment is on a
multiline statement.
warning: unsafe block missing a safety comment
--> rust/kernel/auxiliary.rs:351:22
|
351 | Self(unsafe { NonNull::new_unchecked(adev) }),
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= help: consider adding a safety comment on the preceding line
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#undocumented_unsafe_blocks
= note: requested on the command line with `-W clippy::undocumented-unsafe-blocks`
warning: 1 warning emitted
Fix this by placing the safety comment right on top of the same line
introducing the unsafe block.
Fixes: e4e679c860 ("rust: auxiliary: unregister on parent device unbind")
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/20251103203932.2361660-1-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Extend the rust_debugfs_scoped sample to demonstrate how to export a
large binary object through a ScopedDir.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Matthew Maurer <mmaurer@google.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Add support for creating binary debugfs files via ScopedDir. This
mirrors the existing functionality for Dir, but without producing an
owning handle -- files are automatically removed when the associated
Scope is dropped.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Matthew Maurer <mmaurer@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Extend the Rust debugfs sample to demonstrate usage of binary file
support. The example now shows how to expose both fixed-size arrays
and dynamically sized vectors as binary blobs in debugfs.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Matthew Maurer <mmaurer@google.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Extend Rust debugfs binary support to allow exposing data stored in
common smart pointers and heap-allocated collections.
- Implement BinaryWriter for Box<T>, Pin<Box<T>>, Arc<T>, and Vec<T>.
- Introduce BinaryReaderMut for mutable binary access with outer locks.
- Implement BinaryReaderMut for Box<T>, Vec<T>, and base types.
- Update BinaryReader to delegate to BinaryReaderMut for Mutex<T>,
Box<T>, Pin<Box<T>> and Arc<T>.
This enables debugfs files to directly expose or update data stored
inside heap-allocated, reference-counted, or lock-protected containers
without manual dereferencing or locking.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Matthew Maurer <mmaurer@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Introduce support for read-only, write-only, and read-write binary files
in Rust debugfs. This adds:
- BinaryWriter and BinaryReader traits for writing to and reading from
user slices in binary form.
- New Dir methods: read_binary_file(), write_binary_file(),
`read_write_binary_file`.
- Corresponding FileOps implementations: BinaryReadFile,
BinaryWriteFile, BinaryReadWriteFile.
This allows kernel modules to expose arbitrary binary data through
debugfs, with proper support for offsets and partial reads/writes.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Matthew Maurer <mmaurer@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Add UserSliceWriter::write_slice_file(), which is the same as
UserSliceWriter::write_slice_partial() but updates the given
file::Offset by the number of bytes written.
This is equivalent to C's `simple_read_from_buffer()` and useful when
dealing with file offsets from file operations.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
[ Replace saturating_add() with the raw operator and a corresponding
OVERFLOW comment. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
The existing write_slice() method is a wrapper around copy_to_user() and
expects the user buffer to be larger than the source buffer.
However, userspace may split up reads in multiple partial operations
providing an offset into the source buffer and a smaller user buffer.
In order to support this common case, provide a helper for partial
writes.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Matthew Maurer <mmaurer@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
[ Replace map_or() with let-else; use saturating_add(). - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Add UserSliceReader::read_slice_file(), which is the same as
UserSliceReader::read_slice_partial() but updates the given file::Offset
by the number of bytes read.
This is equivalent to C's `simple_write_to_buffer()` and useful when
dealing with file offsets from file operations.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
[ Replace saturating_add() with the raw operator and a corresponding
OVERFLOW comment. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
The existing read_slice() method is a wrapper around copy_from_user()
and expects the user buffer to be larger than the destination buffer.
However, userspace may split up writes in multiple partial operations
providing an offset into the destination buffer and a smaller user
buffer.
In order to support this common case, provide a helper for partial
reads.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Matthew Maurer <mmaurer@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
[ Replace map_or() with let-else; use saturating_add(). - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Add read_poll_timeout_atomic function which polls periodically until a
condition is met, an error occurs, or the attempt limit is reached.
The C's read_poll_timeout_atomic() is used for the similar purpose.
In atomic context the timekeeping infrastructure is unavailable, so
reliable time-based timeouts cannot be implemented. So instead, the
helper accepts a maximum number of attempts and busy-waits (udelay +
cpu_relax) between tries.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Link: https://patch.msgid.link/20251103112958.2961517-3-fujita.tomonori@gmail.com
[ Adjust imports to use "kernel vertical" style. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
A refactoring of Device::drvdata_obtain() broke T::disconnect() in the
USB abstractions.
"""
error[E0599]: no method named `data` found for struct `core::pin::Pin<kbox::Box<T, Kmalloc>>` in the current scope
--> rust/kernel/usb.rs:92:34
|
92 | T::disconnect(intf, data.data());
| ^^^^ method not found in `core::pin::Pin<kbox::Box<T, Kmalloc>>`
error: aborting due to 1 previous error
For more information about this error, try `rustc --explain E0599`.
make[2]: *** [rust/Makefile:553: rust/kernel.o] Error 1
make[1]: *** [/builddir/build/BUILD/kernel-6.18.0-build/kernel-next-20251103/linux-6.18.0-0.0.next.20251103.436.vanilla.fc44.x86_64/Makefile:1316: prepare] Error 2
make: *** [Makefile:256: __sub-make] Error 2
"""
This slipped through, since the USB abstractions are globally disabled.
However, the USB tree recently enabled them, hence it showed up in
linux-next.
Reported-by: Thorsten Leemhuis <linux@leemhuis.info>
Closes: https://lore.kernel.org/all/1c8afbc0-e888-4702-9e4e-fa8aef0f97ae@leemhuis.info/
Fixes: 6bbaa93912 ("rust: device: narrow the generic of drvdata_obtain()")
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Tested-by: Thorsten Leemhuis <linux@leemhuis.info>
Link: https://patch.msgid.link/20251103110115.1925072-1-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
The atomic use in the main I/O path is causing perf issues when using
higher performance backend devices and multiple queues (more than
10 when using vhost-scsi) like with this fio workload:
[global]
bs=4K
iodepth=128
direct=1
ioengine=libaio
group_reporting
time_based
runtime=120
name=standard-iops
rw=randread
numjobs=16
cpus_allowed=0-15
To fix this issue, move the LUN stats to per CPU.
Note: I forgot to include this patch with the delayed/ordered per CPU
tracking and per device/device entry per CPU stats. With this patch you
get the full 33% improvements when using fast backends, multiple queues
and multiple IO submiters.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Link: https://patch.msgid.link/20250917221338.14813-4-michael.christie@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Create some macros to reduce code duplication for when we handle per-CPU
stats. Convert the existing LUN and auth cases.
Note: This is similar to percpu_counters but they only support s64 since
they are also used for non-stat counters where you need to handle/prevent
rollover more gracefully. Our use is just for stats where the spec
defines u64 use plus we already have some files exporting u64 values so
it's not possible to directly use percpu_counters.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Link: https://patch.msgid.link/20250917221338.14813-3-michael.christie@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove the stack buffer 'buf'. This patch does not modify the output
produced by target_lu_gp_members_show().
An example of the output that is produced with this patch applied:
$ cat /sys/kernel/config/target/core/alua/lu_gps/default_lu_gp/members
fileio_0/vdev0
fileio_1/vdev1
iblock_0/vdev2
$ od -c /sys/kernel/config/target/core/alua/lu_gps/default_lu_gp/members
0000000 f i l e i o _ 0 / v d e v 0 \n f
0000020 i l e i o _ 1 / v d e v 1 \n i b
0000040 l o c k _ 0 / v d e v 2 \n
0000055
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251027184639.3501254-3-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ajay Neeli <ajay.neeli@amd.com> says:
This patch series adds support for the UFS driver on the AMD Versal
Gen 2 SoC. It includes:
- Device tree bindings and driver implementation.
- Secure read support for the secure retrieval of UFS calibration
values.
The UFS host driver is based upon the Synopsis DesignWare (DWC) UFS
architecture, utilizing the existing UFSHCD_DWC and UFSHCD_PLATFORM
drivers.
Link: https://patch.msgid.link/20251021113003.13650-1-ajay.neeli@amd.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add support for a generic ioctl read/write interface using which users
can request firmware to perform read/write operations on a protected and
secure address space.
The functionality is introduced through the means of two new IOCTL IDs
which extend the existing PM_IOCTL EEMI API:
- IOCTL_READ_REG
- IOCTL_MASK_WRITE_REG
The caller only passes the node id of the given device and an offset.
The base address is not exposed to the caller and internally retrieved
by the firmware. Firmware will enforce an access policy on the incoming
read/write request.
Signed-off-by: Izhar Ameer Shaikh <izhar.ameer.shaikh@amd.com>
Reviewed-by: Tanmay Shah <tanmay.shah@amd.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Signed-off-by: Ajay Neeli <ajay.neeli@amd.com>
Acked-by: Senthil Nathan Thangaraj <senthilnathan.thangaraj@amd.com>
Acked-by: Michal Simek <michal.simek@amd.com>
Acked-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251021113003.13650-3-ajay.neeli@amd.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Illustrate how a parent driver of an auxiliary driver can take advantage
of the device context guarantees given by the auxiliary bus and
subsequently safely derive its device private data.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
In ParentDriver::connect() rename parent to dev, use it for the
dev_info!() call, call pdev.vendor_() directly in the print statement
and remove the unnecessary generic type of Result.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Take advantage of the fact that if the auxiliary device is bound the
parent is guaranteed to be bound as well and implement a separate
parent() method for auxiliary::Device<Bound>.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Currently, the parent method is implemented for any Device<Ctx>, i.e.
any device context and returns a &device::Device<Normal>.
However, a subsequent patch will introduce
impl Device<Bound> {
pub fn parent() -> device::Device<Bound> { ... }
}
which takes advantage of the fact that if the auxiliary device is bound
the parent is guaranteed to be bound as well.
I.e. the behavior we want is that all device contexts that dereference
to Bound, will use the implementation above, whereas the old
implementation should only be implemented for Device<Normal>.
Hence, move the current implementation.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Guarantee that an auxiliary driver will be unbound before its parent is
unbound; there is no point in operating an auxiliary device whose parent
has been unbound.
In practice, this guarantee allows us to assume that for a bound
auxiliary device, also the parent device is bound.
This is useful when an auxiliary driver calls into its parent, since it
allows the parent to directly access device resources and its device
private data due to the guaranteed bound device context.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
An auxiliary device is guaranteed to always have a parent device (both
in C and Rust), hence don't return an Option<&auxiliary::Device> in
auxiliary::Device::parent().
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
In C dev_get_drvdata() has specific requirements under which it is valid
to access the returned pointer. That is, drivers have to ensure that
(1) for the duration the returned pointer is accessed the driver is
bound and remains to be bound to the corresponding device,
(2) the returned void * is treated according to the driver's private
data type, i.e. according to what has been passed to
dev_set_drvdata().
In Rust, (1) can be ensured by simply requiring the Bound device
context, i.e. provide the drvdata() method for Device<Bound> only.
For (2) we would usually make the device type generic over the driver
type, e.g. Device<T: Driver>, where <T as Driver>::Data is the type of
the driver's private data.
However, a device does not have a driver type known at compile time and
may be bound to multiple drivers throughout its lifetime.
Hence, in order to be able to provide a safe accessor for the driver's
device private data, we have to do the type check on runtime.
This is achieved by letting a driver assert the expected type, which is
then compared to a type hash stored in struct device_private when
dev_set_drvdata() is called.
Example:
// `dev` is a `&Device<Bound>`.
let data = dev.drvdata::<SampleDriver>()?;
There are two aspects to note:
(1) Technically, the same check could be achieved by comparing the
struct device_driver pointer of struct device with the struct
device_driver pointer of the driver struct (e.g. struct
pci_driver).
However, this would - in addition the pointer comparison - require
to tie back the private driver data type to the struct
device_driver pointer of the driver struct to prove correctness.
Besides that, accessing the driver struct (stored in the module
structure) isn't trivial and would result into horrible code and
API ergonomics.
(2) Having a direct accessor to the driver's private data is not
commonly required (at least in Rust): Bus callback methods already
provide access to the driver's device private data through a &self
argument, while other driver entry points such as IRQs,
workqueues, timers, IOCTLs, etc. have their own private data with
separate ownership and lifetime.
In other words, a driver's device private data is only relevant
for driver model contexts (such a file private is only relevant
for file contexts).
Having that said, the motivation for accessing the driver's device
private data with Device<Bound>::drvdata() are interactions between
drivers. For instance, when an auxiliary driver calls back into its
parent, the parent has to be capable to derive its private data from the
corresponding device (i.e. the parent of the auxiliary device).
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ * Remove unnecessary `const _: ()` block,
* rename type_id_{store,match}() to {set,match}_type_id(),
* assert size_of::<bindings::driver_type>() >= size_of::<TypeId>(),
* add missing check in case Device::drvdata() is called from probe().
- Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Let T be the actual private driver data type without the surrounding
box, as it leaves less room for potential bugs.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
- Drop unnecessary Result's '<()>'
- Use '?' instead of match
Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Bart Van Assche <bvanassche@acm.org> says:
Hi Martin,
This patch series includes two bug fixes for this development cycle
and six small patches that are intended for the next merge window. If
applying the first two patches only during the current development
cycle would be inconvenient, postponing all patches until the next
merge window is fine with me.
Please consider including these patches in the upstream kernel.
Thanks,
Bart.
[mkp: Applied patches #1 and #2 to 6.18/scsi-fixes]
Link: https://patch.msgid.link/20251014200118.3390839-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Replace deprecated simple_strtol() calls with kstrtoint() in
timeout_store() and firmware_loading_store() functions to
improve input validation and error handling. The simple_strtol()
function does not provide proper error checking for invalid input,
while kstrtoint() returns an error for malformed strings.
This change adds proper validation for user input from sysfs attributes,
returning -EINVAL for invalid numeric strings instead of silently accepting
potentially malformed input. The behavior for valid numeric input remains
unchanged.
The simple_strtol() function is deprecated in favor of kstrtoint() family
functions which provide better error handling and are recommended for new
code and replacements.
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Link: https://patch.msgid.link/20250925063812.2269501-1-kaushlendra.kumar@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The cancel_store() function currently calls the firmware upload cancel
operation even when no upload is in progress (i.e., when progress is
FW_UPLOAD_PROG_IDLE).
Update cancel_store() to only invoke the cancel operation when an upload
is active. If the upload is idle, return -ENODEV without calling cancel.
This change improves safety and correctness by ensuring driver operations
are only called in valid states.
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Link: https://patch.msgid.link/20250925054129.2199157-1-kaushlendra.kumar@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Building the qla1280 driver with DEBUG_QLA1280 set will emit compiler
warnings. Fix some print formatting strings to reflect the correct type
of printed variables as well as remove unused code. (static function
ql1280_dump_device) in order to avoid compiler warnings.
[mkp: fixed a few more checkpatch warnings]
Signed-off-by: Magnus Lindholm <linmag7@gmail.com>
Link: https://patch.msgid.link/20251002052604.24590-1-linmag7@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Peter Wang <peter.wang@mediatek.com says>:
Improves the UFS Mediatek driver by correcting clock scaling with PM
QoS, and adjusting power management flows. It addresses
shutdown/suspend race conditions, and removes redundant
functions. Support for new platforms is added with the MMIO_OTSD_CTRL
register, and MT6991 performance is optimized with MRTT and random
improvements. These changes collectively enhance driver performance,
stability, and compatibility.
Changes since v1:
1. Remove two patches that will be fixed in UFS core.
- ufs: host: mediatek: Fix runtime suspend error deadlock
- ufs: host: mediatek: Enable interrupts for MCQ mode
2. Use hba->shutting_down instead of ufshcd_is_user_access_allowed
v1:
https://patch.msgid.link/20250918104000.208856-1-peter.wang@mediatek.com
Link: https://patch.msgid.link/20250924094527.2992256-1-peter.wang@mediatek.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Introduce support for a new UFS Mediatek platform by adding the
REG_UFS_UFS_MMIO_OTSD_CTRL register. This update includes checks for
legacy platforms and uses the new register to replace debug selection
and handle specific operations. The changes ensure compatibility across
different hardware versions and prevent potential issues with debug
usage on newer platforms.
Additional updates include error logging improvements during link setup
for newer and legacy platforms, ensuring proper event logging and
debugging.
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Acked-by: Chun-Hung Wu <chun-hung.wu@mediatek.com>
Link: https://patch.msgid.link/20250924094527.2992256-8-peter.wang@mediatek.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bao D. Nguyen <quic_nguyenb@quicinc.com> says:
Multiple ufs device manufacturers request support for the
UFS_DEVICE_QUIRK_DELAY_AFTER_LPM quirk in the Qualcomm's platform
driver. After checking further with the major UFS manufacturers
engineering teams such as Samsung, Kioxia, SK Hynix and Micron, all
the manufacturers require this quirk. Since the quirk is needed by all
the ufs device manufacturers, remove the quirk in the ufs core driver
and implement a universal delay for all the ufs devices.
In addition to verifying with the public device's datasheets, the ufs
device manufacturer's engineering teams confirmed the required vcc
power-off time for the devices is a minimum of 1ms before vcc can be
powered on again. The existing 5ms delay implemented in the ufs core
driver seems too conservative, so replace the hard coded 5ms delay
with a variable default to 2ms setting to improve the system resume
latency. The platform drivers can override this setting as needed.
Link: https://patch.msgid.link/cover.1760383740.git.quic_nguyenb@quicinc.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After the UFS device VCC is turned off, all the UFS device manufacturers
require a period of power-off time before the VCC can be turned on
again. This requirement has been confirmed with all the UFS device
manufacturer's datasheets.
Remove the UFS_DEVICE_QUIRK_DELAY_AFTER_LPM quirk in the UFS core driver
and implement a universal delay that is required by all the UFS device
manufacturers. In addition, remove the support for this quirk in the
platform drivers.
Signed-off-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/25f134d5a42e8b8365be64d512d1bb5fc2bce6ff.1760383740.git.quic_nguyenb@quicinc.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver model defines the lifetime of the private data stored in (and
owned by) a bus device to be valid from when the driver is bound to a
device (i.e. from successful probe()) until the driver is unbound from
the device.
This is already taken care of by the Rust implementation of the driver
model. However, we still ask drivers to return a Result<Pin<KBox<Self>>>
from probe().
Unlike in C, where we do not have the concept of initializers, but
rather deal with uninitialized memory, drivers can just return an
impl PinInit<Self, Error> instead.
This contributes to more clarity to the fact that a driver returns it's
device private data in probe() and the Rust driver model owns the data,
manages the lifetime and - considering the lifetime - provides (safe)
accessors for the driver.
Hence, let probe() functions return an impl PinInit<Self, Error> instead
of Result<Pin<KBox<Self>>>.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
In more complex cases, initializers need to run arbitrary code before
assigning initializers to fields. While this is possible using the
underscore codeblock feature (`_: {}`), values returned by such
functions cannot be used from later field initializers.
The two new functions `[pin_]init_scope` allow users to first run some
fallible code and then return an initializer which the function turns
into a single initializer. This permits using the same value multiple
times by different fields.
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Benno Lossin <lossin@kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
[ Fix typo in commit message: s/functinos/functions/. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.
Use the new TRAILING_OVERLAP() helper to fix the following warnings:
drivers/scsi/megaraid/megaraid_sas_fusion.h:1153:31: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/scsi/megaraid/megaraid_sas_fusion.h:1198:32: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
This helper creates a union between a flexible-array member (FAM) and a
set of MEMBERS that would otherwise follow it --in this case 'struct
MR_LD_SPAN_MAP ldSpanMap[MAX_LOGICAL_DRIVES_DYN]' and 'struct
MR_LD_SPAN_MAP ldSpanMap[MAX_LOGICAL_DRIVES]' in the corresponding
structures.
This overlays the trailing members onto the FAM (struct MR_LD_SPAN_MAP
ldSpanMap[];) while keeping the FAM and the start of MEMBERS aligned.
The static_assert() ensures this alignment remains, and it's
intentionally placed inmediately after the corresponding structures --no
blank line in between.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/aM1E7Xa8qYdZ598N@kspp
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.
Move the conflicting declaration (which happens to be in a union, so
we're moving the entire union) to the end of the corresponding
structure. Notice that `struct ssp_response_iu` is a flexible structure,
this is a structure that contains a flexible-array member.
With these changes fix the following warning:
drivers/scsi/isci/task.h:92:11: warning: structure containing a flexible
array member is not at the end of another structure
[-Wflex-array-member-not-at-end]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/aM09bpl1xj9KZSZl@kspp
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Implement TryInto<IrqRequest<'a>> for IrqVector<'a> to directly convert
a pci::IrqVector into a generic IrqRequest, instead of taking the
indirection via an unrelated pci::Device method.
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Add support to PCI rust module to allocate, free and manage IRQ vectors.
Integrate with devres for managing the allocated resources.
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
[ Add links in doc-comments; add missing invariant comment; re-format
multiple safety requirements as list and fix missing backticks;
refactor the example of alloc_irq_vectors() to compile. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
When merging the Loongson-2K BMC driver, temporarily removed the
addition of the IPMI driver entry in MAINTAINERS to avoid conflicts.
This needs to be fixed as soon as possible.
Now, adding myself as maintainer for the Loongson-2K IPMI driver.
Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Message-ID: <20251015095556.3736330-1-zhoubinbin@loongson.cn>
Signed-off-by: Corey Minyard <corey@minyard.net>
channel_handler() sets intf->channels_ready to true but never
clears it, so __scan_channels() skips any rescan. When the BMC
firmware changes a rescan is required. Allow it by clearing
the flag before starting a new scan.
Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com>
Message-ID: <20250930074239.2353-3-guojinhui.liam@bytedance.com>
Signed-off-by: Corey Minyard <corey@minyard.net>
The race window between __scan_channels() and deliver_response() causes
the parameters of some channels to be set to 0.
1.[CPUA] __scan_channels() issues an IPMI request and waits with
wait_event() until all channels have been scanned.
wait_event() internally calls might_sleep(), which might
yield the CPU. (Moreover, an interrupt can preempt
wait_event() and force the task to yield the CPU.)
2.[CPUB] deliver_response() is invoked when the CPU receives the
IPMI response. After processing a IPMI response,
deliver_response() directly assigns intf->wchannels to
intf->channel_list and sets intf->channels_ready to true.
However, not all channels are actually ready for use.
3.[CPUA] Since intf->channels_ready is already true, wait_event()
never enters __wait_event(). __scan_channels() immediately
clears intf->null_user_handler and exits.
4.[CPUB] Once intf->null_user_handler is set to NULL, deliver_response()
ignores further IPMI responses, leaving the remaining
channels zero-initialized and unusable.
CPUA CPUB
------------------------------- -----------------------------
__scan_channels()
intf->null_user_handler
= channel_handler;
send_channel_info_cmd(intf,
0);
wait_event(intf->waitq,
intf->channels_ready);
do {
might_sleep();
deliver_response()
channel_handler()
intf->channel_list =
intf->wchannels + set;
intf->channels_ready = true;
send_channel_info_cmd(intf,
intf->curr_channel);
if (condition)
break;
__wait_event(wq_head,
condition);
} while(0)
intf->null_user_handler
= NULL;
deliver_response()
if (!msg->user)
if (intf->null_user_handler)
rv = -EINVAL;
return rv;
------------------------------- -----------------------------
Fix the race between __scan_channels() and deliver_response() by
deferring both the assignment intf->channel_list = intf->wchannels
and the flag intf->channels_ready = true until all channels have
been successfully scanned or until the IPMI request has failed.
Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com>
Message-ID: <20250930074239.2353-2-guojinhui.liam@bytedance.com>
Signed-off-by: Corey Minyard <corey@minyard.net>
2025-10-14 15:52:58 -05:00
200 changed files with 8167 additions and 2370 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.