We started disabling '-Warray-bounds' for gcc-12 originally on s390,
because it resulted in some warnings that weren't realistically fixable
(commit 8b202ee218: "s390: disable -Warray-bounds").
That s390-specific issue was then found to be less common elsewhere, but
generic (see f0be87c42c: "gcc-12: disable '-Warray-bounds' universally
for now"), and then later expanded the version check was expanded to
gcc-11 (5a41237ad1: "gcc: disable -Warray-bounds for gcc-11 too").
And it turns out that I was much too optimistic in thinking that it's
all going to go away, and here we are with gcc-13 showing all the same
issues. So instead of expanding this one version at a time, let's just
disable it for gcc-11+, and put an end limit to it only when we actually
find a solution.
Yes, I'm sure some of this is because the kernel just does odd things
(like our "container_of()" use, but also knowingly playing games with
things like linker tables and array layouts).
And yes, some of the warnings are likely signs of real bugs, but when
there are hundreds of false positives, that doesn't really help.
Oh well.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull Kbuild fixes from Masahiro Yamada:
- Fix the prefix in the kernel source tarball
- Fix a typo in the copyright file in Debian package
* tag 'kbuild-fixes-v6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: use proper prefix for tarballs to fix rpm-pkg build error
kbuild: deb-pkg: Fix a spell typo in mkdebian script
Pull irq fix from Borislav Petkov:
- Remove an over-zealous sanity check of the array of MSI-X vectors to
be allocated for a device
* tag 'irq_urgent_for_v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
PCI/MSI: Remove over-zealous hardware size check in pci_msix_validate_entries()
Pull x86 fix from Borislav Petkov
- Fix for older binutils which do not support C-syntax constant
suffixes
* tag 'x86_urgent_for_v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/alternatives: Do not use integer constant suffixes in inline asm
Pull input fixes from Dmitry Torokhov:
- a check in pegasus-notetaker driver to validate the type of pipe when
probing a new device
- a fix for Cypress touch controller to correctly parse maximum number
of touches.
* tag 'input-for-v6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: cyttsp5 - fix sensing configuration data structure
Input: pegasus-notetaker - check pipe type when probing
Since commit f8d94c4e40 ("kbuild: do not create intermediate *.tar
for source tarballs"), 'make rpm-pkg' fails because the prefix of the
source tarball is 'linux.tar/' instead of 'linux/'. $(basename $@)
strips only '.gz' from the filename linux.tar.gz.
You need to strip two suffixes from compressed tarballs and one suffix
from uncompressed tarballs (for example 'perf-6.3.0.tar' generated by
'make perf-tar-src-pkg').
One tricky fix might be --prefix=$(firstword $(subst .tar, ,$@))/
but I think it is better to hard-code the prefix.
Fixes: f8d94c4e40 ("kbuild: do not create intermediate *.tar for source tarballs")
Reported-by: Jiwei Sun <sunjw10@lenovo.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Pull MIPS fix from Thomas Bogendoerfer:
"Fix for link errors"
* tag 'mips-fixes_6.3_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Define RUNTIME_DISCARD_EXIT in LD script
Pull kvm fixes from Paolo Bonzini:
"Two serious ARM fixes:
- Plug a buffer overflow due to the use of the user-provided register
width for firmware regs. Outright reject accesses where the user
register width does not match the kernel representation.
- Protect non-atomic RMW operations on vCPU flags against preemption,
as an update to the flags by an intervening preemption could be
lost"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: arm64: Fix buffer overflow in kvm_arm_set_fw_reg()
KVM: arm64: Make vcpu flag updates non-preemptible
Pull cifs fixes from Steve French:
"Three small smb3 client fixes:
- two important fixes for unbuffered read regression with the
iov_iter changes (e.g. read soon after mount in some multichannel
scenarios)
- DFS prefix path fix (also for stable)"
* tag '6.3-rc7-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Reapply lost fix from commit 30b2b2196d
cifs: Fix unbuffered read
cifs: avoid dup prefix path in dfs_get_automount_devname()
KVM/arm64 fixes for 6.3, part #4
- Plug a buffer overflow due to the use of the user-provided register
width for firmware regs. Outright reject accesses where the
user register width does not match the kernel representation.
- Protect non-atomic RMW operations on vCPU flags against preemption,
as an update to the flags by an intervening preemption could be lost.
MIPS's exit sections are discarded at runtime as well.
Fixes link error:
`.exit.text' referenced in section `__jump_table' of fs/fuse/inode.o:
defined in discarded section `.exit.text' of fs/fuse/inode.o
Fixes: 99cb0d917f ("arch: fix broken BuildID for arm64 and riscv")
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Pull btrfs fixes from David Sterba:
"Two patches fixing the problem with aync discard.
The default settings had a low IOPS limit and processing a large batch
to discard would take a long time. On laptops this can cause increased
power consumption due to disk activity.
As async discard has been on by default since 6.2 this likely affects
a lot of users.
Summary:
- increase the default IOPS limit 10x which reportedly helped
- setting the sysfs IOPS value to 0 now does not throttle anymore
allowing the discards to be processed at full speed. Previously
there was an arbitrary 6 hour target for processing the pending
batch"
* tag 'for-6.3-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: reinterpret async discard iops_limit=0 as no delay
btrfs: set default discard iops_limit to 1000
Pull block fix from Jens Axboe:
"Just a single revert of a patch from the 6.3 series"
* tag 'block-6.3-2023-04-21' of git://git.kernel.dk/linux:
Revert "block: Merge bio before checking ->cached_rq"
Pull char/misc driver fixes from Greg KH:
"Here are some last-minute tiny driver fixes for 6.3-final. They
include fixes for some fpga and iio drivers:
- fpga bridge driver fix
- fpga dfl error reporting fix
- fpga m10bmc driver fix
- fpga xilinx driver fix
- iio light driver fix
- iio dac fwhandle leak fix
- iio adc driver fix
All of these have been in linux-next for a few weeks with no reported
problems"
* tag 'char-misc-6.3-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
iio: light: tsl2772: fix reading proximity-diodes from device tree
fpga: bridge: properly initialize bridge device before populating children
iio: dac: ad5755: Add missing fwnode_handle_put()
iio: adc: at91-sama5d2_adc: fix an error code in at91_adc_allocate_trigger()
fpga: xilinx-pr-decoupler: Use readl wrapper instead of pure readl
fpga: dfl-pci: Drop redundant pci_enable_pcie_error_reporting()
fpga: m10bmc-sec: Fix rsu_send_data() to return FW_UPLOAD_ERR_HW_ERROR
Pull gpio fixes from Bartosz Golaszewski:
- use raw_spinlocks in regmaps that are used in interrupt context in
gpio-104-idi-48 and gpio-104-dio-48e
* tag 'gpio-fixes-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: 104-idi-48: Enable use_raw_spinlock for idi48_regmap_config
gpio: 104-dio-48e: Enable use_raw_spinlock for dio48e_regmap_config
Pull sound fixes from Takashi Iwai:
"Just a few fixes: all small and device-specific (ASoC FSL, SOF, and
HD-audio quirks), should be safe to apply at the last minute"
* tag 'sound-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: fix mute/micmute LEDs for a HP ProBook
ASoC: fsl_asrc_dma: fix potential null-ptr-deref
ASoC: fsl_sai: Fix pins setting for i.MX8QM platform
ALSA: hda/realtek: Remove specific patch for Dell Precision 3260
ASoC: max98373: change power down sequence for smart amp
ASoC: SOF: pm: Tear down pipelines only if DSP was active
ASoC: SOF: ipc4-topology: Clarify bind failure caused by missing fw_module
Pull drm fixes from Dave Airlie:
"This is the regular and hopefully last round of fixes for 6.3.
Pretty small, a few amdgpu, one i915, one nouveau, one rockchip and
one gpu scheduler fix:
nouveau:
- fix dma-resv timeout
rockchip:
- fix suspend/resume
sched:
- fix timeout handling
i915:
- Fix fast wake AUX sync len
amdgpu:
- GPU reset fix
- DCN 3.1.5 line buffer fix
- Display fix for single channel memory configs
- Fix a possible divide by 0"
* tag 'drm-fixes-2023-04-21' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: fix a divided-by-zero error
drm/amd/display: limit timing for single dimm memory
drm/amd/display: set dcn315 lb bpp to 48
drm/amdgpu: Fix desktop freezed after gpu-reset
drm/rockchip: vop2: Use regcache_sync() to fix suspend/resume
drm/nouveau: fix incorrect conversion to dma_resv_wait_timeout()
drm/rockchip: vop2: fix suspend/resume
drm/i915: Fix fast wake AUX sync len
drm/sched: Check scheduler ready before calling timeout handling
Pull pci fix from Bjorn Helgaas:
- Previously we ignored PCI devices if the DT "status" property or the
ACPI _STA method said it was not present.
Per spec, _STA cannot be used for that purpose, and using it that way
caused regressions, so skip the _STA check (Rob Herring)
* tag 'pci-v6.3-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: Restrict device disabled status check to DT
Currently, a limit of 0 results in a hard coded metering over 6 hours.
Since the default is a set limit, I suspect no one truly depends on this
rather arbitrary setting. Repurpose it for an arguably more useful
"unlimited" mode, where the delay is 0.
Note that if block groups are too new, or go fully empty, there is still
a delay associated with those conditions. Those delays implement
heuristics for not trimming a region we are relatively likely to fully
overwrite soon.
CC: stable@vger.kernel.org # 6.2+
Reviewed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Previously, the default was a relatively conservative 10. This results
in a 100ms delay, so with ~300 discards in a commit, it takes the full
30s till the next commit to finish the discards. On a workstation, this
results in the disk never going idle, wasting power/battery, etc.
Set the default to 1000, which results in using the smallest possible
delay, currently, which is 1ms. This has shown to not pathologically
keep the disk busy by the original reporter.
Link: https://lore.kernel.org/linux-btrfs/Y%2F+n1wS%2F4XAH7X1p@nz/
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2182228
CC: stable@vger.kernel.org # 6.2+
Reviewed-by: Neal Gompa <neal@gompa.dev
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull Rust fixes from Miguel Ojeda:
"Most of these are straightforward.
The last one is more complex, but it only touches Rust + GCC builds
which are for the moment best-effort.
- Code: Missing 'extern "C"' fix.
- Scripts: 'is_rust_module.sh' and 'generate_rust_analyzer.py' fixes.
- A couple trivial fixes
- Build: Rust + GCC build fix and 'grep' warning fix"
* tag 'rust-fixes-6.3' of https://github.com/Rust-for-Linux/linux:
rust: allow to use INIT_STACK_ALL_ZERO
rust: fix regexp in scripts/is_rust_module.sh
rust: build: Fix grep warning
scripts: generate_rust_analyzer: Handle sub-modules with no Makefile
rust: kernel: Mark rust_fmt_argument as extern "C"
rust: sort uml documentation arch support table
rust: str: fix requierments->requirements typo
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter and bpf.
There are a few fixes for new code bugs, including the Mellanox one
noted in the last networking pull. No known regressions outstanding.
Current release - regressions:
- sched: clear actions pointer in miss cookie init fail
- mptcp: fix accept vs worker race
- bpf: fix bpf_arch_text_poke() with new_addr == NULL on s390
- eth: bnxt_en: fix a possible NULL pointer dereference in unload
path
- eth: veth: take into account peer device for
NETDEV_XDP_ACT_NDO_XMIT xdp_features flag
Current release - new code bugs:
- eth: revert "net/mlx5: Enable management PF initialization"
Previous releases - regressions:
- netfilter: fix recent physdev match breakage
- bpf: fix incorrect verifier pruning due to missing register
precision taints
- eth: virtio_net: fix overflow inside xdp_linearize_page()
- eth: cxgb4: fix use after free bugs caused by circular dependency
problem
- eth: mlxsw: pci: fix possible crash during initialization
Previous releases - always broken:
- sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg
- netfilter: validate catch-all set elements
- bridge: don't notify FDB entries with "master dynamic"
- eth: bonding: fix memory leak when changing bond type to ethernet
- eth: i40e: fix accessing vsi->active_filters without holding lock
Misc:
- Mat is back as MPTCP co-maintainer"
* tag 'net-6.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits)
net: bridge: switchdev: don't notify FDB entries with "master dynamic"
Revert "net/mlx5: Enable management PF initialization"
MAINTAINERS: Resume MPTCP co-maintainer role
mailmap: add entries for Mat Martineau
e1000e: Disable TSO on i219-LM card to increase speed
bnxt_en: fix free-runnig PHC mode
net: dsa: microchip: ksz8795: Correctly handle huge frame configuration
bpf: Fix incorrect verifier pruning due to missing register precision taints
hamradio: drop ISA_DMA_API dependency
mlxsw: pci: Fix possible crash during initialization
mptcp: fix accept vs worker race
mptcp: stops worker on unaccepted sockets at listener close
net: rpl: fix rpl header size calculation
net: vmxnet3: Fix NULL pointer dereference in vmxnet3_rq_rx_complete()
bonding: Fix memory leak when changing bond type to Ethernet
veth: take into account peer device for NETDEV_XDP_ACT_NDO_XMIT xdp_features flag
mlxfw: fix null-ptr-deref in mlxfw_mfa2_tlv_next()
bnxt_en: Fix a possible NULL pointer dereference in unload path
bnxt_en: Do not initialize PTP on older P3/P4 chips
netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements
...
There is a structural problem in switchdev, where the flag bits in
struct switchdev_notifier_fdb_info (added_by_user, is_local etc) only
represent a simplified / denatured view of what's in struct
net_bridge_fdb_entry :: flags (BR_FDB_ADDED_BY_USER, BR_FDB_LOCAL etc).
Each time we want to pass more information about struct
net_bridge_fdb_entry :: flags to struct switchdev_notifier_fdb_info
(here, BR_FDB_STATIC), we find that FDB entries were already notified to
switchdev with no regard to this flag, and thus, switchdev drivers had
no indication whether the notified entries were static or not.
For example, this command:
ip link add br0 type bridge && ip link set swp0 master br0
bridge fdb add dev swp0 00:01:02:03:04:05 master dynamic
has never worked as intended with switchdev. It causes a struct
net_bridge_fdb_entry to be passed to br_switchdev_fdb_notify() which has
a single flag set: BR_FDB_ADDED_BY_USER.
This is further passed to the switchdev notifier chain, where interested
drivers have no choice but to assume this is a static (does not age) and
sticky (does not migrate) FDB entry. So currently, all drivers offload
it to hardware as such, as can be seen below ("offload" is set).
bridge fdb get 00:01:02:03:04:05 dev swp0 master
00:01:02:03:04:05 dev swp0 offload master br0
The software FDB entry expires $ageing_time centiseconds after the
kernel last sees a packet with this MAC SA, and the bridge notifies its
deletion as well, so it eventually disappears from hardware too.
This is a problem, because it is actually desirable to start offloading
"master dynamic" FDB entries correctly - they should expire $ageing_time
centiseconds after the *hardware* port last sees a packet with this
MAC SA - and this is how the current incorrect behavior was discovered.
With an offloaded data plane, it can be expected that software only sees
exception path packets, so an otherwise active dynamic FDB entry would
be aged out by software sooner than it should.
With the change in place, these FDB entries are no longer offloaded:
bridge fdb get 00:01:02:03:04:05 dev swp0 master
00:01:02:03:04:05 dev swp0 master br0
and this also constitutes a better way (assuming a backport to stable
kernels) for user space to determine whether the kernel has the
capability of doing something sane with these or not.
As opposed to "master dynamic" FDB entries, on the current behavior of
which no one currently depends on (which can be deduced from the lack of
kselftests), Ido Schimmel explains that entries with the "extern_learn"
flag (BR_FDB_ADDED_BY_EXT_LEARN) should still be notified to switchdev,
since the spectrum driver listens to them (and this is kind of okay,
because although they are treated identically to "static", they are
expected to not age, and to roam).
Fixes: 6b26b51b1d ("net: bridge: Add support for notifying devices about FDB add/del")
Link: https://lore.kernel.org/netdev/20230327115206.jk5q5l753aoelwus@skbuf/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230418155902.898627-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Borkmann says:
====================
bpf 2023-04-19
We've added 3 non-merge commits during the last 6 day(s) which contain
a total of 3 files changed, 34 insertions(+), 9 deletions(-).
The main changes are:
1) Fix a crash on s390's bpf_arch_text_poke() under a NULL new_addr,
from Ilya Leoshkevich.
2) Fix a bug in BPF verifier's precision tracker, from Daniel Borkmann
and Andrii Nakryiko.
3) Fix a regression in veth's xdp_features which led to a broken BPF CI
selftest, from Lorenzo Bianconi.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix incorrect verifier pruning due to missing register precision taints
veth: take into account peer device for NETDEV_XDP_ACT_NDO_XMIT xdp_features flag
s390/bpf: Fix bpf_arch_text_poke() with new_addr == NULL
====================
Link: https://lore.kernel.org/r/20230419195847.27060-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-04-17 (i40e)
This series contains updates to i40e only.
Alex moves setting of active filters to occur under lock and checks/takes
error path in rebuild if re-initializing the misc interrupt vector
failed.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: fix i40e_setup_misc_vector() error handling
i40e: fix accessing vsi->active_filters without holding lock
====================
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230417205245.1030733-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull misc fixes from Andrew Morton:
"22 hotfixes.
19 are cc:stable and the remainder address issues which were
introduced during this merge cycle, or aren't considered suitable for
-stable backporting.
19 are for MM and the remainder are for other subsystems"
* tag 'mm-hotfixes-stable-2023-04-19-16-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits)
nilfs2: initialize unused bytes in segment summary blocks
mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages
mm/mmap: regression fix for unmapped_area{_topdown}
maple_tree: fix mas_empty_area() search
maple_tree: make maple state reusable after mas_empty_area_rev()
mm: kmsan: handle alloc failures in kmsan_ioremap_page_range()
mm: kmsan: handle alloc failures in kmsan_vmap_pages_range_noflush()
tools/Makefile: do missed s/vm/mm/
mm: fix memory leak on mm_init error handling
mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()
Revert "userfaultfd: don't fail on unrecognized features"
writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs
maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug
tools/mm/page_owner_sort.c: fix TGID output when cull=tg is used
mailmap: update jtoppins' entry to reference correct email
mm/mempolicy: fix use-after-free of VMA iterator
mm/huge_memory.c: warn with pr_warn_ratelimited instead of VM_WARN_ON_ONCE_FOLIO
mm/mprotect: fix do_mprotect_pkey() return on error
mm/khugepaged: check again on anon uffd-wp during isolation
...
Pull spi fix from Mark Brown:
"A small fix in the error handling for the rockchip driver, ensuring we
don't leak clock enables if we fail to request the interrupt for the
device"
* tag 'spi-fix-v6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-rockchip: Fix missing unwind goto in rockchip_sfc_probe()
Pull regulator fixes from Mark Brown:
"A few driver specific fixes, one build coverage issue and a couple of
'someone typed in the wrong number' style errors in describing devices
to the subsystem"
* tag 'regulator-fix-v6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: sm5703: Fix missing n_voltages for fixed regulators
regulator: fan53555: Fix wrong TCS_SLEW_MASK
regulator: fan53555: Explicitly include bits header
With CONFIG_INIT_STACK_ALL_ZERO enabled, bindgen passes
-ftrivial-auto-var-init=zero to clang, that triggers the following
error:
error: '-ftrivial-auto-var-init=zero' hasn't been enabled; enable it at your own peril for benchmarking purpose only with '-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang'
However, this additional option that is currently required by clang is
deprecated since clang-16 and going to be removed in the future,
likely with clang-18.
So, make sure bindgen is using this extra option if the major version of
the libclang used by bindgen is < 16.
In this way we can enable CONFIG_INIT_STACK_ALL_ZERO with CONFIG_RUST
without triggering any build error.
Link: https://github.com/llvm/llvm-project/issues/44842
Link: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.0-rc2/clang/docs/ReleaseNotes.rst#deprecated-compiler-flags
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
[Changed to < 16, added link and reworded]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
nm can use "R" or "r" to show read-only data sections, but
scripts/is_rust_module.sh can only recognize "r", so with some versions
of binutils it can fail to detect if a module is a Rust module or not.
Right now we're using this script only to determine if we need to skip
BTF generation (that is disabled globally if CONFIG_RUST is enabled),
but it's still nice to fix this script to do the proper job.
Moreover, with this patch applied I can also relax the constraint of
"RUST depends on !DEBUG_INFO_BTF" and build a kernel with Rust and BTF
enabled at the same time (of course BTF generation is still skipped for
Rust modules).
[ Miguel: The actual reason is likely to be a change on the Rust
compiler between 1.61.0 and 1.62.0:
echo '#[used] static S: () = ();' |
rustup run 1.61.0 rustc --emit=obj --crate-type=lib - &&
nm rust_out.o
echo '#[used] static S: () = ();' |
rustup run 1.62.0 rustc --emit=obj --crate-type=lib - &&
nm rust_out.o
Gives:
0000000000000000 r _ZN8rust_out1S17h48027ce0da975467E
0000000000000000 R _ZN8rust_out1S17h58e1f3d9c0e97cefE
See https://godbolt.org/z/KE6jneoo4. ]
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
Reviewed-by: Eric Curtin <ecurtin@redhat.com>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Juan Jose et al reported an issue found via fuzzing where the verifier's
pruning logic prematurely marks a program path as safe.
Consider the following program:
0: (b7) r6 = 1024
1: (b7) r7 = 0
2: (b7) r8 = 0
3: (b7) r9 = -2147483648
4: (97) r6 %= 1025
5: (05) goto pc+0
6: (bd) if r6 <= r9 goto pc+2
7: (97) r6 %= 1
8: (b7) r9 = 0
9: (bd) if r6 <= r9 goto pc+1
10: (b7) r6 = 0
11: (b7) r0 = 0
12: (63) *(u32 *)(r10 -4) = r0
13: (18) r4 = 0xffff888103693400 // map_ptr(ks=4,vs=48)
15: (bf) r1 = r4
16: (bf) r2 = r10
17: (07) r2 += -4
18: (85) call bpf_map_lookup_elem#1
19: (55) if r0 != 0x0 goto pc+1
20: (95) exit
21: (77) r6 >>= 10
22: (27) r6 *= 8192
23: (bf) r1 = r0
24: (0f) r0 += r6
25: (79) r3 = *(u64 *)(r0 +0)
26: (7b) *(u64 *)(r1 +0) = r3
27: (95) exit
The verifier treats this as safe, leading to oob read/write access due
to an incorrect verifier conclusion:
func#0 @0
0: R1=ctx(off=0,imm=0) R10=fp0
0: (b7) r6 = 1024 ; R6_w=1024
1: (b7) r7 = 0 ; R7_w=0
2: (b7) r8 = 0 ; R8_w=0
3: (b7) r9 = -2147483648 ; R9_w=-2147483648
4: (97) r6 %= 1025 ; R6_w=scalar()
5: (05) goto pc+0
6: (bd) if r6 <= r9 goto pc+2 ; R6_w=scalar(umin=18446744071562067969,var_off=(0xffffffff00000000; 0xffffffff)) R9_w=-2147483648
7: (97) r6 %= 1 ; R6_w=scalar()
8: (b7) r9 = 0 ; R9=0
9: (bd) if r6 <= r9 goto pc+1 ; R6=scalar(umin=1) R9=0
10: (b7) r6 = 0 ; R6_w=0
11: (b7) r0 = 0 ; R0_w=0
12: (63) *(u32 *)(r10 -4) = r0
last_idx 12 first_idx 9
regs=1 stack=0 before 11: (b7) r0 = 0
13: R0_w=0 R10=fp0 fp-8=0000????
13: (18) r4 = 0xffff8ad3886c2a00 ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
15: (bf) r1 = r4 ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
16: (bf) r2 = r10 ; R2_w=fp0 R10=fp0
17: (07) r2 += -4 ; R2_w=fp-4
18: (85) call bpf_map_lookup_elem#1 ; R0=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0)
19: (55) if r0 != 0x0 goto pc+1 ; R0=0
20: (95) exit
from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
21: (77) r6 >>= 10 ; R6_w=0
22: (27) r6 *= 8192 ; R6_w=0
23: (bf) r1 = r0 ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0)
24: (0f) r0 += r6
last_idx 24 first_idx 19
regs=40 stack=0 before 23: (bf) r1 = r0
regs=40 stack=0 before 22: (27) r6 *= 8192
regs=40 stack=0 before 21: (77) r6 >>= 10
regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1
parent didn't have regs=40 stack=0 marks: R0_rw=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0) R6_rw=P0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
last_idx 18 first_idx 9
regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
regs=40 stack=0 before 17: (07) r2 += -4
regs=40 stack=0 before 16: (bf) r2 = r10
regs=40 stack=0 before 15: (bf) r1 = r4
regs=40 stack=0 before 13: (18) r4 = 0xffff8ad3886c2a00
regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
regs=40 stack=0 before 11: (b7) r0 = 0
regs=40 stack=0 before 10: (b7) r6 = 0
25: (79) r3 = *(u64 *)(r0 +0) ; R0_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
26: (7b) *(u64 *)(r1 +0) = r3 ; R1_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
27: (95) exit
from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0
11: (b7) r0 = 0 ; R0_w=0
12: (63) *(u32 *)(r10 -4) = r0
last_idx 12 first_idx 11
regs=1 stack=0 before 11: (b7) r0 = 0
13: R0_w=0 R10=fp0 fp-8=0000????
13: (18) r4 = 0xffff8ad3886c2a00 ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
15: (bf) r1 = r4 ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
16: (bf) r2 = r10 ; R2_w=fp0 R10=fp0
17: (07) r2 += -4 ; R2_w=fp-4
18: (85) call bpf_map_lookup_elem#1
frame 0: propagating r6
last_idx 19 first_idx 11
regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
regs=40 stack=0 before 17: (07) r2 += -4
regs=40 stack=0 before 16: (bf) r2 = r10
regs=40 stack=0 before 15: (bf) r1 = r4
regs=40 stack=0 before 13: (18) r4 = 0xffff8ad3886c2a00
regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
regs=40 stack=0 before 11: (b7) r0 = 0
parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_r=P0 R7=0 R8=0 R9=0 R10=fp0
last_idx 9 first_idx 9
regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1
parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar() R7_w=0 R8_w=0 R9_rw=0 R10=fp0
last_idx 8 first_idx 0
regs=40 stack=0 before 8: (b7) r9 = 0
regs=40 stack=0 before 7: (97) r6 %= 1
regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
regs=40 stack=0 before 5: (05) goto pc+0
regs=40 stack=0 before 4: (97) r6 %= 1025
regs=40 stack=0 before 3: (b7) r9 = -2147483648
regs=40 stack=0 before 2: (b7) r8 = 0
regs=40 stack=0 before 1: (b7) r7 = 0
regs=40 stack=0 before 0: (b7) r6 = 1024
19: safe
frame 0: propagating r6
last_idx 9 first_idx 0
regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
regs=40 stack=0 before 5: (05) goto pc+0
regs=40 stack=0 before 4: (97) r6 %= 1025
regs=40 stack=0 before 3: (b7) r9 = -2147483648
regs=40 stack=0 before 2: (b7) r8 = 0
regs=40 stack=0 before 1: (b7) r7 = 0
regs=40 stack=0 before 0: (b7) r6 = 1024
from 6 to 9: safe
verification time 110 usec
stack depth 4
processed 36 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 2
The verifier considers this program as safe by mistakenly pruning unsafe
code paths. In the above func#0, code lines 0-10 are of interest. In line
0-3 registers r6 to r9 are initialized with known scalar values. In line 4
the register r6 is reset to an unknown scalar given the verifier does not
track modulo operations. Due to this, the verifier can also not determine
precisely which branches in line 6 and 9 are taken, therefore it needs to
explore them both.
As can be seen, the verifier starts with exploring the false/fall-through
paths first. The 'from 19 to 21' path has both r6=0 and r9=0 and the pointer
arithmetic on r0 += r6 is therefore considered safe. Given the arithmetic,
r6 is correctly marked for precision tracking where backtracking kicks in
where it walks back the current path all the way where r6 was set to 0 in
the fall-through branch.
Next, the pruning logics pops the path 'from 9 to 11' from the stack. Also
here, the state of the registers is the same, that is, r6=0 and r9=0, so
that at line 19 the path can be pruned as it is considered safe. It is
interesting to note that the conditional in line 9 turned r6 into a more
precise state, that is, in the fall-through path at the beginning of line
10, it is R6=scalar(umin=1), and in the branch-taken path (which is analyzed
here) at the beginning of line 11, r6 turned into a known const r6=0 as
r9=0 prior to that and therefore (unsigned) r6 <= 0 concludes that r6 must
be 0 (**):
[...] ; R6_w=scalar()
9: (bd) if r6 <= r9 goto pc+1 ; R6=scalar(umin=1) R9=0
[...]
from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0
[...]
The next path is 'from 6 to 9'. The verifier considers the old and current
state equivalent, and therefore prunes the search incorrectly. Looking into
the two states which are being compared by the pruning logic at line 9, the
old state consists of R6_rwD=Pscalar() R9_rwD=0 R10=fp0 and the new state
consists of R1=ctx(off=0,imm=0) R6_w=scalar(umax=18446744071562067968)
R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0. While r6 had the reg->precise flag
correctly set in the old state, r9 did not. Both r6'es are considered as
equivalent given the old one is a superset of the current, more precise one,
however, r9's actual values (0 vs 0x80000000) mismatch. Given the old r9
did not have reg->precise flag set, the verifier does not consider the
register as contributing to the precision state of r6, and therefore it
considered both r9 states as equivalent. However, for this specific pruned
path (which is also the actual path taken at runtime), register r6 will be
0x400 and r9 0x80000000 when reaching line 21, thus oob-accessing the map.
The purpose of precision tracking is to initially mark registers (including
spilled ones) as imprecise to help verifier's pruning logic finding equivalent
states it can then prune if they don't contribute to the program's safety
aspects. For example, if registers are used for pointer arithmetic or to pass
constant length to a helper, then the verifier sets reg->precise flag and
backtracks the BPF program instruction sequence and chain of verifier states
to ensure that the given register or stack slot including their dependencies
are marked as precisely tracked scalar. This also includes any other registers
and slots that contribute to a tracked state of given registers/stack slot.
This backtracking relies on recorded jmp_history and is able to traverse
entire chain of parent states. This process ends only when all the necessary
registers/slots and their transitive dependencies are marked as precise.
The backtrack_insn() is called from the current instruction up to the first
instruction, and its purpose is to compute a bitmask of registers and stack
slots that need precision tracking in the parent's verifier state. For example,
if a current instruction is r6 = r7, then r6 needs precision after this
instruction and r7 needs precision before this instruction, that is, in the
parent state. Hence for the latter r7 is marked and r6 unmarked.
For the class of jmp/jmp32 instructions, backtrack_insn() today only looks
at call and exit instructions and for all other conditionals the masks
remain as-is. However, in the given situation register r6 has a dependency
on r9 (as described above in **), so also that one needs to be marked for
precision tracking. In other words, if an imprecise register influences a
precise one, then the imprecise register should also be marked precise.
Meaning, in the parent state both dest and src register need to be tracked
for precision and therefore the marking must be more conservative by setting
reg->precise flag for both. The precision propagation needs to cover both
for the conditional: if the src reg was marked but not the dst reg and vice
versa.
After the fix the program is correctly rejected:
func#0 @0
0: R1=ctx(off=0,imm=0) R10=fp0
0: (b7) r6 = 1024 ; R6_w=1024
1: (b7) r7 = 0 ; R7_w=0
2: (b7) r8 = 0 ; R8_w=0
3: (b7) r9 = -2147483648 ; R9_w=-2147483648
4: (97) r6 %= 1025 ; R6_w=scalar()
5: (05) goto pc+0
6: (bd) if r6 <= r9 goto pc+2 ; R6_w=scalar(umin=18446744071562067969,var_off=(0xffffffff80000000; 0x7fffffff),u32_min=-2147483648) R9_w=-2147483648
7: (97) r6 %= 1 ; R6_w=scalar()
8: (b7) r9 = 0 ; R9=0
9: (bd) if r6 <= r9 goto pc+1 ; R6=scalar(umin=1) R9=0
10: (b7) r6 = 0 ; R6_w=0
11: (b7) r0 = 0 ; R0_w=0
12: (63) *(u32 *)(r10 -4) = r0
last_idx 12 first_idx 9
regs=1 stack=0 before 11: (b7) r0 = 0
13: R0_w=0 R10=fp0 fp-8=0000????
13: (18) r4 = 0xffff9290dc5bfe00 ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
15: (bf) r1 = r4 ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
16: (bf) r2 = r10 ; R2_w=fp0 R10=fp0
17: (07) r2 += -4 ; R2_w=fp-4
18: (85) call bpf_map_lookup_elem#1 ; R0=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0)
19: (55) if r0 != 0x0 goto pc+1 ; R0=0
20: (95) exit
from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
21: (77) r6 >>= 10 ; R6_w=0
22: (27) r6 *= 8192 ; R6_w=0
23: (bf) r1 = r0 ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0)
24: (0f) r0 += r6
last_idx 24 first_idx 19
regs=40 stack=0 before 23: (bf) r1 = r0
regs=40 stack=0 before 22: (27) r6 *= 8192
regs=40 stack=0 before 21: (77) r6 >>= 10
regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1
parent didn't have regs=40 stack=0 marks: R0_rw=map_value_or_null(id=1,off=0,ks=4,vs=48,imm=0) R6_rw=P0 R7=0 R8=0 R9=0 R10=fp0 fp-8=mmmm????
last_idx 18 first_idx 9
regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
regs=40 stack=0 before 17: (07) r2 += -4
regs=40 stack=0 before 16: (bf) r2 = r10
regs=40 stack=0 before 15: (bf) r1 = r4
regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00
regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
regs=40 stack=0 before 11: (b7) r0 = 0
regs=40 stack=0 before 10: (b7) r6 = 0
25: (79) r3 = *(u64 *)(r0 +0) ; R0_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
26: (7b) *(u64 *)(r1 +0) = r3 ; R1_w=map_value(off=0,ks=4,vs=48,imm=0) R3_w=scalar()
27: (95) exit
from 9 to 11: R1=ctx(off=0,imm=0) R6=0 R7=0 R8=0 R9=0 R10=fp0
11: (b7) r0 = 0 ; R0_w=0
12: (63) *(u32 *)(r10 -4) = r0
last_idx 12 first_idx 11
regs=1 stack=0 before 11: (b7) r0 = 0
13: R0_w=0 R10=fp0 fp-8=0000????
13: (18) r4 = 0xffff9290dc5bfe00 ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
15: (bf) r1 = r4 ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
16: (bf) r2 = r10 ; R2_w=fp0 R10=fp0
17: (07) r2 += -4 ; R2_w=fp-4
18: (85) call bpf_map_lookup_elem#1
frame 0: propagating r6
last_idx 19 first_idx 11
regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
regs=40 stack=0 before 17: (07) r2 += -4
regs=40 stack=0 before 16: (bf) r2 = r10
regs=40 stack=0 before 15: (bf) r1 = r4
regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00
regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
regs=40 stack=0 before 11: (b7) r0 = 0
parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_r=P0 R7=0 R8=0 R9=0 R10=fp0
last_idx 9 first_idx 9
regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1
parent didn't have regs=240 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar() R7_w=0 R8_w=0 R9_rw=P0 R10=fp0
last_idx 8 first_idx 0
regs=240 stack=0 before 8: (b7) r9 = 0
regs=40 stack=0 before 7: (97) r6 %= 1
regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
regs=240 stack=0 before 5: (05) goto pc+0
regs=240 stack=0 before 4: (97) r6 %= 1025
regs=240 stack=0 before 3: (b7) r9 = -2147483648
regs=40 stack=0 before 2: (b7) r8 = 0
regs=40 stack=0 before 1: (b7) r7 = 0
regs=40 stack=0 before 0: (b7) r6 = 1024
19: safe
from 6 to 9: R1=ctx(off=0,imm=0) R6_w=scalar(umax=18446744071562067968) R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0
9: (bd) if r6 <= r9 goto pc+1
last_idx 9 first_idx 0
regs=40 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
regs=240 stack=0 before 5: (05) goto pc+0
regs=240 stack=0 before 4: (97) r6 %= 1025
regs=240 stack=0 before 3: (b7) r9 = -2147483648
regs=40 stack=0 before 2: (b7) r8 = 0
regs=40 stack=0 before 1: (b7) r7 = 0
regs=40 stack=0 before 0: (b7) r6 = 1024
last_idx 9 first_idx 0
regs=200 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
regs=240 stack=0 before 5: (05) goto pc+0
regs=240 stack=0 before 4: (97) r6 %= 1025
regs=240 stack=0 before 3: (b7) r9 = -2147483648
regs=40 stack=0 before 2: (b7) r8 = 0
regs=40 stack=0 before 1: (b7) r7 = 0
regs=40 stack=0 before 0: (b7) r6 = 1024
11: R6=scalar(umax=18446744071562067968) R9=-2147483648
11: (b7) r0 = 0 ; R0_w=0
12: (63) *(u32 *)(r10 -4) = r0
last_idx 12 first_idx 11
regs=1 stack=0 before 11: (b7) r0 = 0
13: R0_w=0 R10=fp0 fp-8=0000????
13: (18) r4 = 0xffff9290dc5bfe00 ; R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
15: (bf) r1 = r4 ; R1_w=map_ptr(off=0,ks=4,vs=48,imm=0) R4_w=map_ptr(off=0,ks=4,vs=48,imm=0)
16: (bf) r2 = r10 ; R2_w=fp0 R10=fp0
17: (07) r2 += -4 ; R2_w=fp-4
18: (85) call bpf_map_lookup_elem#1 ; R0_w=map_value_or_null(id=3,off=0,ks=4,vs=48,imm=0)
19: (55) if r0 != 0x0 goto pc+1 ; R0_w=0
20: (95) exit
from 19 to 21: R0=map_value(off=0,ks=4,vs=48,imm=0) R6=scalar(umax=18446744071562067968) R7=0 R8=0 R9=-2147483648 R10=fp0 fp-8=mmmm????
21: (77) r6 >>= 10 ; R6_w=scalar(umax=18014398507384832,var_off=(0x0; 0x3fffffffffffff))
22: (27) r6 *= 8192 ; R6_w=scalar(smax=9223372036854767616,umax=18446744073709543424,var_off=(0x0; 0xffffffffffffe000),s32_max=2147475456,u32_max=-8192)
23: (bf) r1 = r0 ; R0=map_value(off=0,ks=4,vs=48,imm=0) R1_w=map_value(off=0,ks=4,vs=48,imm=0)
24: (0f) r0 += r6
last_idx 24 first_idx 21
regs=40 stack=0 before 23: (bf) r1 = r0
regs=40 stack=0 before 22: (27) r6 *= 8192
regs=40 stack=0 before 21: (77) r6 >>= 10
parent didn't have regs=40 stack=0 marks: R0_rw=map_value(off=0,ks=4,vs=48,imm=0) R6_r=Pscalar(umax=18446744071562067968) R7=0 R8=0 R9=-2147483648 R10=fp0 fp-8=mmmm????
last_idx 19 first_idx 11
regs=40 stack=0 before 19: (55) if r0 != 0x0 goto pc+1
regs=40 stack=0 before 18: (85) call bpf_map_lookup_elem#1
regs=40 stack=0 before 17: (07) r2 += -4
regs=40 stack=0 before 16: (bf) r2 = r10
regs=40 stack=0 before 15: (bf) r1 = r4
regs=40 stack=0 before 13: (18) r4 = 0xffff9290dc5bfe00
regs=40 stack=0 before 12: (63) *(u32 *)(r10 -4) = r0
regs=40 stack=0 before 11: (b7) r0 = 0
parent didn't have regs=40 stack=0 marks: R1=ctx(off=0,imm=0) R6_rw=Pscalar(umax=18446744071562067968) R7_w=0 R8_w=0 R9_w=-2147483648 R10=fp0
last_idx 9 first_idx 0
regs=40 stack=0 before 9: (bd) if r6 <= r9 goto pc+1
regs=240 stack=0 before 6: (bd) if r6 <= r9 goto pc+2
regs=240 stack=0 before 5: (05) goto pc+0
regs=240 stack=0 before 4: (97) r6 %= 1025
regs=240 stack=0 before 3: (b7) r9 = -2147483648
regs=40 stack=0 before 2: (b7) r8 = 0
regs=40 stack=0 before 1: (b7) r7 = 0
regs=40 stack=0 before 0: (b7) r6 = 1024
math between map_value pointer and register with unbounded min value is not allowed
verification time 886 usec
stack depth 4
processed 49 insns (limit 1000000) max_states_per_insn 1 total_states 5 peak_states 5 mark_read 2
Fixes: b5dc0163d8 ("bpf: precise scalar_value tracking")
Reported-by: Juan Jose Lopez Jaimez <jjlopezjaimez@google.com>
Reported-by: Meador Inge <meadori@google.com>
Reported-by: Simon Scannell <simonscannell@google.com>
Reported-by: Nenad Stojanovski <thenenadx@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Juan Jose Lopez Jaimez <jjlopezjaimez@google.com>
Reviewed-by: Meador Inge <meadori@google.com>
Reviewed-by: Simon Scannell <simonscannell@google.com>
Pull nfsd fixes from Chuck Lever:
- Address two issues with the new GSS krb5 Kunit tests
* tag 'nfsd-6.3-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
SUNRPC: Fix failures of checksum Kunit tests
sunrpc: Fix RFC6803 encryption test
Pull LoongArch fixes from Huacai Chen:
"Some bug fixes, some build fixes, a comment fix and a trivial cleanup"
* tag 'loongarch-fixes-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
tools/loongarch: Use __SIZEOF_LONG__ to define __BITS_PER_LONG
LoongArch: Replace hard-coded values in comments with VALEN
LoongArch: Clean up plat_swiotlb_setup() related code
LoongArch: Check unwind_error() in arch_stack_walk()
LoongArch: Adjust user_regset_copyin parameter to the correct offset
LoongArch: Adjust user_watch_state for explicit alignment
LoongArch: module: set section addresses to 0x0
LoongArch: Mark 3 symbol exports as non-GPL
LoongArch: Enable PG when wakeup from suspend
LoongArch: Fix _CONST64_(x) as unsigned
LoongArch: Fix build error if CONFIG_SUSPEND is not set
LoongArch: Fix probing of the CRC32 feature
LoongArch: Make WriteCombine configurable for ioremap()
dma_request_slave_channel() may return NULL which will lead to
NULL pointer dereference error in 'tmp_chan->private'.
Correct this behaviour by, first, switching from deprecated function
dma_request_slave_channel() to dma_request_chan(). Secondly, enable
sanity check for the resuling value of dma_request_chan().
Also, fix description that follows the enacted changes and that
concerns the use of dma_request_slave_channel().
Fixes: 706e2c8811 ("ASoC: fsl_asrc_dma: Reuse the dma channel if available in Back-End")
Co-developed-by: Natalia Petrova <n.petrova@fintech.ru>
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Acked-by: Shengjiu Wang <shengjiu.wang@gmail.com>
Link: https://lore.kernel.org/r/20230417133242.53339-1-n.zhandarovich@fintech.ru
Signed-off-by: Mark Brown <broonie@kernel.org>
It looks like the dependency got added accidentally in commit a553260618
("[PATCH] ISA DMA Kconfig fixes - part 3"). Unlike the previously removed
dmascc driver, the scc driver never used DMA.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
During initialization the driver issues a reset command via its command
interface in order to remove previous configuration from the device.
After issuing the reset, the driver waits for 200ms before polling on
the "system_status" register using memory-mapped IO until the device
reaches a ready state (0x5E). The wait is necessary because the reset
command only triggers the reset, but the reset itself happens
asynchronously. If the driver starts polling too soon, the read of the
"system_status" register will never return and the system will crash
[1].
The issue was discovered when the device was flashed with a development
firmware version where the reset routine took longer to complete. The
issue was fixed in the firmware, but it exposed the fact that the
current wait time is borderline.
Fix by increasing the wait time from 200ms to 400ms. With this patch and
the buggy firmware version, the issue did not reproduce in 10 reboots
whereas without the patch the issue is reproduced quite consistently.
[1]
mce: CPUs not responding to MCE broadcast (may include false positives): 0,4
mce: CPUs not responding to MCE broadcast (may include false positives): 0,4
Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
Shutting down cpus with NMI
Kernel Offset: 0x12000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Fixes: ac004e8416 ("mlxsw: pci: Wait longer before accessing the device after reset")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unfortunately, the tester gave a weak feedback (working/non-working) for
this case. After the double confirmation, this change is not really required.
The standard code with alc269_fallback_pin_fixup_tbl should work on this
hardware.
Fixes: 5911d78fab ("ALSA: hda/realtek: Improve support for Dell Precision 3260")
Fixes: 5f4efc9dfc ("ALSA: hda/realtek: Fix support for Dell Precision 3260")
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20230419081121.304846-1-perex@perex.cz
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Matthieu Baerts says:
====================
mptcp: fixes around listening sockets and the MPTCP worker
Christoph Paasch reported a couple of issues found by syzkaller and
linked to operations done by the MPTCP worker on (un)accepted sockets.
Fixing these issues was not obvious and rather complex but Paolo Abeni
nicely managed to propose these excellent patches that seem to satisfy
syzkaller.
Patch 1 partially reverts a recent fix but while still providing a
solution for the previous issue, it also prevents the MPTCP worker from
running concurrently with inet_csk_listen_stop(). A warning is then
avoided. The partially reverted patch has been introduced in v6.3-rc3,
backported up to v6.1 and fixing an issue visible from v5.18.
Patch 2 prevents the MPTCP worker to race with mptcp_accept() causing a
UaF when a fallback to TCP is done while in parallel, the socket is
being accepted by the userspace. This is also a fix of a previous fix
introduced in v6.3-rc3, backported up to v6.1 but here fixing an issue
that is in theory there from v5.7. There is no need to backport it up
to here as it looks like it is only visible later, around v5.18, see the
previous cover-letter linked to this original fix.
====================
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
This patch fixes a missing 8 byte for the header size calculation. The
ipv6_rpl_srh_size() is used to check a skb_pull() on skb->data which
points to skb_transport_header(). Currently we only check on the
calculated addresses fields using CmprI and CmprE fields, see:
https://www.rfc-editor.org/rfc/rfc6554#section-3
there is however a missing 8 byte inside the calculation which stands
for the fields before the addresses field. Those 8 bytes are represented
by sizeof(struct ipv6_rpl_sr_hdr) expression.
Fixes: 8610c7c6e3 ("net: ipv6: add support for rpl sr exthdr")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Reported-by: maxpl0it <maxpl0it@protonmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When vmxnet3_rq_create() fails to allocate rq->data_ring.base due to page
allocation failure, subsequent call to vmxnet3_rq_rx_complete() can result in
NULL pointer dereference.
To fix this bug, check not only that rxDataRingUsed is true but also that
adapter->rxdataring_enabled is true before calling memcpy() in
vmxnet3_rq_rx_complete().
[1728352.477993] ethtool: page allocation failure: order:9, mode:0x6000c0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
...
[1728352.478009] Call Trace:
[1728352.478028] dump_stack+0x41/0x60
[1728352.478035] warn_alloc.cold.120+0x7b/0x11b
[1728352.478038] ? _cond_resched+0x15/0x30
[1728352.478042] ? __alloc_pages_direct_compact+0x15f/0x170
[1728352.478043] __alloc_pages_slowpath+0xcd3/0xd10
[1728352.478047] __alloc_pages_nodemask+0x2e2/0x320
[1728352.478049] __dma_direct_alloc_pages.constprop.25+0x8a/0x120
[1728352.478053] dma_direct_alloc+0x5a/0x2a0
[1728352.478056] vmxnet3_rq_create.part.57+0x17c/0x1f0 [vmxnet3]
...
[1728352.478188] vmxnet3 0000:0b:00.0 ens192: rx data ring will be disabled
...
[1728352.515347] BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
...
[1728352.515440] RIP: 0010:memcpy_orig+0x54/0x130
...
[1728352.515655] Call Trace:
[1728352.515665] <IRQ>
[1728352.515672] vmxnet3_rq_rx_complete+0x419/0xef0 [vmxnet3]
[1728352.515690] vmxnet3_poll_rx_only+0x31/0xa0 [vmxnet3]
...
Signed-off-by: Seiji Nishikawa <snishika@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a net device is put administratively up, its 'IFF_UP' flag is set
(if not set already) and a 'NETDEV_UP' notification is emitted, which
causes the 8021q driver to add VLAN ID 0 on the device. The reverse
happens when a net device is put administratively down.
When changing the type of a bond to Ethernet, its 'IFF_UP' flag is
incorrectly cleared, resulting in the kernel skipping the above process
and VLAN ID 0 being leaked [1].
Fix by restoring the flag when changing the type to Ethernet, in a
similar fashion to the restoration of the 'IFF_SLAVE' flag.
The issue can be reproduced using the script in [2], with example out
before and after the fix in [3].
[1]
unreferenced object 0xffff888103479900 (size 256):
comm "ip", pid 329, jiffies 4294775225 (age 28.561s)
hex dump (first 32 bytes):
00 a0 0c 15 81 88 ff ff 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff81a6051a>] kmalloc_trace+0x2a/0xe0
[<ffffffff8406426c>] vlan_vid_add+0x30c/0x790
[<ffffffff84068e21>] vlan_device_event+0x1491/0x21a0
[<ffffffff81440c8e>] notifier_call_chain+0xbe/0x1f0
[<ffffffff8372383a>] call_netdevice_notifiers_info+0xba/0x150
[<ffffffff837590f2>] __dev_notify_flags+0x132/0x2e0
[<ffffffff8375ad9f>] dev_change_flags+0x11f/0x180
[<ffffffff8379af36>] do_setlink+0xb96/0x4060
[<ffffffff837adf6a>] __rtnl_newlink+0xc0a/0x18a0
[<ffffffff837aec6c>] rtnl_newlink+0x6c/0xa0
[<ffffffff837ac64e>] rtnetlink_rcv_msg+0x43e/0xe00
[<ffffffff839a99e0>] netlink_rcv_skb+0x170/0x440
[<ffffffff839a738f>] netlink_unicast+0x53f/0x810
[<ffffffff839a7fcb>] netlink_sendmsg+0x96b/0xe90
[<ffffffff8369d12f>] ____sys_sendmsg+0x30f/0xa70
[<ffffffff836a6d7a>] ___sys_sendmsg+0x13a/0x1e0
unreferenced object 0xffff88810f6a83e0 (size 32):
comm "ip", pid 329, jiffies 4294775225 (age 28.561s)
hex dump (first 32 bytes):
a0 99 47 03 81 88 ff ff a0 99 47 03 81 88 ff ff ..G.......G.....
81 00 00 00 01 00 00 00 cc cc cc cc cc cc cc cc ................
backtrace:
[<ffffffff81a6051a>] kmalloc_trace+0x2a/0xe0
[<ffffffff84064369>] vlan_vid_add+0x409/0x790
[<ffffffff84068e21>] vlan_device_event+0x1491/0x21a0
[<ffffffff81440c8e>] notifier_call_chain+0xbe/0x1f0
[<ffffffff8372383a>] call_netdevice_notifiers_info+0xba/0x150
[<ffffffff837590f2>] __dev_notify_flags+0x132/0x2e0
[<ffffffff8375ad9f>] dev_change_flags+0x11f/0x180
[<ffffffff8379af36>] do_setlink+0xb96/0x4060
[<ffffffff837adf6a>] __rtnl_newlink+0xc0a/0x18a0
[<ffffffff837aec6c>] rtnl_newlink+0x6c/0xa0
[<ffffffff837ac64e>] rtnetlink_rcv_msg+0x43e/0xe00
[<ffffffff839a99e0>] netlink_rcv_skb+0x170/0x440
[<ffffffff839a738f>] netlink_unicast+0x53f/0x810
[<ffffffff839a7fcb>] netlink_sendmsg+0x96b/0xe90
[<ffffffff8369d12f>] ____sys_sendmsg+0x30f/0xa70
[<ffffffff836a6d7a>] ___sys_sendmsg+0x13a/0x1e0
[2]
ip link add name t-nlmon type nlmon
ip link add name t-dummy type dummy
ip link add name t-bond type bond mode active-backup
ip link set dev t-bond up
ip link set dev t-nlmon master t-bond
ip link set dev t-nlmon nomaster
ip link show dev t-bond
ip link set dev t-dummy master t-bond
ip link show dev t-bond
ip link del dev t-bond
ip link del dev t-dummy
ip link del dev t-nlmon
[3]
Before:
12: t-bond: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
link/netlink
12: t-bond: <BROADCAST,MULTICAST,MASTER,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 46:57:39:a4:46:a2 brd ff:ff:ff:ff:ff:ff
After:
12: t-bond: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
link/netlink
12: t-bond: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 66:48:7b:74:b6:8a brd ff:ff:ff:ff:ff:ff
Fixes: e36b9d16c6 ("bonding: clean muticast addresses when device changes type")
Fixes: 75c78500dd ("bonding: remap muticast addresses without using dev_close() and dev_open()")
Fixes: 9ec7eb60dc ("bonding: restore IFF_MASTER/SLAVE flags on bond enslave ether type change")
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Link: https://lore.kernel.org/netdev/78a8a03b-6070-3e6b-5042-f848dab16fb8@alu.unizg.hr/
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although __SIZEOF_POINTER__ is equal to _SIZEOF_LONG__ on LoongArch,
it is better to use __SIZEOF_LONG__ to define __BITS_PER_LONG to keep
consistent between arch/loongarch/include/uapi/asm/bitsperlong.h and
tools/arch/loongarch/include/uapi/asm/bitsperlong.h.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
After commit c78c43fe7d ("LoongArch: Use acpi_arch_dma_setup() and
remove ARCH_HAS_PHYS_TO_DMA"), plat_swiotlb_setup() has been deleted,
so clean up the related code.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
We can see the following messages with CONFIG_PROVE_LOCKING=y on
LoongArch:
BUG: MAX_STACK_TRACE_ENTRIES too low!
turning off the locking correctness validator.
This is because stack_trace_save() returns a big value after call
arch_stack_walk(), here is the call trace:
save_trace()
stack_trace_save()
arch_stack_walk()
stack_trace_consume_entry()
arch_stack_walk() should return immediately if unwind_next_frame()
failed, no need to do the useless loops to increase the value of c->len
in stack_trace_consume_entry(), then we can fix the above problem.
Cc: stable@vger.kernel.org
Reported-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/all/8a44ad71-68d2-4926-892f-72bfc7a67e2a@roeck-us.net/
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
This is done in order to easily calculate the number of breakpoints in
hw_break_get()/hw_break_set().
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Unbreak br_netfilter physdev match support, from Florian Westphal.
2) Use GFP_KERNEL_ACCOUNT for stateful/policy objects, from Chen Aotian.
3) Use IS_ENABLED() in nf_reset_trace(), from Florian Westphal.
4) Fix validation of catch-all set element.
5) Tighten requirements for catch-all set elements.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements
netfilter: nf_tables: validate catch-all set elements
netfilter: nf_tables: fix ifdef to also consider nf_tables=m
netfilter: nf_tables: Modify nla_memdup's flag to GFP_KERNEL_ACCOUNT
netfilter: br_netfilter: fix recent physdev match breakage
====================
Link: https://lore.kernel.org/r/20230418145048.67270-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If read() is done in an unbuffered manner, such that, say,
cifs_strict_readv() goes through cifs_user_readv() and thence
__cifs_readv(), it doesn't recognise the EOF and keeps indicating to
userspace that it returning full buffers of data.
This is due to ctx->iter being advanced in cifs_send_async_read() as the
buffer is split up amongst a number of rdata objects. The iterator count
is then used in collect_uncached_read_data() in the non-DIO case to set the
total length read - and thus the return value of sys_read(). But since the
iterator normally gets used up completely during splitting, ctx->total_len
gets overridden to the full amount.
However, prior to that in collect_uncached_read_data(), we've gone through
the list of rdatas and added up the amount of data we actually received
(which we then throw away).
Fix this by removing the bit that overrides the amount read in the non-DIO
case and just going with the total added up in the aforementioned loop.
This was observed by mounting a cifs share with multiple channels, e.g.:
mount //192.168.6.1/test /test/ -o user=shares,pass=...,max_channels=6
and then reading a 1MiB file on the share:
strace cat /xfstest.test/1M >/dev/null
Through strace, the same data can be seen being read again and again.
Fixes: d08089f649 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
cc: Jérôme Glisse <jglisse@redhat.com>
cc: Long Li <longli@microsoft.com>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
A bug was reported by Yuanxi Liu where allocating 1G pages at runtime is
taking an excessive amount of time for large amounts of memory. Further
testing allocating huge pages that the cost is linear i.e. if allocating
1G pages in batches of 10 then the time to allocate nr_hugepages from
10->20->30->etc increases linearly even though 10 pages are allocated at
each step. Profiles indicated that much of the time is spent checking the
validity within already existing huge pages and then attempting a
migration that fails after isolating the range, draining pages and a whole
lot of other useless work.
Commit eb14d4eefd ("mm,page_alloc: drop unnecessary checks from
pfn_range_valid_contig") removed two checks, one which ignored huge pages
for contiguous allocations as huge pages can sometimes migrate. While
there may be value on migrating a 2M page to satisfy a 1G allocation, it's
potentially expensive if the 1G allocation fails and it's pointless to try
moving a 1G page for a new 1G allocation or scan the tail pages for valid
PFNs.
Reintroduce the PageHuge check and assume any contiguous region with
hugetlbfs pages is unsuitable for a new 1G allocation.
The hpagealloc test allocates huge pages in batches and reports the
average latency per page over time. This test happens just after boot
when fragmentation is not an issue. Units are in milliseconds.
hpagealloc
6.3.0-rc6 6.3.0-rc6 6.3.0-rc6
vanilla hugeallocrevert-v1r1 hugeallocsimple-v1r2
Min Latency 26.42 ( 0.00%) 5.07 ( 80.82%) 18.94 ( 28.30%)
1st-qrtle Latency 356.61 ( 0.00%) 5.34 ( 98.50%) 19.85 ( 94.43%)
2nd-qrtle Latency 697.26 ( 0.00%) 5.47 ( 99.22%) 20.44 ( 97.07%)
3rd-qrtle Latency 972.94 ( 0.00%) 5.50 ( 99.43%) 20.81 ( 97.86%)
Max-1 Latency 26.42 ( 0.00%) 5.07 ( 80.82%) 18.94 ( 28.30%)
Max-5 Latency 82.14 ( 0.00%) 5.11 ( 93.78%) 19.31 ( 76.49%)
Max-10 Latency 150.54 ( 0.00%) 5.20 ( 96.55%) 19.43 ( 87.09%)
Max-90 Latency 1164.45 ( 0.00%) 5.53 ( 99.52%) 20.97 ( 98.20%)
Max-95 Latency 1223.06 ( 0.00%) 5.55 ( 99.55%) 21.06 ( 98.28%)
Max-99 Latency 1278.67 ( 0.00%) 5.57 ( 99.56%) 22.56 ( 98.24%)
Max Latency 1310.90 ( 0.00%) 8.06 ( 99.39%) 26.62 ( 97.97%)
Amean Latency 678.36 ( 0.00%) 5.44 * 99.20%* 20.44 * 96.99%*
6.3.0-rc6 6.3.0-rc6 6.3.0-rc6
vanilla revert-v1 hugeallocfix-v2
Duration User 0.28 0.27 0.30
Duration System 808.66 17.77 35.99
Duration Elapsed 830.87 18.08 36.33
The vanilla kernel is poor, taking up to 1.3 second to allocate a huge
page and almost 10 minutes in total to run the test. Reverting the
problematic commit reduces it to 8ms at worst and the patch takes 26ms.
This patch fixes the main issue with skipping huge pages but leaves the
page_count() out because a page with an elevated count potentially can
migrate.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=217022
Link: https://lkml.kernel.org/r/20230414141429.pwgieuwluxwez3rj@techsingularity.net
Fixes: eb14d4eefd ("mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Yuanxi Liu <y.liu@naruida.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
commit f1a7941243 ("mm: convert mm's rss stats into percpu_counter")
introduces a memory leak by missing a call to destroy_context() when a
percpu_counter fails to allocate.
Before introducing the per-cpu counter allocations, init_new_context() was
the last call that could fail in mm_init(), and thus there was no need to
ever invoke destroy_context() in the error paths. Adding the following
percpu counter allocations adds error paths after init_new_context(),
which means its associated destroy_context() needs to be called when
percpu counters fail to allocate.
Link: https://lkml.kernel.org/r/20230330133822.66271-1-mathieu.desnoyers@efficios.com
Fixes: f1a7941243 ("mm: convert mm's rss stats into percpu_counter")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
syzbot is reporting circular locking dependency which involves
zonelist_update_seq seqlock [1], for this lock is checked by memory
allocation requests which do not need to be retried.
One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.
CPU0
----
__build_all_zonelists() {
write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
// e.g. timer interrupt handler runs at this moment
some_timer_func() {
kmalloc(GFP_ATOMIC) {
__alloc_pages_slowpath() {
read_seqbegin(&zonelist_update_seq) {
// spins forever because zonelist_update_seq.seqcount is odd
}
}
}
}
// e.g. timer interrupt handler finishes
write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
}
This deadlock scenario can be easily eliminated by not calling
read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation
requests. But Michal Hocko does not know whether we should go with this
approach.
Another deadlock scenario which syzbot is reporting is a race between
kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with
port->lock held and printk() from __build_all_zonelists() with
zonelist_update_seq held.
CPU0 CPU1
---- ----
pty_write() {
tty_insert_flip_string_and_push_buffer() {
__build_all_zonelists() {
write_seqlock(&zonelist_update_seq);
build_zonelists() {
printk() {
vprintk() {
vprintk_default() {
vprintk_emit() {
console_unlock() {
console_flush_all() {
console_emit_next_record() {
con->write() = serial8250_console_write() {
spin_lock_irqsave(&port->lock, flags);
tty_insert_flip_string() {
tty_insert_flip_string_fixed_flag() {
__tty_buffer_request_room() {
tty_buffer_alloc() {
kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
__alloc_pages_slowpath() {
zonelist_iter_begin() {
read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd
spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held
}
}
}
}
}
}
}
}
spin_unlock_irqrestore(&port->lock, flags);
// message is printed to console
spin_unlock_irqrestore(&port->lock, flags);
}
}
}
}
}
}
}
}
}
write_sequnlock(&zonelist_update_seq);
}
}
}
This deadlock scenario can be eliminated by
preventing interrupt context from calling kmalloc(GFP_ATOMIC)
and
preventing printk() from calling console_flush_all()
while zonelist_update_seq.seqcount is odd.
Since Petr Mladek thinks that __build_all_zonelists() can become a
candidate for deferring printk() [2], let's address this problem by
disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC)
and
disabling synchronous printk() in order to avoid console_flush_all()
.
As a side effect of minimizing duration of zonelist_update_seq.seqcount
being odd by disabling synchronous printk(), latency at
read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and
__GFP_DIRECT_RECLAIM allocation requests will be reduced. Although, from
lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e.
do not record unnecessary locking dependency) from interrupt context is
still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC)
inside
write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq)
section...
Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKURA.ne.jp
Fixes: 3d36424b3b ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
Reported-by: syzbot <syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Petr Mladek <pmladek@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Patrick Daly <quic_pdaly@quicinc.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linux Security Modules (LSMs) that implement the "capable" hook will
usually emit an access denial message to the audit log whenever they
"block" the current task from using the given capability based on their
security policy.
The occurrence of a denial is used as an indication that the given task
has attempted an operation that requires the given access permission, so
the callers of functions that perform LSM permission checks must take care
to avoid calling them too early (before it is decided if the permission is
actually needed to perform the requested operation).
The __sys_setres[ug]id() functions violate this convention by first
calling ns_capable_setid() and only then checking if the operation
requires the capability or not. It means that any caller that has the
capability granted by DAC (task's capability set) but not by MAC (LSMs)
will generate a "denied" audit record, even if is doing an operation for
which the capability is not required.
Fix this by reordering the checks such that ns_capable_setid() is checked
last and -EPERM is returned immediately if it returns false.
While there, also do two small optimizations:
* move the capability check before prepare_creds() and
* bail out early in case of a no-op.
Link: https://lkml.kernel.org/r/20230217162154.837549-1-omosnace@redhat.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[Why]
1. It could hit bandwidth limitdation under single dimm
memory when connecting 8K external monitor.
2. IsSupportedVidPn got validation failed with
2K240Hz eDP + 8K24Hz external monitor.
3. It's better to filter out such combination in
EnumVidPnCofuncModality
4. For short term, filter out in dc bandwidth validation.
[How]
Force 2K@240Hz+8K@24Hz timing validation false in dc.
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Daniel Miess <Daniel.Miess@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
[Why]
After gpu-reset, sometimes the driver fails to enable vblank irq,
causing flip_done timed out and the desktop freezed.
During gpu-reset, we disable and enable vblank irq in dm_suspend() and
dm_resume(). Later on in amdgpu_irq_gpu_reset_resume_helper(), we check
irqs' refcount and decide to enable or disable the irqs again.
However, we have 2 sets of API for controling vblank irq, one is
dm_vblank_get/put() and another is amdgpu_irq_get/put(). Each API has
its own refcount and flag to store the state of vblank irq, and they
are not synchronized.
In drm we use the first API to control vblank irq but in
amdgpu_irq_gpu_reset_resume_helper() we use the second set of API.
The failure happens when vblank irq was enabled by dm_vblank_get()
before gpu-reset, we have vblank->enabled true. However, during
gpu-reset, in amdgpu_irq_gpu_reset_resume_helper() vblank irq's state
checked from amdgpu_irq_update() is DISABLED. So finally it disables
vblank irq again. After gpu-reset, if there is a cursor plane commit,
the driver will try to enable vblank irq by calling drm_vblank_enable(),
but the vblank->enabled is still true, so it fails to turn on vblank
irq and causes flip_done can't be completed in vblank irq handler and
desktop become freezed.
[How]
Combining the 2 vblank control APIs by letting drm's API finally calls
amdgpu_irq's API, so the irq's refcount and state of both APIs can be
synchronized. Also add a check to prevent refcount from being less then
0 in amdgpu_irq_put().
v2:
- Add warning in amdgpu_irq_enable() if the irq is already disabled.
- Call dc_interrupt_set() in dm_set_vblank() to avoid refcount change
if it is in gpu-reset.
v3:
- Improve commit message and code comments.
Signed-off-by: Alan Liu <HaoPing.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Pull MMC fixes from Ulf Hansson:
"MMC host:
- sdhci_am654: Fix support for UHS-I SDR12 and SDR25 speed modes
MEMSTICK:
- Fix memory leak if card device never gets registered"
* tag 'mmc-v6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
memstick: fix memory leak if card device is never registered
mmc: sdhci_am654: Set HIGH_SPEED_ENA for SDR12 and SDR25
Per-vcpu flags are updated using a non-atomic RMW operation.
Which means it is possible to get preempted between the read and
write operations.
Another interesting thing to note is that preemption also updates
flags, as we have some flag manipulation in both the load and put
operations.
It is thus possible to lose information communicated by either
load or put, as the preempted flag update will overwrite the flags
when the thread is resumed. This is specially critical if either
load or put has stored information which depends on the physical
CPU the vcpu runs on.
This results in really elusive bugs, and kudos must be given to
Mostafa for the long hours of debugging, and finally spotting
the problem.
Fix it by disabling preemption during the RMW operation, which
ensures that the state stays consistent. Also upgrade vcpu_get_flag
path to use READ_ONCE() to make sure the field is always atomically
accessed.
Fixes: e87abb73e5 ("KVM: arm64: Add helpers to manipulate vcpu flags among a set")
Reported-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230418125737.2327972-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Pull ARM SoC fixes from Arnd Bergmann:
"There are a number of updates for devicetree files for Qualcomm,
Rockchips, and NXP i.MX platforms, addressing mistakes in the DT
contents:
- Wrong GPIO polarity on some boards
- Lower SD card interface speed for better stability
- Incorrect power supply, clock, pmic, cache properties
- Disable broken hbr3 on sc7280-herobrine
- Devicetree warning fixes
The only other changes are:
- A regression fix for the Amlogic performance monitoring unit
driver, along with two related DT changes.
- imx_v6_v7_defconfig enables PCI support again.
- Trivial fixes for tee, optee and psci firmware drivers, addressing
compiler warning and error output"
* tag 'arm-fixes-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (32 commits)
firmware/psci: demote suspend-mode warning to info level
arm64: dts: qcom: sc7280: remove hbr3 support on herobrine boards
ARM: imx_v6_v7_defconfig: Fix unintentional disablement of PCI
arm64: dts: rockchip: correct panel supplies on some rk3326 boards
arm64: dts: rockchip: use just "port" in panel on RockPro64
arm64: dts: rockchip: use just "port" in panel on Pinebook Pro
ARM: dts: imx6ull-colibri: Remove unnecessary #address-cells/#size-cells
ARM: dts: imx7d-remarkable2: Remove unnecessary #address-cells/#size-cells
arm64: dts: imx8mp-verdin: correct off-on-delay
arm64: dts: imx8mm-verdin: correct off-on-delay
arm64: dts: imx8mm-evk: correct pmic clock source
arm64: dts: qcom: sc8280xp-pmics: fix pon compatible and registers
arm64: dts: rockchip: Remove non-existing pwm-delay-us property
arm64: dts: rockchip: Add clk_rtc_32k to Anbernic xx3 Devices
tee: Pass a pointer to virt_to_page()
perf/amlogic: adjust register offsets
arm64: dts: meson-g12-common: resolve conflict between canvas & pmu
arm64: dts: meson-g12-common: specify full DMC range
arm64: dts: imx8mp: fix address length for LCDIF2
riscv: dts: canaan: drop invalid spi-max-frequency
...
These got*, plt* and .text.ftrace_trampoline sections specified for
LoongArch have non-zero addressses. Non-zero section addresses in a
relocatable ELF would confuse GDB when it tries to compute the section
offsets and it ends up printing wrong symbol addresses. Therefore, set
them to zero, which mirrors the change in commit 5d8591bc0f
("arm64 module: set plt* section addresses to 0x0").
Cc: stable@vger.kernel.org
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Chong Qiao <qiaochong@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
vm_map_base, empty_zero_page and invalid_pmd_table could be accessed
widely by some out-of-tree non-GPL but important file systems or drivers
(e.g. OpenZFS). Let's use EXPORT_SYMBOL() instead of EXPORT_SYMBOL_GPL()
to export them, so as to avoid build errors.
1, Details about vm_map_base:
This is a LoongArch-specific symbol and may be referenced through macros
PCI_IOBASE, VMALLOC_START and VMALLOC_END.
2, Details about empty_zero_page:
As it stands today, only 3 architectures export empty_zero_page as a GPL
symbol: IA64, LoongArch and MIPS. LoongArch gets the GPL export by
inheriting from MIPS, and the MIPS export was first introduced in commit
497d2adcbf ("[MIPS] Export empty_zero_page for sake of the ext4
module."). The IA64 export was similar: commit a7d57ecf42 ("[IA64]
Export three symbols for module use") did so for kvm.
In both IA64 and MIPS, the export of empty_zero_page was done for
satisfying some in-kernel component built as module (kvm and ext4
respectively), and given its reasonably low-level nature, GPL is a
reasonable choice. But looking at the bigger picture it is evident most
other architectures do not regard it as GPL, so in effect the symbol
probably should not be treated as such, in favor of consistency.
3, Details about invalid_pmd_table:
Keep consistency with invalid_pte_table and make it be possible by some
modules.
Cc: stable@vger.kernel.org
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Some firmwares don't enable PG when wakeup from suspend, so do it in
kernel. This can improve code compatibility for boot kernel.
Signed-off-by: Baoqi Zhang <zhangbaoqi@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Addresses should all be of unsigned type to avoid unnecessary conversions.
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
We can see the following build error on LoongArch if CONFIG_SUSPEND is
not set:
ld: drivers/acpi/sleep.o: in function 'acpi_pm_prepare':
sleep.c:(.text+0x2b8): undefined reference to 'loongarch_wakeup_start'
Here is the call trace:
acpi_pm_prepare()
__acpi_pm_prepare()
acpi_sleep_prepare()
acpi_get_wakeup_address()
loongarch_wakeup_start()
Root cause: loongarch_wakeup_start() is defined in arch/loongarch/power/
suspend_asm.S which is only built under CONFIG_SUSPEND. In order to fix
the build error, just let acpi_get_wakeup_address() return 0 if CONFIG_
SUSPEND is not set.
Fixes: 366bb35a8e ("LoongArch: Add suspend (ACPI S3) support")
Reviewed-by: WANG Xuerui <git@xen0n.name>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/all/11215033-fa3c-ecb1-2fc0-e9aeba47be9b@infradead.org/
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Not all LoongArch processors support CRC32 instructions. This feature
is indicated by CPUCFG1.CRC32 (Bit25) but it is wrongly defined in the
previous versions of the ISA manual (and so does in loongarch.h). The
CRC32 feature is set unconditionally now, so fix it.
BTW, expose the CRC32 feature in /proc/cpuinfo.
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
LoongArch maintains cache coherency in hardware, but when paired with
LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
to WriteCombine) is out of the scope of cache coherency machanism for
PCIe devices (this is a PCIe protocol violation, which may be fixed in
newer chipsets).
This means WUC can only used for write-only memory regions now, so this
option is disabled by default, making WUC silently fallback to SUC for
ioremap(). You can enable this option if the kernel is ensured to run on
hardware without this bug.
Kernel parameter writecombine=on/off can be used to override the Kconfig
option.
Cc: stable@vger.kernel.org
Suggested-by: WANG Xuerui <kernel@xen0n.name>
Reviewed-by: WANG Xuerui <kernel@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Function mlxfw_mfa2_tlv_multi_get() returns NULL if 'tlv' in
question does not pass checks in mlxfw_mfa2_tlv_payload_get(). This
behaviour may lead to NULL pointer dereference in 'multi->total_len'.
Fix this issue by testing mlxfw_mfa2_tlv_multi_get()'s return value
against NULL.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 410ed13cae ("Add the mlxfw module for Mellanox firmware flash process")
Co-developed-by: Natalia Petrova <n.petrova@fintech.ru>
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230417120718.52325-1-n.zhandarovich@fintech.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Michael Chan says:
====================
bnxt_en: Bug fixes
This small series contains 2 fixes. The first one fixes the PTP
initialization logic on older chips to avoid logging a warning. The
second one fixes a potenial NULL pointer dereference in the driver's
aux bus unload path.
====================
Link: https://lore.kernel.org/r/20230417065819.122055-1-michael.chan@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In the driver unload path, the driver currently checks the valid
BNXT_FLAG_ROCE_CAP flag in bnxt_rdma_aux_device_uninit() before
proceeding. This is flawed because the flag may not be set initially
during driver load. It may be set later after the NVRAM setting is
changed followed by a firmware reset. Relying on the
BNXT_FLAG_ROCE_CAP flag may crash in bnxt_rdma_aux_device_uninit() if
the aux device was never initialized:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 8ae6aa067 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 39 PID: 42558 Comm: rmmod Kdump: loaded Tainted: G OE --------- - - 4.18.0-348.el8.x86_64 #1
Hardware name: Dell Inc. PowerEdge R750/0WT8Y6, BIOS 1.5.4 12/17/2021
RIP: 0010:device_del+0x1b/0x410
Code: 89 a5 50 03 00 00 4c 89 a5 58 03 00 00 eb 89 0f 1f 44 00 00 41 56 41 55 41 54 4c 8d a7 80 00 00 00 55 53 48 89 fb 48 83 ec 18 <48> 8b 2f 4c 89 e7 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0
RSP: 0018:ff7f82bf469a7dc8 EFLAGS: 00010292
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000206 RDI: 0000000000000000
RBP: ff31b7cd114b0ac0 R08: 0000000000000000 R09: ffffffff935c3400
R10: ff31b7cd45bc3440 R11: 0000000000000001 R12: 0000000000000080
R13: ffffffffc1069f40 R14: 0000000000000000 R15: 0000000000000000
FS: 00007fc9903ce740(0000) GS:ff31b7d4ffac0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000992fee004 CR4: 0000000000773ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
bnxt_rdma_aux_device_uninit+0x1f/0x30 [bnxt_en]
bnxt_remove_one+0x2f/0x1f0 [bnxt_en]
pci_device_remove+0x3b/0xc0
device_release_driver_internal+0x103/0x1f0
driver_detach+0x54/0x88
bus_remove_driver+0x77/0xc9
pci_unregister_driver+0x2d/0xb0
bnxt_exit+0x16/0x2c [bnxt_en]
__x64_sys_delete_module+0x139/0x280
do_syscall_64+0x5b/0x1a0
entry_SYSCALL_64_after_hwframe+0x65/0xca
RIP: 0033:0x7fc98f3af71b
Fix this by modifying the check inside bnxt_rdma_aux_device_uninit()
to check for bp->aux_priv instead. We also need to make some changes
in bnxt_rdma_aux_device_init() to make sure that bp->aux_priv is set
only when the aux device is fully initialized.
Fixes: d80d88b0df ("bnxt_en: Add auxiliary driver support")
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The driver does not support PTP on these older chips and it is assuming
that firmware on these older chips will not return the
PORT_MAC_PTP_QCFG_RESP_FLAGS_HWRM_ACCESS flag in __bnxt_hwrm_ptp_qcfg(),
causing the function to abort quietly.
But newer firmware now sets this flag and so __bnxt_hwrm_ptp_qcfg()
will proceed further. Eventually it will fail in bnxt_ptp_init() ->
bnxt_map_ptp_regs() because there is no code to support the older chips.
The driver will then complain:
"PTP initialization failed.\n"
Fix it so that we abort quietly earlier without going through the
unnecessary steps and alarming the user with the warning log.
Fixes: ae5c42f0b9 ("bnxt_en: Get PTP hardware capability from firmware")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The usage of the BIT() macro in inline asm code was introduced in 6.3 by
the commit in the Fixes tag. However, this macro uses "1UL" for integer
constant suffixes in its shift operation, while gas before 2.28 does not
support the "L" suffix after a number, and gas before 2.27 does not
support the "U" suffix, resulting in build errors such as the following
with such versions:
./arch/x86/include/asm/uaccess_64.h:124: Error: found 'L', expected: ')'
./arch/x86/include/asm/uaccess_64.h:124: Error: junk at end of line,
first unrecognized character is `L'
However, the currently minimal binutils version the kernel supports is
2.25.
There's a single use of this macro here, revert to (1 << 0) that works
with such older binutils.
As an additional info, the binutils PRs which add support for those
suffixes are:
https://sourceware.org/bugzilla/show_bug.cgi?id=19910https://sourceware.org/bugzilla/show_bug.cgi?id=20732
[ bp: Massage and extend commit message. ]
Fixes: 5d1dd961e7 ("x86/alternatives: Add alt_instr.flags")
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Link: https://lore.kernel.org/lkml/a9aae568-3046-306c-bd71-92c1fc8eeddc@linux.alibaba.com/
If NFT_SET_ELEM_CATCHALL is set on, then userspace provides no set element
key. Otherwise, bail out with -EINVAL.
Fixes: aaa31047a6 ("netfilter: nftables: add catch-all set element support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The flower_stats_timer can schedule flower_stats_work and
flower_stats_work can also arm the flower_stats_timer. The
process is shown below:
----------- timer schedules work ------------
ch_flower_stats_cb() //timer handler
schedule_work(&adap->flower_stats_work);
----------- work arms timer ------------
ch_flower_stats_handler() //workqueue callback function
mod_timer(&adap->flower_stats_timer, ...);
When the cxgb4 device is detaching, the timer and workqueue
could still be rearmed. The process is shown below:
(cleanup routine) | (timer and workqueue routine)
remove_one() |
free_some_resources() | ch_flower_stats_cb() //timer
cxgb4_cleanup_tc_flower() | schedule_work()
del_timer_sync() |
| ch_flower_stats_handler() //workqueue
| mod_timer()
cancel_work_sync() |
kfree(adapter) //FREE | ch_flower_stats_cb() //timer
| adap->flower_stats_work //USE
This patch changes del_timer_sync() to timer_shutdown_sync(),
which could prevent rearming of the timer from the workqueue.
Fixes: e0f911c81e ("cxgb4: fetch stats for offloaded tc flower flows")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Link: https://lore.kernel.org/r/20230415081227.7463-1-duoming@zju.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
catch-all set element might jump/goto to chain that uses expressions
that require validation.
Fixes: aaa31047a6 ("netfilter: nftables: add catch-all set element support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add error handling of i40e_setup_misc_vector() in i40e_rebuild().
In case interrupt vectors setup fails do not re-open vsi-s and
do not bring up vf-s, we have no interrupts to serve a traffic
anyway.
Fixes: 41c445ff0f ("i40e: main driver core")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix accessing vsi->active_filters without holding the mac_filter_hash_lock.
Move vsi->active_filters = 0 inside critical section and
move clear_bit(__I40E_VSI_OVERFLOW_PROMISC, vsi->state) after the critical
section to ensure the new filters from other threads can be added only after
filters cleaning in the critical section is finished.
Fixes: 278e7d0b9d ("i40e: store MAC/VLAN filters in a hash with the MAC Address as key")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Scott reports that when the new GSS krb5 Kunit tests are built as
a separate module and loaded, the RFC 6803 and RFC 8009 checksum
tests all fail, even though they pass when run under kunit.py.
It appears that passing a buffer backed by static const memory to
gss_krb5_checksum() is a problem. A printk in checksum_case() shows
the correct plaintext, but by the time the buffer has been converted
to a scatterlist and arrives at checksummer(), it contains all
zeroes.
Replacing this buffer with one that is dynamically allocated fixes
the issue.
Reported-by: Scott Mayhew <smayhew@redhat.com>
Fixes: 02142b2ca8 ("SUNRPC: Add checksum KUnit tests for the RFC 6803 encryption types")
Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
nftables can be built as a module, so fix the preprocessor conditional
accordingly.
Fixes: 478b360a47 ("netfilter: nf_tables: fix nf_trace always-on with XT_TRACE=n")
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
During a suspend/resume cycle the VO power domain will be disabled and
the VOP2 registers will reset to their default values. After that the
cached register values will be out of sync and the read/modify/write
operations we do on the window registers will result in bogus values
written. Fix this by re-initializing the register cache each time we
enable the VOP2. With this the VOP2 will show a picture after a
suspend/resume cycle whereas without this the screen stays dark.
Fixes: 604be85547 ("drm/rockchip: Add VOP2 driver")
Cc: stable@vger.kernel.org
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Tested-by: Chris Morgan <macromorgan@hotmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230413144347.3506023-1-s.hauer@pengutronix.de
Palash reports a UAF when using a modified version of syzkaller[1].
When 'tcf_exts_miss_cookie_base_alloc()' fails in 'tcf_exts_init_ex()'
a call to 'tcf_exts_destroy()' is made to free up the tcf_exts
resources.
In flower, a call to '__fl_put()' when 'tcf_exts_init_ex()' fails is made;
Then calling 'tcf_exts_destroy()', which triggers an UAF since the
already freed tcf_exts action pointer is lingering in the struct.
Before the offending patch, this was not an issue since there was no
case where the tcf_exts action pointer could linger. Therefore, restore
the old semantic by clearing the action pointer in case of a failure to
initialize the miss_cookie.
[1] https://github.com/cmu-pasta/linux-kernel-enriched-corpus
v1->v2: Fix compilation on configs without tc actions (kernel test robot)
Fixes: 80cd22c35c ("net/sched: cls_api: Support hardware miss to tc action")
Reported-by: Palash Oswal <oswalpalash@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a use-after-free scenario that is:
When the NIC is down, user set mac address or vlan tag to VF,
the xxx_set_vf_mac() or xxx_set_vf_vlan() will invoke efx_net_stop()
and efx_net_open(), since netif_running() is false, the port will not
start and keep port_enabled false, but selftest_work is scheduled
in efx_net_open().
If we remove the device before selftest_work run, the efx_stop_port()
will not be called since the NIC is down, and then efx is freed,
we will soon get a UAF in run_timer_softirq() like this:
[ 1178.907941] ==================================================================
[ 1178.907948] BUG: KASAN: use-after-free in run_timer_softirq+0xdea/0xe90
[ 1178.907950] Write of size 8 at addr ff11001f449cdc80 by task swapper/47/0
[ 1178.907950]
[ 1178.907953] CPU: 47 PID: 0 Comm: swapper/47 Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1
[ 1178.907954] Hardware name: SANGFOR X620G40/WI2HG-208T1061A, BIOS SPYH051032-U01 04/01/2022
[ 1178.907955] Call Trace:
[ 1178.907956] <IRQ>
[ 1178.907960] dump_stack+0x71/0xab
[ 1178.907963] print_address_description+0x6b/0x290
[ 1178.907965] ? run_timer_softirq+0xdea/0xe90
[ 1178.907967] kasan_report+0x14a/0x2b0
[ 1178.907968] run_timer_softirq+0xdea/0xe90
[ 1178.907971] ? init_timer_key+0x170/0x170
[ 1178.907973] ? hrtimer_cancel+0x20/0x20
[ 1178.907976] ? sched_clock+0x5/0x10
[ 1178.907978] ? sched_clock_cpu+0x18/0x170
[ 1178.907981] __do_softirq+0x1c8/0x5fa
[ 1178.907985] irq_exit+0x213/0x240
[ 1178.907987] smp_apic_timer_interrupt+0xd0/0x330
[ 1178.907989] apic_timer_interrupt+0xf/0x20
[ 1178.907990] </IRQ>
[ 1178.907991] RIP: 0010:mwait_idle+0xae/0x370
If the NIC is not actually brought up, there is no need to schedule
selftest_work, so let's move invoking efx_selftest_async_start()
into efx_start_all(), and it will be canceled by broughting down.
Fixes: dd40781e3a ("sfc: Run event/IRQ self-test asynchronously when interface is brought up")
Fixes: e340be9230 ("sfc: add ndo_set_vf_mac() function for EF10")
Debugged-by: Huang Cun <huangcun@sangfor.com.cn>
Cc: Donglin Peng <pengdonglin@sangfor.com.cn>
Suggested-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here we copy the data from the original buf to the new page. But we
not check that it may be overflow.
As long as the size received(including vnethdr) is greater than 3840
(PAGE_SIZE -VIRTIO_XDP_HEADROOM). Then the memcpy will overflow.
And this is completely possible, as long as the MTU is large, such
as 4096. In our test environment, this will cause crash. Since crash is
caused by the written memory, it is meaningless, so I do not include it.
Fixes: 72979a6c35 ("virtio_net: xdp, add slowpath case for non contiguous buffers")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
@server->origin_fullpath already contains the tree name + optional
prefix, so avoid calling __build_path_from_dentry_optional_prefix() as
it might end up duplicating prefix path from @cifs_sb->prepath into
final full path.
Instead, generate DFS full path by simply merging
@server->origin_fullpath with dentry's path.
This fixes the following case
mount.cifs //root/dfs/dir /mnt/ -o ...
ls /mnt/link
where cifs_dfs_do_automount() will call smb3_parse_devname() with
@devname set to "//root/dfs/dir/link" instead of
"//root/dfs/dir/dir/link".
Fixes: 7ad54b98fc ("cifs: use origin fullpath for automounts")
Cc: <stable@vger.kernel.org> # 6.2+
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
<TASK>
dump_stack_lvl+0x7f/0xc0
print_report+0x2ba/0x340
kasan_report+0xc4/0x120
kasan_check_range+0x1b7/0x2e0
__kasan_check_write+0x24/0x40
bdi_split_work_to_wbs+0x5c5/0x7b0
sync_inodes_sb+0x195/0x630
sync_inodes_one_sb+0x3a/0x50
iterate_supers+0x106/0x1b0
ksys_sync+0x98/0x160
[...]
==================================================================
The race that causes the above issue is as follows:
cpu1 cpu2
-------------------------|-------------------------
inode_switch_wbs
INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
queue_rcu_work(isw_wq, &isw->work)
// queue_work async
inode_switch_wbs_work_fn
wb_put_many(old_wb, nr_switched)
percpu_ref_put_many
ref->data->release(ref)
cgwb_release
queue_work(cgwb_release_wq, &wb->release_work)
// queue_work async
&wb->release_work
cgwb_release_workfn
ksys_sync
iterate_supers
sync_inodes_one_sb
sync_inodes_sb
bdi_split_work_to_wbs
kmalloc(sizeof(*work), GFP_ATOMIC)
// alloc memory failed
percpu_ref_exit
ref->data = NULL
kfree(data)
wb_get(wb)
percpu_ref_get(&wb->refcnt)
percpu_ref_get_many(ref, 1)
atomic_long_add(nr, &ref->data->count)
atomic64_add(i, v)
// trigger null-ptr-deref
bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all
wbs. If the allocation of new work fails, the on-stack fallback will be
used and the reference count of the current wb is increased afterwards.
If cgroup writeback membership switches occur before getting the reference
count and the current wb is released as old_wd, then calling wb_get() or
wb_put() will trigger the null pointer dereference above.
This issue was introduced in v4.3-rc7 (see fix tag1). Both
sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
bdi_split_work_to_wbs() can trigger this issue. For scenarios called via
sync_inodes_sb(), originally commit 7fc5854f8c ("writeback: synchronize
sync(2) against cgroup writeback membership switches") reduced the
possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io,
thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
and the issue becomes easily reproducible again.
To solve this problem, percpu_ref_exit() is called under RCU protection to
avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs().
Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
and skip the current wb if wb_tryget() fails because the wb has already
been shutdown.
Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
Fixes: b817525a4a ("writeback: bdi_writeback iteration must not skip dying ones")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Hou Tao <houtao1@huawei.com>
Cc: yangerkun <yangerkun@huawei.com>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In mas_alloc_nodes(), "node->node_count = 0" means to initialize the
node_count field of the new node, but the node may not be a new node. It
may be a node that existed before and node_count has a value, setting it
to 0 will cause a memory leak. At this time, mas->alloc->total will be
greater than the actual number of nodes in the linked list, which may
cause many other errors. For example, out-of-bounds access in
mas_pop_node(), and mas_pop_node() may return addresses that should not be
used. Fix it by initializing node_count only for new nodes.
Also, by the way, an if-else statement was removed to simplify the code.
Link: https://lkml.kernel.org/r/20230411041005.26205-1-zhangpeng.00@bytedance.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
set_mempolicy_home_node() iterates over a list of VMAs and calls
mbind_range() on each VMA, which also iterates over the singular list of
the VMA passed in and potentially splits the VMA. Since the VMA iterator
is not passed through, set_mempolicy_home_node() may now point to a stale
node in the VMA tree. This can result in a UAF as reported by syzbot.
Avoid the stale maple tree node by passing the VMA iterator through to the
underlying call to split_vma().
mbind_range() is also overly complicated, since there are two calling
functions and one already handles iterating over the VMAs. Simplify
mbind_range() to only handle merging and splitting of the VMAs.
Align the new loop in do_mbind() and existing loop in
set_mempolicy_home_node() to use the reduced mbind_range() function. This
allows for a single location of the range calculation and avoids
constantly looking up the previous VMA (since this is a loop over the
VMAs).
Link: https://lore.kernel.org/linux-mm/000000000000c93feb05f87e24ad@google.com/
Fixes: 66850be55e ("mm/mempolicy: use vma iterator & maple state instead of vma linked list")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: syzbot+a7c1ec5b1d71ceaa5186@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/20230410152205.2294819-1-Liam.Howlett@oracle.com
Tested-by: syzbot+a7c1ec5b1d71ceaa5186@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When the loop over the VMA is terminated early due to an error, the return
code could be overwritten with ENOMEM. Fix the return code by only
setting the error on early loop termination when the error is not set.
User-visible effects include: attempts to run mprotect() against a
special mapping or with a poorly-aligned hugetlb address should return
-EINVAL, but they presently return -ENOMEM. In other cases an -EACCESS
should be returned.
Link: https://lkml.kernel.org/r/20230406193050.1363476-1-Liam.Howlett@oracle.com
Fixes: 2286a6914c ("mm: change mprotect_fixup to vma iterator")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Khugepaged collapse an anonymous thp in two rounds of scans. The 2nd
round done in __collapse_huge_page_isolate() after
hpage_collapse_scan_pmd(), during which all the locks will be released
temporarily. It means the pgtable can change during this phase before 2nd
round starts.
It's logically possible some ptes got wr-protected during this phase, and
we can errornously collapse a thp without noticing some ptes are
wr-protected by userfault. e1e267c792 wanted to avoid it but it only
did that for the 1st phase, not the 2nd phase.
Since __collapse_huge_page_isolate() happens after a round of small page
swapins, we don't need to worry on any !present ptes - if it existed
khugepaged will already bail out. So we only need to check present ptes
with uffd-wp bit set there.
This is something I found only but never had a reproducer, I thought it
was one caused a bug in Muhammad's recent pagemap new ioctl work, but it
turns out it's not the cause of that but an userspace bug. However this
seems to still be a real bug even with a very small race window, still
worth to have it fixed and copy stable.
Link: https://lkml.kernel.org/r/20230405155120.3608140-1-peterx@redhat.com
Fixes: e1e267c792 ("khugepaged: skip collapse if uffd-wp detected")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Looks like what we fixed for hugetlb in commit 44f86392bd ("mm/hugetlb:
fix uffd-wp handling for migration entries in
hugetlb_change_protection()") similarly applies to THP.
Setting/clearing uffd-wp on THP migration entries is not implemented
properly. Further, while removing migration PMDs considers the uffd-wp
bit, inserting migration PMDs does not consider the uffd-wp bit.
We have to set/clear independently of the migration entry type in
change_huge_pmd() and properly copy the uffd-wp bit in
set_pmd_migration_entry().
Verified using a simple reproducer that triggers migration of a THP, that
the set_pmd_migration_entry() no longer loses the uffd-wp bit.
Link: https://lkml.kernel.org/r/20230405160236.587705-2-david@redhat.com
Fixes: f45ec5ff16 ("userfaultfd: wp: support swap and page migration")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The ->percpu_pvec_drained was originally introduced by commit d9ed0d08b6
("mm: only drain per-cpu pagevecs once per pagevec usage") to drain
per-cpu pagevecs only once per pagevec usage. But after converting the
swap code to be more folio-based, the commit c2bc16817a ("mm/swap: add
folio_batch_move_lru()") breaks this logic, which would cause
->percpu_pvec_drained to be reset to false, that means per-cpu pagevecs
will be drained multiple times per pagevec usage.
In theory, there should be no functional changes when converting code to
be more folio-based. We should call folio_batch_reinit() in
folio_batch_move_lru() instead of folio_batch_init(). And to verify that
we still need ->percpu_pvec_drained, I ran mmtests/sparsetruncate-tiny and
got the following data:
baseline with
baseline/ patch/
Min Time 326.00 ( 0.00%) 328.00 ( -0.61%)
1st-qrtle Time 334.00 ( 0.00%) 336.00 ( -0.60%)
2nd-qrtle Time 338.00 ( 0.00%) 341.00 ( -0.89%)
3rd-qrtle Time 343.00 ( 0.00%) 347.00 ( -1.17%)
Max-1 Time 326.00 ( 0.00%) 328.00 ( -0.61%)
Max-5 Time 327.00 ( 0.00%) 330.00 ( -0.92%)
Max-10 Time 328.00 ( 0.00%) 331.00 ( -0.91%)
Max-90 Time 350.00 ( 0.00%) 357.00 ( -2.00%)
Max-95 Time 395.00 ( 0.00%) 390.00 ( 1.27%)
Max-99 Time 508.00 ( 0.00%) 434.00 ( 14.57%)
Max Time 547.00 ( 0.00%) 476.00 ( 12.98%)
Amean Time 344.61 ( 0.00%) 345.56 * -0.28%*
Stddev Time 30.34 ( 0.00%) 19.51 ( 35.69%)
CoeffVar Time 8.81 ( 0.00%) 5.65 ( 35.87%)
BAmean-99 Time 342.38 ( 0.00%) 344.27 ( -0.55%)
BAmean-95 Time 338.58 ( 0.00%) 341.87 ( -0.97%)
BAmean-90 Time 336.89 ( 0.00%) 340.26 ( -1.00%)
BAmean-75 Time 335.18 ( 0.00%) 338.40 ( -0.96%)
BAmean-50 Time 332.54 ( 0.00%) 335.42 ( -0.87%)
BAmean-25 Time 329.30 ( 0.00%) 332.00 ( -0.82%)
From the above it can be seen that we get similar data to when
->percpu_pvec_drained was introduced, so we still need it. Let's call
folio_batch_reinit() in folio_batch_move_lru() to restore the original
logic.
Link: https://lkml.kernel.org/r/20230405161854.6931-1-zhengqi.arch@bytedance.com
Fixes: c2bc16817a ("mm/swap: add folio_batch_move_lru()")
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull scheduler fix from Borislav Petkov:
- Do not pull tasks to the local scheduling group if its average load
is higher than the average system load
* tag 'sched_urgent_for_v6.3_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix imbalance overflow
Pull x86 fix from Borislav Petkov:
- Drop __init annotation from two rtc functions which get called after
boot is done, in order to prevent a crash
* tag 'x86_urgent_for_v6.3_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/rtc: Remove __init for runtime functions
Pull powerpc fix from Michael Ellerman:
- A fix for NUMA distance handling in the pseries SCM (pmem) driver.
Thanks to Aneesh Kumar K.V.
* tag 'powerpc-6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/papr_scm: Update the NUMA distance table for the target node
Pull Kbuild fixes from Masahiro Yamada:
- Drop debug info from purgatory objects again
- Document that kernel.org provides prebuilt LLVM toolchains
- Give up handling untracked files for source package builds
- Avoid creating corrupted cpio when KBUILD_BUILD_TIMESTAMP is given
with a pre-epoch data.
- Change panic_show_mem() to a macro to handle variable-length argument
- Compress tarballs on-the-fly again
* tag 'kbuild-fixes-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: do not create intermediate *.tar for tar packages
kbuild: do not create intermediate *.tar for source tarballs
kbuild: merge cmd_archive_linux and cmd_archive_perf
init/initramfs: Fix argument forwarding to panic() in panic_show_mem()
initramfs: Check negative timestamp to prevent broken cpio archive
kbuild: give up untracked files for source package builds
Documentation/llvm: Add a note about prebuilt kernel.org toolchains
purgatory: fix disabling debug info
Pull ksmbd server fix from Steve French:
"smb311 server preauth integrity negotiate context parsing fix (check
for out of bounds access)"
* tag '6.3-rc6-ksmbd-server-fix' of git://git.samba.org/ksmbd:
ksmbd: avoid out of bounds access in decode_preauth_ctxt()
pci_msix_validate_entries() validates the entries array which is handed in
by the caller for a MSI-X interrupt allocation. Aside of consistency
failures it also detects a failure when the size of the MSI-X hardware table
in the device is smaller than the size of the entries array.
That's wrong for the case of range allocations where the caller provides
the minimum and the maximum number of vectors to allocate, when the
hardware size is greater or equal than the mininum, but smaller than the
maximum.
Remove the hardware size check completely from that function and just
ensure that the entires array up to the maximum size is consistent.
The limitation and range checking versus the hardware size happens
independently of that afterwards anyway because the entries array is
optional.
Fixes: 4644d22eb6 ("PCI/MSI: Validate MSI-X contiguous restriction early")
Reported-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87v8i3sg62.ffs@tglx
Commit 05e96e96a3 ("kbuild: use git-archive for source package
creation") split the compression as a separate step to factor out
the common build rules.
With the previous commit, we got back to the situation where source
tarballs are compressed on-the-fly.
There is no reason to keep the separate compression rules.
Generate the comressed tar packages directly.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Since commit 05e96e96a3 ("kbuild: use git-archive for source package
creation"), a source tarball is created in two steps; create *.tar file
then compress it. I split the compression as a separate rule because I
just thought 'git archive' supported only gzip.
For other compression algorithms, I could pipe the two commands:
$ git archive HEAD | xz > linux.tar.xz
I read git-archive(1) carefully, and I realized GIT had provided a
more elegant way:
$ git -c tar.tar.xz.command=xz archive -o linux.tar.xz HEAD
This commit uses 'tar.tar.*.command' configuration to specify the
compression backend so we can compress a source tarball on-the-fly.
GIT commit 767cf4579f0e ("archive: implement configurable tar filters")
is more than a decade old, so it should be available on almost all build
environments.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
The two commands, cmd_archive_linux and cmd_archive_perf, are similar.
Merge them to make it easier to add more changes to the git-archive
command.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Forwarding variadic argument lists can't be done by passing a va_list
to a function with signature foo(...) (as panic() has). It ends up
interpreting the va_list itself as a single argument instead of
iterating it. printf() happily accepts it of course, leading to corrupt
output.
Convert panic_show_mem() to a macro to allow forwarding the arguments.
The function is trivial enough that it's easier than trying to introduce
a vpanic() variant.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Similar to commit 4c9d410f32 ("initramfs: Check timestamp to prevent
broken cpio archive"), except asserts that the timestamp is
non-negative. This can happen when the KBUILD_BUILD_TIMESTAMP is a value
before UNIX epoch, which may be set when making reproducible builds that
don't want to look like they use a valid date.
While support for dates before 1970 might not be supported, this is more
about preventing undetected CPIO corruption. The printf's use a minimum
length format specifier, and will happily make the field longer than 8
characters if they need to.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Tested-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull cifs fix from Steve French:
"Small client fix for better checking for smb311 negotiate context
overflows, also marked for stable"
* tag '6.3-rc6-smb311-client-negcontext-fix' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix negotiate context parsing
Pull UBI fixes from Richard Weinberger:
- Fix failure to attach when vid_hdr offset equals the (sub)page size
- Fix for a deadlock in UBI's worker thread
* tag 'ubifs-for-linus-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size
ubi: Fix deadlock caused by recursively holding work_sem
smb311_decode_neg_context() doesn't properly check against SMB packet
boundaries prior to accessing individual negotiate context entries. This
is due to the length check omitting the eight byte smb2_neg_context
header, as well as incorrect decrementing of len_of_ctxts.
Fixes: 5100d8a3fe ("SMB311: Improve checking of negotiate security contexts")
Reported-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull i2c fixes from Wolfram Sang:
"Just two driver fixes"
* tag 'i2c-for-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: ocores: generate stop condition after timeout in polling mode
i2c: mchp-pci1xxxx: Update Timing registers
Pull SCSI fix from James Bottomley:
"One small fix to SCSI Enclosure Services to fix a regression caused by
another recent fix"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ses: Handle enclosure with just a primary component gracefully
Pull block fix from Jens Axboe:
"A single NVMe quirk entry addition"
* tag 'block-6.3-2023-04-14' of git://git.kernel.dk/linux:
nvme-pci: add NVME_QUIRK_BOGUS_NID for T-FORCE Z330 SSD
Pull io_uring fix from Jens Axboe:
"Just a small tweak to when task_work needs redirection, marked for
stable as well"
* tag 'io_uring-6.3-2023-04-14' of git://git.kernel.dk/linux:
io_uring: complete request via task work in case of DEFER_TASKRUN
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for a missing fence when generating the NOMMU sigreturn
trampoline
- A set of fixes for early DTB handling of reserved memory nodes
* tag 'riscv-for-linus-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: No need to relocate the dtb as it lies in the fixmap region
riscv: Do not set initial_boot_params to the linear address of the dtb
riscv: Move early dtb mapping into the fixmap region
riscv: add icache flush for nommu sigreturn trampoline
Pull ACPI fixes from Rafael Wysocki:
"These add two ACPI-related quirks:
- Add a quirk to force StorageD3Enable on AMD Picasso systems (Mario
Limonciello)
- Add an ACPI IRQ override quirk for ASUS ExpertBook B1502CBA (Paul
Menzel)"
* tag 'acpi-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: resource: Skip IRQ override on ASUS ExpertBook B1502CBA
ACPI: x86: utils: Add Picasso to the list for forcing StorageD3Enable
Pull power management fix from Rafael Wysocki:
"Make the amd-pstate cpufreq driver take all of the possible
combinations of the 'old' and 'new' status values correctly while
changing the operation mode via sysfs (Wyes Karny)"
* tag 'pm-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
amd-pstate: Fix amd_pstate mode switch
Pull thermal control fix from Rafael Wysocki:
"Modify the Intel thermal throttling code to avoid updating unsupported
status clearing mask bits which causes the kernel to complain about
unchecked MSR access (Srinivas Pandruvada)"
* tag 'thermal-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel: Avoid updating unsupported THERM_STATUS_CLEAR mask bits
Pull sound fixes from Takashi Iwai:
"A collection of small fixes.
At this time, quite a few fixes for the old PCI drivers are found.
Although they are not regression fixes, I took these as they are
materials for stable kernels.
In addition, a couple of regression fixes and another couple of
HD-audio quirks are included"
* tag 'sound-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/hdmi: disable KAE for Intel DG2
ALSA: hda/realtek: Add quirks for Lenovo Z13/Z16 Gen2
ALSA: hda: patch_realtek: add quirk for Asus N7601ZM
ALSA: firewire-tascam: add missing unwind goto in snd_tscm_stream_start_duplex()
ALSA: emu10k1: don't create old pass-through playback device on Audigy
ALSA: emu10k1: fix capture interrupt handler unlinking
ALSA: hda/sigmatel: fix S/PDIF out on Intel D*45* motherboards
ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard
ALSA: i2c/cs8427: fix iec958 mixer control deactivation
Pull rdma fixes from Jason Gunthorpe:
"We had a fairly slow cycle on the rc side this time, here are the
accumulated fixes, mostly in drivers:
- irdma should not generate extra completions during flushing
- Fix several memory leaks
- Do not get confused in irdma's iwarp mode if IPv6 is present
- Correct a link speed calculation in mlx5
- Increase the EQ/WQ limits on erdma as they are too small for big
applications
- Use the right math for erdma's inline mtt feature
- Make erdma probing more robust to boot time ordering differences
- Fix a KMSAN crash in CMA due to uninitialized qkey"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/core: Fix GID entry ref leak when create_ah fails
RDMA/cma: Allow UD qp_type to join multicast only
RDMA/erdma: Defer probing if netdevice can not be found
RDMA/erdma: Inline mtt entries into WQE if supported
RDMA/erdma: Update default EQ depth to 4096 and max_send_wr to 8192
RDMA/erdma: Fix some typos
IB/mlx5: Add support for 400G_8X lane speed
RDMA/irdma: Add ipv4 check to irdma_find_listener()
RDMA/irdma: Increase iWARP CM default rexmit count
RDMA/irdma: Fix memory leak of PBLE objects
RDMA/irdma: Do not generate SW completions for NOPs
Thomas Richter reported a crash in linux-next with a backtrace similar
to the following one:
[<0000000000000000>] 0x0
([<000000000031a182>] bpf_trace_run4+0xc2/0x218)
[<00000000001d59f4>] __bpf_trace_sched_switch+0x1c/0x28
[<0000000000c44a3a>] __schedule+0x43a/0x890
[<0000000000c44ef8>] schedule+0x68/0x110
[<0000000000c4e5ca>] do_nanosleep+0xa2/0x168
[<000000000026e7fe>] hrtimer_nanosleep+0xf6/0x1c0
[<000000000026eb6e>] __s390x_sys_nanosleep+0xb6/0xf0
[<0000000000c3b81c>] __do_syscall+0x1e4/0x208
[<0000000000c50510>] system_call+0x70/0x98
Last Breaking-Event-Address:
[<000003ff7fda1814>] bpf_prog_65e887c70a835bbf_on_switch+0x1a4/0x1f0
The problem is that bpf_arch_text_poke() with new_addr == NULL is
susceptible to the following race condition:
T1 T2
----------------- -------------------
plt.target = NULL
entry: brcl 0xf,plt
entry.mask = 0
lgrl %r1,plt.target
br %r1
Fix by setting PLT target to the instruction following `brcl 0xf,plt`
instead of 0. This way T2 will simply resume the execution of the eBPF
program, which is the desired effect of passing new_addr == NULL.
Fixes: f1d5df84cd ("s390/bpf: Implement bpf_arch_text_poke()")
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20230414154755.184502-1-iii@linux.ibm.com
Merge a quirk to force StorageD3Enable on AMD Picasso systems (Mario
Limonciello).
* acpi-x86:
ACPI: x86: utils: Add Picasso to the list for forcing StorageD3Enable
So far io_req_complete_post() only covers DEFER_TASKRUN by completing
request via task work when the request is completed from IOWQ.
However, uring command could be completed from any context, and if io
uring is setup with DEFER_TASKRUN, the command is required to be
completed from current context, otherwise wait on IORING_ENTER_GETEVENTS
can't be wakeup, and may hang forever.
The issue can be observed on removing ublk device, but turns out it is
one generic issue for uring command & DEFER_TASKRUN, so solve it in
io_uring core code.
Fixes: e6aeb2721d ("io_uring: complete all requests in task context")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-block/b3fc9991-4c53-9218-a8cc-5b4dd3952108@kernel.dk/
Reported-by: Jens Axboe <axboe@kernel.dk>
Cc: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A few more Qualcomm ARM64 DeviceTree fixes for 6.3
The GPIO polarity of the WSA881x shutdown GPIO was inconsistent and had
to be corrected in the driver, this fixes the polarity in the DeviceTree
for QRB5165 RB5, SM8250 MTP, Samsung Galaxy Book 2 and Lenovo Yoga C630.
The recent rearrangement of nodes among the IPQ8074 accidentally enabled
the PCIe PHYs, rather than the PCIe controllers. This is being
corrected, to restore PCIe functionality.
PMK8280 PON node has the wrong compatible, which recently caused the
driver to stop probing. This is corrected and the required "pbs" region
is added.
With support for HBR3 introduced, it's noted that SC7280 Herobrine
devices are having trouble running at this rate. This drops the claim
that it's supported, until further analysis can be done.
* tag 'qcom-arm64-fixes-for-6.3-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: dts: qcom: sc7280: remove hbr3 support on herobrine boards
arm64: dts: qcom: sc8280xp-pmics: fix pon compatible and registers
arm64: dts: qcom: ipq8074-hk10: enable QMP device, not the PHY node
arm64: dts: qcom: ipq8074-hk01: enable QMP device, not the PHY node
arm64: dts: qcom: qrb5165-rb5: Use proper WSA881x shutdown GPIO polarity
arm64: dts: qcom: sm8250-mtp: Use proper WSA881x shutdown GPIO polarity
arm64: dts: qcom: sdm850-samsung-w737: Use proper WSA881x shutdown GPIO polarity
arm64: dts: qcom: sdm850-lenovo-yoga-c630: Use proper WSA881x shutdown GPIO polarity
Link: https://lore.kernel.org/r/20230410153850.4752-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Lower sd card speeds for two boards to make them run more reliable,
missing 32k clock definition for Anbric xx3 devices, missing cache-levels
for rk3588, fixed rk3326-board display supplies and more dt-schema fixes.
* tag 'v6.3-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: correct panel supplies on some rk3326 boards
arm64: dts: rockchip: use just "port" in panel on RockPro64
arm64: dts: rockchip: use just "port" in panel on Pinebook Pro
arm64: dts: rockchip: Remove non-existing pwm-delay-us property
arm64: dts: rockchip: Add clk_rtc_32k to Anbernic xx3 Devices
arm64: dts: rockchip: add rk3588 cache level information
arm64: dts: rockchip: Lower SD card speed on rk3399 Pinebook Pro
arm64: dts: rockchip: Lower sd speed on rk3566-soquartz
ARM: dts: rockchip: fix a typo error for rk3288 spdif node
arm64: dts: rockchip: Fix rk3399 GICv3 ITS node name
Link: https://lore.kernel.org/r/10559306.CDJkKcVGEf@phil
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
On some Qualcomm platforms, like SC8280XP, the attempt to set PC mode
during boot fails with PSCI_RET_DENIED and since commit 998fcd001f
("firmware/psci: Print a warning if PSCI doesn't accept PC mode") this
is now logged at warning level:
psci: failed to set PC mode: -3
As there is nothing users can do about the firmware behaving this way,
demote the warning to info level and clearly mark it as a firmware bug:
psci: [Firmware Bug]: failed to set PC mode: -3
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Acked-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
If the TCA_QFQ_LMAX value is not offered through nlattr, lmax is determined by the MTU value of the network device.
The MTU of the loopback device can be set up to 2^31-1.
As a result, it is possible to have an lmax value that exceeds QFQ_MIN_LMAX.
Due to the invalid lmax value, an index is generated that exceeds the QFQ_MAX_INDEX(=24) value, causing out-of-bounds read/write errors.
The following reports a oob access:
[ 84.582666] BUG: KASAN: slab-out-of-bounds in qfq_activate_agg.constprop.0 (net/sched/sch_qfq.c:1027 net/sched/sch_qfq.c:1060 net/sched/sch_qfq.c:1313)
[ 84.583267] Read of size 4 at addr ffff88810f676948 by task ping/301
[ 84.583686]
[ 84.583797] CPU: 3 PID: 301 Comm: ping Not tainted 6.3.0-rc5 #1
[ 84.584164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 84.584644] Call Trace:
[ 84.584787] <TASK>
[ 84.584906] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
[ 84.585108] print_report (mm/kasan/report.c:320 mm/kasan/report.c:430)
[ 84.585570] kasan_report (mm/kasan/report.c:538)
[ 84.585988] qfq_activate_agg.constprop.0 (net/sched/sch_qfq.c:1027 net/sched/sch_qfq.c:1060 net/sched/sch_qfq.c:1313)
[ 84.586599] qfq_enqueue (net/sched/sch_qfq.c:1255)
[ 84.587607] dev_qdisc_enqueue (net/core/dev.c:3776)
[ 84.587749] __dev_queue_xmit (./include/net/sch_generic.h:186 net/core/dev.c:3865 net/core/dev.c:4212)
[ 84.588763] ip_finish_output2 (./include/net/neighbour.h:546 net/ipv4/ip_output.c:228)
[ 84.589460] ip_output (net/ipv4/ip_output.c:430)
[ 84.590132] ip_push_pending_frames (./include/net/dst.h:444 net/ipv4/ip_output.c:126 net/ipv4/ip_output.c:1586 net/ipv4/ip_output.c:1606)
[ 84.590285] raw_sendmsg (net/ipv4/raw.c:649)
[ 84.591960] sock_sendmsg (net/socket.c:724 net/socket.c:747)
[ 84.592084] __sys_sendto (net/socket.c:2142)
[ 84.593306] __x64_sys_sendto (net/socket.c:2150)
[ 84.593779] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 84.593902] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 84.594070] RIP: 0033:0x7fe568032066
[ 84.594192] Code: 0e 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c09[ 84.594796] RSP: 002b:00007ffce388b4e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
Code starting with the faulting instruction
===========================================
[ 84.595047] RAX: ffffffffffffffda RBX: 00007ffce388cc70 RCX: 00007fe568032066
[ 84.595281] RDX: 0000000000000040 RSI: 00005605fdad6d10 RDI: 0000000000000003
[ 84.595515] RBP: 00005605fdad6d10 R08: 00007ffce388eeec R09: 0000000000000010
[ 84.595749] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 84.595984] R13: 00007ffce388cc30 R14: 00007ffce388b4f0 R15: 0000001d00000001
[ 84.596218] </TASK>
[ 84.596295]
[ 84.596351] Allocated by task 291:
[ 84.596467] kasan_save_stack (mm/kasan/common.c:46)
[ 84.596597] kasan_set_track (mm/kasan/common.c:52)
[ 84.596725] __kasan_kmalloc (mm/kasan/common.c:384)
[ 84.596852] __kmalloc_node (./include/linux/kasan.h:196 mm/slab_common.c:967 mm/slab_common.c:974)
[ 84.596979] qdisc_alloc (./include/linux/slab.h:610 ./include/linux/slab.h:731 net/sched/sch_generic.c:938)
[ 84.597100] qdisc_create (net/sched/sch_api.c:1244)
[ 84.597222] tc_modify_qdisc (net/sched/sch_api.c:1680)
[ 84.597357] rtnetlink_rcv_msg (net/core/rtnetlink.c:6174)
[ 84.597495] netlink_rcv_skb (net/netlink/af_netlink.c:2574)
[ 84.597627] netlink_unicast (net/netlink/af_netlink.c:1340 net/netlink/af_netlink.c:1365)
[ 84.597759] netlink_sendmsg (net/netlink/af_netlink.c:1942)
[ 84.597891] sock_sendmsg (net/socket.c:724 net/socket.c:747)
[ 84.598016] ____sys_sendmsg (net/socket.c:2501)
[ 84.598147] ___sys_sendmsg (net/socket.c:2557)
[ 84.598275] __sys_sendmsg (./include/linux/file.h:31 net/socket.c:2586)
[ 84.598399] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 84.598520] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 84.598688]
[ 84.598744] The buggy address belongs to the object at ffff88810f674000
[ 84.598744] which belongs to the cache kmalloc-8k of size 8192
[ 84.599135] The buggy address is located 2664 bytes to the right of
[ 84.599135] allocated 7904-byte region [ffff88810f674000, ffff88810f675ee0)
[ 84.599544]
[ 84.599598] The buggy address belongs to the physical page:
[ 84.599777] page:00000000e638567f refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10f670
[ 84.600074] head:00000000e638567f order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 84.600330] flags: 0x200000000010200(slab|head|node=0|zone=2)
[ 84.600517] raw: 0200000000010200 ffff888100043180 dead000000000122 0000000000000000
[ 84.600764] raw: 0000000000000000 0000000080020002 00000001ffffffff 0000000000000000
[ 84.601009] page dumped because: kasan: bad access detected
[ 84.601187]
[ 84.601241] Memory state around the buggy address:
[ 84.601396] ffff88810f676800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 84.601620] ffff88810f676880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 84.601845] >ffff88810f676900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 84.602069] ^
[ 84.602243] ffff88810f676980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 84.602468] ffff88810f676a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 84.602693] ==================================================================
[ 84.602924] Disabling lock debugging due to kernel taint
Fixes: 3015f3d2a3 ("pkt_sched: enable QFQ to support TSO/GSO")
Reported-by: Gwangun Jung <exsociety@gmail.com>
Signed-off-by: Gwangun Jung <exsociety@gmail.com>
Acked-by: Jamal Hadi Salim<jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to this patch, the sensing configuration data was not parsed
correctly, breaking detection of max_tch. The vendor driver includes
this field. This change informs the driver about the correct maximum
number of simultaneous touch inputs.
Tested on a Pine64 PineNote with a modified touch screen controller
firmware.
Signed-off-by: hrdl <git@hrdl.eu>
Reviewed-by: Alistair Francis <alistair@alistair23.me>
Link: https://lore.kernel.org/r/20230411211651.3791304-1-git@hrdl.eu
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Added a quirk to fix the TeamGroup T-Force Cardea Zero Z330 SSDs reporting
duplicate NGUIDs.
Signed-off-by: Duy Truong <dory@dory.moe>
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
riscv establishes 2 virtual mappings:
- early_pg_dir maps the kernel which allows to discover the system
memory
- swapper_pg_dir installs the final mapping (linear mapping included)
We used to map the dtb in early_pg_dir using DTB_EARLY_BASE_VA, and this
mapping was not carried over in swapper_pg_dir. It happens that
early_init_fdt_scan_reserved_mem() must be called before swapper_pg_dir is
setup otherwise we could allocate reserved memory defined in the dtb.
And this function initializes reserved_mem variable with addresses that
lie in the early_pg_dir dtb mapping: when those addresses are reused
with swapper_pg_dir, this mapping does not exist and then we trap.
The previous "fix" was incorrect as early_init_fdt_scan_reserved_mem()
must be called before swapper_pg_dir is set up otherwise we could
allocate in reserved memory defined in the dtb.
So move the dtb mapping in the fixmap region which is established in
early_pg_dir and handed over to swapper_pg_dir.
Fixes: 922b0375fc ("riscv: Fix memblock reservation for device tree blob")
Fixes: 8f3a2b4a96 ("RISC-V: Move DT mapping outof fixmap")
Fixes: 50e63dd8ed ("riscv: fix reserved memory setup")
Reported-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/all/f8e67f82-103d-156c-deb0-d6d6e2756f5e@microchip.com/
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230329081932.79831-2-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull cgroup fixes from Tejun Heo:
"This is a relatively big pull request this late in the cycle but the
major contributor is the cpuset bug which is rather significant:
- Fix several cpuset bugs including one where it wasn't applying the
target cgroup when tasks are created with CLONE_INTO_CGROUP
With a few smaller fixes:
- Fix inversed locking order in cgroup1 freezer implementation
- Fix garbage cpu.stat::core_sched.forceidle_usec reporting in the
root cgroup"
* tag 'cgroup-for-6.3-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Make cpuset_attach_task() skip subpartitions CPUs for top_cpuset
cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods
cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly
cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()
cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex
cgroup/cpuset: Fix partition root's cpuset.cpus update bug
cgroup: fix display of forceidle time at root
Pull clk fixes from Stephen Boyd:
"A few more clk driver fixes:
- Set the max_register member of the spreadtrum regmap so that reads
don't go off the end of the I/O space
- Avoid a clk parent error in the i.MX imx6ul driver when the
selector is unknown
- Fix an oops due to REGCACHE_NONE usage by the Renesas 9-series
driver"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: rs9: Fix suspend/resume
clk: imx6ul: fix "failed to get parent" error
clk: sprd: set max_register according to mapping range
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, and bluetooth.
Not all that quiet given spring celebrations, but "current" fixes are
thinning out, which is encouraging. One outstanding regression in the
mlx5 driver when using old FW, not blocking but we're pushing for a
fix.
Current release - new code bugs:
- eth: enetc: workaround for unresponsive pMAC after receiving
express traffic
Previous releases - regressions:
- rtnetlink: restore RTM_NEW/DELLINK notification behavior, keep the
pid/seq fields 0 for backward compatibility
Previous releases - always broken:
- sctp: fix a potential overflow in sctp_ifwdtsn_skip
- mptcp:
- use mptcp_schedule_work instead of open-coding it and make the
worker check stricter, to avoid scheduling work on closed
sockets
- fix NULL pointer dereference on fastopen early fallback
- skbuff: fix memory corruption due to a race between skb coalescing
and releasing clones confusing page_pool reference counting
- bonding: fix neighbor solicitation validation on backup slaves
- bpf: tcp: use sock_gen_put instead of sock_put in bpf_iter_tcp
- bpf: arm64: fixed a BTI error on returning to patched function
- openvswitch: fix race on port output leading to inf loop
- sfp: initialize sfp->i2c_block_size at sfp allocation to avoid
returning a different errno than expected
- phy: nxp-c45-tja11xx: unregister PTP, purge queues on remove
- Bluetooth: fix printing errors if LE Connection times out
- Bluetooth: assorted UaF, deadlock and data race fixes
- eth: macb: fix memory corruption in extended buffer descriptor mode
Misc:
- adjust the XDP Rx flow hash API to also include the protocol layers
over which the hash was computed"
* tag 'net-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg
mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type
veth: bpf_xdp_metadata_rx_hash add xdp rss hash type
mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type
xdp: rss hash types representation
selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters
skbuff: Fix a race between coalescing and releasing SKBs
net: macb: fix a memory corruption in extended buffer descriptor mode
selftests: add the missing CONFIG_IP_SCTP in net config
udp6: fix potential access to stale information
selftests: openvswitch: adjust datapath NL message declaration
selftests: mptcp: userspace pm: uniform verify events
mptcp: fix NULL pointer dereference on fastopen early fallback
mptcp: stricter state check in mptcp_worker
mptcp: use mptcp_schedule_work instead of open-coding it
net: enetc: workaround for unresponsive pMAC after receiving express traffic
sctp: fix a potential overflow in sctp_ifwdtsn_skip
net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume()
rtnetlink: Restore RTM_NEW/DELLINK notification behavior
net: ti/cpsw: Add explicit platform_device.h and of_platform.h includes
...
Pull devicetree fixes from Rob Herring:
- Fix interaction between fw_devlink and DT overlays causing devices to
not be probed
- Fix the compatible string for loongson,cpu-interrupt-controller
* tag 'devicetree-fixes-for-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
treewide: Fix probing of devices in DT overlays
dt-bindings: interrupt-controller: loongarch: Fix mismatched compatible
Pull pin control fix from Linus Walleij:
"This is just a revert of the AMD fix, because the fix broke some
laptops. We are working on a proper solution"
* tag 'pinctrl-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
Revert "pinctrl: amd: Disable and mask interrupts on resume"
Pull drm fixes from Daniel Vetter:
- two fbcon regressions
- amdgpu: dp mst, smu13
- i915: dual link dsi for tgl+
- armada, nouveau, drm/sched, fbmem
* tag 'drm-fixes-2023-04-13' of git://anongit.freedesktop.org/drm/drm:
fbcon: set_con2fb_map needs to set con2fb_map!
fbcon: Fix error paths in set_con2fb_map
drm/amd/pm: correct the pcie link state check for SMU13
drm/amd/pm: correct SMU13.0.7 max shader clock reporting
drm/amd/pm: correct SMU13.0.7 pstate profiling clock settings
drm/amd/display: Pass the right info to drm_dp_remove_payload
drm/armada: Fix a potential double free in an error handling path
fbmem: Reject FB_ACTIVATE_KD_TEXT from userspace
drm/nouveau/fb: add missing sysmen flush callbacks
drm/i915/dsi: fix DSS CTL register offsets for TGL+
drm/scheduler: Fix UAF race in drm_sched_entity_push_job()
Daniel Borkmann says:
====================
pull-request: bpf 2023-04-13
We've added 6 non-merge commits during the last 1 day(s) which contain
a total of 14 files changed, 205 insertions(+), 38 deletions(-).
The main changes are:
1) One late straggler fix on the XDP hints side which fixes
bpf_xdp_metadata_rx_hash kfunc API before the release goes out
in order to provide information on the RSS hash type,
from Jesper Dangaard Brouer.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg
mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type
veth: bpf_xdp_metadata_rx_hash add xdp rss hash type
mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type
xdp: rss hash types representation
selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters
====================
Link: https://lore.kernel.org/r/20230413192939.10202-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Confirm that the accessed pneg_ctxt->HashAlgorithms address sits within
the SMB request boundary; deassemble_neg_contexts() only checks that the
eight byte smb2_neg_context header + (client controlled) DataLength are
within the packet boundary, which is insufficient.
Checking for sizeof(struct smb2_preauth_neg_context) is overkill given
that the type currently assumes SMB311_SALT_SIZE bytes of trailing Salt.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Jesper Dangaard Brouer says:
====================
Current API for bpf_xdp_metadata_rx_hash() returns the raw RSS hash value,
but doesn't provide information on the RSS hash type (part of 6.3-rc).
This patchset proposal is to change the function call signature via adding
a pointer value argument for providing the RSS hash type.
Patchset also removes all bpf_printk's from xdp_hw_metadata program
that we expect driver developers to use. Instead counters are introduced
for relaying e.g. skip and fail info.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type.
The veth driver currently only support XDP-hints based on SKB code path.
The SKB have lost information about the RSS hash type, by compressing
the information down to a single bitfield skb->l4_hash, that only knows
if this was a L4 hash value.
In preparation for veth, the xdp_rss_hash_type have an L4 indication
bit that allow us to return a meaningful L4 indication when working
with SKB based packets.
Fixes: 306531f024 ("veth: Support RX XDP metadata")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132893055.340624.16209448340644513469.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type
via mapping table.
The mlx5 hardware can also identify and RSS hash IPSEC. This indicate
hash includes SPI (Security Parameters Index) as part of IPSEC hash.
Extend xdp core enum xdp_rss_hash_type with IPSEC hash type.
Fixes: bc8d405b1b ("net/mlx5e: Support RX XDP metadata")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132892548.340624.11185734579430124869.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The RSS hash type specifies what portion of packet data NIC hardware used
when calculating RSS hash value. The RSS types are focused on Internet
traffic protocols at OSI layers L3 and L4. L2 (e.g. ARP) often get hash
value zero and no RSS type. For L3 focused on IPv4 vs. IPv6, and L4
primarily TCP vs UDP, but some hardware supports SCTP.
Hardware RSS types are differently encoded for each hardware NIC. Most
hardware represent RSS hash type as a number. Determining L3 vs L4 often
requires a mapping table as there often isn't a pattern or sorting
according to ISO layer.
The patch introduce a XDP RSS hash type (enum xdp_rss_hash_type) that
contains both BITs for the L3/L4 types, and combinations to be used by
drivers for their mapping tables. The enum xdp_rss_type_bits get exposed
to BPF via BTF, and it is up to the BPF-programmer to match using these
defines.
This proposal change the kfunc API bpf_xdp_metadata_rx_hash() adding
a pointer value argument for provide the RSS hash type.
Change signature for all xmo_rx_hash calls in drivers to make it compile.
The RSS type implementations for each driver comes as separate patches.
Fixes: 3d76a4d3d4 ("bpf: XDP metadata RX kfuncs")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132892042.340624.582563003880565460.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The tool xdp_hw_metadata can be used by driver developers
implementing XDP-hints metadata kfuncs.
Remove all bpf_printk calls, as the tool already transfers all the
XDP-hints related information via metadata area to AF_XDP
userspace process.
Add counters for providing remaining information about failure and
skipped packet events.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132891533.340624.7313781245316405141.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
I got really badly confused in d443d93864 ("fbcon: move more common
code into fb_open()") because we set the con2fb_map before the failure
points, which didn't look good.
But in trying to fix that I moved the assignment into the wrong path -
we need to do it for _all_ vc we take over, not just the first one
(which additionally requires the call to con2fb_acquire_newinfo).
I've figured this out because of a KASAN bug report, where the
fbcon_registered_fb and fbcon_display arrays went out of sync in
fbcon_mode_deleted() because the con2fb_map pointed at the old
fb_info, but the modes and everything was updated for the new one.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Helge Deller <deller@gmx.de>
Tested-by: Xingyuan Mo <hdthky0@gmail.com>
Fixes: d443d93864 ("fbcon: move more common code into fb_open()")
Reported-by: Xingyuan Mo <hdthky0@gmail.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Xingyuan Mo <hdthky0@gmail.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v5.19+
This is a regressoin introduced in b07db39584 ("fbcon: Ditch error
handling for con2fb_release_oldinfo"). I failed to realize what the if
(!err) checks. The mentioned commit was dropping the
con2fb_release_oldinfo() return value but the if (!err) was also
checking whether the con2fb_acquire_newinfo() function call above
failed or not.
Fix this with an early return statement.
Note that there's still a difference compared to the orginal state of
the code, the below lines are now also skipped on error:
if (!search_fb_in_map(info_idx))
info_idx = newidx;
These are only needed when we've actually thrown out an old fb_info
from the console mappings, which only happens later on.
Also move the fbcon_add_cursor_work() call into the same if block,
it's all protected by console_lock so doesn't matter when we set up
the blinking cursor delayed work anyway. This further simplifies the
control flow and allows us to ditch the found local variable.
v2: Clarify commit message (Javier)
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Helge Deller <deller@gmx.de>
Tested-by: Xingyuan Mo <hdthky0@gmail.com>
Fixes: b07db39584 ("fbcon: Ditch error handling for con2fb_release_oldinfo")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Xingyuan Mo <hdthky0@gmail.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v5.19+
Commit 1effe8ca4e ("skbuff: fix coalescing for page_pool fragment
recycling") allowed coalescing to proceed with non page pool page and page
pool page when @from is cloned, i.e.
to->pp_recycle --> false
from->pp_recycle --> true
skb_cloned(from) --> true
However, it actually requires skb_cloned(@from) to hold true until
coalescing finishes in this situation. If the other cloned SKB is
released while the merging is in process, from_shinfo->nr_frags will be
set to 0 toward the end of the function, causing the increment of frag
page _refcount to be unexpectedly skipped resulting in inconsistent
reference counts. Later when SKB(@to) is released, it frees the page
directly even though the page pool page is still in use, leading to
use-after-free or double-free errors. So it should be prohibited.
The double-free error message below prompted us to investigate:
BUG: Bad page state in process swapper/1 pfn:0e0d1
page:00000000c6548b28 refcount:-1 mapcount:0 mapping:0000000000000000
index:0x2 pfn:0xe0d1
flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc0000000 0000000000000000 ffffffff00000101 0000000000000000
raw: 0000000000000002 0000000000000000 ffffffffffffffff 0000000000000000
page dumped because: nonzero _refcount
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G E 6.2.0+
Call Trace:
<IRQ>
dump_stack_lvl+0x32/0x50
bad_page+0x69/0xf0
free_pcp_prepare+0x260/0x2f0
free_unref_page+0x20/0x1c0
skb_release_data+0x10b/0x1a0
napi_consume_skb+0x56/0x150
net_rx_action+0xf0/0x350
? __napi_schedule+0x79/0x90
__do_softirq+0xc8/0x2b1
__irq_exit_rcu+0xb9/0xf0
common_interrupt+0x82/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x22/0x40
RIP: 0010:default_idle+0xb/0x20
Fixes: 53e0961da1 ("page_pool: add frag page recycling support in page pool")
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230413090353.14448-1-liangchen.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
lena wang reported an issue caused by udpv6_sendmsg()
mangling msg->msg_name and msg->msg_namelen, which
are later read from ____sys_sendmsg() :
/*
* If this is sendmmsg() and sending to current destination address was
* successful, remember it.
*/
if (used_address && err >= 0) {
used_address->name_len = msg_sys->msg_namelen;
if (msg_sys->msg_name)
memcpy(&used_address->name, msg_sys->msg_name,
used_address->name_len);
}
udpv6_sendmsg() wants to pretend the remote address family
is AF_INET in order to call udp_sendmsg().
A fix would be to modify the address in-place, instead
of using a local variable, but this could have other side effects.
Instead, restore initial values before we return from udpv6_sendmsg().
Fixes: c71d8ebe7a ("net: Fix security_socket_sendmsg() bypass problem.")
Reported-by: lena wang <lena.wang@mediatek.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Link: https://lore.kernel.org/r/20230412130308.1202254-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The netlink message for creating a new datapath takes an array
of ports for the PID creation. This shouldn't cause much issue
but correct it for future cases where we need to do decode of
datapath information that could include the per-cpu PID map.
Fixes: 25f16c873f ("selftests: add openvswitch selftest suite")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/20230412115828.3991806-1-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts says:
====================
mptcp: more fixes for 6.3
Patch 1 avoids scheduling the MPTCP worker on a closed socket on some
edge cases. It fixes issues that can be visible from v5.11.
Patch 2 makes sure the MPTCP worker doesn't try to manipulate
disconnected sockets. This is also a fix for an issue that can be
visible from v5.11.
Patch 3 fixes a NULL pointer dereference when MPTCP FastOpen is used
and an early fallback is done. A fix for v6.2.
Patch 4 improves the stability of the userspace PM selftest for a
subtest added in v6.2.
====================
Link: https://lore.kernel.org/r/20230411-upstream-net-20230411-mptcp-fixes-v1-0-ca540f3ef986@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simply adding a "sleep" before checking something is usually not a good
idea because the time that has been picked can not be enough or too
much. The best is to wait for events with a timeout.
In this selftest, 'sleep 0.5' is used more than 40 times. It is always
used before calling a 'verify_*' function except for this
verify_listener_events which has been added later.
At the end, using all these 'sleep 0.5' seems to work: the slow CIs
don't complain so far. Also because it doesn't take too much time, we
can just add two more 'sleep 0.5' to uniform what is done before calling
a 'verify_*' function. For the same reasons, we can also delay a bigger
refactoring to replace all these 'sleep 0.5' by functions waiting for
events instead of waiting for a fix time and hope for the best.
Fixes: 6c73008aa3 ("selftests: mptcp: listener test for userspace PM")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case of early fallback to TCP, subflow_syn_recv_sock() deletes
the subflow context before returning the newly allocated sock to
the caller.
The fastopen path does not cope with the above unconditionally
dereferencing the subflow context.
Fixes: 36b122baf6 ("mptcp: add subflow_v(4,6)_send_synack()")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Beyond reducing code duplication this also avoids scheduling
the mptcp_worker on a closed socket on some edge scenarios.
The addressed issue is actually older than the blamed commit
below, but this fix needs it as a pre-requisite.
Fixes: ba8f48f7a4 ("mptcp: introduce mptcp_schedule_work")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In polling mode, no stop condition is generated after a timeout. This
causes SCL to remain low and thereby block the bus. If this happens
during a transfer it can cause slaves to misinterpret the subsequent
transfer and return wrong values.
To solve this, pass the ETIMEDOUT error up from ocores_process_polling()
instead of setting STATE_ERROR directly. The caller is adjusted to call
ocores_process_timeout() on error both in polling and in IRQ mode, which
will set STATE_ERROR and generate a stop condition.
Fixes: 69c8c0c0ef ("i2c: ocores: add polling interface")
Signed-off-by: Gregor Herburger <gregor.herburger@tq-group.com>
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Acked-by: Peter Korsgaard <peter@korsgaard.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Federico Vaga <federico.vaga@cern.ch>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
set_rtc_noop(), get_rtc_noop() are after booting, therefore their __init
annotation is wrong.
A crash was observed on an x86 platform where CMOS RTC is unused and
disabled via device tree. set_rtc_noop() was invoked from ntp:
sync_hw_clock(), although CONFIG_RTC_SYSTOHC=n, however sync_cmos_clock()
doesn't honour that.
Workqueue: events_power_efficient sync_hw_clock
RIP: 0010:set_rtc_noop
Call Trace:
update_persistent_clock64
sync_hw_clock
Fix this by dropping the __init annotation from set/get_rtc_noop().
Fixes: c311ed6183 ("x86/init: Allow DT configured systems to disable RTC at boot time")
Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nokia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/59f7ceb1-446b-1d3d-0bc8-1f0ee94b1e18@nokia.com
I have observed an issue where the RX direction of the LS1028A ENETC pMAC
seems unresponsive. The minimal procedure to reproduce the issue is:
1. Connect ENETC port 0 with a loopback RJ45 cable to one of the Felix
switch ports (0).
2. Bring the ports up (MAC Merge layer is not enabled on either end).
3. Send a large quantity of unidirectional (express) traffic from Felix
to ENETC. I tried altering frame size and frame count, and it doesn't
appear to be specific to either of them, but rather, to the quantity
of octets received. Lowering the frame count, the minimum quantity of
packets to reproduce relatively consistently seems to be around 37000
frames at 1514 octets (w/o FCS) each.
4. Using ethtool --set-mm, enable the pMAC in the Felix and in the ENETC
ports, in both RX and TX directions, and with verification on both
ends.
5. Wait for verification to complete on both sides.
6. Configure a traffic class as preemptible on both ends.
7. Send some packets again.
The issue is at step 5, where the verification process of ENETC ends
(meaning that Felix responds with an SMD-R and ENETC sees the response),
but the verification process of Felix never ends (it remains VERIFYING).
If step 3 is skipped or if ENETC receives less traffic than
approximately that threshold, the test runs all the way through
(verification succeeds on both ends, preemptible traffic passes fine).
If, between step 4 and 5, the step below is also introduced:
4.1. Disable and re-enable PM0_COMMAND_CONFIG bit RX_EN
then again, the sequence of steps runs all the way through, and
verification succeeds, even if there was the previous RX traffic
injected into ENETC.
Traffic sent *by* the ENETC port prior to enabling the MAC Merge layer
does not seem to influence the verification result, only received
traffic does.
The LS1028A manual does not mention any relationship between
PM0_COMMAND_CONFIG and MMCSR, and the hardware people don't seem to
know for now either.
The bit that is toggled to work around the issue is also toggled
by enetc_mac_enable(), called from phylink's mac_link_down() and
mac_link_up() methods - which is how the workaround was found:
verification would work after a link down/up.
Fixes: c7b9e80869 ("net: enetc: add support for MAC Merge layer")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230411192645.1896048-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently, when traversing ifwdtsn skips with _sctp_walk_ifwdtsn, it only
checks the pos against the end of the chunk. However, the data left for
the last pos may be < sizeof(struct sctp_ifwdtsn_skip), and dereference
it as struct sctp_ifwdtsn_skip may cause coverflow.
This patch fixes it by checking the pos against "the end of the chunk -
sizeof(struct sctp_ifwdtsn_skip)" in sctp_ifwdtsn_skip, similar to
sctp_fwdtsn_skip.
Fixes: 0fc2ea922c ("sctp: implement validate_ftsn for sctp_stream_interleave")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/2a71bffcd80b4f2c61fac6d344bb2f11c8fd74f7.1681155810.git.lucien.xin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The commits referenced below allows userspace to use the NLM_F_ECHO flag
for RTM_NEW/DELLINK operations to receive unicast notifications for the
affected link. Prior to these changes, applications may have relied on
multicast notifications to learn the same information without specifying
the NLM_F_ECHO flag.
For such applications, the mentioned commits changed the behavior for
requests not using NLM_F_ECHO. Multicast notifications are still received,
but now use the portid of the requester and the sequence number of the
request instead of zero values used previously. For the application, this
message may be unexpected and likely handled as a response to the
NLM_F_ACKed request, especially if it uses the same socket to handle
requests and notifications.
To fix existing applications relying on the old notification behavior,
set the portid and sequence number in the notification only if the
request included the NLM_F_ECHO flag. This restores the old behavior
for applications not using it, but allows unicasted notifications for
others.
Fixes: f3a63cce1b ("rtnetlink: Honour NLM_F_ECHO flag in rtnl_delete_link")
Fixes: d88e136cab ("rtnetlink: Honour NLM_F_ECHO flag in rtnl_newlink_create")
Signed-off-by: Martin Willi <martin@strongswan.org>
Acked-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20230411074319.24133-1-martin@strongswan.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull dmaengine fixes from Vinod Koul:
"A couple of fixes in apple driver, core and kernedoc fix for dmaengine
subsystem:
- apple admac driver fixes for current_tx, src_addr_widths and
global' interrupt flags handling
- xdma kerneldoc fix
- core fix for use of devm_add_action_or_reset"
* tag 'dmaengine-fix-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: apple-admac: Fix 'current_tx' not getting freed
dmaengine: apple-admac: Set src_addr_widths capability
dmaengine: apple-admac: Handle 'global' interrupt flags
dmaengine: xilinx: xdma: Fix some kernel-doc comments
dmaengine: Actually use devm_add_action_or_reset()
It is found that attaching a task to the top_cpuset does not currently
ignore CPUs allocated to subpartitions in cpuset_attach_task(). So the
code is changed to fix that.
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
In the case of CLONE_INTO_CGROUP, not all cpusets are ready to accept
new tasks. It is too late to check that in cpuset_fork(). So we need
to add the cpuset_can_fork() and cpuset_cancel_fork() methods to
pre-check it before we can allow attachment to a different cpuset.
We also need to set the attach_in_progress flag to alert other code
that a new task is going to be added to the cpuset.
Fixes: ef2c41cf38 ("clone3: allow spawning processes into cgroups")
Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Tejun Heo <tj@kernel.org>
By default, the clone(2) syscall spawn a child process into the same
cgroup as its parent. With the use of the CLONE_INTO_CGROUP flag
introduced by commit ef2c41cf38 ("clone3: allow spawning processes
into cgroups"), the child will be spawned into a different cgroup which
is somewhat similar to writing the child's tid into "cgroup.threads".
The current cpuset_fork() method does not properly handle the
CLONE_INTO_CGROUP case where the cpuset of the child may be different
from that of its parent. Update the cpuset_fork() method to treat the
CLONE_INTO_CGROUP case similar to cpuset_attach().
Since the newly cloned task has not been running yet, its actual
memory usage isn't known. So it is not necessary to make change to mm
in cpuset_fork().
Fixes: ef2c41cf38 ("clone3: allow spawning processes into cgroups")
Reported-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Tejun Heo <tj@kernel.org>
After a successful cpuset_can_attach() call which increments the
attach_in_progress flag, either cpuset_cancel_attach() or cpuset_attach()
will be called later. In cpuset_attach(), tasks in cpuset_attach_wq,
if present, will be woken up at the end. That is not the case in
cpuset_cancel_attach(). So missed wakeup is possible if the attach
operation is somehow cancelled. Fix that by doing the wakeup in
cpuset_cancel_attach() as well.
Fixes: e44193d39e ("cpuset: let hotplug propagation work wait for task attaching")
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Tejun Heo <tj@kernel.org>
Fix netfs_extract_iter_to_sg() for ITER_UBUF and ITER_IOVEC to set the
size of the page to the part of the page extracted, not the remaining
amount of data in the extracted page array at that point.
This doesn't yet affect anything as cifs, the only current user, only
passes in non-user-backed iterators.
Fixes: 0185846975 ("netfs: Add a function to extract an iterator into a scatterlist")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: Steve French <sfrench@samba.org>
Cc: Shyam Prasad N <nspmangalore@gmail.com>
Cc: Rohith Surabattula <rohiths.msft@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
TI CPSW uses of_platform_* functions which are declared in of_platform.h.
of_platform.h gets implicitly included by of_device.h, but that is going
to be removed soon. Nothing else depends on of_device.h so it can be
dropped. of_platform.h also implicitly includes platform_device.h, so
add an explicit include for it, too.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Smatch reports:
drivers/net/wwan/iosm/iosm_ipc_pcie.c:298 ipc_pcie_probe()
warn: missing unwind goto?
When dma_set_mask fails it directly returns without disabling pci
device and freeing ipc_pcie. Fix this my calling a correct goto label
As dma_set_mask returns either 0 or -EIO, we can use a goto label, as
it finally returns -EIO.
Add a set_mask_fail goto label which stands consistent with other goto
labels in this function..
Fixes: 035e3befc1 ("net: wwan: iosm: fix driver not working with INTEL_IOMMU disabled")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With Eric's ref tracker, syzbot finally found a repro for
use-after-free in tcp_write_timer_handler() by kernel TCP
sockets. [0]
If SMC creates a kernel socket in __smc_create(), the kernel
socket is supposed to be freed in smc_clcsock_release() by
calling sock_release() when we close() the parent SMC socket.
However, at the end of smc_clcsock_release(), the kernel
socket's sk_state might not be TCP_CLOSE. This means that
we have not called inet_csk_destroy_sock() in __tcp_close()
and have not stopped the TCP timers.
The kernel socket's TCP timers can be fired later, so we
need to hold a refcnt for net as we do for MPTCP subflows
in mptcp_subflow_create_socket().
[0]:
leaked reference.
sk_alloc (./include/net/net_namespace.h:335 net/core/sock.c:2108)
inet_create (net/ipv4/af_inet.c:319 net/ipv4/af_inet.c:244)
__sock_create (net/socket.c:1546)
smc_create (net/smc/af_smc.c:3269 net/smc/af_smc.c:3284)
__sock_create (net/socket.c:1546)
__sys_socket (net/socket.c:1634 net/socket.c:1618 net/socket.c:1661)
__x64_sys_socket (net/socket.c:1672)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
==================================================================
BUG: KASAN: slab-use-after-free in tcp_write_timer_handler (net/ipv4/tcp_timer.c:378 net/ipv4/tcp_timer.c:624 net/ipv4/tcp_timer.c:594)
Read of size 1 at addr ffff888052b65e0d by task syzrepro/18091
CPU: 0 PID: 18091 Comm: syzrepro Tainted: G W 6.3.0-rc4-01174-gb5d54eb5899a #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.amzn2022.0.1 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl (lib/dump_stack.c:107)
print_report (mm/kasan/report.c:320 mm/kasan/report.c:430)
kasan_report (mm/kasan/report.c:538)
tcp_write_timer_handler (net/ipv4/tcp_timer.c:378 net/ipv4/tcp_timer.c:624 net/ipv4/tcp_timer.c:594)
tcp_write_timer (./include/linux/spinlock.h:390 net/ipv4/tcp_timer.c:643)
call_timer_fn (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/timer.h:127 kernel/time/timer.c:1701)
__run_timers.part.0 (kernel/time/timer.c:1752 kernel/time/timer.c:2022)
run_timer_softirq (kernel/time/timer.c:2037)
__do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:572)
__irq_exit_rcu (kernel/softirq.c:445 kernel/softirq.c:650)
irq_exit_rcu (kernel/softirq.c:664)
sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1107 (discriminator 14))
</IRQ>
Fixes: ac7138746e ("smc: establish new socket family")
Reported-by: syzbot+7e1e1bdb852961150198@syzkaller.appspotmail.com
Link: https://lore.kernel.org/netdev/000000000000a3f51805f8bcc43a@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Static code analyzer complains to unchecked return value.
The result of pci_reset_function() is unchecked.
Despite, the issue is on the FLR supported code path and in that
case reset can be done with pcie_flr(), the patch uses less invasive
approach by adding the result check of pci_reset_function().
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 7e2cf4feba ("qlcnic: change driver hardware interface mechanism")
Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
iavf: fix racing in VLANs
Ahmed Zaki says:
This patchset mainly fixes a racing issue in the iavf where the number of
VLANs in the vlan_filter_list might be more than the PF limit. To fix that,
we get rid of the cvlans and svlans bitmaps and keep all the required info
in the list.
The second patch adds two new states that are needed so that we keep the
VLAN info while the interface goes DOWN:
-- DISABLE (notify PF, but keep the filter in the list)
-- INACTIVE (dev is DOWN, filter is removed from PF)
Finally, the current code keeps each state in a separate bit field, which
is error prone. The first patch refactors that by replacing all bits with
a single enum. The changes are minimal where each bit change is replaced
with the new state value.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: remove active_cvlans and active_svlans bitmaps
iavf: refactor VLAN filter states
====================
Link: https://lore.kernel.org/r/20230407210730.3046149-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix not setting Dath Path for broadcast sink
- Fix not cleaning up on LE Connection failure
- SCO: Fix possible circular locking dependency
- L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
- Fix race condition in hidp_session_thread
- btbcm: Fix logic error in forming the board name
- btbcm: Fix use after free in btsdio_remove
* tag 'for-net-2023-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
Bluetooth: Set ISO Data Path on broadcast sink
Bluetooth: hci_conn: Fix possible UAF
Bluetooth: SCO: Fix possible circular locking dependency sco_sock_getsockopt
Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm
bluetooth: btbcm: Fix logic error in forming the board name.
Bluetooth: btsdio: fix use after free bug in btsdio_remove due to race condition
Bluetooth: Fix race condition in hidp_session_thread
Bluetooth: Fix printing errors if LE Connection times out
Bluetooth: hci_conn: Fix not cleaning up on LE Connection failure
====================
Link: https://lore.kernel.org/r/20230410172718.4067798-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 3fe97ff3d9 ("scsi: ses: Don't attach if enclosure
has no components") and introduces proper handling of case where there are
no detected secondary components, but primary component (enumerated in
num_enclosures) does exist. That fix was originally proposed by Ding Hui
<dinghui@sangfor.com.cn>.
Completely ignoring devices that have one primary enclosure and no
secondary one results in ses_intf_add() bailing completely
scsi 2:0:0:254: enclosure has no enumerated components
scsi 2:0:0:254: Failed to bind enclosure -12ven in valid configurations such
even on valid configurations with 1 primary and 0 secondary enclosures as
below:
# sg_ses /dev/sg0
3PARdata SES 3321
Supported diagnostic pages:
Supported Diagnostic Pages [sdp] [0x0]
Configuration (SES) [cf] [0x1]
Short Enclosure Status (SES) [ses] [0x8]
# sg_ses -p cf /dev/sg0
3PARdata SES 3321
Configuration diagnostic page:
number of secondary subenclosures: 0
generation code: 0x0
enclosure descriptor list
Subenclosure identifier: 0 [primary]
relative ES process id: 0, number of ES processes: 1
number of type descriptor headers: 1
enclosure logical identifier (hex): 20000002ac02068d
enclosure vendor: 3PARdata product: VV rev: 3321
type descriptor header and text list
Element type: Unspecified, subenclosure id: 0
number of possible elements: 1
The changelog for the original fix follows
=====
We can get a crash when disconnecting the iSCSI session,
the call trace like this:
[ffff00002a00fb70] kfree at ffff00000830e224
[ffff00002a00fba0] ses_intf_remove at ffff000001f200e4
[ffff00002a00fbd0] device_del at ffff0000086b6a98
[ffff00002a00fc50] device_unregister at ffff0000086b6d58
[ffff00002a00fc70] __scsi_remove_device at ffff00000870608c
[ffff00002a00fca0] scsi_remove_device at ffff000008706134
[ffff00002a00fcc0] __scsi_remove_target at ffff0000087062e4
[ffff00002a00fd10] scsi_remove_target at ffff0000087064c0
[ffff00002a00fd70] __iscsi_unbind_session at ffff000001c872c4
[ffff00002a00fdb0] process_one_work at ffff00000810f35c
[ffff00002a00fe00] worker_thread at ffff00000810f648
[ffff00002a00fe70] kthread at ffff000008116e98
In ses_intf_add, components count could be 0, and kcalloc 0 size scomp,
but not saved in edev->component[i].scratch
In this situation, edev->component[0].scratch is an invalid pointer,
when kfree it in ses_intf_remove_enclosure, a crash like above would happen
The call trace also could be other random cases when kfree cannot catch
the invalid pointer
We should not use edev->component[] array when the components count is 0
We also need check index when use edev->component[] array in
ses_enclosure_data_process
=====
Reported-by: Michal Kolar <mich.k@seznam.cz>
Originally-by: Ding Hui <dinghui@sangfor.com.cn>
Cc: stable@vger.kernel.org
Fixes: 3fe97ff3d9 ("scsi: ses: Don't attach if enclosure has no components")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2304042122270.29760@cbobk.fhfr.pm
Tested-by: Michal Kolar <mich.k@seznam.cz>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When loading a DT overlay that creates a device, the device is not
probed, unless the DT overlay is unloaded and reloaded again.
After the recent refactoring to improve fw_devlink, it no longer depends
on the "compatible" property to identify which device tree nodes will
become struct devices. fw_devlink now picks up dangling consumers
(consumers pointing to descendent device tree nodes of a device that
aren't converted to child devices) when a device is successfully bound
to a driver. See __fw_devlink_pickup_dangling_consumers().
However, during DT overlay, a device's device tree node can have
sub-nodes added/removed without unbinding/rebinding the driver. This
difference in behavior between the normal device instantiation and
probing flow vs. the DT overlay flow has a bunch of implications that
are pointed out elsewhere[1]. One of them is that the fw_devlink logic
to pick up dangling consumers is never exercised.
This patch solves the fw_devlink issue by marking all DT nodes added by
DT overlays with FWNODE_FLAG_NOT_DEVICE (fwnode that won't become
device), and by clearing the flag when a struct device is actually
created for the DT node. This way, fw_devlink knows not to have
consumers waiting on these newly added DT nodes, and to propagate the
dependency to an ancestor DT node that has the corresponding struct
device.
Based on a patch by Saravana Kannan, which covered only platform and spi
devices.
[1] https://lore.kernel.org/r/CAGETcx_bkuFaLCiPrAWCPQz+w79ccDp6=9e881qmK=vx3hBMyg@mail.gmail.com
Fixes: 4a032827da ("of: property: Simplify of_link_to_phandle()")
Link: https://lore.kernel.org/r/CAGETcx_+rhHvaC_HJXGrr5_WAd2+k5f=rWYnkCZ6z5bGX-wj4w@mail.gmail.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Wolfram Sang <wsa@kernel.org> # for I2C
Acked-by: Shawn Guo <shawnguo@kernel.org>
Acked-by: Saravana Kannan <saravanak@google.com>
Tested-by: Ivan Bornyakov <i.bornyakov@metrotek.ru>
Link: https://lore.kernel.org/r/e1fa546682ea4c8474ff997ab6244c5e11b6f8bc.1680182615.git.geert+renesas@glider.be
Signed-off-by: Rob Herring <robh@kernel.org>
The idi48 regmap can be used in an interrupt context by regmap-irq. To
prevent a deadlock, enable use_raw_spinlock for idi48_regmap_config.
Fixes: e28432a773 ("gpio: 104-idi-48: Migrate to the regmap-irq API")
Signed-off-by: William Breathitt Gray <william.gray@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
The dio48e regmap can be used in an interrupt context by regmap-irq. To
prevent a deadlock, enable use_raw_spinlock for dio48e_regmap_config.
Fixes: 2f7e845f51 ("gpio: 104-dio-48e: Migrate to the regmap-irq API")
Signed-off-by: William Breathitt Gray <william.gray@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Pull pci fixes from Bjorn Helgaas:
- Provide pci_msix_can_alloc_dyn() stub when CONFIG_PCI_MSI unset to
avoid build errors (Reinette Chatre)
- Quirk AMD XHCI controller that loses MSI-X state in D3hot to avoid
broken USB after hotplug or suspend/resume (Basavaraj Natikar)
- Fix use-after-free in pci_bus_release_domain_nr() (Rob Herring)
* tag 'pci-v6.3-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: Fix use-after-free in pci_bus_release_domain_nr()
x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot
PCI/MSI: Provide missing stub for pci_msix_can_alloc_dyn()
Like the ASUS ExpertBook B2502CBA and various ASUS Vivobook laptops, the
ASUS ExpertBook B1502CBA has an ACPI DSDT table that describes IRQ 1 as
ActiveLow while the kernel overrides it to Edge_High.
$ sudo dmesg | grep DMI
DMI: ASUSTeK COMPUTER INC. ASUS EXPERTBOOK B1502CBA_B1502CBA/B1502CBA, BIOS B1502CBA.300 01/18/2023
$ grep -A 40 PS2K dsdt.dsl | grep IRQ -A 1
IRQ (Level, ActiveLow, Exclusive, )
{1}
This prevents the keyboard from working. To fix this issue, add this laptop
to the skip_override_table so that the kernel does not override IRQ 1.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217323
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
amd_pstate mode can be changed by writing the mode name to the `status`
sysfs. But some combinations are not working. Fix this issue by taking
care of the edge cases.
Before the fix the mode change combination test fails:
#./pst_test.sh
Test passed: from: disable, to
Test passed: from: disable, to disable
Test failed: 1, From mode: disable, to mode: passive
Test failed: 1, From mode: disable, to mode: active
Test failed: 1, From mode: passive, to mode: active
Test passed: from: passive, to disable
Test failed: 1, From mode: passive, to mode: passive
Test failed: 1, From mode: passive, to mode: active
Test failed: 1, From mode: active, to mode: active
Test passed: from: active, to disable
Test failed: 1, From mode: active, to mode: passive
Test failed: 1, From mode: active, to mode: active
After the fix test passes:
#./pst_test.sh
Test passed: from: disable, to
Test passed: from: disable, to disable
Test passed: from: disable, to passive
Test passed: from: disable, to active
Test passed: from: passive, to active
Test passed: from: passive, to disable
Test passed: from: passive, to passive
Test passed: from: passive, to active
Test passed: from: active, to active
Test passed: from: active, to disable
Test passed: from: active, to passive
Test passed: from: active, to active
Fixes: abd61c08ef ("cpufreq: amd-pstate: add driver working mode switch support")
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull btrfs fixes from David Sterba:
- fix fast checksum detection, this affects filesystems with non-crc32c
checksum, calculation would not be offloaded to worker threads
- restore thread_pool mount option behaviour for endio workers, the new
value for maximum active threads would not be set to the actual work
queues
* tag 'for-6.3-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix fast csum implementation detection
btrfs: restore the thread_pool= behavior in remount for the end I/O workqueues
Pull mtd fixes from Miquel Raynal:
"Core fix:
- mtdblock: Tolerate corrected bit-flips
Raw NAND fixes:
- meson: Fix bitmask for length in command word
- stm32_fmc2:
- Remove unsupported EDO mode
- Use timings.mode instead of checking tRC_min.
The first patch is the real fix but nowadays we use
timings.mode instead of bare timings, so in order to ease the
backports, the fix was split into two steps, the first one easy
to backport on older kernels, the second one just as a
follow-up so recent stable kernels would look like the
mainline"
* tag 'mtd/fixes-for-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: meson: fix bitmask for length in command word
mtdblock: tolerate corrected bit-flips
mtd: rawnand: stm32_fmc2: use timings.mode instead of checking tRC_min
mtd: rawnand: stm32_fmc2: remove unsupported EDO mode
Pull ata fix from Damien Le Moal:
- Update my email address in the MAINTAINERS file
* tag 'ata-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
MAINTAINERS: Change ata maintainer email addresses
Pull kvm fixes from Paolo Bonzini:
"Two ARM fixes:
- Ensure the guest PMU context is restored before the first KVM_RUN,
fixing an issue where EL0 event counting is broken after vCPU
save/restore
- Actually initialize ID_AA64PFR0_EL1.{CSV2,CSV3} based on the
sanitized, system-wide values for protected VMs"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: arm64: Advertise ID_AA64PFR0_EL1.CSV2/3 to protected VMs
KVM: arm64: PMU: Restore the guest's EL0 event counting after migration
Some older processors don't allow BIT(13) and BIT(15) in the current
mask set by "THERM_STATUS_CLEAR_CORE_MASK". This results in:
unchecked MSR access error: WRMSR to 0x19c (tried to
write 0x000000000000aaa8) at rIP: 0xffffffff816f66a6
(throttle_active_work+0xa6/0x1d0)
To avoid unchecked MSR issues, check CPUID for each relevant feature and
use that information to set the supported feature bits only in the
"clear" mask for cores. Do the same for the analogous package mask set
by "THERM_STATUS_CLEAR_PKG_MASK".
Introduce functions thermal_intr_init_core_clear_mask() and
thermal_intr_init_pkg_clear_mask() to set core and package mask bits,
respectively. These functions are called during initialization.
Fixes: 6fe1e64b60 ("thermal: intel: Prevent accidental clearing of HFI status")
Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
Link: https://lore.kernel.org/lkml/cdf43fb423368ee3994124a9e8c9b4f8d00712c6.camel@linux.intel.com/T/
Tested-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 6.2+ <stable@kernel.org> # 6.2+
[ rjw: Renamed 2 funtions and 2 static variables, edited subject and
changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Xu writes:
FPGA Manager changes for 6.3-final
Intel m10 bmc secure update:
- Ilpo's change fixes the return value of driver internal function
DFL PCI driver:
- Bjorn's change drops redundant pci_enable_pcie_error_reporting()
Xilinx:
- Michal's change uses xlnx_pr_decouple_read() instead of readl() to
resolve sparse issue.
FPGA core:
- Alexis's change fixes kernel warning on fpga bridge register
All patches have been reviewed on the mailing list, and have been in the
last linux-next releases (as part of our fixes branch)
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
* tag 'fpga-for-6.3-final' of git://git.kernel.org/pub/scm/linux/kernel/git/fpga/linux-fpga:
fpga: bridge: properly initialize bridge device before populating children
fpga: xilinx-pr-decoupler: Use readl wrapper instead of pure readl
fpga: dfl-pci: Drop redundant pci_enable_pcie_error_reporting()
fpga: m10bmc-sec: Fix rsu_send_data() to return FW_UPLOAD_ERR_HW_ERROR
Jonathan writes:
2nd set of IIO fixes for the 6.3 cycle.
adi,ad5755
- Fix missing fwnode_handle_put() in error path.
atmel,at91-sam5d2
- Fix error code when trigger allocation fails that would have looked
like success.
taos,tsl2772
- Store the proximity-diodes value read from the device tree so it
is actually used rather than ignored.
* tag 'iio-fixes-for-6.3b' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: light: tsl2772: fix reading proximity-diodes from device tree
iio: dac: ad5755: Add missing fwnode_handle_put()
iio: adc: at91-sama5d2_adc: fix an error code in at91_adc_allocate_trigger()
Change my email address referenced in the MAINTAINERS file for the ata
subsystem to dlemoal@kernel.org. And while at it, also change other
references for zonefs and the k210 drivers to the same address.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
When the source tree is dirty and contains untracked files, package
builds may fail, for example, when a broken symlink exists, a file
path contains whitespaces, etc.
Since commit 05e96e96a3 ("kbuild: use git-archive for source package
creation"), the source tarball only contains committed files because
it is created by 'git archive'. scripts/package/gen-diff-patch tries
to address the diff from HEAD, but including untracked files by the
hand-crafted script introduces more complexity. I wrote a patch [1] to
make it work in most cases, but still wonder if this is what we should
aim for.
To simplify the code, this patch just gives up untracked files. Going
forward, it is your responsibility to do 'git add' for what you want in
the source package. The script shows a warning just in case you forgot
to do so. It should be checked only when building source packages.
[1]: https://lore.kernel.org/all/CAK7LNAShbZ56gSh9PrbLnBDYKnjtTkHMoCXeGrhcxMvqXGq9=g@mail.gmail.com/2-0001-kbuild-make-package-builds-more-robust.patch
Fixes: 05e96e96a3 ("kbuild: use git-archive for source package creation")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Pull RCU fix from Paul McKenney:
"This fixes a pair of bugs in which an improbable but very real
sequence of events can cause kfree_rcu() to be a bit too quick about
freeing the memory passed to it.
It turns out that this pair of bugs is about two years old, and so
this is not a v6.3 regression. However: (1) It just started showing up
in the wild and (2) Its consequences are dire, so its fix needs to go
in sooner rather than later.
Testing is of course being upgraded, and the upgraded tests detect
this situation very quickly. But to the best of my knowledge right
now, the tests are not particularly urgent and will thus most likely
show up in the v6.5 merge window (the one after this coming one).
Kudos to Ziwei Dai and his group for tracking this one down the hard
way!"
* tag 'urgent-rcu.2023.04.07a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
rcu/kvfree: Avoid freeing new kfree_rcu() memory after old grace period
Pull virtio fixes from Michael Tsirkin:
"Some last minute fixes - most of them for regressions"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa_sim_net: complete the initialization before register the device
vdpa/mlx5: Add and remove debugfs in setup/teardown driver
tools/virtio: fix typo in README instructions
vhost-scsi: Fix crash during LUN unmapping
vhost-scsi: Fix vhost_scsi struct use after free
virtio-blk: fix ZBD probe in kernels without ZBD support
virtio-blk: fix to match virtio spec
Pull 9p fixes from Eric Van Hensbergen:
"These are some collected fixes for the 6.3-rc series that have been
passed our 9p regression tests and been in for-next for at least a
week.
They include a fix for a KASAN reported problem in the extended
attribute handling code and a use after free in the xen transport.
This also includes some updates for the MAINTAINERS file including the
transition of our development mailing list from sourceforge.net to
lists.linux.dev"
* tag '9p-6.3-fixes-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
Update email address and mailing list for v9fs
9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition
9P FS: Fix wild-memory-access write in v9fs_get_acl
Pull UML fix from Richard Weinberger:
- Build regression fix for older gcc versions
* tag 'uml-for-linus-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
um: Only disable SSE on clang to work around old GCC bugs
Similar to commit d0be8347c6 ("Bluetooth: L2CAP: Fix use-after-free
caused by l2cap_chan_put"), just use l2cap_chan_hold_unless_zero to
prevent referencing a channel that is about to be destroyed.
Cc: stable@kernel.org
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Min Li <lm0963hack@gmail.com>
This patch enables ISO data rx on broadcast sink.
Fixes: eca0ae4aea ("Bluetooth: Add initial implementation of BIS connections")
Signed-off-by: Claudia Draghicescu <claudia.rosu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This patch fixes an incorrect loop exit condition in code that replaces
'/' symbols in the board name. There might also be a memory corruption
issue here, but it is unlikely to be a real problem.
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
In btsdio_probe, the data->work is bound with btsdio_work. It will be
started in btsdio_send_frame.
If the btsdio_remove runs with a unfinished work, there may be a race
condition that hdev is freed but used in btsdio_work. Fix it by
canceling the work before do cleanup in btsdio_remove.
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
There is a potential race condition in hidp_session_thread that may
lead to use-after-free. For instance, the timer is active while
hidp_del_timer is called in hidp_session_thread(). After hidp_session_put,
then 'session' will be freed, causing kernel panic when hidp_idle_timeout
is running.
The solution is to use del_timer_sync instead of del_timer.
Here is the call trace:
? hidp_session_probe+0x780/0x780
call_timer_fn+0x2d/0x1e0
__run_timers.part.0+0x569/0x940
hidp_session_probe+0x780/0x780
call_timer_fn+0x1e0/0x1e0
ktime_get+0x5c/0xf0
lapic_next_deadline+0x2c/0x40
clockevents_program_event+0x205/0x320
run_timer_softirq+0xa9/0x1b0
__do_softirq+0x1b9/0x641
__irq_exit_rcu+0xdc/0x190
irq_exit_rcu+0xe/0x20
sysvec_apic_timer_interrupt+0xa1/0xc0
Cc: stable@vger.kernel.org
Signed-off-by: Min Li <lm0963hack@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This fixes errors like bellow when LE Connection times out since that
is actually not a controller error:
Bluetooth: hci0: Opcode 0x200d failed: -110
Bluetooth: hci0: request failed to create LE connection: err -110
Instead the code shall properly detect if -ETIMEDOUT is returned and
send HCI_OP_LE_CREATE_CONN_CANCEL to give up on the connection.
Link: https://github.com/bluez/bluez/issues/340
Fixes: 8e8b92ee60 ("Bluetooth: hci_sync: Add hci_le_create_conn_sync")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
hci_connect_le_scan_cleanup shall always be invoked to cleanup the
states and re-enable passive scanning if necessary, otherwise it may
cause the pending action to stay active causing multiple attempts to
connect.
Fixes: 9b3628d79b ("Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Pull perf fixes from Borislav Petkov:
- Fix "same task" check when redirecting event output
- Do not wait unconditionally for RCU on the event migration path if
there are no events to migrate
* tag 'perf_urgent_for_v6.3_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix the same task check in perf_event_set_output
perf: Optimize perf_pmu_migrate_context()
Pull x86 fixes from Borislav Petkov:
- Add a new Intel Arrow Lake CPU model number
- Fix a confusion about how to check the version of the ACPI spec which
supports a "online capable" bit in the MADT table which lead to a
bunch of boot breakages with Zen1 systems and VMs
* tag 'x86_urgent_for_v6.3_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add model number for Intel Arrow Lake processor
x86/acpi/boot: Correct acpi_is_processor_usable() check
x86/ACPI/boot: Use FADT version to check support for online capable
Pull compute express link (cxl) fixes from Dan Williams:
"Several fixes for driver startup regressions that landed during the
merge window as well as some older bugs.
The regressions were due to a lack of testing with what the CXL
specification calls Restricted CXL Host (RCH) topologies compared to
the testing with Virtual Host (VH) CXL topologies. A VH topology is
typical PCIe while RCH topologies map CXL endpoints as Root Complex
Integrated endpoints. The impact is some driver crashes on startup.
This merge window also added compatibility for range registers (the
mechanism that CXL 1.1 defined for mapping memory) to treat them like
HDM decoders (the mechanism that CXL 2.0 defined for mapping
Host-managed Device Memory). That work collided with the new region
enumeration code that was tested with CXL 2.0 setups, and fails with
crashes at startup.
Lastly, the DOE (Data Object Exchange) implementation for retrieving
an ACPI-like data table from CXL devices is being reworked for v6.4.
Several fixes fell out of that work that are suitable for v6.3.
All of this has been in linux-next for a while, and all reported
issues [1] have been addressed.
Summary:
- Fix several issues with region enumeration in RCH topologies that
can trigger crashes on driver startup or shutdown.
- Fix CXL DVSEC range register compatibility versus region
enumeration that leads to startup crashes
- Fix CDAT endiannes handling
- Fix multiple buffer handling boundary conditions
- Fix Data Object Exchange (DOE) workqueue usage vs
CONFIG_DEBUG_OBJECTS warn splats"
Link: http://lore.kernel.org/r/20230405075704.33de8121@canb.auug.org.au [1]
* tag 'cxl-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/hdm: Extend DVSEC range register emulation for region enumeration
cxl/hdm: Limit emulation to the number of range registers
cxl/region: Move coherence tracking into cxl_region_attach()
cxl/region: Fix region setup/teardown for RCDs
cxl/port: Fix find_cxl_root() for RCDs and simplify it
cxl/hdm: Skip emulation when driver manages mem_enable
cxl/hdm: Fix double allocation of @cxlhdm
PCI/DOE: Fix memory leak with CONFIG_DEBUG_OBJECTS=y
PCI/DOE: Silence WARN splat with CONFIG_DEBUG_OBJECTS=y
cxl/pci: Handle excessive CDAT length
cxl/pci: Handle truncated CDAT entries
cxl/pci: Handle truncated CDAT header
cxl/pci: Fix CDAT retrieval on big endian
Ivan Bornyakov says:
====================
net: fix EEPROM read of absent SFP module
The patchset is to improve EEPROM read requests when SFP module is
absent.
ChangeLog:
v1:
https://lore.kernel.org/netdev/20230405153900.747-1-i.bornyakov@metrotek.ru/
v2:
* reword commit message of "net: sfp: initialize sfp->i2c_block_size
at sfp allocation"
* add second patch to eliminate excessive I2C transfers in
sfp_module_eeprom() and sfp_module_eeprom_by_page()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If SFP module is not present, it is sensible to fail sfp_module_eeprom()
and sfp_module_eeprom_by_page() early to avoid excessive I2C transfers
which are garanteed to fail.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Ivan Bornyakov <i.bornyakov@metrotek.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
sfp->i2c_block_size is initialized at SFP module insertion in
sfp_sm_mod_probe(). Because of that, if SFP module was never inserted
since boot, sfp_read() call will lead to zero-length I2C read attempt,
and not all I2C controllers are happy with zero-length reads.
One way to issue sfp_read() on empty SFP cage is to execute ethtool -m.
If SFP module was never plugged since boot, there will be a zero-length
I2C read attempt.
# ethtool -m xge0
i2c i2c-3: adapter quirk: no zero length (addr 0x0050, size 0, read)
Cannot get Module EEPROM data: Operation not supported
If SFP module was plugged then removed at least once,
sfp->i2c_block_size will be initialized and ethtool -m will fail with
different exit code and without I2C error
# ethtool -m xge0
Cannot get Module EEPROM data: Remote I/O error
Fix this by initializing sfp->i2_block_size at struct sfp allocation
stage so no wild sfp_read() could issue zero-length I2C read.
Signed-off-by: Ivan Bornyakov <i.bornyakov@metrotek.ru>
Fixes: 0d035bed2a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 workaround")
Cc: stable@vger.kernel.org
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull cifs client fixes from Steve French:
"Two cifs/smb3 client fixes, one for stable:
- double lock fix for a cifs/smb1 reconnect path
- DFS prefixpath fix for reconnect when server moved"
* tag '6.3-rc5-smb3-cifs-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: double lock in cifs_reconnect_tcon()
cifs: sanitize paths in cifs_update_super_prepath.
Pull char/misc driver fixes from Greg KH:
"Here are a small set of various small driver changes for 6.3-rc6.
Included in here are:
- iio driver fixes for reported problems
- coresight hwtracing bugfix for reported problem
- small counter driver bugfixes
All have been in linux-next for a while with no reported problems"
* tag 'char-misc-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
coresight: etm4x: Do not access TRCIDR1 for identification
coresight-etm4: Fix for() loop drvdata->nr_addr_cmp range bug
iio: adc: ti-ads7950: Set `can_sleep` flag for GPIO chip
iio: adc: palmas_gpadc: fix NULL dereference on rmmod
counter: 104-quad-8: Fix Synapse action reported for Index signals
counter: 104-quad-8: Fix race condition between FLAG and CNTR reads
iio: adc: max11410: fix read_poll_timeout() usage
iio: dac: cio-dac: Fix max DAC write value check for 12-bit
iio: light: cm32181: Unregister second I2C client if present
iio: accel: kionix-kx022a: Get the timestamp from the driver's private data in the trigger_handler
iio: adc: ad7791: fix IRQ flags
iio: buffer: make sure O_NONBLOCK is respected
iio: buffer: correctly return bytes written in output buffers
iio: light: vcnl4000: Fix WARN_ON on uninitialized lock
iio: adis16480: select CONFIG_CRC32
drivers: iio: adc: ltc2497: fix LSB shift
iio: adc: qcom-spmi-adc5: Fix the channel name
Pull tty/serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for some reported
problems:
- fsl_uart driver bugfixes
- sh-sci serial driver bugfixes
- renesas serial driver DT binding bugfixes
- 8250 DMA bugfix
All of these have been in linux-next for a while with no reported
problems"
* tag 'tty-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: serial: sh-sci: Fix Rx on RZ/G2L SCI
tty: serial: fsl_lpuart: fix crash in lpuart_uport_is_active
tty: serial: fsl_lpuart: avoid checking for transfer complete when UARTCTRL_SBK is asserted in lpuart32_tx_empty
serial: 8250: Prevent starting up DMA Rx on THRI interrupt
dt-bindings: serial: renesas,scif: Fix 4th IRQ for 4-IRQ SCIFs
tty: serial: sh-sci: Fix transmit end interrupt handler
Pull USB bugfixes from Greg KH:
"Here are some small USB bugfixes for 6.3-rc6 that have been in my
tree, and in linux-next, for a while. Included in here are:
- new usb-serial driver device ids
- xhci bugfixes for reported problems
- gadget driver bugfixes for reported problems
- dwc3 new device id
All have been in linux-next with no reported problems"
* tag 'usb-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: cdnsp: Fixes error: uninitialized symbol 'len'
usb: gadgetfs: Fix ep_read_iter to handle ITER_UBUF
usb: gadget: f_fs: Fix ffs_epfile_read_iter to handle ITER_UBUF
usb: typec: altmodes/displayport: Fix configure initial pin assignment
usb: dwc3: pci: add support for the Intel Meteor Lake-S
xhci: Free the command allocated for setting LPM if we return early
Revert "usb: xhci-pci: Set PROBE_PREFER_ASYNCHRONOUS"
xhci: also avoid the XHCI_ZERO_64B_REGS quirk with a passthrough iommu
USB: serial: option: add Quectel RM500U-CN modem
usb: xhci: tegra: fix sleep in atomic call
USB: serial: option: add Telit FE990 compositions
USB: serial: cp210x: add Silicon Labs IFS-USB-DATACABLE IDs
Pull SCSI fixes from James Bottomley:
"Four small fixes, all in drivers. They're all one or two lines except
for the ufs one, but that's a simple revert of a previous feature"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: iscsi_tcp: Check that sock is valid before iscsi_set_param()
scsi: qla2xxx: Fix memory leak in qla2x00_probe_one()
scsi: mpi3mr: Handle soft reset in progress fault code (0xF002)
scsi: Revert "scsi: ufs: core: Initialize devfreq synchronously"
Pull block fixes from Jens Axboe:
- Ensure that ublk always reads the whole sqe upfront (me)
- Fix for a block size probing issue with ublk (Ming)
- Fix for the bio based polling (Keith)
- NVMe pull request via Christoph:
- fix discard support without oncs (Keith Busch)
- Partition scan error handling regression fix (Yu)
* tag 'block-6.3-2023-04-06' of git://git.kernel.dk/linux:
block: don't set GD_NEED_PART_SCAN if scan partition failed
block: ublk: make sure that block size is set correctly
ublk: read any SQE values upfront
nvme: fix discard support without oncs
blk-mq: directly poll requests
Pull io_uring fixes from Jens Axboe:
"Just two minor fixes for provided buffers - one where we could
potentially leak a buffer, and one where the returned values was
off-by-one in some cases"
* tag 'io_uring-6.3-2023-04-06' of git://git.kernel.dk/linux:
io_uring: fix memory leak when removing provided buffers
io_uring: fix return value when removing provided buffers
Pull dma-mapping fix from Christoph Hellwig:
- fix a braino in the swiotlb alignment check fix (Petr Tesarik)
* tag 'dma-mapping-6.3-2023-04-08' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: fix a braino in the alignment check fix
Pull tracing fixes from Steven Rostedt:
"A couple more minor fixes:
- Reset direct->addr back to its original value on error in updating
the direct trampoline code
- Make lastcmd_mutex static"
* tag 'trace-v6.3-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/synthetic: Make lastcmd_mutex static
ftrace: Fix issue that 'direct->addr' not restored in modify_ftrace_direct()
Pull MM fixes from Andrew Morton:
"28 hotfixes.
23 are cc:stable and the other five address issues which were
introduced during this merge cycle.
20 are for MM and the remainder are for other subsystems"
* tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits)
maple_tree: fix a potential concurrency bug in RCU mode
maple_tree: fix get wrong data_end in mtree_lookup_walk()
mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
nilfs2: fix sysfs interface lifetime
mm: take a page reference when removing device exclusive entries
mm: vmalloc: avoid warn_alloc noise caused by fatal signal
nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field
nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
zsmalloc: document freeable stats
zsmalloc: document new fullness grouping
fsdax: force clear dirty mark if CoW
mm/hugetlb: fix uffd wr-protection for CoW optimization path
mm: enable maple tree RCU mode by default
maple_tree: add RCU lock checking to rcu callback functions
maple_tree: add smp_rmb() to dead node detection
maple_tree: fix write memory barrier of nodes once dead for RCU mode
maple_tree: remove extra smp_wmb() from mas_dead_leaves()
maple_tree: fix freeing of nodes in rcu mode
maple_tree: detect dead nodes in mas_start()
maple_tree: be more cautious about dead nodes
...
Since 32ef9e5054, -Wa,-gdwarf-2 is no longer used in KBUILD_AFLAGS.
Instead, it includes -g, the appropriate -gdwarf-* flag, and also the
-Wa versions of both of those if building with Clang and GNU as. As a
result, debug info was being generated for the purgatory objects, even
though the intention was that it not be.
Fixes: 32ef9e5054 ("Makefile.debug: re-enable debug info for .S files")
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Cc: stable@vger.kernel.org
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
When memory is a little tight on my system, it's pretty easy to see
warnings that look like this.
ksoftirqd/0: page allocation failure: order:3, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
...
Call trace:
dump_backtrace+0x0/0x1e8
show_stack+0x20/0x2c
dump_stack_lvl+0x60/0x78
dump_stack+0x18/0x38
warn_alloc+0x104/0x174
__alloc_pages+0x588/0x67c
alloc_rx_agg+0xa0/0x190 [r8152 ...]
r8152_poll+0x270/0x760 [r8152 ...]
__napi_poll+0x44/0x1ec
net_rx_action+0x100/0x300
__do_softirq+0xec/0x38c
run_ksoftirqd+0x38/0xec
smpboot_thread_fn+0xb8/0x248
kthread+0x134/0x154
ret_from_fork+0x10/0x20
On a fragmented system it's normal that order 3 allocations will
sometimes fail, especially atomic ones. The driver handles these
failures fine and the WARN just creates spam in the logs for this
case. The __GFP_NOWARN flag is exactly for this situation, so add it
to the allocation.
NOTE: my testing is on a 5.15 system, but there should be no reason
that this would be fundamentally different on a mainline kernel.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/20230406171411.1.I84dbef45786af440fd269b71e9436a96a8e7a152@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
assume the following setup on a single machine:
1. An openvswitch instance with one bridge and default flows
2. two network namespaces "server" and "client"
3. two ovs interfaces "server" and "client" on the bridge
4. for each ovs interface a veth pair with a matching name and 32 rx and
tx queues
5. move the ends of the veth pairs to the respective network namespaces
6. assign ip addresses to each of the veth ends in the namespaces (needs
to be the same subnet)
7. start some http server on the server network namespace
8. test if a client in the client namespace can reach the http server
when following the actions below the host has a chance of getting a cpu
stuck in a infinite loop:
1. send a large amount of parallel requests to the http server (around
3000 curls should work)
2. in parallel delete the network namespace (do not delete interfaces or
stop the server, just kill the namespace)
there is a low chance that this will cause the below kernel cpu stuck
message. If this does not happen just retry.
Below there is also the output of bpftrace for the functions mentioned
in the output.
The series of events happening here is:
1. the network namespace is deleted calling
`unregister_netdevice_many_notify` somewhere in the process
2. this sets first `NETREG_UNREGISTERING` on both ends of the veth and
then runs `synchronize_net`
3. it then calls `call_netdevice_notifiers` with `NETDEV_UNREGISTER`
4. this is then handled by `dp_device_event` which calls
`ovs_netdev_detach_dev` (if a vport is found, which is the case for
the veth interface attached to ovs)
5. this removes the rx_handlers of the device but does not prevent
packages to be sent to the device
6. `dp_device_event` then queues the vport deletion to work in
background as a ovs_lock is needed that we do not hold in the
unregistration path
7. `unregister_netdevice_many_notify` continues to call
`netdev_unregister_kobject` which sets `real_num_tx_queues` to 0
8. port deletion continues (but details are not relevant for this issue)
9. at some future point the background task deletes the vport
If after 7. but before 9. a packet is send to the ovs vport (which is
not deleted at this point in time) which forwards it to the
`dev_queue_xmit` flow even though the device is unregistering.
In `skb_tx_hash` (which is called in the `dev_queue_xmit`) path there is
a while loop (if the packet has a rx_queue recorded) that is infinite if
`dev->real_num_tx_queues` is zero.
To prevent this from happening we update `do_output` to handle devices
without carrier the same as if the device is not found (which would
be the code path after 9. is done).
Additionally we now produce a warning in `skb_tx_hash` if we will hit
the infinite loop.
bpftrace (first word is function name):
__dev_queue_xmit server: real_num_tx_queues: 1, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 1
netdev_core_pick_tx server: addr: 0xffff9f0a46d4a000 real_num_tx_queues: 1, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 1
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 2, reg_state: 1
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 6, reg_state: 2
ovs_netdev_detach_dev server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, reg_state: 2
netdev_rx_handler_unregister server: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024, reg_state: 2
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
netdev_rx_handler_unregister ret server: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024, reg_state: 2
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 27, reg_state: 2
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 22, reg_state: 2
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 18, reg_state: 2
netdev_unregister_kobject: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
ovs_vport_send server: real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 2
__dev_queue_xmit server: real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 2
netdev_core_pick_tx server: addr: 0xffff9f0a46d4a000 real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 2
broken device server: real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024
ovs_dp_detach_port server: real_num_tx_queues: 0 cpu 9, pid: 9124, tid: 9124, reg_state: 2
synchronize_rcu_expedited: cpu 9, pid: 33604, tid: 33604
stuck message:
watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [curl:1929279]
Modules linked in: veth pktgen bridge stp llc ip_set_hash_net nft_counter xt_set nft_compat nf_tables ip_set_hash_ip ip_set nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tls binfmt_misc nls_iso8859_1 input_leds joydev serio_raw dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel drm efi_pstore virtio_rng ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel virtio_net ahci net_failover crypto_simd cryptd psmouse libahci virtio_blk failover
CPU: 5 PID: 1929279 Comm: curl Not tainted 5.15.0-67-generic #74-Ubuntu
Hardware name: OpenStack Foundation OpenStack Nova, BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:netdev_pick_tx+0xf1/0x320
Code: 00 00 8d 48 ff 0f b7 c1 66 39 ca 0f 86 e9 01 00 00 45 0f b7 ff 41 39 c7 0f 87 5b 01 00 00 44 29 f8 41 39 c7 0f 87 4f 01 00 00 <eb> f2 0f 1f 44 00 00 49 8b 94 24 28 04 00 00 48 85 d2 0f 84 53 01
RSP: 0018:ffffb78b40298820 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff9c8773adc2e0 RCX: 000000000000083f
RDX: 0000000000000000 RSI: ffff9c8773adc2e0 RDI: ffff9c870a25e000
RBP: ffffb78b40298858 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9c870a25e000
R13: ffff9c870a25e000 R14: ffff9c87fe043480 R15: 0000000000000000
FS: 00007f7b80008f00(0000) GS:ffff9c8e5f740000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7b80f6a0b0 CR3: 0000000329d66000 CR4: 0000000000350ee0
Call Trace:
<IRQ>
netdev_core_pick_tx+0xa4/0xb0
__dev_queue_xmit+0xf8/0x510
? __bpf_prog_exit+0x1e/0x30
dev_queue_xmit+0x10/0x20
ovs_vport_send+0xad/0x170 [openvswitch]
do_output+0x59/0x180 [openvswitch]
do_execute_actions+0xa80/0xaa0 [openvswitch]
? kfree+0x1/0x250
? kfree+0x1/0x250
? kprobe_perf_func+0x4f/0x2b0
? flow_lookup.constprop.0+0x5c/0x110 [openvswitch]
ovs_execute_actions+0x4c/0x120 [openvswitch]
ovs_dp_process_packet+0xa1/0x200 [openvswitch]
? ovs_ct_update_key.isra.0+0xa8/0x120 [openvswitch]
? ovs_ct_fill_key+0x1d/0x30 [openvswitch]
? ovs_flow_key_extract+0x2db/0x350 [openvswitch]
ovs_vport_receive+0x77/0xd0 [openvswitch]
? __htab_map_lookup_elem+0x4e/0x60
? bpf_prog_680e8aff8547aec1_kfree+0x3b/0x714
? trace_call_bpf+0xc8/0x150
? kfree+0x1/0x250
? kfree+0x1/0x250
? kprobe_perf_func+0x4f/0x2b0
? kprobe_perf_func+0x4f/0x2b0
? __mod_memcg_lruvec_state+0x63/0xe0
netdev_port_receive+0xc4/0x180 [openvswitch]
? netdev_port_receive+0x180/0x180 [openvswitch]
netdev_frame_hook+0x1f/0x40 [openvswitch]
__netif_receive_skb_core.constprop.0+0x23d/0xf00
__netif_receive_skb_one_core+0x3f/0xa0
__netif_receive_skb+0x15/0x60
process_backlog+0x9e/0x170
__napi_poll+0x33/0x180
net_rx_action+0x126/0x280
? ttwu_do_activate+0x72/0xf0
__do_softirq+0xd9/0x2e7
? rcu_report_exp_cpu_mult+0x1b0/0x1b0
do_softirq+0x7d/0xb0
</IRQ>
<TASK>
__local_bh_enable_ip+0x54/0x60
ip_finish_output2+0x191/0x460
__ip_finish_output+0xb7/0x180
ip_finish_output+0x2e/0xc0
ip_output+0x78/0x100
? __ip_finish_output+0x180/0x180
ip_local_out+0x5e/0x70
__ip_queue_xmit+0x184/0x440
? tcp_syn_options+0x1f9/0x300
ip_queue_xmit+0x15/0x20
__tcp_transmit_skb+0x910/0x9c0
? __mod_memcg_state+0x44/0xa0
tcp_connect+0x437/0x4e0
? ktime_get_with_offset+0x60/0xf0
tcp_v4_connect+0x436/0x530
__inet_stream_connect+0xd4/0x3a0
? kprobe_perf_func+0x4f/0x2b0
? aa_sk_perm+0x43/0x1c0
inet_stream_connect+0x3b/0x60
__sys_connect_file+0x63/0x70
__sys_connect+0xa6/0xd0
? setfl+0x108/0x170
? do_fcntl+0xe8/0x5a0
__x64_sys_connect+0x18/0x20
do_syscall_64+0x5c/0xc0
? __x64_sys_fcntl+0xa9/0xd0
? exit_to_user_mode_prepare+0x37/0xb0
? syscall_exit_to_user_mode+0x27/0x50
? do_syscall_64+0x69/0xc0
? __sys_setsockopt+0xea/0x1e0
? exit_to_user_mode_prepare+0x37/0xb0
? syscall_exit_to_user_mode+0x27/0x50
? __x64_sys_setsockopt+0x1f/0x30
? do_syscall_64+0x69/0xc0
? irqentry_exit+0x1d/0x30
? exc_page_fault+0x89/0x170
entry_SYSCALL_64_after_hwframe+0x61/0xcb
RIP: 0033:0x7f7b8101c6a7
Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 18 89 54 24 0c 48 89 34 24 89
RSP: 002b:00007ffffd6b2198 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7b8101c6a7
RDX: 0000000000000010 RSI: 00007ffffd6b2360 RDI: 0000000000000005
RBP: 0000561f1370d560 R08: 00002795ad21d1ac R09: 0030312e302e302e
R10: 00007ffffd73f080 R11: 0000000000000246 R12: 0000561f1370c410
R13: 0000000000000000 R14: 0000000000000005 R15: 0000000000000000
</TASK>
Fixes: 7f8a436eaa ("openvswitch: Add conntrack action")
Co-developed-by: Luca Czesla <luca.czesla@mail.schwarz>
Signed-off-by: Luca Czesla <luca.czesla@mail.schwarz>
Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/ZC0pBXBAgh7c76CA@kernel-bug-kernel-bug
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf 2023-04-08
We've added 4 non-merge commits during the last 11 day(s) which contain
a total of 5 files changed, 39 insertions(+), 6 deletions(-).
The main changes are:
1) Fix BPF TCP socket iterator to use correct helper for dropping
socket's refcount, that is, sock_gen_put instead of sock_put,
from Martin KaFai Lau.
2) Fix a BTI exception splat in BPF trampoline-generated code on arm64,
from Xu Kuohai.
3) Fix a LongArch JIT error from missing BPF_NOSPEC no-op, from George Guo.
4) Fix dynamic XDP feature detection of veth in xdp_redirect selftest,
from Lorenzo Bianconi.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: fix xdp_redirect xdp-features selftest for veth driver
bpf, arm64: Fixed a BTI error on returning to patched function
LoongArch, bpf: Fix jit to skip speculation barrier opcode
bpf: tcp: Use sock_gen_put instead of sock_put in bpf_iter_tcp
====================
Link: https://lore.kernel.org/r/20230407224642.30906-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull gpio fixes from Bartosz Golaszewski:
- fix irq handling in gpio-davinci
- fix Kconfig dependencies for gpio-regmap
* tag 'gpio-fixes-for-v6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: davinci: Add irq chip flag to skip set wake
gpio: davinci: Do not clear the bank intr enable bit in save_context
gpio: GPIO_REGMAP: select REGMAP instead of depending on it
Pull ACPI fixes from Rafael Wysocki:
"Fix the ACPI backlight override mechanism for the cases when
acpi_backlight=video is set through the kernel command line or a DMI
quirk and add backlight quirks for Apple iMac14,1 and iMac14,2 and
Lenovo ThinkPad W530 (Hans de Goede)"
* tag 'acpi-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: video: Add acpi_backlight=video quirk for Lenovo ThinkPad W530
ACPI: video: Add acpi_backlight=video quirk for Apple iMac14,1 and iMac14,2
ACPI: video: Make acpi_backlight=video work independent from GPU driver
ACPI: video: Add auto_detect arg to __acpi_video_get_backlight_type()
Pull arm64 fix from Catalin Marinas:
"Fix uninitialised variable warning (from smatch) in the arm64 compat
alignment fixup code"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: compat: Work around uninitialized variable warning
Pull ksmbd server fixes from Steve French:
"Four fixes, three for stable:
- slab out of bounds fix
- lock cancellation fix
- minor cleanup to address clang warning
- fix for xfstest 551 (wrong parms passed to kvmalloc)"
* tag '6.3-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix slab-out-of-bounds in init_smb2_rsp_hdr
ksmbd: delete asynchronous work from list
ksmbd: remove unused is_char_allowed function
ksmbd: do not call kvmalloc() with __GFP_NORETRY | __GFP_NO_WARN
There are some interop issues seen across a few DP monitors with
HBR3 and herobrine boards where the DP display stays blank with hbr3.
This is still under investigation but in preparation for supporting
higher resolutions, its better to disable HBR3 till the issues are
root-caused as there is really no guarantee which monitors will show
the issue and which would not.
This can be enabled back after successful validation across more DP
sinks.
Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230329233416.27152-1-quic_abhinavk@quicinc.com
The VLAN filters info is currently being held in a list and 2 bitmaps
(active_cvlans and active_svlans). We are experiencing some racing where
data is not in sync in the list and bitmaps. For example, the VLAN is
initially added to the list but only when the PF replies, it is added to
the bitmap. If a user adds many V2 VLANS before the PF responds:
while [ $((i++)) ]
ip l add l eth0 name eth0.$i type vlan id $i
we might end up with more VLAN list entries than the designated limit.
Also, The "ip link show" will show more links added than the PF limit.
On the other and, the bitmaps are only used to check the number of VLAN
filters and to re-enable the filters when the interface goes from DOWN to
UP.
This patch gets rid of the bitmaps and uses the list only. To do that,
the states of the VLAN filter are modified:
1 - IAVF_VLAN_REMOVE: the entry needs to be totally removed after informing
the PF. This is the "ip link del eth0.$i" path.
2 - IAVF_VLAN_DISABLE: (new) the netdev went down. The filter needs to be
removed from the PF and then marked INACTIVE.
3 - IAVF_VLAN_INACTIVE: (new) no PF filter exists, but the user did not
delete the VLAN.
Fixes: 48ccc43ecf ("iavf: Add support VIRTCHNL_VF_OFFLOAD_VLAN_V2 during netdev config")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The VLAN filter states are currently being saved as individual bits.
This is error prone as multiple bits might be mistakenly set.
Fix by replacing the bits with a single state enum. Also, add an
"ACTIVE" state for filters that are accepted by the PF.
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This patch add bonding arp validate tests with mode active backup,
monitor arp_ip_target and ns_ip6_target. It also checks mii_status
to make sure all slaves are UP.
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To improve the testing process for bond options, A new bond topology lib
is added to our testing setup. The current option_prio.sh file will be
renamed to bond_options.sh so that all bonding options can be tested here.
Specifically, for priority testing, we will run all tests using modes
1, 5, and 6. These changes will help us streamline the testing process
and ensure that our bond options are rigorously evaluated.
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When arp_validate is set to 2, 3, or 6, validation is performed for
backup slaves as well. As stated in the bond documentation, validation
involves checking the broadcast ARP request sent out via the active
slave. This helps determine which slaves are more likely to function in
the event of an active slave failure.
However, when the target is an IPv6 address, the NS message sent from
the active interface is not checked on backup slaves. Additionally,
based on the bond_arp_rcv() rule b, we must reverse the saddr and daddr
when checking the NS message.
Note that when checking the NS message, the destination address is a
multicast address. Therefore, we must convert the target address to
solicited multicast in the bond_get_targets_ip6() function.
Prior to the fix, the backup slaves had a mii status of "down", but
after the fix, all of the slaves' mii status was updated to "UP".
Fixes: 4e24be018e ("bonding: add new parameter ns_targets")
Reviewed-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UBSAN: shift-out-of-bounds in net/ipv4/tcp_input.c:555:23
shift exponent 255 is too large for 32-bit type 'int'
CPU: 1 PID: 7907 Comm: ssh Not tainted 6.3.0-rc4-00161-g62bad54b26db-dirty #206
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x136/0x150
__ubsan_handle_shift_out_of_bounds+0x21f/0x5a0
tcp_init_transfer.cold+0x3a/0xb9
tcp_finish_connect+0x1d0/0x620
tcp_rcv_state_process+0xd78/0x4d60
tcp_v4_do_rcv+0x33d/0x9d0
__release_sock+0x133/0x3b0
release_sock+0x58/0x1b0
'maxwin' is int, shifting int for 32 or more bits is undefined behaviour.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Smatch reports: drivers/net/ethernet/sun/niu.c:4525
niu_alloc_channels() warn: missing unwind goto?
If niu_rbr_fill() fails, then we are directly returning 'err' without
freeing the channels.
Fix this by changing direct return to a goto 'out_err'.
Fixes: a3138df9f2 ("[NIU]: Add Sun Neptune ethernet driver.")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This lock was supposed to be an unlock.
Fixes: 6cc041e90c ("cifs: avoid races in parallel reconnects in smb1")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still
set, and partition scan will be proceed again when blkdev_get_by_dev()
is called. However, this will cause a problem that re-assemble partitioned
raid device will creat partition for underlying disk.
Test procedure:
mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0
sgdisk -n 0:0:+100MiB /dev/md0
blockdev --rereadpt /dev/sda
blockdev --rereadpt /dev/sdb
mdadm -S /dev/md0
mdadm -A /dev/md0 /dev/sda /dev/sdb
Test result: underlying disk partition and raid partition can be
observed at the same time
Note that this can still happen in come corner cases that
GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid
device.
Fixes: e5cfefa97b ("block: fix scan partition for exclusively open device again")
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fix grep warning during the build, with GNU grep 3.8
with the following command
`grep -v '^\#\|^$$' rust/bindgen_parameters`
I see the following warning
```
grep: warning: stray \ before #
--opaque-type xregs_state
--opaque-type desc_struct
--opaque-type arch_lbr_state
--opaque-type local_apic
--opaque-type x86_msi_data
--opaque-type x86_msi_addr_lo
--opaque-type kunit_try_catch
--opaque-type spinlock
--no-doc-comments
```
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
Tested-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
More complex drivers might want to use modules to organize their Rust
code, but those module folders do not need a Makefile.
generate_rust_analyzer.py currently crashes on those. Fix it so that a
missing Makefile is silently ignored.
Link: https://github.com/Rust-for-Linux/linux/pull/883
Signed-off-by: Asahi Lina <lina@asahilina.net>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Picasso was the first APU that introduced s2idle support from AMD,
and it was predating before vendors started to use `StorageD3Enable`
in their firmware.
Windows doesn't have problems with this hardware and NVME so it was
likely on the list of hardcoded CPUs to use this behavior in Windows.
Add it to the list for Linux to avoid NVME resume issues.
Reported-by: Stuart Axon <stuaxo2@yahoo.com>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2449
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless and can.
Current release - regressions:
- wifi: mac80211:
- fix potential null pointer dereference
- fix receiving mesh packets in forwarding=0 networks
- fix mesh forwarding
Current release - new code bugs:
- virtio/vsock: fix leaks due to missing skb owner
Previous releases - regressions:
- raw: fix NULL deref in raw_get_next().
- sctp: check send stream number after wait_for_sndbuf
- qrtr:
- fix a refcount bug in qrtr_recvmsg()
- do not do DEL_SERVER broadcast after DEL_CLIENT
- wifi: brcmfmac: fix SDIO suspend/resume regression
- wifi: mt76: fix use-after-free in fw features query.
- can: fix race between isotp_sendsmg() and isotp_release()
- eth: mtk_eth_soc: fix remaining throughput regression
- eth: ice: reset FDIR counter in FDIR init stage
Previous releases - always broken:
- core: don't let netpoll invoke NAPI if in xmit context
- icmp: guard against too small mtu
- ipv6: fix an uninit variable access bug in __ip6_make_skb()
- wifi: mac80211: fix the size calculation of
ieee80211_ie_len_eht_cap()
- can: fix poll() to not report false EPOLLOUT events
- eth: gve: secure enough bytes in the first TX desc for all TCP
pkts"
* tag 'net-6.3-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
net: stmmac: check fwnode for phy device before scanning for phy
net: stmmac: Add queue reset into stmmac_xdp_open() function
selftests: net: rps_default_mask.sh: delete veth link specifically
net: fec: make use of MDIO C45 quirk
can: isotp: fix race between isotp_sendsmg() and isotp_release()
can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos
can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
gve: Secure enough bytes in the first TX desc for all TCP pkts
netlink: annotate lockless accesses to nlk->max_recvmsg_len
ethtool: reset #lanes when lanes is omitted
ping: Fix potentail NULL deref for /proc/net/icmp.
raw: Fix NULL deref in raw_get_next().
ice: Reset FDIR counter in FDIR init stage
ice: fix wrong fallback logic for FDIR
net: stmmac: fix up RX flow hash indirection table when setting channels
net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe
wifi: mt76: ignore key disable commands
wifi: ath11k: reduce the MHI timeout to 20s
ipv6: Fix an uninit variable access bug in __ip6_make_skb()
...
Pull Kselftest fixes from Shuah Khan:
"One single fix to mount_setattr_test build failure"
* tag 'linux-kselftest-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests mount: Fix mount_setattr_test builds failed
Pull iommufd fixes from Jason Gunthorpe:
- An invalid VA range can be be put in a pages and eventually trigger
WARN_ON, reject it early
- Use of the wrong start index value when doing the complex batch carry
scheme
- Wrong store ordering resulting in corrupting data used in a later
calculation that corrupted the batch structure during carry
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd: Do not corrupt the pfn list when doing batch carry
iommufd: Fix unpinning of pages when an access is present
iommufd: Check for uptr overflow
Pull pwm fixes from Thierry Reding:
"These are some fixes to make sure the PWM state structure is always
initialized to a known state.
Prior to this it could happen in some situations that random data from
the stack would leak into the data structure and cause subtle bugs"
* tag 'pwm/for-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
pwm: Zero-initialize the pwm_state passed to driver's .get_state()
pwm: meson: Explicitly set .polarity in .get_state()
pwm: sprd: Explicitly set .polarity in .get_state()
pwm: iqs620a: Explicitly set .polarity in .get_state()
pwm: cros-ec: Explicitly set .polarity in .get_state()
pwm: hibvt: Explicitly set .polarity in .get_state()
KVM/arm64 fixes for 6.3, part #3
- Ensure the guest PMU context is restored before the first KVM_RUN,
fixing an issue where EL0 event counting is broken after vCPU
save/restore
- Actually initialize ID_AA64PFR0_EL1.{CSV2,CSV3} based on the
sanitized, system-wide values for protected VMs
Pull drm fixes from Daniel Vetter:
"Mostly i915 fixes: dp mst for compression/dsc, perf ioctl uaf, ctx rpm
accounting, gt reset vs huc loading.
And a few individual driver fixes: ivpu dma fence&suspend, panfrost
mmap, nouveau color depth"
* tag 'drm-fixes-2023-04-06' of git://anongit.freedesktop.org/drm/drm:
accel/ivpu: Fix S3 system suspend when not idle
accel/ivpu: Add dma fence to command buffers only
drm/i915: Fix context runtime accounting
drm/i915: fix race condition UAF in i915_perf_add_config_ioctl
drm/i915: Use compressed bpp when calculating m/n value for DP MST DSC
drm/i915/huc: Cancel HuC delayed load timer on reset.
drm/i915/ttm: fix sparse warning
drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path
drm/nouveau/disp: Support more modes by checking with lower bpc
Pull sound fixes from Takashi Iwai:
"The majority of changes here are various fixes for Intel drivers,
and there is a change in ASoC PCM core for the format constraints.
In addition, a workaround for HD-audio HDMI regressions and usual
HD-audio quirks are found"
* tag 'sound-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/hdmi: Preserve the previous PCM device upon re-enablement
ALSA: hda/realtek: Add quirk for Clevo X370SNW
ALSA: hda/realtek: fix mute/micmute LEDs for a HP ProBook
ASoC: SOF: avoid a NULL dereference with unsupported widgets
ASoC: da7213.c: add missing pm_runtime_disable()
ASoC: hdac_hdmi: use set_stream() instead of set_tdm_slots()
ASoC: codecs: lpass: fix the order or clks turn off during suspend
ASoC: Intel: bytcr_rt5640: Add quirk for the Acer Iconia One 7 B1-750
ASoC: SOF: ipc4: Ensure DSP is in D0I0 during sof_ipc4_set_get_data()
ASoC: amd: yc: Add DMI entries to support Victus by HP Laptop 16-e1xxx (8A22)
ASoC: soc-pcm: fix hw->formats cleared by soc_pcm_hw_init() for dpcm
ASoC: Intel: soc-acpi: add table for Intel 'Rooks County' NUC M15
ASOC: Intel: sof_sdw: add quirk for Intel 'Rooks County' NUC M15
Pull x86 platform driver fixes from Hans de Goede:
- more think-lmi fixes
- one DMI quirk addition
* tag 'platform-drivers-x86-v6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: thinkpad_acpi: Add missing T14s Gen1 type to s2idle quirk list
platform/x86: think-lmi: Clean up display of current_value on Thinkstation
platform/x86: think-lmi: Fix memory leaks when parsing ThinkStation WMI strings
platform/x86: think-lmi: Fix memory leak when showing current settings
Memory passed to kvfree_rcu() that is to be freed is tracked by a
per-CPU kfree_rcu_cpu structure, which in turn contains pointers
to kvfree_rcu_bulk_data structures that contain pointers to memory
that has not yet been handed to RCU, along with an kfree_rcu_cpu_work
structure that tracks the memory that has already been handed to RCU.
These structures track three categories of memory: (1) Memory for
kfree(), (2) Memory for kvfree(), and (3) Memory for both that arrived
during an OOM episode. The first two categories are tracked in a
cache-friendly manner involving a dynamically allocated page of pointers
(the aforementioned kvfree_rcu_bulk_data structures), while the third
uses a simple (but decidedly cache-unfriendly) linked list through the
rcu_head structures in each block of memory.
On a given CPU, these three categories are handled as a unit, with that
CPU's kfree_rcu_cpu_work structure having one pointer for each of the
three categories. Clearly, new memory for a given category cannot be
placed in the corresponding kfree_rcu_cpu_work structure until any old
memory has had its grace period elapse and thus has been removed. And
the kfree_rcu_monitor() function does in fact check for this.
Except that the kfree_rcu_monitor() function checks these pointers one
at a time. This means that if the previous kfree_rcu() memory passed
to RCU had only category 1 and the current one has only category 2, the
kfree_rcu_monitor() function will send that current category-2 memory
along immediately. This can result in memory being freed too soon,
that is, out from under unsuspecting RCU readers.
To see this, consider the following sequence of events, in which:
o Task A on CPU 0 calls rcu_read_lock(), then uses "from_cset",
then is preempted.
o CPU 1 calls kfree_rcu(cset, rcu_head) in order to free "from_cset"
after a later grace period. Except that "from_cset" is freed
right after the previous grace period ended, so that "from_cset"
is immediately freed. Task A resumes and references "from_cset"'s
member, after which nothing good happens.
In full detail:
CPU 0 CPU 1
---------------------- ----------------------
count_memcg_event_mm()
|rcu_read_lock() <---
|mem_cgroup_from_task()
|// css_set_ptr is the "from_cset" mentioned on CPU 1
|css_set_ptr = rcu_dereference((task)->cgroups)
|// Hard irq comes, current task is scheduled out.
cgroup_attach_task()
|cgroup_migrate()
|cgroup_migrate_execute()
|css_set_move_task(task, from_cset, to_cset, true)
|cgroup_move_task(task, to_cset)
|rcu_assign_pointer(.., to_cset)
|...
|cgroup_migrate_finish()
|put_css_set_locked(from_cset)
|from_cset->refcount return 0
|kfree_rcu(cset, rcu_head) // free from_cset after new gp
|add_ptr_to_bulk_krc_lock()
|schedule_delayed_work(&krcp->monitor_work, ..)
kfree_rcu_monitor()
|krcp->bulk_head[0]'s work attached to krwp->bulk_head_free[]
|queue_rcu_work(system_wq, &krwp->rcu_work)
|if rwork->rcu.work is not in WORK_STRUCT_PENDING_BIT state,
|call_rcu(&rwork->rcu, rcu_work_rcufn) <--- request new gp
// There is a perious call_rcu(.., rcu_work_rcufn)
// gp end, rcu_work_rcufn() is called.
rcu_work_rcufn()
|__queue_work(.., rwork->wq, &rwork->work);
|kfree_rcu_work()
|krwp->bulk_head_free[0] bulk is freed before new gp end!!!
|The "from_cset" is freed before new gp end.
// the task resumes some time later.
|css_set_ptr->subsys[(subsys_id) <--- Caused kernel crash, because css_set_ptr is freed.
This commit therefore causes kfree_rcu_monitor() to refrain from moving
kfree_rcu() memory to the kfree_rcu_cpu_work structure until the RCU
grace period has completed for all three categories.
v2: Use helper function instead of inserted code block at kfree_rcu_monitor().
Fixes: 34c8817455 ("rcu: Support kfree_bulk() interface in kfree_rcu()")
Fixes: 5f3c8d6204 ("rcu/tree: Maintain separate array for vmalloc ptrs")
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Ziwei Dai <ziwei.dai@unisoc.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Pull asm-generic fixes from Arnd Bergmann:
"These are minor fixes to address false-positive build warnings:
Some of the less common I/O accessors are missing __force casts and
cause sparse warnings for their implied byteswap, and a recent change
to __generic_cmpxchg_local() causes a warning about constant integer
truncation"
* tag 'asm-generic-fixes-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: avoid __generic_cmpxchg_local warnings
asm-generic/io.h: suppress endianness warnings for relaxed accessors
asm-generic/io.h: suppress endianness warnings for readq() and writeq()
The current code path can lead to warnings because of uninitialized device,
which contains, as a consequence, uninitialized kobject. The uninitialized
device is passed to of_platform_populate, which will at some point, while
creating child device, try to get a reference on uninitialized parent,
resulting in the following warning:
kobject: '(null)' ((ptrval)): is not initialized, yet kobject_get() is
being called.
The warning is observed after migrating a kernel 5.10.x to 6.1.x.
Reverting commit 0d70af3c25 ("fpga: bridge: Use standard dev_release for
class driver") seems to remove the warning.
This commit aggregates device_initialize() and device_add() into
device_register() but this new call is done AFTER of_platform_populate
Fixes: 0d70af3c25 ("fpga: bridge: Use standard dev_release for class driver")
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Acked-by: Xu Yilun <yilun.xu@intel.com>
Link: https://lore.kernel.org/r/20230404133102.2837535-2-alexis.lothore@bootlin.com
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Smatch Warns:
sound/firewire/tascam/tascam-stream.c:493 snd_tscm_stream_start_duplex()
warn: missing unwind goto?
The direct return will cause the stream list of "&tscm->domain" unemptied
and the session in "tscm" unfinished if amdtp_domain_start() returns with
an error.
Fix this by changing the direct return to a goto which will empty the
stream list of "&tscm->domain" and finish the session in "tscm".
The snd_tscm_stream_start_duplex() function is called in the prepare
callback of PCM. According to "ALSA Kernel API Documentation", the prepare
callback of PCM will be called many times at each setup. So, if the
"&d->streams" list is not emptied, when the prepare callback is called
next time, snd_tscm_stream_start_duplex() will receive -EBUSY from
amdtp_domain_add_stream() that tries to add an existing stream to the
domain. The error handling code after the "error" label will be executed
in this case, and the "&d->streams" list will be emptied. So not emptying
the "&d->streams" list will not cause an issue. But it is more efficient
and readable to empty it on the first error by changing the direct return
to a goto statement.
The session in "tscm" has been begun before amdtp_domain_start(), so it
needs to be finished when amdtp_domain_start() fails.
Fixes: c281d46a51 ("ALSA: firewire-tascam: support AMDTP domain")
Signed-off-by: Xu Biang <xubiang@hust.edu.cn>
Reviewed-by: Dan Carpenter <error27@gmail.com>
Acked-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230406132801.105108-1-xubiang@hust.edu.cn
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The BTRFS_FS_CSUM_IMPL_FAST flag is currently set whenever a non-generic
crc32c is detected, which is the incorrect check if the file system uses
a different checksumming algorithm. Refactor the code to only check
this if crc32c is actually used. Note that in an ideal world the
information if an algorithm is hardware accelerated or not should be
provided by the crypto API instead, but that's left for another day.
CC: stable@vger.kernel.org # 5.4.x: c8a5f8ca9a: btrfs: print checksum type and implementation at mount time
CC: stable@vger.kernel.org # 5.4.x
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit d7b9416fe5 ("btrfs: remove btrfs_end_io_wq") converted the read
and I/O handling from btrfs_workqueues to Linux workqueues, and as part
of that lost the code to apply the thread_pool= based max_active limit
on remount. Restore it.
Fixes: d7b9416fe5 ("btrfs: remove btrfs_end_io_wq")
CC: stable@vger.kernel.org # 6.0+
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull NVMe fix from Christoph:
"nvme fixes for Linux 6.3
- fix discard support without oncs (Keith Busch)"
* tag 'nvme-6.3-2023-04-06' of git://git.infradead.org/nvme:
nvme: fix discard support without oncs
block size is one very key setting for block layer, and bad block size
could panic kernel easily.
Make sure that block size is set correctly.
Meantime if ublk_validate_params() fails, clear ub->params so that disk
is prevented from being added.
Fixes: 71f28f3136 ("ublk_drv: add io_uring based userspace block driver")
Reported-and-tested-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For memory alloc that store user data from nla[NFTA_OBJ_USERDATA],
use GFP_KERNEL_ACCOUNT is more suitable.
Fixes: 33758c8914 ("memcg: enable accounting for nft objects")
Signed-off-by: Chen Aotian <chenaotian2@163.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Recent attempt to ensure PREROUTING hook is executed again when a
decrypted ipsec packet received on a bridge passes through the network
stack a second time broke the physdev match in INPUT hook.
We can't discard the nf_bridge info strct from sabotage_in hook, as
this is needed by the physdev match.
Keep the struct around and handle this with another conditional instead.
Fixes: 2b272bb558 ("netfilter: br_netfilter: disable sabotage_in hook after first suppression")
Reported-and-tested-by: Farid BENAMROUCHE <fariouche@yahoo.fr>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It could have never worked, as snd_emu10k1_fx8010_playback_prepare() and
snd_emu10k1_fx8010_playback_hw_free() assume the emu10k1 offset for the
ETRAM, and the default DSP code includes no handler for it. It also
wouldn't make a lot of sense to make it work, as Audigy has an own, much
simpler, pass-through mechanism. So just skip creation of the device.
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230405201220.2197938-1-oswald.buddenhagen@gmx.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Due to two copy/pastos, closing the MIC or EFX capture device would
make a running ADC capture hang due to unsetting its interrupt handler.
In principle, this would have also allowed dereferencing dangling
pointers, but we're actually rather thorough at disabling and flushing
the ints.
While it may sound like one, this actually wasn't a hypothetical bug:
PortAudio will open a capture stream at startup (and close it right
away) even if not asked to. If the first device is busy, it will just
proceed with the next one ... thus killing a concurrent capture.
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230405201220.2197923-1-oswald.buddenhagen@gmx.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The BIOS botches this one completely - it says the 2nd S/PDIF output is
used, while in fact it's the 1st one. This is tested on DP45SG, but I'm
assuming it's valid for the other boards in the series as well.
Also add some comments regarding the pins.
FWIW, the codec is apparently still sold by Tempo Semiconductor, Inc.,
where one can download the documentation.
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230405201220.2197826-2-oswald.buddenhagen@gmx.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Since SQE memory is shared with userspace, we should only be reading it
once. We cannot read it multiple times, particularly when it's read once
for validation and then read again for the actual use.
ublk_ch_uring_cmd() is safe when called as a retry operation, as the
memory backing is stable at that point. But for normal issue, we want
to ensure that we only read ublksrv_io_cmd once. Wrap the function in
a helper that reads the value into an on-stack copy of the struct.
Cc: stable@vger.kernel.org # 6.0+
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When deleting the netns and recreating a new one while re-adding the
veth interface, there is a small window of time during which the old
veth interface has not yet been removed. This can cause the new addition
to fail. To resolve this issue, we can either wait for a short while to
ensure that the old veth interface is deleted, or we can specifically
remove the veth interface.
Before this patch:
# ./rps_default_mask.sh
empty rps_default_mask [ ok ]
changing rps_default_mask dont affect existing devices [ ok ]
changing rps_default_mask dont affect existing netns [ ok ]
changing rps_default_mask affect newly created devices [ ok ]
changing rps_default_mask don't affect newly child netns[II][ ok ]
rps_default_mask is 0 by default in child netns [ ok ]
RTNETLINK answers: File exists
changing rps_default_mask in child ns don't affect the main one[ ok ]
cat: /sys/class/net/vethC11an1/queues/rx-0/rps_cpus: No such file or directory
changing rps_default_mask in child ns affects new childns devices./rps_default_mask.sh: line 36: [: -eq: unary operator expected
[fail] expected 1 found
changing rps_default_mask in child ns don't affect existing devices[ ok ]
After this patch:
# ./rps_default_mask.sh
empty rps_default_mask [ ok ]
changing rps_default_mask dont affect existing devices [ ok ]
changing rps_default_mask dont affect existing netns [ ok ]
changing rps_default_mask affect newly created devices [ ok ]
changing rps_default_mask don't affect newly child netns[II][ ok ]
rps_default_mask is 0 by default in child netns [ ok ]
changing rps_default_mask in child ns don't affect the main one[ ok ]
changing rps_default_mask in child ns affects new childns devices[ ok ]
changing rps_default_mask in child ns don't affect existing devices[ ok ]
Fixes: 3a7d84eae0 ("self-tests: more rps self tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20230404072411.879476-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Not all fec MDIO bus drivers support C45 mode transactions. The older fec
hardware block in many ColdFire SoCs does not appear to support them, at
least according to most of the different ColdFire SoC reference manuals.
The bits used to generate C45 access on the iMX parts, in the OP field
of the MMFR register, are documented as generating non-compliant MII
frames (it is not documented as to exactly how they are non-compliant).
Commit 8d03ad1ab0 ("net: fec: Separate C22 and C45 transactions")
means the fec driver will always register c45 MDIO read and write
methods. During probe these will always be accessed now generating
non-compliant MII accesses on ColdFire based devices.
Add a quirk define, FEC_QUIRK_HAS_MDIO_C45, that can be used to
distinguish silicon that supports MDIO C45 framing or not. Add this to
all the existing iMX quirks, so they will be behave as they do now (*).
(*) it seems that some iMX parts may not support C45 transactions either.
The iMX25 and iMX50 Reference Manuals contain similar wording to
the ColdFire Reference Manuals on this.
Fixes: 8d03ad1ab0 ("net: fec: Separate C22 and C45 transactions")
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230404052207.3064861-1-gerg@linux-m68k.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit 75c2f26da0 ("PCI: imx6: Add i.MX PCIe EP mode support")
the i.MX6 PCI driver is no longer selected by default. The
existing PCI_IMX6 was made a hidden option, selected by new options
PCI_IMX6_HOST (for the existing support) and PCI_IMX6_EP (for the
endpoint mode), but there has been no corresponding update to
imx_v6_v7_defconfig so the PCI_IMX6 ends up getting disabled.
Switch imx_v6_v7_defconfig to PCI_IMX6_HOST to preserve the existing
functionality.
This is based on the same fix done in commit 0cd5780eb6 ("arm64:
defconfig: Fix unintentional disablement of PCI on i.MX").
Fixes: 75c2f26da0 ("PCI: imx6: Add i.MX PCIe EP mode support")
Reported-by: Mattias Barthel <mattiasbarthel@gmail.com>
Signed-off-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
There is a concurrency bug that may cause the wrong value to be loaded
when a CPU is modifying the maple tree.
CPU1:
mtree_insert_range()
mas_insert()
mas_store_root()
...
mas_root_expand()
...
rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
ma_set_meta(node, maple_leaf_64, 0, slot); <---IP
CPU2:
mtree_load()
mtree_lookup_walk()
ma_data_end();
When CPU1 is about to execute the instruction pointed to by IP, the
ma_data_end() executed by CPU2 may return the wrong end position, which
will cause the value loaded by mtree_load() to be wrong.
An example of triggering the bug:
Add mdelay(100) between rcu_assign_pointer() and ma_set_meta() in
mas_root_expand().
static DEFINE_MTREE(tree);
int work(void *p) {
unsigned long val;
for (int i = 0 ; i< 30; ++i) {
val = (unsigned long)mtree_load(&tree, 8);
mdelay(5);
pr_info("%lu",val);
}
return 0;
}
mt_init_flags(&tree, MT_FLAGS_USE_RCU);
mtree_insert(&tree, 0, (void*)12345, GFP_KERNEL);
run_thread(work)
mtree_insert(&tree, 1, (void*)56789, GFP_KERNEL);
In RCU mode, mtree_load() should always return the value before or after
the data structure is modified, and in this example mtree_load(&tree, 8)
may return 56789 which is not expected, it should always return NULL. Fix
it by put ma_set_meta() before rcu_assign_pointer().
Link: https://lkml.kernel.org/r/20230314124203.91572-4-zhangpeng.00@bytedance.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The si->lock must be held when deleting the si from the available list.
Otherwise, another thread can re-add the si to the available list, which
can lead to memory corruption. The only place we have found where this
happens is in the swapoff path. This case can be described as below:
core 0 core 1
swapoff
del_from_avail_list(si) waiting
try lock si->lock acquire swap_avail_lock
and re-add si into
swap_avail_head
acquire si->lock but missing si already being added again, and continuing
to clear SWP_WRITEOK, etc.
It can be easily found that a massive warning messages can be triggered
inside get_swap_pages() by some special cases, for example, we call
madvise(MADV_PAGEOUT) on blocks of touched memory concurrently, meanwhile,
run much swapon-swapoff operations (e.g. stress-ng-swap).
However, in the worst case, panic can be caused by the above scene. In
swapoff(), the memory used by si could be kept in swap_info[] after
turning off a swap. This means memory corruption will not be caused
immediately until allocated and reset for a new swap in the swapon path.
A panic message caused: (with CONFIG_PLIST_DEBUG enabled)
------------[ cut here ]------------
top: 00000000e58a3003, n: 0000000013e75cda, p: 000000008cd4451a
prev: 0000000035b1e58a, n: 000000008cd4451a, p: 000000002150ee8d
next: 000000008cd4451a, n: 000000008cd4451a, p: 000000008cd4451a
WARNING: CPU: 21 PID: 1843 at lib/plist.c:60 plist_check_prev_next_node+0x50/0x70
Modules linked in: rfkill(E) crct10dif_ce(E)...
CPU: 21 PID: 1843 Comm: stress-ng Kdump: ... 5.10.134+
Hardware name: Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
pc : plist_check_prev_next_node+0x50/0x70
lr : plist_check_prev_next_node+0x50/0x70
sp : ffff0018009d3c30
x29: ffff0018009d3c40 x28: ffff800011b32a98
x27: 0000000000000000 x26: ffff001803908000
x25: ffff8000128ea088 x24: ffff800011b32a48
x23: 0000000000000028 x22: ffff001800875c00
x21: ffff800010f9e520 x20: ffff001800875c00
x19: ffff001800fdc6e0 x18: 0000000000000030
x17: 0000000000000000 x16: 0000000000000000
x15: 0736076307640766 x14: 0730073007380731
x13: 0736076307640766 x12: 0730073007380731
x11: 000000000004058d x10: 0000000085a85b76
x9 : ffff8000101436e4 x8 : ffff800011c8ce08
x7 : 0000000000000000 x6 : 0000000000000001
x5 : ffff0017df9ed338 x4 : 0000000000000001
x3 : ffff8017ce62a000 x2 : ffff0017df9ed340
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
plist_check_prev_next_node+0x50/0x70
plist_check_head+0x80/0xf0
plist_add+0x28/0x140
add_to_avail_list+0x9c/0xf0
_enable_swap_info+0x78/0xb4
__do_sys_swapon+0x918/0xa10
__arm64_sys_swapon+0x20/0x30
el0_svc_common+0x8c/0x220
do_el0_svc+0x2c/0x90
el0_svc+0x1c/0x30
el0_sync_handler+0xa8/0xb0
el0_sync+0x148/0x180
irq event stamp: 2082270
Now, si->lock locked before calling 'del_from_avail_list()' to make sure
other thread see the si had been deleted and SWP_WRITEOK cleared together,
will not reinsert again.
This problem exists in versions after stable 5.10.y.
Link: https://lkml.kernel.org/r/20230404154716.23058-1-rongwei.wang@linux.alibaba.com
Fixes: a2468cc9bf ("swap: choose swap device according to numa node")
Tested-by: Yongchen Yin <wb-yyc939293@alibaba-inc.com>
Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The current nilfs2 sysfs support has issues with the timing of creation
and deletion of sysfs entries, potentially leading to null pointer
dereferences, use-after-free, and lockdep warnings.
Some of the sysfs attributes for nilfs2 per-filesystem instance refer to
metadata file "cpfile", "sufile", or "dat", but
nilfs_sysfs_create_device_group that creates those attributes is executed
before the inodes for these metadata files are loaded, and
nilfs_sysfs_delete_device_group which deletes these sysfs entries is
called after releasing their metadata file inodes.
Therefore, access to some of these sysfs attributes may occur outside of
the lifetime of these metadata files, resulting in inode NULL pointer
dereferences or use-after-free.
In addition, the call to nilfs_sysfs_create_device_group() is made during
the locking period of the semaphore "ns_sem" of nilfs object, so the
shrinker call caused by the memory allocation for the sysfs entries, may
derive lock dependencies "ns_sem" -> (shrinker) -> "locks acquired in
nilfs_evict_inode()".
Since nilfs2 may acquire "ns_sem" deep in the call stack holding other
locks via its error handler __nilfs_error(), this causes lockdep to report
circular locking. This is a false positive and no circular locking
actually occurs as no inodes exist yet when
nilfs_sysfs_create_device_group() is called. Fortunately, the lockdep
warnings can be resolved by simply moving the call to
nilfs_sysfs_create_device_group() out of "ns_sem".
This fixes these sysfs issues by revising where the device's sysfs
interface is created/deleted and keeping its lifetime within the lifetime
of the metadata files above.
Link: https://lkml.kernel.org/r/20230330205515.6167-1-konishi.ryusuke@gmail.com
Fixes: dd70edbde2 ("nilfs2: integrate sysfs support into driver")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+979fa7f9c0d086fdc282@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/0000000000003414b505f7885f7e@google.com
Reported-by: syzbot+5b7d542076d9bddc3c6a@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/0000000000006ac86605f5f44eb9@google.com
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Device exclusive page table entries are used to prevent CPU access to a
page whilst it is being accessed from a device. Typically this is used to
implement atomic operations when the underlying bus does not support
atomic access. When a CPU thread encounters a device exclusive entry it
locks the page and restores the original entry after calling mmu notifiers
to signal drivers that exclusive access is no longer available.
The device exclusive entry holds a reference to the page making it safe to
access the struct page whilst the entry is present. However the fault
handling code does not hold the PTL when taking the page lock. This means
if there are multiple threads faulting concurrently on the device
exclusive entry one will remove the entry whilst others will wait on the
page lock without holding a reference.
This can lead to threads locking or waiting on a folio with a zero
refcount. Whilst mmap_lock prevents the pages getting freed via munmap()
they may still be freed by a migration. This leads to warnings such as
PAGE_FLAGS_CHECK_AT_FREE due to the page being locked when the refcount
drops to zero.
Fix this by trying to take a reference on the folio before locking it.
The code already checks the PTE under the PTL and aborts if the entry is
no longer there. It is also possible the folio has been unmapped, freed
and re-allocated allowing a reference to be taken on an unrelated folio.
This case is also detected by the PTE check and the folio is unlocked
without further changes.
Link: https://lkml.kernel.org/r/20230330012519.804116-1-apopple@nvidia.com
Fixes: b756a3b5e7 ("mm: device exclusive memory access")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The finalization of nilfs_segctor_thread() can race with
nilfs_segctor_kill_thread() which terminates that thread, potentially
causing a use-after-free BUG as KASAN detected.
At the end of nilfs_segctor_thread(), it assigns NULL to "sc_task" member
of "struct nilfs_sc_info" to indicate the thread has finished, and then
notifies nilfs_segctor_kill_thread() of this using waitqueue
"sc_wait_task" on the struct nilfs_sc_info.
However, here, immediately after the NULL assignment to "sc_task", it is
possible that nilfs_segctor_kill_thread() will detect it and return to
continue the deallocation, freeing the nilfs_sc_info structure before the
thread does the notification.
This fixes the issue by protecting the NULL assignment to "sc_task" and
its notification, with spinlock "sc_state_lock" of the struct
nilfs_sc_info. Since nilfs_segctor_kill_thread() does a final check to
see if "sc_task" is NULL with "sc_state_lock" locked, this can eliminate
the race.
Link: https://lkml.kernel.org/r/20230327175318.8060-1-konishi.ryusuke@gmail.com
Reported-by: syzbot+b08ebcc22f8f3e6be43a@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/00000000000000660d05f7dfa877@google.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be
writable even with uffd-wp bit set. It only happens with hugetlb private
mappings, when someone firstly wr-protects a missing pte (which will
install a pte marker), then a write to the same page without any prior
access to the page.
Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before
reaching hugetlb_wp() to avoid taking more locks that userfault won't
need. However there's one CoW optimization path that can trigger
hugetlb_wp() inside hugetlb_no_page(), which will bypass the trap.
This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit
is detected. The new path will only trigger in the CoW optimization path
because generic hugetlb_fault() (e.g. when a present pte was
wr-protected) will resolve the uffd-wp bit already. Also make sure
anonymous UNSHARE won't be affected and can still be resolved, IOW only
skip CoW not CoR.
This patch will be needed for v5.19+ hence copy stable.
[peterx@redhat.com: v2]
Link: https://lkml.kernel.org/r/ZBzOqwF2wrHgBVZb@x1n
[peterx@redhat.com: v3]
Link: https://lkml.kernel.org/r/20230324142620.2344140-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20230321191840.1897940-1-peterx@redhat.com
Fixes: 166f3ecc0d ("mm/hugetlb: hook page faults for uffd write protection")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Use the maple tree in RCU mode for VMA tracking.
The maple tree tracks the stack and is able to update the pivot
(lower/upper boundary) in-place to allow the page fault handler to write
to the tree while holding just the mmap read lock. This is safe as the
writes to the stack have a guard VMA which ensures there will always be a
NULL in the direction of the growth and thus will only update a pivot.
It is possible, but not recommended, to have VMAs that grow up/down
without guard VMAs. syzbot has constructed a testcase which sets up a VMA
to grow and consume the empty space. Overwriting the entire NULL entry
causes the tree to be altered in a way that is not safe for concurrent
readers; the readers may see a node being rewritten or one that does not
match the maple state they are using.
Enabling RCU mode allows the concurrent readers to see a stable node and
will return the expected result.
[Liam.Howlett@Oracle.com: we don't need to free the nodes with RCU[
Link: https://lore.kernel.org/linux-mm/000000000000b0a65805f663ace6@google.com/
Link: https://lkml.kernel.org/r/20230227173632.3292573-9-surenb@google.com
Fixes: d4af56c5c7 ("mm: start tracking VMAs with maple tree")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: syzbot+8d95422d3537159ca390@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dereferencing RCU objects within the RCU callback without the RCU check
has caused lockdep to complain. Fix the RCU dereferencing by using the
RCU callback lock to ensure the operation is safe.
Also stop creating a new lock to use for dereferencing during destruction
of the tree or subtree. Instead, pass through a pointer to the tree that
has the lock that is held for RCU dereferencing checking. It also does
not make sense to use the maple state in the freeing scenario as the tree
walk is a special case where the tree no longer has the normal encodings
and parent pointers.
Link: https://lkml.kernel.org/r/20230227173632.3292573-8-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
During the development of the maple tree, the strategy of freeing multiple
nodes changed and, in the process, the pivots were reused to store
pointers to dead nodes. To ensure the readers see accurate pivots, the
writers need to mark the nodes as dead and call smp_wmb() to ensure any
readers can identify the node as dead before using the pivot values.
There were two places where the old method of marking the node as dead
without smp_wmb() were being used, which resulted in RCU readers seeing
the wrong pivot value before seeing the node was dead. Fix this race
condition by using mte_set_node_dead() which has the smp_wmb() call to
ensure the race is closed.
Add a WARN_ON() to the ma_free_rcu() call to ensure all nodes being freed
are marked as dead to ensure there are no other call paths besides the two
updated paths.
This is necessary for the RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-6-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kalle Valo says:
====================
wireless fixes for v6.3
mt76 has a fix for leaking cleartext frames on a certain scenario and
two firmware file handling related fixes. For brcmfmac we have a fix
for an older SDIO suspend regression and for ath11k avoiding a kernel
crash during hibernation with SUSE kernels.
* tag 'wireless-2023-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mt76: ignore key disable commands
wifi: ath11k: reduce the MHI timeout to 20s
wifi: mt76: mt7921: fix fw used for offload check for mt7922
wifi: mt76: mt7921: Fix use-after-free in fw features query.
wifi: brcmfmac: Fix SDIO suspend/resume regression
====================
Link: https://lore.kernel.org/r/20230405105536.4E946C433D2@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marc Kleine-Budde says:
====================
pull-request: can 2023-04-05
The first patch is by Oleksij Rempel and fixes a out-of-bounds memory
access in the j1939 protocol.
The remaining 3 patches target the ISOTP protocol. Oliver Hartkopp
fixes the ISOTP protocol to pass information about dropped PDUs to the
user space via control messages. Michal Sojka's patch fixes poll() to
not forward false EPOLLOUT events. And Oliver Hartkopp fixes a race
condition between isotp_sendsmg() and isotp_release().
* tag 'linux-can-fixes-for-6.3-20230405' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: isotp: fix race between isotp_sendsmg() and isotp_release()
can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos
can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
====================
Link: https://lore.kernel.org/r/20230405092444.1802340-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-04-04 (ice)
This series contains updates to ice driver only.
Simei adjusts error path on adding VF Flow Director filters that were
not releasing all resources.
Lingyu adds setting/resetting of VF Flow Director filters counters
during initialization.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Reset FDIR counter in FDIR init stage
ice: fix wrong fallback logic for FDIR
====================
Link: https://lore.kernel.org/r/20230404172306.450880-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Lenovo ThinkPad W530 uses a nvidia k1000m GPU. When this gets used
together with one of the older nvidia binary driver series (the latest
series does not support it), then backlight control does not work.
This is caused by commit 3dbc80a3e4 ("ACPI: video: Make backlight
class device registration a separate step (v2)") combined with
commit 5aa9d943e9 ("ACPI: video: Don't enable fallback path for
creating ACPI backlight by default").
After these changes the acpi_video# backlight device is only registered
when requested by a GPU driver calling acpi_video_register_backlight()
which the nvidia binary driver does not do.
I realize that using the nvidia binary driver is not a supported use-case
and users can workaround this by adding acpi_backlight=video on the kernel
commandline, but the ThinkPad W530 is a popular model under Linux users,
so it seems worthwhile to add a quirk for this.
I will also email Nvidia asking them to make the driver call
acpi_video_register_backlight() when an internal LCD panel is detected.
So maybe the next maintenance release of the drivers will fix this...
Fixes: 5aa9d943e9 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: All applicable <stable@vger.kernel.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
On the Apple iMac14,1 and iMac14,2 all-in-ones (monitors with builtin "PC")
the connection between the GPU and the panel is seen by the GPU driver as
regular DP instead of eDP, causing the GPU driver to never call
acpi_video_register_backlight().
(GPU drivers only call acpi_video_register_backlight() when an internal
panel is detected, to avoid non working acpi_video# devices getting
registered on desktops which unfortunately is a real issue.)
Fix the missing acpi_video# backlight device on these all-in-ones by
adding a acpi_backlight=video DMI quirk, so that video.ko will
immediately register the backlight device instead of waiting for
an acpi_video_register_backlight() call.
Fixes: 5aa9d943e9 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: All applicable <stable@vger.kernel.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 3dbc80a3e4 ("ACPI: video: Make backlight class device
registration a separate step (v2)") combined with
commit 5aa9d943e9 ("ACPI: video: Don't enable fallback path for
creating ACPI backlight by default")
Means that the video.ko code now fully depends on the GPU driver calling
acpi_video_register_backlight() for the acpi_video# backlight class
devices to get registered.
This means that if the GPU driver does not do this, acpi_backlight=video
on the cmdline, or DMI quirks for selecting acpi_video# will not work.
This is a problem on for example Apple iMac14,1 all-in-ones where
the monitor's LCD panel shows up as a regular DP connection instead of
eDP so the GPU driver will not call acpi_video_register_backlight() [1].
Fix this by making video.ko directly register the acpi_video# devices
when these have been explicitly requested either on the cmdline or
through DMI quirks (rather then auto-detection being used).
[1] GPU drivers only call acpi_video_register_backlight() when an internal
panel is detected, to avoid non working acpi_video# devices getting
registered on desktops which unfortunately is a real issue.
Fixes: 5aa9d943e9 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: All applicable <stable@vger.kernel.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Allow callers of __acpi_video_get_backlight_type() to pass a pointer
to a bool which will get set to false if the backlight-type comes from
the cmdline or a DMI quirk and set to true if auto-detection was used.
And make __acpi_video_get_backlight_type() non static so that it can
be called directly outside of video_detect.c .
While at it turn the acpi_video_get_backlight_type() and
acpi_video_backlight_use_native() wrappers into static inline functions
in include/acpi/video.h, so that we need to export one less symbol.
Fixes: 5aa9d943e9 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: All applicable <stable@vger.kernel.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The Anbernic and Odroid Go have different panels and take differently
named supplies, so move all the supplies to DTS defining actual panel to
fix warnings like:
rk3326-odroid-go3.dtb: panel@0: 'IOVCC-supply' is a required property
rk3326-odroid-go3.dtb: panel@0: 'iovcc-supply', 'vdd-supply' do not match any of the regexes: 'pinctrl-[0-9]+'
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230326204520.80859-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
After a server reboot, clients are failing to move files with ENOENT.
This is caused by DFS referrals containing multiple separators, which
the server move call doesn't recognize.
v1: Initial patch.
v2: Move prototype to header.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2182472
Fixes: a31080899d ("cifs: sanitize multiple delimiters in prepath")
Actually-Fixes: 24e0a1eff9 ("cifs: switch to new mount api")
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Thiago Rafael Becker <tbecker@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
iov_iter for ep_read_iter can be ITER_UBUF with io_uring.
In that case dup_iter() does not have to allocate iov and it can
return NULL. Fix the assumption by checking for iter_is_ubuf()
other wise ep_read_iter can treat this as failure and return -ENOMEM.
Fixes: 1e23db450c ("io_uring: use iter_ubuf for single range imports")
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20230401060509.3608259-3-dhavale@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
iov_iter for ffs_epfile_read_iter can be ITER_UBUF with io_uring.
In that case dup_iter() does not have to allocate anything and it
can return NULL. ffs_epfile_read_iter treats this as a failure and
returns -ENOMEM. Fix it by checking if iter_is_ubuf().
Fixes: 1e23db450c ("io_uring: use iter_ubuf for single range imports")
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20230401060509.3608259-2-dhavale@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While determining the initial pin assignment to be sent in the configure
message, using the DP_PIN_ASSIGN_DP_ONLY_MASK mask causes the DFP_U to
send both Pin Assignment C and E when both are supported by the DFP_U and
UFP_U. The spec (Table 5-7 DFP_U Pin Assignment Selection Mandates,
VESA DisplayPort Alt Mode Standard v2.0) indicates that the DFP_U never
selects Pin Assignment E when Pin Assignment C is offered.
Update the DP_PIN_ASSIGN_DP_ONLY_MASK conditional to intially select only
Pin Assignment C if it is available.
Fixes: 0e3bb7d689 ("usb: typec: Add driver for DisplayPort alternate mode")
Cc: stable@vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20230329215159.2046932-1-rdbabiera@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan reports that smatch complains about a potential uninitialized
variable being used in the compat alignment fixup code.
The logic is not wrong per se, but we do end up using an uninitialized
variable if reading the instruction that triggered the alignment fault
from user space faults, even if the fault ensures that the uninitialized
value doesn't propagate any further.
Given that we just give up and return 1 if any fault occurs when reading
the instruction, let's get rid of the 'success handling' pattern that
captures the fault in a variable and aborts later, and instead, just
return 1 immediately if any of the get_user() calls result in an
exception.
Fixes: 3fc24ef32d ("arm64: compat: Implement misalignment fixups for multiword loads")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/202304021214.gekJ8yRc-lkp@intel.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230404103625.2386382-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull tracing fixes from Steven Rostedt:
- Fix timerlat notification, as it was not triggering the notify to
users when a new max latency was hit.
- Do not trigger max latency if the tracing is off.
When tracing is off, the ring buffer is not updated, it does not make
sense to notify when there's a new max latency detected by the
tracer, as why that latency happened is not available. The tracing
logic still runs when the ring buffer is disabled, but it should not
be triggering notifications.
- Fix race on freeing the synthetic event "last_cmd" variable by adding
a mutex around it.
- Fix race between reader and writer of the ring buffer by adding
memory barriers. When the writer is still on the reader page it must
have its content visible on the buffer before it moves the commit
index that the reader uses to know how much content is on the page.
- Make get_lock_parent_ip() always inlined, as it uses _THIS_IP_ and
_RET_IP_, which gets broken if it is not inlined.
- Make __field(int, arr[5]) in a TRACE_EVENT() macro fail to build.
The field formats of trace events are calculated by using
sizeof(type) and other means by what is passed into the structure
macros like __field(). The __field() macro is only meant for atom
types like int, long, short, pointer, etc. It is not meant for
arrays.
The code will currently compile with arrays, but then the format
produced will be inaccurate, and user space parsing tools will break.
Two bugs have already been fixed, now add code that will make the
kernel fail to build if another trace event includes this buggy field
format.
- Fix boot up snapshot code:
Boot snapshots were triggering when not even asked for on the kernel
command line. This was caused by two bugs:
1) It would trigger a snapshot on any instance if one was created
from the kernel command line.
2) The error handling would only affect the top level instance.
So the fact that a snapshot was done on a instance that didn't
allocate a buffer triggered a warning written into the top level
buffer, and worse yet, disabled the top level buffer.
- Fix memory leak that was caused when an error was logged in a trace
buffer instance, and then the buffer instance was removed.
The allocated error log messages still needed to be freed.
* tag 'trace-v6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Free error logs of tracing instances
tracing: Fix ftrace_boot_snapshot command line logic
tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance
tracing: Error if a trace event has an array for a __field()
tracing/osnoise: Fix notify new tracing_max_latency
tracing/timerlat: Notify new max thread latency
ftrace: Mark get_lock_parent_ip() __always_inline
ring-buffer: Fix race while reader and writer are on the same page
tracing/synthetic: Fix races on freeing last_cmd
The device can report discard support without setting the ONCS DSM bit.
When not set, the driver clears max_discard_size expecting it to be set
later. We don't know the size until we have the namespace format,
though, so setting it is deferred until configuring one, but the driver
was abandoning the discard settings due to that initial clearing.
Move the max_discard_size calculation above the check for a '0' discard
size.
Fixes: 1a86924e4f ("nvme: fix interpretation of DMRSL")
Reported-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Building with W=1 leads to the following dtc warning:
arch/arm/boot/dts/imx6ull-colibri.dtsi:36.9-46.5: Warning (graph_child_address): /connector/ports: graph node has single child node 'port@0', #address-cells/#size-cells are not necessary
Since a single port is used, 'ports' can be removed, as well as the
unnecessary #address-cells/#size-cells.
Fixes: bd5880e109 ("ARM: dts: colibri-imx6ull: Enable dual-role switching")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Building with W=1 leads to the following dtc warning:
arch/arm/boot/dts/imx7d-remarkable2.dts:319.19-335.4: Warning (avoid_unnecessary_addr_size): /soc/bus@30800000/i2c@30a50000/pmic@62: unnecessary #address-cells/#size-cells without "ranges" or child "reg" property
Remove unnecessary #address-cells/#size-cells to fix it.
Fixes: 9076cbaa77 ("ARM: dts: imx7d-remarkable2: Enable silergy,sy7636a")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Reviewed-by: Alistair Francis <alistair@alistair23.me>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The property should be off-on-delay-us, not off-on-delay
Fixes: a39ed23bdf ("arm64: dts: freescale: add initial support for verdin imx8m plus")
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The property should be off-on-delay-us, not off-on-delay
Fixes: 6a57f224f7 ("arm64: dts: freescale: add initial support for verdin imx8m mini")
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The osc_32k supports #clock-cells as 0, using an id is wrong, drop it.
Fixes: a6a355ede5 ("arm64: dts: imx8mm-evk: Add 32.768 kHz clock to PMIC")
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Marco Felsch <m.felsch@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
With PCI if the device was suspended it is brought back to full
power and then suspended again.
This doesn't happen when device is described via DT.
We need to make sure that we tear down pipelines only if the device
was previously active (thus the pipelines were setup).
Otherwise, we can break the use_count:
[ 219.009743] sof-audio-of-imx8m 3b6e8000.dsp:
sof_ipc3_tear_down_all_pipelines: widget PIPELINE.2.SAI3.IN is still in use: count -1
and after this everything stops working.
Fixes: d185e0689a ("ASoC: SOF: pm: Always tear down pipelines before DSP suspend")
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com>
Link: https://lore.kernel.org/r/20230405092655.19587-1-daniel.baluta@oss.nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
In the j1939_tp_tx_dat_new() function, an out-of-bounds memory access
could occur during the memcpy() operation if the size of skb->cb is
larger than the size of struct j1939_sk_buff_cb. This is because the
memcpy() operation uses the size of skb->cb, leading to a read beyond
the struct j1939_sk_buff_cb.
Updated the memcpy() operation to use the size of struct
j1939_sk_buff_cb instead of the size of skb->cb. This ensures that the
memcpy() operation only reads the memory within the bounds of struct
j1939_sk_buff_cb, preventing out-of-bounds memory access.
Additionally, add a BUILD_BUG_ON() to check that the size of skb->cb
is greater than or equal to the size of struct j1939_sk_buff_cb. This
ensures that the skb->cb buffer is large enough to hold the
j1939_sk_buff_cb structure.
Fixes: 9d71dd0c70 ("can: add support of SAE J1939 protocol")
Reported-by: Shuangpeng Bai <sjb7183@psu.edu>
Tested-by: Shuangpeng Bai <sjb7183@psu.edu>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://groups.google.com/g/syzkaller/c/G_LL-C3plRs/m/-8xCi6dCAgAJ
Link: https://lore.kernel.org/all/20230404073128.3173900-1-o.rempel@pengutronix.de
Cc: stable@vger.kernel.org
[mkl: rephrase commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The same task check in perf_event_set_output has some potential issues
for some usages.
For the current perf code, there is a problem if using of
perf_event_open() to have multiple samples getting into the same mmap’d
memory when they are both attached to the same process.
https://lore.kernel.org/all/92645262-D319-4068-9C44-2409EF44888E@gmail.com/
Because the event->ctx is not ready when the perf_event_set_output() is
invoked in the perf_event_open().
Besides the above issue, before the commit bd27568117 ("perf: Rewrite
core context handling"), perf record can errors out when sampling with
a hardware event and a software event as below.
$ perf record -e cycles,dummy --per-thread ls
failed to mmap with 22 (Invalid argument)
That's because that prior to the commit a hardware event and a software
event are from different task context.
The problem should be a long time issue since commit c3f00c7027
("perk: Separate find_get_context() from event initialization").
The task struct is stored in the event->hw.target for each per-thread
event. It is a more reliable way to determine whether two events are
attached to the same task.
The event->hw.target was also introduced several years ago by the
commit 50f16a8bf9 ("perf: Remove type specific target pointers"). It
can not only be used to fix the issue with the current code, but also
back port to fix the issues with an older kernel.
Note: The event->hw.target was introduced later than commit
c3f00c7027. The patch may cannot be applied between the commit
c3f00c7027 and commit 50f16a8bf9. Anybody that wants to back-port
this at that period may have to find other solutions.
Fixes: c3f00c7027 ("perf: Separate find_get_context() from event initialization")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lkml.kernel.org/r/20230322202449.512091-1-kan.liang@linux.intel.com
Currently job->done_fence is added to every BO handle within a job. If job
handle (command buffer) is shared between multiple submits, KMD will add
the fence in each of them. Then bo_wait_ioctl() executed on command buffer
will exit only when all jobs containing that handle are done.
This creates deadlock scenario for user mode driver in case when job handle
is added as dependency of another job, because bo_wait_ioctl() of first job
will wait until second job finishes, and second job can not finish before
first one.
Having fences added only to job buffer handle allows user space to execute
bo_wait_ioctl() on the job even if it's handle is submitted with other job.
Fixes: cd7272215c ("accel/ivpu: Add command buffer submission logic")
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230331113603.2802515-2-stanislaw.gruszka@linux.intel.com
The pmk8280 PMIC PON peripheral is gen3 and uses two sets of registers;
hlos and pbs.
This specifically fixes the following error message during boot when the
pbs registers are not defined:
PON_PBS address missing, can't read HW debounce time
Note that this also enables the spurious interrupt workaround introduced
by commit 0b65118e6b ("Input: pm8941-pwrkey - add software key press
debouncing support") (which may or may not be needed).
Fixes: ccd3517faf ("arm64: dts: qcom: sc8280xp: Add reference device")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Tested-by: Steev Klimaszewski <steev@kali.org> #Thinkpad X13s
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230327122948.4323-1-johan+linaro@kernel.org
The kernel command line ftrace_boot_snapshot by itself is supposed to
trigger a snapshot at the end of boot up of the main top level trace
buffer. A ftrace_boot_snapshot=foo will do the same for an instance called
foo that was created by trace_instance=foo,...
The logic was broken where if ftrace_boot_snapshot was by itself, it would
trigger a snapshot for all instances that had tracing enabled, regardless
if it asked for a snapshot or not.
When a snapshot is requested for a buffer, the buffer's
tr->allocated_snapshot is set to true. Use that to know if a trace buffer
wants a snapshot at boot up or not.
Since the top level buffer is part of the ftrace_trace_arrays list,
there's no reason to treat it differently than the other buffers. Just
iterate the list if ftrace_boot_snapshot was specified.
Link: https://lkml.kernel.org/r/20230405022341.895334039@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <zwisler@google.com>
Fixes: 9c1c251d67 ("tracing: Allow boot instances to have snapshot buffers")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Non-GSO TCP packets whose SKBs' linear portion did not include the
entire TCP header were not populating the first Tx descriptor with
as many bytes as the vNIC expected. This change ensures that all
TCP packets populate the first descriptor with the correct number of
bytes.
Fixes: 893ce44df5 ("gve: Add basic driver framework for Compute Engine Virtual NIC")
Signed-off-by: Shailend Chand <shailend@google.com>
Link: https://lore.kernel.org/r/20230403172809.2939306-1-shailend@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the number of lanes was forced and then subsequently the user
omits this parameter, the ksettings->lanes is reset. The driver
should then reset the number of lanes to the device's default
for the specified speed.
However, although the ksettings->lanes is set to 0, the mod variable
is not set to true to indicate the driver and userspace should be
notified of the changes.
The consequence is that the same ethtool operation will produce
different results based on the initial state.
If the initial state is:
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 2
Duplex: Full
Auto-negotiation: on
then executing 'ethtool -s swp1 speed 50000 autoneg off' will yield:
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 2
Duplex: Full
Auto-negotiation: off
While if the initial state is:
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 1
Duplex: Full
Auto-negotiation: off
executing the same 'ethtool -s swp1 speed 50000 autoneg off' results in:
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 1
Duplex: Full
Auto-negotiation: off
This patch fixes this behavior. Omitting lanes will always results in
the driver choosing the default lane width for the chosen speed. In this
scenario, regardless of the initial state, the end state will be, e.g.,
$ ethtool swp1 | grep -A 3 'Speed: '
Speed: 500000Mb/s
Lanes: 2
Duplex: Full
Auto-negotiation: off
Fixes: 012ce4dd31 ("ethtool: Extend link modes settings uAPI with lanes")
Signed-off-by: Andy Roulin <aroulin@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/ac238d6b-8726-8156-3810-6471291dbc7f@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima says:
====================
raw/ping: Fix locking in /proc/net/{raw,icmp}.
The first patch fixes a NULL deref for /proc/net/raw and second one fixes
the same issue for ping sockets.
The first patch also converts hlist_nulls to hlist, but this is because
the current code uses sk_nulls_for_each() for lockless readers, instead
of sk_nulls_for_each_rcu() which adds memory barrier, but raw sockets
does not use the nulls marker nor SLAB_TYPESAFE_BY_RCU in the first place.
OTOH, the ping sockets already uses sk_nulls_for_each_rcu(), and such
conversion can be posted later for net-next.
====================
Link: https://lore.kernel.org/r/20230403194959.48928-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After commit dbca1596bb ("ping: convert to RCU lookups, get rid
of rwlock"), we use RCU for ping sockets, but we should use spinlock
for /proc/net/icmp to avoid a potential NULL deref mentioned in
the previous patch.
Let's go back to using spinlock there.
Note we can convert ping sockets to use hlist instead of hlist_nulls
because we do not use SLAB_TYPESAFE_BY_RCU for ping sockets.
Fixes: dbca1596bb ("ping: convert to RCU lookups, get rid of rwlock")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
One motivation for mapping range registers to decoder objects is
to use those settings for region autodiscovery.
The need to map a region for devices programmed to use range registers
is especially urgent now that the kernel no longer routes "Soft
Reserved" ranges in the memory map to device-dax by default. The CXL
memory range loses all access mechanisms.
Complete the implementation by marking the DPA reservation and setting
the endpoint-decoder state to signal autodiscovery. Note that the
default settings of ways=1 and granularity=4096 set in cxl_decode_init()
do not need to be updated.
Fixes: 09d09e04d2 ("cxl/dax: Create dax devices for CXL RAM regions")
Tested-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Gregory Price <gregory.price@memverge.com>
Link: https://lore.kernel.org/r/168012575521.221280.14177293493678527326.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Recall that range register emulation seeks to treat the 2 potential
range registers as Linux CXL "decoder" objects. The number of range
registers can be 1 or 2, while HDM decoder ranges can include more than
2.
Be careful not to confuse DVSEC range count with HDM capability decoder
count. Commit to range register earlier in devm_cxl_setup_hdm().
Otherwise, a device with more HDM decoders than range registers can set
@cxlhdm->decoder_count to an invalid value.
Avoid introducing a forward declaration by just moving the definition of
should_emulate_decoders() earlier in the file. should_emulate_decoders()
is unchanged.
Tested-by: Dave Jiang <dave.jiang@intel.com>
Fixes: d7a2153762 ("cxl/hdm: Add emulation when HDM decoders are not committed")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168012574932.221280.15944705098679646436.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Each time the contents of a given HPA are potentially changed in a cache
incoherent manner the CXL core sets CXL_REGION_F_INCOHERENT to
invalidate CPU caches before the region is used.
Successful invocation of attach_target() indicates that DPA has been
newly assigned to a given HPA in the dynamic region creation flow.
However, attach_target() is also reused in the autodiscovery flow where
the region was activated by platform firmware. In that case there is no
need to invalidate caches because that region is already in active use
and nothing about the autodiscovery flow modifies the HPA-to-DPA
relationship.
In the autodiscovery case cxl_region_attach() exits early after
determining the endpoint decoder is already correctly attached to the
region.
Fixes: a32320b71f ("cxl/region: Add region autodiscovery")
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168002858817.50647.1217607907088920888.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
RCDs (CXL memory devices that link train without VH capability and show
up as root complex integrated endpoints), hide the presence of the link
between the endpoint and the host-bridge. The CXL region setup/teardown
paths assume that a link hop is present and go looking for at least one
'struct cxl_port' instance between the CXL root port-object and an
endpoint port-object leading to crashes of the form:
BUG: kernel NULL pointer dereference, address: 0000000000000008
[..]
RIP: 0010:cxl_region_setup_targets+0x3e9/0xae0 [cxl_core]
[..]
Call Trace:
<TASK>
cxl_region_attach+0x46c/0x7a0 [cxl_core]
cxl_create_region+0x20b/0x270 [cxl_core]
cxl_mock_mem_probe+0x641/0x800 [cxl_mock_mem]
platform_probe+0x5b/0xb0
Detect RCDs explicitly and skip walking the non-existent port hierarchy
between root and endpoint in that case.
While this has been a problem since:
commit 0a19bfc8de ("cxl/port: Add RCD endpoint port enumeration")
...it becomes a more reliable crash scenario with the new autodiscovery
implementation.
Fixes: a32320b71f ("cxl/region: Add region autodiscovery")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168002858268.50647.728091521032131326.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The find_cxl_root() helper is used to lookup root decoders and other CXL
platform topology information for a given endpoint. It turns out that
for RCDs it has never worked. The result of find_cxl_root(&cxlmd->dev)
is always NULL for the RCH topology case because it expects to find a
cxl_port at the host-bridge. RCH topologies only have the root cxl_port
object with the host-bridge as a dport. While there are no reports of
this being a problem to date, by inspection region enumeration should
crash as a result of this problem, and it does in a local unit test for
this scenario.
However, an observation that ever since:
commit f17b558d66 ("cxl/pmem: Refactor nvdimm device registration, delete the workqueue")
...all callers of find_cxl_root() occur after the memdev connection to
the port topology has been established. That means that find_cxl_root()
can be simplified to a walk of the endpoint port topology to the root.
Switch to that arrangement which also fixes the RCD bug.
Fixes: a32320b71f ("cxl/region: Add region autodiscovery")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168002857715.50647.344876437247313909.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull kvm fixes from Paolo Bonzini:
"PPC:
- Hide KVM_CAP_IRQFD_RESAMPLE if XIVE is enabled
s390:
- Fix handling of external interrupts in protected guests
x86:
- Resample the pending state of IOAPIC interrupts when unmasking them
- Fix usage of Hyper-V "enlightened TLB" on AMD
- Small fixes to real mode exceptions
- Suppress pending MMIO write exits if emulator detects exception
Documentation:
- Fix rST syntax"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
docs: kvm: x86: Fix broken field list
KVM: PPC: Make KVM_CAP_IRQFD_RESAMPLE platform dependent
KVM: s390: pv: fix external interruption loop not always detected
KVM: nVMX: Do not report error code when synthesizing VM-Exit from Real Mode
KVM: x86: Clear "has_error_code", not "error_code", for RM exception injection
KVM: x86: Suppress pending MMIO write exits if emulator detects exception
KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking
KVM: irqfd: Make resampler_list an RCU list
KVM: SVM: Flush Hyper-V TLB when required
Initialization must be completed before calling _vdpa_register_device()
since it can connect the device to the vDPA bus, so requests can arrive
after that call.
So for example vdpasim_net_work(), which uses the net->*_stats variables,
can be scheduled before they are initialized.
Let's move _vdpa_register_device() to the end of vdpasim_net_dev_add()
and add a comment to avoid future issues.
Fixes: 0899774cb3 ("vdpa_sim_net: vendor satistics")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20230329160321.187176-1-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Pull nfsd fixes from Chuck Lever:
- Fix a crash and a resource leak in NFSv4 COMPOUND processing
- Fix issues with AUTH_SYS credential handling
- Try again to address an NFS/NFSD/SUNRPC build dependency regression
* tag 'nfsd-6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: callback request does not use correct credential for AUTH_SYS
NFS: Remove "select RPCSEC_GSS_KRB5
sunrpc: only free unix grouplist after RCU settles
nfsd: call op_release, even when op_func returns an error
NFSD: Avoid calling OPDESC() with ops->opnum == OP_ILLEGAL
Code that passes a 32-bit constant into cmpxchg() produces a harmless
sparse warning because of the truncation in the branch that is not taken:
fs/erofs/zdata.c: note: in included file (through /home/arnd/arm-soc/arch/arm/include/asm/cmpxchg.h, /home/arnd/arm-soc/arch/arm/include/asm/atomic.h, /home/arnd/arm-soc/include/linux/atomic.h, ...):
include/asm-generic/cmpxchg-local.h:29:33: warning: cast truncates bits from constant value (5f0ecafe becomes fe)
include/asm-generic/cmpxchg-local.h:33:34: warning: cast truncates bits from constant value (5f0ecafe becomes cafe)
include/asm-generic/cmpxchg-local.h:29:33: warning: cast truncates bits from constant value (5f0ecafe becomes fe)
include/asm-generic/cmpxchg-local.h:30:42: warning: cast truncates bits from constant value (5f0edead becomes ad)
include/asm-generic/cmpxchg-local.h:33:34: warning: cast truncates bits from constant value (5f0ecafe becomes cafe)
include/asm-generic/cmpxchg-local.h:34:44: warning: cast truncates bits from constant value (5f0edead becomes dead)
This was reported as a regression to Matt's recent __generic_cmpxchg_local
patch, though this patch only added more warnings on top of the ones
that were already there.
Rewording the truncation to use an explicit bitmask instead of a cast
to a smaller type avoids the warning but otherwise leaves the code
unchanged.
I had another look at why the cast is even needed for atomic_cmpxchg(),
and as Matt describes the problem here is that atomic_t contains a
signed 'int', but cmpxchg() takes an 'unsigned long' argument, and
converting between the two leads to a 64-bit sign-extension of
negative 32-bit atomics.
I checked the other implementations of arch_cmpxchg() and did not find
any others that run into the same problem as __generic_cmpxchg_local(),
but it's easy to be on the safe side here and always convert the
signed int into an unsigned int when calling arch_cmpxchg(), as this
will work even when any of the arch_cmpxchg() implementations run
into the same problem.
Fixes: 6246541522 ("locking/atomic: cmpxchg: Make __generic_cmpxchg_local compare against zero-extended 'old' value")
Reviewed-by: Matt Evans <mev@rivosinc.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Copy the forced type casts from the normal MMIO accessors to suppress
the sparse warnings that point out __raw_readl() returns a native endian
word (just like readl()).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The existing pKVM code attempts to advertise CSV2/3 using values
initialized to 0, but never set. To advertise CSV2/3 to protected
guests, pass the CSV2/3 values to hyp when initializing hyp's
view of guests' ID_AA64PFR0_EL1.
Similar to non-protected KVM, these are system-wide, rather than
per cpu, for simplicity.
Fixes: 6c30bfb18d ("KVM: arm64: Add handlers for protected VM System Registers")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20230404152321.413064-1-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reset the FDIR counters when FDIR inits. Without this patch,
when VF initializes or resets, all the FDIR counters are not
cleaned, which may cause unexpected behaviors for future FDIR
rule create (e.g., rule conflict).
Fixes: 1f7ea1cd6a ("ice: Enable FDIR Configure for AVF")
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Lingyu Liu <lingyu.liu@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When adding a FDIR filter, if ice_vc_fdir_set_irq_ctx returns failure,
the inserted fdir entry will not be removed and if ice_vc_fdir_write_fltr
returns failure, the fdir context info for irq handler will not be cleared
which may lead to inconsistent or memory leak issue. This patch refines
failure cases to resolve this issue.
Fixes: 1f7ea1cd6a ("ice: Enable FDIR Configure for AVF")
Signed-off-by: Simei Su <simei.su@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The right place to add the debugfs create is in
setup_driver() and remove it in teardown_driver().
Current code adds the debugfs when creating the device but resetting a
device will remove the debugfs subtree and subsequent set_driver will
not be able to create the files since the debugfs pointer is NULL.
Fixes: 2942210043 ("vdpa/mlx5: Add debugfs subtree")
Signed-off-by: Eli Cohen <elic@nvidia.com>
v3 -> v4:
Fix error flow in setup_driver()
Message-Id: <20230403114039.11102-1-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
We need to have a unique chardev for each data path, else the chardevs
will collide and qemu will die with this message:
qemu-system-x86_64: -device
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel0,
id=channel1,name=trace-path-cpu0:
Property 'virtserialport.chardev' can't take value 'charchannel0':
Device 'charchannel0' is in use
Signed-off-by: Ross Zwisler <zwisler@google.com>
Message-Id: <20230215223350.2658616-7-zwisler@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We normally clear the endpoint then unmap LUNs so the devices are fully
shutdown when the LUN is unmapped, but it's legal to unmap before
clearing. If the user does that while TMFs are running then we can end
up crashing.
vhost_scsi_port_unlink assumes that the LUN's tmf struct will always be on
the tmf_queue list. However, if a TMF is running then it will have been
removed while it's executing. If we do a LUN unmap at this time, then
we assume the entry is on the list and just start accessing it and free
it.
This fixes the bug by just allocating the vhost_scsi_tmf struct when it's
needed like is done with the se_tmr struct that's needed when we submit
the TMF. In this path perf is not an issue and we can use GFP_KERNEL
since it won't swing directly back on us, so we don't need to preallocate
the struct.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230321020624.13323-3-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If vhost_scsi_setup_vq_cmds fails we leave the tpg->vhost_scsi pointer
set. If the device is freed and then the user unmaps the LUN, the call to
vhost_scsi_port_unlink -> vhost_scsi_hotunplug will see the that
tpg->vhost_scsi is still set and try to use it.
This has us clear the vhost_scsi pointer in the failure path. It also
has us take tv_tpg_mutex in this failure path, because tv_tpg_vhost_count
is accessed under this mutex in vhost_scsi_drop_nexus and in the future
we will want to serialize access to tpg->vhost_scsi with that mutex
instead of the vhost_scsi_mutex.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230321020624.13323-2-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When the kernel is built without support for zoned block devices,
virtio-blk probe needs to error out any host-managed device scans
to prevent such devices from appearing in the system as non-zoned.
The current virtio-blk code simply bypasses all ZBD checks if
CONFIG_BLK_DEV_ZONED is not defined and this leads to host-managed
block devices being presented as non-zoned in the OS. This is one of
the main problems this patch series is aimed to fix.
In this patch, make VIRTIO_BLK_F_ZONED feature defined even when
CONFIG_BLK_DEV_ZONED is not. This change makes the code compliant with
the voted revision of virtio-blk ZBD spec. Modify the probe code to
look at the situation when VIRTIO_BLK_F_ZONED is negotiated in a kernel
that is built without ZBD support. In this case, the code checks
the zoned model of the device and fails the probe is the device
is host-managed.
The patch also adds the comment to clarify that the call to perform
the zoned device probe is correctly placed after virtio_device ready().
Fixes: 95bfec41bd ("virtio-blk: add support for zoned block devices")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Message-Id: <20230330214953.1088216-3-dmitry.fomichev@wdc.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The merged patch series to support zoned block devices in virtio-blk
is not the most up to date version. The merged patch can be found at
https://lore.kernel.org/linux-block/20221016034127.330942-3-dmitry.fomichev@wdc.com/
but the latest and reviewed version is
https://lore.kernel.org/linux-block/20221110053952.3378990-3-dmitry.fomichev@wdc.com/
The reason is apparently that the correct mailing lists and
maintainers were not copied.
The differences between the two are mostly cleanups, but there is one
change that is very important in terms of compatibility with the
approved virtio-zbd specification.
Before it was approved, the OASIS virtio spec had a change in
VIRTIO_BLK_T_ZONE_APPEND request layout that is not reflected in the
current virtio-blk driver code. In the running code, the status is
the first byte of the in-header that is followed by some pad bytes
and the u64 that carries the sector at which the data has been written
to the zone back to the driver, aka the append sector.
This layout turned out to be problematic for implementing in QEMU and
the request status byte has been eventually made the last byte of the
in-header. The current code doesn't expect that and this causes the
append sector value always come as zero to the block layer. This needs
to be fixed ASAP.
Fixes: 95bfec41bd ("virtio-blk: add support for zoned block devices")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Message-Id: <20230330214953.1088216-2-dmitry.fomichev@wdc.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently callback request does not use the credential specified in
CREATE_SESSION if the security flavor for the back channel is AUTH_SYS.
Problem was discovered by pynfs 4.1 DELEG5 and DELEG7 test with error:
DELEG5 st_delegation.testCBSecParms : FAILURE
expected callback with uid, gid == 17, 19, got 0, 0
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Fixes: 8276c902bb ("SUNRPC: remove uid and gid from struct auth_cred")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
If CONFIG_CRYPTO=n (e.g. arm/shmobile_defconfig):
WARNING: unmet direct dependencies detected for RPCSEC_GSS_KRB5
Depends on [n]: NETWORK_FILESYSTEMS [=y] && SUNRPC [=y] && CRYPTO [=n]
Selected by [y]:
- NFS_V4 [=y] && NETWORK_FILESYSTEMS [=y] && NFS_FS [=y]
As NFSv4 can work without crypto enabled, remove the RPCSEC_GSS_KRB5
dependency altogether.
Trond says:
> It is possible to use the NFSv4.1 client with just AUTH_SYS, and
> in fact there are plenty of people out there using only that. The
> fact that RFC5661 gets its knickers in a twist about RPCSEC_GSS
> support is largely irrelevant to those people.
>
> The other issue is that ’select’ enforces the strict dependency
> that if the NFS client is compiled into the kernel, then the
> RPCSEC_GSS and kerberos code needs to be compiled in as well: they
> cannot exist as modules.
Fixes: e57d065277 ("NFS & NFSD: Update GSS dependencies")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Niklas Söderlund <niklas.soderlund@ragnatech.se>
Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
syzkaller found that the calculation of batch_last_index should use
'start_index' since at input to this function the batch is either empty or
it has already been adjusted to cross any accesses so it will start at the
point we are unmapping from.
Getting this wrong causes the unmap to run over the end of the pages
which corrupts pages that were never mapped. In most cases this triggers
the num pinned debugging:
WARNING: CPU: 0 PID: 557 at drivers/iommu/iommufd/pages.c:294 __iopt_area_unfill_domain+0x152/0x560
Modules linked in:
CPU: 0 PID: 557 Comm: repro Not tainted 6.3.0-rc2-eeac8ede1755 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:__iopt_area_unfill_domain+0x152/0x560
Code: d2 0f ff 44 8b 64 24 54 48 8b 44 24 48 31 ff 44 89 e6 48 89 44 24 38 e8 fc d3 0f ff 45 85 e4 0f 85 eb 01 00 00 e8 0e d2 0f ff <0f> 0b e8 07 d2 0f ff 48 8b 44 24 38 89 5c 24 58 89 18 8b 44 24 54
RSP: 0018:ffffc9000108baf0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000ffffffff RCX: ffffffff821e3f85
RDX: 0000000000000000 RSI: ffff88800faf0000 RDI: 0000000000000002
RBP: ffffc9000108bd18 R08: 000000000003ca25 R09: 0000000000000014
R10: 000000000003ca00 R11: 0000000000000024 R12: 0000000000000004
R13: 0000000000000801 R14: 00000000000007ff R15: 0000000000000800
FS: 00007f3499ce1740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000243 CR3: 00000000179c2001 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
<TASK>
iopt_area_unfill_domain+0x32/0x40
iopt_table_remove_domain+0x23f/0x4c0
iommufd_device_selftest_detach+0x3a/0x90
iommufd_selftest_destroy+0x55/0x70
iommufd_object_destroy_user+0xce/0x130
iommufd_destroy+0xa2/0xc0
iommufd_fops_ioctl+0x206/0x330
__x64_sys_ioctl+0x10e/0x160
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
Also add some useful WARN_ON sanity checks.
Cc: <stable@vger.kernel.org>
Fixes: 8d160cd4d5 ("iommufd: Algorithms for PFN storage")
Link: https://lore.kernel.org/r/2-v1-ceab6a4d7d7a+94-iommufd_syz_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Platform device helper routines won't update the NUMA distance table
while creating a platform device, even if the device is present on a
NUMA node that doesn't have memory or CPU. This is especially true for
pmem devices. If the target node of the pmem device is not online, we
find the nearest online node to the device and associate the pmem device
with that online node. To find the nearest online node, we should have
the numa distance table updated correctly. Update the distance
information during the device probe.
For a papr scm device on NUMA node 3 distance_lookup_table value for
distance_ref_points_depth = 2 before and after fix is below:
Before fix:
node 3 distance depth 0 - 0
node 3 distance depth 1 - 0
node 4 distance depth 0 - 4
node 4 distance depth 1 - 2
node 5 distance depth 0 - 5
node 5 distance depth 1 - 1
After fix
node 3 distance depth 0 - 3
node 3 distance depth 1 - 1
node 4 distance depth 0 - 4
node 4 distance depth 1 - 2
node 5 distance depth 0 - 5
node 5 distance depth 1 - 1
Without the fix, the nearest numa node to the pmem device (NUMA node 3)
will be picked as 4. After the fix, we get the correct numa node which
is 5.
Fixes: da1115fdbd ("powerpc/nvdimm: Pick nearby online node if the device node is not online")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230404041433.1781804-1-aneesh.kumar@linux.ibm.com
In the am65_cpsw_nuss_probe() function's cleanup path, the call to
of_platform_device_destroy() for the common->mdio_dev device is invoked
unconditionally. It is possible that either the MDIO node is not present
in the device-tree, or the MDIO node is disabled in the device-tree. In
both these cases, the MDIO device is not created, resulting in a NULL
pointer dereference when the of_platform_device_destroy() function is
invoked on the common->mdio_dev device on the cleanup path.
Fix this by ensuring that the common->mdio_dev device exists, before
attempting to invoke of_platform_device_destroy().
Fixes: a45cfcc69a ("net: ethernet: ti: am65-cpsw-nuss: use of_platform_device_create() for mdio")
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Link: https://lore.kernel.org/r/20230403090321.835877-1-s-vadapalli@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Gregory Price reports a WARN splat with CONFIG_DEBUG_OBJECTS=y upon CXL
probing because pci_doe_submit_task() invokes INIT_WORK() instead of
INIT_WORK_ONSTACK() for a work_struct that was allocated on the stack.
All callers of pci_doe_submit_task() allocate the work_struct on the
stack, so replace INIT_WORK() with INIT_WORK_ONSTACK() as a backportable
short-term fix.
The long-term fix implemented by a subsequent commit is to move to a
synchronous API which allocates the work_struct internally in the DOE
library.
Stacktrace for posterity:
WARNING: CPU: 0 PID: 23 at lib/debugobjects.c:545 __debug_object_init.cold+0x18/0x183
CPU: 0 PID: 23 Comm: kworker/u2:1 Not tainted 6.1.0-0.rc1.20221019gitaae703b02f92.17.fc38.x86_64 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Call Trace:
pci_doe_submit_task+0x5d/0xd0
pci_doe_discovery+0xb4/0x100
pcim_doe_create_mb+0x219/0x290
cxl_pci_probe+0x192/0x430
local_pci_probe+0x41/0x80
pci_device_probe+0xb3/0x220
really_probe+0xde/0x380
__driver_probe_device+0x78/0x170
driver_probe_device+0x1f/0x90
__driver_attach_async_helper+0x5c/0xe0
async_run_entry_fn+0x30/0x130
process_one_work+0x294/0x5b0
Fixes: 9d24322e88 ("PCI/DOE: Add DOE mailbox support functions")
Link: https://lore.kernel.org/linux-cxl/Y1bOniJliOFszvIK@memverge.com/
Reported-by: Gregory Price <gregory.price@memverge.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Cc: stable@vger.kernel.org # v6.0+
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/67a9117f463ecdb38a2dbca6a20391ce2f1e7a06.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If truncated CDAT entries are received from a device, the concatenation
of those entries constitutes a corrupt CDAT, yet is happily exposed to
user space.
Avoid by verifying response lengths and erroring out if truncation is
detected.
The last CDAT entry may still be truncated despite the checks introduced
herein if the length in the CDAT header is too small. However, that is
easily detectable by user space because it reaches EOF prematurely.
A subsequent commit which rightsizes the CDAT response allocation closes
that remaining loophole.
The two lines introduced here which exceed 80 chars are shortened to
less than 80 chars by a subsequent commit which migrates to a
synchronous DOE API and replaces "t.task.rv" by "rc".
The existing acpi_cdat_header and acpi_table_cdat struct definitions
provided by ACPICA cannot be used because they do not employ __le16 or
__le32 types. I believe that cannot be changed because those types are
Linux-specific and ACPI is specified for little endian platforms only,
hence doesn't care about endianness. So duplicate the structs.
Fixes: c97006046c ("cxl/port: Read CDAT table")
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: stable@vger.kernel.org # v6.0+
Link: https://lore.kernel.org/r/bce3aebc0e8e18a1173425a7a865b232c3912963.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull vfs fix from Christian Brauner:
"When a mount or mount tree is made shared the vfs allocates new peer
group ids for all mounts that have no peer group id set. Only mounts
that aren't marked with MNT_SHARED are relevant here as MNT_SHARED
indicates that the mount has fully transitioned to a shared mount. The
peer group id handling is done with namespace lock held.
On failure, the peer group id settings of mounts for which a new peer
group id was allocated need to be reverted and the allocated peer
group id freed. The cleanup_group_ids() helper can identify the mounts
to cleanup by checking whether a given mount has a peer group id set
but isn't marked MNT_SHARED. The deallocation always needs to happen
with namespace lock held to protect against concurrent modifications
of the propagation settings.
This fixes the one place where the namespace lock was dropped before
calling cleanup_group_ids()"
* tag 'vfs.misc.fixes.v6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
fs: drop peer group ids under namespace lock
Pull hyperv fixes from Wei Liu:
- Fix a bug in channel allocation for VMbus (Mohammed Gamal)
- Do not allow root partition functionality in CVM (Michael Kelley)
* tag 'hyperv-fixes-signed-20230402' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Block root partition functionality in a Confidential VM
Drivers: vmbus: Check for channel allocation before looking up relids
A __field() in the TRACE_EVENT() macro is used to set up the fields of the
trace event data. It is for single storage units (word, char, int,
pointer, etc) and not for complex structures or arrays. Unfortunately,
there's nothing preventing the build from accepting:
__field(int, arr[5]);
from building. It will turn into a array value. This use to work fine, as
the offset and size use to be determined by the macro using the field name,
but things have changed and the offset and size are now determined by the
type. So the above would only be size 4, and the next field will be
located 4 bytes from it (instead of 20).
The proper way to declare static arrays is to use the __array() macro.
Instead of __field(int, arr[5]) it should be __array(int, arr, 5).
Add some macro tricks to the building of a trace event from the
TRACE_EVENT() macro such that __field(int, arr[5]) will fail to build. A
comment by the failure will explain why the build failed.
Link: https://lore.kernel.org/lkml/20230306122549.236561-1-douglas.raillard@arm.com/
Link: https://lore.kernel.org/linux-trace-kernel/20230309221302.642e82d9@gandalf.local.home
Reported-by: Douglas RAILLARD <douglas.raillard@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
When user reads file 'trace_pipe', kernel keeps printing following logs
that warn at "cpu_buffer->reader_page->read > rb_page_size(reader)" in
rb_get_reader_page(). It just looks like there's an infinite loop in
tracing_read_pipe(). This problem occurs several times on arm64 platform
when testing v5.10 and below.
Call trace:
rb_get_reader_page+0x248/0x1300
rb_buffer_peek+0x34/0x160
ring_buffer_peek+0xbc/0x224
peek_next_entry+0x98/0xbc
__find_next_entry+0xc4/0x1c0
trace_find_next_entry_inc+0x30/0x94
tracing_read_pipe+0x198/0x304
vfs_read+0xb4/0x1e0
ksys_read+0x74/0x100
__arm64_sys_read+0x24/0x30
el0_svc_common.constprop.0+0x7c/0x1bc
do_el0_svc+0x2c/0x94
el0_svc+0x20/0x30
el0_sync_handler+0xb0/0xb4
el0_sync+0x160/0x180
Then I dump the vmcore and look into the problematic per_cpu ring_buffer,
I found that tail_page/commit_page/reader_page are on the same page while
reader_page->read is obviously abnormal:
tail_page == commit_page == reader_page == {
.write = 0x100d20,
.read = 0x8f9f4805, // Far greater than 0xd20, obviously abnormal!!!
.entries = 0x10004c,
.real_end = 0x0,
.page = {
.time_stamp = 0x857257416af0,
.commit = 0xd20, // This page hasn't been full filled.
// .data[0...0xd20] seems normal.
}
}
The root cause is most likely the race that reader and writer are on the
same page while reader saw an event that not fully committed by writer.
To fix this, add memory barriers to make sure the reader can see the
content of what is committed. Since commit a0fcaaed0c ("ring-buffer: Fix
race between reset page and reading page") has added the read barrier in
rb_get_reader_page(), here we just need to add the write barrier.
Link: https://lore.kernel.org/linux-trace-kernel/20230325021247.2923907-1-zhengyejian1@huawei.com
Cc: stable@vger.kernel.org
Fixes: 77ae365eca ("ring-buffer: make lockless")
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Currently, the "last_cmd" variable can be accessed by multiple processes
asynchronously when multiple users manipulate synthetic_events node
at the same time, it could lead to use-after-free or double-free.
This patch add "lastcmd_mutex" to prevent "last_cmd" from being accessed
asynchronously.
================================================================
It's easy to reproduce in the KASAN environment by running the two
scripts below in different shells.
script 1:
while :
do
echo -n -e '\x88' > /sys/kernel/tracing/synthetic_events
done
script 2:
while :
do
echo -n -e '\xb0' > /sys/kernel/tracing/synthetic_events
done
================================================================
double-free scenario:
process A process B
------------------- ---------------
1.kstrdup last_cmd
2.free last_cmd
3.free last_cmd(double-free)
================================================================
use-after-free scenario:
process A process B
------------------- ---------------
1.kstrdup last_cmd
2.free last_cmd
3.tracing_log_err(use-after-free)
================================================================
Appendix 1. KASAN report double-free:
BUG: KASAN: double-free in kfree+0xdc/0x1d4
Free of addr ***** by task sh/4879
Call trace:
...
kfree+0xdc/0x1d4
create_or_delete_synth_event+0x60/0x1e8
trace_parse_run_command+0x2bc/0x4b8
synth_events_write+0x20/0x30
vfs_write+0x200/0x830
...
Allocated by task 4879:
...
kstrdup+0x5c/0x98
create_or_delete_synth_event+0x6c/0x1e8
trace_parse_run_command+0x2bc/0x4b8
synth_events_write+0x20/0x30
vfs_write+0x200/0x830
...
Freed by task 5464:
...
kfree+0xdc/0x1d4
create_or_delete_synth_event+0x60/0x1e8
trace_parse_run_command+0x2bc/0x4b8
synth_events_write+0x20/0x30
vfs_write+0x200/0x830
...
================================================================
Appendix 2. KASAN report use-after-free:
BUG: KASAN: use-after-free in strlen+0x5c/0x7c
Read of size 1 at addr ***** by task sh/5483
sh: CPU: 7 PID: 5483 Comm: sh
...
__asan_report_load1_noabort+0x34/0x44
strlen+0x5c/0x7c
tracing_log_err+0x60/0x444
create_or_delete_synth_event+0xc4/0x204
trace_parse_run_command+0x2bc/0x4b8
synth_events_write+0x20/0x30
vfs_write+0x200/0x830
...
Allocated by task 5483:
...
kstrdup+0x5c/0x98
create_or_delete_synth_event+0x80/0x204
trace_parse_run_command+0x2bc/0x4b8
synth_events_write+0x20/0x30
vfs_write+0x200/0x830
...
Freed by task 5480:
...
kfree+0xdc/0x1d4
create_or_delete_synth_event+0x74/0x204
trace_parse_run_command+0x2bc/0x4b8
synth_events_write+0x20/0x30
vfs_write+0x200/0x830
...
Link: https://lore.kernel.org/linux-trace-kernel/20230321110444.1587-1-Tze-nan.Wu@mediatek.com
Fixes: 27c888da98 ("tracing: Remove size restriction on synthetic event cmd error logging")
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: "Tom Zanussi" <zanussi@kernel.org>
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
When BPF_TRAMP_F_CALL_ORIG is set, BPF trampoline uses BLR to jump
back to the instruction next to call site to call the patched function.
For BTI-enabled kernel, the instruction next to call site is usually
PACIASP, in this case, it's safe to jump back with BLR. But when
the call site is not followed by a PACIASP or bti, a BTI exception
is triggered.
Here is a fault log:
Unhandled 64-bit el1h sync exception on CPU0, ESR 0x0000000034000002 -- BTI
CPU: 0 PID: 263 Comm: test_progs Tainted: GF
Hardware name: linux,dummy-virt (DT)
pstate: 40400805 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=-c)
pc : bpf_fentry_test1+0xc/0x30
lr : bpf_trampoline_6442573892_0+0x48/0x1000
sp : ffff80000c0c3a50
x29: ffff80000c0c3a90 x28: ffff0000c2e6c080 x27: 0000000000000000
x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000050
x23: 0000000000000000 x22: 0000ffffcfd2a7f0 x21: 000000000000000a
x20: 0000ffffcfd2a7f0 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffcfd2a7f0
x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: ffff80000914f5e4 x9 : ffff8000082a1528
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0101010101010101
x5 : 0000000000000000 x4 : 00000000fffffff2 x3 : 0000000000000001
x2 : ffff8001f4b82000 x1 : 0000000000000000 x0 : 0000000000000001
Kernel panic - not syncing: Unhandled exception
CPU: 0 PID: 263 Comm: test_progs Tainted: GF
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0xec/0x144
show_stack+0x24/0x7c
dump_stack_lvl+0x8c/0xb8
dump_stack+0x18/0x34
panic+0x1cc/0x3ec
__el0_error_handler_common+0x0/0x130
el1h_64_sync_handler+0x60/0xd0
el1h_64_sync+0x78/0x7c
bpf_fentry_test1+0xc/0x30
bpf_fentry_test1+0xc/0x30
bpf_prog_test_run_tracing+0xdc/0x2a0
__sys_bpf+0x438/0x22a0
__arm64_sys_bpf+0x30/0x54
invoke_syscall+0x78/0x110
el0_svc_common.constprop.0+0x6c/0x1d0
do_el0_svc+0x38/0xe0
el0_svc+0x30/0xd0
el0t_64_sync_handler+0x1ac/0x1b0
el0t_64_sync+0x1a0/0x1a4
Kernel Offset: disabled
CPU features: 0x0000,00034c24,f994fdab
Memory Limit: none
And the instruction next to call site of bpf_fentry_test1 is ADD,
not PACIASP:
<bpf_fentry_test1>:
bti c
nop
nop
add w0, w0, #0x1
paciasp
For BPF prog, JIT always puts a PACIASP after call site for BTI-enabled
kernel, so there is no problem. To fix it, replace BLR with RET to bypass
the branch target check.
Fixes: efc9909fdc ("bpf, arm64: Add bpf trampoline for arm64")
Reported-by: Florent Revest <revest@chromium.org>
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Florent Revest <revest@chromium.org>
Acked-by: Florent Revest <revest@chromium.org>
Link: https://lore.kernel.org/bpf/20230401234144.3719742-1-xukuohai@huaweicloud.com
Add the IRQCHIP_SKIP_SET_WAKE flag since there are no special IRQ Wake
bits that can be set to enable wakeup IRQ.
Fixes: 3d9edf09d4 ("[ARM] 4457/2: davinci: GPIO support")
Signed-off-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
The interrupt enable bits might be set if we want to use the GPIO as
wakeup source. Clearing this will mean disabling of interrupts in the GPIO
banks that we may want to wakeup from.
Thus remove the line that was clearing this bit from the driver's save
context function.
Cc: Devarsh Thakkar <devarsht@ti.com>
Fixes: 0651a73092 ("gpio: davinci: Add support for system suspend/resume PM")
Signed-off-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Currently ath11k breaks after hibernation, the reason being that ath11k expects
that the wireless device will have power during suspend and the firmware will
continue running. But of course during hibernation the power from the device is
cut off and firmware is not running when resuming, so ath11k will fail.
(The reason why ath11k needs the firmware running is the interaction between
mac80211 and MHI stack, it's a long story and more info in the bugzilla report.)
In SUSE kernels the watchdog timeout is reduced from the default 120 to 60 seconds:
CONFIG_DPM_WATCHDOG_TIMEOUT=60
But as the ath11k MHI timeout is 90 seconds the kernel will crash before will
ath11k will recover in resume callback. To avoid the crash reduce the MHI
timeout to just 20 seconds.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.9
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214649
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230329162038.8637-1-kvalo@kernel.org
On the remote side, when QRTR socket is removed, af_qrtr will call
qrtr_port_remove() which broadcasts the DEL_CLIENT packet to all neighbours
including local NS. NS upon receiving the DEL_CLIENT packet, will remove
the lookups associated with the node:port and broadcasts the DEL_SERVER
packet.
But on the host side, due to the arrival of the DEL_CLIENT packet, the NS
would've already deleted the server belonging to that port. So when the
remote's NS again broadcasts the DEL_SERVER for that port, it throws below
error message on the host:
"failed while handling packet from 2:-2"
So fix this error by not broadcasting the DEL_SERVER packet when the
DEL_CLIENT packet gets processed."
Fixes: 0c2204a4ad ("net: qrtr: Migrate nameservice to kernel from userspace")
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Ram Kumar Dharuman <quic_ramd@quicinc.com>
Signed-off-by: Sricharan Ramabadhran <quic_srichara@quicinc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When considering whether to mark one context as stopped and another as
started we need to look at whether the previous and new _contexts_ are
different and not just requests. Otherwise the software tracked context
start time was incorrectly updated to the most recent lite-restore time-
stamp, which was in some cases resulting in active time going backward,
until the context switch (typically the heartbeat pulse) would synchronise
with the hardware tracked context runtime. Easiest use case to observe
this behaviour was with a full screen clients with close to 100% engine
load.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: bb6287cb18 ("drm/i915: Track context current active time")
Cc: <stable@vger.kernel.org> # v5.19+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230320151423.1708436-1-tvrtko.ursulin@linux.intel.com
[tursulin: Fix spelling in commit msg.]
(cherry picked from commit b3e7005187)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
When smb2_lock request is canceled by smb2_cancel or smb2_close(),
ksmbd is missing deleting async_request_entry async_requests list.
Because calling init_smb2_rsp_hdr() in smb2_lock() mark ->synchronous
as true and then it will not be deleted in
ksmbd_conn_try_dequeue_request(). This patch add release_async_work() to
release the ones allocated for async work.
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
There is a memory leak reported by kmemleak:
unreferenced object 0xffffc900003f0000 (size 12288):
comm "modprobe", pid 19117, jiffies 4299751452 (age 42490.264s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000629261a8>] __vmalloc_node_range+0xe56/0x1110
[<0000000001906886>] __vmalloc_node+0xbd/0x150
[<000000005bb4dc34>] vmalloc+0x25/0x30
[<00000000a2dc1194>] qla2x00_create_host+0x7a0/0xe30 [qla2xxx]
[<0000000062b14b47>] qla2x00_probe_one+0x2eb8/0xd160 [qla2xxx]
[<00000000641ccc04>] local_pci_probe+0xeb/0x1a0
The root cause is traced to an error-handling path in qla2x00_probe_one()
when the adapter "base_vha" initialize failed. The fab_scan_rp "scan.l" is
used to record the port information and it is allocated in
qla2x00_create_host(). However, it is not released in the error handling
path "probe_failed".
Fix this by freeing the memory of "scan.l" when an error occurs in the
adapter initialization process.
Fixes: a4239945b8 ("scsi: qla2xxx: Add switch command to simplify fabric discovery")
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Link: https://lore.kernel.org/r/20230325110004.363898-1-lizetao1@huawei.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is exiting from the fault watchdog thread if it sees the 0xF002
(Soft reset in progress) fault code.
If the driver initiates the soft reset, then the driver restarts the
watchdog at the end of the soft reset completion. However, if the soft
reset is initiated by the firmware asynchronously, then the driver will
never restart the watchdog and never re-initialize the controller after the
asynchronous soft reset completion.
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20230331122317.11391-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We've recently moved the mailing list to lists.linux.dev to move away
from the sourceforge infrastructure. This also updates the website
from the (no longer v9fs relevant?) swik.net address to the github
group which contains pointers to test cases, the protocol, servers,
etc. This also changes my email from my gmail to my kernel.org
address.
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
Acked-by: Dominique Martinet <asmadeus@codewreck.org>
Acked-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Pull btrfs fixes from David Sterba:
- scan block devices in non-exclusive mode to avoid temporary mkfs
failures
- fix race between quota disable and quota assign ioctls
- fix deadlock when aborting transaction during relocation with scrub
- ignore fiemap path cache when there are multiple paths for a node
* tag 'for-6.3-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: ignore fiemap path cache when there are multiple paths for a node
btrfs: fix deadlock when aborting transaction during relocation with scrub
btrfs: scan device in non-exclusive mode
btrfs: fix race between quota disable and quota assign ioctls
This reverts commit a837e5161c, which broke probing of the venus
driver, at least on the SC7180 SoC HP X2 Chromebook:
qcom-venus aa00000.video-codec: Adding to iommu group 11
qcom-venus aa00000.video-codec: non legacy binding
qcom-venus aa00000.video-codec: failed to reset venus core
qcom-venus: probe of aa00000.video-codec failed with error -110
Matthias Kaehlcke also reported that the same change caused a regression
in SC7180 and sc7280, that prevents AOSS from entering sleep mode during
system suspend. So let's revert this commit for now to fix both issues.
Fixes: a837e5161c ("venus: firmware: Correct non-pix start and end addresses")
Reported-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull driver core fixes from Greg KH:
"Here are three small changes for 6.3-rc5 semi-related to driver core
stuff:
- documentation update where we move the security_bugs file to a more
relevant location.
- mdt/spi-nor debugfs memory leak fix that's been floating around for
a long time and acked by the maintainer
- cacheinfo bugfix for a regression in 6.3-rc1
All have been in linux-next with no reported problems"
* tag 'driver-core-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
cacheinfo: Fix LLC is not exported through sysfs
Documentation/security-bugs: move from admin-guide/ to process/
mtd: spi-nor: fix memory leak when using debugfs_lookup()
Pull powerpc fixes from Michael Ellerman:
- Fix a false positive warning in __pte_needs_flush() (with DEBUG_VM=y)
- Fix oops when a PF_IO_WORKER thread tries to core dump
- Don't try to reconfigure VAS when it's disabled
Thanks to Benjamin Gray, Haren Myneni, Jens Axboe, Nathan Lynch, and
Russell Currey.
* tag 'powerpc-6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries/vas: Ignore VAS update for DLPAR if copy/paste is not enabled
powerpc: Don't try to copy PPR for task with NULL pt_regs
powerpc/64s: Fix __pte_needs_flush() false positive warning
This patch fixes a corner case where the asoc out stream count may change
after wait_for_sndbuf.
When the main thread in the client starts a connection, if its out stream
count is set to N while the in stream count in the server is set to N - 2,
another thread in the client keeps sending the msgs with stream number
N - 1, and waits for sndbuf before processing INIT_ACK.
However, after processing INIT_ACK, the out stream count in the client is
shrunk to N - 2, the same to the in stream count in the server. The crash
occurs when the thread waiting for sndbuf is awake and sends the msg in a
non-existing stream(N - 1), the call trace is as below:
KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f]
Call Trace:
<TASK>
sctp_cmd_send_msg net/sctp/sm_sideeffect.c:1114 [inline]
sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1777 [inline]
sctp_side_effects net/sctp/sm_sideeffect.c:1199 [inline]
sctp_do_sm+0x197d/0x5310 net/sctp/sm_sideeffect.c:1170
sctp_primitive_SEND+0x9f/0xc0 net/sctp/primitive.c:163
sctp_sendmsg_to_asoc+0x10eb/0x1a30 net/sctp/socket.c:1868
sctp_sendmsg+0x8d4/0x1d90 net/sctp/socket.c:2026
inet_sendmsg+0x9d/0xe0 net/ipv4/af_inet.c:825
sock_sendmsg_nosec net/socket.c:722 [inline]
sock_sendmsg+0xde/0x190 net/socket.c:745
The fix is to add an unlikely check for the send stream number after the
thread wakes up from the wait_for_sndbuf.
Fixes: 5bbbbe32a4 ("sctp: introduce stream scheduler foundations")
Reported-by: syzbot+47c24ca20a2fa01f082e@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on further tests, it seems that the QDMA shaper is not able to
perform shaping close to the MAC link rate without throughput loss.
This cannot be compensated by increasing the shaping rate, so it seems
to be an internal limit.
Fix the remaining throughput regression by detecting that condition and
limiting shaping to ports with lower link speed.
This patch intentionally ignores link speed gain from TRGMII, because
even on such links, shaping to 1000 Mbit/s incurs some throughput
degradation.
Fixes: f63959c7ee ("net: ethernet: mtk_eth_soc: implement multi-queue support for per-port queues")
Tested-By: Frank Wunderlich <frank-w@public-files.de>
Reported-by: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
The force watchdog event bit is not cleared during SW reset in the
mv88e6393x switch. This is a different behavior compared to mv886390 which
clears the force WD event bit as advertised. This causes a force WD event
to be handled over and over again as the SW reset following the event never
clears the force WD event bit.
Explicitly clear the watchdog event register to 0 in irq_action when
handling an event to prevent the switch from sending continuous interrupts.
Marvell aren't aware of any other stuck bits apart from the force WD
bit.
Fixes: de776d0d31 ("net: dsa: mv88e6xxx: add support for mv88e6393x family"
Signed-off-by: Gustav Ekelund <gustaek@axis.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 0db3dc73f7 ("[NETPOLL]: tx lock deadlock fix") narrowed
down the region under netif_tx_trylock() inside netpoll_send_skb().
(At that point in time netif_tx_trylock() would lock all queues of
the device.) Taking the tx lock was problematic because driver's
cleanup method may take the same lock. So the change made us hold
the xmit lock only around xmit, and expected the driver to take
care of locking within ->ndo_poll_controller().
Unfortunately this only works if netpoll isn't itself called with
the xmit lock already held. Netpoll code is careful and uses
trylock(). The drivers, however, may be using plain lock().
Printing while holding the xmit lock is going to result in rare
deadlocks.
Luckily we record the xmit lock owners, so we can scan all the queues,
the same way we scan NAPI owners. If any of the xmit locks is held
by the local CPU we better not attempt any polling.
It would be nice if we could narrow down the check to only the NAPIs
and the queue we're trying to use. I don't see a way to do that now.
Reported-by: Roman Gushchin <roman.gushchin@linux.dev>
Fixes: 0db3dc73f7 ("[NETPOLL]: tx lock deadlock fix")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In xen_9pfs_front_probe, it calls xen_9pfs_front_alloc_dataring
to init priv->rings and bound &ring->work with p9_xen_response.
When it calls xen_9pfs_front_event_handler to handle IRQ requests,
it will finally call schedule_work to start the work.
When we call xen_9pfs_front_remove to remove the driver, there
may be a sequence as follows:
Fix it by finishing the work before cleanup in xen_9pfs_front_free.
Note that, this bug is found by static analysis, which might be
false positive.
CPU0 CPU1
|p9_xen_response
xen_9pfs_front_remove|
xen_9pfs_front_free|
kfree(priv) |
//free priv |
|p9_tag_lookup
|//use priv->client
Fixes: 71ebd71921 ("xen/9pfs: connect to the backend")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
When removing provided buffers, io_buffer structs are not being disposed
of, leading to a memory leak. They can't be freed individually, because
they are allocated in page-sized groups. They need to be added to some
free list instead, such as io_buffers_cache. All callers already hold
the lock protecting it, apart from when destroying buffers, so had to
extend the lock there.
Fixes: cc3cec8367 ("io_uring: speedup provided buffer handling")
Signed-off-by: Wojciech Lukowicz <wlukowicz01@gmail.com>
Link: https://lore.kernel.org/r/20230401195039.404909-2-wlukowicz01@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When a request to remove buffers is submitted, and the given number to be
removed is larger than available in the specified buffer group, the
resulting CQE result will be the number of removed buffers + 1, which is
1 more than it should be.
Previously, the head was part of the list and it got removed after the
loop, so the increment was needed. Now, the head is not an element of
the list, so the increment shouldn't be there anymore.
Fixes: dbc7d452e7 ("io_uring: manage provided buffers strictly ordered")
Signed-off-by: Wojciech Lukowicz <wlukowicz01@gmail.com>
Link: https://lore.kernel.org/r/20230401195039.404909-2-wlukowicz01@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull cifs client fixes from Steve French:
"Four cifs/smb3 client (reconnect and DFS related) fixes, including two
for stable:
- DFS oops fix
- DFS reconnect recursion fix
- An SMB1 parallel reconnect fix
- Trivial dead code removal in smb2_reconnect"
* tag '6.3-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: get rid of dead check in smb2_reconnect()
cifs: prevent infinite recursion in CIFSGetDFSRefer()
cifs: avoid races in parallel reconnects in smb1
cifs: fix DFS traversal oops without CONFIG_CIFS_DFS_UPCALL
Pull input fixes from Dmitry Torokhov:
- fixes to ALPS and Focaltech PS/2 drivers dealing with the breakage of
switching to -funsigned-char
- quirks to i8042 to better handle Lifebook A574/H and TUXEDO devices
- a quirk to Goodix touchscreen driver to handle Yoga Book X90F
- a fix for incorrectly merged patch to xpad game controller driver
* tag 'input-for-v6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - add TUXEDO devices to i8042 quirk tables for partial fix
Input: alps - fix compatibility with -funsigned-char
Input: focaltech - use explicitly signed char type
Input: xpad - fix incorrectly applied patch for MAP_PROFILE_BUTTON
Input: goodix - add Lenovo Yoga Book X90F to nine_bytes_report DMI table
Input: i8042 - add quirk for Fujitsu Lifebook A574/H
Pull pin control fixes from Linus Walleij:
"Some pin control fixes for the v6.3 series.
The most notable and urgent one is probably the AMD fix which affects
AMD laptops, found by the Chromium people.
Summary:
- Fix up the Kconfig options for MediaTek MT7981
- Fix the irq domain name in the AT91-PIO4 driver
- Fix some alternative muxing modes in the Ocelot driver
- Allocate the GPIO numbers dynamically in the STM32 driver
- Disable and mask interrupts on resume in the AMD driver
- Fix a typo in the Qualcomm SM8550 pin control device tree bindings"
* tag 'pinctrl-v6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
dt-bindings: pinctrl: qcom,sm8550-lpass-lpi: allow input-enabled and bias-bus-hold
pinctrl: amd: Disable and mask interrupts on resume
pinctrl: stm32: use dynamic allocation of GPIO base
pinctrl: ocelot: Fix alt mode for ocelot
pinctrl: at91-pio4: fix domain name assignment
pinctrl: mediatek: fix naming inconsistency
pinctrl: mediatek: add missing options to PINCTRL_MT7981
Pull Kbuild fixes from Masahiro Yamada:
- Fix linux-headers debian package
- Fix a merge_config.sh error due to a misspelled variable
- Fix modversion for 32-bit build machines
* tag 'kbuild-fixes-v6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
modpost: Fix processing of CRCs on 32-bit build machines
scripts: merge_config: Fix typo in variable name.
kbuild: deb-pkg: set version for linux-headers paths
Pull iommu fixes from Joerg Roedel:
- Maintainer update for S390 IOMMU driver
- A fix for the set_platform_dma_ops() call-back in the Exynos
IOMMU driver
- Intel VT-d fixes from Lu Baolu:
- Fix a lockdep splat
- Fix a supplement of the specification
- Fix a warning in perfmon code
* tag 'iommu-fixes-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Fix an IOMMU perfmon warning when CPU hotplug
iommu/vt-d: Allow zero SAGAW if second-stage not supported
iommu/vt-d: Remove unnecessary locking in intel_irq_remapping_alloc()
iommu/exynos: Fix set_platform_dma_ops() callback
MAINTAINERS: Update s390-iommu driver maintainer information
When a DRM driver turns on or off the screen with the audio
capability, it notifies the ELD to HD-audio HDMI codec driver via
component ops. HDMI codec driver, in turn, attaches or detaches the
PCM stream for the given port on the fly.
The problem is that, since the recent code change, the HDMI driver
always treats the PCM stream assignment dynamically; this ended up the
confusion of the PCM device appearance. e.g. when a screen goes once
off and on again, it may appear on a different PCM device before the
screen-off. Although the application should treat such a change, it
doesn't seem working gracefully with the current pipewire (maybe
PulseAudio, too).
As a workaround, this patch changes the HDMI codec driver behavior
slightly to be more consistent. Now it remembers the previous PCM
slot for the given port and try to assign to it. That is, if a port
is re-enabled, the driver tries to use the same PCM slot that was
assigned to that port previously. If it conflicts, a new slot is
searched and used like before, instead.
Note that multiple monitor connections are the only typical case where
the PCM slot preservation is effective. As long as only a single
monitor is connected, the behavior isn't changed, and the first PCM
slot is still assigned always.
Fixes: ef6f5494fa ("ALSA: hda/hdmi: Use only dynamic PCM device allocation")
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217259
Link: https://lore.kernel.org/r/20230331142217.19791-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The runtime suspend/resume functions are only referenced from the
dev_pm_ops, but they use the old SET_RUNTIME_PM_OPS() helper that
requires a __maybe_unused annotation to avoid a warning:
drivers/media/i2c/imx290.c:1082:12: error: unused function 'imx290_runtime_resume' [-Werror,-Wunused-function]
static int imx290_runtime_resume(struct device *dev)
^
drivers/media/i2c/imx290.c:1090:12: error: unused function 'imx290_runtime_suspend' [-Werror,-Wunused-function]
static int imx290_runtime_suspend(struct device *dev)
^
Convert this to the new RUNTIME_PM_OPS() helper that so this is not
required. To improve this further, also use the pm_ptr() helper that
lets the dev_pm_ops get dropped entirely when CONFIG_PM is disabled.
A related mistake happened in the of_match_ptr() macro here, which like
SET_RUNTIME_PM_OPS() requires the match table to be marked as
__maybe_unused, though I could not reproduce building this without
CONFIG_OF. Remove the of_match_ptr() here as there is no point in
dropping the match table in configurations without CONFIG_OF.
Fixes: 02852c01f6 ("media: i2c: imx290: Initialize runtime PM before subdev")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For ops with "trivial" replies, nfsd4_encode_operation will shortcut
most of the encoding work and skip to just marshalling up the status.
One of the things it skips is calling op_release. This could cause a
memory leak in the layoutget codepath if there is an error at an
inopportune time.
Have the compound processing engine always call op_release, even when
op_func sets an error in op->status. With this change, we also need
nfsd4_block_get_device_info_scsi to set the gd_device pointer to NULL
on error to avoid a double free.
Reported-by: Zhi Li <yieli@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2181403
Fixes: 34b1744c91 ("nfsd4: define ->op_release for compound ops")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
OPDESC() simply indexes into nfsd4_ops[] by the op's operation
number, without range checking that value. It assumes callers are
careful to avoid calling it with an out-of-bounds opnum value.
nfsd4_decode_compound() is not so careful, and can invoke OPDESC()
with opnum set to OP_ILLEGAL, which is 10044 -- well beyond the end
of nfsd4_ops[].
Reported-by: Jeff Layton <jlayton@kernel.org>
Fixes: f4f9ef4a1b ("nfsd4: opdesc will be useful outside nfs4proc.c")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Pull NFS client fixes from Anna Schumaker:
- Fix shutdown of NFS TCP client sockets
- Fix hangs when recovering open state after a server reboot
* tag 'nfs-for-6.3-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: fix shutdown of NFS TCP client socket
NFSv4: Fix hangs when recovering open state after a server reboot
Pull x86 platform driver fixes from Hans de Goede:
- Fix a regression in ideapad-laptop which caused the touchpad to stop
working after a suspend/resume on some models
- One other small fix and three hw-id additions
* tag 'platform-drivers-x86-v6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: ideapad-laptop: Stop sending KEY_TOUCHPAD_TOGGLE
platform/x86: asus-nb-wmi: Add quirk_asus_tablet_mode to other ROG Flow X13 models
platform/x86: gigabyte-wmi: add support for X570S AORUS ELITE
platform/x86: gigabyte-wmi: add support for B650 AORUS ELITE AX
platform/x86/intel/pmc: Alder Lake PCH slp_s0_residency fix
Pull PCI fix from Bjorn Helgaas:
- Fix DesignWare PORT_LINK_CONTROL setup, which was corrupted when the
DT "snps,enable-cdm-check" property was present (Yoshihiro Shimoda)
* tag 'pci-v6.3-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: dwc: Fix PORT_LINK_CONTROL update when CDM check enabled
Pull regulator fix from Mark Brown:
"Deferred probe fix for v6.3.
This fixes a rarely triggered issue where we would treat probe
deferral for clocks as a fatal error in the fixed regulator, causing
it to fail to retry when it should"
* tag 'regulator-fix-v6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: Handle deferred clk
ASoC: Fixes for v6.3
More fixes for v6.3, plus a few new trivial device ID additions.
Almost all of this is for the Intel drivers, though there is one
core fix from Shengjiu which ensures that format constraints are
correctly applied in some cases where they were missed.
Pull block fixes from Jens Axboe:
- NVMe pull request via Christoph:
- Mark Lexar NM760 as IGNORE_DEV_SUBNQN (Juraj Pecigos)
- Fix a possible UAF when failing to allocate an TCP io queue (Sagi
Grimberg)
- MD pull request via Song:
- Fix a null pointer deference in 6.3-rc (Yu Kuai)
- uevent partition fix (Alyssa)
* tag 'block-6.3-2023-03-30' of git://git.kernel.dk/linux:
nvme-tcp: fix a possible UAF when failing to allocate an io queue
md: fix regression for null-ptr-deference in __md_stop()
nvme-pci: mark Lexar NM760 as IGNORE_DEV_SUBNQN
loop: LOOP_CONFIGURE: send uevents for partitions
Pull io_uring fixes from Jens Axboe:
- Fix a regression with the poll retry, introduced in this merge window
(me)
- Fix a regression with the alloc cache not decrementing the member
count on removal. Also a regression from this merge window (Pavel)
- Fix race around rsrc node grabbing (Pavel)
* tag 'io_uring-6.3-2023-03-30' of git://git.kernel.dk/linux:
io_uring: fix poll/netmsg alloc caches
io_uring/rsrc: fix rogue rsrc node grabbing
io_uring/poll: clear single/double poll flags on poll arming
Commit 5829f8a897 ("platform/x86: ideapad-laptop: Send
KEY_TOUCHPAD_TOGGLE on some models") made ideapad-laptop send
KEY_TOUCHPAD_TOGGLE when we receive an ACPI notify with VPC event bit 5 set
and the touchpad-state has not been changed by the EC itself already.
This was done under the assumption that this would be good to do to make
the touchpad-toggle hotkey work on newer models where the EC does not
toggle the touchpad on/off itself (because it is not routed through
the PS/2 controller, but uses I2C).
But it turns out that at least some models, e.g. the Yoga 7-15ITL5 the EC
triggers an ACPI notify with VPC event bit 5 set on resume, which would
now cause a spurious KEY_TOUCHPAD_TOGGLE on resume to which the desktop
environment responds by disabling the touchpad in software, breaking
the touchpad (until manually re-enabled) on resume.
It was never confirmed that sending KEY_TOUCHPAD_TOGGLE actually improves
things on new models and at least some new models like the Yoga 7-15ITL5
don't have a touchpad on/off toggle hotkey at all, while still sending
ACPI notify events with VPC event bit 5 set.
So it seems best to revert the change to send KEY_TOUCHPAD_TOGGLE when
receiving an ACPI notify events with VPC event bit 5 and the touchpad
state as reported by the EC has not changed.
Note this is not a full revert the code to cache the last EC touchpad
state is kept to avoid sending spurious KEY_TOUCHPAD_ON / _OFF events
on resume.
Fixes: 5829f8a897 ("platform/x86: ideapad-laptop: Send KEY_TOUCHPAD_TOGGLE on some models")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217234
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230330194644.64628-1-hdegoede@redhat.com
Pull thermal control fixes from Rafael Wysocki:
"These remove two recently added excessive lockdep assertions from the
sysfs-related thermal code and fix two issues in Intel thermal
drivers.
Specifics:
- Drop two lockdep assertions producing false positive warnings from
the sysfs-related thermal core code (Rafael Wysocki)
- Fix handling of two recently added module parameters in the Intel
powerclamp thermal driver (David Arcari)
- Fix one more deadlock in the int340x thermal driver (Srinivas
Pandruvada)"
* tag 'thermal-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel: powerclamp: Fix cpumask and max_idle module parameters
thermal: intel: int340x: processor_thermal: Fix additional deadlock
thermal: core: Drop excessive lockdep_assert_held() calls
Pull ACPI fix from Rafael Wysocki:
"Fix a recent regression related to the handling of ACPI notifications
that made it more likely for ACPI driver callbacks to be invoked in an
unexpected order and NULL pointers can be dereferenced as a result or
similar.
The fix is to modify the global ACPI notification handler so it does
not invoke driver callbacks at all and allow the device-level
notification handlers to receive "system" notifications (for the
drivers that want to receive them)"
* tag 'acpi-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: bus: Rework system-level device notification handling
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for FPU probing in XIP kernels
- Always enable the alternative framework for non-XIP kernels
* tag 'riscv-for-linus-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: always select RISCV_ALTERNATIVE for non-xip kernels
RISC-V: add non-alternative fallback for riscv_has_extension_[un]likely()
Pull MIPS fix from Thomas Bogendoerfer:
"Fix to avoid crash on BCM6358 platforms"
* tag 'mips-fixes_6.3_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
mips: bmips: BCM6358: disable RAC flush for TP1
When introduced, IRQFD resampling worked on POWER8 with XICS. However
KVM on POWER9 has never implemented it - the compatibility mode code
("XICS-on-XIVE") misses the kvm_notify_acked_irq() call and the native
XIVE mode does not handle INTx in KVM at all.
This moved the capability support advertising to platforms and stops
advertising it on XIVE, i.e. POWER9 and later.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Anup Patel <anup@brainfault.org>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20220504074807.3616813-1-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When compiling selftests with target mount_setattr I encountered some errors with the below messages:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors might be because of linux/mount.h is not included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc@gmail.com>
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
After commit 92cadedd9d ("brcmfmac: Avoid keeping power to SDIO card
unless WOWL is used"), the wifi adapter by default is turned off on suspend
and then re-probed on resume.
In at least 2 model x86/acpi tablets with brcmfmac43430a1 wifi adapters,
the newly added re-probe on resume fails like this:
brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
ieee80211 phy1: brcmf_bus_started: failed: -110
ieee80211 phy1: brcmf_attach: dongle is not responding: err=-110
brcmfmac: brcmf_sdio_firmware_callback: brcmf_attach failed
It seems this specific brcmfmac model does not like being reprobed without
it actually being turned off first.
And the adapter is not being turned off during suspend because of
commit f0992ace68 ("brcmfmac: prohibit ACPI power management for brcmfmac
driver").
Now that the driver is being reprobed on resume, the disabling of ACPI
pm is no longer necessary, except when WOWL is used (in which case there
is no-reprobe).
Move the dis-/en-abling of ACPI pm to brcmf_sdio_wowl_config(), this fixes
the brcmfmac43430a1 suspend/resume regression and should help save some
power when suspended.
This change means that the code now also may re-enable ACPI pm when WOWL
gets disabled. ACPI pm should only be re-enabled if it was enabled by
the ACPI core originally. Add a brcmf_sdiod_acpi_save_power_manageable()
to save the original state for this.
This has been tested on the following devices:
Asus T100TA brcmfmac43241b4-sdio
Acer Iconia One 7 B1-750 brcmfmac43340-sdio
Chuwi Hi8 brcmfmac43430a0-sdio
Chuwi Hi8 brcmfmac43430a1-sdio
(the Asus T100TA is the device for which the prohibiting of ACPI pm
was originally added)
Fixes: 92cadedd9d ("brcmfmac: Avoid keeping power to SDIO card unless WOWL is used")
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230320122252.240070-1-hdegoede@redhat.com
In terminate_all we should queue up all submitted descriptors to be
freed. We do that for the content of the 'issued' and 'submitted' lists,
but the 'current_tx' descriptor falls through the cracks as it's
removed from the 'issued' list once it gets assigned to be the current
descriptor. Explicitly queue up freeing of the 'current_tx' descriptor
to address a memory leak that is otherwise present.
Fixes: b127315d9a ("dmaengine: apple-admac: Add Apple ADMAC driver")
Signed-off-by: Martin Povišer <povik+lin@cutebit.org>
Link: https://lore.kernel.org/r/20230224152222.26732-2-povik+lin@cutebit.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
In addition to TX channel and RX channel interrupt flags there's
another class of 'global' interrupt flags with unknown semantics. Those
weren't being handled up to now, and they are the suspected cause of
stuck IRQ states that have been sporadically occurring. Check the global
flags and clear them if raised.
Fixes: b127315d9a ("dmaengine: apple-admac: Add Apple ADMAC driver")
Signed-off-by: Martin Povišer <povik+lin@cutebit.org>
Link: https://lore.kernel.org/r/20230224152222.26732-1-povik+lin@cutebit.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
This adds conversion of VMCI specific error code to general -ENOMEM. It
is needed, because af_vsock.c passes error value returned from transport
to the user, which does not expect to get VMCI_ERROR_* values.
Fixes: c43170b7e1 ("vsock: return errors other than -ENOMEM to socket")
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The type of MAX_SKB_FRAGS has changed recently, so the debug printk
needs to be updated:
drivers/net/ethernet/ti/netcp_core.c: In function 'netcp_create_interface':
drivers/net/ethernet/ti/netcp_core.c:2084:30: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'int' [-Werror=format=]
2084 | dev_err(dev, "tx-pool size too small, must be at least %ld\n",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fixes: 3948b05950 ("net: introduce a config option to tweak MAX_SKB_FRAGS")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 65b32f801b ("uapi: move IPPROTO_L2TP to in.h") moved the
definition of IPPROTO_L2TP from a define to an enum, but since
__stringify doesn't work properly with enums, we ended up breaking the
modalias strings for the l2tp modules:
$ modinfo l2tp_ip l2tp_ip6 | grep alias
alias: net-pf-2-proto-IPPROTO_L2TP
alias: net-pf-2-proto-2-type-IPPROTO_L2TP
alias: net-pf-10-proto-IPPROTO_L2TP
alias: net-pf-10-proto-2-type-IPPROTO_L2TP
Use the resolved number directly in MODULE_ALIAS_*() macros (as we
already do with SOCK_DGRAM) to fix the alias strings:
$ modinfo l2tp_ip l2tp_ip6 | grep alias
alias: net-pf-2-proto-115
alias: net-pf-2-proto-115-type-2
alias: net-pf-10-proto-115
alias: net-pf-10-proto-115-type-2
Moreover, fix the ordering of the parameters passed to
MODULE_ALIAS_NET_PF_PROTO_TYPE() by switching proto and type.
Fixes: 65b32f801b ("uapi: move IPPROTO_L2TP to in.h")
Link: https://lore.kernel.org/lkml/ZCQt7hmodtUaBlCP@righiandr-XPS-13-7390
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Sit Wei Hong says:
====================
Fix PHY handle no longer parsing
After the fixed link support was introduced, it is observed that PHY
no longer attach to the MAC properly. So we introduce a helper
function to determine if the MAC should expect to connect to a PHY
and proceed accordingly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, intel_speed_mode_2500() will fix-up xpcs_an_inband
to 1 if the underlying controller has a max speed of 1000Mbps.
The value has been initialized and modified if it is
a fixed-linked setup earlier.
This patch removes the fix-up to allow for fixed-linked setup
support. In stmmac_phy_setup(), ovr_an_inband is set based on
the value of xpcs_an_inband. Which in turn will return an
error in phylink_parse_mode() where MLO_AN_FIXED and
ovr_an_inband are both set.
Fixes: c82386310d ("stmmac: intel: prepare to support 1000BASE-X phy interface setting")
Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the introduction of the fixed-link support, the MAC driver
no longer attempt to scan for a PHY to attach to. This causes the
non fixed-link setups to stop working.
Using the phylink_expects_phy() to check and determine if the MAC
should expect and attach a PHY.
Fixes: ab21cf9209 ("net: stmmac: make mdio register skips PHY scanning for fixed-link")
Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Signed-off-by: Lai Peter Jun Ann <peter.jun.ann.lai@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide phylink_expects_phy() to allow MAC drivers to check if it
is expecting a PHY to attach to. Since fixed-linked setups do not
need to attach to a PHY.
Provides a boolean value as to if the MAC should expect a PHY.
Returns true if a PHY is expected.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A warning can be triggered when hotplug CPU 0.
$ echo 0 > /sys/devices/system/cpu/cpu0/online
------------[ cut here ]------------
Voluntary context switch within RCU read-side critical section!
WARNING: CPU: 0 PID: 19 at kernel/rcu/tree_plugin.h:318
rcu_note_context_switch+0x4f4/0x580
RIP: 0010:rcu_note_context_switch+0x4f4/0x580
Call Trace:
<TASK>
? perf_event_update_userpage+0x104/0x150
__schedule+0x8d/0x960
? perf_event_set_state.part.82+0x11/0x50
schedule+0x44/0xb0
schedule_timeout+0x226/0x310
? __perf_event_disable+0x64/0x1a0
? _raw_spin_unlock+0x14/0x30
wait_for_completion+0x94/0x130
__wait_rcu_gp+0x108/0x130
synchronize_rcu+0x67/0x70
? invoke_rcu_core+0xb0/0xb0
? __bpf_trace_rcu_stall_warning+0x10/0x10
perf_pmu_migrate_context+0x121/0x370
iommu_pmu_cpu_offline+0x6a/0xa0
? iommu_pmu_del+0x1e0/0x1e0
cpuhp_invoke_callback+0x129/0x510
cpuhp_thread_fun+0x94/0x150
smpboot_thread_fn+0x183/0x220
? sort_range+0x20/0x20
kthread+0xe6/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
---[ end trace 0000000000000000 ]---
The synchronize_rcu() will be invoked in the perf_pmu_migrate_context(),
when migrating a PMU to a new CPU. However, the current for_each_iommu()
is within RCU read-side critical section.
Two methods were considered to fix the issue.
- Use the dmar_global_lock to replace the RCU read lock when going
through the drhd list. But it triggers a lockdep warning.
- Use the cpuhp_setup_state_multi() to set up a dedicated state for each
IOMMU PMU. The lock can be avoided.
The latter method is implemented in this patch. Since each IOMMU PMU has
a dedicated state, add cpuhp_node and cpu in struct iommu_pmu to track
the state. The state can be dynamically allocated now. Remove the
CPUHP_AP_PERF_X86_IOMMU_PERF_ONLINE.
Fixes: 46284c6ceb ("iommu/vt-d: Support cpumask for IOMMU perfmon")
Reported-by: Ammy Yi <ammy.yi@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230328182028.1366416-1-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230329134721.469447-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac
("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
is used to protect DMAR related global data from DMAR hotplug operations.
Using dmar_global_lock in intel_irq_remapping_alloc() is unnecessary as
the DMAR global data structures are not touched there. Remove it to avoid
below lockdep warning.
======================================================
WARNING: possible circular locking dependency detected
6.3.0-rc2 #468 Not tainted
------------------------------------------------------
swapper/0/1 is trying to acquire lock:
ff1db4cb40178698 (&domain->mutex){+.+.}-{3:3},
at: __irq_domain_alloc_irqs+0x3b/0xa0
but task is already holding lock:
ffffffffa0c1cdf0 (dmar_global_lock){++++}-{3:3},
at: intel_iommu_init+0x58e/0x880
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (dmar_global_lock){++++}-{3:3}:
lock_acquire+0xd6/0x320
down_read+0x42/0x180
intel_irq_remapping_alloc+0xad/0x750
mp_irqdomain_alloc+0xb8/0x2b0
irq_domain_alloc_irqs_locked+0x12f/0x2d0
__irq_domain_alloc_irqs+0x56/0xa0
alloc_isa_irq_from_domain.isra.7+0xa0/0xe0
mp_map_pin_to_irq+0x1dc/0x330
setup_IO_APIC+0x128/0x210
apic_intr_mode_init+0x67/0x110
x86_late_time_init+0x24/0x40
start_kernel+0x41e/0x7e0
secondary_startup_64_no_verify+0xe0/0xeb
-> #0 (&domain->mutex){+.+.}-{3:3}:
check_prevs_add+0x160/0xef0
__lock_acquire+0x147d/0x1950
lock_acquire+0xd6/0x320
__mutex_lock+0x9c/0xfc0
__irq_domain_alloc_irqs+0x3b/0xa0
dmar_alloc_hwirq+0x9e/0x120
iommu_pmu_register+0x11d/0x200
intel_iommu_init+0x5de/0x880
pci_iommu_init+0x12/0x40
do_one_initcall+0x65/0x350
kernel_init_freeable+0x3ca/0x610
kernel_init+0x1a/0x140
ret_from_fork+0x29/0x50
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(dmar_global_lock);
lock(&domain->mutex);
lock(dmar_global_lock);
lock(&domain->mutex);
*** DEADLOCK ***
Fixes: 9dbb8e3452 ("irqdomain: Switch to per-domain locking")
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Tested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20230314051836.23817-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20230329134721.469447-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This patch sets the skb owner in the recv and send path for virtio.
For the send path, this solves the leak caused when
virtio_transport_purge_skbs() finds skb->sk is always NULL and therefore
never matches it with the current socket. Setting the owner upon
allocation fixes this.
For the recv path, this ensures correctness of accounting and also
correct transfer of ownership in vsock_loopback (when skbs are sent from
one socket and received by another).
Fixes: 71dc9ec9ac ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://lore.kernel.org/all/ZCCbATwov4U+GBUv@pop-os.localdomain/
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan writes:
USB-serial fixes for 6.3-rc5
Here are some new device ids for 6.3.
All have been in linux-next with no reported issues.
* tag 'usb-serial-6.3-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add Quectel RM500U-CN modem
USB: serial: option: add Telit FE990 compositions
USB: serial: cp210x: add Silicon Labs IFS-USB-DATACABLE IDs
Johannes Berg says:
====================
Just a few fixes:
* fix size calculation for EHT element to put into SKBs
* remove erroneous pre-RCU calls for drivers not using sta_state calls
* fix mesh forwarding and non-forwarding RX
* fix mesh flow dissection
* fix a potential NULL dereference on A-MSDU RX w/o station
* make two variable non-static that really shouldn't be static
* tag 'wireless-2023-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: fix invalid drv_sta_pre_rcu_remove calls for non-uploaded sta
wifi: mac80211: fix flow dissection for forwarded packets
wifi: mac80211: fix mesh forwarding
wifi: mac80211: fix receiving mesh packets in forwarding=0 networks
wifi: mac80211: fix the size calculation of ieee80211_ie_len_eht_cap()
wifi: mac80211: fix potential null pointer dereference
wifi: mac80211: drop bogus static keywords in A-MSDU rx
====================
Link: https://lore.kernel.org/r/20230330203313.919164-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is neither a driver that parses this nor a DT binding schema that
documents it, so let's remove from the DTS files that make use of this.
The properties that exist are post-pwm-on-delay-ms and pwm-off-delay-ms,
defined in the pwm-backlight DT binding. If the delays are really needed
then those properties should be used instead.
Brian Norris mentioned though that looking at the first downstream usage
of the pwm-delay-us property for RK3399 Gru systems in ChromiumOS tree,
he couldn't find a spec reference that said that this was really needed.
So perhaps it was unnecessary added and a simple removal would be enough.
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Link: https://lore.kernel.org/r/20230330231924.2404747-1-javierm@redhat.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Pull dma-mapping fixes from Christoph Hellwig:
- fix for swiotlb deadlock due to wrong alignment checks (GuoRui.Yu,
Petr Tesarik)
* tag 'dma-mapping-6.3-2023-03-31' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: fix slot alignment checks
swiotlb: use wrap_area_index() instead of open-coding it
swiotlb: fix the deadlock in swiotlb_do_find_slots
The SMB2_IOCTL check in the switch statement will never be true as we
return earlier from smb2_reconnect() if @smb2_command == SMB2_IOCTL.
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
We can't call smb_init() in CIFSGetDFSRefer() as cifs_reconnect_tcon()
may end up calling CIFSGetDFSRefer() again to get new DFS referrals
and thus causing an infinite recursion.
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Cc: stable@vger.kernel.org # 6.2
Signed-off-by: Steve French <stfrench@microsoft.com>
Prevent multiple threads of doing negotiate, session setup and tree
connect by holding @ses->session_mutex in cifs_reconnect_tcon() while
reconnecting session and tcon.
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull SCSI fixes from James Bottomley:
"Four small fixes, three in drivers. The core fix is yet another
attempt to insulate us from UFS devices' weird behaviour for VPD
pages"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: mpt3sas: Don't print sense pool info twice
scsi: core: Improve scsi_vpd_inquiry() checks
scsi: megaraid_sas: Fix crash after a double completion
scsi: megaraid_sas: Fix fw_crash_buffer_show()
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 6.3
- mark Lexar NM760 as IGNORE_DEV_SUBNQN (Juraj Pecigos)
- fix a possible UAF when failing to allocate an TCP io queue
(Sagi Grimberg)"
* tag 'nvme-6.3-2023-03-31' of git://git.infradead.org/nvme:
nvme-tcp: fix a possible UAF when failing to allocate an io queue
nvme-pci: mark Lexar NM760 as IGNORE_DEV_SUBNQN
When compiled with CONFIG_CIFS_DFS_UPCALL disabled, cifs_dfs_d_automount
is NULL. cifs.ko logic for mapping CIFS_FATTR_DFS_REFERRAL attributes to
S_AUTOMOUNT and corresponding dentry flags is retained regardless of
CONFIG_CIFS_DFS_UPCALL, leading to a NULL pointer dereference in
VFS follow_automount() when traversing a DFS referral link:
BUG: kernel NULL pointer dereference, address: 0000000000000000
...
Call Trace:
<TASK>
__traverse_mounts+0xb5/0x220
? cifs_revalidate_mapping+0x65/0xc0 [cifs]
step_into+0x195/0x610
? lookup_fast+0xe2/0xf0
path_lookupat+0x64/0x140
filename_lookup+0xc2/0x140
? __create_object+0x299/0x380
? kmem_cache_alloc+0x119/0x220
? user_path_at_empty+0x31/0x50
user_path_at_empty+0x31/0x50
__x64_sys_chdir+0x2a/0xd0
? exit_to_user_mode_prepare+0xca/0x100
do_syscall_64+0x42/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
This fix adds an inline cifs_dfs_d_automount() {return -EREMOTE} handler
when CONFIG_CIFS_DFS_UPCALL is disabled. An alternative would be to
avoid flagging S_AUTOMOUNT, etc. without CONFIG_CIFS_DFS_UPCALL. This
approach was chosen as it provides more control over the error path.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Cc: stable@vger.kernel.org
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull networking fixes from Jakub Kicinski:
"Including fixes from CAN and WPAN.
Still quite a few bugs from this release. This pull is a bit smaller
because major subtrees went into the previous one. Or maybe people
took spring break off?
Current release - regressions:
- phy: micrel: correct KSZ9131RNX EEE capabilities and advertisement
Current release - new code bugs:
- eth: wangxun: fix vector length of interrupt cause
- vsock/loopback: consistently protect the packet queue with
sk_buff_head.lock
- virtio/vsock: fix header length on skb merging
- wpan: ca8210: fix unsigned mac_len comparison with zero
Previous releases - regressions:
- eth: stmmac: don't reject VLANs when IFF_PROMISC is set
- eth: smsc911x: avoid PHY being resumed when interface is not up
- eth: mtk_eth_soc: fix tx throughput regression with direct 1G links
- eth: bnx2x: use the right build_skb() helper after core rework
- wwan: iosm: fix 7560 modem crash on use on unsupported channel
Previous releases - always broken:
- eth: sfc: don't overwrite offload features at NIC reset
- eth: r8169: fix RTL8168H and RTL8107E rx crc error
- can: j1939: prevent deadlock by moving j1939_sk_errqueue()
- virt: vmxnet3: use GRO callback when UPT is enabled
- virt: xen: don't do grant copy across page boundary
- phy: dp83869: fix default value for tx-/rx-internal-delay
- dsa: ksz8: fix multiple issues with ksz8_fdb_dump
- eth: mvpp2: fix classification/RSS of VLAN and fragmented packets
- eth: mtk_eth_soc: fix flow block refcounting logic
Misc:
- constify fwnode pointers in SFP handling"
* tag 'net-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits)
net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow
net: ethernet: mtk_eth_soc: fix L2 offloading with DSA untag offload
net: ethernet: mtk_eth_soc: fix flow block refcounting logic
net: mvneta: fix potential double-frees in mvneta_txq_sw_deinit()
net: dsa: sync unicast and multicast addresses for VLAN filters too
net: dsa: mv88e6xxx: Enable IGMP snooping on user ports only
xen/netback: use same error messages for same errors
test/vsock: new skbuff appending test
virtio/vsock: WARN_ONCE() for invalid state of socket
virtio/vsock: fix header length on skb merging
bnxt_en: Add missing 200G link speed reporting
bnxt_en: Fix typo in PCI id to device description string mapping
bnxt_en: Fix reporting of test result in ethtool selftest
i40e: fix registers dump after run ethtool adapter self test
bnx2x: use the right build_skb() helper
net: ipa: compute DMA pool size properly
net: wwan: iosm: fixes 7560 modem crash
net: ethernet: mtk_eth_soc: fix tx throughput regression with direct 1G links
ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()
ice: add profile conflict check for AVF FDIR
...
Pull device mapper fixes from Mike Snitzer:
- Fix two DM core bugs in the code that handles splitting "abnormal" IO
(discards, write same and secure erase) and issuing that IO to the
correct underlying devices (and offsets within those devices).
* tag 'for-6.3/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: fix __send_duplicate_bios() to always allow for splitting IO
dm: fix improper splitting for abnormal bios
Pull drm fixes from Daniel Vetter:
"Two regression fixes in here, otherwise just the usual stuff:
- i915 fixes for color mgmt, psr, lmem flush, hibernate oops, and
more
- amdgpu: dp mst and hibernate regression fix
- etnaviv: revert fdinfo support (incl drm/sched revert), leak fix
- misc ivpu fixes, nouveau backlight, drm buddy allocator 32bit
fixes"
* tag 'drm-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm: (27 commits)
Revert "drm/scheduler: track GPU active time per entity"
Revert "drm/etnaviv: export client GPU usage statistics via fdinfo"
drm/etnaviv: fix reference leak when mmaping imported buffer
drm/amdgpu: allow more APUs to do mode2 reset when go to S4
drm/amd/display: Take FEC Overhead into Timeslot Calculation
drm/amd/display: Add DSC Support for Synaptics Cascaded MST Hub
drm: test: Fix 32-bit issue in drm_buddy_test
drm: buddy_allocator: Fix buddy allocator init on 32-bit systems
drm/nouveau/kms: Fix backlight registration
drm/i915/perf: Drop wakeref on GuC RC error
drm/i915/dpt: Treat the DPT BO as a framebuffer
drm/i915/gem: Flush lmem contents after construction
drm/i915/tc: Fix the ICL PHY ownership check in TC-cold state
drm/i915: Disable DC states for all commits
drm/i915: Workaround ICL CSC_MODE sticky arming
drm/i915: Add a .color_post_update() hook
drm/i915: Move CSC load back into .color_commit_arm() when PSR is enabled on skl/glk
drm/i915: Split icl_color_commit_noarm() from skl_color_commit_noarm()
drm/i915/pmu: Use functions common with sysfs to read actual freq
accel/ivpu: Fix IPC buffer header status field value
...
Commit 7dd76d1fee ("dm: improve bio splitting and associated IO
accounting") only called setup_split_accounting() from
__send_duplicate_bios() if a single bio were being issued. But the case
where duplicate bios are issued must call it too.
Otherwise the bio won't be split and resubmitted (via recursion through
block core back to DM) to submit the later portions of a bio (which may
map to an entirely different target).
For example, when discarding an entire DM striped device with the
following DM table:
vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048
Before (broken, discards the first striped target's devices twice):
device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
device-mapper: striped: target_stripe=0, bdev=7:0, start=2049 len=22528
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=22528
After (works as expected):
device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
device-mapper: striped: target_stripe=0, bdev=7:2, start=2048 len=22528
device-mapper: striped: target_stripe=1, bdev=7:3, start=2048 len=22528
Fixes: 7dd76d1fee ("dm: improve bio splitting and associated IO accounting")
Cc: stable@vger.kernel.org
Reported-by: Orange Kao <orange@aiven.io>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
"Abnormal" bios include discards, write zeroes and secure erase. By no
longer passing the calculated 'len' pointer, commit 7dd06a2548 ("dm:
allow dm_accept_partial_bio() for dm_io without duplicate bios") took a
senseless approach to disallowing dm_accept_partial_bio() from working
for duplicate bios processed using __send_duplicate_bios().
It inadvertently and incorrectly stopped the use of 'len' when
initializing a target's io (in alloc_tio). As such the resulting tio
could address more area of a device than it should.
For example, when discarding an entire DM striped device with the
following DM table:
vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048
Before this fix:
device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=102400
blkdiscard: attempt to access beyond end of device
loop0: rw=2051, sector=2048, nr_sectors = 102400 limit=81920
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=102400
blkdiscard: attempt to access beyond end of device
loop1: rw=2051, sector=2048, nr_sectors = 102400 limit=81920
After this fix;
device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
Fixes: 7dd06a2548 ("dm: allow dm_accept_partial_bio() for dm_io without duplicate bios")
Cc: stable@vger.kernel.org
Reported-by: Orange Kao <orange@aiven.io>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reported on the Turris forum, mvneta provokes kernel warnings in the
architecture DMA mapping code when mvneta_setup_txqs() fails to
allocate memory. This happens because when mvneta_cleanup_txqs() is
called in the mvneta_stop() path, we leave pointers in the structure
that have been freed.
Then on mvneta_open(), we call mvneta_setup_txqs(), which starts
allocating memory. On memory allocation failure, mvneta_cleanup_txqs()
will walk all the queues freeing any non-NULL pointers - which includes
pointers that were previously freed in mvneta_stop().
Fix this by setting these pointers to NULL to prevent double-freeing
of the same memory.
Fixes: 2adb719d74 ("net: mvneta: Implement software TSO")
Link: https://forum.turris.cz/t/random-kernel-exceptions-on-hbl-tos-7-0/18865/8
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1phUe5-00EieL-7q@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If certain conditions are met, DSA can install all necessary MAC
addresses on the CPU ports as FDB entries and disable flooding towards
the CPU (we call this RX filtering).
There is one corner case where this does not work.
ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
ip link set swp0 master br0 && ip link set swp0 up
ip link add link swp0 name swp0.100 type vlan id 100
ip link set swp0.100 up && ip addr add 192.168.100.1/24 dev swp0.100
Traffic through swp0.100 is broken, because the bridge turns on VLAN
filtering in the swp0 port (causing RX packets to be classified to the
FDB database corresponding to the VID from their 802.1Q header), and
although the 8021q module does call dev_uc_add() towards the real
device, that API is VLAN-unaware, so it only contains the MAC address,
not the VID; and DSA's current implementation of ndo_set_rx_mode() is
only for VID 0 (corresponding to FDB entries which are installed in an
FDB database which is only hit when the port is VLAN-unaware).
It's interesting to understand why the bridge does not turn on
IFF_PROMISC for its swp0 bridge port, and it may appear at first glance
that this is a regression caused by the logic in commit 2796d0c648
("bridge: Automatically manage port promiscuous mode."). After all,
a bridge port needs to have IFF_PROMISC by its very nature - it needs to
receive and forward frames with a MAC DA different from the bridge
ports' MAC addresses.
While that may be true, when the bridge is VLAN-aware *and* it has a
single port, there is no real reason to enable promiscuity even if that
is an automatic port, with flooding and learning (there is nowhere for
packets to go except to the BR_FDB_LOCAL entries), and this is how the
corner case appears. Adding a second automatic interface to the bridge
would make swp0 promisc as well, and would mask the corner case.
Given the dev_uc_add() / ndo_set_rx_mode() API is what it is (it doesn't
pass a VLAN ID), the only way to address that problem is to install host
FDB entries for the cartesian product of RX filtering MAC addresses and
VLAN RX filters.
Fixes: 7569459a52 ("net: dsa: manage flooding on the CPU ports")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230329151821.745752-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Do not set the MV88E6XXX_PORT_CTL0_IGMP_MLD_SNOOP bit on CPU or DSA ports.
This allows the host CPU port to be a regular IGMP listener by sending out
IGMP Membership Reports, which would otherwise not be forwarded by the
mv88exxx chip, but directly looped back to the CPU port itself.
Fixes: 54d792f257 ("net: dsa: Centralise global and port setup code into mv88e6xxx.")
Signed-off-by: Steffen Bätz <steffen@innosonix.de>
Signed-off-by: Fabio Estevam <festevam@denx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230329150140.701559-1-festevam@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When cpumask is specified as a module parameter the value is
overwritten by the module init routine. This can easily be fixed
by checking to see if the mask has already been allocated in the
init routine.
When max_idle is specified as a module parameter a panic will occur.
The problem is that the idle_injection_cpu_mask is not allocated until
the module init routine executes. This can easily be fixed by allocating
the cpumask if it's not already allocated.
Fixes: ebf5197102 ("thermal: intel: powerclamp: Add two module parameters")
Signed-off-by: David Arcari <darcari@redhat.com>
Reviewed-by: Srinivas Pandruvada<srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
drm/i915 fixes for v6.3-rc5:
- Fix PMU support by reusing functions with sysfs
- Fix a number of issues related to color, PSR and arm/noarm
- Fix state check related to ICL PHY ownership check in TC-cold state
- Flush lmem contents after construction
- Fix hibernate oops related to DPT BO
- Fix perf stream error path wakeref balance
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87355m4gtm.fsf@intel.com
Pull sound fixes from Takashi Iwai:
"A collection of small fixes:
- A potential deadlock fix for USB-audio, involving some change in
PCM core side
- A regression fix for probes of USB-audio devices with the
vendor-specific PCM format bits
- Two regression fixes for the old YMFPCI driver
- A few HD-audio quirks as usual"
* tag 'sound-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Add quirk for Lenovo ZhaoYang CF4620Z
ALSA: ymfpci: Fix BUG_ON in probe function
ALSA: ymfpci: Create card with device-managed snd_devm_card_new()
ALSA: usb-audio: Fix regression on detection of Roland VS-100
ALSA: hda/realtek: Fix support for Dell Precision 3260
ALSA: usb-audio: Fix recursive locking at XRUN during syncing
ALSA: hda/conexant: Partial revert of a quirk for Lenovo
ALSA: hda/realtek: Add quirks for some Clevo laptops
Pull zonefs fixes from Damien Le Moal:
- Make sure to always invalidate the last page of an inode straddling
inode->i_size to avoid data inconsistencies with appended data when
the device zone write granularity does not match the page size.
- Do not propagate iomap -ENOBLK error to userspace and use -EBUSY
instead.
* tag 'zonefs-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: Do not propagate iomap_dio_rw() ENOTBLK error to user space
zonefs: Always invalidate last cached page on append write
This reverts commit df622729dd as it introduces a use-after-free,
which isn't easy to fix without going back to the design drawing board.
Reported-by: Danilo Krummrich <dakr@redhat.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This reverts commit 97804a133c, as it builds on top of df622729dd
("drm/scheduler: track GPU active time per entity") which needs to be
reverted, as it introduces a use-after-free.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
drm_gem_prime_mmap() takes a reference on the GEM object, but before that
drm_gem_mmap_obj() already takes a reference, which will be leaked as only
one reference is dropped when the mapping is closed. Drop the extra
reference when dma_buf_mmap() succeeds.
Cc: stable@vger.kernel.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
We increase cache->nr_cached when we free into the cache but don't
decrease when we take from it, so in some time we'll get an empty
cache with cache->nr_cached larger than IO_ALLOC_CACHE_MAX, that fails
io_alloc_cache_put() and effectively disables caching.
Fixes: 9b797a37c4 ("io_uring: add abstraction around apoll cache")
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The call to invalidate_inode_pages2_range() in __iomap_dio_rw() may
fail, in which case -ENOTBLK is returned and this error code is
propagated back to user space trhough iomap_dio_rw() ->
zonefs_file_dio_write() return chain. This error code is fairly obscure
and may confuse the user. Avoid this and be consistent with the behavior
of zonefs_file_dio_append() for similar invalidate_inode_pages2_range()
errors by returning -EBUSY to user space when iomap_dio_rw() returns
-ENOTBLK.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Fixes: 8dcc1a9d90 ("fs: New zonefs file system")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
When a direct append write is executed, the append offset may correspond
to the last page of a sequential file inode which might have been cached
already by buffered reads, page faults with mmap-read or non-direct
readahead. To ensure that the on-disk and cached data is consistant for
such last cached page, make sure to always invalidate it in
zonefs_file_dio_append(). If the invalidation fails, return -EBUSY to
userspace to differentiate from IO errors.
This invalidation will always be a no-op when the FS block size (device
zone write granularity) is equal to the page size (e.g. 4K).
Reported-by: Hans Holmberg <Hans.Holmberg@wdc.com>
Fixes: 02ef12a663 ("zonefs: use REQ_OP_ZONE_APPEND for sync DIO")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
rx->sta->amsdu_mesh_control is being passed to ieee80211_amsdu_to_8023s
without checking rx->sta. Since it doesn't make sense to accept A-MSDU
packets without a sta, simply add a check earlier.
Fixes: 6e4c0d0460 ("wifi: mac80211: add a workaround for receiving non-standard mesh A-MSDU")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230330090001.60750-2-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
ACPI 6.3 introduced the online capable bit, and also introduced MADT
version 5.
Latter was used to distinguish whether the offset storing online capable
could be used. However ACPI 6.2b has MADT version "45" which is for
an errata version of the ACPI 6.2 spec. This means that the Linux code
for detecting availability of MADT will mistakenly flag ACPI 6.2b as
supporting online capable which is inaccurate as it's an ACPI 6.3 feature.
Instead use the FADT major and minor revision fields to distinguish this.
[ bp: Massage. ]
Fixes: aa06e20f1b ("x86/ACPI: Don't add CPUs that are not online capable")
Reported-by: Eric DeVolder <eric.devolder@oracle.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/943d2445-84df-d939-f578-5d8240d342cc@unsolicited.net
Arseniy Krasnov says:
====================
fix header length on skb merging
this patchset fixes appending newly arrived skbuff to the last skbuff of
the socket's queue during rx path. Problem fires when we are trying to
append data to skbuff which was already processed in dequeue callback
at least once. Dequeue callback calls function 'skb_pull()' which changes
'skb->len'. In current implementation 'skb->len' is used to update length
in header of last skbuff after new data was copied to it. This is bug,
because value in header is used to calculate 'rx_bytes'/'fwd_cnt' and
thus must be constant during skbuff lifetime. Here is example, we have
two skbuffs: skb0 with length 10 and skb1 with length 4.
1) skb0 arrives, hdr->len == skb->len == 10, rx_bytes == 10
2) Read 3 bytes from skb0, skb->len == 7, hdr->len == 10, rx_bytes == 10
3) skb1 arrives, hdr->len == skb->len == 4, rx_bytes == 14
4) Append skb1 to skb0, skb0 now has skb->len == 11, hdr->len == 11.
But value of 11 in header is invalid.
5) Read whole skb0, update rx_bytes by 11 from skb0's header.
6) At this moment rx_bytes == 3, but socket's queue is empty.
This bug starts to fire since:
commit
0777061657 ("virtio/vsock: don't use skbuff state to account credit")
In fact, it presents before, but didn't triggered due to a little bit
buggy implementation of credit calculation logic. So i'll use Fixes tag
for it.
I really forgot about this branch in rx path when implemented patch
0777061657.
This patchset contains 3 patches:
1) Fix itself.
2) Patch with WARN_ONCE() to catch such problems in future.
3) Patch with test which triggers skb appending logic. It looks like
simple test with several 'send()' and 'recv()', but it checks, that
skbuff appending works ok.
====================
Link: https://lore.kernel.org/r/0683cc6e-5130-484c-1105-ef2eb792d355@sberdevices.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This adds test which checks case when data of newly received skbuff is
appended to the last skbuff in the socket's queue. It looks like simple
test with 'send()' and 'recv()', but internally it triggers logic which
appends one received skbuff to another. Test checks that this feature
works correctly.
This test is actual only for virtio transport.
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This adds WARN_ONCE() and return from stream dequeue callback when
socket's queue is empty, but 'rx_bytes' still non-zero. This allows
the detection of potential bugs due to packet merging (see previous
patch).
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This fixes appending newly arrived skbuff to the last skbuff of the
socket's queue. Problem fires when we are trying to append data to skbuff
which was already processed in dequeue callback at least once. Dequeue
callback calls function 'skb_pull()' which changes 'skb->len'. In current
implementation 'skb->len' is used to update length in header of the last
skbuff after new data was copied to it. This is bug, because value in
header is used to calculate 'rx_bytes'/'fwd_cnt' and thus must be not
be changed during skbuff's lifetime.
Bug starts to fire since:
commit 0777061657
("virtio/vsock: don't use skbuff state to account credit")
It presents before, but didn't triggered due to a little bit buggy
implementation of credit calculation logic. So use Fixes tag for it.
Fixes: 0777061657 ("virtio/vsock: don't use skbuff state to account credit")
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Like the other calls in this function virt_to_page() expects
a pointer, not an integer.
However since many architectures implement virt_to_pfn() as
a macro, this function becomes polymorphic and accepts both a
(unsigned long) and a (void *).
Fix this up with an explicit cast.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Currently, with VHE, KVM enables the EL0 event counting for the
guest on vcpu_load() or KVM enables it as a part of the PMU
register emulation process, when needed. However, in the migration
case (with VHE), the same handling is lacking, as vPMU register
values that were restored by userspace haven't been propagated yet
(the PMU events haven't been created) at the vcpu load-time on the
first KVM_RUN (kvm_vcpu_pmu_restore_guest() called from vcpu_load()
on the first KVM_RUN won't do anything as events_{guest,host} of
kvm_pmu_events are still zero).
So, with VHE, enable the guest's EL0 event counting on the first
KVM_RUN (after the migration) when needed. More specifically,
have kvm_pmu_handle_pmcr() call kvm_vcpu_pmu_restore_guest()
so that kvm_pmu_handle_pmcr() on the first KVM_RUN can take
care of it.
Fixes: d0c94c4979 ("KVM: arm64: Restore PMU configuration on first run")
Cc: stable@vger.kernel.org
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Link: https://lore.kernel.org/r/20230329023944.2488484-1-reijiw@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Michael Chan says:
====================
bnxt_en: 3 Bug fixes
This series contains 3 small bug fixes covering ethtool self test, PCI
ID string typos, and some missing 200G link speed ethtool reporting logic.
====================
Link: https://lore.kernel.org/r/20230329013021.5205-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bnxt_fw_to_ethtool_speed() is missing the case statement for 200G
link speed reported by firmware. As a result, ethtool will report
unknown speed when the firmware reports 200G link speed.
Fixes: 532262ba3b ("bnxt_en: ethtool: support PAM4 link speeds up to 200G")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-03-28 (ice)
This series contains updates to ice driver only.
Jesse fixes mismatched header documentation reported when building with
W=1.
Brett restricts setting of VSI context to only applicable fields for the
given ICE_AQ_VSI_PROP_Q_OPT_VALID bit.
Junfeng adds check when adding Flow Director filters that conflict with
existing filter rules.
Jakob Koschel adds interim variable for iterating to prevent possible
misuse after looping.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()
ice: add profile conflict check for AVF FDIR
ice: Fix ice_cfg_rdma_fltr() to only update relevant fields
ice: fix W=1 headers mismatch
====================
Link: https://lore.kernel.org/r/20230328172035.3904953-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stefan Schmidt says:
====================
ieee802154 for net 2023-03-29
Two small fixes this time.
Dongliang Mu removed an unnecessary null pointer check.
Harshit Mogalapalli fixed an int comparison unsigned against signed from a
recent other fix in the ca8210 driver.
* tag 'ieee802154-for-net-2023-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan:
net: ieee802154: remove an unnecessary null pointer check
ca8210: Fix unsigned mac_len comparison with zero in ca8210_skb_tx()
====================
Link: https://lore.kernel.org/r/20230329064541.2147400-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In gsi_trans_pool_init_dma(), the total size of a pool of memory
used for DMA transactions is calculated. However the calculation is
done incorrectly.
For 4KB pages, this total size is currently always more than one
page, and as a result, the calculation produces a positive (though
incorrect) total size. The code still works in this case; we just
end up with fewer DMA pool entries than we intended.
Bjorn Andersson tested booting a kernel with 16KB pages, and hit a
null pointer derereference in sg_alloc_append_table_from_pages(),
descending from gsi_trans_pool_init_dma(). The cause of this was
that a 16KB total size was going to be allocated, and with 16KB
pages the order of that allocation is 0. The total_size calculation
yielded 0, which eventually led to the crash.
Correcting the total_size calculation fixes the problem.
Reported-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Tested-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Fixes: 9dd441e4ed ("soc: qcom: ipa: GSI transactions")
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230328162751.2861791-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When we allocate a nvme-tcp queue, we set the data_ready callback before
we actually need to use it. This creates the potential that if a stray
controller sends us data on the socket before we connect, we can trigger
the io_work and start consuming the socket.
In this case reported: we failed to allocate one of the io queues, and
as we start releasing the queues that we already allocated, we get
a UAF [1] from the io_work which is running before it should really.
Fix this by setting the socket ops callbacks only before we start the
queue, so that we can't accidentally schedule the io_work in the
initialization phase before the queue started. While we are at it,
rename nvme_tcp_restore_sock_calls to pair with nvme_tcp_setup_sock_ops.
[1]:
[16802.107284] nvme nvme4: starting error recovery
[16802.109166] nvme nvme4: Reconnecting in 10 seconds...
[16812.173535] nvme nvme4: failed to connect socket: -111
[16812.173745] nvme nvme4: Failed reconnect attempt 1
[16812.173747] nvme nvme4: Reconnecting in 10 seconds...
[16822.413555] nvme nvme4: failed to connect socket: -111
[16822.413762] nvme nvme4: Failed reconnect attempt 2
[16822.413765] nvme nvme4: Reconnecting in 10 seconds...
[16832.661274] nvme nvme4: creating 32 I/O queues.
[16833.919887] BUG: kernel NULL pointer dereference, address: 0000000000000088
[16833.920068] nvme nvme4: Failed reconnect attempt 3
[16833.920094] #PF: supervisor write access in kernel mode
[16833.920261] nvme nvme4: Reconnecting in 10 seconds...
[16833.920368] #PF: error_code(0x0002) - not-present page
[16833.921086] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
[16833.921191] RIP: 0010:_raw_spin_lock_bh+0x17/0x30
...
[16833.923138] Call Trace:
[16833.923271] <TASK>
[16833.923402] lock_sock_nested+0x1e/0x50
[16833.923545] nvme_tcp_try_recv+0x40/0xa0 [nvme_tcp]
[16833.923685] nvme_tcp_io_work+0x68/0xa0 [nvme_tcp]
[16833.923824] process_one_work+0x1e8/0x390
[16833.923969] worker_thread+0x53/0x3d0
[16833.924104] ? process_one_work+0x390/0x390
[16833.924240] kthread+0x124/0x150
[16833.924376] ? set_kthread_struct+0x50/0x50
[16833.924518] ret_from_fork+0x1f/0x30
[16833.924655] </TASK>
Reported-by: Yanjun Zhang <zhangyanjun@cestc.cn>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Yanjun Zhang <zhangyanjun@cestc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Following process will make ubi attaching failed since commit
1b42b1a36f ("ubi: ensure that VID header offset ... size"):
ID="0xec,0xa1,0x00,0x15" # 128M 128KB 2KB
modprobe nandsim id_bytes=$ID
flash_eraseall /dev/mtd0
modprobe ubi mtd="0,2048" # set vid_hdr offset as 2048 (one page)
(dmesg):
ubi0 error: ubi_attach_mtd_dev [ubi]: VID header offset 2048 too large.
UBI error: cannot attach mtd0
UBI error: cannot initialize UBI, error -22
Rework original solution, the key point is making sure
'vid_hdr_shift + UBI_VID_HDR_SIZE < ubi->vid_hdr_alsize',
so we should check vid_hdr_shift rather not vid_hdr_offset.
Then, ubi still support (sub)page aligined VID header offset.
Fixes: 1b42b1a36f ("ubi: ensure that VID header offset ... size")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Tested-by: Nicolas Schichan <nschichan@freebox.fr>
Tested-by: Miquel Raynal <miquel.raynal@bootlin.com> # v5.10, v4.19
Signed-off-by: Richard Weinberger <richard@nod.at>
8b/10b encoding needs to add 3% fec overhead into the pbn.
In the Synapcis Cascaded MST hub, the first stage MST branch device
needs the information to determine the timeslot count for the
second stage MST branch device. Missing this overhead will leads to
insufficient timeslot allocation.
Cc: stable@vger.kernel.org
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Conor Dooley <conor.dooley@microchip.com> says:
Here's my attempt at fixing both the use of an FPU on XIP kernels and
the issue that Jason ran into where CONFIG_FPU, which needs the
alternatives frame work for has_fpu() checks, could be enabled without
the alternatives actually being present.
For the former, a "slow" fallback that does not use alternatives is
added to riscv_has_extension_[un]likely() that can be used with XIP.
Obviously, we want to make use of Jisheng's alternatives based approach
where possible, so any users of riscv_has_extension_[un]likely() will
want to make sure that they select RISCV_ALTERNATIVE.
If they don't however, they'll hit the fallback path which (should,
sparing a silly mistake from me!) behave in the same way, thus
succeeding silently. Sounds like a
To prevent "depends on !XIP_KERNEL; select RISCV_ALTERNATIVE" spreading
like the plague through the various places that want to check for the
presence of extensions, and sidestep the potential silent "success"
mentioned above, all users RISCV_ALTERNATIVE are converted from selects
to dependencies, with the option being selected for all !XIP_KERNEL
builds.
I know that the VDSO was a key place that Jisheng wanted to use the new
helper rather than static branches, and I think the fallback path
should not cause issues there.
See the thread at [1] for the prior discussion.
1 - https://lore.kernel.org/linux-riscv/20230128172856.3814-1-jszhang@kernel.org/T/#m21390d570997145d31dd8bb95002fd61f99c6573
[Palmer: merging in the fixes as a branch as there's some features that
depend on it.]
* b4-shazam-merge:
RISC-V: always select RISCV_ALTERNATIVE for non-xip kernels
RISC-V: add non-alternative fallback for riscv_has_extension_[un]likely()
Link: https://lore.kernel.org/r/20230324100538.3514663-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Update I2C timing registers based on latest hardware design.
This fix does not break functionality of chips with older design and
existing users will not be affected.
Fixes: 3616936972 ("i2c: microchip: pci1xxxx: Add driver for I2C host controller in multifunction endpoint of pci1xxxx switch")
Signed-off-by: Tharun Kumar P <tharunkumar.pasumarthi@microchip.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
When moving switch_to's has_fpu() over to using
riscv_has_extension_likely() rather than static branches, the FPU code
gained a dependency on the alternatives framework.
That dependency has now been removed, as riscv_has_extension_ikely() now
contains a fallback path, using __riscv_isa_extension_available(), but
if CONFIG_RISCV_ALTERNATIVE isn't selected when CONFIG_FPU is, has_fpu()
checks will not benefit from the "fast path" that the alternatives
framework provides.
We want to ensure that alternatives are available whenever
riscv_has_extension_[un]likely() is used, rather than silently falling
back to the slow path, but rather than rely on selecting
RISCV_ALTERNATIVE in the myriad of locations that may use
riscv_has_extension_[un]likely(), select it (almost) always instead by
adding it to the main RISCV config entry.
xip kernels cannot make use of the alternatives framework, so it is not
enabled for those configurations, although this is the status quo.
All current sites that select RISCV_ALTERNATIVE are converted to
dependencies on the option instead. The explicit dependencies on
!XIP_KERNEL can be dropped, as RISCV_ALTERNATIVE is not user selectable.
Fixes: 702e64550b ("riscv: fpu: switch has_fpu() to riscv_has_extension_likely()")
Link: https://lore.kernel.org/all/ZBruFRwt3rUVngPu@zx2c4.com/
Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20230324100538.3514663-3-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The has_fpu() check, which in turn calls riscv_has_extension_likely(),
relies on alternatives to figure out whether the system has an FPU.
As a result, it will malfunction on XIP kernels, as they do not support
the alternatives mechanism.
When alternatives support is not present, fall back to using
__riscv_isa_extension_available() in riscv_has_extension_[un]likely()
instead stead, which handily takes the same argument, so that kernels
that do not support alternatives can accurately report the presence of
FPU support.
Fixes: 702e64550b ("riscv: fpu: switch has_fpu() to riscv_has_extension_likely()")
Link: https://lore.kernel.org/all/ad445951-3d13-4644-94d9-e0989cda39c3@spud/
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20230324100538.3514663-2-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
It was found that commit 7a2127e66a ("cpuset: Call
set_cpus_allowed_ptr() with appropriate mask for task") introduced a bug
that corrupted "cpuset.cpus" of a partition root when it was updated.
It is because the tmp->new_cpus field of the passed tmp parameter
of update_parent_subparts_cpumask() should not be used at all as
it contains important cpumask data that should not be overwritten.
Fix it by using tmp->addmask instead.
Also update update_cpumask() to make sure that trialcs->cpu_allowed
will not be corrupted until it is no longer needed.
Fixes: 7a2127e66a ("cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task")
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: stable@vger.kernel.org # v6.2+
Signed-off-by: Tejun Heo <tj@kernel.org>
Commit 52f04f10b9 ("thermal: intel: int340x: processor_thermal: Fix
deadlock") addressed deadlock issue during user space trip update. But it
missed a case when thermal zone device is disabled when user writes 0.
Call to thermal_zone_device_disable() also causes deadlock as it also
tries to lock tz->lock, which is already claimed by trip_point_temp_store()
in the thermal core code.
Remove call to thermal_zone_device_disable() in the function
sys_set_trip_temp(), which is called from trip_point_temp_store().
Fixes: 52f04f10b9 ("thermal: intel: int340x: processor_thermal: Fix deadlock")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 6.2+ <stable@vger.kernel.org> # 6.2+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 3e45352259 ("md: Free resources in __md_stop") tried to fix
null-ptr-deference for 'active_io' by moving percpu_ref_exit() to
__md_stop(), however, the commit also moving 'writes_pending' to
__md_stop(), and this will cause mdadm tests broken:
BUG: kernel NULL pointer dereference, address: 0000000000000038
Oops: 0000 [#1] PREEMPT SMP
CPU: 15 PID: 17830 Comm: mdadm Not tainted 6.3.0-rc3-next-20230324-00009-g520d37
RIP: 0010:free_percpu+0x465/0x670
Call Trace:
<TASK>
__percpu_ref_exit+0x48/0x70
percpu_ref_exit+0x1a/0x90
__md_stop+0xe9/0x170
do_md_stop+0x1e1/0x7b0
md_ioctl+0x90c/0x1aa0
blkdev_ioctl+0x19b/0x400
vfs_ioctl+0x20/0x50
__x64_sys_ioctl+0xba/0xe0
do_syscall_64+0x6c/0xe0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
And the problem can be reporduced 100% by following test:
mdadm -CR /dev/md0 -l1 -n1 /dev/sda --force
echo inactive > /sys/block/md0/md/array_state
echo read-auto > /sys/block/md0/md/array_state
echo inactive > /sys/block/md0/md/array_state
Root cause:
// start raid
raid1_run
mddev_init_writes_pending
percpu_ref_init
// inactive raid
array_state_store
do_md_stop
__md_stop
percpu_ref_exit
// start raid again
array_state_store
do_md_run
raid1_run
mddev_init_writes_pending
if (mddev->writes_pending.percpu_count_ptr)
// won't reinit
// inactive raid again
...
percpu_ref_exit
-> null-ptr-deference
Before the commit, 'writes_pending' is exited when mddev is freed, and
it's safe to restart raid because mddev_init_writes_pending() already make
sure that 'writes_pending' will only be initialized once.
Fix the prblem by moving 'writes_pending' back, it's a litter hard to find
the relationship between alloc memory and free memory, however, code
changes is much less and we lived with this for a long time already.
Fixes: 3e45352259 ("md: Free resources in __md_stop")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230328094400.1448955-1-yukuai1@huaweicloud.com
Pull xtensa fixes from Max Filippov:
- fix KASAN report in show_stack
- drop linux-xtensa mailing list from the MAINTAINERS file
* tag 'xtensa-20230327' of https://github.com/jcmvbkbc/linux-xtensa:
MAINTAINERS: xtensa: drop linux-xtensa@linux-xtensa.org mailing list
xtensa: fix KASAN report for show_stack
Pull f2fs fix from Jaegeuk Kim:
"This fixes a tracepoint field size in f2fs in preparation for stricter
rules for tracing fields"
* tag 'f2fs-fix-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: Fix f2fs_truncate_partial_nodes ftrace event
The drm_buddy_test KUnit tests verify that returned blocks have sizes
which are powers of two using is_power_of_2(). However, is_power_of_2()
operations on a 'long', but the block size is a u64. So on systems where
long is 32-bit, this can sometimes fail even on correctly sized blocks.
This only reproduces randomly, as the parameters passed to the buddy
allocator in this test are random. The seed 0xb2e06022 reproduced it
fine here.
For now, just hardcode an is_power_of_2() implementation using
x & (x - 1).
Signed-off-by: David Gow <davidgow@google.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Arunpravin Paneer Selvam <arunpravin.paneerselvam@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230329065532.2122295-2-davidgow@google.com
Signed-off-by: Christian König <christian.koenig@amd.com>
The hypervisor supports user-mode NX from Power10.
pseries_vas_dlpar_cpu() is called from lparcfg_write() to update VAS
windows for DLPAR event in shared processor mode and the kernel gets
-ENOTSUPP for HCALLs if the user-mode NX is not supported. The current
VAS implementation also supports only with Radix page tables. Whereas in
dedicated processor mode, pseries_vas_notifier() is registered only if
the copy/paste feature is enabled. So instead of displaying HCALL error
messages, update VAS capabilities if the copy/paste feature is
available.
This patch ignores updating VAS capabilities in pseries_vas_dlpar_cpu()
and returns success if the copy/paste feature is not enabled. Then
lparcfg_write() completes the processor DLPAR operations without any
failures.
Fixes: 2147783d6b ("powerpc/pseries: Use lparcfg to reconfig VAS windows for DLPAR CPU")
Cc: stable@vger.kernel.org # v6.1+
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/1d0e727e7dbd9a28627ef08ca9df9c86a50175e2.camel@linux.ibm.com
After entering 6.3-rc1 the LLC cacheinfo is not exported on our ACPI
based arm64 server. This is because the LLC cacheinfo is partly reset
when secondary CPUs boot up. On arm64 the primary cpu will allocate
and setup cacheinfo:
init_cpu_topology()
for_each_possible_cpu()
fetch_cache_info() // Allocate cacheinfo and init levels
detect_cache_attributes()
cache_shared_cpu_map_setup()
if (!last_level_cache_is_valid()) // not valid, setup LLC
cache_setup_properties() // setup LLC
On secondary CPU boot up:
detect_cache_attributes()
populate_cache_leaves()
get_cache_type() // Get cache type from clidr_el1,
// for LLC type=CACHE_TYPE_NOCACHE
cache_shared_cpu_map_setup()
if (!last_level_cache_is_valid()) // Valid and won't go to this branch,
// leave LLC's type=CACHE_TYPE_NOCACHE
The last_level_cache_is_valid() use cacheinfo->{attributes, fw_token} to
test it's valid or not, but populate_cache_leaves() will only reset
LLC's type, so we won't try to re-setup LLC's type and leave it
CACHE_TYPE_NOCACHE and won't export it through sysfs.
This patch tries to fix this by not re-populating the cache leaves if
the LLC is valid.
Fixes: 5944ce092b ("arch_topology: Build cacheinfo from primary CPU")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20230328114915.33340-1-yangyicong@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The nouveau code used to call drm_fb_helper_initial_config() from
nouveau_fbcon_init() before calling drm_dev_register(). This would
probe all connectors so that drm_connector->status could be used during
backlight registration which runs from nouveau_connector_late_register().
After commit 4a16dd9d18 ("drm/nouveau/kms: switch to drm fbdev helpers")
the fbdev emulation code, which now is a drm-client, can only run after
drm_dev_register(). So during backlight registration the connectors are
not probed yet and the drm_connector->status == connected check in
nv50_backlight_init() would now always fail.
Replace the drm_connector->status == connected check with
a drm_helper_probe_detect() == connected check to fix nv_backlight
no longer getting registered because of this.
Fixes: 4a16dd9d18 ("drm/nouveau/kms: switch to drm fbdev helpers")
Link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/202
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2181941
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230326205433.36485-1-hdegoede@redhat.com
According to LPUART RM, Transmission Complete Flag becomes 0 if queuing
a break character by writing 1 to CTRL[SBK], so here need to avoid
checking for transmission complete when UARTCTRL_SBK is asserted,
otherwise the lpuart32_tx_empty may never get TIOCSER_TEMT.
Commit 2411fd94ceaa("tty: serial: fsl_lpuart: skip waiting for
transmission complete when UARTCTRL_SBK is asserted") only fix it in
lpuart32_set_termios(), here also fix it in lpuart32_tx_empty().
Fixes: 380c966c09 ("tty: serial: fsl_lpuart: add 32-bit register interface support")
Cc: stable <stable@kernel.org>
Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Link: https://lore.kernel.org/r/20230323054415.20363-1-sherry.sun@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hans de Goede reported Bluetooth adapters (HCIs) connected over an UART
connection failed due corrupted Rx payload. The problem was narrowed
down to DMA Rx starting on UART_IIR_THRI interrupt. The problem occurs
despite LSR having DR bit set, which is precondition for attempting to
start DMA Rx in the first place.
From a debug patch:
[x.807834] 8250irq: iir=cc lsr+saved=60 received=0/15 ier=0f dma_t/rx/err=0/0/0
[x.808676] 8250irq: iir=c2 lsr+saved=61 received=0/0 ier=0f dma_t/rx/err=0/0/0
[x.808776] 8250irq: iir=cc lsr+saved=60 received=1/12 ier=0d dma_t/rx/err=0/1/0
[x.808870] Bluetooth: hci0: Frame reassembly failed (-84)
In the debug snippet, received field indicates 1 byte was transferred
over DMA and 12 bytes after that with the non-DMA Rx. The sole byte DMA
handled was corrupted (gets zeroed) which leads to the HCI failure.
This problem became apparent after commit e8ffbb71f7 ("serial: 8250:
use THRE & __stop_tx also with DMA") changed Tx stop behavior. Tx stop
is now triggered from a THRI interrupt.
Despite that this problem looks like a HW bug, this fix is not adding
UART_BUG_xx flag to the driver beucase it seems useful in general to
avoid starting DMA when there are only a few bytes to transfer.
Skipping DMA for small transfers avoids the extra overhead DMA incurs.
Thus, don't setup DMA Rx on UART_IIR_THRI but leave it to a subsequent
interrupt which has Rx a related IIR value.
By returning false from handle_rx_dma(), the DMA vs non-DMA decision is
postponed until either UART_IIR_RDI (FIFO threshold worth of bytes
awaiting) or UART_IIR_TIMEOUT (inter-character timeout) triggers at a
later time which allows better to discern whether the number of bytes
warrants starting DMA or not.
Reported-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Fixes: e8ffbb71f7 ("serial: 8250: use THRE & __stop_tx also with DMA")
Cc: stable@vger.kernel.org
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230317103034.12881-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we set the dual-role port to Host mode, we observed the following
splat:
[ 167.057718] BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:229
[ 167.057872] Workqueue: events tegra_xusb_usb_phy_work
[ 167.057954] Call trace:
[ 167.057962] dump_backtrace+0x0/0x210
[ 167.057996] show_stack+0x30/0x50
[ 167.058020] dump_stack_lvl+0x64/0x84
[ 167.058065] dump_stack+0x14/0x34
[ 167.058100] __might_resched+0x144/0x180
[ 167.058140] __might_sleep+0x64/0xd0
[ 167.058171] slab_pre_alloc_hook.constprop.0+0xa8/0x110
[ 167.058202] __kmalloc_track_caller+0x74/0x2b0
[ 167.058233] kvasprintf+0xa4/0x190
[ 167.058261] kasprintf+0x58/0x90
[ 167.058285] tegra_xusb_find_port_node.isra.0+0x58/0xd0
[ 167.058334] tegra_xusb_find_port+0x38/0xa0
[ 167.058380] tegra_xusb_padctl_get_usb3_companion+0x38/0xd0
[ 167.058430] tegra_xhci_id_notify+0x8c/0x1e0
[ 167.058473] notifier_call_chain+0x88/0x100
[ 167.058506] atomic_notifier_call_chain+0x44/0x70
[ 167.058537] tegra_xusb_usb_phy_work+0x60/0xd0
[ 167.058581] process_one_work+0x1dc/0x4c0
[ 167.058618] worker_thread+0x54/0x410
[ 167.058650] kthread+0x188/0x1b0
[ 167.058672] ret_from_fork+0x10/0x20
The function tegra_xusb_padctl_get_usb3_companion eventually calls
tegra_xusb_find_port and this in turn calls kasprintf which might sleep
and so cannot be called from an atomic context.
Fix this by moving the call to tegra_xusb_padctl_get_usb3_companion to
the tegra_xhci_id_work function where it is really needed.
Fixes: f836e78430 ("usb: xhci-tegra: Add OTG support")
Cc: stable@vger.kernel.org
Signed-off-by: Wayne Chang <waynec@nvidia.com>
Signed-off-by: Haotien Hsu <haotienh@nvidia.com>
Link: https://lore.kernel.org/r/20230327095548.1599470-1-haotienh@nvidia.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During fiemap, when walking backreferences to determine if a b+tree
node/leaf is shared, we may find a tree block (leaf or node) for which
two parents were added to the references ulist. This happens if we get
for example one direct ref (shared tree block ref) and one indirect ref
(non-shared tree block ref) for the tree block at the current level,
which can happen during relocation.
In that case the fiemap path cache can not be used since it's meant for
a single path, with one tree block at each possible level, so having
multiple references for a tree block at any level may result in getting
the level counter exceed BTRFS_MAX_LEVEL and eventually trigger the
warning:
WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL)
at lookup_backref_shared_cache() and at store_backref_shared_cache().
This is harmless since the code ignores any level >= BTRFS_MAX_LEVEL, the
warning is there just to catch any unexpected case like the one described
above. However if a user finds this it may be scary and get reported.
So just ignore the path cache once we find a tree block for which there
are more than one reference, which is the less common case, and update
the cache with the sharedness check result for all levels below the level
for which we found multiple references.
Reported-by: Jarno Pelkonen <jarno.pelkonen@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAKv8qLmDNAGJGCtsevxx_VZ_YOvvs1L83iEJkTgyA4joJertng@mail.gmail.com/
Fixes: 12a824dc67 ("btrfs: speedup checking for extent sharedness during fiemap")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In an dedupe comparison iter loop, the length of iomap_iter decreases
because it implies the remaining length after each iteration.
The dedupe command will fail with -EIO if the range is larger than one
page size and not aligned to the page size. Also report warning in dmesg:
[ 4338.498374] ------------[ cut here ]------------
[ 4338.498689] WARNING: CPU: 3 PID: 1415645 at fs/iomap/iter.c:16
...
The compare function should use the min length of the current iters,
not the total length.
Link: https://lkml.kernel.org/r/1679469958-2-1-git-send-email-ruansy.fnst@fujitsu.com
Fixes: 0e79e3736d ("fsdax: dedupe: iter two files at the same time")
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull RCU fix from Paul McKenney:
"This brings the rcu_torture_read event trace into line with the new
trace tools by replacing this event trace's __field() with the
corresponding __array().
Without this, the new trace tools will fail when presented wtih an
rcu_torture_read event trace, which is a regression from the viewpoint
of trace tools users"
Link: https://lore.kernel.org/all/20230320133650.5388a05e@gandalf.local.home/
* tag 'urgent-rcu.2023.03.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
rcu: Fix rcu_torture_read ftrace event
Pull Kselftest fixes from Shuah Khan:
"One single fix for sigaltstack test -Wuninitialized warning found when
building with clang"
* tag 'linux-kselftest-fixes-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: sigaltstack: fix -Wuninitialized
The lockdep_assert_held() calls added to cooling_device_stats_setup()
and cooling_device_stats_destroy() by commit 790930f442 ("thermal:
core: Introduce thermal_cooling_device_update()") trigger false-positive
lockdep reports in code paths that are not subject to race conditions
(before cooling device registration and after cooling device removal).
For this reason, remove the lockdep_assert_held() calls from both
cooling_device_stats_setup() and cooling_device_stats_destroy() and
add one to thermal_cooling_device_stats_reinit() that has to be called
under the cdev lock.
Fixes: 790930f442 ("thermal: core: Introduce thermal_cooling_device_update()")
Link: https://lore.kernel.org/linux-acpi/ZCIDTLFt27Ei7+V6@ideak-desk.fi.intel.com
Reported-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull s390 fixes from Vasily Gorbik:
- Fix an error handling issue with PTRACE_GET_LAST_BREAK request so
that -EFAULT is returned if put_user() fails, instead of ignoring it
- Fix a build race for the modules_prepare target when
CONFIG_EXPOLINE_EXTERN is enabled by reintroducing the dependence on
scripts
- Fix a memory leak in vfio_ap device driver
- Add missing earlyclobber annotations to __clear_user() inline
assembly to prevent incorrect register allocation
* tag 's390-6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/ptrace: fix PTRACE_GET_LAST_BREAK error handling
s390: reintroduce expoline dependence to scripts
s390/vfio-ap: fix memory leak in vfio_ap device driver
s390/uaccess: add missing earlyclobber annotations to __clear_user()
The code implicitly assumes that the list iterator finds a correct
handle. If 'vsi_handle' is not found the 'old_agg_vsi_info' was
pointing to an bogus memory location. For safety a separate list
iterator variable should be used to make the != NULL check on
'old_agg_vsi_info' correct under any circumstances.
Additionally Linus proposed to avoid any use of the list iterator
variable after the loop, in the attempt to move the list iterator
variable declaration into the macro to avoid any potential misuse after
the loop. Using it in a pointer comparison after the loop is undefined
behavior and should be omitted if possible [1].
Fixes: 37c592062b ("ice: remove the VSI info from previous agg")
Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1]
Signed-off-by: Jakob Koschel <jkl820.git@gmail.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add profile conflict check while adding some FDIR rules to avoid
unexpected flow behavior, rules may have conflict including:
IPv4 <---> {IPv4_UDP, IPv4_TCP, IPv4_SCTP}
IPv6 <---> {IPv6_UDP, IPv6_TCP, IPv6_SCTP}
For example, when we create an FDIR rule for IPv4, this rule will work
on packets including IPv4, IPv4_UDP, IPv4_TCP and IPv4_SCTP. But if we
then create an FDIR rule for IPv4_UDP and then destroy it, the first
FDIR rule for IPv4 cannot work on pkt IPv4_UDP then.
To prevent this unexpected behavior, we add restriction in software
when creating FDIR rules by adding necessary profile conflict check.
Fixes: 1f7ea1cd6a ("ice: Enable FDIR Configure for AVF")
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The current implementation causes ice_vsi_update() to update all VSI
fields based on the cached VSI context. This also assumes that the
ICE_AQ_VSI_PROP_Q_OPT_VALID bit is set. This can cause problems if the
VSI context is not correctly synced by the driver. Fix this by only
updating the fields that correspond to ICE_AQ_VSI_PROP_Q_OPT_VALID.
Also, make sure to save the updated result in the cached VSI context
on success.
Fixes: 348048e724 ("ice: Implement iidc operations")
Co-developed-by: Robert Malz <robertx.malz@intel.com>
Signed-off-by: Robert Malz <robertx.malz@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Tested-by: Jakub Andrysiak <jakub.andrysiak@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
make modules W=1 returns:
.../ice/ice_txrx_lib.c:448: warning: Function parameter or member 'first_idx' not described in 'ice_finalize_xdp_rx'
.../ice/ice_txrx.c:948: warning: Function parameter or member 'ntc' not described in 'ice_get_rx_buf'
.../ice/ice_txrx.c:1038: warning: Excess function parameter 'rx_buf' description in 'ice_construct_skb'
Fix these warnings by adding and deleting the deviant arguments.
Fixes: 2fba7dc515 ("ice: Add support for XDP multi-buffer on Rx side")
Fixes: d7956d81f1 ("ice: Pull out next_to_clean bump out of ice_put_rx_buf()")
CC: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There are some subtle differences between release_device() and
set_platform_dma_ops() callbacks, so separate those two callbacks. Device
links should be removed only in release_device(), because they were
created in probe_device() on purpose and they are needed for proper
Exynos IOMMU driver operation. While fixing this, remove the conditional
code as it is not really needed.
Reported-by: Jason Gunthorpe <jgg@ziepe.ca>
Fixes: 189d496b48 ("iommu/exynos: Add missing set_platform_dma_ops callback")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org>
Link: https://lore.kernel.org/r/20230315232514.1046589-1-m.szyprowski@samsung.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This fixes a similar problem to the one observed in:
commit 4e5a04be88 ("pinctrl: amd: disable and mask interrupts on probe").
On some systems, during suspend/resume cycle firmware leaves
an interrupt enabled on a pin that is not used by the kernel.
This confuses the AMD pinctrl driver and causes spurious interrupts.
The driver already has logic to detect if a pin is used by the kernel.
Leverage it to re-initialize interrupt fields of a pin only if it's not
used by us.
Cc: stable@vger.kernel.org
Fixes: dbad75dd1f ("pinctrl: add AMD GPIO driver support.")
Signed-off-by: Kornel Dulęba <korneld@chromium.org>
Link: https://lore.kernel.org/r/20230320093259.845178-1-korneld@chromium.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Unless we have at least one entry queued, then don't call into
io_poll_remove_entries(). Normally this isn't possible, but if we
retry poll then we can have ->nr_entries cleared again as we're
setting it up. If this happens for a poll retry, then we'll still have
at least REQ_F_SINGLE_POLL set. io_poll_remove_entries() then thinks
it has entries to remove.
Clear REQ_F_SINGLE_POLL and REQ_F_DOUBLE_POLL unconditionally when
arming a poll request.
Fixes: c16bda3759 ("io_uring/poll: allow some retries for poll triggering spuriously")
Cc: stable@vger.kernel.org
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Juergen Gross says:
====================
xen/netback: fix issue introduced recently
The fix for XSA-423 introduced a bug which resulted in loss of network
connection in some configurations.
The first patch is fixing the issue, while the second one is removing
a test which isn't needed.
====================
Link: https://lore.kernel.org/r/20230327083646.18690-1-jgross@suse.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The tests for the number of grant mapping or copy operations reaching
the array size of the operations buffer at the end of the main loop in
xenvif_tx_build_gops() isn't needed.
The loop can handle at maximum MAX_PENDING_REQS transfer requests, as
XEN_RING_NR_UNCONSUMED_REQUESTS() is taking unsent responses into
consideration, too.
Remove the tests.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Fix xenvif_get_requests() not to do grant copy operations across local
page boundaries. This requires to double the maximum number of copy
operations per queue, as each copy could now be split into 2.
Make sure that struct xenvif_tx_cb doesn't grow too large.
Cc: stable@vger.kernel.org
Fixes: ad7f402ae4 ("xen/netback: Ensure protocol headers don't fall in the non-linear area")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
During warm reset device->fw_client is set to NULL. If a bus driver is
registered after this NULL setting and before new firmware clients are
enumerated by ISHTP, kernel panic will result in the function
ishtp_cl_bus_match(). This is because of reference to
device->fw_client->props.protocol_name.
ISH firmware after getting successfully loaded, sends a warm reset
notification to remove all clients from the bus and sets
device->fw_client to NULL. Until kernel v5.15, all enabled ISHTP kernel
module drivers were loaded right after any of the first ISHTP device was
registered, regardless of whether it was a matched or an unmatched
device. This resulted in all drivers getting registered much before the
warm reset notification from ISH.
Starting kernel v5.16, this issue got exposed after the change was
introduced to load only bus drivers for the respective matching devices.
In this scenario, cros_ec_ishtp device and cros_ec_ishtp driver are
registered after the warm reset device fw_client NULL setting.
cros_ec_ishtp driver_register() triggers the callback to
ishtp_cl_bus_match() to match ISHTP driver to the device and causes kernel
panic in guid_equal() when dereferencing fw_client NULL pointer to get
protocol_name.
Fixes: f155dfeaa4 ("platform/x86: isthp_eclite: only load for matching devices")
Fixes: facfe0a4fd ("platform/chrome: chros_ec_ishtp: only load for matching devices")
Fixes: 0d0cccc0fd ("HID: intel-ish-hid: hid-client: only load for matching devices")
Fixes: 44e2a58cb8 ("HID: intel-ish-hid: fw-loader: only load for matching devices")
Cc: <stable@vger.kernel.org> # 5.16+
Signed-off-by: Tanu Malhotra <tanu.malhotra@intel.com>
Tested-by: Shaunak Saha <shaunak.saha@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Jonathan writes:
1st set of IIO fixes for 6.3
Usual mixed bag:
- core - output buffers
Fix return of bytes written when only some succeed.
Fix O_NONBLOCK handling to not block.
- adi,ad7791
Fix IRQ type. Not confirmed to have any impact but good to correct it anyway
- adi,adis16400
Missing CONFIG_CRC32
- capella,cm32181
Unregister 2nd I2C client if one is used.
- cio-dac
Fix bitdepth for range check on write.
- linear,ltc2497
Fix a wrong shift of the LSB introduced when switching to be24 handling.
- maxim,max11410
Fix handling of return code in read_poll_timeout()
- qcom,spmi-adc
Fix an accidental change of channel name to include the reg value from OF.
- ti,palmas
Fix a null dereference on remove due to wrong function used to get the
drvdata.
- ti,ads7950
Mark GPIO as can sleep.
* tag 'iio-fixes-for-6.3a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: adc: ti-ads7950: Set `can_sleep` flag for GPIO chip
iio: adc: palmas_gpadc: fix NULL dereference on rmmod
iio: adc: max11410: fix read_poll_timeout() usage
iio: dac: cio-dac: Fix max DAC write value check for 12-bit
iio: light: cm32181: Unregister second I2C client if present
iio: accel: kionix-kx022a: Get the timestamp from the driver's private data in the trigger_handler
iio: adc: ad7791: fix IRQ flags
iio: buffer: make sure O_NONBLOCK is respected
iio: buffer: correctly return bytes written in output buffers
iio: light: vcnl4000: Fix WARN_ON on uninitialized lock
iio: adis16480: select CONFIG_CRC32
drivers: iio: adc: ltc2497: fix LSB shift
iio: adc: qcom-spmi-adc5: Fix the channel name
powerpc sets up PF_KTHREAD and PF_IO_WORKER with a NULL pt_regs, which
from my (arguably very short) checking is not commonly done for other
archs. This is fine, except when PF_IO_WORKER's have been created and
the task does something that causes a coredump to be generated. Then we
get this crash:
Kernel attempted to read user page (160) - exploit attempt? (uid: 1000)
BUG: Kernel NULL pointer dereference on read at 0x00000160
Faulting instruction address: 0xc0000000000c3a60
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=32 NUMA pSeries
Modules linked in: bochs drm_vram_helper drm_kms_helper xts binfmt_misc ecb ctr syscopyarea sysfillrect cbc sysimgblt drm_ttm_helper aes_generic ttm sg libaes evdev joydev virtio_balloon vmx_crypto gf128mul drm dm_mod fuse loop configfs drm_panel_orientation_quirks ip_tables x_tables autofs4 hid_generic usbhid hid xhci_pci xhci_hcd usbcore usb_common sd_mod
CPU: 1 PID: 1982 Comm: ppc-crash Not tainted 6.3.0-rc2+ #88
Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
NIP: c0000000000c3a60 LR: c000000000039944 CTR: c0000000000398e0
REGS: c0000000041833b0 TRAP: 0300 Not tainted (6.3.0-rc2+)
MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 88082828 XER: 200400f8
...
NIP memcpy_power7+0x200/0x7d0
LR ppr_get+0x64/0xb0
Call Trace:
ppr_get+0x40/0xb0 (unreliable)
__regset_get+0x180/0x1f0
regset_get_alloc+0x64/0x90
elf_core_dump+0xb98/0x1b60
do_coredump+0x1c34/0x24a0
get_signal+0x71c/0x1410
do_notify_resume+0x140/0x6f0
interrupt_exit_user_prepare_main+0x29c/0x320
interrupt_exit_user_prepare+0x6c/0xa0
interrupt_return_srr_user+0x8/0x138
Because ppr_get() is trying to copy from a PF_IO_WORKER with a NULL
pt_regs.
Check for a valid pt_regs in both ppc_get/ppr_set, and return an error
if not set. The actual error value doesn't seem to be important here, so
just pick -EINVAL.
Fixes: fa439810cc ("powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[mpe: Trim oops in change log, add Fixes & Cc stable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/d9f63344-fe7c-56ae-b420-4a1a04a2ae4c@kernel.dk
Userspace PROT_NONE ptes set _PAGE_PRIVILEGED, triggering a false
positive debug assertion that __pte_flags_need_flush() is not called
on a kernel mapping.
Detect when it is a userspace PROT_NONE page by checking the required
bits of PAGE_NONE are set, and none of the RWX bits are set.
pte_protnone() is insufficient here because it always returns 0 when
CONFIG_NUMA_BALANCING=n.
Fixes: b11931e9ad ("powerpc/64s: add pte_needs_flush and huge_pmd_needs_flush")
Cc: stable@vger.kernel.org # v6.1+
Reported-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230302225947.81083-1-bgray@linux.ibm.com
Sven Auhagen says:
====================
net: mvpp2: rss fixes
This patch series fixes up some rss problems
in the mvpp2 driver.
The classifier is missing some fragmentation flags,
the parser has the QinQ headers switched and
the PPPoE Layer 4 detecion is not working
correctly.
This is leading to no or bad rss for the default
settings.
====================
Link: https://lore.kernel.org/r/20230325163903.ofefgus43x66as7i@Svens-MacBookPro.local
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In PPPoE add all IPv4 header option length to the parser
and adjust the L3 and L4 offset accordingly.
Currently the L4 match does not work with PPPoE and
all packets are matched as L3 IP4 OPT.
Fixes: 3f518509de ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The mvpp2 parser entry for QinQ has the inner and outer VLAN
in the wrong order.
Fix the problem by swapping them.
Fixes: 3f518509de ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Reviewed-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Just skip the opcode(BPF_ST | BPF_NOSPEC) in the BPF JIT instead of
failing to JIT the entire program, given LoongArch currently has no
couterpart of a speculation barrier instruction. To verify the issue,
use the ltp testcase as shown below.
Also, Wang says:
I can confirm there's currently no speculation barrier equivalent
on LonogArch. (Loongson says there are builtin mitigations for
Spectre-V1 and V2 on their chips, and AFAIK efforts to port the
exploits to mips/LoongArch have all failed a few years ago.)
Without this patch:
$ ./bpf_prog02
[...]
bpf_common.c:123: TBROK: Failed verification: ??? (524)
[...]
Summary:
passed 0
failed 0
broken 1
skipped 0
warnings 0
With this patch:
$ ./bpf_prog02
[...]
Summary:
passed 0
failed 0
broken 0
skipped 0
warnings 0
Fixes: 5dc615520c ("LoongArch: Add BPF JIT support")
Signed-off-by: George Guo <guodongtai@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: WANG Xuerui <git@xen0n.name>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/bpf/20230328071335.2664966-1-guodongtai@kylinos.cn
While reviewing the udp-iter batching patches, noticed the bpf_iter_tcp
calling sock_put() is incorrect. It should call sock_gen_put instead
because bpf_iter_tcp is iterating the ehash table which has the req sk
and tw sk. This patch replaces all sock_put with sock_gen_put in the
bpf_iter_tcp codepath.
Fixes: 04c7820b77 ("bpf: tcp: Bpf iter batching and lock_sock")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230328004232.2134233-1-martin.lau@linux.dev
To determine whether the guest has caused an external interruption loop
upon code 20 (external interrupt) intercepts, the ext_new_psw needs to
be inspected to see whether external interrupts are enabled.
Under non-PV, ext_new_psw can simply be taken from guest lowcore. Under
PV, KVM can only access the encrypted guest lowcore and hence the
ext_new_psw must not be taken from guest lowcore.
handle_external_interrupt() incorrectly did that and hence was not able
to reliably tell whether an external interruption loop is happening or
not. False negatives cause spurious failures of my kvm-unit-test
for extint loops[1] under PV.
Since code 20 is only caused under PV if and only if the guest's
ext_new_psw is enabled for external interrupts, false positive detection
of a external interruption loop can not happen.
Fix this issue by instead looking at the guest PSW in the state
description. Since the PSW swap for external interrupt is done by the
ultravisor before the intercept is caused, this reliably tells whether
the guest is enabled for external interrupts in the ext_new_psw.
Also update the comments to explain better what is happening.
[1] https://lore.kernel.org/kvm/20220812062151.1980937-4-nrb@linux.ibm.com/
Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Fixes: 201ae986ea ("KVM: s390: protvirt: Implement interrupt injection")
Link: https://lore.kernel.org/r/20230213085520.100756-2-nrb@linux.ibm.com
Message-Id: <20230213085520.100756-2-nrb@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Marc Kleine-Budde says:
====================
pull-request: can 2023-03-27
Oleksij Rempel and Hillf Danton contribute a patch for the CAN J1939
protocol that prevents a potential deadlock in j1939_sk_errqueue().
Ivan Orlov fixes an uninit-value in the CAN BCM protocol in the
bcm_tx_setup() function.
* tag 'linux-can-fixes-for-6.3-20230327' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: bcm: bcm_tx_setup(): fix KMSAN uninit-value in vfs_write
can: j1939: prevent deadlock by moving j1939_sk_errqueue()
====================
Link: https://lore.kernel.org/r/20230327124807.1157134-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This fixes mkfs/mount/check failures due to race with systemd-udevd
scan.
During the device scan initiated by systemd-udevd, other user space
EXCL operations such as mkfs, mount, or check may get blocked and result
in a "Device or resource busy" error. This is because the device
scan process opens the device with the EXCL flag in the kernel.
Two reports were received:
- btrfs/179 test case, where the fsck command failed with the -EBUSY
error
- LTP pwritev03 test case, where mkfs.vfs failed with
the -EBUSY error, when mkfs.vfs tried to overwrite old btrfs filesystem
on the device.
In both cases, fsck and mkfs (respectively) were racing with a
systemd-udevd device scan, and systemd-udevd won, resulting in the
-EBUSY error for fsck and mkfs.
Reproducing the problem has been difficult because there is a very
small window during which these userspace threads can race to
acquire the exclusive device open. Even on the system where the problem
was observed, the problem occurrences were anywhere between 10 to 400
iterations and chances of reproducing decreases with debug printk()s.
However, an exclusive device open is unnecessary for the scan process,
as there are no write operations on the device during scan. Furthermore,
during the mount process, the superblock is re-read in the below
function call chain:
btrfs_mount_root
btrfs_open_devices
open_fs_devices
btrfs_open_one_device
btrfs_get_bdev_and_sb
So, to fix this issue, removes the FMODE_EXCL flag from the scan
operation, and add a comment.
The case where mkfs may still write to the device and a scan is running,
the btrfs signature is not written at that time so scan will not
recognize such device.
Reported-by: Sherry Yang <sherry.yang@oracle.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/oe-lkp/202303170839.fdf23068-oliver.sang@intel.com
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
LOOP_CONFIGURE is, as far as I understand it, supposed to be a way to
combine LOOP_SET_FD and LOOP_SET_STATUS64 into a single syscall. When
using LOOP_SET_FD+LOOP_SET_STATUS64, a single uevent would be sent for
each partition found on the loop device after the second ioctl(), but
when using LOOP_CONFIGURE, no such uevent was being sent.
In the old setup, uevents are disabled for LOOP_SET_FD, but not for
LOOP_SET_STATUS64. This makes sense, as it prevents uevents being
sent for a partially configured device during LOOP_SET_FD - they're
only sent at the end of LOOP_SET_STATUS64. But for LOOP_CONFIGURE,
uevents were disabled for the entire operation, so that final
notification was never issued. To fix this, reduce the critical
section to exclude the loop_reread_partitions() call, which causes
the uevents to be issued, to after uevents are re-enabled, matching
the behaviour of the LOOP_SET_FD+LOOP_SET_STATUS64 combination.
I noticed this because Busybox's losetup program recently changed from
using LOOP_SET_FD+LOOP_SET_STATUS64 to LOOP_CONFIGURE, and this broke
my setup, for which I want a notification from the kernel any time a
new partition becomes available.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
[hch: reduced the critical section]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes: 3448914e8c ("loop: Add LOOP_CONFIGURE ioctl")
Link: https://lore.kernel.org/r/20230320125430.55367-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull kvm fixes from Paolo Bonzini:
"RISC-V:
- Fix VM hang in case of timer delta being zero
ARM:
- MMU fixes:
- Read the MMU notifier seq before dropping the mmap lock to guard
against reading a potentially stale VMA
- Disable interrupts when walking user page tables to protect
against the page table being freed
- Read the MTE permissions for the VMA within the mmap lock
critical section, avoiding the use of a potentally stale VMA
pointer
- vPMU fixes:
- Return the sum of the current perf event value and PMC snapshot
for reads from userspace
- Don't save the value of guest writes to PMCR_EL0.{C,P}, which
could otherwise lead to userspace erroneously resetting the vPMU
during VM save/restore"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
riscv/kvm: Fix VM hang in case of timer delta being zero.
KVM: arm64: Check for kvm_vma_mte_allowed in the critical section
KVM: arm64: Disable interrupts while walking userspace PTs
KVM: arm64: Retry fault if vma_lookup() results become invalid
KVM: arm64: PMU: Don't save PMCR_EL0.{C,P} for the vCPU
KVM: arm64: PMU: Fix GET_ONE_REG for vPMC regs to return the current value
For ACPI drivers that provide a ->notify() callback and set
ACPI_DRIVER_ALL_NOTIFY_EVENTS in their flags, that callback can be
invoked while either the ->add() or the ->remove() callback is running
without any synchronization at the bus type level which is counter to
the common-sense expectation that notification handling should only be
enabled when the driver is actually bound to the device. As a result,
if the driver is not careful enough, it's ->notify() callback may crash
when it is invoked too early or too late [1].
This issue has been amplified by commit d6fb6ee182 ("ACPI: bus: Drop
driver member of struct acpi_device") that made acpi_bus_notify() check
for the presence of the driver and its ->notify() callback directly
instead of using an extra driver pointer that was only set and cleared
by the bus type code, but it was present before that commit although
it was harder to reproduce then.
It can be addressed by using the observation that
acpi_device_install_notify_handler() can be modified to install the
handler for all types of events when ACPI_DRIVER_ALL_NOTIFY_EVENTS is
set in the driver flags, in which case acpi_bus_notify() will not need
to invoke the driver's ->notify() callback any more and that callback
will only be invoked after acpi_device_install_notify_handler() has run
and before acpi_device_remove_notify_handler() runs, which implies the
correct ordering with respect to the other ACPI driver callbacks.
Modify the code accordingly and while at it, drop two redundant local
variables from acpi_bus_notify() and turn its description comment into
a proper kerneldoc one.
Fixes: d6fb6ee182 ("ACPI: bus: Drop driver member of struct acpi_device")
Link: https://lore.kernel.org/linux-acpi/9f6cba7a8a57e5a687c934e8e406e28c.squirrel@mail.panix.com # [1]
Reported-by: Pierre Asselin <pa@panix.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Pierre Asselin <pa@panix.com>
Pull x86 platform driver fixes from Hans de Goede:
- Intel tpmi/vsec fixes
- think-lmi fixes
- two other small fixes / hw-id additions
* tag 'platform-drivers-x86-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/surface: aggregator: Add missing fwnode_handle_put()
platform/x86: think-lmi: Add possible_values for ThinkStation
platform/x86: think-lmi: only display possible_values if available
platform/x86: think-lmi: use correct possible_values delimiters
platform/x86: think-lmi: add missing type attribute
platform/x86 (gigabyte-wmi): Add support for A320M-S2H V2
platform/x86/intel: tpmi: Revise the comment of intel_vsec_add_aux
platform/x86/intel: tpmi: Fix double free in tpmi_create_device()
platform/x86/intel: vsec: Fix a memory leak in intel_vsec_add_aux
Return -EFAULT if put_user() for the PTRACE_GET_LAST_BREAK
request fails, instead of silently ignoring it.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The device release callback function invoked to release the matrix device
uses the dev_get_drvdata(device *dev) function to retrieve the
pointer to the vfio_matrix_dev object in order to free its storage. The
problem is, this object is not stored as drvdata with the device; since the
kfree function will accept a NULL pointer, the memory for the
vfio_matrix_dev object is never freed.
Since the device being released is contained within the vfio_matrix_dev
object, the container_of macro will be used to retrieve its pointer.
Fixes: 1fde573413 ("s390: vfio-ap: base implementation of VFIO AP device driver")
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20230320150447.34557-1-akrowiak@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Don't report an error code to L1 when synthesizing a nested VM-Exit and
L2 is in Real Mode. Per Intel's SDM, regarding the error code valid bit:
This bit is always 0 if the VM exit occurred while the logical processor
was in real-address mode (CR0.PE=0).
The bug was introduced by a recent fix for AMD's Paged Real Mode, which
moved the error code suppression from the common "queue exception" path
to the "inject exception" path, but missed VMX's "synthesize VM-Exit"
path.
Fixes: b97f074583 ("KVM: x86: determine if an exception has an error code only when injecting it.")
Cc: stable@vger.kernel.org
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230322143300.2209476-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When injecting an exception into a vCPU in Real Mode, suppress the error
code by clearing the flag that tracks whether the error code is valid, not
by clearing the error code itself. The "typo" was introduced by recent
fix for SVM's funky Paged Real Mode.
Opportunistically hoist the logic above the tracepoint so that the trace
is coherent with respect to what is actually injected (this was also the
behavior prior to the buggy commit).
Fixes: b97f074583 ("KVM: x86: determine if an exception has an error code only when injecting it.")
Cc: stable@vger.kernel.org
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230322143300.2209476-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Clear vcpu->mmio_needed when injecting an exception from the emulator to
squash a (legitimate) warning about vcpu->mmio_needed being true at the
start of KVM_RUN without a callback being registered to complete the
userspace MMIO exit. Suppressing the MMIO write exit is inarguably wrong
from an architectural perspective, but it is the least awful hack-a-fix
due to shortcomings in KVM's uAPI, not to mention that KVM already
suppresses MMIO writes in this scenario.
Outside of REP string instructions, KVM doesn't provide a way to resume
an instruction at the exact point where it was "interrupted" if said
instruction partially completed before encountering an MMIO access. For
MMIO reads, KVM immediately exits to userspace upon detecting MMIO as
userspace provides the to-be-read value in a buffer, and so KVM can safely
(more or less) restart the instruction from the beginning. When the
emulator re-encounters the MMIO read, KVM will service the MMIO by getting
the value from the buffer instead of exiting to userspace, i.e. KVM won't
put the vCPU into an infinite loop.
On an emulated MMIO write, KVM finishes the instruction before exiting to
userspace, as exiting immediately would ultimately hang the vCPU due to
the aforementioned shortcoming of KVM not being able to resume emulation
in the middle of an instruction.
For the vast majority of _emulated_ instructions, deferring the userspace
exit doesn't cause problems as very few x86 instructions (again ignoring
string operations) generate multiple writes. But for instructions that
generate multiple writes, e.g. PUSHA (multiple pushes onto the stack),
deferring the exit effectively results in only the final write triggering
an exit to userspace. KVM does support multiple MMIO "fragments", but
only for page splits; if an instruction performs multiple distinct MMIO
writes, the number of fragments gets reset when the next MMIO write comes
along and any previous MMIO writes are dropped.
Circling back to the warning, if a deferred MMIO write coincides with an
exception, e.g. in this case a #SS due to PUSHA underflowing the stack
after queueing a write to an MMIO page on a previous push, KVM injects
the exceptions and leaves the deferred MMIO pending without registering a
callback, thus triggering the splat.
Sweep the problem under the proverbial rug as dropping MMIO writes is not
unique to the exception scenario (see above), i.e. instructions like PUSHA
are fundamentally broken with respect to MMIO, and have been since KVM's
inception.
Reported-by: zhangjianguo <zhangjianguo18@huawei.com>
Reported-by: syzbot+760a73552f47a8cd0fd9@syzkaller.appspotmail.com
Reported-by: syzbot+8accb43ddc6bd1f5713a@syzkaller.appspotmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230322141220.2206241-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM irqfd based emulation of level-triggered interrupts doesn't work
quite correctly in some cases, particularly in the case of interrupts
that are handled in a Linux guest as oneshot interrupts (IRQF_ONESHOT).
Such an interrupt is acked to the device in its threaded irq handler,
i.e. later than it is acked to the interrupt controller (EOI at the end
of hardirq), not earlier.
Linux keeps such interrupt masked until its threaded handler finishes,
to prevent the EOI from re-asserting an unacknowledged interrupt.
However, with KVM + vfio (or whatever is listening on the resamplefd)
we always notify resamplefd at the EOI, so vfio prematurely unmasks the
host physical IRQ, thus a new physical interrupt is fired in the host.
This extra interrupt in the host is not a problem per se. The problem is
that it is unconditionally queued for injection into the guest, so the
guest sees an extra bogus interrupt. [*]
There are observed at least 2 user-visible issues caused by those
extra erroneous interrupts for a oneshot irq in the guest:
1. System suspend aborted due to a pending wakeup interrupt from
ChromeOS EC (drivers/platform/chrome/cros_ec.c).
2. Annoying "invalid report id data" errors from ELAN0000 touchpad
(drivers/input/mouse/elan_i2c_core.c), flooding the guest dmesg
every time the touchpad is touched.
The core issue here is that by the time when the guest unmasks the IRQ,
the physical IRQ line is no longer asserted (since the guest has
acked the interrupt to the device in the meantime), yet we
unconditionally inject the interrupt queued into the guest by the
previous resampling. So to fix the issue, we need a way to detect that
the IRQ is no longer pending, and cancel the queued interrupt in this
case.
With IOAPIC we are not able to probe the physical IRQ line state
directly (at least not if the underlying physical interrupt controller
is an IOAPIC too), so in this patch we use irqfd resampler for that.
Namely, instead of injecting the queued interrupt, we just notify the
resampler that this interrupt is done. If the IRQ line is actually
already deasserted, we are done. If it is still asserted, a new
interrupt will be shortly triggered through irqfd and injected into the
guest.
In the case if there is no irqfd resampler registered for this IRQ, we
cannot fix the issue, so we keep the existing behavior: immediately
unconditionally inject the queued interrupt.
This patch fixes the issue for x86 IOAPIC only. In the long run, we can
fix it for other irqchips and other architectures too, possibly taking
advantage of reading the physical state of the IRQ line, which is
possible with some other irqchips (e.g. with arm64 GIC, maybe even with
the legacy x86 PIC).
[*] In this description we assume that the interrupt is a physical host
interrupt forwarded to the guest e.g. by vfio. Potentially the same
issue may occur also with a purely virtual interrupt from an
emulated device, e.g. if the guest handles this interrupt, again, as
a oneshot interrupt.
Signed-off-by: Dmytro Maluka <dmy@semihalf.com>
Link: https://lore.kernel.org/kvm/31420943-8c5f-125c-a5ee-d2fde2700083@semihalf.com/
Link: https://lore.kernel.org/lkml/87o7wrug0w.wl-maz@kernel.org/
Message-Id: <20230322204344.50138-3-dmy@semihalf.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It is useful to be able to do read-only traversal of the list of all the
registered irqfd resamplers without locking the resampler_lock mutex.
In particular, we are going to traverse it to search for a resampler
registered for the given irq of an irqchip, and that will be done with
an irqchip spinlock (ioapic->lock) held, so it is undesirable to lock a
mutex in this context. So turn this list into an RCU list.
For protecting the read side, reuse kvm->irq_srcu which is already used
for protecting a number of irq related things (kvm->irq_routing,
irqfd->resampler->list, kvm->irq_ack_notifier_list,
kvm->arch.mask_notifier_list).
Signed-off-by: Dmytro Maluka <dmy@semihalf.com>
Message-Id: <20230322204344.50138-2-dmy@semihalf.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The Hyper-V "EnlightenedNptTlb" enlightenment is always enabled when KVM
is running on top of Hyper-V and Hyper-V exposes support for it (which
is always). On AMD CPUs this enlightenment results in ASID invalidations
not flushing TLB entries derived from the NPT. To force the underlying
(L0) hypervisor to rebuild its shadow page tables, an explicit hypercall
is needed.
The original KVM implementation of Hyper-V's "EnlightenedNptTlb" on SVM
only added remote TLB flush hooks. This worked out fine for a while, as
sufficient remote TLB flushes where being issued in KVM to mask the
problem. Since v5.17, changes in the TDP code reduced the number of
flushes and the out-of-sync TLB prevents guests from booting
successfully.
Split svm_flush_tlb_current() into separate callbacks for the 3 cases
(guest/all/current), and issue the required Hyper-V hypercall when a
Hyper-V TLB flush is needed. The most important case where the TLB flush
was missing is when loading a new PGD, which is followed by what is now
svm_flush_tlb_current().
Cc: stable@vger.kernel.org # v5.17+
Fixes: 1e0c7d4075 ("KVM: SVM: hyper-v: Remote TLB flush for SVM")
Link: https://lore.kernel.org/lkml/43980946-7bbf-dcef-7e40-af904c456250@linux.microsoft.com/
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20230324145233.4585-1-jpiotrowski@linux.microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM/arm64 fixes for 6.3, part #2
Fixes for a rather interesting set of bugs relating to the MMU:
- Read the MMU notifier seq before dropping the mmap lock to guard
against reading a potentially stale VMA
- Disable interrupts when walking user page tables to protect against
the page table being freed
- Read the MTE permissions for the VMA within the mmap lock critical
section, avoiding the use of a potentally stale VMA pointer
Additionally, some fixes targeting the vPMU:
- Return the sum of the current perf event value and PMC snapshot for
reads from userspace
- Don't save the value of guest writes to PMCR_EL0.{C,P}, which could
otherwise lead to userspace erroneously resetting the vPMU during VM
save/restore
Syzkaller reported the following issue:
=====================================================
BUG: KMSAN: uninit-value in aio_rw_done fs/aio.c:1520 [inline]
BUG: KMSAN: uninit-value in aio_write+0x899/0x950 fs/aio.c:1600
aio_rw_done fs/aio.c:1520 [inline]
aio_write+0x899/0x950 fs/aio.c:1600
io_submit_one+0x1d1c/0x3bf0 fs/aio.c:2019
__do_sys_io_submit fs/aio.c:2078 [inline]
__se_sys_io_submit+0x293/0x770 fs/aio.c:2048
__x64_sys_io_submit+0x92/0xd0 fs/aio.c:2048
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Uninit was created at:
slab_post_alloc_hook mm/slab.h:766 [inline]
slab_alloc_node mm/slub.c:3452 [inline]
__kmem_cache_alloc_node+0x71f/0xce0 mm/slub.c:3491
__do_kmalloc_node mm/slab_common.c:967 [inline]
__kmalloc+0x11d/0x3b0 mm/slab_common.c:981
kmalloc_array include/linux/slab.h:636 [inline]
bcm_tx_setup+0x80e/0x29d0 net/can/bcm.c:930
bcm_sendmsg+0x3a2/0xce0 net/can/bcm.c:1351
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg net/socket.c:734 [inline]
sock_write_iter+0x495/0x5e0 net/socket.c:1108
call_write_iter include/linux/fs.h:2189 [inline]
aio_write+0x63a/0x950 fs/aio.c:1600
io_submit_one+0x1d1c/0x3bf0 fs/aio.c:2019
__do_sys_io_submit fs/aio.c:2078 [inline]
__se_sys_io_submit+0x293/0x770 fs/aio.c:2048
__x64_sys_io_submit+0x92/0xd0 fs/aio.c:2048
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
CPU: 1 PID: 5034 Comm: syz-executor350 Not tainted 6.2.0-rc6-syzkaller-80422-geda666ff2276 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
=====================================================
We can follow the call chain and find that 'bcm_tx_setup' function
calls 'memcpy_from_msg' to copy some content to the newly allocated
frame of 'op->frames'. After that the 'len' field of copied structure
being compared with some constant value (64 or 8). However, if
'memcpy_from_msg' returns an error, we will compare some uninitialized
memory. This triggers 'uninit-value' issue.
This patch will add 'memcpy_from_msg' possible errors processing to
avoid uninit-value issue.
Tested via syzkaller
Reported-by: syzbot+c9bfd85eca611ebf5db1@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=47f897f8ad958bbde5790ebf389b5e7e0a345089
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Fixes: 6f3b911d5f ("can: bcm: add support for CAN FD frames")
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20230314120445.12407-1-ivan.orlov0322@gmail.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
For platforms with Alder Lake PCH (Alder Lake S and Raptor Lake S) the
slp_s0_residency attribute has been reporting the wrong value. Unlike other
platforms, ADL PCH does not have a counter for the time that the SLP_S0
signal was asserted. Instead, firmware uses the aggregate of the Low Power
Mode (LPM) substate counters as the S0ix value. Since the LPM counters run
at a different frequency, this lead to misreporting of the S0ix time.
Add a check for Alder Lake PCH and adjust the frequency accordingly when
display slp_s0_residency.
Fixes: bbab31101f ("platform/x86/intel: pmc/core: Add Alderlake support to pmc core driver")
Signed-off-by: Rajvi Jingar <rajvi.jingar@linux.intel.com>
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Reviewed-by: Rajneesh Bhardwaj <irenic.rajneesh@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20230320212029.3154407-1-david.e.box@linux.intel.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Currently i915_gem_object_is_framebuffer() doesn't treat the
BO containing the framebuffer's DPT as a framebuffer itself.
This means eg. that the shrinker can evict the DPT BO while
leaving the actual FB BO bound, when the DPT is allocated
from regular shmem.
That causes an immediate oops during hibernate as we
try to rewrite the PTEs inside the already evicted
DPT obj.
TODO: presumably this might also be the reason for the
DPT related display faults under heavy memory pressure,
but I'm still not sure how that would happen as the object
should be pinned by intel_dpt_pin() while in active use by
the display engine...
Cc: stable@vger.kernel.org
Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Fixes: 0dc987b699 ("drm/i915/display: Add smem fallback allocation for dpt")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230320090522.9909-2-ville.syrjala@linux.intel.com
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
(cherry picked from commit 779cb5ba64)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Keeping DC states enabled is incompatible with the _noarm()/_arm()
split we use for writing pipe/plane registers. When DC5 and PSR
are enabled, all pipe/plane registers effectively become self-arming
on account of DC5 exit arming the update, and PSR exit latching it.
What probably saves us most of the time is that (with PIPE_MISC[21]=0)
all pipe register writes themselves trigger PSR exit, and then
we don't re-enter PSR until the idle frame count has elapsed.
So it may be that the PSR exit happens already before we've
updated the state too much.
Also the PSR1 panel (at least on this KBL) seems to discard the first
frame we trasmit, presumably still scanning out from its internal
framebuffer at that point. So only the second frame we transmit is
actually visible. But I suppose that could also be panel specific
behaviour. I haven't checked out how other PSR panels behave, nor
did I bother to check what the eDP spec has to say about this.
And since this really is all about DC states, let's switch from
the MODESET domain to the DC_OFF domain. Functionally they are
100% identical. We should probably remove the MODESET domain...
And for good measure let's toss in an assert to the place where
we do the _noarm() register writes to make sure DC states are
in fact off.
v2: Just use intel_display_power_is_enabled() (Imre)
Cc: <stable@vger.kernel.org> #v5.17+
Cc: Manasi Navare <navaremanasi@google.com>
Cc: Drew Davenport <ddavenport@chromium.org>
Cc: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Fixes: d13dde4495 ("drm/i915: Split pipe+output CSC programming to noarm+arm pair")
Fixes: f8a005eb89 ("drm/i915: Optimize icl+ universal plane programming")
Fixes: 890b6ec4a5 ("drm/i915: Split skl+ plane update into noarm+arm pair")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230320183532.17727-1-ville.syrjala@linux.intel.com
(cherry picked from commit 41b4c7fe72)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
SKL/GLK CSC unit suffers from a nasty issue where a CSC
coeff/offset register read or write between DC5 exit and
PSR exit will undo the CSC arming performed by DMC, and
then during PSR exit the hardware will latch zeroes into
the active CSC registers. This causes any plane going
through the CSC to output all black.
We can sidestep the issue by making sure the PSR exit has
already actually happened before we touch the CSC coeff/offset
registers. Easiest way to guarantee that is to just move the
CSC programming back into the .color_commir_arm() as we force
a PSR exit (and crucially wait for it to actually happen)
prior to touching the arming registers.
When PSR (and thus also DC states) are disabled we don't
have anything to worry about, so we can keep using the
more optional _noarm() hook for writing the CSC registers.
Cc: <stable@vger.kernel.org> #v5.19+
Cc: Manasi Navare <navaremanasi@google.com>
Cc: Drew Davenport <ddavenport@chromium.org>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Jouni Högander <jouni.hogander@intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8283
Fixes: d13dde4495 ("drm/i915: Split pipe+output CSC programming to noarm+arm pair")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230320095438.17328-3-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
(cherry picked from commit 80a892a4c2)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Expose intel_rps_read_actual_frequency_fw to read the actual freq without
taking forcewake for use by PMU. The code is refactored to use a common set
of functions across sysfs and PMU. Using common functions with sysfs in PMU
solves the issues of missing support for MTL and missing support for older
generations (prior to Gen6). It also future proofs the PMU where sometimes
code has been updated for sysfs and PMU has been missed.
v2: Remove runtime_pm_if_in_use from read_actual_frequency_fw (Tvrtko)
v3: (Tvrtko)
- Remove goto in __read_cagf
- Unexport intel_rps_get_cagf and intel_rps_read_punit_req
Fixes: 22009b6dad ("drm/i915/mtl: Modify CAGF functions for MTL")
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8280
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230316004800.2539753-1-ashutosh.dixit@intel.com
(cherry picked from commit 44df42e661)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
The blamed commit has introduced the following tests to
dwmac4_add_hw_vlan_rx_fltr(), called from stmmac_vlan_rx_add_vid():
if (hw->promisc) {
netdev_err(dev,
"Adding VLAN in promisc mode not supported\n");
return -EPERM;
}
"VLAN promiscuous" mode is keyed in this driver to IFF_PROMISC, and so,
vlan_vid_add() and vlan_vid_del() calls cannot take place in IFF_PROMISC
mode. I have the following 2 arguments that this restriction is.... hm,
how shall I put it nicely... unproductive :)
First, take the case of a Linux bridge. If the kernel is compiled with
CONFIG_BRIDGE_VLAN_FILTERING=y, then this bridge shall have a VLAN
database. The bridge shall try to call vlan_add_vid() on its bridge
ports for each VLAN in the VLAN table. It will do this irrespectively of
whether that port is *currently* VLAN-aware or not. So it will do this
even when the bridge was created with vlan_filtering 0.
But the Linux bridge, in VLAN-unaware mode, configures its ports in
promiscuous (IFF_PROMISC) mode, so that they accept packets with any
MAC DA (a switch must do this in order to forward those packets which
are not directly targeted to its MAC address).
As a result, the stmmac driver does not work as a bridge port, when the
kernel is compiled with CONFIG_BRIDGE_VLAN_FILTERING=y.
$ ip link add br0 type bridge && ip link set br0 up
$ ip link set eth0 master br0 && ip link set eth0 up
[ 2333.943296] br0: port 1(eth0) entered blocking state
[ 2333.943381] br0: port 1(eth0) entered disabled state
[ 2333.943782] device eth0 entered promiscuous mode
[ 2333.944080] 4033c000.ethernet eth0: Adding VLAN in promisc mode not supported
[ 2333.976509] 4033c000.ethernet eth0: failed to initialize vlan filtering on this port
RTNETLINK answers: Operation not permitted
Secondly, take the case of stmmac as DSA master. Some switch tagging
protocols are based on 802.1Q VLANs (tag_sja1105.c), and as such,
tag_8021q.c uses vlan_vid_add() to work with VLAN-filtering DSA masters.
But also, when a DSA port becomes promiscuous (for example when it joins
a bridge), the DSA framework also makes the DSA master promiscuous.
Moreover, for every VLAN that a DSA switch sends to the CPU, DSA also
programs a VLAN filter on the DSA master, because if the the DSA switch
uses a tail tag, then the hardware frame parser of the DSA master will
see VLAN as VLAN, and might filter them out, for being unknown.
Due to the above 2 reasons, my belief is that the stmmac driver does not
get to choose to not accept vlan_vid_add() calls while IFF_PROMISC is
enabled, because the 2 are completely independent and there are code
paths in the network stack which directly lead to this situation
occurring, without the user's direct input.
In fact, my belief is that "VLAN promiscuous" mode should have never
been keyed on IFF_PROMISC in the first place, but rather, on the
NETIF_F_HW_VLAN_CTAG_FILTER feature flag which can be toggled by the
user through ethtool -k, when present in netdev->hw_features.
In the stmmac driver, NETIF_F_HW_VLAN_CTAG_FILTER is only present in
"features", making this feature "on [fixed]".
I have this belief because I am unaware of any definition of promiscuity
which implies having an effect on anything other than MAC DA (therefore
not VLAN). However, I seem to be rather alone in having this opinion,
looking back at the disagreements from this discussion:
https://lore.kernel.org/netdev/20201110153958.ci5ekor3o2ekg3ky@ipetronik.com/
In any case, to remove the vlan_vid_add() dependency on !IFF_PROMISC,
one would need to remove the check and see what fails. I guess the test
was there because of the way in which dwmac4_vlan_promisc_enable() is
implemented.
For context, the dwmac4 supports Perfect Filtering for a limited number
of VLANs - dwmac4_get_num_vlan(), priv->hw->num_vlan, with a fallback on
Hash Filtering - priv->dma_cap.vlhash - see stmmac_vlan_update(), also
visible in cat /sys/kernel/debug/stmmaceth/eth0/dma_cap | grep 'VLAN
Hash Filtering'.
The perfect filtering is based on MAC_VLAN_Tag_Filter/MAC_VLAN_Tag_Data
registers, accessed in the driver through dwmac4_write_vlan_filter().
The hash filtering is based on the MAC_VLAN_Hash_Table register, named
GMAC_VLAN_HASH_TABLE in the driver and accessed by dwmac4_update_vlan_hash().
The control bit for enabling hash filtering is GMAC_VLAN_VTHM
(MAC_VLAN_Tag_Ctrl bit VTHM: VLAN Tag Hash Table Match Enable).
Now, the description of dwmac4_vlan_promisc_enable() is that it iterates
through the driver's cache of perfect filter entries (hw->vlan_filter[i],
added by dwmac4_add_hw_vlan_rx_fltr()), and evicts them from hardware by
unsetting their GMAC_VLAN_TAG_DATA_VEN (MAC_VLAN_Tag_Data bit VEN - VLAN
Tag Enable) bit. Then it unsets the GMAC_VLAN_VTHM bit, which disables
hash matching.
This leaves the MAC, according to table "VLAN Match Status" from the
documentation, to always enter these data paths:
VID |VLAN Perfect Filter |VTHM Bit |VLAN Hash Filter |Final VLAN Match
|Match Result | |Match Result |Status
-------|--------------------|---------|-----------------|----------------
VID!=0 |Fail |0 |don't care |Pass
So, dwmac4_vlan_promisc_enable() does its job, but by unsetting
GMAC_VLAN_VTHM, it conflicts with the other code path which controls
this bit: dwmac4_update_vlan_hash(), called through stmmac_update_vlan_hash()
from stmmac_vlan_rx_add_vid() and from stmmac_vlan_rx_kill_vid().
This is, I guess, why dwmac4_add_hw_vlan_rx_fltr() is not allowed to run
after dwmac4_vlan_promisc_enable() has unset GMAC_VLAN_VTHM: because if
it did, then dwmac4_update_vlan_hash() would set GMAC_VLAN_VTHM again,
breaking the "VLAN promiscuity".
It turns out that dwmac4_vlan_promisc_enable() is way too complicated
for what needs to be done. The MAC_Packet_Filter register also has the
VTFE bit (VLAN Tag Filter Enable), which simply controls whether VLAN
tagged packets which don't match the filtering tables (either perfect or
hash) are dropped or not. At the moment, this driver unconditionally
sets GMAC_PACKET_FILTER_VTFE if NETIF_F_HW_VLAN_CTAG_FILTER was detected
through the priv->dma_cap.vlhash capability bits of the device, in
stmmac_dvr_probe().
I would suggest deleting the unnecessarily complex logic from
dwmac4_vlan_promisc_enable(), and simply unsetting GMAC_PACKET_FILTER_VTFE
when becoming IFF_PROMISC, which has the same effect of allowing packets
with any VLAN tags, but has the additional benefit of being able to run
concurrently with stmmac_vlan_rx_add_vid() and stmmac_vlan_rx_kill_vid().
As much as I believe that the VTFE bit should have been exclusively
controlled by NETIF_F_HW_VLAN_CTAG_FILTER through ethtool, and not by
IFF_PROMISC, changing that is not a punctual fix to the problem, and it
would probably break the VFFQ feature added by the later commit
e0f9956a38 ("net: stmmac: Add option for VLAN filter fail queue
enable"). From the commit description, VFFQ needs IFF_PROMISC=on and
VTFE=off in order to work (and this change respects that). But if VTFE
was changed to be controlled through ethtool -k, then a user-visible
change would have been introduced in Intel's scripts (a need to run
"ethtool -k eth0 rx-vlan-filter off" which did not exist before).
The patch was tested with this set of commands:
ip link set eth0 up
ip link add link eth0 name eth0.100 type vlan id 100
ip addr add 192.168.100.2/24 dev eth0.100 && ip link set eth0.100 up
ip link set eth0 promisc on
ip link add link eth0 name eth0.101 type vlan id 101
ip addr add 192.168.101.2/24 dev eth0.101 && ip link set eth0.101 up
ip link set eth0 promisc off
ping -c 5 192.168.100.1
ping -c 5 192.168.101.1
ip link set eth0 promisc on
ping -c 5 192.168.100.1
ping -c 5 192.168.101.1
ip link del eth0.100
ip link del eth0.101
# Wait for VLAN-tagged pings from the other end...
# Check with "tcpdump -i eth0 -e -n -p" and we should see them
ip link set eth0 promisc off
# Wait for VLAN-tagged pings from the other end...
# Check with "tcpdump -i eth0 -e -n -p" and we shouldn't see them
# anymore, but remove the "-p" argument from tcpdump and they're there.
Fixes: c89f44ff10 ("net: stmmac: Add support for VLAN promiscuous mode")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit addresses a deadlock situation that can occur in certain
scenarios, such as when running data TP/ETP transfer and subscribing to
the error queue while receiving a net down event. The deadlock involves
locks in the following order:
3
j1939_session_list_lock -> active_session_list_lock
j1939_session_activate
...
j1939_sk_queue_activate_next -> sk_session_queue_lock
...
j1939_xtp_rx_eoma_one
2
j1939_sk_queue_drop_all -> sk_session_queue_lock
...
j1939_sk_netdev_event_netdown -> j1939_socks_lock
j1939_netdev_notify
1
j1939_sk_errqueue -> j1939_socks_lock
__j1939_session_cancel -> active_session_list_lock
j1939_tp_rxtimer
CPU0 CPU1
---- ----
lock(&priv->active_session_list_lock);
lock(&jsk->sk_session_queue_lock);
lock(&priv->active_session_list_lock);
lock(&priv->j1939_socks_lock);
The solution implemented in this commit is to move the
j1939_sk_errqueue() call out of the active_session_list_lock context,
thus preventing the deadlock situation.
Reported-by: syzbot+ee1cd780f69483a8616b@syzkaller.appspotmail.com
Fixes: 5b9272e93f ("can: j1939: extend UAPI to notify about RX status")
Co-developed-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20230324130141.2132787-1-o.rempel@pengutronix.de
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Implement phy_read16() and phy_write16() ops for B53 MMAP to avoid accessing
B53_PORT_MII_PAGE registers which hangs the device.
This access should be done through the MDIO Mux bus controller.
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The KSZ9131RNX incorrectly shows EEE capabilities in its registers.
Although the "EEE control and capability 1" (Register 3.20) is set to 0,
indicating no EEE support, the "EEE advertisement 1" (Register 7.60) is
set to 0x6, advertising EEE support for 1000BaseT/Full and
100BaseT/Full.
This inconsistency causes PHYlib to assume there is no EEE support,
preventing control over EEE advertisement, which is enabled by default.
This patch resolves the issue by utilizing the ksz9477_get_features()
function to correctly set the EEE capabilities for the KSZ9131RNX. This
adjustment allows proper control over EEE advertisement and ensures
accurate representation of the device's capabilities.
Fixes: 8b68710a31 ("net: phy: start using genphy_c45_ethtool_get/set_eee()")
Reported-by: Marek Vasut <marex@denx.de>
Tested-by: Marek Vasut <marex@denx.de>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
pkt_list_lock was used before commit 71dc9ec9ac ("virtio/vsock:
replace virtio_vsock_pkt with sk_buff") to protect the packet queue.
After that commit we switched to sk_buff and we are using
sk_buff_head.lock in almost every place to protect the packet queue
except in vsock_loopback_work() when we call skb_queue_splice_init().
As reported by syzbot, this caused unlocked concurrent access to the
packet queue between vsock_loopback_work() and
vsock_loopback_cancel_pkt() since it is not holding pkt_list_lock.
With the introduction of sk_buff_head, pkt_list_lock is redundant and
can cause confusion, so let's remove it and use sk_buff_head.lock
everywhere to protect the packet queue access.
Fixes: 71dc9ec9ac ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")
Cc: bobby.eshleman@bytedance.com
Reported-and-tested-by: syzbot+befff0a9536049e7902e@syzkaller.appspotmail.com
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Reviewed-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King says:
====================
Constify a few sfp/phy fwnodes
This series constifies a bunch of fwnode_handle pointers that are only
used to refer to but not modify the contents of the fwnode structures.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The failover txq is inited as 16 queues.
when a packet is transmitted from the failover device firstly,
the failover device will select the queue which is returned from
the primary device if the primary device is UP and running.
If the primary device txq is bigger than the default 16,
it can lead to the following warning:
eth0 selects TX queue 18, but real number of TX queues is 16
The warning backtrace is:
[ 32.146376] CPU: 18 PID: 9134 Comm: chronyd Tainted: G E 6.2.8-1.el7.centos.x86_64 #1
[ 32.147175] Hardware name: Red Hat KVM, BIOS 1.10.2-3.el7_4.1 04/01/2014
[ 32.147730] Call Trace:
[ 32.147971] <TASK>
[ 32.148183] dump_stack_lvl+0x48/0x70
[ 32.148514] dump_stack+0x10/0x20
[ 32.148820] netdev_core_pick_tx+0xb1/0xe0
[ 32.149180] __dev_queue_xmit+0x529/0xcf0
[ 32.149533] ? __check_object_size.part.0+0x21c/0x2c0
[ 32.149967] ip_finish_output2+0x278/0x560
[ 32.150327] __ip_finish_output+0x1fe/0x2f0
[ 32.150690] ip_finish_output+0x2a/0xd0
[ 32.151032] ip_output+0x7a/0x110
[ 32.151337] ? __pfx_ip_finish_output+0x10/0x10
[ 32.151733] ip_local_out+0x5e/0x70
[ 32.152054] ip_send_skb+0x19/0x50
[ 32.152366] udp_send_skb.isra.0+0x163/0x3a0
[ 32.152736] udp_sendmsg+0xba8/0xec0
[ 32.153060] ? __folio_memcg_unlock+0x25/0x60
[ 32.153445] ? __pfx_ip_generic_getfrag+0x10/0x10
[ 32.153854] ? sock_has_perm+0x85/0xa0
[ 32.154190] inet_sendmsg+0x6d/0x80
[ 32.154508] ? inet_sendmsg+0x6d/0x80
[ 32.154838] sock_sendmsg+0x62/0x70
[ 32.155152] ____sys_sendmsg+0x134/0x290
[ 32.155499] ___sys_sendmsg+0x81/0xc0
[ 32.155828] ? _get_random_bytes.part.0+0x79/0x1a0
[ 32.156240] ? ip4_datagram_release_cb+0x5f/0x1e0
[ 32.156649] ? get_random_u16+0x69/0xf0
[ 32.156989] ? __fget_light+0xcf/0x110
[ 32.157326] __sys_sendmmsg+0xc4/0x210
[ 32.157657] ? __sys_connect+0xb7/0xe0
[ 32.157995] ? __audit_syscall_entry+0xce/0x140
[ 32.158388] ? syscall_trace_enter.isra.0+0x12c/0x1a0
[ 32.158820] __x64_sys_sendmmsg+0x24/0x30
[ 32.159171] do_syscall_64+0x38/0x90
[ 32.159493] entry_SYSCALL_64_after_hwframe+0x72/0xdc
Fix that by reducing txq number as the non-existent primary-dev does.
Fixes: cfc80d9a11 ("net: Introduce net_failover driver")
Signed-off-by: Faicker Mo <faicker.mo@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
0x238 is the offset for PANIC0_THRES, so the length needs to be greater
than that. Use the size from memory map from reference manual.
Fixes: 94e6197dad ("arm64: dts: imx8mp: Add LCDIF2 & LDB nodes")
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
KASAN reported the following issue:
[ 36.825817][ T5923] BUG: KASAN: wild-memory-access in v9fs_get_acl+0x1a4/0x390
[ 36.827479][ T5923] Write of size 4 at addr 9fffeb37f97f1c00 by task syz-executor798/5923
[ 36.829303][ T5923]
[ 36.829846][ T5923] CPU: 0 PID: 5923 Comm: syz-executor798 Not tainted 6.2.0-syzkaller-18302-g596b6b709632 #0
[ 36.832110][ T5923] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
[ 36.834464][ T5923] Call trace:
[ 36.835196][ T5923] dump_backtrace+0x1c8/0x1f4
[ 36.836229][ T5923] show_stack+0x2c/0x3c
[ 36.837100][ T5923] dump_stack_lvl+0xd0/0x124
[ 36.838103][ T5923] print_report+0xe4/0x4c0
[ 36.839068][ T5923] kasan_report+0xd4/0x130
[ 36.840052][ T5923] kasan_check_range+0x264/0x2a4
[ 36.841199][ T5923] __kasan_check_write+0x2c/0x3c
[ 36.842216][ T5923] v9fs_get_acl+0x1a4/0x390
[ 36.843232][ T5923] v9fs_mount+0x77c/0xa5c
[ 36.844163][ T5923] legacy_get_tree+0xd4/0x16c
[ 36.845173][ T5923] vfs_get_tree+0x90/0x274
[ 36.846137][ T5923] do_new_mount+0x25c/0x8c8
[ 36.847066][ T5923] path_mount+0x590/0xe58
[ 36.848147][ T5923] __arm64_sys_mount+0x45c/0x594
[ 36.849273][ T5923] invoke_syscall+0x98/0x2c0
[ 36.850421][ T5923] el0_svc_common+0x138/0x258
[ 36.851397][ T5923] do_el0_svc+0x64/0x198
[ 36.852398][ T5923] el0_svc+0x58/0x168
[ 36.853224][ T5923] el0t_64_sync_handler+0x84/0xf0
[ 36.854293][ T5923] el0t_64_sync+0x190/0x194
Calling '__v9fs_get_acl' method in 'v9fs_get_acl' creates the
following chain of function calls:
__v9fs_get_acl
v9fs_fid_get_acl
v9fs_fid_xattr_get
p9_client_xattrwalk
Function p9_client_xattrwalk accepts a pointer to u64-typed
variable attr_size and puts some u64 value into it. However,
after the executing the p9_client_xattrwalk, in some circumstances
we assign the value of u64-typed variable 'attr_size' to the
variable 'retval', which we will return. However, the type of
'retval' is ssize_t, and if the value of attr_size is larger
than SSIZE_MAX, we will face the signed type overflow. If the
overflow occurs, the result of v9fs_fid_xattr_get may be
negative, but not classified as an error. When we try to allocate
an acl with 'broken' size we receive an error, but don't process
it. When we try to free this acl, we face the 'wild-memory-access'
error (because it wasn't allocated).
This patch will add new condition to the 'v9fs_fid_xattr_get'
function, so it will return an EOVERFLOW error if the 'attr_size'
is larger than SSIZE_MAX.
In this version of the patch I simplified the condition.
In previous (v2) version of the patch I removed explicit type conversion
and added separate condition to check the possible overflow and return
an error (in v1 version I've just modified the existing condition).
Tested via syzkaller.
Suggested-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Reported-by: syzbot+cb1d16facb3cc90de5fb@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=fbbef66d9e4d096242f3617de5d14d12705b4659
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
The spi-max-frequency is a property of SPI children, not the
controller:
k210_generic.dtb: spi@50240000: Unevaluated properties are not allowed ('spi-max-frequency' was unexpected)
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Pull USB / Thunderbolt driver fixes from Greg KH:
"Here are a small set of USB and Thunderbolt driver fixes for reported
problems and a documentation update, for 6.3-rc4.
Included in here are:
- documentation update for uvc gadget driver
- small thunderbolt driver fixes
- cdns3 driver fixes
- dwc3 driver fixes
- dwc2 driver fixes
- chipidea driver fixes
- typec driver fixes
- onboard_usb_hub device id updates
- quirk updates
All of these have been in linux-next with no reported problems"
* tag 'usb-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (30 commits)
usb: dwc2: fix a race, don't power off/on phy for dual-role mode
usb: dwc2: fix a devres leak in hw_enable upon suspend resume
usb: chipidea: core: fix possible concurrent when switch role
usb: chipdea: core: fix return -EINVAL if request role is the same with current role
thunderbolt: Rename shadowed variables bit to interrupt_bit and auto_clear_bit
thunderbolt: Disable interrupt auto clear for rings
thunderbolt: Use const qualifier for `ring_interrupt_index`
usb: gadget: Use correct endianness of the wLength field for WebUSB
uas: Add US_FL_NO_REPORT_OPCODES for JMicron JMS583Gen 2
usb: cdnsp: changes PCI Device ID to fix conflict with CNDS3 driver
usb: cdns3: Fix issue with using incorrect PCI device function
usb: cdnsp: Fixes issue with redundant Status Stage
MAINTAINERS: make me a reviewer of USB/IP
thunderbolt: Use scale field when allocating USB3 bandwidth
thunderbolt: Limit USB3 bandwidth of certain Intel USB4 host routers
thunderbolt: Call tb_check_quirks() after initializing adapters
thunderbolt: Add missing UNSET_INBOUND_SBTX for retimer access
thunderbolt: Fix memory leak in margining
usb: dwc2: drd: fix inconsistent mode if role-switch-default-mode="host"
docs: usb: Add documentation for the UVC Gadget
...
Pull scheduler fix from Borislav Petkov:
- Fix a corner case where vruntime of a task is not being sanitized
* tag 'sched_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Sanitize vruntime of entity being migrated
Pull perf fix from Borislav Petkov:
- Properly clear perf event status tracking in the AMD perf event
overflow handler
* tag 'perf_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd/core: Always clear status for idx
Pull core fixes from Borislav Petkov:
- Do the delayed RCU wakeup for kthreads in the proper order so that
former doesn't get ignored
- A noinstr warning fix
* tag 'core_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry/rcu: Check TIF_RESCHED _after_ delayed RCU wake-up
entry: Fix noinstr warning in __enter_from_user_mode()
Pull x86 fixes from Borislav Petkov:
- Add a AMX ptrace self test
- Prevent a false-positive warning when retrieving the (invalid)
address of dynamic FPU features in their init state which are not
saved in init_fpstate at all
- Randomize per-CPU entry areas only when KASLR is enabled
* tag 'x86_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/x86/amx: Add a ptrace test
x86/fpu/xstate: Prevent false-positive warning in __copy_xstate_uabi_buf()
x86/mm: Do not shuffle CPU entry areas without KASLR
Pull cifs client fixes from Steve French:
"Twelve cifs/smb3 client fixes (most also for stable)
- forced umount fix
- fix for two perf regressions
- reconnect fixes
- small debugging improvements
- multichannel fixes"
* tag 'smb3-client-fixes-6.3-rc3' of git://git.samba.org/sfrench/cifs-2.6:
smb3: fix unusable share after force unmount failure
cifs: fix dentry lookups in directory handle cache
smb3: lower default deferred close timeout to address perf regression
cifs: fix missing unload_nls() in smb2_reconnect()
cifs: avoid race conditions with parallel reconnects
cifs: append path to open_enter trace event
cifs: print session id while listing open files
cifs: dump pending mids for all channels in DebugData
cifs: empty interface list when server doesn't support query interfaces
cifs: do not poll server interfaces too regularly
cifs: lock chan_lock outside match_session
cifs: check only tcon status on tcon related functions
Pull nfsd fix from Chuck Lever:
- Fix a crash when using NFS with krb5p
* tag 'nfsd-6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
SUNRPC: Fix a crash in gss_krb5_checksum()
Pull yet more xfs bug fixes from Darrick Wong:
"The first bugfix addresses a longstanding problem where we use the
wrong file mapping cursors when trying to compute the speculative
preallocation quantity. This has been causing sporadic crashes when
alwayscow mode is engaged.
The other two fixes correct minor problems in more recent changes.
- Fix the new allocator tracepoints because git am mismerged the
changes such that the trace_XXX got rebased to be in function YYY
instead of XXX
- Ensure that the perag AGFL_RESET state is consistent with whatever
we've just read off the disk
- Fix a bug where we used the wrong iext cursor during a write begin"
* tag 'xfs-6.3-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix mismerged tracepoints
xfs: clear incore AGFL_RESET state if it's not needed
xfs: pass the correct cursor to xfs_iomap_prealloc_size
Pull xfs percpu counter fixes from Darrick Wong:
"We discovered a filesystem summary counter corruption problem that was
traced to cpu hot-remove racing with the call to percpu_counter_sum
that sets the free block count in the superblock when writing it to
disk. The root cause is that percpu_counter_sum doesn't cull from
dying cpus and hence misses those counter values if the cpu shutdown
hooks have not yet run to merge the values.
I'm hoping this is a fairly painless fix to the problem, since the
dying cpu mask should generally be empty. It's been in for-next for a
week without any complaints from the bots.
- Fix a race in the percpu counters summation code where the
summation failed to add in the values for any CPUs that were dying
but not yet dead. This fixes some minor discrepancies and incorrect
assertions when running generic/650"
* tag 'xfs-6.3-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
pcpcntr: remove percpu_counter_sum_all()
fork: remove use of percpu_counter_sum_all
pcpcntrs: fix dying cpu summation race
cpumask: introduce for_each_cpu_or
clang with W=1 reports
fs/ksmbd/unicode.c:122:19: error: unused function
'is_char_allowed' [-Werror,-Wunused-function]
static inline int is_char_allowed(char *ch)
^
This function is not used so remove it.
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Commit 83dcedd554 ("ksmbd: fix infinite loop in ksmbd_conn_handler_loop()"),
changes GFP modifiers passed to kvmalloc(). This cause xfstests generic/551
test to fail. We limit pdu length size according to connection status and
maximum number of connections. In the rest, memory allocation of request
is limited by credit management. so these flags are no longer needed.
Fixes: 83dcedd554 ("ksmbd: fix infinite loop in ksmbd_conn_handler_loop()")
Cc: stable@vger.kernel.org
Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull xfs fixes from Darrick Wong:
"This batch started with some debugging enhancements to the new
allocator refactoring that we put in 6.3-rc1 to assist developers in
rebasing their dev branches.
As for more serious code changes -- there's a bug fix to make the
lockless allocator scan the whole filesystem before resorting to the
locking allocator. We're also adding a selftest for the venerable
directory/xattr hash function to make sure that it produces consistent
results so that we can address any fallout as soon as possible.
- Add a few debugging assertions so that people (me) trying to port
code to the new allocator functions don't mess up the caller
requirements
- Relax some overly cautious lock ordering enforcement in the new
allocator code, which means that file allocations will locklessly
scan for the best space they can get before backing off to the
traditional lock-and-really-get-it behavior
- Add tracepoints to make it easier to trace the xfs allocator
behavior
- Actually test the dir/xattr hash algorithm to make sure it produces
consistent results across all the platforms XFS supports"
* tag 'xfs-6.3-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: test dir/attr hash when loading module
xfs: add tracepoints for each of the externally visible allocators
xfs: walk all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags
xfs: try to idiot-proof the allocators
Pull hwmon fixes from Guenter Roeck:
- it87: Fix voltage scaling for chips with 10.9mV ADCs
- xgene: Fix ioremap and memremap leak
- peci/cputemp: Fix miscalculated DTS temperature for SKX
- hwmon core: fix potential sensor registration failure with thermal
subsystem if of_node is missing
* tag 'hwmon-for-v6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon (it87): Fix voltage scaling for chips with 10.9mV ADCs
hwmon: (xgene) Fix ioremap and memremap leak
hwmon: fix potential sensor registration fail if of_node is missing
hwmon: (peci/cputemp) Fix miscalculated DTS for SKX
When link speed is 10 Mbps and temperature is under -20°C, RTL8168H and
RTL8107E may have rx crc error. Disable phy 10 Mbps pll off to fix this
issue.
Fixes: 6e1d0b8988 ("r8169:add support for RTL8168H and RTL8107E")
Signed-off-by: ChunHao Lin <hau@realtek.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel says:
====================
net: dsa: microchip: ksz8: fixes for stable
changes v2:
- use proper Fixes tag
- add Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> on all
reviewed patches except the ksz8863_smi patch.
These fixes address issues such as incomplete FDB extraction, incorrect
FID extraction and configuration, incorrect timestamp extraction, and
ghost entry extraction from an empty dynamic MAC table. These updates
ensure proper functioning of the FDB/MDB functionality for the
ksz8863/ksz8873 series of chips.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
FID is directly mapped to VID. However, configuring a MAC address with a
VID != 0 resulted in incorrect configuration due to an incorrect bit
mask. This kernel commit fixed the issue by correcting the bit mask and
ensuring proper configuration of MAC addresses with non-zero VID.
Fixes: 4b20a07e10 ("net: dsa: microchip: ksz8795: add support for ksz88xx chips")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current regmap bulk access is broken, resulting to wrong reads/writes
if ksz_read64/ksz_write64 functions are used.
Mostly this issue was visible by using ksz8_fdb_dump(), which returned
corrupt MAC address.
The reason is that regmap was configured to have max_raw_read/write,
even if ksz8863_mdio_read/write functions are able to handle unlimited
read/write accesses. On ksz_read64 function we are using multiple 32bit
accesses by incrementing each access by 1 instead of 4. Resulting buffer
had 01234567.12345678 instead of 01234567.89abcdef.
We have multiple ways to fix it:
- enable 4 byte alignment for 32bit accesses. Since the HW do not have
this requirement. It will break driver.
- disable max_raw_* limit.
This patch is removing max_raw_* limit for regmap accesses in ksz8863_smi.
Fixes: 60a3647600 ("net: dsa: microchip: Add Microchip KSZ8863 SMI based driver support")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the dynamic MAC table is empty, we will still extract one outdated
entry. Fix it by using correct bit offset.
Fixes: 4b20a07e10 ("net: dsa: microchip: ksz8795: add support for ksz88xx chips")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current ksz8_fdb_dump() is able to extract only max 249 entries on
the ksz8863/ksz8873 series of switches. This happened due to wrong
bit mask and offset calculation.
This commit corrects the issue and allows for the complete extraction of
all 1024 entries.
Fixes: 4b20a07e10 ("net: dsa: microchip: ksz8795: add support for ksz88xx chips")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before this patch, the ksz8_fdb_dump() function had several issues, such
as uninitialized variables and incorrect usage of source port as a bit
mask. These problems caused inaccurate reporting of vid information and
port assignment in the bridge fdb.
Fixes: e587be759e ("net: dsa: microchip: update fdb add/del/dump in ksz_common")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Smatch complains that:
drivers/ptp/ptp_qoriq.c ptp_qoriq_probe()
warn: 'base' from ioremap() not released.
Fix this by revising the parameter from 'ptp_qoriq->base' to 'base'.
This is only a bug if ptp_qoriq_init() returns on the
first -ENODEV error path.
For other error paths ptp_qoriq->base and base are the same.
And this change makes the code more readable.
Fixes: 7f4399ba40 ("ptp_qoriq: fix NULL access if ptp dt node missing")
Signed-off-by: SongJingyi <u201912584@hust.edu.cn>
Reviewed-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Link: https://lore.kernel.org/r/20230324031406.1895159-1-u201912584@hust.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, vmxnet3 uses GRO callback only if LRO is disabled. However,
on smartNic based setups where UPT is supported, LRO can be enabled
from guest VM but UPT devicve does not support LRO as of now. In such
cases, there can be performance degradation as GRO is not being done.
This patch fixes this issue by calling GRO API when UPT is enabled. We
use updateRxProd to determine if UPT mode is active or not.
To clarify few things discussed over the thread:
The patch is not neglecting any feature bits nor disabling GRO. It uses
GRO callback when UPT is active as LRO is not available in UPT.
GRO callback cannot be used as default for all cases as it degrades
performance for non-UPT cases or for cases when LRO is already done in
ESXi.
Cc: stable@vger.kernel.org
Fixes: 6f91f4ba04 ("vmxnet3: add support for capability registers")
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230323200721.27622-1-doshir@vmware.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The probe function sets priv->chip_data to (void *)priv + sizeof(*priv)
with the expectation that priv has enough trailing space.
However, only realtek-smi actually allocated this chip_data space.
Do likewise in realtek-mdio to fix out-of-bounds accesses.
These accesses likely went unnoticed so far, because of an (unused)
buf[4096] member in struct realtek_priv, which caused kmalloc to
round up the allocated buffer to a big enough size, so nothing of
value was overwritten. With a different allocator (like in the barebox
bootloader port of the driver) or with KASAN, the memory corruption
becomes quickly apparent.
Fixes: aac9400106 ("net: dsa: realtek: add new mdio interface for drivers")
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Link: https://lore.kernel.org/r/20230323103735.2331786-1-a.fatoum@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull misc fixes from Andrew Morton:
"21 hotfixes, 8 of which are cc:stable. 11 are for MM, the remainder
are for other subsystems"
* tag 'mm-hotfixes-stable-2023-03-24-17-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
mm: mmap: remove newline at the end of the trace
mailmap: add entries for Richard Leitner
kcsan: avoid passing -g for test
kfence: avoid passing -g for test
mm: kfence: fix using kfence_metadata without initialization in show_object()
lib: dhry: fix unstable smp_processor_id(_) usage
mailmap: add entry for Enric Balletbo i Serra
mailmap: map Sai Prakash Ranjan's old address to his current one
mailmap: map Rajendra Nayak's old address to his current one
Revert "kasan: drop skip_kasan_poison variable in free_pages_prepare"
mailmap: add entry for Tobias Klauser
kasan, powerpc: don't rename memintrinsics if compiler adds prefixes
mm/ksm: fix race with VMA iteration and mm_struct teardown
kselftest: vm: fix unused variable warning
mm: fix error handling for map_deny_write_exec
mm: deduplicate error handling for map_deny_write_exec
checksyscalls: ignore fstat to silence build warning on LoongArch
nilfs2: fix kernel-infoleak in nilfs_ioctl_wrap_copy()
test_maple_tree: add more testing for mas_empty_area()
maple_tree: fix mas_skip_node() end slot detection
...
Some USB-SATA adapters have broken behavior when an unsupported VPD page is
probed: Depending on the VPD page number, a 4-byte header with a valid VPD
page number but with a 0 length is returned. Currently, scsi_vpd_inquiry()
only checks that the page number is valid to determine if the page is
valid, which results in receiving only the 4-byte header for the
non-existent page. This error manifests itself very often with page 0xb9
for the Concurrent Positioning Ranges detection done by sd_read_cpr(),
resulting in the following error message:
sd 0:0:0:0: [sda] Invalid Concurrent Positioning Ranges VPD page
Prevent such misleading error message by adding a check in
scsi_vpd_inquiry() to verify that the page length is not 0.
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20230322022211.116327-1-damien.lemoal@opensource.wdc.com
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull ksmbd server fixes from Steve French:
- return less confusing messages on unsupported dialects
(STATUS_NOT_SUPPORTED instead of I/O error)
- fix for overly frequent inactive session termination
- fix refcount leak
- fix bounds check problems found by static checkers
- fix to advertise named stream support correctly
- Fix AES256 signing bug when connected to from MacOS
* tag '6.3-rc3-ksmbd-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: return unsupported error on smb1 mount
ksmbd: return STATUS_NOT_SUPPORTED on unsupported smb2.0 dialect
ksmbd: don't terminate inactive sessions after a few seconds
ksmbd: fix possible refcount leak in smb2_open()
ksmbd: add low bound validation to FSCTL_QUERY_ALLOCATED_RANGES
ksmbd: add low bound validation to FSCTL_SET_ZERO_DATA
ksmbd: set FILE_NAMED_STREAMS attribute in FS_ATTRIBUTE_INFORMATION
ksmbd: fix wrong signingkey creation when encryption is AES256
Pull ARM SoC fixes from Arnd Bergmann:
"As usual, most of the bug fixes address issues in the devicetree
files, and out of these, most are for the Qualcomm and NXP platforms,
including:
- A missing 'reserved-memory' property on LG G Watch R that is needed
to prevent clashing with firmware
- Annotations for cache coherency on multiple machines
- Corrections for pinctrl, regulator, clock, iommu and power domain
properties for i.MX and Qualcomm to correctly reflect the hardware
settings
- Firmware file names on multiple machines SA8540P Ride board
- An incompatible change to the qcom vadc driver requires adding
individual labels
- Fix EQoS PHY reset GPIO by dropping the deprecated/wrong property
and switch to the new bindings.
- A fix for PCI bus address translation Tegra194 and Tegra234.
There are also a couple of device driver fixes, addressing:
- A race condition in the amdtee driver
- A performance regression in the Qualcomm 'llcc' driver
- An unitialized variable use NXP i.MX 'weim' driver
- Error handling issues in Qualcomm 'rmtfs', and 'scm' drivers and
the Arm scmi firmware driver"
* tag 'arm-fixes-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (48 commits)
arm64: dts: qcom: sc8280xp-x13s: mark bob regulator as always-on
arm64: dts: qcom: sc8280xp-x13s: mark s12b regulator as always-on
arm64: dts: qcom: sc8280xp-x13s: mark s10b regulator as always-on
arm64: dts: qcom: sc8280xp-x13s: mark s11b regulator as always-on
arm64: dts: imx93: add missing #address-cells and #size-cells to i2c nodes
bus: imx-weim: fix branch condition evaluates to a garbage value
arm64: dts: imx8mn: specify #sound-dai-cells for SAI nodes
ARM: dts: imx6sl: tolino-shine2hd: fix usbotg1 pinctrl
ARM: dts: imx6sll: e60k02: fix usbotg1 pinctrl
ARM: dts: imx6sll: e70k02: fix usbotg1 pinctrl
arm64: dts: imx93: Fix eqos properties
arm64: dts: imx8mp: Fix LCDIF2 node clock order
arm64: dts: imx8mm-nitrogen-r2: fix WM8960 clock name
arm64: dts: imx8dxl-evk: Fix eqos phy reset gpio
firmware: qcom: scm: fix bogus irq error at probe
arm64: dts: qcom: sm8550: Mark UFS controller as cache coherent
arm64: dts: qcom: sa8540p-ride: correct name of remoteproc_nsp0 firmware
arm64: dts: qcom: sm8450: Mark UFS controller as cache coherent
arm64: dts: qcom: sm8350: Mark UFS controller as cache coherent
arm64: dts: qcom: sm8550: fix LPASS pinctrl slew base address
...
Pull power supply fixes from Sebastian Reichel:
- rk817: Fix compiler warning
- cros_usbpd-charger: Fix excessive error printing
- axp288_fuel_gauge: handle platform_get_irq error
- bq24190 and da9150: Fix race condition in remove path
* tag 'for-v6.3-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: supply: da9150: Fix use after free bug in da9150_charger_remove due to race condition
power: supply: bq24190: Fix use after free bug in bq24190_remove due to race condition
power: supply: axp288_fuel_gauge: Added check for negative values
power: supply: cros_usbpd: reclassify "default case!" as debug
power: supply: rk817: Fix unsigned comparison with less than zero
Pull drm fixes from Daniel Vetter:
- usual pile of fixes for amdgpu & i915
- probe error handling fixes for meson, lt8912b bridge
- the host1x patch from Arnd
- panel-orientation fix for Lenovo Book X90F
* tag 'drm-fixes-2023-03-24' of git://anongit.freedesktop.org/drm/drm: (23 commits)
gpu: host1x: fix uninitialized variable use
drm/amd/display: Set dcn32 caps.seamless_odm
drm/amd/display: fix wrong index used in dccg32_set_dpstreamclk
drm/amdgpu/nv: Apply ASPM quirk on Intel ADL + AMD Navi
drm/amd/display: remove outdated 8bpc comments
drm/amdgpu/gfx: set cg flags to enter/exit safe mode
drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs
drm/amdgpu: add mes resume when do gfx post soft reset
drm/amdgpu: skip ASIC reset for APUs when go to S4
drm/amdgpu: reposition the gpu reset checking for reuse
drm/bridge: lt8912b: return EPROBE_DEFER if bridge is not found
drm/meson: fix missing component unbind on bind errors
drm: panel-orientation-quirks: Add quirk for Lenovo Yoga Book X90F
Revert "drm/i915/hwmon: Enable PL1 power limit"
drm/i915: Update vblank timestamping stuff on seamless M/N change
drm/i915: Fix format for perf_limit_reasons
drm/i915/gt: perform uc late init after probe error injection
drm/i915/active: Fix missing debug object activation
drm/i915/guc: Fix missing ecodes
drm/i915/mtl: Disable MC6 for MTL A step
...
dp83869 internally uses a look-up table for mapping supported delays in
nanoseconds to register values.
When specific delays are defined in device-tree, phy_get_internal_delay
does the lookup automatically returning an index.
The default case wrongly assigns the nanoseconds value from the lookup
table, resulting in numeric value 2000 applied to delay configuration
register, rather than the expected index values 0-7 (7 for 2000).
Ultimately this issue broke RX for 1Gbps links.
Fix default delay configuration by assigning the intended index value
directly.
Cc: stable@vger.kernel.org
Fixes: 736b25afe2 ("net: dp83869: Add RGMII internal delay configuration")
Co-developed-by: Yazan Shhady <yazan.shhady@solid-run.com>
Signed-off-by: Yazan Shhady <yazan.shhady@solid-run.com>
Signed-off-by: Josua Mayer <josua@solid-run.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230323102536.31988-1-josua@solid-run.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
At NIC reset, some offload features related to encapsulated traffic
might have changed (this mainly happens if the firmware-variant is
changed with the sfboot userspace tool). Because of this, features are
checked and set again at reset time.
However, this was not done right, and some features were improperly
overwritten at NIC reset:
- Tunneled IPv6 segmentation was always disabled
- Features disabled with ethtool were reenabled
- Features that becomes unsupported after the reset were not disabled
Also, checking if the device supports IPV6_CSUM to enable TSO6 is no
longer necessary because all currently supported devices support it.
Additionally, move the assignment of some other features to the
EF10_OFFLOAD_FEATURES macro, like it is done in ef100, leaving the
selection of features in efx_pci_probe_post_io a bit cleaner.
Fixes: ffffd2454a ("sfc: correctly advertise tunneled IPv6 segmentation")
Fixes: 24b2c3751a ("sfc: advertise encapsulated offloads on EF10")
Reported-by: Tianhao Zhao <tizhao@redhat.com>
Suggested-by: Jonathan Cooper <jonathan.s.cooper@amd.com>
Tested-by: Jonathan Cooper <jonathan.s.cooper@amd.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20230323083417.7345-1-ihuguet@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull device mapper fixes from Mike Snitzer:
- Fix DM thin to work as a swap device by using 'limit_swap_bios' DM
target flag (initially added to allow swap to dm-crypt) to throttle
the amount of outstanding swap bios.
- Fix DM crypt soft lockup warnings by calling cond_resched() from the
cpu intensive loop in dmcrypt_write().
- Fix DM crypt to not access an uninitialized tasklet. This fix allows
for consistent handling of IO completion, by _not_ needlessly punting
to a workqueue when tasklets are not needed.
- Fix DM core's alloc_dev() initialization for DM stats to check for
and propagate alloc_percpu() failure.
* tag 'for-6.3/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm stats: check for and propagate alloc_percpu failure
dm crypt: avoid accessing uninitialized tasklet
dm crypt: add cond_resched() to dmcrypt_write()
dm thin: fix deadlock when swapping to thin device
Pull block fixes from Jens Axboe:
- NVMe pull request via Christoph:
- Send Identify with CNS 06h only to I/O controllers (Martin
George)
- Fix nvme_tcp_term_pdu to match spec (Caleb Sander)
- Pass in issue_flags for uring_cmd, so the end_io handlers don't need
to assume what the right context is (me)
- Fix for ublk, marking it as LIVE before adding it to avoid races on
the initial IO (Ming)
* tag 'block-6.3-2023-03-24' of git://git.kernel.dk/linux:
nvme-tcp: fix nvme_tcp_term_pdu to match spec
nvme: send Identify with CNS 06h only to I/O controllers
block/io_uring: pass in issue_flags for uring_cmd task_work handling
block: ublk_drv: mark device as LIVE before adding disk
Pull io_uring fixes from Jens Axboe:
- Fix an issue with repeated -ECONNREFUSED on a socket (me)
- Fix a NULL pointer deference due to a stale lookup cache for
allocating direct descriptors (Savino)
* tag 'io_uring-6.3-2023-03-24' of git://git.kernel.dk/linux:
io_uring/rsrc: fix null-ptr-deref in io_file_bitmap_get()
io_uring/net: avoid sending -ECONNABORTED on repeated connection requests
Pull thermal control fixes from Rafael Wysocki:
"These address two recent regressions related to thermal control.
Specifics:
- Restore the thermal core behavior regarding zero-temperature trip
points to avoid a driver regression (Ido Schimmel)
- Fix a recent regression in the ACPI processor driver preventing it
from changing the number of CPU cooling device states exposed via
sysfs after the given CPU cooling device has been registered
(Rafael Wysocki)"
* tag 'thermal-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: core: Restore behavior regarding invalid trip points
ACPI: processor: thermal: Update CPU cooling devices on cpufreq policy changes
thermal: core: Introduce thermal_cooling_device_update()
thermal: core: Introduce thermal_cooling_device_present()
ACPI: processor: Reorder acpi_processor_driver_init()
Pull ACPI fixes from Rafael Wysocki:
"These add new ACPI IRQ override and backlight detection quirks.
Specifics:
- Add backlight=native DMI quirk for Acer Aspire 3830TG to the ACPI
backlight driver (Hans de Goede)
- Add an ACPI IRQ override quirk for Medion S17413 (Aymeric Wibo)"
* tag 'acpi-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: resource: Add Medion S17413 to IRQ override quirk
ACPI: video: Add backlight=native DMI quirk for Acer Aspire 3830TG
At some point in between sending this patch to the list and merging it
into for-next, the tracepoints got all mixed up because I've
over-reliant on automated tools not sucking. The end result is that the
tracepoints are all wrong, so fix them.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
If user does forced unmount ("umount -f") while files are still open
on the share (as was seen in a Kubernetes example running on SMB3.1.1
mount) then we were marking the share as "TID_EXITING" in umount_begin()
which caused all subsequent operations (except write) to fail ... but
unfortunately when umount_begin() is called we do not know yet that
there are open files or active references on the share that would prevent
unmount from succeeding. Kubernetes had example when they were doing
umount -f when files were open which caused the share to become
unusable until the files were closed (and the umount retried).
Fix this so that TID_EXITING is not set until we are about to send
the tree disconnect (not at the beginning of forced umounts in
umount_begin) so that if "umount -f" fails (due to open files or
references) the mount is still usable.
Cc: stable@vger.kernel.org
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Get rid of any prefix paths in @path before lookup_positive_unlocked()
as it will call ->lookup() which already adds those prefix paths
through build_path_from_dentry().
This has caused a performance regression when mounting shares with a
prefix path where readdir(2) would end up retrying several times to
open bad directory names that contained duplicate prefix paths.
Fix this by skipping any prefix paths in @path before calling
lookup_positive_unlocked().
Fixes: e4029e0726 ("cifs: find and use the dentry for cached non-root directories also")
Cc: stable@vger.kernel.org # 6.1+
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Performance tests with large number of threads noted that the change
of the default closetimeo (deferred close timeout between when
close is done by application and when client has to send the close
to the server), to 5 seconds from 1 second, significantly degraded
perf in some cases like this (in the filebench example reported,
the stats show close requests on the wire taking twice as long,
and 50% regression in filebench perf). This is stil configurable
via mount parm closetimeo, but to be safe, decrease default back
to its previous value of 1 second.
Reported-by: Yin Fengwei <fengwei.yin@intel.com>
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/lkml/997614df-10d4-af53-9571-edec36b0e2f3@intel.com/
Fixes: 5efdd9122e ("smb3: allow deferred close timeout to be configurable")
Cc: stable@vger.kernel.org # 6.0+
Tested-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Make sure to unload_nls() @nls_codepage if we no longer need it.
Fixes: bc962159e8 ("cifs: avoid race conditions with parallel reconnects")
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull slab fix from Vlastimil Babka:
"A single build fix for a corner case configuration that is apparently
possible to achieve on some arches, from Geert"
* tag 'slab-fix-for-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slab: Fix undefined init_cache_node_node() for NUMA and !SMP
Pull EFI fixes from Ard Biesheuvel:
- Set the NX compat flag for arm64 and zboot, to ensure compatibility
with EFI firmware that complies with tightening requirements imposed
across the ecosystem.
- Improve identification of Ampere Altra systems based on SMBIOS data.
- Fix some issues related to the EFI framebuffer that were introduced
as a result from some refactoring related to zboot and the merge with
sysfb.
- Makefile tweak to avoid rebuilding vmlinuz unnecessarily.
- Fix efi_random_alloc() return value on out of memory condition.
* tag 'efi-fixes-for-v6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi/libstub: randomalloc: Return EFI_OUT_OF_RESOURCES on failure
efi/libstub: Use relocated version of kernel's struct screen_info
efi/libstub: zboot: Add compressed image to make targets
efi: sysfb_efi: Add quirk for Lenovo Yoga Book X91F/L
efi: sysfb_efi: Fix DMI quirks not working for simpledrm
efi/libstub: smbios: Drop unused 'recsize' parameter
arm64: efi: Use SMBIOS processor version to key off Ampere quirk
efi/libstub: smbios: Use length member instead of record struct size
efi: earlycon: Reprobe after parsing config tables
arm64: efi: Set NX compat flag in PE/COFF header
efi/libstub: arm64: Remap relocated image with strict permissions
efi/libstub: zboot: Mark zboot EFI application as NX compatible
Qualcomm driver fixes for v6.3
Support for the secure world interrupting the SCM driver drive the wait
queue mechanism was recently introduced, but most platforms doesn't have
this mechanism and an error should not be printed in the log.
The rmtfs_mem driver recently gained support for assigning the region to
multiple VMIDs, but accidentally removed the support for running without
assignment. A couple of changes are introducd to correct this.
The SC8280XP LLCC slice configuration is wrong, reslting in incorrect
configuration of the hardware. The table is corrected, based on the
datasheet.
* tag 'qcom-driver-fixes-for-6.3' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
firmware: qcom: scm: fix bogus irq error at probe
soc: qcom: rmtfs: handle optional qcom,vmid correctly
soc: qcom: rmtfs: fix error handling reading qcom,vmid
soc: qcom: llcc: Fix slice configuration values for SC8280XP
Link: https://lore.kernel.org/r/20230323142505.1086072-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Qualcomm ARM64 Devicetree fixes for v6.3
This correct SIM card selection on the two newly introduced
MSM8916-based USB modems.
The firmware-name for the first CDSP is corrected on the SA8540P Ride
board.
The PCIe controller in SC7280 is marked cache-coherent, which resolves
seen data corruption issues.
Labels are added to the vadc channel nodes on SC8280XP, as the Linux
driver was updated to not include the unit address when generating
device names and collisions thereby prevented registration of the
channels. Audio clocks and routing is corrected and a few regulators are
marked always-on for the Lenovo Thinkpad X13s, as their clients are not
fully described at this point.
SPI5 was accidentally enabled by default on SM6115, and is disabled
again.
CDSP on SM6375 is provided its power-domains, to appropriately vote for
during power up for the DSP.
The iommu mask for the PCIe controllers in SM8150 is updated, to match
what the hypervisor expects.
Th Venus firmware path is corrected on Xiaomi Mi Pad 5 Pro.
The UFS controller is marked cache coherent on SM8350 and SM8450.
The clocks for the second WSA macro on SM8450 is corrected, and given
its own clocks.
The bias-pull-up value for I2C pins are corrected on SM8550, to trigger
the selection of the strong pull. CPU compatibles and the base address
of the LPASS TLMM block are corrected.
* tag 'qcom-arm64-fixes-for-6.3' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (23 commits)
arm64: dts: qcom: sc8280xp-x13s: mark bob regulator as always-on
arm64: dts: qcom: sc8280xp-x13s: mark s12b regulator as always-on
arm64: dts: qcom: sc8280xp-x13s: mark s10b regulator as always-on
arm64: dts: qcom: sc8280xp-x13s: mark s11b regulator as always-on
arm64: dts: qcom: sm8550: Mark UFS controller as cache coherent
arm64: dts: qcom: sa8540p-ride: correct name of remoteproc_nsp0 firmware
arm64: dts: qcom: sm8450: Mark UFS controller as cache coherent
arm64: dts: qcom: sm8350: Mark UFS controller as cache coherent
arm64: dts: qcom: sm8550: fix LPASS pinctrl slew base address
arm64: dts: qcom: sc8280xp-x13s: fix va dmic dai links and routing
arm64: dts: qcom: sc8280xp-x13s: fix dmic sample rate
arm64: dts: qcom: sc8280xp: fix lpass tx macro clocks
arm64: dts: qcom: sc8280xp: fix rx frame shapping info
arm64: dts: qcom: sm8450: correct WSA2 assigned clocks
arm64: dts: qcom: sc7280: Mark PCIe controller as cache coherent
arm64: dts: qcom: msm8916-ufi: Fix sim card selection pinctrl
arm64: dts: qcom: sm8250-xiaomi-elish: Correct venus firmware path
arm64: dts: qcom: sm8550: Use correct CPU compatibles
arm64: dts: qcom: sm8550: Add bias pull up value to tlmm i2c data clk states
arm64: dts: qcom: sm6375: Add missing power-domain-named to CDSP
...
Link: https://lore.kernel.org/r/20230323141642.1085684-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull RISC-V fixes from Palmer Dabbelt:
- A fix to match the CSR ASID masking rules when passing ASIDs to
firmware
- Force GCC to use ISA 2.2, to avoid a host of compatibily issues
between toolchains
* tag 'riscv-for-linus-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Handle zicsr/zifencei issues between clang and binutils
riscv: mm: Fix incorrect ASID argument when flushing TLB
Pull xen fixes from Juergen Gross:
- fix build warning
- avoid concurrent accesses to the Xen PV console ring page
* tag 'for-linus-6.3-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/PVH: avoid 32-bit build warning when obtaining VGA console info
hvc/xen: prevent concurrent accesses to the shared ring
Pull chrome platform fix from Tzung-Bi Shih:
"Fix a kernel data leak vulnerability"
* tag 'tag-chrome-platform-fixes-for-v6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: cros_ec_chardev: fix kernel data leak from ioctl
Pull i2c fixes from Wolfram Sang:
"A set of regular driver fixes"
* tag 'i2c-for-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: xgene-slimpro: Fix out-of-bounds bug in xgene_slimpro_i2c_xfer()
i2c: hisi: Only use the completion interrupt to finish the transfer
i2c: hisi: Avoid redundant interrupts
i2c: mxs: ensure that DMA buffers are safe for DMA
i2c: imx-lpi2c: check only for enabled interrupt flags
i2c: imx-lpi2c: clean rx/tx buffers upon new message
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, wifi and bluetooth.
Current release - regressions:
- wifi: mt76: mt7915: add back 160MHz channel width support for
MT7915
- libbpf: revert poisoning of strlcpy, it broke uClibc-ng
Current release - new code bugs:
- bpf: improve the coverage of the "allow reads from uninit stack"
feature to fix verification complexity problems
- eth: am65-cpts: reset PPS genf adj settings on enable
Previous releases - regressions:
- wifi: mac80211: serialize ieee80211_handle_wake_tx_queue()
- wifi: mt76: do not run mt76_unregister_device() on unregistered hw,
fix null-deref
- Bluetooth: btqcomsmd: fix command timeout after setting BD address
- eth: igb: revert rtnl_lock() that causes a deadlock
- dsa: mscc: ocelot: fix device specific statistics
Previous releases - always broken:
- xsk: add missing overflow check in xdp_umem_reg()
- wifi: mac80211:
- fix QoS on mesh interfaces
- fix mesh path discovery based on unicast packets
- Bluetooth:
- ISO: fix timestamped HCI ISO data packet parsing
- remove "Power-on" check from Mesh feature
- usbnet: more fixes to drivers trusting packet length
- wifi: iwlwifi: mvm: fix mvmtxq->stopped handling
- Bluetooth: btintel: iterate only bluetooth device ACPI entries
- eth: iavf: fix inverted Rx hash condition leading to disabled hash
- eth: igc: fix the validation logic for taprio's gate list
- dsa: tag_brcm: legacy: fix daisy-chained switches
Misc:
- bpf: adjust insufficient default bpf_jit_limit to account for
growth of BPF use over the last 5 years
- xdp: bpf_xdp_metadata() use EOPNOTSUPP as unique errno indicating
no driver support"
* tag 'net-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
Bluetooth: HCI: Fix global-out-of-bounds
Bluetooth: mgmt: Fix MGMT add advmon with RSSI command
Bluetooth: btsdio: fix use after free bug in btsdio_remove due to unfinished work
Bluetooth: L2CAP: Fix responding with wrong PDU type
Bluetooth: btqcomsmd: Fix command timeout after setting BD address
Bluetooth: btinel: Check ACPI handle for NULL before accessing
net: mdio: thunder: Add missing fwnode_handle_put()
net: dsa: mt7530: move setting ssc_delta to PHY_INTERFACE_MODE_TRGMII case
net: dsa: mt7530: move lowering TRGMII driving to mt7530_setup()
net: dsa: mt7530: move enabling disabling core clock to mt7530_pll_setup()
net: asix: fix modprobe "sysfs: cannot create duplicate filename"
gve: Cache link_speed value from device
tools: ynl: Fix genlmsg header encoding formats
net: enetc: fix aggregate RMON counters not showing the ranges
Bluetooth: Remove "Power-on" check from Mesh feature
Bluetooth: Fix race condition in hci_cmd_sync_clear
Bluetooth: btintel: Iterate only bluetooth device ACPI entries
Bluetooth: ISO: fix timestamped HCI ISO data packet parsing
Bluetooth: btusb: Remove detection of ISO packets over bulk
Bluetooth: hci_core: Detect if an ACL packet is in fact an ISO packet
...
Prior to commit 7ac2ff8bb3, when we loaded the incore perag structure
with information from the AGF header, we would set or clear the
pagf_agfl_reset field based on whether or not the AGFL list was
misaligned within the block. IOWs, it's an incore state bit that's
supposed to cache something in the ondisk metadata. Therefore, the code
still needs to support clearing the incore bit if (somehow) the AGFL
were to correct itself.
It turns out that xfs_repair does exactly this -- phase 4 loads the AGF
to scan the rmapbt for corrupt records, which can set NEEDS_AGFL_RESET.
The scan unsets AGF_INIT but doesn't unset NEEDS_AGFL_RESET. Phase 5
totally rewrites the AGFL and fixes the alignment problem, didn't clear
NEEDS_AGFL_RESET historically, and reloads the perag state to fix the
freelist. This results in the AGFL being reset based on stale data,
which then causes the new AGFL blocks to be leaked. A subsequent
xfs_repair -n then complains about the leaks.
One could argue that phase 5 ought to clear this bit directly when it
reloads the perag AGF data after rewriting the AGFL, but libxfs used to
handle this for us, so it should go back to doing that.
Found by fuzzing flfirst = ones in xfs/352.
Fixes: 7ac2ff8bb3 ("xfs: perags need atomic operational state")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
In xfs_buffered_write_iomap_begin, @icur is the iext cursor for the data
fork and @ccur is the cursor for the cow fork. Pass in whichever cursor
corresponds to allocfork, because otherwise the xfs_iext_prev_extent
call can use the data fork cursor to walk off the end of the cow fork
structure. Best case it returns the wrong results, worst case it does
this:
stack segment: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 3141909 Comm: fsstress Tainted: G W 6.3.0-rc2-xfsx #6.3.0-rc2 7bf5cc2e98997627cae5c930d890aba3aeec65dd
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-builder-01.us.oracle.com-4.el7.1 04/01/2014
RIP: 0010:xfs_iext_prev+0x71/0x150 [xfs]
RSP: 0018:ffffc90002233aa8 EFLAGS: 00010297
RAX: 000000000000000f RBX: 000000000000000e RCX: 000000000000000c
RDX: 0000000000000002 RSI: 000000000000000e RDI: ffff8883d0019ba0
RBP: 989642409af8a7a7 R08: ffffea0000000001 R09: 0000000000000002
R10: 0000000000000000 R11: 000000000000000c R12: ffffc90002233b00
R13: ffff8883d0019ba0 R14: 989642409af8a6bf R15: 000ffffffffe0000
FS: 00007fdf8115f740(0000) GS:ffff88843fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdf8115e000 CR3: 0000000357256000 CR4: 00000000003506e0
Call Trace:
<TASK>
xfs_iomap_prealloc_size.constprop.0.isra.0+0x1a6/0x410 [xfs 619a268fb2406d68bd34e007a816b27e70abc22c]
xfs_buffered_write_iomap_begin+0xa87/0xc60 [xfs 619a268fb2406d68bd34e007a816b27e70abc22c]
iomap_iter+0x132/0x2f0
iomap_file_buffered_write+0x92/0x330
xfs_file_buffered_write+0xb1/0x330 [xfs 619a268fb2406d68bd34e007a816b27e70abc22c]
vfs_write+0x2eb/0x410
ksys_write+0x65/0xe0
do_syscall_64+0x2b/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Found by xfs/538 in alwayscow mode, but this doesn't seem particular to
that test.
Fixes: 590b16516e ("xfs: refactor xfs_iomap_prealloc_size")
Actually-Fixes: 66ae56a53f ("xfs: introduce an always_cow mode")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Pull btrfs fixes from David Sterba:
"A few more fixes, the zoned accounting fix is spread across a few
patches, preparatory and the actual fixes:
- zoned mode:
- fix accounting of unusable zone space
- fix zone activation condition for DUP profile
- preparatory patches
- improved error handling of missing chunks
- fix compiler warning"
* tag 'for-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zoned: drop space_info->active_total_bytes
btrfs: zoned: count fresh BG region as zone unusable
btrfs: use temporary variable for space_info in btrfs_update_block_group
btrfs: rename BTRFS_FS_NO_OVERCOMMIT to BTRFS_FS_ACTIVE_ZONE_TRACKING
btrfs: zoned: fix btrfs_can_activate_zone() to support DUP profile
btrfs: fix compiler warning on SPARC/PA-RISC handling fscrypt_setup_filename
btrfs: handle missing chunk mapping more gracefully
Pull SCSI fixes from James Bottomley:
"Four small fixes, three in drivers.
The core fix adds a UFS device to an existing quirk to avoid a huge
delay on boot"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: scsi_dh_alua: Fix memleak for 'qdata' in alua_activate()
scsi: qla2xxx: Synchronize the IOCB count to be in order
scsi: qla2xxx: Perform lockless command completion in abort path
scsi: core: Add BLIST_SKIP_VPD_PAGES for SKhynix H28U74301AMR
When multiple processes/channels do reconnects in parallel
we used to return success immediately
negotiate/session-setup/tree-connect, causing race conditions
between processes that enter the function in parallel.
This caused several errors related to session not found to
show up during parallel reconnects.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
We do not dump the file path for smb3_open_enter ftrace
calls, which is a severe handicap while debugging
using ftrace evens. This change adds that info.
Unfortunately, we're not updating the path in open params
in many places; which I had to do as a part of this change.
SMB2_open gets path in utf16 format, but it's easier of
path is supplied as char pointer in oparms.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
On some platforms there are some platform devices created with
invalid names. For example: "HID-SENSOR-INT-020b?.39.auto" instead
of "HID-SENSOR-INT-020b.39.auto"
This string include some invalid characters, hence it will fail to
properly load the driver which will handle this custom sensor. Also
it is a problem for some user space tools, which parses the device
names from ftrace and dmesg.
This is because the string, real_usage, is not NULL terminated and
printed with %s to form device name.
To address this, initialize the real_usage string with 0s.
Reported-and-tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217169
Fixes: 98c062e824 ("HID: hid-sensor-custom: Allow more custom iio sensors")
Cc: stable@vger.kernel.org
Suggested-by: Philipp Jungkamp <p.jungkamp@gmx.net>
Signed-off-by: Philipp Jungkamp <p.jungkamp@gmx.net>
Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The linux-xtensa@linux-xtensa.org mailing list has been bouncing emails
for a few months now. Drop it from the xtensa entries in the MAINTAINERS
file.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
It's been reported that the recent kernel can't probe the PCM devices
on Roland VS-100 properly, and it turned out to be a regression by the
recent addition of the bit shift range check for the format bits.
In the old code, we just did bit-shift and it resulted in zero, which
is then corrected to the standard PCM format, while the new code
explicitly returns an error in such a case.
For addressing the regression, relax the check and fallback to the
standard PCM type (with the info output).
Fixes: 43d5ca88df ("ALSA: usb-audio: Fix potential out-of-bounds shift")
Cc: <stable@vger.kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217084
Link: https://lore.kernel.org/r/20230324075005.19403-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
ksmbd disconnect connection when mounting with vers=smb1.
ksmbd should send smb1 negotiate response to client for correct
unsupported error return. This patch add needed SMB1 macros and fill
NegProt part of the response for smb1 negotiate response.
Cc: stable@vger.kernel.org
Reported-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
A lot of modern Clevo barebones have touchpad and/or keyboard issues after
suspend fixable with nomux + reset + noloop + nopnp. Luckily, none of them
have an external PS/2 port so this can safely be set for all of them.
I'm not entirely sure if every device listed really needs all four quirks,
but after testing and production use, no negative effects could be
observed when setting all four.
Setting SERIO_QUIRK_NOMUX or SERIO_QUIRK_RESET_ALWAYS on the Clevo N150CU
and the Clevo NHxxRZQ makes the keyboard very laggy for ~5 seconds after
boot and sometimes also after resume. However both are required for the
keyboard to not fail completely sometimes after boot or resume.
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230321191619.647911-1-wse@tuxedocomputers.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Nathan reported that when building with GNU as and a version of clang that
defaults to DWARF5, the assembler will complain with:
Error: non-constant .uleb128 is not supported
This is because `-g` defaults to the compiler debug info default. If the
assembler does not support some of the directives used, the above errors
occur. To fix, remove the explicit passing of `-g`.
All the test wants is that stack traces print valid function names, and
debug info is not required for that. (I currently cannot recall why I
added the explicit `-g`.)
Link: https://lkml.kernel.org/r/20230316224705.709984-2-elver@google.com
Fixes: 1fe84fd4a4 ("kcsan: Add test suite")
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Nathan reported that when building with GNU as and a version of clang that
defaults to DWARF5:
$ make -skj"$(nproc)" ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- \
LLVM=1 LLVM_IAS=0 O=build \
mrproper allmodconfig mm/kfence/kfence_test.o
/tmp/kfence_test-08a0a0.s: Assembler messages:
/tmp/kfence_test-08a0a0.s:14627: Error: non-constant .uleb128 is not supported
/tmp/kfence_test-08a0a0.s:14628: Error: non-constant .uleb128 is not supported
/tmp/kfence_test-08a0a0.s:14632: Error: non-constant .uleb128 is not supported
/tmp/kfence_test-08a0a0.s:14633: Error: non-constant .uleb128 is not supported
/tmp/kfence_test-08a0a0.s:14639: Error: non-constant .uleb128 is not supported
...
This is because `-g` defaults to the compiler debug info default. If the
assembler does not support some of the directives used, the above errors
occur. To fix, remove the explicit passing of `-g`.
All the test wants is that stack traces print valid function names, and
debug info is not required for that. (I currently cannot recall why I
added the explicit `-g`.)
Link: https://lkml.kernel.org/r/20230316224705.709984-1-elver@google.com
Fixes: bc8fbc5f30 ("kfence: add test suite")
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This reverts commit 487a32ec24.
should_skip_kasan_poison() reads the PG_skip_kasan_poison flag from
page->flags. However, this line of code in free_pages_prepare():
page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
clears most of page->flags, including PG_skip_kasan_poison, before calling
should_skip_kasan_poison(), which meant that it would never return true as
a result of the page flag being set. Therefore, fix the code to call
should_skip_kasan_poison() before clearing the flags, as we were doing
before the reverted patch.
This fixes a measurable performance regression introduced in the reverted
commit, where munmap() takes longer than intended if HW tags KASAN is
supported and enabled at runtime. Without this patch, we see a
single-digit percentage performance regression in a particular
mmap()-heavy benchmark when enabling HW tags KASAN, and with the patch,
there is no statistically significant performance impact when enabling HW
tags KASAN.
Link: https://lkml.kernel.org/r/20230310042914.3805818-2-pcc@google.com
Fixes: 487a32ec24 ("kasan: drop skip_kasan_poison variable in free_pages_prepare")
Link: https://linux-review.googlesource.com/id/Ic4f13affeebd20548758438bb9ed9ca40e312b79
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org> [6.1]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The ioctl helper function nilfs_ioctl_wrap_copy(), which exchanges a
metadata array to/from user space, may copy uninitialized buffer regions
to user space memory for read-only ioctl commands NILFS_IOCTL_GET_SUINFO
and NILFS_IOCTL_GET_CPINFO.
This can occur when the element size of the user space metadata given by
the v_size member of the argument nilfs_argv structure is larger than the
size of the metadata element (nilfs_suinfo structure or nilfs_cpinfo
structure) on the file system side.
KMSAN-enabled kernels detect this issue as follows:
BUG: KMSAN: kernel-infoleak in instrument_copy_to_user
include/linux/instrumented.h:121 [inline]
BUG: KMSAN: kernel-infoleak in _copy_to_user+0xc0/0x100 lib/usercopy.c:33
instrument_copy_to_user include/linux/instrumented.h:121 [inline]
_copy_to_user+0xc0/0x100 lib/usercopy.c:33
copy_to_user include/linux/uaccess.h:169 [inline]
nilfs_ioctl_wrap_copy+0x6fa/0xc10 fs/nilfs2/ioctl.c:99
nilfs_ioctl_get_info fs/nilfs2/ioctl.c:1173 [inline]
nilfs_ioctl+0x2402/0x4450 fs/nilfs2/ioctl.c:1290
nilfs_compat_ioctl+0x1b8/0x200 fs/nilfs2/ioctl.c:1343
__do_compat_sys_ioctl fs/ioctl.c:968 [inline]
__se_compat_sys_ioctl+0x7dd/0x1000 fs/ioctl.c:910
__ia32_compat_sys_ioctl+0x93/0xd0 fs/ioctl.c:910
do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
__do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178
do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203
do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246
entry_SYSENTER_compat_after_hwframe+0x70/0x82
Uninit was created at:
__alloc_pages+0x9f6/0xe90 mm/page_alloc.c:5572
alloc_pages+0xab0/0xd80 mm/mempolicy.c:2287
__get_free_pages+0x34/0xc0 mm/page_alloc.c:5599
nilfs_ioctl_wrap_copy+0x223/0xc10 fs/nilfs2/ioctl.c:74
nilfs_ioctl_get_info fs/nilfs2/ioctl.c:1173 [inline]
nilfs_ioctl+0x2402/0x4450 fs/nilfs2/ioctl.c:1290
nilfs_compat_ioctl+0x1b8/0x200 fs/nilfs2/ioctl.c:1343
__do_compat_sys_ioctl fs/ioctl.c:968 [inline]
__se_compat_sys_ioctl+0x7dd/0x1000 fs/ioctl.c:910
__ia32_compat_sys_ioctl+0x93/0xd0 fs/ioctl.c:910
do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
__do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178
do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203
do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246
entry_SYSENTER_compat_after_hwframe+0x70/0x82
Bytes 16-127 of 3968 are uninitialized
...
This eliminates the leak issue by initializing the page allocated as
buffer using get_zeroed_page().
Link: https://lkml.kernel.org/r/20230307085548.6290-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+132fdd2f1e1805fdc591@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/000000000000a5bd2d05f63f04ae@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Fix mas_skip_node() for mas_empty_area()", v2.
mas_empty_area() was incorrectly returning an error when there was room.
The issue was tracked down to mas_skip_node() using the incorrect
end-of-slot count. Instead of using the nodes hard limit, the limit of
data should be used.
mas_skip_node() was also setting the min and max to that of the child
node, which was unnecessary. Within these limits being set, there was
also a bug that corrupted the maple state's max if the offset was set to
the maximum node pivot. The bug was without consequence unless there was
a sufficient gap in the next child node which would cause an error to be
returned.
This patch set fixes these errors by removing the limit setting from
mas_skip_node() and uses the mas_data_end() for slot limits, and adds
tests for all failures discovered.
This patch (of 2):
mas_skip_node() is used to move the maple state to the node with a higher
limit. It does this by walking up the tree and increasing the slot count.
Since slot count may not be able to be increased, it may need to walk up
multiple times to find room to walk right to a higher limit node. The
limit of slots that was being used was the node limit and not the last
location of data in the node. This would cause the maple state to be
shifted outside actual data and enter an error state, thus returning
-EBUSY.
The result of the incorrect error state means that mas_awalk() would
return an error instead of finding the allocation space.
The fix is to use mas_data_end() in mas_skip_node() to detect the nodes
data end point and continue walking the tree up until it is safe to move
to a node with a higher limit.
The walk up the tree also sets the maple state limits so remove the buggy
code from mas_skip_node(). Setting the limits had the unfortunate side
effect of triggering another bug if the parent node was full and the there
was no suitable gap in the second last child, but room in the next child.
mas_skip_node() may also be passed a maple state in an error state from
mas_anode_descend() when no allocations are available. Return on such an
error state immediately.
Link: https://lkml.kernel.org/r/20230307180247.2220303-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230307180247.2220303-2-Liam.Howlett@oracle.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Snild Dolkow <snild@sony.com>
Link: https://lore.kernel.org/linux-mm/cb8dc31a-fef2-1d09-f133-e9f7b9f9e77a@sony.com/
Tested-by: Snild Dolkow <snild@sony.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Gao Xiang has reported that the page allocator complains about high order
__GFP_NOFAIL request coming from the vmalloc core:
__alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5549
alloc_pages+0x1aa/0x270 mm/mempolicy.c:2286
vm_area_alloc_pages mm/vmalloc.c:2989 [inline]
__vmalloc_area_node mm/vmalloc.c:3057 [inline]
__vmalloc_node_range+0x978/0x13c0 mm/vmalloc.c:3227
kvmalloc_node+0x156/0x1a0 mm/util.c:606
kvmalloc include/linux/slab.h:737 [inline]
kvmalloc_array include/linux/slab.h:755 [inline]
kvcalloc include/linux/slab.h:760 [inline]
it seems that I have completely missed high order allocation backing
vmalloc areas case when implementing __GFP_NOFAIL support. This means
that [k]vmalloc at al. can allocate higher order allocations with
__GFP_NOFAIL which can trigger OOM killer for non-costly orders easily or
cause a lot of reclaim/compaction activity if those requests cannot be
satisfied.
Fix the issue by falling back to zero order allocations for __GFP_NOFAIL
requests if the high order request fails.
Link: https://lkml.kernel.org/r/ZAXynvdNqcI0f6Us@dhcp22.suse.cz
Fixes: 9376130c39 ("mm/vmalloc: add support for __GFP_NOFAIL")
Reported-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lkml.kernel.org/r/20230305053035.1911-1-hsiangkao@linux.alibaba.com
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Baoquan He <bhe@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Daniel Borkmann says:
====================
pull-request: bpf 2023-03-23
We've added 8 non-merge commits during the last 13 day(s) which contain
a total of 21 files changed, 238 insertions(+), 161 deletions(-).
The main changes are:
1) Fix verification issues in some BPF programs due to their stack usage
patterns, from Eduard Zingerman.
2) Fix to add missing overflow checks in xdp_umem_reg and return an error
in such case, from Kal Conley.
3) Fix and undo poisoning of strlcpy in libbpf given it broke builds for
libcs which provided the former like uClibc-ng, from Jesus Sanchez-Palencia.
4) Fix insufficient bpf_jit_limit default to avoid users running into hard
to debug seccomp BPF errors, from Daniel Borkmann.
5) Fix driver return code when they don't support a bpf_xdp_metadata kfunc
to make it unambiguous from other errors, from Jesper Dangaard Brouer.
6) Two BPF selftest fixes to address compilation errors from recent changes
in kernel structures, from Alexei Starovoitov.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support
bpf: Adjust insufficient default bpf_jit_limit
xsk: Add missing overflow check in xdp_umem_reg
selftests/bpf: Fix progs/test_deny_namespace.c issues.
selftests/bpf: Fix progs/find_vma_fail1.c build error.
libbpf: Revert poisoning of strlcpy
selftests/bpf: Tests for uninitialized stack reads
bpf: Allow reads from uninit stack
====================
Link: https://lore.kernel.org/r/20230323225221.6082-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix MGMT add advmon with RSSI command
- L2CAP: Fix responding with wrong PDU type
- Fix race condition in hci_cmd_sync_clear
- ISO: Fix timestamped HCI ISO data packet parsing
- HCI: Fix global-out-of-bounds
- hci_sync: Resume adv with no RPA when active scan
* tag 'for-net-2023-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: HCI: Fix global-out-of-bounds
Bluetooth: mgmt: Fix MGMT add advmon with RSSI command
Bluetooth: btsdio: fix use after free bug in btsdio_remove due to unfinished work
Bluetooth: L2CAP: Fix responding with wrong PDU type
Bluetooth: btqcomsmd: Fix command timeout after setting BD address
Bluetooth: btinel: Check ACPI handle for NULL before accessing
Bluetooth: Remove "Power-on" check from Mesh feature
Bluetooth: Fix race condition in hci_cmd_sync_clear
Bluetooth: btintel: Iterate only bluetooth device ACPI entries
Bluetooth: ISO: fix timestamped HCI ISO data packet parsing
Bluetooth: btusb: Remove detection of ISO packets over bulk
Bluetooth: hci_core: Detect if an ACL packet is in fact an ISO packet
Bluetooth: hci_sync: Resume adv with no RPA when active scan
====================
Link: https://lore.kernel.org/r/20230323202335.3380841-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kalle Valo says:
====================
wireless fixes for v6.3
Third set of fixes for v6.3. mt76 has two kernel crash fixes and
adding back 160 MHz channel support for mt7915. mac80211 has fixes for
a race in transmit path and two mesh related fixes. iwlwifi also has
fixes for races.
* tag 'wireless-2023-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: fix mesh path discovery based on unicast packets
wifi: mac80211: fix qos on mesh interfaces
wifi: iwlwifi: mvm: protect TXQ list manipulation
wifi: iwlwifi: mvm: fix mvmtxq->stopped handling
wifi: mac80211: Serialize ieee80211_handle_wake_tx_queue()
wifi: mwifiex: mark OF related data as maybe unused
wifi: mt76: connac: do not check WED status for non-mmio devices
wifi: mt76: mt7915: add back 160MHz channel width support for MT7915
wifi: mt76: do not run mt76_unregister_device() on unregistered hw
====================
Link: https://lore.kernel.org/r/20230323110332.C4FE4C433D2@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull gfs2 fix from Andreas Gruenbacher:
- Reinstate commit 970343cd49 ("GFS2: free disk inode which is
deleted by remote node -V2") as reverting that commit could cause
gfs2_put_super() to hang.
* tag 'gfs2-v6.3-rc3-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
Reinstate "GFS2: free disk inode which is deleted by remote node -V2"
The MGMT command: MGMT_OP_ADD_ADV_PATTERNS_MONITOR_RSSI uses variable
length argument. This causes host not able to register advmon with rssi.
This patch has been locally tested by adding monitor with rssi via
btmgmt on a kernel 6.1 machine.
Reviewed-by: Archie Pusaka <apusaka@chromium.org>
Fixes: b338d91703 ("Bluetooth: Implement support for Mesh")
Signed-off-by: Howard Chung <howardchung@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
In btsdio_probe, &data->work was bound with btsdio_work.In
btsdio_send_frame, it was started by schedule_work.
If we call btsdio_remove with an unfinished job, there may
be a race condition and cause UAF bug on hdev.
Fixes: ddbaf13e36 ("[Bluetooth] Add generic driver for Bluetooth SDIO devices")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
L2CAP_ECRED_CONN_REQ shall be responded with L2CAP_ECRED_CONN_RSP not
L2CAP_LE_CONN_RSP:
L2CAP LE EATT Server - Reject - run
Listening for connections
New client connection with handle 0x002a
Sending L2CAP Request from client
Client received response code 0x15
Unexpected L2CAP response code (expected 0x18)
L2CAP LE EATT Server - Reject - test failed
> ACL Data RX: Handle 42 flags 0x02 dlen 26
LE L2CAP: Enhanced Credit Connection Request (0x17) ident 1 len 18
PSM: 39 (0x0027)
MTU: 64
MPS: 64
Credits: 5
Source CID: 65
Source CID: 66
Source CID: 67
Source CID: 68
Source CID: 69
< ACL Data TX: Handle 42 flags 0x00 dlen 16
LE L2CAP: LE Connection Response (0x15) ident 1 len 8
invalid size
00 00 00 00 00 00 06 00
L2CAP LE EATT Server - Reject - run
Listening for connections
New client connection with handle 0x002a
Sending L2CAP Request from client
Client received response code 0x18
L2CAP LE EATT Server - Reject - test passed
Fixes: 15f02b9105 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
On most devices using the btqcomsmd driver (e.g. the DragonBoard 410c
and other devices based on the Qualcomm MSM8916/MSM8909/... SoCs)
the Bluetooth firmware seems to become unresponsive for a while after
setting the BD address. On recent kernel versions (at least 5.17+)
this often causes timeouts for subsequent commands, e.g. the HCI reset
sent by the Bluetooth core during initialization:
Bluetooth: hci0: Opcode 0x c03 failed: -110
Unfortunately this behavior does not seem to be documented anywhere.
Experimentation suggests that the minimum necessary delay to avoid
the problem is ~150us. However, to be sure add a sleep for > 1ms
in case it is a bit longer on other firmware versions.
Older kernel versions are likely also affected, although perhaps with
slightly different errors or less probability. Side effects can easily
hide the issue in most cases, e.g. unrelated incoming interrupts that
cause the necessary delay.
Fixes: 1511cc750c ("Bluetooth: Introduce Qualcomm WCNSS SMD based HCI driver")
Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
There are two related issues that appear in certain combinations with
clang and GNU binutils.
The first occurs when a version of clang that supports zicsr or zifencei
via '-march=' [1] (i.e, >= 17.x) is used in combination with a version
of GNU binutils that do not recognize zicsr and zifencei in the
'-march=' value (i.e., < 2.36):
riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zicsr2p0_zifencei2p0: Invalid or unknown z ISA extension: 'zifencei'
riscv64-linux-gnu-ld: failed to merge target specific data of file fs/efivarfs/file.o
riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zicsr2p0_zifencei2p0: Invalid or unknown z ISA extension: 'zifencei'
riscv64-linux-gnu-ld: failed to merge target specific data of file fs/efivarfs/super.o
The second occurs when a version of clang that does not support zicsr or
zifencei via '-march=' (i.e., <= 16.x) is used in combination with a
version of GNU as that defaults to a newer ISA base spec, which requires
specifying zicsr and zifencei in the '-march=' value explicitly (i.e, >=
2.38):
../arch/riscv/kernel/kexec_relocate.S: Assembler messages:
../arch/riscv/kernel/kexec_relocate.S:147: Error: unrecognized opcode `fence.i', extension `zifencei' required
clang-12: error: assembler command failed with exit code 1 (use -v to see invocation)
This is the same issue addressed by commit 6df2a016c0 ("riscv: fix
build with binutils 2.38") (see [2] for additional information) but
older versions of clang miss out on it because the cc-option check
fails:
clang-12: error: invalid arch name 'rv64imac_zicsr_zifencei', unsupported standard user-level extension 'zicsr'
clang-12: error: invalid arch name 'rv64imac_zicsr_zifencei', unsupported standard user-level extension 'zicsr'
To resolve the first issue, only attempt to add zicsr and zifencei to
the march string when using the GNU assembler 2.38 or newer, which is
when the default ISA spec was updated, requiring these extensions to be
specified explicitly. LLVM implements an older version of the base
specification for all currently released versions, so these instructions
are available as part of the 'i' extension. If LLVM's implementation is
updated in the future, a CONFIG_AS_IS_LLVM condition can be added to
CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI.
To resolve the second issue, use version 2.2 of the base ISA spec when
using an older version of clang that does not support zicsr or zifencei
via '-march=', as that is the spec version most compatible with the one
clang/LLVM implements and avoids the need to specify zicsr and zifencei
explicitly due to still being a part of 'i'.
[1]: 22e199e6af
[2]: https://lore.kernel.org/ZAxT7T9Xy1Fo3d5W@aurel32.net/
Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1808
Co-developed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230313-riscv-zicsr-zifencei-fiasco-v1-1-dd1b7840a551@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
NFS server Duplicate Request Cache (DRC) algorithms rely on NFS clients
reconnecting using the same local TCP port. Unique NFS operations are
identified by the per-TCP connection set of XIDs. This prevents file
corruption when non-idempotent NFS operations are retried.
Currently, NFS client TCP connections are using different local TCP ports
when reconnecting to NFS servers.
After an NFS server initiates shutdown of the TCP connection, the NFS
client's TCP socket is set to NULL after the socket state has reached
TCP_LAST_ACK(9). When reconnecting, the new socket attempts to reuse
the same local port but fails with EADDRNOTAVAIL (99). This forces the
socket to use a different local TCP port to reconnect to the remote NFS
server.
State Transition and Events:
TCP_CLOSE_WAIT(8)
TCP_LAST_ACK(9)
connect(fail EADDRNOTAVAIL(99))
TCP_CLOSE(7)
bind on new port
connect success
dmesg excerpts showing reconnect switching from TCP local port of 926 to
763 after commit 7c81e6a9d7:
[13354.947854] NFS call mkdir testW
...
[13405.654781] RPC: xs_tcp_state_change client 00000000037d0f03...
[13405.654813] RPC: state 8 conn 1 dead 0 zapped 1 sk_shutdown 1
[13405.654826] RPC: xs_data_ready...
[13405.654892] RPC: xs_tcp_state_change client 00000000037d0f03...
[13405.654895] RPC: state 9 conn 0 dead 0 zapped 1 sk_shutdown 3
[13405.654899] RPC: xs_tcp_state_change client 00000000037d0f03...
[13405.654900] RPC: state 9 conn 0 dead 0 zapped 1 sk_shutdown 3
[13405.654950] RPC: xs_connect scheduled xprt 00000000037d0f03
[13405.654975] RPC: xs_bind 0.0.0.0:926: ok (0)
[13405.654980] RPC: worker connecting xprt 00000000037d0f03 via tcp
to 10.101.6.228 (port 2049)
[13405.654991] RPC: 00000000037d0f03 connect status 99 connected 0
sock state 7
[13405.655001] RPC: xs_tcp_state_change client 00000000037d0f03...
[13405.655002] RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
[13405.655024] RPC: xs_connect scheduled xprt 00000000037d0f03
[13405.655038] RPC: xs_bind 0.0.0.0:763: ok (0)
[13405.655041] RPC: worker connecting xprt 00000000037d0f03 via tcp
to 10.101.6.228 (port 2049)
[13405.655065] RPC: 00000000037d0f03 connect status 115 connected 0
sock state 2
State Transition and Events with patch applied:
TCP_CLOSE_WAIT(8)
TCP_LAST_ACK(9)
TCP_CLOSE(7)
connect(reuse of port succeeds)
dmesg excerpts showing reconnect on same TCP local port of 936 with patch
applied:
[ 257.139935] NFS: mkdir(0:59/560857152), testQ
[ 257.139937] NFS call mkdir testQ
...
[ 307.822702] RPC: state 8 conn 1 dead 0 zapped 1 sk_shutdown 1
[ 307.822714] RPC: xs_data_ready...
[ 307.822817] RPC: xs_tcp_state_change client 00000000ce702f14...
[ 307.822821] RPC: state 9 conn 0 dead 0 zapped 1 sk_shutdown 3
[ 307.822825] RPC: xs_tcp_state_change client 00000000ce702f14...
[ 307.822826] RPC: state 9 conn 0 dead 0 zapped 1 sk_shutdown 3
[ 307.823606] RPC: xs_tcp_state_change client 00000000ce702f14...
[ 307.823609] RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
[ 307.823629] RPC: xs_tcp_state_change client 00000000ce702f14...
[ 307.823632] RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
[ 307.823676] RPC: xs_connect scheduled xprt 00000000ce702f14
[ 307.823704] RPC: xs_bind 0.0.0.0:936: ok (0)
[ 307.823709] RPC: worker connecting xprt 00000000ce702f14 via tcp
to 10.101.1.30 (port 2049)
[ 307.823748] RPC: 00000000ce702f14 connect status 115 connected 0
sock state 2
...
[ 314.916193] RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
[ 314.916251] RPC: xs_connect scheduled xprt 00000000ce702f14
[ 314.916282] RPC: xs_bind 0.0.0.0:936: ok (0)
[ 314.916292] RPC: worker connecting xprt 00000000ce702f14 via tcp
to 10.101.1.30 (port 2049)
[ 314.916342] RPC: 00000000ce702f14 connect status 115 connected 0
sock state 2
Fixes: 7c81e6a9d7 ("SUNRPC: Tweak TCP socket shutdown in the RPC client")
Signed-off-by: Siddharth Rajendra Kawar <sikawar@microsoft.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 6.3
- send Identify with CNS 06h only to I/O controllers (Martin George)
- fix nvme_tcp_term_pdu to match spec (Caleb Sander)"
* tag 'nvme-6.3-2023-03-23' of git://git.infradead.org/nvme:
nvme-tcp: fix nvme_tcp_term_pdu to match spec
nvme: send Identify with CNS 06h only to I/O controllers
It turns out that reverting commit 970343cd49 ("GFS2: free disk inode
which is deleted by remote node -V2") causes a regression related to
evicting inodes that were unlinked on a different cluster node.
We could also have simply added a call to d_mark_dontcache() to function
gfs2_try_evict(), but the original pre-revert code is better tested and
proven.
This reverts commit 445cb1277e.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
When in dual role mode (dr_mode == USB_DR_MODE_OTG), platform probe
successively basically calls:
- dwc2_gadget_init()
- dwc2_hcd_init()
- dwc2_lowlevel_hw_disable() since recent change [1]
- usb_add_gadget_udc()
The PHYs (and so the clocks it may provide) shouldn't be disabled for all
SoCs, in OTG mode, as the HCD part has been initialized.
On STM32 this creates some weird race condition upon boot, when:
- initially attached as a device, to a HOST
- and there is a gadget script invoked to setup the device part.
Below issue becomes systematic, as long as the gadget script isn't
started by userland: the hardware PHYs (and so the clocks provided by the
PHYs) remains disabled.
It ends up in having an endless interrupt storm, before the watchdog
resets the platform.
[ 16.924163] dwc2 49000000.usb-otg: EPs: 9, dedicated fifos, 952 entries in SPRAM
[ 16.962704] dwc2 49000000.usb-otg: DWC OTG Controller
[ 16.966488] dwc2 49000000.usb-otg: new USB bus registered, assigned bus number 2
[ 16.974051] dwc2 49000000.usb-otg: irq 77, io mem 0x49000000
[ 17.032170] hub 2-0:1.0: USB hub found
[ 17.042299] hub 2-0:1.0: 1 port detected
[ 17.175408] dwc2 49000000.usb-otg: Mode Mismatch Interrupt: currently in Host mode
[ 17.181741] dwc2 49000000.usb-otg: Mode Mismatch Interrupt: currently in Host mode
[ 17.189303] dwc2 49000000.usb-otg: Mode Mismatch Interrupt: currently in Host mode
...
The host part is also not functional, until the gadget part is configured.
The HW may only be disabled for peripheral mode (original init), e.g.
dr_mode == USB_DR_MODE_PERIPHERAL, until the gadget driver initializes.
But when in USB_DR_MODE_OTG, the HW should remain enabled, as the HCD part
is able to run, while the gadget part isn't necessarily configured.
I don't fully get the of purpose the original change, that claims disabling
the hardware is missing. It creates conditions on SOCs using the PHY
initialization to be completely non working in OTG mode. Original
change [1] should be reworked to be platform specific.
[1] https://lore.kernel.org/r/20221206-dwc2-gadget-dual-role-v1-2-36515e1092cd@theobroma-systems.com
Fixes: ade23d7b7e ("usb: dwc2: power on/off phy for peripheral mode in dual-role mode")
Cc: stable <stable@kernel.org>
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com>
Reviewed-by: Quentin Schulz <quentin.schulz@theobroma-systems.com>
Tested-by: Quentin Schulz <quentin.schulz@theobroma-systems.com>
Link: https://lore.kernel.org/r/20230315144433.3095859-1-fabrice.gasnier@foss.st.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Each time the platform goes to low power, PM suspend / resume routines
call: __dwc2_lowlevel_hw_enable -> devm_add_action_or_reset().
This adds a new devres each time.
This may also happen at runtime, as dwc2_lowlevel_hw_enable() can be
called from udc_start().
This can be seen with tracing:
- echo 1 > /sys/kernel/debug/tracing/events/dev/devres_log/enable
- go to low power
- cat /sys/kernel/debug/tracing/trace
A new "ADD" entry is found upon each low power cycle:
... devres_log: 49000000.usb-otg ADD 82a13bba devm_action_release (8 bytes)
... devres_log: 49000000.usb-otg ADD 49889daf devm_action_release (8 bytes)
...
A second issue is addressed here:
- regulator_bulk_enable() is called upon each PM cycle (suspend/resume).
- regulator_bulk_disable() never gets called.
So the reference count for these regulators constantly increase, by one
upon each low power cycle, due to missing regulator_bulk_disable() call
in __dwc2_lowlevel_hw_disable().
The original fix that introduced the devm_add_action_or_reset() call,
fixed an issue during probe, that happens due to other errors in
dwc2_driver_probe() -> dwc2_core_reset(). Then the probe fails without
disabling regulators, when dr_mode == USB_DR_MODE_PERIPHERAL.
Rather fix the error path: disable all the low level hardware in the
error path, by using the "hsotg->ll_hw_enabled" flag. Checking dr_mode
has been introduced to avoid a dual call to dwc2_lowlevel_hw_disable().
"ll_hw_enabled" should achieve the same (and is used currently in the
remove() routine).
Fixes: 54c1960605 ("usb: dwc2: Always disable regulators on driver teardown")
Fixes: 33a06f1300 ("usb: dwc2: Fix error path in gadget registration")
Cc: stable <stable@kernel.org>
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com>
Link: https://lore.kernel.org/r/20230316084127.126084-1-fabrice.gasnier@foss.st.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull zonefs fixes from Damien Le Moal:
- Silence a false positive smatch warning about an uninitialized
variable
- Fix an error message to provide more useful information about invalid
zone append write results
* tag 'zonefs-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: Fix error message in zonefs_file_dio_append()
zonefs: Prevent uninitialized symbol 'size' warning
In the output of /proc/fs/cifs/open_files, we only print
the tree id for the tcon of each open file. It becomes
difficult to know which tcon these files belong to with
just the tree id.
This change dumps ses id in addition to all other data today.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Currently, we only dump the pending mid information only
on the primary channel in /proc/fs/cifs/DebugData.
If multichannel is active, we do not print the pending MID
list on secondary channels.
This change will dump the pending mids for all the channels
based on server->conn_id.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
When querying server interfaces returns -EOPNOTSUPP,
clear the list of interfaces. Assumption is that multichannel
would be disabled too.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
We have the server interface list hanging off the tcon
structure today for reasons unknown. So each tcon which is
connected to a file server can query them separately,
which is really unnecessary. To avoid this, in the query
function, we will check the time of last update of the
interface list, and avoid querying the server if it is
within a certain range.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
The logic in efi_random_alloc() will iterate over the memory map twice,
once to count the number of candidate slots, and another time to locate
the chosen slot after randomization.
If there is insufficient memory to do the allocation, the second loop
will run to completion without actually having located a slot, but we
currently return EFI_SUCCESS in this case, as we fail to initialize
status to the appropriate error value of EFI_OUT_OF_RESOURCES.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The order in which clocks are stopped matters as some of the clock
like NPL are derived from MCLK.
Without this patch, Dragonboard RB5 DSP would crash with below error:
qcom_q6v5_pas 17300000.remoteproc: fatal error received:
ABT_dal.c:278:ABTimeout: AHB Bus hang is detected,
Number of bus hang detected := 2 , addr0 = 0x3370000 , addr1 = 0x0!!!
Turn off fsgen first, followed by npl and then finally mclk, which is exactly
the opposite order of enable sequence.
Fixes: 1dc3459009 ("ASoC: codecs: lpass: register mclk after runtime pm")
Reported-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Tested-by: Amit Pundir <amit.pundir@linaro.org>
Link: https://lore.kernel.org/r/20230323110125.23790-1-srinivas.kandagatla@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
The driver only supports normal polarity. Complete the implementation of
.get_state() by setting .polarity accordingly.
This fixes a regression that was possible since commit c73a310762
("pwm: Handle .get_state() failures") which stopped to zero-initialize
the state passed to the .get_state() callback. This was reported at
https://forum.odroid.com/viewtopic.php?f=177&t=46360 . While this was an
unintended side effect, the real issue is the driver's callback not
setting the polarity.
There is a complicating fact, that the .apply() callback fakes support
for inversed polarity. This is not (and cannot) be matched by
.get_state(). As fixing this isn't easy, only point it out in a comment
to prevent authors of other drivers from copying that approach.
Fixes: c375bcbaab ("pwm: meson: Read the full hardware state in meson_pwm_get_state()")
Reported-by: Munehisa Kamata <kamatam@amazon.com>
Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://lore.kernel.org/r/20230310191405.2606296-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
[Why]
The commit c76e483cd9 ("drm/amd/display: Don't restrict bpc to 8 bpc")
removes the historical 8bpc dependency and sets max_bpc to 16.
[How]
The comment that states "8bpc for non-edp" needs to be removed as well.
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
sriov needs to enter/exit safe mode in update umd p state
add the cg flag to let it enter or exit while needed
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
For engines not supporting soft reset, i.e. VCN, there will be a failed
ib test before mode 1 reset during asic reset. The fences in this case
are never signaled and next time when we try to free the sa_bo, kernel
will hang.
[How]
During pre_asic_reset, driver will clear job fences and afterwards the
fences' refcount will be reduced to 1. For drm_sched_jobs it will be
released in job_free_cb, and for non-sched jobs like ib_test, it's meant
to be released in sa_bo_free but only when the fences are signaled. So
we have to force signal the non_sched bad job's fence during
pre_asic_reset or the clear is not complete.
Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
when gfx do soft reset, mes will also do reset, if mes is not
resumed when do recover from soft reset, mes is unable to respond
in later sequence
[how]
resume mes when do gfx post soft reset
Signed-off-by: Tong Liu01 <Tong.Liu01@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For GC IP v11.0.4/11, PSP TMR need to be reserved
for ASIC mode2 reset. But for S4, when psp suspend,
it will destroy the TMR that fails the ASIC reset.
[ 96.006101] amdgpu 0000:62:00.0: amdgpu: MODE2 reset
[ 100.409717] amdgpu 0000:62:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000011 SMN_C2PMSG_82:0x00000002
[ 100.411593] amdgpu 0000:62:00.0: amdgpu: Mode2 reset failed!
[ 100.412470] amdgpu 0000:62:00.0: PM: pci_pm_freeze(): amdgpu_pmops_freeze+0x0/0x50 [amdgpu] returns -62
[ 100.414020] amdgpu 0000:62:00.0: PM: dpm_run_callback(): pci_pm_freeze+0x0/0xd0 returns -62
[ 100.415311] amdgpu 0000:62:00.0: PM: pci_pm_freeze+0x0/0xd0 returned -62 after 4623202 usecs
[ 100.416608] amdgpu 0000:62:00.0: PM: failed to freeze async: error -62
We can skip the reset on APUs, assuming we can resume them
properly. Verified on some GFX11, GFX10 and old GFX9 APUs.
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
In some cases, we expose the kernel's struct screen_info to the EFI stub
directly, so it gets populated before even entering the kernel. This
means the early console is available as soon as the early param parsing
happens, which is nice. It also means we need two different ways to pass
this information, as this trick only works if the EFI stub is baked into
the core kernel image, which is not always the case.
Huacai reports that the preparatory refactoring that was needed to
implement this alternative method for zboot resulted in a non-functional
efifb earlycon for other cases as well, due to the reordering of the
kernel image relocation with the population of the screen_info struct,
and the latter now takes place after copying the image to its new
location, which means we copy the old, uninitialized state.
So let's ensure that the same-image version of alloc_screen_info()
produces the correct screen_info pointer, by taking the displacement of
the loaded image into account.
Reported-by: Huacai Chen <chenhuacai@loongson.cn>
Tested-by: Huacai Chen <chenhuacai@loongson.cn>
Link: https://lore.kernel.org/linux-efi/20230310021749.921041-1-chenhuacai@loongson.cn/
Fixes: 42c8ea3dca ("efi: libstub: Factor out EFI stub entrypoint into separate file")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Timing Information in Datasheet assumes that HIGH_SPEED_ENA=1 should be
set for SDR12 and SDR25 modes. But sdhci_am654 driver clears
HIGH_SPEED_ENA register. Thus, Modify sdhci_am654 to not clear
HIGH_SPEED_ENA (HOST_CONTROL[2]) bit for SDR12 and SDR25 speed modes.
Fixes: e374e87538 ("mmc: sdhci_am654: Clear HISPD_ENA in some lower speed modes")
Signed-off-by: Bhavya Kapoor <b-kapoor@ti.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230317092711.660897-1-b-kapoor@ti.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
In device_for_each_child_node(), we should add fwnode_handle_put()
when break out of the iteration device_for_each_child_node()
as it will automatically increase and decrease the refcounter.
Fixes: 379d7ac7ca ("phy: mdio-thunder: Add driver for Cavium Thunder SoC MDIO buses.")
Signed-off-by: Liang He <windhl@126.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As for multicast:
- The SIDR is the only mode that makes sense;
- Besides PS_UDP, other port spaces like PS_IB is also allowed, as it is
UD compatible. In this case qkey also needs to be set [1].
This patch allows only UD qp_type to join multicast, and set qkey to
default if it's not set, to fix an uninit-value error: the ib->rec.qkey
field is accessed without being initialized.
=====================================================
BUG: KMSAN: uninit-value in cma_set_qkey drivers/infiniband/core/cma.c:510 [inline]
BUG: KMSAN: uninit-value in cma_make_mc_event+0xb73/0xe00 drivers/infiniband/core/cma.c:4570
cma_set_qkey drivers/infiniband/core/cma.c:510 [inline]
cma_make_mc_event+0xb73/0xe00 drivers/infiniband/core/cma.c:4570
cma_iboe_join_multicast drivers/infiniband/core/cma.c:4782 [inline]
rdma_join_multicast+0x2b83/0x30a0 drivers/infiniband/core/cma.c:4814
ucma_process_join+0xa76/0xf60 drivers/infiniband/core/ucma.c:1479
ucma_join_multicast+0x1e3/0x250 drivers/infiniband/core/ucma.c:1546
ucma_write+0x639/0x6d0 drivers/infiniband/core/ucma.c:1732
vfs_write+0x8ce/0x2030 fs/read_write.c:588
ksys_write+0x28c/0x520 fs/read_write.c:643
__do_sys_write fs/read_write.c:655 [inline]
__se_sys_write fs/read_write.c:652 [inline]
__ia32_sys_write+0xdb/0x120 fs/read_write.c:652
do_syscall_32_irqs_on arch/x86/entry/common.c:114 [inline]
__do_fast_syscall_32+0x96/0xf0 arch/x86/entry/common.c:180
do_fast_syscall_32+0x34/0x70 arch/x86/entry/common.c:205
do_SYSENTER_32+0x1b/0x20 arch/x86/entry/common.c:248
entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
Local variable ib.i created at:
cma_iboe_join_multicast drivers/infiniband/core/cma.c:4737 [inline]
rdma_join_multicast+0x586/0x30a0 drivers/infiniband/core/cma.c:4814
ucma_process_join+0xa76/0xf60 drivers/infiniband/core/ucma.c:1479
CPU: 0 PID: 29874 Comm: syz-executor.3 Not tainted 5.16.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
=====================================================
[1] https://lore.kernel.org/linux-rdma/20220117183832.GD84788@nvidia.com/
Fixes: b5de0c60cc ("RDMA/cma: Fix use after free race in roce multicast join")
Reported-by: syzbot+8fcbb77276d43cc8b693@syzkaller.appspotmail.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/58a4a98323b5e6b1282e83f6b76960d06e43b9fa.1679309909.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
modpost now reads CRCs from .*.cmd files, parsing them using strtol().
This is inconsistent with its parsing of Module.symvers and with their
definition as *unsigned* 32-bit values.
strtol() clamps values to [LONG_MIN, LONG_MAX], and when building on a
32-bit system this changes all CRCs >= 0x80000000 to be 0x7fffffff.
Change extract_crcs_for_object() to use strtoul() instead.
Cc: stable@vger.kernel.org
Fixes: f292d875d0 ("modpost: extract symbol versions from *.cmd files")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
${WARNOVERRIDE} was misspelled as ${WARNOVVERIDE}, which caused a shell
syntax error in certain paths of the script execution.
Fixes: 46dff8d7e3 ("scripts: merge_config: Add option to suppress warning on overrides")
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Saeed Mahameed says:
====================
mlx5 fixes 2023-03-21
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2023-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: E-Switch, Fix an Oops in error handling code
net/mlx5: Read the TC mapping of all priorities on ETS query
net/mlx5e: Overcome slow response for first macsec ASO WQE
net/mlx5e: Initialize link speed to zero
net/mlx5: Fix steering rules cleanup
net/mlx5e: Block entering switchdev mode with ns inconsistency
net/mlx5e: Set uplink rep as NETNS_LOCAL
====================
Link: https://lore.kernel.org/r/20230321211135.47711-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-03-21 (ice)
This series contains updates to ice driver only.
Piotr sets first_desc field for proper handling of Flow Director
packets.
Michal moves error checking for VF earlier in function to properly return
error before other checks/reporting; he also corrects VSI filter removal to
be done during VSI removal and not rebuild.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: remove filters only if VSI is deleted
ice: check if VF exists before mode check
ice: fix rx buffers handling for flow director packets
====================
Link: https://lore.kernel.org/r/20230321183641.2849726-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-03-21 (iavf, i40e)
This series contains updates to iavf and i40e drivers.
Stefan Assmann adds check, and return, if driver has already gone
through remove to prevent hang for iavf.
Radoslaw adds zero initialization to ensure Flow Director packets are
populated with correct values for i40e.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: fix flow director packet filter programming
iavf: fix hang on reboot with ice
====================
Link: https://lore.kernel.org/r/20230321183548.2849671-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move lowering the TRGMII Tx clock driving to mt7530_setup(), after setting
the core clock, as seen on the U-Boot MediaTek ethernet driver.
Move the code which looks like it lowers the TRGMII Rx clock driving to
after the TRGMII Tx clock driving is lowered. This is run after lowering
the Tx clock driving on the U-Boot MediaTek ethernet driver as well.
This way, the switch should consume less power regardless of port 6 being
used.
Update the comment explaining mt7530_pad_clk_setup().
Tested rgmii and trgmii modes of port 6 and rgmii mode of port 5 on MCM
MT7530 on MT7621AT Unielec U7621-06 and standalone MT7530 on MT7623NI
Bananapi BPI-R2.
Fixes: b8f126a8d5 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Link: 29a48bf9cc/drivers/net/mtk_eth.c (L682)
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/r/20230320190520.124513-2-arinc.unal@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Split the code that enables and disables TRGMII clocks and core clock.
Move enabling and disabling core clock to mt7530_pll_setup() as it's
supposed to be run there.
Add 20 ms delay before enabling the core clock as seen on the U-Boot
MediaTek ethernet driver.
Change the comment for enabling and disabling TRGMII clocks as the code
seems to affect both TXC and RXC.
Tested rgmii and trgmii modes of port 6 and rgmii mode of port 5 on MCM
MT7530 on MT7621AT Unielec U7621-06 and standalone MT7530 on MT7623NI
Bananapi BPI-R2.
Fixes: b8f126a8d5 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Link: 29a48bf9cc/drivers/net/mtk_eth.c (L589)
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/r/20230320190520.124513-1-arinc.unal@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
"modprobe asix ; rmmod asix ; modprobe asix" fails with:
sysfs: cannot create duplicate filename \
'/devices/virtual/mdio_bus/usb-003:004'
Issue was originally reported by Anton Lundin on 2022-06-22 (link below).
Chrome OS team hit the same issue in Feb, 2023 when trying to find
work arounds for other issues with AX88172 devices.
The use of devm_mdiobus_register() with usbnet devices results in the
MDIO data being associated with the USB device. When the asix driver
is unloaded, the USB device continues to exist and the corresponding
"mdiobus_unregister()" is NOT called until the USB device is unplugged
or unauthorized. So the next "modprobe asix" will fail because the MDIO
phy sysfs attributes still exist.
The 'easy' (from a design PoV) fix is to use the non-devm variants of
mdiobus_* functions and explicitly manage this use in the asix_bind
and asix_unbind function calls. I've not explored trying to fix usbnet
initialization so devm_* stuff will work.
Fixes: e532a096be ("net: usb: asix: ax88772: add phylib support")
Reported-by: Anton Lundin <glance@acc.umu.se>
Link: https://lore.kernel.org/netdev/20220623063649.GD23685@pengutronix.de/T/
Tested-by: Eizan Miyamoto <eizan@chromium.org>
Signed-off-by: Grant Grundler <grundler@chromium.org>
Link: https://lore.kernel.org/r/20230321170539.732147-1-grundler@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The link speed is never changed for the uptime of a VM, and the current
implementation sends an admin queue command for each call. Admin queue
command invocations have nontrivial overhead (e.g., VM exits), which can
be disruptive to users if triggered frequently. Our telemetry data shows
that there are VMs that make frequent calls to this admin queue command.
Caching the result of the original admin queue command would eliminate
the need to send multiple admin queue commands on subsequent calls to
retrieve link speed.
Fixes: 7e074d5a76 ("gve: Enable Link Speed Reporting in the driver.")
Signed-off-by: Joshua Washington <joshwash@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230321172332.91678-1-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Coverity had rightly indicated a possible deadlock
due to chan_lock being done inside match_session.
All callers of match_* functions should pick up the
necessary locks and call them.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org
Fixes: 724244cdb3 ("cifs: protect session channel fields with chan_lock")
Signed-off-by: Steve French <stfrench@microsoft.com>
When running "ethtool -S eno0 --groups rmon" without an explicit "--src
emac|pmac" argument, the kernel will not report
rx-rmon-etherStatsPkts64to64Octets, rx-rmon-etherStatsPkts65to127Octets,
etc. This is because on ETHTOOL_MAC_STATS_SRC_AGGREGATE, we do not
populate the "ranges" argument.
ocelot_port_get_rmon_stats() does things differently and things work
there. I had forgotten to make sure that the code is structured the same
way in both drivers, so do that now.
Fixes: cf52bd238b ("net: enetc: add support for MAC Merge statistics counters")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230321232831.1200905-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Bluetooth mesh experimental feature enable was requiring the
controller to be powered off in order for the Enable to work. Mesh is
supposed to be enablable regardless of the controller state, and created
an unintended requirement that the mesh daemon be started before the
classic bluetoothd daemon.
Fixes: af6bcc1921 ("Bluetooth: Add experimental wrapper for MGMT based mesh")
Signed-off-by: Brian Gix <brian.gix@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
There is a potential race condition in hci_cmd_sync_work and
hci_cmd_sync_clear, and could lead to use-after-free. For instance,
hci_cmd_sync_work is added to the 'req_workqueue' after cancel_work_sync
The entry of 'cmd_sync_work_list' may be freed in hci_cmd_sync_clear, and
causing kernel panic when it is used in 'hci_cmd_sync_work'.
Here's the call trace:
dump_stack_lvl+0x49/0x63
print_report.cold+0x5e/0x5d3
? hci_cmd_sync_work+0x282/0x320
kasan_report+0xaa/0x120
? hci_cmd_sync_work+0x282/0x320
__asan_report_load8_noabort+0x14/0x20
hci_cmd_sync_work+0x282/0x320
process_one_work+0x77b/0x11c0
? _raw_spin_lock_irq+0x8e/0xf0
worker_thread+0x544/0x1180
? poll_idle+0x1e0/0x1e0
kthread+0x285/0x320
? process_one_work+0x11c0/0x11c0
? kthread_complete_and_exit+0x30/0x30
ret_from_fork+0x22/0x30
</TASK>
Allocated by task 266:
kasan_save_stack+0x26/0x50
__kasan_kmalloc+0xae/0xe0
kmem_cache_alloc_trace+0x191/0x350
hci_cmd_sync_queue+0x97/0x2b0
hci_update_passive_scan+0x176/0x1d0
le_conn_complete_evt+0x1b5/0x1a00
hci_le_conn_complete_evt+0x234/0x340
hci_le_meta_evt+0x231/0x4e0
hci_event_packet+0x4c5/0xf00
hci_rx_work+0x37d/0x880
process_one_work+0x77b/0x11c0
worker_thread+0x544/0x1180
kthread+0x285/0x320
ret_from_fork+0x22/0x30
Freed by task 269:
kasan_save_stack+0x26/0x50
kasan_set_track+0x25/0x40
kasan_set_free_info+0x24/0x40
____kasan_slab_free+0x176/0x1c0
__kasan_slab_free+0x12/0x20
slab_free_freelist_hook+0x95/0x1a0
kfree+0xba/0x2f0
hci_cmd_sync_clear+0x14c/0x210
hci_unregister_dev+0xff/0x440
vhci_release+0x7b/0xf0
__fput+0x1f3/0x970
____fput+0xe/0x20
task_work_run+0xd4/0x160
do_exit+0x8b0/0x22a0
do_group_exit+0xba/0x2a0
get_signal+0x1e4a/0x25b0
arch_do_signal_or_restart+0x93/0x1f80
exit_to_user_mode_prepare+0xf5/0x1a0
syscall_exit_to_user_mode+0x26/0x50
ret_from_fork+0x15/0x30
Fixes: 6a98e3836f ("Bluetooth: Add helper for serialized HCI command execution")
Cc: stable@vger.kernel.org
Signed-off-by: Min Li <lm0963hack@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Current flow interates over entire ACPI table entries looking for
Bluetooth Per Platform Antenna Gain(PPAG) entry. This patch iterates
over ACPI entries relvant to Bluetooth device only.
Fixes: c585a92b2f ("Bluetooth: btintel: Set Per Platform Antenna Gain(PPAG)")
Signed-off-by: Kiran K <kiran.k@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Use correct HCI ISO data packet header struct when the packet has
timestamp. The timestamp, when present, goes before the other fields
(Core v5.3 4E 5.4.5), so the structs are not compatible.
Fixes: ccf74f2390 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This removes the code introduced by
14202eff21 as hci_recv_frame is now able
to detect ACL packets that are in fact ISO packets.
Fixes: 14202eff21 ("Bluetooth: btusb: Detect if an ACL packet is in fact an ISO packet")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Because some transports don't have a dedicated type for ISO packets
(see 14202eff21) they may use ACL type
when in fact they are ISO packets.
In the past this was left for the driver to detect such thing but it
creates a problem when using the likes of btproxy when used by a VM as
the host would not be aware of the connection the guest is doing it
won't be able to detect such behavior, so this make bt_recv_frame
detect when it happens as it is the common interface to all drivers
including guest VMs.
Fixes: 14202eff21 ("Bluetooth: btusb: Detect if an ACL packet is in fact an ISO packet")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The address resolution should be disabled during the active scan,
so all the advertisements can reach the host. The advertising
has to be paused before disabling the address resolution,
because the advertising will prevent any changes to the resolving
list and the address resolution status. Skipping this will cause
the hci error and the discovery failure.
According to the bluetooth specification:
"7.8.44 LE Set Address Resolution Enable command
This command shall not be used when:
- Advertising (other than periodic advertising) is enabled,
- Scanning is enabled, or
- an HCI_LE_Create_Connection, HCI_LE_Extended_Create_Connection, or
HCI_LE_Periodic_Advertising_Create_Sync command is outstanding."
If the host is using RPA, the controller needs to generate RPA for
the advertising, so the advertising must remain paused during the
active scan.
If the host is not using RPA, the advertising can be resumed after
disabling the address resolution.
Fixes: 9afc675ede ("Bluetooth: hci_sync: allow advertise when scan without RPA")
Signed-off-by: Zhengping Jiang <jiangzp@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Steve reported that inactive sessions are terminated after a few
seconds. ksmbd terminate when receiving -EAGAIN error from
kernel_recvmsg(). -EAGAIN means there is no data available in timeout.
So ksmbd should keep connection with unlimited retries instead of
terminating inactive sessions.
Cc: stable@vger.kernel.org
Reported-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reference count of acls will leak when memory allocation fails. Fix this
by adding the missing posix_acl_release().
Fixes: e2f34481b2 ("cifsd: add server-side procedures for SMB3")
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Smatch static checker warning:
fs/ksmbd/vfs.c:1040 ksmbd_vfs_fqar_lseek() warn: no lower bound on 'length'
fs/ksmbd/vfs.c:1041 ksmbd_vfs_fqar_lseek() warn: no lower bound on 'start'
Fix unexpected result that could caused from negative start and length.
Fixes: f441584858 ("cifsd: add file operations")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Smatch static checker warning:
fs/ksmbd/smb2pdu.c:7759 smb2_ioctl()
warn: no lower bound on 'off'
Fix unexpected result that could caused from negative off and bfz.
Fixes: b5e5f9dfc9 ("ksmbd: check invalid FileOffset and BeyondFinalZero in FSCTL_ZERO_DATA")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
If vfs objects = streams_xattr in ksmbd.conf FILE_NAMED_STREAMS should
be set to Attributes in FS_ATTRIBUTE_INFORMATION. MacOS client show
"Format: SMB (Unknown)" on faked NTFS and no streams support.
Cc: stable@vger.kernel.org
Reported-by: Miao Lihua <441884205@qq.com>
Tested-by: Miao Lihua <441884205@qq.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
MacOS and Win11 support AES256 encrytion and it is included in the cipher
array of encryption context. Especially on macOS, The most preferred
cipher is AES256. Connecting to ksmbd fails on newer MacOS clients that
support AES256 encryption. MacOS send disconnect request after receiving
final session setup response from ksmbd. Because final session setup is
signed with signing key was generated incorrectly.
For signging key, 'L' value should be initialized to 128 if key size is
16bytes.
Cc: stable@vger.kernel.org
Reported-by: Miao Lihua <441884205@qq.com>
Tested-by: Miao Lihua <441884205@qq.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull ARM fix from Russell King:
"Just one fix for now to eliminate a KASAN false positive"
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9290/1: uaccess: Fix KASAN false-positives
Pull bootconfig fixes from Masami Hiramatsu:
- Fix bootconfig test script to test increased maximum number (8192)
node correctly
- Change the console message if there is no bootconfig data and the
kernel is compiled with CONFIG_BOOT_CONFIG_FORCE=y
* tag 'bootconfig-fixes-v6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
bootconfig: Change message if no bootconfig with CONFIG_BOOT_CONFIG_FORCE=y
bootconfig: Fix testcase to increase max node
Fix the rcutorturename field so that its size is correctly reported in
the text format embedded in trace.dat files. As it stands, it is
reported as being of size 1:
field:char rcutorturename[8]; offset:8; size:1; signed:0;
Signed-off-by: Douglas Raillard <douglas.raillard@arm.com>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Cc: stable@vger.kernel.org
Fixes: 04ae87a520 ("ftrace: Rework event_create_dir()")
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
[ boqun: Add "Cc" and "Fixes" tags per Steven ]
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Anna says:
> KASAN reports [...] a slab-out-of-bounds in gss_krb5_checksum(),
> and it can cause my client to panic when running cthon basic
> tests with krb5p.
> Running faddr2line gives me:
>
> gss_krb5_checksum+0x4b6/0x630:
> ahash_request_free at
> /home/anna/Programs/linux-nfs.git/./include/crypto/hash.h:619
> (inlined by) gss_krb5_checksum at
> /home/anna/Programs/linux-nfs.git/net/sunrpc/auth_gss/gss_krb5_crypto.c:358
My diagnosis is that the memcpy() at the end of gss_krb5_checksum()
reads past the end of the buffer containing the checksum data
because the callers have ignored gss_krb5_checksum()'s API contract:
* Caller provides the truncation length of the output token (h) in
* cksumout.len.
Instead they provide the fixed length of the hmac buffer. This
length happens to be larger than the value returned by
crypto_ahash_digestsize().
Change these errant callers to work like krb5_etm_{en,de}crypt().
As a defensive measure, bound the length of the byte copy at the
end of gss_krb5_checksum().
Kunit sez:
Testing complete. Ran 68 tests: passed: 68
Elapsed time: 81.680s total, 5.875s configuring, 75.610s building, 0.103s running
Reported-by: Anna Schumaker <schumaker.anna@gmail.com>
Fixes: 8270dbfceb ("SUNRPC: Obscure Kerberos integrity keys")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
When we're using a cached open stateid or a delegation in order to avoid
sending a CLAIM_PREVIOUS open RPC call to the server, we don't have a
new open stateid to present to update_open_stateid().
Instead rely on nfs4_try_open_cached(), just as if we were doing a
normal open.
Fixes: d2bfda2e7a ("NFSv4: don't reprocess cached open CLAIM_PREVIOUS")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Mika writes:
thunderbolt: Fixes for v6.3-rc4
This includes following fixes and quirks for v6.3-rc:
- Quirk to disable CL-states on AMD USB4 host routers
- Fix memory leak in lane margining
- Correct the retimer access flows
- Quirk to limit USB3 bandwidth on certain Intel USB4 host routers
- Fix usage of scale field when allocting USB3 bandwidth
- Fix interrupt "auto clear" on non-Intel USB4 host routers.
There are also two commits that are not fixes themselves but are needed
for the USB3 bandwidth quirk and for the interrupt auto clear fix to
work.
All these have been in linux-next with no reported issues.
* tag 'thunderbolt-for-v6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
thunderbolt: Rename shadowed variables bit to interrupt_bit and auto_clear_bit
thunderbolt: Disable interrupt auto clear for rings
thunderbolt: Use const qualifier for `ring_interrupt_index`
thunderbolt: Use scale field when allocating USB3 bandwidth
thunderbolt: Limit USB3 bandwidth of certain Intel USB4 host routers
thunderbolt: Call tb_check_quirks() after initializing adapters
thunderbolt: Add missing UNSET_INBOUND_SBTX for retimer access
thunderbolt: Fix memory leak in margining
thunderbolt: Add quirk to disable CLx
Commit 7c3d5c20dc ("thermal/core: Add a generic thermal_zone_get_trip()
function") stopped marking trip points with a zero temperature as
disabled, behavior that was originally introduced in commit 81ad4276b5
("Thermal: Ignore invalid trip points").
When using the mlxsw driver we see that when such trip points are not
disabled, the thermal subsystem repeatedly tries to set the state of the
associated cooling devices to the maximum state.
Address this by restoring the original behavior and mark trip points
with a zero temperature as disabled.
Fixes: 7c3d5c20dc ("thermal/core: Add a generic thermal_zone_get_trip() function")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
__copy_xstate_to_uabi_buf() copies either from the tasks XSAVE buffer
or from init_fpstate into the ptrace buffer. Dynamic features, like
XTILEDATA, have an all zeroes init state and are not saved in
init_fpstate, which means the corresponding bit is not set in the
xfeatures bitmap of the init_fpstate header.
But __copy_xstate_to_uabi_buf() retrieves addresses for both the tasks
xstate and init_fpstate unconditionally via __raw_xsave_addr().
So if the tasks XSAVE buffer has a dynamic feature set, then the
address retrieval for init_fpstate triggers the warning in
__raw_xsave_addr() which checks the feature bit in the init_fpstate
header.
Remove the address retrieval from init_fpstate for extended features.
They have an all zeroes init state so init_fpstate has zeros for them.
Then zeroing the user buffer for the init state is the same as copying
them from init_fpstate.
Fixes: 2308ee57d9 ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Reported-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/kvm/20230221163655.920289-2-mizhang@google.com/
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/all/20230227210504.18520-2-chang.seok.bae%40intel.com
Cc: stable@vger.kernel.org
The commit 97e3d26b5e ("x86/mm: Randomize per-cpu entry area") fixed
an omission of KASLR on CPU entry areas. It doesn't take into account
KASLR switches though, which may result in unintended non-determinism
when a user wants to avoid it (e.g. debugging, benchmarking).
Generate only a single combination of CPU entry areas offsets -- the
linear array that existed prior randomization when KASLR is turned off.
Since we have 3f148f3318 ("x86/kasan: Map shadow for percpu pages on
demand") and followups, we can use the more relaxed guard
kasrl_enabled() (in contrast to kaslr_memory_enabled()).
Fixes: 97e3d26b5e ("x86/mm: Randomize per-cpu entry area")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20230306193144.24605-1-mkoutny%40suse.com
When fixed files are unregistered, file_alloc_end and alloc_hint
are not cleared. This can later cause a NULL pointer dereference in
io_file_bitmap_get() if auto index selection is enabled via
IORING_FILE_INDEX_ALLOC:
[ 6.519129] BUG: kernel NULL pointer dereference, address: 0000000000000000
[...]
[ 6.541468] RIP: 0010:_find_next_zero_bit+0x1a/0x70
[...]
[ 6.560906] Call Trace:
[ 6.561322] <TASK>
[ 6.561672] io_file_bitmap_get+0x38/0x60
[ 6.562281] io_fixed_fd_install+0x63/0xb0
[ 6.562851] ? __pfx_io_socket+0x10/0x10
[ 6.563396] io_socket+0x93/0xf0
[ 6.563855] ? __pfx_io_socket+0x10/0x10
[ 6.564411] io_issue_sqe+0x5b/0x3d0
[ 6.564914] io_submit_sqes+0x1de/0x650
[ 6.565452] __do_sys_io_uring_enter+0x4fc/0xb20
[ 6.566083] ? __do_sys_io_uring_register+0x11e/0xd80
[ 6.566779] do_syscall_64+0x3c/0x90
[ 6.567247] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[...]
To fix the issue, set file alloc range and alloc_hint to zero after
file tables are freed.
Cc: stable@vger.kernel.org
Fixes: 4278a0deb1 ("io_uring: defer alloc_hint update to io_file_bitmap_set()")
Signed-off-by: Savino Dicanosa <sd7.dev@pm.me>
[axboe: add explicit bitmap == NULL check as well]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When driver doesn't implement a bpf_xdp_metadata kfunc the fallback
implementation returns EOPNOTSUPP, which indicate device driver doesn't
implement this kfunc.
Currently many drivers also return EOPNOTSUPP when the hint isn't
available, which is ambiguous from an API point of view. Instead
change drivers to return ENODATA in these cases.
There can be natural cases why a driver doesn't provide any hardware
info for a specific hint, even on a frame to frame basis (e.g. PTP).
Lets keep these cases as separate return codes.
When describing the return values, adjust the function kernel-doc layout
to get proper rendering for the return values.
Fixes: ab46182d0d ("net/mlx4_en: Support RX XDP metadata")
Fixes: bc8d405b1b ("net/mlx5e: Support RX XDP metadata")
Fixes: 306531f024 ("veth: Support RX XDP metadata")
Fixes: 3d76a4d3d4 ("bpf: XDP metadata RX kfuncs")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/167940675120.2718408.8176058626864184420.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The hvc machinery registers both a console and a tty device based on
the hv ops provided by the specific implementation. Those two
interfaces however have different locks, and there's no single locks
that's shared between the tty and the console implementations, hence
the driver needs to protect itself against concurrent accesses.
Otherwise concurrent calls using the split interfaces are likely to
corrupt the ring indexes, leaving the console unusable.
Introduce a lock to xencons_info to serialize accesses to the shared
ring. This is only required when using the shared memory console,
concurrent accesses to the hypercall based console implementation are
not an issue.
Note the conditional logic in domU_read_console() is slightly modified
so the notify_daemon() call can be done outside of the locked region:
it's an hypercall and there's no need for it to be done with the lock
held.
Fixes: b536b4b962 ('xen: use the hvc console infrastructure for Xen console')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20221130150919.13935-1-roger.pau@citrix.com
Signed-off-by: Juergen Gross <jgross@suse.com>
The continuous read support added recently makes nandsim
unhappy. Indeed, all the supported commands should be re-encoded into
internal commands, so of course there is currently no support for the
commands and patterns needed for continuous reads to work.
I tried to add support for them but nandsim (which is more a tool to
develop/debug upper layers rather than the raw NAND core) suffers from a
big limitation: it's internal parser needs to know what exact operation
is happening when the address cycles are performed. The research is then
sequential from the start up to the address cycles, but does not check
what's coming next even though the information is available. This is a
limitation which is related to the old API used by the core which kind
of forced the controllers to guess what operation was being performed
rather early. Today the core uses a more transparent API called
->exec_op() which no longer requires controller drivers to do any more
guessing, but despite being updated to ->exec_op(), nandsim is still a
bit constrained on this regard and thus cannot handle sequential page
reads because the start sequence beginning is identical to a regular
page read.
If the internal algorithm is updated some day, it should be possible to
make it support sequential page reads by adding something like:
/* Large page devices continuous read page start */
{OPT_LARGEPAGE, {STATE_CMD_READ0, STATE_ADDR_PAGE, STATE_CMD_READSTART,
STATE_CMD_READCACHESEQ | ACTION_CPY, STATE_DATAOUT,
STATE_READY}},
/* Large page devices continuous read page continue */
{OPT_LARGEPAGE, {STATE_CMD_READCACHESEQ | ACTION_CPY_NEXT, STATE_DATAOUT,
STATE_READY}},
/* Large page devices continuous read page end */
{OPT_LARGEPAGE, {STATE_CMD_READCACHEEND | ACTION_CPY_NEXT, STATE_DATAOUT,
STATE_READY}},
For now, we just return -EOPNOTSUPP when the core asks controller
drivers if they support the feature in order to prevent any further use
of these opcodes.
Note: This is a hack, ->exec_op() is not supposed to check against the
COMMAND opcodes unless _really_ needed.
Fixes: 003fe4b954 ("mtd: rawnand: Support for sequential cache reads")
Reported-by: Zhihao Cheng <chengzhihao1@huawei.com>
Link: https://lore.kernel.org/linux-mtd/fd34fe55-7f4a-030d-8653-9bb9cf08410d@huawei.com/
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Tested-by: Zhihao Cheng <chengzhihao1@huawei.com>
Acked-by: Richard Weinberger <richard@nod.at>
Link: https://lore.kernel.org/linux-mtd/20230310085452.1368716-1-miquel.raynal@bootlin.com
Introduce a core thermal API function, thermal_cooling_device_update(),
for updating the max_state value for a cooling device and rearranging
its statistics in sysfs after a possible change of its ->get_max_state()
callback return value.
That callback is now invoked only once, during cooling device
registration, to populate the max_state field in the cooling device
object, so if its return value changes, it needs to be invoked again
and the new return value needs to be stored as max_state. Moreover,
the statistics presented in sysfs need to be rearranged in general,
because there may not be enough room in them to store data for all
of the possible states (in the case when max_state grows).
The new function takes care of that (and some other minor things
related to it), but some extra locking and lockdep annotations are
added in several places too to protect against crashes in the cases
when the statistics are not present or when a stale max_state value
might be used by sysfs attributes.
Note that the actual user of the new function will be added separately.
Link: https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Introduce a helper function, thermal_cooling_device_present(), for
checking if the given cooling device is in the list of registered
cooling devices to avoid some code duplication in a subsequent
patch.
No expected functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
The cpufreq policy notifier in the ACPI processor driver may as
well be registered before the driver itself, which causes
acpi_processor_cpufreq_init to be true (unless the notifier
registration fails, which is unlikely at that point) when the
ACPI CPU thermal cooling devices are registered, so the
processor_get_max_state() result does not change while
acpi_processor_driver_init() is running.
Change the ordering in acpi_processor_driver_init() accordingly
to prevent the max_state value from remaining 0 permanently for all
ACPI CPU cooling devices due to setting acpi_processor_cpufreq_init
too late. [Note that processor_get_max_state() may still return
different values at different times after this change, depending on
the cpufreq driver registration time, but that issue needs to be
addressed separately.]
Fixes: a365105c685c("thermal: sysfs: Reuse cdev->max_state")
Reported-by: Wang, Quanxian <quanxian.wang@intel.com>
Link: https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
The s390 DMA API conversion changes currently under review will extend
the use of the s390-iommu driver to the DMA API. With s390's mandatory
use of an IOMMU this means all DMA for PCI devices will then use the
s390-iommu driver. With this in mind and considering my involvement in
these changes it makes sense to reflect this increased interdependence
in the maintainer structure. Thus add myself as first maintainer and
move Gerald to reviewer status.
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://lore.kernel.org/r/20230221161043.37065-1-schnelle@linux.ibm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If a packet has reached its intended destination, it was bumped to the code
that accepts it, without first checking if a mesh_path needs to be created
based on the discovered source.
Fix this by moving the destination address check further down.
Cc: stable@vger.kernel.org
Fixes: 986e43b19a ("wifi: mac80211: fix receiving A-MSDU frames on mesh interfaces")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230314095956.62085-3-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some recent upstream debugging uncovered the fact that in
iwlwifi, the TXQ list manipulation is racy.
Introduce a new state bit for when the TXQ is completely
ready and can be used without locking, and if that's not
set yet acquire the lock to check everything correctly.
Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Tested-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This could race if the queue is redirected while full, then
the flushing internally would start it while it's not yet
usable again. Fix it by using two state bits instead of just
one.
Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Tested-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
sh/migor_defconfig:
mm/slab.c: In function ‘slab_memory_callback’:
mm/slab.c:1127:23: error: implicit declaration of function ‘init_cache_node_node’; did you mean ‘drain_cache_node_node’? [-Werror=implicit-function-declaration]
1127 | ret = init_cache_node_node(nid);
| ^~~~~~~~~~~~~~~~~~~~
| drain_cache_node_node
The #ifdef condition protecting the definition of init_cache_node_node()
no longer matches the conditions protecting the (multiple) users.
Fix this by syncing the conditions.
Fixes: 76af6a054d ("mm/migrate: add CPU hotplug to demotion #ifdef")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/b5bdea22-ed2f-3187-6efe-0c72330270a4@infradead.org
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
The FEI field of C2HTermReq/H2CTermReq is 4 bytes but not 4-byte-aligned
in the NVMe/TCP specification (it is located at offset 10 in the PDU).
Split it into two 16-bit integers in struct nvme_tcp_term_pdu
so no padding is inserted. There should also be 10 reserved bytes after.
There are currently no users of this type.
Fixes: fc221d0544 ("nvme-tcp: Add protocol header")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Identify CNS 06h (I/O Command Set Specific Identify Controller data
structure) is supported only on i/o controllers.
But nvme_init_non_mdts_limits() currently invokes this on all
controllers. Correct this by ensuring this is sent to I/O
controllers only.
Signed-off-by: Martin George <marting@netapp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Explicit alignment and page alignment are used only to calculate
the stride, not when checking actual slot physical address.
Originally, only page alignment was implemented, and that worked,
because the whole SWIOTLB is allocated on a page boundary, so
aligning the start index was sufficient to ensure a page-aligned
slot.
When commit 1f221a0d0d ("swiotlb: respect min_align_mask") added
support for min_align_mask, the index could be incremented in the
search loop, potentially finding an unaligned slot if minimum device
alignment is between IO_TLB_SIZE and PAGE_SIZE. The bug could go
unnoticed, because the slot size is 2 KiB, and the most common page
size is 4 KiB, so there is no alignment value in between.
IIUC the intention has been to find a slot that conforms to all
alignment constraints: device minimum alignment, an explicit
alignment (given as function parameter) and optionally page
alignment (if allocation size is >= PAGE_SIZE). The most
restrictive mask can be trivially computed with logical AND. The
rest can stay.
Fixes: 1f221a0d0d ("swiotlb: respect min_align_mask")
Fixes: e81e99bacc ("swiotlb: Support aligned swiotlb buffers")
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
No functional change, just use an existing helper.
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Vladimir Oltean says:
====================
Fix trainwreck with Ocelot switch statistics counters
While testing the patch set for preemptible traffic classes with some
controlled traffic and measuring counter deltas:
https://lore.kernel.org/netdev/20230220122343.1156614-1-vladimir.oltean@nxp.com/
I noticed that in the output of "ethtool -S swp0 --groups eth-mac
eth-phy eth-ctrl rmon -- --src emac | grep -v ': 0'", the TX counters
were off. Quickly I realized that their values were permutated by 1
compared to their names, and that for example
tx-rmon-etherStatsPkts64to64Octets was incrementing when
tx-rmon-etherStatsPkts65to127Octets should have.
Initially I suspected something having to do with the bulk reading
logic, and indeed I found a bug there (fixed as 1/3), but that was not
the source of the problems. Instead it revealed other problems.
While dumping the regions created by the driver on my switch, I figured
out that it sees a discontinuity which shouldn't have existed between
reg 0x278 and reg 0x280.
Discontinuity between last reg 0x0 and new reg 0x0, creating new region
Discontinuity between last reg 0x108 and new reg 0x200, creating new region
Discontinuity between last reg 0x278 and new reg 0x280, creating new region
Discontinuity between last reg 0x2b0 and new reg 0x400, creating new region
region of 67 contiguous counters starting with SYS:STAT:CNT[0x000]
region of 31 contiguous counters starting with SYS:STAT:CNT[0x080]
region of 13 contiguous counters starting with SYS:STAT:CNT[0x0a0]
region of 18 contiguous counters starting with SYS:STAT:CNT[0x100]
That is where TX_MM_HOLD should have been, and that was the bug, since
it was missing. After adding it, the regions look like this and the
off-by-one issue is resolved:
Discontinuity between last reg 0x000000 and new reg 0x000000, creating new region
Discontinuity between last reg 0x000108 and new reg 0x000200, creating new region
Discontinuity between last reg 0x0002b0 and new reg 0x000400, creating new region
region of 67 contiguous counters starting with SYS:STAT:CNT[0x000]
region of 45 contiguous counters starting with SYS:STAT:CNT[0x080]
region of 18 contiguous counters starting with SYS:STAT:CNT[0x100]
However, as I am thinking out loud, it should have not reported the
other counters as off by one even when skipping TX_MM_HOLD... after all,
on Ocelot/Seville, there are more counters which need to be skipped.
Which is when I investigated and noticed the bug solved in 2/3.
I've validated that both on native VSC9959 (which uses
ocelot_mm_stats_layout) as well as by faking the other switches by
making VSC9959 use the plain ocelot_stats_layout.
To summarize: on all Ocelot switches, the TX counters and drop counters
are completely broken. The RX counters are mostly fine.
With this occasion, I have collected more cleanup patches in this area,
which I'm going to submit after the net -> net-next merge.
====================
Link: https://lore.kernel.org/r/20230321010325.897817-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The lack of a definition for this counter is what initially prompted me
to investigate a problem which really manifested itself as the previous
change, "net: mscc: ocelot: fix transfer from region->buf to ocelot->stats".
When TX_MM_HOLD is defined in enum ocelot_stat but not in struct
ocelot_stat_layout ocelot_mm_stats_layout, this creates a hole, which
due to the aforementioned bug, makes all counters following TX_MM_HOLD
be recorded off by one compared to their correct position. So for
example, a non-zero TX_PMAC_OCTETS would be reported as TX_MERGE_FRAGMENTS,
TX_PMAC_UNICAST would be reported as TX_PMAC_OCTETS, TX_PMAC_64 would be
reported as TX_PMAC_PAUSE, etc etc. This is because the size of the hole
(1) is much smaller than the size of the region, so the phenomenon where
the stats are off-by-one, rather than lost, prevails.
However, the phenomenon where stats are lost can be seen too, for
example with DROP_LOCAL, which is at the beginning of its own region
(offset 0x000400 vs the previous 0x0002b0 constitutes a discontinuity).
This is also reported as off by one and saved to TX_PMAC_1527_MAX, but
that counter is not reported to the unstructured "ethtool -S", as
opposed to DROP_LOCAL which is (as "drop_local").
Fixes: ab3f97a961 ("net: mscc: ocelot: export ethtool MAC Merge stats for Felix VSC9959")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To understand the problem, we need some definitions.
The driver is aware of multiple counters (enum ocelot_stat), yet not all
switches supported by the driver implement all counters. There are 2
statistics layouts: ocelot_stats_layout and ocelot_mm_stats_layout, the
latter having 36 counters more than the former.
ocelot->stats[] is not a compact array, i.e. there are elements within
it which are not going to be populated for ocelot_stats_layout. On the
other hand, ocelot->stats[] is easily indexable, for example "tx_octets"
for port 3 can be found at ocelot->stats[3 * OCELOT_NUM_STATS +
OCELOT_STAT_TX_OCTETS], and that is why we keep it sparse.
Regions, as created by ocelot_prepare_stats_regions(), are compact
(every element from region->buf will correspond to a counter that is
present in this switch's layout) but are not easily indexable.
Let's define holes as the ranges of values of enum ocelot_stat for which
ocelot_stats_layout doesn't have a "reg" defined. For example, there is
a hole between OCELOT_STAT_RX_GREEN_PRIO_7 and OCELOT_STAT_TX_OCTETS
which is of 23 elements that are only present on ocelot_mm_stats_layout,
and as such, they are also present in enum ocelot_stat. Let's define the
left extremity of the hole - the last enum ocelot_stat still defined -
as A (in this case OCELOT_STAT_RX_GREEN_PRIO_7) and the right extremity -
the first enum ocelot_stat that is defined after a series of undefined
ones - as B (in this case OCELOT_STAT_TX_OCTETS).
There is a bug in the procedure which transfers stats from region->buf[]
to ocelot->stats[].
For each hole in the ocelot_stats_layout, the logic transfers the stats
starting with enum ocelot_stat B to ocelot->stats[] index A + 1. So all
stats after a hole are saved to a position which is off by B - A + 1
elements.
This causes 2 kinds of issues:
(a) counters which shouldn't increment increment
(b) counters which should increment don't
Holes in the ocelot_stat_layout automatically imply the end of a region
and the beginning of a new one; however the reverse is not necessarily
true. For example, for ocelot_mm_stat_layout, there could be multiple
regions (which indicate discontinuities in register addresses) while
there is no hole (which indicates discontinuities in enum ocelot_stat
values).
In the example above, the stats from the second region->buf[] are not
transferred to ocelot->stats starting with index
"port * OCELOT_NUM_STATS + OCELOT_STAT_TX_OCTETS" as they should, but
rather, starting with element
"port * OCELOT_NUM_STATS + OCELOT_STAT_RX_GREEN_PRIO_7 + 1".
That stats[] array element is not reported to user space for switches
that use ocelot_stat_layout, and that is how issue (b) occurs.
However, if the length of the second region is larger than the hole,
then some stats will start to be transferred to the ocelot->stats[]
indices which *are* reported to user space, but those indices contain
wrong values (corresponding to unexpected counters). This is how issue
(a) occurs.
The procedure, as it was introduced in commit d87b1c08f3 ("net: mscc:
ocelot: use bulk reads for stats"), was not buggy, because there were no
holes in the struct ocelot_stat_layout instances at that time. The
problem is that when those holes were introduced, the function was not
updated to take them into consideration.
To update the procedure, we need to know, for each region, which enum
ocelot_stat corresponds to its region->base. We have no way of deducing
that based on the contents of struct ocelot_stats_region, so we need to
add this information.
Fixes: ab3f97a961 ("net: mscc: ocelot: export ethtool MAC Merge stats for Felix VSC9959")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The blamed commit changed struct ocelot_stat_layout :: "u32 offset" to
"u32 reg".
However, "u32 reg" is not quite a register address, but an enum
ocelot_reg, which in itself encodes an enum ocelot_target target in the
upper bits, and an index into the ocelot->map[target][] array in the
lower bits.
So, whereas the previous code comparison between stats_layout[i].offset
and last + 1 was correct (because those "offsets" at the time were
32-bit relative addresses), the new code, comparing layout[i].reg to
last + 4 is not correct, because the "reg" here is an enum/index, not an
actual register address.
What we want to compare are indeed register addresses, but to do that,
we need to actually go through the same motions as
__ocelot_bulk_read_ix() itself.
With this bug, all statistics counters are deemed by
ocelot_prepare_stats_regions() as constituting their own region.
(Truncated) log on VSC9959 (Felix) below (prints added by me):
Before:
region of 1 contiguous counters starting with SYS:STAT:CNT[0x000]
region of 1 contiguous counters starting with SYS:STAT:CNT[0x001]
region of 1 contiguous counters starting with SYS:STAT:CNT[0x002]
...
region of 1 contiguous counters starting with SYS:STAT:CNT[0x041]
region of 1 contiguous counters starting with SYS:STAT:CNT[0x042]
region of 1 contiguous counters starting with SYS:STAT:CNT[0x080]
region of 1 contiguous counters starting with SYS:STAT:CNT[0x081]
...
region of 1 contiguous counters starting with SYS:STAT:CNT[0x0ac]
region of 1 contiguous counters starting with SYS:STAT:CNT[0x100]
region of 1 contiguous counters starting with SYS:STAT:CNT[0x101]
...
region of 1 contiguous counters starting with SYS:STAT:CNT[0x111]
After:
region of 67 contiguous counters starting with SYS:STAT:CNT[0x000]
region of 45 contiguous counters starting with SYS:STAT:CNT[0x080]
region of 18 contiguous counters starting with SYS:STAT:CNT[0x100]
Since commit d87b1c08f3 ("net: mscc: ocelot: use bulk reads for
stats") intended bulking as a performance improvement, and since now,
with trivial-sized regions, performance is even worse than without
bulking at all, this could easily qualify as a performance regression.
Fixes: d4c3676507 ("net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Colin Foster <colin.foster@in-advantage.com>
Tested-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are memory leaks reported by kmemleak:
unreferenced object 0xffff888106500800 (size 128):
comm "modprobe", pid 1017, jiffies 4297787785 (age 67.152s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000970ce626>] __kmem_cache_alloc_node+0x20c/0x380
[<00000000fb5f78d9>] kmalloc_trace+0x2f/0xb0
[<000000000e947e2a>] idt77252_init_one+0x2847/0x3c90 [idt77252]
[<000000006efb048e>] local_pci_probe+0xeb/0x1a0
...
unreferenced object 0xffff888106500b00 (size 128):
comm "modprobe", pid 1017, jiffies 4297787785 (age 67.152s)
hex dump (first 32 bytes):
00 20 3d 01 80 88 ff ff 00 20 3d 01 80 88 ff ff . =...... =.....
f0 23 3d 01 80 88 ff ff 00 20 3d 01 00 00 00 00 .#=...... =.....
backtrace:
[<00000000970ce626>] __kmem_cache_alloc_node+0x20c/0x380
[<00000000fb5f78d9>] kmalloc_trace+0x2f/0xb0
[<00000000f451c5be>] alloc_scq.constprop.0+0x4a/0x400 [idt77252]
[<00000000e6313849>] idt77252_init_one+0x28cf/0x3c90 [idt77252]
The root cause is traced to the vc_maps which alloced in open_card_oam()
are not freed in close_card_oam(). The vc_maps are used to record
open connections, so when close a vc_map in close_card_oam(), the memory
should be freed. Moreover, the ubr0 is not closed when close a idt77252
device, leading to the memory leak of vc_map and scq_info.
Fix them by adding kfree in close_card_oam() and implementing new
close_card_ubr0() to close ubr0.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Francois Romieu <romieu@fr.zoreil.com>
Link: https://lore.kernel.org/r/20230320143318.2644630-1-lizetao1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As a result of the switch to dh_listpackages, $version is no longer set
when install_kernel_headers() is called. This causes files in the
linux-headers deb package to be installed to a path with an empty
$version (e.g. /usr/src/linux-headers-/scripts/sign-file rather than
/usr/src/linux-headers-6.3.0-rc3/scripts/sign-file).
To avoid this, while continuing to use the version information from
dh_listpackages, pass $version from $package as the second argument
of install_kernel_headers().
Fixes: 36862e14e3 ("kbuild: deb-pkg: use dh_listpackages to know enabled packages")
Signed-off-by: Kevin Locke <kevin@kevinlocke.name>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
When BCM63xx internal switches are connected to switches with a 4-byte
Broadcom tag, it does not identify the packet as VLAN tagged, so it adds one
based on its PVID (which is likely 0).
Right now, the packet is received by the BCM63xx internal switch and the 6-byte
tag is properly processed. The next step would to decode the corresponding
4-byte tag. However, the internal switch adds an invalid VLAN tag after the
6-byte tag and the 4-byte tag handling fails.
In order to fix this we need to remove the invalid VLAN tag after the 6-byte
tag before passing it to the 4-byte tag decoding.
Fixes: 964dbf186e ("net: dsa: tag_brcm: add support for legacy tags")
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230319095540.239064-1-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull VFIO fix from Alex Williamson:
- Fix dirty size reporting for pre-copy in mlx5 variant driver (Yishai
Hadas)
* tag 'vfio-v6.3-rc4' of https://github.com/awilliam/linux-vfio:
vfio/mlx5: Fix the report of dirty_bytes upon pre-copy
Pull nfsd fixes from Chuck Lever:
- Fix a crash during NFS READs from certain client implementations
- Address a minor kbuild regression in v6.3
* tag 'nfsd-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: don't replace page in rq_pages if it's a continuation of last page
NFS & NFSD: Update GSS dependencies
The error handling dereferences "vport". There is nothing we can do if
it is an error pointer except returning the error code.
Fixes: 133dcfc577 ("net/mlx5: E-Switch, Alloc and free unique metadata for match")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When ETS configurations are queried by the user to get the mapping
assignment between packet priority and traffic class, only priorities up
to maximum TCs are queried from QTCT register in FW to retrieve their
assigned TC, leaving the rest of the priorities mapped to the default
TC #0 which might be misleading.
Fix by querying the TC mapping of all priorities on each ETS query,
regardless of the maximum number of TCs configured in FW.
Fixes: 820c2c5e77 ("net/mlx5e: Read ETS settings directly from firmware")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
First ASO WQE poll causes a cache miss in hardware hence the resut is
delayed. It causes to the situation where such WQE is polled earlier
than it is needed.
Add logic to retry ASO CQ polling operation.
Fixes: 739cfa3451 ("net/mlx5: Make ASO poll CQ usable in atomic context")
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5e_port_max_linkspeed does not guarantee value assignment for speed.
Avoid cases where link_speed might be used uninitialized. In case
mlx5e_port_max_linkspeed fails, a default link speed of 50000 will be
used for the calculations.
Fixes: 3f6d08d196 ("net/mlx5e: Add RSS support for hairpin")
Signed-off-by: Roy Novich <royno@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
vport's mc, uc and multicast rules are not deleted in teardown path when
EEH happens. Since the vport's promisc settings(uc, mc and all) in
firmware are reset after EEH, mlx5 driver will try to delete the above
rules in the initialization path. This cause kernel crash because these
software rules are no longer valid.
Fix by nullifying these rules right after delete to avoid accessing any dangling
pointers.
Call Trace:
__list_del_entry_valid+0xcc/0x100 (unreliable)
tree_put_node+0xf4/0x1b0 [mlx5_core]
tree_remove_node+0x30/0x70 [mlx5_core]
mlx5_del_flow_rules+0x14c/0x1f0 [mlx5_core]
esw_apply_vport_rx_mode+0x10c/0x200 [mlx5_core]
esw_update_vport_rx_mode+0xb4/0x180 [mlx5_core]
esw_vport_change_handle_locked+0x1ec/0x230 [mlx5_core]
esw_enable_vport+0x130/0x260 [mlx5_core]
mlx5_eswitch_enable_sriov+0x2a0/0x2f0 [mlx5_core]
mlx5_device_enable_sriov+0x74/0x440 [mlx5_core]
mlx5_load_one+0x114c/0x1550 [mlx5_core]
mlx5_pci_resume+0x68/0xf0 [mlx5_core]
eeh_report_resume+0x1a4/0x230
eeh_pe_dev_traverse+0x98/0x170
eeh_handle_normal_event+0x3e4/0x640
eeh_handle_event+0x4c/0x370
eeh_event_handler+0x14c/0x210
kthread+0x168/0x1b0
ret_from_kernel_thread+0x5c/0x84
Fixes: a35f71f27a ("net/mlx5: E-Switch, Implement promiscuous rx modes vf request handling")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Upon entering switchdev mode, VF/SF representors are spawned in the
devlink instance's net namespace, whereas the PF net device transforms
into the uplink representor, remaining in the net namespace the PF net
device was in. Therefore, if a PF net device's namespace is different from
its parent devlink net namespace, entering switchdev mode can create an
illegal situation where all representors sharing the same core device
are NOT in the same net namespace.
To avoid this issue, block entering switchdev mode for devices whose child
netdev net namespace has diverged from the parent devlink's.
Fixes: 7768d1971d ("net/mlx5: E-Switch, Add control for encapsulation")
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Gavi Teitz <gavi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Previously, NETNS_LOCAL was not set for uplink representors, inconsistent
with VF representors, and allowed the uplink representor to be moved
between net namespaces and separated from the VF representors it shares
the core device with. Such usage would break the isolation model of
namespaces, as devices in different namespaces would have access to
shared memory.
To solve this issue, set NETNS_LOCAL for uplink representors if eswitch is
in switchdev mode.
Fixes: 7a9fb35e8c ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode")
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Gavi Teitz <gavi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
We've seen recent AWS EKS (Kubernetes) user reports like the following:
After upgrading EKS nodes from v20230203 to v20230217 on our 1.24 EKS
clusters after a few days a number of the nodes have containers stuck
in ContainerCreating state or liveness/readiness probes reporting the
following error:
Readiness probe errored: rpc error: code = Unknown desc = failed to
exec in container: failed to start exec "4a11039f730203ffc003b7[...]":
OCI runtime exec failed: exec failed: unable to start container process:
unable to init seccomp: error loading seccomp filter into kernel:
error loading seccomp filter: errno 524: unknown
However, we had not been seeing this issue on previous AMIs and it only
started to occur on v20230217 (following the upgrade from kernel 5.4 to
5.10) with no other changes to the underlying cluster or workloads.
We tried the suggestions from that issue (sysctl net.core.bpf_jit_limit=452534528)
which helped to immediately allow containers to be created and probes to
execute but after approximately a day the issue returned and the value
returned by cat /proc/vmallocinfo | grep bpf_jit | awk '{s+=$2} END {print s}'
was steadily increasing.
I tested bpf tree to observe bpf_jit_charge_modmem, bpf_jit_uncharge_modmem
their sizes passed in as well as bpf_jit_current under tcpdump BPF filter,
seccomp BPF and native (e)BPF programs, and the behavior all looks sane
and expected, that is nothing "leaking" from an upstream perspective.
The bpf_jit_limit knob was originally added in order to avoid a situation
where unprivileged applications loading BPF programs (e.g. seccomp BPF
policies) consuming all the module memory space via BPF JIT such that loading
of kernel modules would be prevented. The default limit was defined back in
2018 and while good enough back then, we are generally seeing far more BPF
consumers today.
Adjust the limit for the BPF JIT pool from originally 1/4 to now 1/2 of the
module memory space to better reflect today's needs and avoid more users
running into potentially hard to debug issues.
Fixes: fdadd04931 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
Reported-by: Stephen Haynes <sh@synk.net>
Reported-by: Lefteris Alexakis <lefteris.alexakis@kpn.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://github.com/awslabs/amazon-eks-ami/issues/1179
Link: https://github.com/awslabs/amazon-eks-ami/issues/1219
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230320143725.8394-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The CDAT exposed in sysfs differs between little endian and big endian
arches: On big endian, every 4 bytes are byte-swapped.
PCI Configuration Space is little endian (PCI r3.0 sec 6.1). Accessors
such as pci_read_config_dword() implicitly swap bytes on big endian.
That way, the macros in include/uapi/linux/pci_regs.h work regardless of
the arch's endianness. For an example of implicit byte-swapping, see
ppc4xx_pciex_read_config(), which calls in_le32(), which uses lwbrx
(Load Word Byte-Reverse Indexed).
DOE Read/Write Data Mailbox Registers are unlike other registers in
Configuration Space in that they contain or receive a 4 byte portion of
an opaque byte stream (a "Data Object" per PCIe r6.0 sec 7.9.24.5f).
They need to be copied to or from the request/response buffer verbatim.
So amend pci_doe_send_req() and pci_doe_recv_resp() to undo the implicit
byte-swapping.
The CXL_DOE_TABLE_ACCESS_* and PCI_DOE_DATA_OBJECT_DISC_* macros assume
implicit byte-swapping. Byte-swap requests after constructing them with
those macros and byte-swap responses before parsing them.
Change the request and response type to __le32 to avoid sparse warnings.
Per a request from Jonathan, replace sizeof(u32) with sizeof(__le32) for
consistency.
Fixes: c97006046c ("cxl/port: Read CDAT table")
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: stable@vger.kernel.org # v6.0+
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/3051114102f41d19df3debbee123129118fc5e6d.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull keyrings fixes from David Howells:
- Fix request_key() so that it doesn't cache a looked up key on the
current thread if that thread is a kernel thread.
The cache is cleared during notify_resume - but that doesn't happen
in kernel threads. This is causing cifs DNS keys to be
un-invalidateable.
- Fix a wrapper check in verify_pefile() to not round up the length.
- Change asymmetric_keys code to log errors to make it easier for users
to work out why failures occurred.
* tag 'keys-fixes-20230321' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
asymmetric_keys: log on fatal failures in PE/pkcs7
verify_pefile: relax wrapper length check
keys: Do not cache key in task struct if key is requested from kernel thread
When a system with E810 with existing VFs gets rebooted the following
hang may be observed.
Pid 1 is hung in iavf_remove(), part of a network driver:
PID: 1 TASK: ffff965400e5a340 CPU: 24 COMMAND: "systemd-shutdow"
#0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb
#1 [ffffaad04005fae8] schedule at ffffffff8b323e2d
#2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc
#3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930
#4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf]
#5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513
#6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa
#7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc
#8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e
#9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429
#10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4
#11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice]
#12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice]
#13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice]
#14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1
#15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386
#16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870
#17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6
#18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159
#19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc
#20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d
#21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169
#22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b
RIP: 00007f1baa5c13d7 RSP: 00007fffbcc55a98 RFLAGS: 00000202
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1baa5c13d7
RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
RBP: 00007fffbcc55ca0 R8: 0000000000000000 R9: 00007fffbcc54e90
R10: 00007fffbcc55050 R11: 0000000000000202 R12: 0000000000000005
R13: 0000000000000000 R14: 00007fffbcc55af0 R15: 0000000000000000
ORIG_RAX: 00000000000000a9 CS: 0033 SS: 002b
During reboot all drivers PM shutdown callbacks are invoked.
In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE.
In ice_shutdown() the call chain above is executed, which at some point
calls iavf_remove(). However iavf_remove() expects the VF to be in one
of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If
that's not the case it sleeps forever.
So if iavf_shutdown() gets invoked before iavf_remove() the system will
hang indefinitely because the adapter is already in state __IAVF_REMOVE.
Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE,
as we already went through iavf_shutdown().
Fixes: 974578017f ("iavf: Add waiting so the port is initialized in remove")
Fixes: a8417330f8 ("iavf: Fix race condition between iavf_shutdown and iavf_remove")
Reported-by: Marius Cornea <mcornea@redhat.com>
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
If CDM_CHECK is enabled (by the DT "snps,enable-cdm-check" property), 'val'
is overwritten by PCIE_PL_CHK_REG_CONTROL_STATUS initialization. Commit
ec7b952f45 ("PCI: dwc: Always enable CDM check if "snps,enable-cdm-check"
exists") did not account for further usage of 'val', so we wrote improper
values to PCIE_PORT_LINK_CONTROL when the CDM check is enabled.
Move the PCIE_PORT_LINK_CONTROL update to be completely after the
PCIE_PL_CHK_REG_CONTROL_STATUS register initialization.
[bhelgaas: commit log adapted from Serge's version]
Fixes: ec7b952f45 ("PCI: dwc: Always enable CDM check if "snps,enable-cdm-check" exists")
Link: https://lore.kernel.org/r/20230310123510.675685-2-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Disabling the cache in commit 2ff4ba9e37 ("clk: rs9: Fix I2C accessors")
without removing cache synchronization in resume path results in a
kernel panic as map->cache_ops is unset, due to REGCACHE_NONE.
Enable flat cache again to support resume again. num_reg_defaults_raw
is necessary to read the cache defaults from hardware. Some registers
are strapped in hardware and cannot be provided in software.
Fixes: 2ff4ba9e37 ("clk: rs9: Fix I2C accessors")
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Link: https://lore.kernel.org/r/20230310074940.3475703-1-alexander.stein@ew.tq-group.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Filters shouldn't be removed in VSI rebuild path. Removing them on PF
VSI results in no rule for PF MAC after changing for example queues
amount.
Remove all filters only in the VSI remove flow. As unload should also
cause the filter to be removed introduce, a new function ice_stop_eth().
It will unroll ice_start_eth(), so remove filters and close VSI.
Fixes: 6624e780a5 ("ice: split ice_vsi_setup into smaller functions")
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The key which gets cached in task structure from a kernel thread does not
get invalidated even after expiry. Due to which, a new key request from
kernel thread will be served with the cached key if it's present in task
struct irrespective of the key validity. The change is to not cache key in
task_struct when key requested from kernel thread so that kernel thread
gets a valid key on every key request.
The problem has been seen with the cifs module doing DNS lookups from a
kernel thread and the results getting pinned by being attached to that
kernel thread's cache - and thus not something that can be easily got rid
of. The cache would ordinarily be cleared by notify-resume, but kernel
threads don't do that.
This isn't seen with AFS because AFS is doing request_key() within the
kernel half of a user thread - which will do notify-resume.
Fixes: 7743c48e54 ("keys: Cache result of request_key*() temporarily in task_struct")
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Steve French <smfrench@gmail.com>
cc: keyrings@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/CAGypqWw951d=zYRbdgNR4snUDvJhWL=q3=WOyh7HhSJupjz2vA@mail.gmail.com/
Since commit 6c40624930 ("bootconfig: Increase max nodes of bootconfig
from 1024 to 8192 for DCC support") increased the max number of bootconfig
node to 8192, the bootconfig testcase of the max number of nodes fails.
To fix this issue, we can not simply increase the number in the test script
because the test bootconfig file becomes too big (>32KB). To fix that, we
can use a combination of three alphabets (26^3 = 17576). But with that,
we can not express the 8193 (just one exceed from the limitation) because
it also exceeds the max size of bootconfig. So, the first 26 nodes will just
use one alphabet.
With this fix, test-bootconfig.sh passes all tests.
Link: https://lore.kernel.org/all/167888844790.791176.670805252426835131.stgit@devnote2/
Reported-by: Heinz Wiesinger <pprkut@slackware.com>
Link: https://lore.kernel.org/all/2463802.XAFRqVoOGU@amaterasu.liwjatan.org
Fixes: 6c40624930 ("bootconfig: Increase max nodes of bootconfig from 1024 to 8192 for DCC support")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Suzuki writes:
coresight: Fixes for v6.3
Fixes for coresight subsystem includes:
- Fix etm4_enable_hw to program all the address comparator pairs (instead of
half of them)
- Do not access TRCIDR1 register without OSLK cleared in etm4_probe for mmio
access.
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
* tag 'coresight-fixes-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux:
coresight: etm4x: Do not access TRCIDR1 for identification
coresight-etm4: Fix for() loop drvdata->nr_addr_cmp range bug
Smatch reports:
drivers/hwmon/xgene-hwmon.c:757 xgene_hwmon_probe() warn:
'ctx->pcc_comm_addr' from ioremap() not released on line: 757.
This is because in drivers/hwmon/xgene-hwmon.c:701 xgene_hwmon_probe(),
ioremap and memremap is not released, which may cause a leak.
To fix this, ioremap and memremap is modified to devm_ioremap and
devm_memremap.
Signed-off-by: Tianyi Jing <jingfelix@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Link: https://lore.kernel.org/r/20230318143851.2191625-1-jingfelix@hust.edu.cn
[groeck: Fixed formatting and subject]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Avoid needlessly rebuilding the compressed image by adding the file
'vmlinuz' to the 'targets' Kbuild make variable.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
RCU sometimes needs to perform a delayed wake up for specific kthreads
handling offloaded callbacks (RCU_NOCB). These wakeups are performed
by timers and upon entry to idle (also to guest and to user on nohz_full).
However the delayed wake-up on kernel exit is actually performed after
the thread flags are fetched towards the fast path check for work to
do on exit to user. As a result, and if there is no other pending work
to do upon that kernel exit, the current task will resume to userspace
with TIF_RESCHED set and the pending wake up ignored.
Fix this with fetching the thread flags _after_ the delayed RCU-nocb
kthread wake-up.
Fixes: 47b8ff194c ("entry: Explicitly flush pending rcuog wakeup before last rescheduling point")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230315194349.10798-3-joel@joelfernandes.org
The variable 'status' (which contains the unhandled overflow bits) is
not being properly masked in some cases, displaying the following
warning:
WARNING: CPU: 156 PID: 475601 at arch/x86/events/amd/core.c:972 amd_pmu_v2_handle_irq+0x216/0x270
This seems to be happening because the loop is being continued before
the status bit being unset, in case x86_perf_event_set_period()
returns 0. This is also causing an inconsistency because the "handled"
counter is incremented, but the status bit is not cleaned.
Move the bit cleaning together above, together when the "handled"
counter is incremented.
Fixes: 7685665c39 ("perf/x86/amd/core: Add PerfMonV2 overflow handling")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20230321113338.1669660-1-leitao@debian.org
Commit 829c1651e9 ("sched/fair: sanitize vruntime of entity being placed")
fixes an overflowing bug, but ignore a case that se->exec_start is reset
after a migration.
For fixing this case, we delay the reset of se->exec_start after
placing the entity which se->exec_start to detect long sleeping task.
In order to take into account a possible divergence between the clock_task
of 2 rqs, we increase the threshold to around 104 days.
Fixes: 829c1651e9 ("sched/fair: sanitize vruntime of entity being placed")
Originally-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Qiao <zhangqiao22@huawei.com>
Link: https://lore.kernel.org/r/20230317160810.107988-1-vincent.guittot@linaro.org
__enter_from_user_mode() is triggering noinstr warnings with
CONFIG_DEBUG_PREEMPT due to its call of preempt_count_add() via
ct_state().
The preemption disable isn't needed as interrupts are already disabled.
And the context_tracking_enabled() check in ct_state() also isn't needed
as that's already being done by the CT_WARN_ON().
Just use __ct_state() instead.
Fixes the following warnings:
vmlinux.o: warning: objtool: enter_from_user_mode+0xba: call to preempt_count_add() leaves .noinstr.text section
vmlinux.o: warning: objtool: syscall_enter_from_user_mode+0xf9: call to preempt_count_add() leaves .noinstr.text section
vmlinux.o: warning: objtool: syscall_enter_from_user_mode_prepare+0xc7: call to preempt_count_add() leaves .noinstr.text section
vmlinux.o: warning: objtool: irqentry_enter_from_user_mode+0xba: call to preempt_count_add() leaves .noinstr.text section
Fixes: 171476775d ("context_tracking: Convert state to atomic_t")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/d8955fa6d68dc955dda19baf13ae014ae27926f5.1677369694.git.jpoimboe@kernel.org
Like the Windows Lenovo Yoga Book X91F/L the Android Lenovo Yoga Book
X90F/L has a portrait 1200x1920 screen used in landscape mode,
add a quirk for this.
When the quirk for the X91F/L was initially added it was written to
also apply to the X90F/L but this does not work because the Android
version of the Yoga Book uses completely different DMI strings.
Also adjust the X91F/L quirk to reflect that it only applies to
the X91F/L models.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230301095218.28457-1-hdegoede@redhat.com
The recent support of low latency playback in USB-audio driver made
the snd_usb_queue_pending_output_urbs() function to be called via PCM
ack ops. In the new code path, the function is performed already in
the PCM stream lock. The problem is that, when an XRUN is detected,
the function calls snd_pcm_xrun() to notify, but snd_pcm_xrun() is
supposed to be called only outside the stream lock. As a result, it
leads to a deadlock of PCM stream locking.
For avoiding such a recursive locking, this patch adds an additional
check to the code paths in PCM core that call the ack callback; now it
checks the error code from the callback, and if it's -EPIPE, the XRUN
is handled in the PCM core side gracefully. Along with it, the
USB-audio driver code is changed to follow that, i.e. -EPIPE is
returned instead of the explicit snd_pcm_xrun() call when the function
is performed already in the stream lock.
Fixes: d5f871f89e ("ALSA: usb-audio: Improved lowlatency playback support")
Reported-and-tested-by: John Keeping <john@metanate.com>
Link: https://lore.kernel.org/r/20230317195128.3911155-1-john@metanate.com
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Reviewed-by; Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20230320142838.494-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The recent commit f83bb25924 ("ALSA: hda/conexant: Add quirk for
LENOVO 20149 Notebook model") introduced a quirk for the device with
17aa:3977, but this caused a regression on another model (Lenovo
Ideadpad U31) with the very same PCI SSID. And, through skimming over
the net, it seems that this PCI SSID is used for multiple different
models, so it's no good idea to apply the quirk with the SSID.
Although we may take a different ID check (e.g. the codec SSID instead
of the PCI SSID), unfortunately, the original patch author couldn't
identify the hardware details any longer as the machine was returned,
and we can't develop the further proper fix.
In this patch, instead, we partially revert the change so that the
quirk won't be applied as default for addressing the regression.
Meanwhile, the quirk function itself is kept, and it's now made to be
applicable via the explicit model=lenovo-20149 option.
Fixes: f83bb25924 ("ALSA: hda/conexant: Add quirk for LENOVO 20149 Notebook model")
Reported-by: Jetro Jormalainen <jje-lxkl@jetro.fi>
Link: https://lore.kernel.org/r/20230308215009.4d3e58a6@mopti
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230320140954.31154-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Since io_uring does nonblocking connect requests, if we do two repeated
ones without having a listener, the second will get -ECONNABORTED rather
than the expected -ECONNREFUSED. Treat -ECONNABORTED like a normal retry
condition if we're nonblocking, if we haven't already seen it.
Cc: stable@vger.kernel.org
Fixes: 3fb1bd6881 ("io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECT")
Link: https://github.com/axboe/liburing/issues/828
Reported-by: Hui, Chunyang <sanqian.hcy@antgroup.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_uring_cmd_done() currently assumes that the uring_lock is held
when invoked, and while it generally is, this is not guaranteed.
Pass in the issue_flags associated with it, so that we have
IO_URING_F_UNLOCKED available to be able to lock the CQ ring
appropriately when completing events.
Cc: stable@vger.kernel.org
Fixes: ee692a21e9 ("fs,io_uring: add infrastructure for uring-cmd")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Building sigaltstack with clang via:
$ ARCH=x86 make LLVM=1 -C tools/testing/selftests/sigaltstack/
produces the following warning:
warning: variable 'sp' is uninitialized when used here [-Wuninitialized]
if (sp < (unsigned long)sstack ||
^~
Clang expects these to be declared at global scope; we've fixed this in
the kernel proper by using the macro `current_stack_pointer`. This is
defined in different headers for different target architectures, so just
create a new header that defines the arch-specific register names for
the stack pointer register, and define it for more targets (at least the
ones that support current_stack_pointer/ARCH_HAS_CURRENT_STACK_POINTER).
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Link: https://lore.kernel.org/lkml/CA+G9fYsi3OOu7yCsMutpzKDnBMAzJBCPimBp86LhGBa0eCnEpA@mail.gmail.com/
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Pull fsverity fixes from Eric Biggers:
"Fix two significant performance issues with fsverity"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
fsverity: don't drop pagecache at end of FS_IOC_ENABLE_VERITY
fsverity: Remove WQ_UNBOUND from fsverity read workqueue
Pull fscrypt fix from Eric Biggers:
"Fix a bug where when a filesystem was being unmounted, the fscrypt
keyring was destroyed before inodes have been released by the Landlock
LSM.
This bug was found by syzbot"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
fscrypt: check for NULL keyring in fscrypt_put_master_key_activeref()
fscrypt: improve fscrypt_destroy_keyring() documentation
fscrypt: destroy keyring after security_sb_delete()
Since the expected write location in a sequential file is always at the
end of the file (append write), when an invalid write append location is
detected in zonefs_file_dio_append(), print the invalid written location
instead of the expected write location.
Fixes: a608da3bd7 ("zonefs: Detect append writes at invalid locations")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
The error handling for platform_get_irq() failing no longer works after
a recent change, clang now points this out with a warning:
drivers/gpu/host1x/dev.c:520:6: error: variable 'syncpt_irq' is uninitialized when used here [-Werror,-Wuninitialized]
if (syncpt_irq < 0)
^~~~~~~~~~
Fix this by removing the variable and checking the correct error status.
Fixes: 625d4ffb43 ("gpu: host1x: Rewrite syncpoint interrupt handling")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Acer Aspire 3830TG predates Windows 8, so it defaults to using
acpi_video# for backlight control, but this is non functional on
this model.
Add a DMI quirk to use the native backlight interface which does
work properly.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull NFS client fixes from Anna Schumaker:
- Fix /proc/PID/io read_bytes accounting
- Fix setting NLM file_lock start and end during decoding testargs
- Fix timing for setting access cache timestamps
* tag 'nfs-for-6.3-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Correct timing for assigning access cache timestamp
lockd: set file_lock start and end when decoding nlm4 testargs
NFS: Fix /proc/PID/io read_bytes for buffered reads
cppcheck reports
drivers/thunderbolt/nhi.c:74:7: style: Local variable 'bit' shadows outer variable [shadowVariable]
int bit;
^
drivers/thunderbolt/nhi.c:66:6: note: Shadowed declaration
int bit = ring_interrupt_index(ring) & 31;
^
drivers/thunderbolt/nhi.c:74:7: note: Shadow variable
int bit;
^
For readablity rename the outer to interrupt_bit and the innner
to auto_clear_bit.
Fixes: 468c49f447 ("thunderbolt: Disable interrupt auto clear for ring")
Cc: stable@vger.kernel.org
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
William writes:
First set of Counter driver fixes for 6.3
This set consists of two fixes for the 104-quad-8 driver:
- fix a read race condition between the FLAG and CNTR registers
(as a result 25-bit count values are no longer supported)
- invert condition check to report correct Index Synapse action
* tag 'counter-fixes-6.3a' of git://git.kernel.org/pub/scm/linux/kernel/git/wbg/counter:
counter: 104-quad-8: Fix Synapse action reported for Index signals
counter: 104-quad-8: Fix race condition between FLAG and CNTR reads
The previous commit 6a192c0cbf ("platform/x86/intel/tpmi: Fix
double free reported by Smatch") incorrectly handle the deallocation of
res variable. As shown in the comment, intel_vsec_add_aux handles all
the deallocation of res and feature_vsec_dev. Therefore, kfree(res) can
still cause double free if intel_vsec_add_aux returns error.
Fix this by adjusting the error handling part in tpmi_create_device,
following the function intel_vsec_add_dev.
Fixes: 6a192c0cbf ("platform/x86/intel/tpmi: Fix double free reported by Smatch")
Signed-off-by: Dongliang Mu <dzm91@hust.edu.cn>
Link: https://lore.kernel.org/r/20230309040107.534716-2-dzm91@hust.edu.cn
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Max EQ depth of hardware is 32K, the current default EQ depth is too small
for some applications, so change the default depth to 4096.
Max send WRs the hardware can support is 8K, but the driver limits the
value to 4K. Remove this limitation.
Fixes: be3cff0f24 ("RDMA/erdma: Add the hardware related definitions")
Fixes: db23ae64ca ("RDMA/erdma: Add verbs header file")
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230320084652.16807-3-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Geoff Levand says:
====================
net/ps3_gelic_net: DMA related fixes
v9: Make rx_skb_size local to gelic_descr_prepare_rx.
v8: Add more cpu_to_be32 calls.
v7: Remove all cleanups, sync to spider net.
v6: Reworked and cleaned up patches.
v5: Some additional patch cleanups.
v4: More patch cleanups.
v3: Cleaned up patches as requested.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The current Gelic Etherenet driver was checking the return value of its
dma_map_single call, and not using the dma_mapping_error() routine.
Fixes runtime problems like these:
DMA-API: ps3_gelic_driver sb_05: device driver failed to check map error
WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:1027 .check_unmap+0x888/0x8dc
Fixes: 02c1889166 ("ps3: gigabit ethernet driver for PS3, take3")
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Gelic Ethernet device needs to have the RX sk_buffs aligned to
GELIC_NET_RXBUF_ALIGN, and also the length of the RX sk_buffs must
be a multiple of GELIC_NET_RXBUF_ALIGN.
The current Gelic Ethernet driver was not allocating sk_buffs large
enough to allow for this alignment.
Also, correct the maximum and minimum MTU sizes, and add a new
preprocessor macro for the maximum frame size, GELIC_NET_MAX_FRAME.
Fixes various randomly occurring runtime network errors.
Fixes: 02c1889166 ("ps3: gigabit ethernet driver for PS3, take3")
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
clang with W=1 reports
drivers/net/usb/plusb.c:65:1: error:
unused function 'pl_clear_QuickLink_features' [-Werror,-Wunused-function]
pl_clear_QuickLink_features(struct usbnet *dev, int val)
^
This static function is not used, so remove it.
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packet length retrieved from descriptor may be larger than
the actual socket buffer length. In such case the cloned
skb passed up the network stack will leak kernel memory contents.
Additionally prevent integer underflow when size is less than
ETH_FCS_LEN.
Fixes: 55d7de9de6 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Szymon Heidrich <szymon.heidrich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In emac_probe, &adpt->work_thread is bound with
emac_work_thread. Then it will be started by timeout
handler emac_tx_timeout or a IRQ handler emac_isr.
If we remove the driver which will call emac_remove
to make cleanup, there may be a unfinished work.
The possible sequence is as follows:
Fix it by finishing the work before cleanup in the emac_remove
and disable timeout response.
CPU0 CPU1
|emac_work_thread
emac_remove |
free_netdev |
kfree(netdev); |
|emac_reinit_locked
|emac_mac_down
|//use netdev
Fixes: b9b17debc6 ("net: emac: emac gigabit ethernet controller driver")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We collect the software statistics counters for RX bytes (reported to
/proc/net/dev and to ethtool -S $dev | grep 'rx_bytes: ") at a time when
skb->len has already been adjusted by the eth_type_trans() ->
skb_pull_inline(skb, ETH_HLEN) call to exclude the L2 header.
This means that when connecting 2 DSA interfaces back to back and
sending 1 packet with length 100, the sending interface will report
tx_bytes as incrementing by 100, and the receiving interface will report
rx_bytes as incrementing by 86.
Since accounting for that in scripts is quirky and is something that
would be DSA-specific behavior (requiring users to know that they are
running on a DSA interface in the first place), the proposal is that we
treat it as a bug and fix it.
This design bug has always existed in DSA, according to my analysis:
commit 91da11f870 ("net: Distributed Switch Architecture protocol
support") also updates skb->dev->stats.rx_bytes += skb->len after the
eth_type_trans() call. Technically, prior to Florian's commit
a86d8becc3 ("net: dsa: Factor bottom tag receive functions"), each and
every vendor-specific tagging protocol driver open-coded the same bug,
until the buggy code was consolidated into something resembling what can
be seen now. So each and every driver should have its own Fixes: tag,
because of their different histories until the convergence point.
I'm not going to do that, for the sake of simplicity, but just blame the
oldest appearance of buggy code.
There are 2 ways to fix the problem. One is the obvious way, and the
other is how I ended up doing it. Obvious would have been to move
dev_sw_netstats_rx_add() one line above eth_type_trans(), and below
skb_push(skb, ETH_HLEN). But DSA processing is not as simple as that.
We count the bytes after removing everything DSA-related from the
packet, to emulate what the packet's length was, on the wire, when the
user port received it.
When eth_type_trans() executes, dsa_untag_bridge_pvid() has not run yet,
so in case the switch driver requests this behavior - commit
412a1526d0 ("net: dsa: untag the bridge pvid from rx skbs") has the
details - the obvious variant of the fix wouldn't have worked, because
the positioning there would have also counted the not-yet-stripped VLAN
header length, something which is absent from the packet as seen on the
wire (there it may be untagged, whereas software will see it as
PVID-tagged).
Fixes: f613ed665b ("net: dsa: Add support for 64-bit statistics")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we change the M/N values seamlessly during a fastset we should
also update the vblank timestamping stuff to make sure the vblank
timestamp corrections/guesstimations come out exact.
Note that only crtc_clock and framedur_ns can actually end up
changing here during fastsets. Everything else we touch can
only change during full modesets.
Technically we should try to do this exactly at the start of
vblank, but that would require some kind of double buffering
scheme. Let's skip that for now and just update things right
after the commit has been submitted to the hardware. This
means the information will be properly up to date when the
vblank irq handler goes to work. Only if someone ends up
querying some vblanky stuff in between the commit and start
of vblank may we see a slight discrepancy.
Also this same problem really exists for the DRRS downclocking
stuff. But as that is supposed to be more or less transparent
to the user, and it only drops to low gear after a long delay
(1 sec currently) we probably don't have to worry about it.
Any time something is actively submitting updates DRRS will
remain in high gear and so the timestamping constants will
match the hardware state.
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mitul Golani <mitulkumar.ajitkumar.golani@intel.com>
Fixes: e6f29923c0 ("drm/i915: Allow M/N change during fastset on bdw+")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230310235828.17439-1-ville.syrjala@linux.intel.com
(cherry picked from commit 8cb1f95cca)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
The Wa_14017073508 require to send Media Busy/Idle mailbox while
accessing Media tile. As of now it is getting handled while __gt_unpark,
__gt_park. But there are various corner cases where forcewakes are taken
without __gt_unpark i.e. without sending Busy Mailbox especially during
register reads. Forcewakes are taken without busy mailbox leads to
GPU HANG. So bringing mailbox calls under forcewake calls are no feasible
option as forcewake calls are atomic and mailbox calls are blocking.
The issue already fixed in B step so disabling MC6 on A step and
reverting previous commit which handles Wa_14017073508
Fixes: 8f70f1ec58 ("drm/i915/mtl: Add Wa_14017073508 for SAMedia")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230310061339.2495416-2-badal.nilawar@intel.com
(cherry picked from commit 038a24835a)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Currently, when driver queries PTYS to report which link speed is being
used on its RoCE ports, it does not check the case of having 400Gbps
transmitted over 8 lanes. Thus it fails to report the said speed and
instead it defaults to report 10G over 4 lanes.
Add a check for the said speed when querying PTYS and report it back
correctly when needed.
Fixes: 08e8676f16 ("IB/mlx5: Add support for 50Gbps per lane link modes")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/ec9040548d119d22557d6a4b4070d6f421701fd4.1678973994.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
When interrupt auto clear is programmed, any read to the interrupt
status register will clear all interrupts. If two interrupts have
come in before one can be serviced then this will cause lost interrupts.
On AMD USB4 routers this has manifested in odd problems particularly
with long strings of control tranfers such as reading the DROM via bit
banging.
Instead of clearing interrupts automatically, clear the bit corresponding
to the given ring's interrupt in the ISR.
Fixes: 7a1808f82a ("thunderbolt: Handle ring interrupt by reading interrupt status register")
Cc: Sanju Mehta <Sanju.Mehta@amd.com>
Cc: stable@vger.kernel.org
Tested-by: Anson Tsao <anson.tsao@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
The AlpsPS/2 code previously relied on the assumption that `char` is a
signed type, which was true on x86 platforms (the only place where this
driver is used) before kernel 6.2. However, on 6.2 and later, this
assumption is broken due to the introduction of -funsigned-char as a new
global compiler flag.
Fix this by explicitly specifying the signedness of `char` when sign
extending the values received from the device.
Fixes: f3f33c6776 ("Input: alps - Rushmore and v7 resolution support")
Signed-off-by: msizanoen <msizanoen@qtmlabs.xyz>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230320045228.182259-1-msizanoen@qtmlabs.xyz
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Pull tracing fixes from Steven Rostedt:
- Fix setting affinity of hwlat threads in containers
Using sched_set_affinity() has unwanted side effects when being
called within a container. Use set_cpus_allowed_ptr() instead
- Fix per cpu thread management of the hwlat tracer:
- Do not start per_cpu threads if one is already running for the CPU
- When starting per_cpu threads, do not clear the kthread variable
as it may already be set to running per cpu threads
- Fix return value for test_gen_kprobe_cmd()
On error the return value was overwritten by being set to the result
of the call from kprobe_event_delete(), which would likely succeed,
and thus have the function return success
- Fix splice() reads from the trace file that was broken by commit
36e2c7421f ("fs: don't allow splice read/write without explicit
ops")
- Remove obsolete and confusing comment in ring_buffer.c
The original design of the ring buffer used struct page flags for
tricks to optimize, which was shortly removed due to them being
tricks. But a comment for those tricks remained
- Set local functions and variables to static
* tag 'trace-v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptr
ring-buffer: remove obsolete comment for free_buffer_page()
tracing: Make splice_read available again
ftrace: Set direct_ops storage-class-specifier to static
trace/hwlat: Do not start per-cpu thread if it is already running
trace/hwlat: Do not wipe the contents of per-cpu thread data
tracing/osnoise: set several trace_osnoise.c variables storage-class-specifier to static
tracing: Fix wrong return in kprobe_event_gen_test.c
There is a problem with the behavior of hwlat in a container,
resulting in incorrect output. A warning message is generated:
"cpumask changed while in round-robin mode, switching to mode none",
and the tracing_cpumask is ignored. This issue arises because
the kernel thread, hwlatd, is not a part of the container, and
the function sched_setaffinity is unable to locate it using its PID.
Additionally, the task_struct of hwlatd is already known.
Ultimately, the function set_cpus_allowed_ptr achieves
the same outcome as sched_setaffinity, but employs task_struct
instead of PID.
Test case:
# cd /sys/kernel/tracing
# echo 0 > tracing_on
# echo round-robin > hwlat_detector/mode
# echo hwlat > current_tracer
# unshare --fork --pid bash -c 'echo 1 > tracing_on'
# dmesg -c
Actual behavior:
[573502.809060] hwlat_detector: cpumask changed while in round-robin mode, switching to mode none
Link: https://lore.kernel.org/linux-trace-kernel/20230316144535.1004952-1-costa.shul@redhat.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 0330f7aa8e ("tracing: Have hwlat trace migrate across tracing_cpumask CPUs")
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Since the commit 36e2c7421f ("fs: don't allow splice read/write
without explicit ops") is applied to the kernel, splice() and
sendfile() calls on the trace file (/sys/kernel/debug/tracing
/trace) return EINVAL.
This patch restores these system calls by initializing splice_read
in file_operations of the trace file. This patch only enables such
functionalities for the read case.
Link: https://lore.kernel.org/linux-trace-kernel/20230314013707.28814-1-sfoon.kim@samsung.com
Cc: stable@vger.kernel.org
Fixes: 36e2c7421f ("fs: don't allow splice read/write without explicit ops")
Signed-off-by: Sung-hun Kim <sfoon.kim@samsung.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull tty/serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for 6.3-rc3 to resolve
some reported issues.
They include:
- 8250 driver Kconfig issue pointed out by you that showed up in -rc1
- qcom-geni serial driver fixes
- various 8250 driver fixes for reported problems
- fsl_lpuart driver fixes
- serdev fix for regression in -rc1
- vt.c bugfix
All have been in linux-next for over a week with no reported problems"
* tag 'tty-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: vt: protect KD_FONT_OP_GET_TALL from unbound access
serial: qcom-geni: drop bogus uart_write_wakeup()
serial: qcom-geni: fix mapping of empty DMA buffer
serial: qcom-geni: fix DMA mapping leak on shutdown
serial: qcom-geni: fix console shutdown hang
serdev: Set fwnode for serdev devices
tty: serial: fsl_lpuart: fix race on RX DMA shutdown
serial: 8250_pci1xxxx: Disable SERIAL_8250_PCI1XXXX config by default
serial: 8250_fsl: fix handle_irq locking
serial: 8250_em: Fix UART port type
serial: 8250: ASPEED_VUART: select REGMAP instead of depending on it
tty: serial: fsl_lpuart: skip waiting for transmission complete when UARTCTRL_SBK is asserted
Revert "tty: serial: fsl_lpuart: adjust SERIAL_FSL_LPUART_CONSOLE config dependency"
percpu_counter_sum_all() is now redundant as the race condition it
was invented to handle is now dealt with by percpu_counter_sum()
directly and all users of percpu_counter_sum_all() have been
removed.
Remove it.
This effectively reverts the changes made in f689054aac
("percpu_counter: add percpu_counter_sum_all interface") except for
the cpumask iteration that fixes percpu_counter_sum() made earlier
in this series.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
This effectively reverts the change made in commit f689054aac
("percpu_counter: add percpu_counter_sum_all interface") as the
race condition percpu_counter_sum_all() was invented to avoid is
now handled directly in percpu_counter_sum() and nobody needs to
care about summing racing with cpu unplug anymore.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
In commit f689054aac ("percpu_counter: add percpu_counter_sum_all
interface") a race condition between a cpu dying and
percpu_counter_sum() iterating online CPUs was identified. The
solution was to iterate all possible CPUs for summation via
percpu_counter_sum_all().
We recently had a percpu_counter_sum() call in XFS trip over this
same race condition and it fired a debug assert because the
filesystem was unmounting and the counter *should* be zero just
before we destroy it. That was reported here:
https://lore.kernel.org/linux-kernel/20230314090649.326642-1-yebin@huaweicloud.com/
likely as a result of running generic/648 which exercises
filesystems in the presence of CPU online/offline events.
The solution to use percpu_counter_sum_all() is an awful one. We
use percpu counters and percpu_counter_sum() for accurate and
reliable threshold detection for space management, so a summation
race condition during these operations can result in overcommit of
available space and that may result in filesystem shutdowns.
As percpu_counter_sum_all() iterates all possible CPUs rather than
just those online or even those present, the mask can include CPUs
that aren't even installed in the machine, or in the case of
machines that can hot-plug CPU capable nodes, even have physical
sockets present in the machine.
Fundamentally, this race condition is caused by the CPU being
offlined being removed from the cpu_online_mask before the notifier
that cleans up per-cpu state is run. Hence percpu_counter_sum() will
not sum the count for a cpu currently being taken offline,
regardless of whether the notifier has run or not. This is
the root cause of the bug.
The percpu counter notifier iterates all the registered counters,
locks the counter and moves the percpu count to the global sum.
This is serialised against other operations that move the percpu
counter to the global sum as well as percpu_counter_sum() operations
that sum the percpu counts while holding the counter lock.
Hence the notifier is safe to run concurrently with sum operations,
and the only thing we actually need to care about is that
percpu_counter_sum() iterates dying CPUs. That's trivial to do,
and when there are no CPUs dying, it has no addition overhead except
for a cpumask_or() operation.
This change makes percpu_counter_sum() always do the right thing in
the presence of CPU hot unplug events and makes
percpu_counter_sum_all() unnecessary. This, in turn, means that
filesystems like XFS, ext4, and btrfs don't have to work out when
they should use percpu_counter_sum() vs percpu_counter_sum_all() in
their space accounting algorithms
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Equivalent of for_each_cpu_and, except it ORs the two masks together
so it iterates all the CPUs present in either mask.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Pull RAS fix from Borislav Petkov:
- Flush out logged errors immediately after MCA banks configuration
changes over sysfs have been done instead of waiting until something
else triggers the workqueue later - another error or the polling
interval cycle is reached
* tag 'ras_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Make sure logged MCEs are processed after sysfs update
Back in the 6.2-rc1 days, Eric Whitney reported a fstests regression in
ext4 against generic/454. The cause of this test failure was the
unfortunate combination of setting an xattr name containing UTF8 encoded
emoji, an xattr hash function that accepted a char pointer with no
explicit signedness, signed type extension of those chars to an int, and
the 6.2 build tools maintainers deciding to mandate -funsigned-char
across the board. As a result, the ondisk extended attribute structure
written out by 6.1 and 6.2 were not the same.
This discrepancy, in fact, had been noticeable if a filesystem with such
an xattr were moved between any two architectures that don't employ the
same signedness of a raw "char" declaration. The only reason anyone
noticed is that x86 gcc defaults to signed, and no such -funsigned-char
update was made to e2fsprogs, so e2fsck immediately started reporting
data corruption.
After a day and a half of discussing how to handle this use case (xattrs
with bit 7 set anywhere in the name) without breaking existing users,
Linus merged his own patch and didn't tell the maintainer. None of the
ext4 developers realized this until AUTOSEL announced that the commit
had been backported to stable.
In the end, this problem could have been detected much earlier if there
had been any useful tests of hash function(s) in use inside ext4 to make
sure that they always produce the same outputs given the same inputs.
The XFS dirent/xattr name hash takes a uint8_t*, so I don't think it's
vulnerable to this problem. However, let's avoid all this drama by
adding our own self test to check that the da hash produces the same
outputs for a static pile of inputs on various platforms. This enables
us to fix any breakage that may result in a controlled fashion. The
buffer and test data are identical to the patches submitted to xfsprogs.
Link: https://lore.kernel.org/linux-ext4/Y8bpkm3jA3bDm3eL@debian-BULLSEYE-live-builder-AMD64/
Link: https://lore.kernel.org/linux-xfs/ZBUKCRR7xvIqPrpX@destitution/T/#md38272cc684e2c0d61494435ccbb91f022e8dee4
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
There are now five separate space allocator interfaces exposed to the
rest of XFS for five different strategies to find space. Add
tracepoints for each of them so that I can tell from a trace dump
exactly which ones got called and what happened underneath them. Add a
sixth so it's more obvious if an allocation actually happened.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Callers of xfs_alloc_vextent_iterate_ags that pass in the TRYLOCK flag
want us to perform a non-blocking scan of the AGs for free space. There
are no ordering constraints for non-blocking AGF lock acquisition, so
the scan can freely start over at AG 0 even when minimum_agno > 0.
This manifests fairly reliably on xfs/294 on 6.3-rc2 with the parent
pointer patchset applied and the realtime volume enabled. I observed
the following sequence as part of an xfs_dir_createname call:
0. Fragment the free space, then allocate nearly all the free space in
all AGs except AG 0.
1. Create a directory in AG 2 and let it grow for a while.
2. Try to allocate 2 blocks to expand the dirent part of a directory.
The space will be allocated out of AG 0, but the allocation will not
be contiguous. This (I think) activates the LOWMODE allocator.
3. The bmapi call decides to convert from extents to bmbt format and
tries to allocate 1 block. This allocation request calls
xfs_alloc_vextent_start_ag with the inode number, which starts the
scan at AG 2. We ignore AG 0 (with all its free space) and instead
scrape AG 2 and 3 for more space. We find one block, but this now
kicks t_highest_agno to 3.
4. The createname call decides it needs to split the dabtree. It tries
to allocate even more space with xfs_alloc_vextent_start_ag, but now
we're constrained to AG 3, and we don't find the space. The
createname returns ENOSPC and the filesystem shuts down.
This change fixes the problem by making the trylock scan wrap around to
AG 0 if it doesn't like the AGs that it finds. Since the current
transaction itself holds AGF 0, the trylock of AGF 0 will succeed, and
we take space from the AG that has plenty.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Pull perf fixes from Borislav Petkov:
- Check whether sibling events have been deactivated before adding them
to groups
- Update the proper event time tracking variable depending on the event
type
- Fix a memory overwrite issue due to using the wrong function argument
when outputting perf events
* tag 'perf_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix check before add_event_to_groups() in perf_group_detach()
perf: fix perf_event_context->time
perf/core: Fix perf_output_begin parameter is incorrectly invoked in perf_event_bpf_output
Pull x86 fixes from Borislav Petkov:
"There's a little bit more 'movement' in there for my taste but it
needs to happen and should make the code better after it.
- Check cmdline_find_option()'s return value before further
processing
- Clear temporary storage in the resctrl code to prevent access to an
unexistent MSR
- Add a simple throttling mechanism to protect the hypervisor from
potentially malicious SEV guests issuing requests in rapid
succession.
In order to not jeopardize the sanity of everyone involved in
maintaining this code, the request issuing side has received a
cleanup, split in more or less trivial, small and digestible
pieces. Otherwise, the code was threatening to become an
unmaintainable mess.
Therefore, that cleanup is marked indirectly also for stable so
that there's no differences between the upstream code and the
stable variant when it comes down to backporting more there"
* tag 'x86_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix use of uninitialized buffer in sme_enable()
x86/resctrl: Clear staged_config[] before and after it is used
virt/coco/sev-guest: Add throttling awareness
virt/coco/sev-guest: Convert the sw_exit_info_2 checking to a switch-case
virt/coco/sev-guest: Do some code style cleanups
virt/coco/sev-guest: Carve out the request issuing logic into a helper
virt/coco/sev-guest: Remove the disable_vmpck label in handle_guest_request()
virt/coco/sev-guest: Simplify extended guest request handling
virt/coco/sev-guest: Check SEV_SNP attribute at probe time
Pull ext4 fix from Ted Ts'o:
"Fix a double unlock bug on an error path in ext4, found by smatch and
syzkaller"
* tag 'ext4_for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix possible double unlock when moving a directory
smatch reports several similar warnings
kernel/trace/trace_osnoise.c:220:1: warning:
symbol '__pcpu_scope_per_cpu_osnoise_var' was not declared. Should it be static?
kernel/trace/trace_osnoise.c:243:1: warning:
symbol '__pcpu_scope_per_cpu_timerlat_var' was not declared. Should it be static?
kernel/trace/trace_osnoise.c:335:14: warning:
symbol 'interface_lock' was not declared. Should it be static?
kernel/trace/trace_osnoise.c:2242:5: warning:
symbol 'timerlat_min_period' was not declared. Should it be static?
kernel/trace/trace_osnoise.c:2243:5: warning:
symbol 'timerlat_max_period' was not declared. Should it be static?
These variables are only used in trace_osnoise.c, so it should be static
Link: https://lore.kernel.org/linux-trace-kernel/20230309150414.4036764-1-trix@redhat.com
Signed-off-by: Tom Rix <trix@redhat.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Overwriting the error code with the deletion result may cause the
function to return 0 despite encountering an error. Commit b111545d26
("tracing: Remove the useless value assignment in
test_create_synth_event()") solves a similar issue by
returning the original error code, so this patch does the same.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Link: https://lore.kernel.org/linux-trace-kernel/20230131075818.5322-1-aagusev@ispras.ru
Signed-off-by: Anton Gusev <aagusev@ispras.ru>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The cooling levels array is supposed to prevent the system fans from
being configured below a 20% duty cycle as otherwise some of them get
stuck at 0 RPM.
Due to an off-by-one error, the last element in the array was not
initialized, causing it to be set to zero, which in turn lead to fans
being configured with a 0% duty cycle in maximum cooling state.
Since commit 332fdf951d ("mlxsw: thermal: Fix out-of-bounds memory
accesses") the contents of the array are static. Therefore, instead of
fixing the initialization of the array, simply remove it and adjust
thermal_cooling_device_ops::set_cur_state() so that the configured duty
cycle is never set below 20%.
Before:
# cat /sys/class/thermal/thermal_zone0/cdev0/type
mlxsw_fan
# echo 10 > /sys/class/thermal/thermal_zone0/cdev0/cur_state
# cat /sys/class/hwmon/hwmon0/name
mlxsw
# cat /sys/class/hwmon/hwmon0/pwm1
0
After:
# cat /sys/class/thermal/thermal_zone0/cdev0/type
mlxsw_fan
# echo 10 > /sys/class/thermal/thermal_zone0/cdev0/cur_state
# cat /sys/class/hwmon/hwmon0/name
mlxsw
# cat /sys/class/hwmon/hwmon0/pwm1
255
This bug was uncovered when the thermal subsystem repeatedly tried to
configure the cooling devices to their maximum state due to another
issue [1]. This resulted in the fans being stuck at 0 RPM, which
eventually lead to the system undergoing thermal shutdown.
[1] https://lore.kernel.org/netdev/ZA3CFNhU4AbtsP4G@shredder/
Fixes: a421ce088a ("mlxsw: core: Extend cooling device with cooling levels")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew reports that the SFF modules on one of the ZII platforms do not
indicate link up due to the SFP code believing that LOS indicating that
there is no signal being received from the remote end, but in fact the
LOS signal is showing that there is signal.
What makes SFF modules different from SFPs is they typically have an
inverted LOS, which uncovered this issue. When we read the hardware
state, we mask it with state_hw_mask so we ignore anything we're not
interested in. However, we don't re-read when state_hw_mask changes,
leading to sfp->state being stale.
Arrange for a software poll of the module state after we have parsed
the EEPROM in sfp_sm_mod_probe() and updated state_*_mask. This will
generate any necessary events for signal changes for the state
machine as well as updating sfp->state.
Reported-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 8475c4b70b ("net: sfp: re-implement soft state polling setup")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently DMA address width is either read from a RO device register
or force set from the platform data. This breaks DMA when the host DMA
address width is <=32it but the device is >32bit.
Right now the driver may decide to use a 2nd DMA descriptor for
another buffer (happens in case of TSO xmit) assuming that 32bit
addressing is used due to platform configuration but the device will
still use both descriptor addresses as one address.
This can be observed with the Intel EHL platform driver that sets
32bit for addr64 but the MAC reports 40bit. The TX queue gets stuck in
case of TCP with iptables NAT configuration on TSO packets.
The logic should be like this: Whatever we do on the host side (memory
allocation GFP flags) should happen with the host DMA width, whenever
we decide how to set addresses on the device registers we must use the
device DMA address width.
This patch renames the platform address width field from addr64 (term
used in device datasheet) to host_addr and uses this value exclusively
for host side operations while all chip operations consider the device
DMA width as read from the device register.
Fixes: 7cfc4486e7 ("stmmac: intel: Configure EHL PSE0 GbE and PSE1 GbE to 32 bits DMA addressing")
Signed-off-by: Jochen Henneberg <jh@henneberg-systemdesign.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli says:
====================
ACPI/DT mdiobus module owner fixes
This patch series fixes wrong mdiobus module ownership for MDIO buses
registered from DT or ACPI.
Thanks Maxime for providing the first patch and making me see that ACPI
also had the same issue.
Changes in v2:
- fixed missing kdoc in the first patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bus ownership is wrong when using acpi_mdiobus_register() to register an
mdio bus. That function is not inline, so when it calls
mdiobus_register() the wrong THIS_MODULE value is captured.
CC: Maxime Bizon <mbizon@freebox.fr>
Fixes: 803ca24d2f ("net: mdio: Add ACPI support code for mdio")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bus ownership is wrong when using of_mdiobus_register() to register an mdio
bus. That function is not inline, so when it calls mdiobus_register() the wrong
THIS_MODULE value is captured.
Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Fixes: 90eff9096c ("net: phy: Allow splitting MDIO bus/device support from PHYs")
[florian: fix kdoc, added Fixes tag]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the phy_disconnect() -> phy_stop() path, we will be forcibly setting
the PHY state machine to PHY_HALTED. This invalidates the old_state !=
phydev->state condition in phy_state_machine() such that we will neither
display the state change for debugging, nor will we invoke the
link_change_notify() callback.
Factor the code by introducing phy_process_state_change(), and ensure
that we process the state change from phy_stop() as well.
Fixes: 5c5f626bca ("net: phy: improve handling link_change_notify callback")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-03-16 (igb, igbvf, igc)
This series contains updates to igb, igbvf, and igc drivers.
Lin Ma removes rtnl_lock() when disabling SRIOV on remove which was
causing deadlock on igb.
Akihiko Odaki delays enabling of SRIOV on igb to prevent early messages
that could get ignored and clears MAC address when PF returns nack on
reset; indicating no MAC address was assigned for igbvf.
Gaosheng Cui frees IRQs in error path for igbvf.
Akashi Takahiro fixes logic on checking TAPRIO gate support for igc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In xirc2ps_probe, the local->tx_timeout_task was bounded
with xirc2ps_tx_timeout_task. When timeout occurs,
it will call xirc_tx_timeout->schedule_work to start the
work.
When we call xirc2ps_detach to remove the driver, there
may be a sequence as follows:
Stop responding to timeout tasks and complete scheduled
tasks before cleanup in xirc2ps_detach, which will fix
the problem.
CPU0 CPU1
|xirc2ps_tx_timeout_task
xirc2ps_detach |
free_netdev |
kfree(dev); |
|
| do_reset
| //use dev
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On rmmod of irdma, the PBLE object memory is not being freed. PBLE object
memory are not statically pre-allocated at function initialization time
unlike other HMC objects. PBLEs objects and the Segment Descriptors (SD)
for it can be dynamically allocated during scale up and SD's remain
allocated till function deinitialization.
Fix this leak by adding IRDMA_HMC_IW_PBLE to the iw_hmc_obj_types[] table
and skip pbles in irdma_create_hmc_obj but not in irdma_del_hmc_objects().
Fixes: 44d9e52977 ("RDMA/irdma: Implement device initialization definitions")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230315145231.931-3-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
We have to make sure that the info returned by the helper is valid
before using it.
Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.
Fixes: f990c82c38 ("qed*: Add support for ndo_set_vf_trust")
Fixes: 733def6a04 ("qed*: IOV link control")
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is a bug for fscrypt_put_master_key_activeref() to see a NULL
keyring. But it used to be possible due to the bug, now fixed, where
fscrypt_destroy_keyring() was called before security_sb_delete(). To be
consistent with how fscrypt_destroy_keyring() uses WARN_ON for the same
issue, WARN and leak the fscrypt_master_key if the keyring is NULL
instead of dereferencing the NULL pointer.
This is a robustness improvement, not a fix.
Link: https://lore.kernel.org/r/20230313221231.272498-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Pull fbdev fixes from Helge Deller:
"The majority of lines changed is due to a code style cleanup in the
pnmtologo helper program.
Arnd removed the omap1 osk driver and the SIS fb driver is now
orphaned.
Other than that it's the usual bunch of small fixes and cleanups, e.g.
prevent possible divide-by-zero in various fb drivers if the pixclock
is zero and various conversions to devm_platform*() and of_property*()
functions:
- Drop omap1 osk driver
- Various potential divide by zero pixclock fixes
- Add pixelclock and fb_check_var() to stifb
- Code style cleanups and indenting fixes"
* tag 'fbdev-for-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: Use of_property_present() for testing DT property presence
fbdev: au1200fb: Fix potential divide by zero
fbdev: lxfb: Fix potential divide by zero
fbdev: intelfb: Fix potential divide by zero
fbdev: nvidia: Fix potential divide by zero
fbdev: stifb: Provide valid pixelclock and add fb_check_var() checks
fbdev: omapfb: remove omap1 osk driver
fbdev: xilinxfb: Use devm_platform_get_and_ioremap_resource()
fbdev: wm8505fb: Use devm_platform_ioremap_resource()
fbdev: pxa3xx-gcu: Use devm_platform_get_and_ioremap_resource()
fbdev: Use of_property_read_bool() for boolean properties
fbdev: clps711x-fb: Use devm_platform_get_and_ioremap_resource()
fbdev: tgafb: Fix potential divide by zero
MAINTAINERS: orphan SIS FRAMEBUFFER DRIVER
fbdev: omapfb: cleanup inconsistent indentation
drivers: video: logo: add SPDX comment, remove GPL notice in pnmtologo.c
drivers: video: logo: fix code style issues in pnmtologo.c
Pull Kbuild fixes from Masahiro Yamada:
- Exclude kallsyms_seqs_of_names from kallsyms to fix build error
- Fix 'make kernelrelease' for external module builds
- Get the Debian source package compilable again
- Fix the wrong uname when Debian packages are built with the
KDEB_PKGVERSION option
- Fix superfluous CROSS_COMPILE when building Debian packages
- Fix RPM package build error when KCONFIG_CONFIG is set
- Use 'git archive' for creating source tarballs
- Remove the scripts/list-gitignored tool
* tag 'kbuild-fixes-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: use git-archive for source package creation
kbuild: rpm-pkg: move source components to rpmbuild/SOURCES
kbuild: deb-pkg: use dh_listpackages to know enabled packages
kbuild: deb-pkg: split image and debug objects staging out into functions
kbuild: deb-pkg: set CROSS_COMPILE only when undefined
kbuild: deb-pkg: do not take KERNELRELEASE from the source version
kbuild: deb-pkg: make debian source package working again
Makefile: Make kernelrelease target work with M=
kconfig: Update config changed flag before calling callback
kallsyms: add kallsyms_seqs_of_names to list of special symbols
Pull hwmon fixes from Guenter Roeck:
- ltc2992, adm1266: Set missing can_sleep flag
- tmp512/tmp513: Drop of_match_ptr for ID table to fix build with
!CONFIG_OF
- ucd90320: Fix back-to-back access problem
- ina3221: Fix bad error return from probe function
- xgene: Fix use-after-free bug in remove function
- adt7475: Fix hysteresis register bit masks, and fix association of
'smoothing' attributes
* tag 'hwmon-for-v6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (ltc2992) Set `can_sleep` flag for GPIO chip
hwmon: (adm1266) Set `can_sleep` flag for GPIO chip
hwmon: tmp512: drop of_match_ptr for ID table
hwmon: (ucd90320) Add minimum delay between bus accesses
hwmon: (ina3221) return prober error code
hwmon: (xgene) Fix use after free bug in xgene_hwmon_remove due to race condition
hwmon: (adt7475) Fix masking of hysteresis registers
hwmon: (adt7475) Display smoothing attributes in correct order
Pull ata fixes from Damien Le Moal:
- Two fixes from Ondrej for the pata_parport driver to address an issue
with error handling during drive connection and to fix memory leaks
in case of errors during initialization and when disconnecting a
device.
* tag 'ata-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: pata_parport: fix memory leaks
ata: pata_parport: fix parport release without claim
The __find_restype() function loops over the m5mols_default_ffmt[]
array, and the termination condition ends up being wrong: instead of
stopping when the iterator becomes the size of the array it traverses,
it stops after it has already overshot the array.
Now, in practice this doesn't likely matter, because the code will
always find the entry it looks for, and will thus return early and never
hit that last extra iteration.
But it turns out that clang will unroll the loop fully, because it has
only two iterations (well, three due to the off-by-one bug), and then
clang will end up just giving up in the middle of the loop unrolling
when it notices that the code walks past the end of the array.
And that made 'objtool' very unhappy indeed, because the generated code
just falls off the edge of the universe, and ends up falling through to
the next function, causing this warning:
drivers/media/i2c/m5mols/m5mols.o: warning: objtool: m5mols_set_fmt() falls through to next function m5mols_get_frame_desc()
Fix the loop ending condition.
Reported-by: Jens Axboe <axboe@kernel.dk>
Analyzed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Analyzed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/linux-block/CAHk-=wgTSdKYbmB1JYM5vmHMcD9J9UZr0mn7BOYM_LudrP+Xvw@mail.gmail.com/
Fixes: bc125106f8 ("[media] Add support for M-5MOLS 8 Mega Pixel camera ISP")
Cc: HeungJun, Kim <riverful.kim@samsung.com>
Cc: Sylwester Nawrocki <s.nawrocki@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signal 16 and higher represent the device's Index lines. The
priv->preset_enable array holds the device configuration for these Index
lines. The preset_enable configuration is active low on the device, so
invert the conditional check in quad8_action_read() to properly handle
the logical state of preset_enable.
Fixes: f1d8a071d4 ("counter: 104-quad-8: Add Generic Counter interface support")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230316203426.224745-1-william.gray@linaro.org/
Signed-off-by: William Breathitt Gray <william.gray@linaro.org>
The Counter (CNTR) register is 24 bits wide, but we can have an
effective 25-bit count value by setting bit 24 to the XOR of the Borrow
flag and Carry flag. The flags can be read from the FLAG register, but a
race condition exists: the Borrow flag and Carry flag are instantaneous
and could change by the time the count value is read from the CNTR
register.
Since the race condition could result in an incorrect 25-bit count
value, remove support for 25-bit count values from this driver;
hard-coded maximum count values are replaced by a LS7267_CNTR_MAX define
for consistency and clarity.
Fixes: 28e5d3bb03 ("iio: 104-quad-8: Add IIO support for the ACCES 104-QUAD-8")
Cc: <stable@vger.kernel.org> # 6.1.x
Cc: <stable@vger.kernel.org> # 6.2.x
Link: https://lore.kernel.org/r/20230312231554.134858-1-william.gray@linaro.org/
Signed-off-by: William Breathitt Gray <william.gray@linaro.org>
Another Lenovo convertable which reports a landscape resolution of
1920x1200 with a pitch of (1920 * 4) bytes, while the actual framebuffer
has a resolution of 1200x1920 with a pitch of (1200 * 4) bytes.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Commit 8633ef82f1 ("drivers/firmware: consolidate EFI framebuffer setup
for all arches") moved the sysfb_apply_efi_quirks() call in sysfb_init()
from before the [sysfb_]parse_mode() call to after it.
But sysfb_apply_efi_quirks() modifies the global screen_info struct which
[sysfb_]parse_mode() parses, so doing it later is too late.
This has broken all DMI based quirks for correcting wrong firmware efifb
settings when simpledrm is used.
To fix this move the sysfb_apply_efi_quirks() call back to its old place
and split the new setup of the efifb_fwnode (which requires
the platform_device) into its own function and call that at
the place of the moved sysfb_apply_efi_quirks(pd) calls.
Fixes: 8633ef82f1 ("drivers/firmware: consolidate EFI framebuffer setup for all arches")
Cc: stable@vger.kernel.org
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
We no longer use the recsize argument for locating the string table in
an SMBIOS record, so we can drop it from the internal API.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Instead of using the SMBIOS type 1 record 'family' field, which is often
modified by OEMs, use the type 4 'processor ID' and 'processor version'
fields, which are set to a small set of probe-able values on all known
Ampere EFI systems in the field.
Fixes: 550b33cfd4 ("arm64: efi: Force the use of ...")
Tested-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The type 1 SMBIOS record happens to always be the same size, but there
are other record types which have been augmented over time, and so we
should really use the length field in the header to decide where the
string table starts.
Fixes: 550b33cfd4 ("arm64: efi: Force the use of ...")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-03-16 (iavf)
This series contains updates to iavf driver only.
Alex fixes incorrect check against Rx hash feature and corrects payload
value for IPv6 UDP packet.
Ahmed removes bookkeeping of VLAN 0 filter as it always exists and can
cause a false max filter error message.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: do not track VLAN 0 filters
iavf: fix non-tunneled IPv6 UDP packet type and hashing
iavf: fix inverted Rx hash condition leading to disabled hash
====================
Link: https://lore.kernel.org/r/20230316155316.1554931-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show_stack dumps raw stack contents which may trigger an unnecessary
KASAN report. Fix it by copying stack contents to a temporary buffer
with __memcpy and then printing that buffer instead of passing stack
pointer directly to the print_hex_dump.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
We need to reset forceidle_sum to 0 when reading from root, since the
bstat we accumulate into is stack allocated.
To make this more robust, just replace the existing cputime reset with a
memset of the overall bstat.
Signed-off-by: Josh Don <joshdon@google.com>
Fixes: 1fcf54deb7 ("sched/core: add forced idle accounting for cgroups")
Cc: stable@vger.kernel.org # v6.0+
Signed-off-by: Tejun Heo <tj@kernel.org>
The splice read calls nfsd_splice_actor to put the pages containing file
data into the svc_rqst->rq_pages array. It's possible however to get a
splice result that only has a partial page at the end, if (e.g.) the
filesystem hands back a short read that doesn't cover the whole page.
nfsd_splice_actor will plop the partial page into its rq_pages array and
return. Then later, when nfsd_splice_actor is called again, the
remainder of the page may end up being filled out. At this point,
nfsd_splice_actor will put the page into the array _again_ corrupting
the reply. If this is done enough times, rq_next_page will overrun the
array and corrupt the trailing fields -- the rq_respages and
rq_next_page pointers themselves.
If we've already added the page to the array in the last pass, don't add
it to the array a second time when dealing with a splice continuation.
This was originally handled properly in nfsd_splice_actor, but commit
91e23b1c39 ("NFSD: Clean up nfsd_splice_actor()") removed the check
for it.
Fixes: 91e23b1c39 ("NFSD: Clean up nfsd_splice_actor()")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Dario Lesca <d.lesca@solinos.it>
Tested-by: David Critch <dcritch@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2150630
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, wifi and ipsec.
A little more changes than usual, but it's pretty normal for us that
the rc3/rc4 PRs are oversized as people start testing in earnest.
Possibly an extra boost from people deploying the 6.1 LTS but that's
more of an unscientific hunch.
Current release - regressions:
- phy: mscc: fix deadlock in phy_ethtool_{get,set}_wol()
- virtio: vsock: don't use skbuff state to account credit
- virtio: vsock: don't drop skbuff on copy failure
- virtio_net: fix page_to_skb() miscalculating the memory size
Current release - new code bugs:
- eth: correct xdp_features after device reconfig
- wifi: nl80211: fix the puncturing bitmap policy
- net/mlx5e: flower:
- fix raw counter initialization
- fix missing error code
- fix cloned flow attribute
- ipa:
- fix some register validity checks
- fix a surprising number of bad offsets
- kill FILT_ROUT_CACHE_CFG IPA register
Previous releases - regressions:
- tcp: fix bind() conflict check for dual-stack wildcard address
- veth: fix use after free in XDP_REDIRECT when skb headroom is small
- ipv4: fix incorrect table ID in IOCTL path
- ipvlan: make skb->skb_iif track skb->dev for l3s mode
- mptcp:
- fix possible deadlock in subflow_error_report
- fix UaFs when destroying unaccepted and listening sockets
- dsa: mv88e6xxx: fix max_mtu of 1492 on 6165, 6191, 6220, 6250, 6290
Previous releases - always broken:
- tcp: tcp_make_synack() can be called from process context, don't
assume preemption is disabled when updating stats
- netfilter: correct length for loading protocol registers
- virtio_net: add checking sq is full inside xdp xmit
- bonding: restore IFF_MASTER/SLAVE flags on bond enslave Ethertype
change
- phy: nxp-c45-tja11xx: fix MII_BASIC_CONFIG_REV bit number
- eth: i40e: fix crash during reboot when adapter is in recovery mode
- eth: ice: avoid deadlock on rtnl lock when auxiliary device
plug/unplug meets bonding
- dsa: mt7530:
- remove now incorrect comment regarding port 5
- set PLL frequency and trgmii only when trgmii is used
- eth: mtk_eth_soc: reset PCS state when changing interface types
Misc:
- ynl: another license adjustment
- move the TCA_EXT_WARN_MSG attribute for tc action"
* tag 'net-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (108 commits)
selftests: bonding: add tests for ether type changes
bonding: restore bond's IFF_SLAVE flag if a non-eth dev enslave fails
bonding: restore IFF_MASTER/SLAVE flags on bond enslave ether type change
net: renesas: rswitch: Fix GWTSDIE register handling
net: renesas: rswitch: Fix the output value of quote from rswitch_rx()
ethernet: sun: add check for the mdesc_grab()
net: ipa: fix some register validity checks
net: ipa: kill FILT_ROUT_CACHE_CFG IPA register
net: ipa: add two missing declarations
net: ipa: reg: include <linux/bug.h>
net: xdp: don't call notifiers during driver init
net/sched: act_api: add specific EXT_WARN_MSG for tc action
Revert "net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy"
net: dsa: microchip: fix RGMII delay configuration on KSZ8765/KSZ8794/KSZ8795
ynl: make the tooling check the license
ynl: broaden the license even more
tools: ynl: make definitions optional again
hsr: ratelimit only when errors are printed
qed/qed_mng_tlv: correctly zero out ->min instead of ->hour
selftests: net: devlink_port_split.py: skip test if no suitable device available
...
We had a couple of checks for session in cifs_tree_connect
and cifs_mark_open_files_invalid, which were unnecessary.
And that was done with ses_lock. Changed that to tc_lock too.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull block fixes from Jens Axboe:
"A bit bigger than usual, as the NVMe pull request missed last weeks
submission. In detail:
- NVMe pull request via Christoph:
- Avoid potential UAF in nvmet_req_complete (Damien Le Moal)
- More quirks (Elmer Miroslav Mosher Golovin, Philipp Geulen)
- Fix a memory leak in the nvme-pci probe teardown path
(Irvin Cote)
- Repair the MAINTAINERS entry (Lukas Bulwahn)
- Fix handling single range discard request (Ming Lei)
- Show more opcode names in trace events (Minwoo Im)
- Fix nvme-tcp timeout reporting (Sagi Grimberg)
- MD pull request via Song:
- Two fixes for old issues (Neil)
- Resource leak in device stopping (Xiao)
- Bio based device stats fix (Yu)
- Kill unused CONFIG_BLOCK_COMPAT (Lukas)
- sunvdc missing mdesc_grab() failure check (Liang)
- Fix for reversal of request ordering upon issue for certain cases
(Jan)
- null_blk timeout fixes (Damien)
- Loop use-after-free fix (Bart)
- blk-mq SRCU fix for BLK_MQ_F_BLOCKING devices (Chris)"
* tag 'block-6.3-2023-03-16' of git://git.kernel.dk/linux:
block: remove obsolete config BLOCK_COMPAT
md: select BLOCK_LEGACY_AUTOLOAD
block: count 'ios' and 'sectors' when io is done for bio-based device
block: sunvdc: add check for mdesc_grab() returning NULL
nvmet: avoid potential UAF in nvmet_req_complete()
nvme-trace: show more opcode names
nvme-tcp: add nvme-tcp pdu size build protection
nvme-tcp: fix opcode reporting in the timeout handler
nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620
nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV3000
nvme-pci: fixing memory leak in probe teardown path
nvme: fix handling single range discard request
MAINTAINERS: repair malformed T: entries in NVM EXPRESS DRIVERS
block: null_blk: cleanup null_queue_rq()
block: null_blk: Fix handling of fake timeout request
blk-mq: fix "bad unlock balance detected" on q->srcu in __blk_mq_run_dispatch_ops
loop: Fix use-after-free issues
block: do not reverse request order when flushing plug list
md: avoid signed overflow in slot_store()
md: Free resources in __md_stop
Pull io_uring fixes from Jens Axboe:
- When PF_NO_SETAFFINITY was removed for io-wq threads, we kind of
forgot about the SQPOLL thread. Remove it there as well, there's even
less of a reason to set it there (Michal)
- Fixup a confusing 'ret' setting (Li)
- When MSG_RING is used to send a direct descriptor to another ring,
it's possible to have it allocate it on the target ring rather than
provide a specific index for it. If this is done, return the chosen
value in the CQE, like we would've done locally (Pavel)
- Fix a regression in this series on huge page bvec collapsing (Pavel)
* tag 'io_uring-6.3-2023-03-16' of git://git.kernel.dk/linux:
io_uring/rsrc: fix folio accounting
io_uring/msg_ring: let target know allocated index
io_uring: rsrc: Optimize return value variable 'ret'
io_uring/sqpoll: Do not set PF_NO_SETAFFINITY on sqpoll threads
Pull power management fixes from Rafael Wysocki:
"These fix an error code path issue in a cpuidle driver and make the
sleepgraph utility more robust against unexpected input.
Specifics:
- Fix the psci_pd_init_topology() failure path in the PSCI cpuidle
driver (Shawn Guo)
- Modify the sleepgraph utility so it does not crash on binary data
in device names (Todd Brandt)"
* tag 'pm-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
pm-graph: sleepgraph: Avoid crashing on binary data in device names
cpuidle: psci: Iterate backwards over list in psci_pd_remove()
Pull ACPI fixes from Rafael Wysocki:
"These add some new quirks, fix PPTT handling, fix an ACPI utility and
correct a mistake in the ACPI documentation.
Specifics:
- Fix ACPI PPTT handling to avoid sleep in the atomic context when it
is not present (Sudeep Holla)
- Add 'backlight=native' DMI quirk for Dell Vostro 15 3535 to the
ACPI video driver (Chia-Lin Kao)
- Add ACPI quirks for I2C device enumeration on Lenovo Yoga Book X90
and Acer Iconia One 7 B1-750 (Hans de Goede)
- Fix handling of invalid command line option values in the ACPI
pfrut utility (Chen Yu)
- Fix references to I2C device data type in the ACPI documentation
for device enumeration (Andy Shevchenko)"
* tag 'acpi-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: tools: pfrut: Check if the input of level and type is in the right numeric range
ACPI: PPTT: Fix to avoid sleep in the atomic context when PPTT is absent
ACPI: x86: Add skip i2c clients quirk for Lenovo Yoga Book X90
ACPI: x86: Add skip i2c clients quirk for Acer Iconia One 7 B1-750
ACPI: x86: Introduce an acpi_quirk_skip_gpio_event_handlers() helper
ACPI: video: Add backlight=native DMI quirk for Dell Vostro 15 3535
ACPI: docs: enumeration: Correct reference to the I²C device data type
Pull turbostat fweaks and fixes from Len Brown:
"Leprechaun sized fixes and tweaks touching only turbostat.
'Keeping happy users happy since 2010'"
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: version 2023.03.17
tools/power turbostat: fix decoding of HWP_STATUS
tools/power turbostat: Introduce support for EMR
tools/power turbostat: remove stray newlines from warn/warnx strings
tools/power turbostat: Fix /dev/cpu_dma_latency warnings
tools/power turbostat: Provide better debug messages for failed capabilities accesses
tools/power turbostat: update dump of SECONDARY_TURBO_RATIO_LIMIT
Pull xen fixes from Juergen Gross:
- cleanup for xen time handling
- enable the VGA console in a Xen PVH dom0
- cleanup in the xenfs driver
* tag 'for-linus-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: remove unnecessary (void*) conversions
x86/PVH: obtain VGA console info in Dom0
x86/xen/time: cleanup xen_tsc_safe_clocksource
xen: update arch/x86/include/asm/xen/cpuid.h
Pull RISC-V fixes from Palmer Dabbelt:
- fixes to the ASID allocator to avoid leaking stale mappings between
tasks
- fix the vmalloc fault handler to tolerate huge pages
* tag 'riscv-for-linus-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: mm: Support huge page in vmalloc_fault()
riscv: asid: Fixup stale TLB entry cause application crash
Revert "riscv: mm: notify remote harts about mmu cache updates"
Pull s390 fixes from Vasily Gorbik:
- Update defconfigs
- Fix early boot code by adding missing intersection check to prevent
potential overwriting of the ipl report
- Fix a use-after-free issue in s390-specific code related to PCI
resources being retained after hot-unplugging individual functions,
by removing the resources from the PCI bus's resource list and using
the zpci_bar_struct's resource pointer directly
* tag 's390-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: update defconfigs
PCI: s390: Fix use-after-free of PCI resources with per-function hotplug
s390/ipl: add missing intersection check to ipl_report handling
Pull powerpc fixes from Michael Ellerman:
- Fix false detection of read faults, introduced by execute-only
support
- Fix a build failure when GENERIC_ALLOCATOR is not selected
Thanks to Russell Currey, Randy Dunlap, Michal Suchánek, Nathan Lynch,
and Benjamin Gray.
* tag 'powerpc-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Fix false detection of read faults
powerpc/pseries: RTAS work area requires GENERIC_ALLOCATOR
Pull sound fixes from Takashi Iwai:
"Nothing surprising, a collection of small device-specific fixes.
The majority of changes are for ASoC Intel stuff, while a few other
ASoC and HD-audio fixes are found"
* tag 'sound-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (31 commits)
ALSA: hda/ca0132: fixup buffer overrun at tuning_ctl_set()
ALSA: asihpi: check pao in control_message()
ASoC: hdmi-codec: only startup/shutdown on supported streams
ASoC: da7219: Initialize jack_det_mutex
ALSA: hda: Match only Intel devices with CONTROLLER_IN_GPU()
ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book2 Pro
ALSA: hda/realtek: fix speaker, mute/micmute LEDs not work on a HP platform
ALSA: hda: intel-dsp-config: add MTL PCI id
ASoC: SOF: IPC4: update gain ipc msg definition to align with fw
ASoC: SOF: sof-audio: don't squelch errors in WIDGET_SETUP phase
ASoC: SOF: Intel: hda-ctrl: re-add sleep after entering and exiting reset
ASoC: SOF: Intel: hda-dsp: harden D0i3 programming sequence
ASoC: SOF: ipc4-topology: set dmic dai index from copier
ASoC: SOF: sof-audio: Fix broken early bclk feature for SSP
ASoC: SOF: Intel: pci-tng: revert invalid bar size setting
ASoC: SOF: topology: Fix error handling in sof_widget_ready()
ASoC: Intel: soc-acpi: fix copy-paste issue in topology names
ASoC: SOF: ipc4-topology: Fix incorrect sample rate print unit
ASoC: SOF: ipc3: Check for upper size limit for the received message
ASOC: SOF: Intel: pci-tgl: Fix device description
...
Pull drm fixes from Dave Airlie:
"Seems like a pretty regular rc3, i915 and amdgpu with the usual
selection of fixes, then a scattering of fixes across misc drivers and
other areas:
accel:
- build fix for accel
edid:
- fix info leak in edid
ttm:
- fix NULL ptr deref
- reference counting fix
i915:
- Fix hwmon PL1 power limit enabling
- Fix audio ELD handling for DP MST
- Fix PSR io and wake line calculations
- Fix DG2 HDMI modes with 267.30 and 319.89 MHz pixel clocks
- Fix SSEU subslice out-of-bounds access
- Fix misuse of non-idle barriers as fence trackers
amdgpu:
- SMU 13 update
- RDNA2 suspend/resume fix when overclocking is enabled
- SRIOV VCN fixes
- HDCP suspend/resume fix
- Fix drm polling splat regression
- Fix dirty rectangle tracking for PSR
- Fix vangogh regression on certain BIOSes
- Misc display fixes
- Suspend/resume IOMMU regression fix
amdkfd:
- Fix BO offset for multi-VMA page migration
- Fix a possible double free
- Fix potential use after free
- Fix process cleanup on module exit
bridge:
- fix returned array size name documentation
fbdev:
- ref-counting fix for fbdev deferred I/O
virtio:
- dma sync fix
shmem-helper:
- error path fix
msm:
- shrinker blocking fix
panfrost:
- shrinker rpm fix
chipsfb:
- fix error code
meson:
- fix 1px pink line
- fix regulator interaction
sun4i:
- fix missing component unbind"
* tag 'drm-fixes-2023-03-17' of git://anongit.freedesktop.org/drm/drm: (38 commits)
drm/ttm: drop extra ttm_bo_put in ttm_bo_cleanup_refs
drm/amdgpu: Don't resume IOMMU after incomplete init
drm/amdkfd: Fixed kfd_process cleanup on module exit.
drm/amd/display: disconnect MPCC only on OTG change
drm/amd/display: Fix DP MST sinks removal issue
drm/amd/display: Do not set DRR on pipe Commit
drm/amd/display: Remove OTG DIV register write for Virtual signals.
drm/meson: dw-hdmi: Fix devm_regulator_*get_enable*() conversion again
drm/bridge: Fix returned array size name for atomic_get_input_bus_fmts kdoc
drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
drm/amdgpu/nv: fix codec array for SR_IOV
drm/amd/display: Write to correct dirty_rect
drm/amdgpu: move poll enabled/disable into non DC path
drm/amd/display: Fix HDCP failing to enable after suspend
drm/amdkfd: fix potential kgd_mem UAFs
drm/amdgpu/vcn: custom video info caps for sriov
drm/amd/pm: Fix sienna cichlid incorrect OD volage after resume
drm/amd/pm: bump SMU 13.0.4 driver_if header version
drm/amdkfd: fix a potential double free in pqm_create_queue
drm/amdkfd: Get prange->offset after svm_range_vram_node_new
...
Pull SCSI fixes from James Bottomley:
"Ten patches, eight in drivers and two in the core, which correct a
regression from directory removal and add a no VPD size quirk also to
fix a regression. All pretty small"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: mcq: Use active_reqs to check busy in clock scaling
scsi: core: Fix a procfs host directory removal regression
scsi: core: Add BLIST_NO_VPD_SIZE for some VDASD
scsi: mpi3mr: Fix expander node leak in mpi3mr_remove()
scsi: mpi3mr: Fix memory leaks in mpi3mr_init_ioc()
scsi: mpi3mr: Fix sas_hba.phy memory leak in mpi3mr_remove()
scsi: mpi3mr: Fix mpi3mr_hba_port memory leak in mpi3mr_remove()
scsi: mpi3mr: Fix config page DMA memory leak
scsi: mpi3mr: Fix throttle_groups memory leak
scsi: mpt3sas: Fix NULL pointer access in mpt3sas_transport_port_add()
Arm SCMI fixes for v6.3
Few fixes addressing issues around validation of device tree SCMI node,
allowing raw SCMI access even on systems which fail to probe the normal
driver stack, duplicate header inclusion, clean up return statement using
literal values instead of variable to simplify and use of devm_bitmap_zalloc
instead of devm_kcalloc to improve semantic.
* tag 'scmi-fixes-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
firmware: arm_scmi: Use the bitmap API to allocate bitmaps
firmware: arm_scmi: Fix device node validation for mailbox transport
firmware: arm_scmi: Fix raw coexistence mode behaviour on failure path
firmware: arm_scmi: Remove duplicate include header inclusion
firmware: arm_scmi: Return a literal instead of a variable
firmware: arm_scmi: Clean up a return statement in scmi_probe
Link: https://lore.kernel.org/r/20230315193557.1709241-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Merge a PSCI cpuidle driver fix for 6.3-rc1:
- Fix the psci_pd_init_topology() failure path in the PSCI cpuidle
driver (Shawn Guo).
* pm-cpuidle:
cpuidle: psci: Iterate backwards over list in psci_pd_remove()
Merge a new ACPI backlight quirk, new ACPI quirks for I2C device
enumeration on some platforms, a pfrut utility fix and an ACPI
documentation fix for 6.3-rc3:
- Add backlight=native DMI quirk for Dell Vostro 15 3535 to the ACPI
video driver (Chia-Lin Kao).
- Add ACPI quirks for I2C devices enumeration on Lenovo Yoga Book X90
and Acer Iconia One 7 B1-750 (Hans de Goede).
- Fix handling of invalid command line option values in the ACPI pfrut
utility (Chen Yu).
- Fix references to I2C device data type in the ACPI documentation for
device enumeration (Andy Shevchenko).
* acpi-video:
ACPI: video: Add backlight=native DMI quirk for Dell Vostro 15 3535
* acpi-x86:
ACPI: x86: Add skip i2c clients quirk for Lenovo Yoga Book X90
ACPI: x86: Add skip i2c clients quirk for Acer Iconia One 7 B1-750
ACPI: x86: Introduce an acpi_quirk_skip_gpio_event_handlers() helper
* acpi-tools:
ACPI: tools: pfrut: Check if the input of level and type is in the right numeric range
* acpi-docs:
ACPI: docs: enumeration: Correct reference to the I²C device data type
When running as non-root the following error is seen in turbostat:
turbostat: fopen /dev/cpu_dma_latency
: Permission denied
turbostat and the man page have information on how to avoid other
permission errors, so these can be fixed the same way.
Provide better /dev/cpu_dma_latency warnings that provide instructions on
how to avoid the error, and update the man page.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: linux-pm@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat reports some capabilities access errors and not others. Provide
the same debug message for all errors.
[lenb: remove extra quotes]
Cc: David Arcari <darcari@redhat.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
i.MX fixes for 6.3:
- A couple of i.MX93 fixes from Alexander Stein to correct EQoS Ethernet
properties.
- Correct clock-names of FlexSPI device in imx8-ss-lsio DT.
- Fix EQoS PHY reset GPIO by dropping the deprecated/wrong property and
switch to the new bindings.
- Fix an issue with imx-weim bus driver that branch condition evaluates
to a garbage value.
- Correct WM8960 clock name for imx8mm-nitrogen-r2 board.
- Fix LCDIF2 clocks for i.MX8MP DT.
- Add missing #sound-dai-cells properties to SAI nodes for i.MX8MN DT.
- Revert LS1028A DT changes of getting MAC addresses from VPD, as the
dependency on NVMEM device is not in place.
- A series from Peng Fan to add missing pinctrl property for i.MX6SL
based devices.
* tag 'imx-fixes-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
arm64: dts: imx93: add missing #address-cells and #size-cells to i2c nodes
bus: imx-weim: fix branch condition evaluates to a garbage value
arm64: dts: imx8mn: specify #sound-dai-cells for SAI nodes
ARM: dts: imx6sl: tolino-shine2hd: fix usbotg1 pinctrl
ARM: dts: imx6sll: e60k02: fix usbotg1 pinctrl
ARM: dts: imx6sll: e70k02: fix usbotg1 pinctrl
arm64: dts: imx93: Fix eqos properties
arm64: dts: imx8mp: Fix LCDIF2 node clock order
arm64: dts: imx8mm-nitrogen-r2: fix WM8960 clock name
arm64: dts: imx8dxl-evk: Fix eqos phy reset gpio
Revert "arm64: dts: ls1028a: sl28: get MAC addresses from VPD"
arm64: dts: freescale: imx8-ss-lsio: Fix flexspi clock order
Link: https://lore.kernel.org/r/20230315132814.GF143566@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
arm64: tegra: Device tree fixes for v6.3-rc1
This contains a fix for the CBB bus' ranges property on Tegra194 and
Tegra234 that restores proper translation of PCI addresses.
* tag 'tegra-for-6.3-arm64-dt-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
arm64: tegra: Bump CBB ranges property on Tegra194 and Tegra234
Link: https://lore.kernel.org/r/20230302094213.3874449-1-thierry.reding@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Hyper-V should never specify a VM that is a Confidential VM and also
running in the root partition. Nonetheless, explicitly block such a
combination to guard against a compromised Hyper-V maliciously trying to
exploit root partition functionality in a Confidential VM to expose
Confidential VM secrets. No known bug is being fixed, but the attack
surface for Confidential VMs on Hyper-V is reduced.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1678894453-95392-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The Android Lenovo Yoga Book X90F / X90L uses the same goodix touchscreen
with 9 bytes touch reports for its touch keyboard as the already supported
Windows Lenovo Yoga Book X91F/L, add a DMI match for this to
the nine_bytes_report DMI table.
When the quirk for the X91F/L was initially added it was written to
also apply to the X90F/L but this does not work because the Android
version of the Yoga Book uses completely different DMI strings.
Also adjust the X91F/L quirk to reflect that it only applies to
the X91F/L models.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Bastien Nocera <hadess@hadess.net>
Link: https://lore.kernel.org/r/20230315134442.71787-1-hdegoede@redhat.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
llsec_parse_seclevel has the null pointer check at its begining. Compared
with nl802154_add_llsec_seclevel, nl802154_del_llsec_seclevel has a
redundant null pointer check of info->attrs[NL802154_ATTR_SEC_LEVEL]
before llsec_parse_seclevel.
Fix this issue by removing the null pointer check in
nl802154_del_llsec_seclevel.
Signed-off-by: Dongliang Mu <dzm91@hust.edu.cn>
Link: https://lore.kernel.org/r/20230308083231.460015-1-dzm91@hust.edu.cn
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
In case when VCPU is blocked due to WFI, we schedule the timer
from `kvm_riscv_vcpu_timer_blocking()` to keep timer interrupt
ticking.
But in case when delta_ns comes to be zero, we never schedule
the timer and VCPU keeps sleeping indefinitely until any activity
is done with VM console.
This is easily reproduce-able using kvmtool.
./lkvm-static run -c1 --console virtio -p "earlycon root=/dev/vda" \
-k ./Image -d rootfs.ext4
Also, just add a print in kvm_riscv_vcpu_vstimer_expired() to
check the interrupt delivery and run `top` or similar auto-upating
cmd from guest. Within sometime one can notice that print from
timer expiry routine stops and the `top` cmd output will stop
updating.
This change fixes this by making sure we schedule the timer even
with delta_ns being zero to bring the VCPU out of sleep immediately.
Fixes: 8f5cb44b1b ("RISC-V: KVM: Support sstc extension")
Signed-off-by: Rajnesh Kanwal <rkanwal@rivosinc.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Nikolay Aleksandrov says:
====================
bonding: properly restore flags when bond changes ether type
A bug was reported by syzbot[1] that causes a warning and a myriad of
other potential issues if a bond, that is also a slave, fails to enslave a
non-eth device. While fixing that bug I found that we have the same
issues when such enslave passes and after that the bond changes back to
ARPHRD_ETHER (again due to ether_setup). This set fixes all issues by
extracting the ether_setup() sequence in a helper which does the right
thing about bond flags when it needs to change back to ARPHRD_ETHER. It
also adds selftests for these cases.
Patch 01 adds the new bond_ether_setup helper and fixes the issues when a
bond device changes its ether type due to successful enslave. Patch 02
fixes the issues when it changes its ether type due to an unsuccessful
enslave. Note we need two patches because the bugs were introduced by
different commits. Patch 03 adds the new selftests.
Due to the comment adjustment and squash, could you please review
patch 01 again? I've kept the other acks since there were no code
changes.
v3: squash the helper patch and the first fix, adjust the comment above
it to be explicit about the bond device, no code changes
v2: new set, all patches are new due to new approach of fixing these bugs
[1] https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3ef
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new network selftests for the bonding device which exercise the ether
type changing call paths. They also test for the recent syzbot bug[1] which
causes a warning and results in wrong device flags (IFF_SLAVE missing).
The test adds three bond devices and a nlmon device, enslaves one of the
bond devices to the other and then uses the nlmon device for successful
and unsuccesful enslaves both of which change the bond ether type. Thus
we can test for both MASTER and SLAVE flags at the same time.
If the flags are properly restored we get:
TEST: Change ether type of an enslaved bond device with unsuccessful enslave [ OK ]
TEST: Change ether type of an enslaved bond device with successful enslave [ OK ]
[1] https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3ef
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add bond_ether_setup helper which is used to fix ether_setup() calls in the
bonding driver. It takes care of both IFF_MASTER and IFF_SLAVE flags, the
former is always restored and the latter only if it was set.
If the bond enslaves non-ARPHRD_ETHER device (changes its type), then
releases it and enslaves ARPHRD_ETHER device (changes back) then we
use ether_setup() to restore the bond device type but it also resets its
flags and removes IFF_MASTER and IFF_SLAVE[1]. Use the bond_ether_setup
helper to restore both after such transition.
[1] reproduce (nlmon is non-ARPHRD_ETHER):
$ ip l add nlmon0 type nlmon
$ ip l add bond2 type bond mode active-backup
$ ip l set nlmon0 master bond2
$ ip l set nlmon0 nomaster
$ ip l add bond1 type bond
(we use bond1 as ARPHRD_ETHER device to restore bond2's mode)
$ ip l set bond1 master bond2
$ ip l sh dev bond2
37: bond2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether be:d7:c5:40:5b:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
(notice bond2's IFF_MASTER is missing)
Fixes: e36b9d16c6 ("bonding: clean muticast addresses when device changes type")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yoshihiro Shimoda says:
====================
net: renesas: rswitch: Fix rx and timestamp
I got reports locally about issues on the rswitch driver.
So, fix the issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the GWCA has the TX timestamp feature, this driver
should not disable it if one of ports is opened. So, fix it.
Reported-by: Phong Hoang <phong.hoang.wz@renesas.com>
Fixes: 33f5d733b5 ("net: renesas: rswitch: Improve TX timestamp accuracy")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the RX descriptor doesn't have any data, the output value of quote
from rswitch_rx() will be increased unexpectedily. So, fix it.
Reported-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Fixes: 3590918b5d ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In vnet_port_probe() and vsw_port_probe(), we should
check the return value of mdesc_grab() as it may
return NULL which can caused NPD bugs.
Fixes: 5d01fa0c6b ("ldmvsw: Add ldmvsw.c driver code")
Fixes: 43fdf27470 ("[SPARC64]: Abstract out mdesc accesses for better MD update handling.")
Signed-off-by: Liang He <windhl@126.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder says:
====================
net: ipa: minor bug fixes
The four patches in this series fix some errors, though none of them
cause any compile or runtime problems.
The first changes the files included by "drivers/net/ipa/reg.h" to
ensure everything it requires is included with the file. It also
stops unnecessarily including another file. The prerequisites are
apparently satisfied other ways, currently.
The second adds two struct declarations to "gsi_reg.h", to ensure
they're declared before they're used later in the file. Again, it
seems these declarations are currently resolved wherever this file
is included.
The third removes register definitions that were added for IPA v5.0
that are not needed. And the last updates some validity checks for
IPA v5.0 registers. No IPA v5.0 platforms are yet supported, so the
issues resolved here were never harmful.
Versions 2 and 3 of this series change the "Fixes" tags in patches
so they supply legitimate commit hashes.
====================
Link: https://lore.kernel.org/r/20230316145136.1795469-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A recent commit defined HW_PARAM_4 as a GSI register ID but did not
add it to gsi_reg_id_valid() to indicate it's valid (for IPA v5.0+).
Add version checks for the HW_PARAM_2 and INTER_EE IRQ GSI registers
there as well.
IPA v5.0 supports up to 8 source and destination resource groups.
Update the validity check (and the comments where the register IDs
are defined) to reflect that. Similarly update comments and
validity checks for the hash/cache-related registers.
Note that this patch fixes an omission and constrains things
further, but these don't technically represent bugs.
Fixes: f651334e1e ("net: ipa: add HW_PARAM_4 GSI register")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A recent commit defined a few IPA registers used for IPA v5.0+.
One of those was a mistake. Although the filter and router caches
get *flushed* using a single register, they use distinct registers
(ENDP_FILTER_CACHE_CFG and ENDP_ROUTER_CACHE_CFG) for configuration.
And although there *exists* a FILT_ROUT_CACHE_CFG register, it is
not needed in upstream code. So get rid of definitions related to
FILT_ROUT_CACHE_CFG, because they are not needed.
Fixes: 8ba59716d1 ("net: ipa: define IPA v5.0+ registers")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When gsi_reg_init() got added, its declaration was added to
"gsi_reg.h" without declaring the two struct pointer types it uses.
Add these struct declarations to "gsi_reg.h".
Fixes: 3c506add35 ("net: ipa: introduce gsi_reg_init()")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When "reg.h" got created, it included calls to WARN() and WARN_ON().
Those macros are defined via <linux/bug.h>. In addition, it uses
is_power_of_2(), which is defined in <linux/log2.h>. Include those
files so IPA "reg.h" has access to all definitions it requires.
Meanwhile, <linux/bits.h> is included but nothing defined therein
is required directly in "reg.h", so get rid of that.
Fixes: 81772e444d ("net: ipa: start generalizing "ipa_reg"")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Drivers will commonly perform feature setting during init, if they use
the xdp_set_features_flag() helper they'll likely run into an ASSERT_RTNL()
inside call_netdevice_notifiers_info().
Don't call the notifier until the device is actually registered.
Nothing should be tracking the device until its registered and
after its unregistration has started.
Fixes: 4d5ab0ad96 ("net/mlx5e: take into account device reconfiguration for xdp_features flag")
Link: https://lore.kernel.org/r/20230316220234.598091-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hangbin Liu says:
====================
net/sched: fix parsing of TCA_EXT_WARN_MSG for tc action
In my previous commit 0349b8779c ("sched: add new attr TCA_EXT_WARN_MSG
to report tc extact message") I didn't notice the tc action use different
enum with filter. So we can't use TCA_EXT_WARN_MSG directly for tc action.
Let's rever the previous fix 923b2e30dc ("net/sched: act_api: move
TCA_EXT_WARN_MSG to the correct hierarchy") and add a new
TCA_ROOT_EXT_WARN_MSG for tc action specifically.
Here is the tdc test result:
1..1119
ok 1 d959 - Add cBPF action with valid bytecode
ok 2 f84a - Add cBPF action with invalid bytecode
ok 3 e939 - Add eBPF action with valid object-file
ok 4 282d - Add eBPF action with invalid object-file
ok 5 d819 - Replace cBPF bytecode and action control
ok 6 6ae3 - Delete cBPF action
ok 7 3e0d - List cBPF actions
ok 8 55ce - Flush BPF actions
ok 9 ccc3 - Add cBPF action with duplicate index
ok 10 89c7 - Add cBPF action with invalid index
[...]
ok 1115 2348 - Show TBF class
ok 1116 84a0 - Create TEQL with default setting
ok 1117 7734 - Create TEQL with multiple device
ok 1118 34a9 - Delete TEQL with valid handle
ok 1119 6289 - Show TEQL stats
====================
Link: https://lore.kernel.org/r/20230316033753.2320557-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In my previous commit 0349b8779c ("sched: add new attr TCA_EXT_WARN_MSG
to report tc extact message") I didn't notice the tc action use different
enum with filter. So we can't use TCA_EXT_WARN_MSG directly for tc action.
Let's add a TCA_ROOT_EXT_WARN_MSG for tc action specifically and put this
param before going to the TCA_ACT_TAB nest.
Fixes: 0349b8779c ("sched: add new attr TCA_EXT_WARN_MSG to report tc extact message")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 923b2e30dc.
This is not a correct fix as TCA_EXT_WARN_MSG is not a hierarchy to
TCA_ACT_TAB. I didn't notice the TC actions use different enum when adding
TCA_EXT_WARN_MSG. To fix the difference I will add a new WARN enum in
TCA_ROOT_MAX as Jamal suggested.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The blamed commit has replaced a ksz_write8() call to address
REG_PORT_5_CTRL_6 (0x56) with a ksz_set_xmii() -> ksz_pwrite8() call to
regs[P_XMII_CTRL_1], which is also defined as 0x56 for ksz8795_regs[].
The trouble is that, when compared to ksz_write8(), ksz_pwrite8() also
adjusts the register offset with the port base address. So in reality,
ksz_pwrite8(offset=0x56) accesses register 0x56 + 0x50 = 0xa6, which in
this switch appears to be unmapped, and the RGMII delay configuration on
the CPU port does nothing.
So if the switch wasn't fine with the RGMII delay configuration done
through pin strapping and relied on Linux to apply a different one in
order to pass traffic, this is now broken.
Using the offset translation logic imposed by ksz_pwrite8(), the correct
value for regs[P_XMII_CTRL_1] should have been 0x6 on ksz8795_regs[], in
order to really end up accessing register 0x56.
Static code analysis shows that, despite there being multiple other
accesses to regs[P_XMII_CTRL_1] in this driver, the only code path that
is applicable to ksz8795_regs[] and ksz8_dev_ops is ksz_set_xmii().
Therefore, the problem is isolated to RGMII delays.
In its current form, ksz8795_regs[] contains the same value for
P_XMII_CTRL_0 and for P_XMII_CTRL_1, and this raises valid suspicions
that writes made by the driver to regs[P_XMII_CTRL_0] might overwrite
writes made to regs[P_XMII_CTRL_1] or vice versa.
Again, static analysis shows that the only accesses to P_XMII_CTRL_0
from the driver are made from code paths which are not reachable with
ksz8_dev_ops. So the accesses made by ksz_set_xmii() are safe for this
switch family.
[ vladimiroltean: rewrote commit message ]
Fixes: c476bede4b ("net: dsa: microchip: ksz8795: use common xmii function")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230315231916.2998480-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski says:
====================
ynl: another license adjustment
Hopefully the last adjustment to the licensing of the specs.
I'm still the author so should be fine to do this.
====================
Link: https://lore.kernel.org/r/20230315230351.478320-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The (only recently documented) expectation is that all specs
are under a certain license, but we don't actually enforce it.
What's worse we then go ahead and assume the license was right,
outputting the expected license into generated files.
Fixes: 37d9df224d ("ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause")
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I relicensed Netlink spec code to GPL-2.0 OR BSD-3-Clause but
we still put a slightly different license on the uAPI header
than the rest of the code. Use the Linux-syscall-note on all
the specs and all generated code. It's moot for kernel code,
but should not hurt. This way the licenses match everywhere.
Cc: Chuck Lever <chuck.lever@oracle.com>
Fixes: 37d9df224d ("ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause")
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
definitions are optional, commit in question breaks cli for ethtool.
Fixes: 6517a60b03 ("tools: ynl: move the enum classes to shared code")
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Saeed Mahameed says:
====================
mlx5 fixes 2023-03-15
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2023-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: TC, Remove error message log print
net/mlx5e: TC, fix cloned flow attribute
net/mlx5e: TC, fix missing error code
net/sched: TC, fix raw counter initialization
net/mlx5e: Lower maximum allowed MTU in XSK to match XDP prerequisites
net/mlx5: Set BREAK_FW_WAIT flag first when removing driver
net/mlx5e: kTLS, Fix missing error unwind on unsupported cipher type
net/mlx5e: Fix cleanup null-ptr deref on encap lock
net/mlx5: E-switch, Fix missing set of split_count when forward to ovs internal port
net/mlx5: E-switch, Fix wrong usage of source port rewrite in split rules
net/mlx5: Disable eswitch before waiting for VF pages
net/mlx5: Fix setting ec_function bit in MANAGE_PAGES
net/mlx5e: Don't cache tunnel offloads capability
net/mlx5e: Fix macsec ASO context alignment
====================
Link: https://lore.kernel.org/r/20230315225847.360083-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While adding and removing the controller, the following call trace was
observed:
WARNING: CPU: 3 PID: 623596 at kernel/dma/mapping.c:532 dma_free_attrs+0x33/0x50
CPU: 3 PID: 623596 Comm: sh Kdump: loaded Not tainted 5.14.0-96.el9.x86_64 #1
RIP: 0010:dma_free_attrs+0x33/0x50
Call Trace:
qla2x00_async_sns_sp_done+0x107/0x1b0 [qla2xxx]
qla2x00_abort_srb+0x8e/0x250 [qla2xxx]
? ql_dbg+0x70/0x100 [qla2xxx]
__qla2x00_abort_all_cmds+0x108/0x190 [qla2xxx]
qla2x00_abort_all_cmds+0x24/0x70 [qla2xxx]
qla2x00_abort_isp_cleanup+0x305/0x3e0 [qla2xxx]
qla2x00_remove_one+0x364/0x400 [qla2xxx]
pci_device_remove+0x36/0xa0
__device_release_driver+0x17a/0x230
device_release_driver+0x24/0x30
pci_stop_bus_device+0x68/0x90
pci_stop_and_remove_bus_device_locked+0x16/0x30
remove_store+0x75/0x90
kernfs_fop_write_iter+0x11c/0x1b0
new_sync_write+0x11f/0x1b0
vfs_write+0x1eb/0x280
ksys_write+0x5f/0xe0
do_syscall_64+0x5c/0x80
? do_user_addr_fault+0x1d8/0x680
? do_syscall_64+0x69/0x80
? exc_page_fault+0x62/0x140
? asm_exc_page_fault+0x8/0x30
entry_SYSCALL_64_after_hwframe+0x44/0xae
The command was completed in the abort path during driver unload with a
lock held, causing the warning in abort path. Hence complete the command
without any lock held.
Reported-by: Lin Li <lilin@redhat.com>
Tested-by: Lin Li <lilin@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230313043711.13500-2-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Xiaomi Poco F1 (qcom/sdm845-xiaomi-beryllium*.dts) comes with a SKhynix
H28U74301AMR UFS. The sd_read_cpr() operation leads to a 120 second
timeout, making the device bootup very slow:
[ 121.457736] sd 0:0:0:1: [sdb] tag#23 timing out command, waited 120s
Setting the BLIST_SKIP_VPD_PAGES allows the device to skip the failing
sd_read_cpr operation and boot normally.
Signed-off-by: Joel Selvaraj <joelselvaraj.oss@gmail.com>
Link: https://lore.kernel.org/r/20230313041402.39330-1-joelselvaraj.oss@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
drm/i915 fixes for v6.3-rc3:
- Fix hwmon PL1 power limit enabling
- Fix audio ELD handling for DP MST
- Fix PSR io and wake line calculations
- Fix DG2 HDMI modes with 267.30 and 319.89 MHz pixel clocks
- Fix SSEU subslice out-of-bounds access
- Fix misuse of non-idle barriers as fence trackers
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87r0tq5nyn.fsf@intel.com
The `devlink -j port show` command output may not contain the "flavour"
key, an example from Ubuntu 22.10 s390x LPAR(5.19.0-37-generic), with
mlx4 driver and iproute2-5.15.0:
{"port":{"pci/0001:00:00.0/1":{"type":"eth","netdev":"ens301"},
"pci/0001:00:00.0/2":{"type":"eth","netdev":"ens301d1"},
"pci/0002:00:00.0/1":{"type":"eth","netdev":"ens317"},
"pci/0002:00:00.0/2":{"type":"eth","netdev":"ens317d1"}}}
This will cause a KeyError exception.
Create a validate_devlink_output() to check for this "flavour" from
devlink command output to avoid this KeyError exception. Also let
it handle the check for `devlink -j dev show` output in main().
Apart from this, if the test was not started because the max lanes of
the designated device is 0. The script will still return 0 and thus
causing a false-negative test result.
Use a found_max_lanes flag to determine if these tests were skipped
due to this reason and return KSFT_SKIP to make it more clear.
Link: https://bugs.launchpad.net/bugs/1937133
Fixes: f3348a82e7 ("selftests: net: Add port split test")
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Link: https://lore.kernel.org/r/20230315165353.229590-1-po-hsu.lin@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
iucv_irq_data needs to be 4 bytes larger.
These bytes are not used by the iucv module, but written by
the z/VM hypervisor in case a CPU is deconfigured.
Reported as:
BUG dma-kmalloc-64 (Not tainted): kmalloc Redzone overwritten
-----------------------------------------------------------------------------
0x0000000000400564-0x0000000000400567 @offset=1380. First byte 0x80 instead of 0xcc
Allocated in iucv_cpu_prepare+0x44/0xd0 age=167839 cpu=2 pid=1
__kmem_cache_alloc_node+0x166/0x450
kmalloc_node_trace+0x3a/0x70
iucv_cpu_prepare+0x44/0xd0
cpuhp_invoke_callback+0x156/0x2f0
cpuhp_issue_call+0xf0/0x298
__cpuhp_setup_state_cpuslocked+0x136/0x338
__cpuhp_setup_state+0xf4/0x288
iucv_init+0xf4/0x280
do_one_initcall+0x78/0x390
do_initcalls+0x11a/0x140
kernel_init_freeable+0x25e/0x2a0
kernel_init+0x2e/0x170
__ret_from_fork+0x3c/0x58
ret_from_fork+0xa/0x40
Freed in iucv_init+0x92/0x280 age=167839 cpu=2 pid=1
__kmem_cache_free+0x308/0x358
iucv_init+0x92/0x280
do_one_initcall+0x78/0x390
do_initcalls+0x11a/0x140
kernel_init_freeable+0x25e/0x2a0
kernel_init+0x2e/0x170
__ret_from_fork+0x3c/0x58
ret_from_fork+0xa/0x40
Slab 0x0000037200010000 objects=32 used=30 fp=0x0000000000400640 flags=0x1ffff00000010200(slab|head|node=0|zone=0|
Object 0x0000000000400540 @offset=1344 fp=0x0000000000000000
Redzone 0000000000400500: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
Redzone 0000000000400510: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
Redzone 0000000000400520: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
Redzone 0000000000400530: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
Object 0000000000400540: 00 01 00 03 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object 0000000000400550: f3 86 81 f2 f4 82 f8 82 f0 f0 f0 f0 f0 f0 f0 f2 ................
Object 0000000000400560: 00 00 00 00 80 00 00 00 cc cc cc cc cc cc cc cc ................
Object 0000000000400570: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
Redzone 0000000000400580: cc cc cc cc cc cc cc cc ........
Padding 00000000004005d4: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding 00000000004005e4: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding 00000000004005f4: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZ
CPU: 6 PID: 121030 Comm: 116-pai-crypto. Not tainted 6.3.0-20230221.rc0.git4.99b8246b2d71.300.fc37.s390x+debug #1
Hardware name: IBM 3931 A01 704 (z/VM 7.3.0)
Call Trace:
[<000000032aa034ec>] dump_stack_lvl+0xac/0x100
[<0000000329f5a6cc>] check_bytes_and_report+0x104/0x140
[<0000000329f5aa78>] check_object+0x370/0x3c0
[<0000000329f5ede6>] free_debug_processing+0x15e/0x348
[<0000000329f5f06a>] free_to_partial_list+0x9a/0x2f0
[<0000000329f5f4a4>] __slab_free+0x1e4/0x3a8
[<0000000329f61768>] __kmem_cache_free+0x308/0x358
[<000000032a91465c>] iucv_cpu_dead+0x6c/0x88
[<0000000329c2fc66>] cpuhp_invoke_callback+0x156/0x2f0
[<000000032aa062da>] _cpu_down.constprop.0+0x22a/0x5e0
[<0000000329c3243e>] cpu_device_down+0x4e/0x78
[<000000032a61dee0>] device_offline+0xc8/0x118
[<000000032a61e048>] online_store+0x60/0xe0
[<000000032a08b6b0>] kernfs_fop_write_iter+0x150/0x1e8
[<0000000329fab65c>] vfs_write+0x174/0x360
[<0000000329fab9fc>] ksys_write+0x74/0x100
[<000000032aa03a5a>] __do_syscall+0x1da/0x208
[<000000032aa177b2>] system_call+0x82/0xb0
INFO: lockdep is turned off.
FIX dma-kmalloc-64: Restoring kmalloc Redzone 0x0000000000400564-0x0000000000400567=0xcc
FIX dma-kmalloc-64: Object at 0x0000000000400540 not freed
Fixes: 2356f4cb19 ("[S390]: Rewrite of the IUCV base code, part 2")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20230315131435.4113889-1-wintera@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The aq_xdp_run_prog() function falls back to the XDP_ABORTED action
handler (using a goto) if the operations for any of the other actions fail.
The XDP_ABORTED handler in turn calls the bpf_warn_invalid_xdp_action()
tracepoint. However, the function also jumps into the XDP_PASS helper if no
XDP program is loaded on the device, which means the XDP_ABORTED handler
can be run with a NULL program pointer. This results in a NULL pointer
deref because the tracepoint dereferences the 'prog' pointer passed to it.
This situation can happen in multiple ways:
- If a packet arrives between the removal of the program from the interface
and the static_branch_dec() in aq_xdp_setup()
- If there are multiple devices using the same driver in the system and
one of them has an XDP program loaded and the other does not.
Fix this by refactoring the aq_xdp_run_prog() function to remove the 'goto
pass' handling if there is no XDP program loaded. Instead, factor out the
skb building in a separate small helper function.
Fixes: 26efaef759 ("net: atlantic: Implement xdp data plane")
Reported-by: Freysteinn Alfredsson <Freysteinn.Alfredsson@kau.se>
Tested-by: Freysteinn Alfredsson <Freysteinn.Alfredsson@kau.se>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20230315125539.103319-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit f96a3d7455 ("ipv4: Fix incorrect route flushing when source
address is deleted") started to take the table ID field in the FIB info
structure into account when determining if two structures are identical
or not. This field is initialized using the 'fc_table' field in the
route configuration structure, which is not set when adding a route via
IOCTL.
The above can result in user space being able to install two identical
routes that only differ in the table ID field of their associated FIB
info.
Fix by initializing the table ID field in the route configuration
structure in the IOCTL path.
Before the fix:
# ip route add default via 192.0.2.2
# route add default gw 192.0.2.2
# ip -4 r show default
# default via 192.0.2.2 dev dummy10
# default via 192.0.2.2 dev dummy10
After the fix:
# ip route add default via 192.0.2.2
# route add default gw 192.0.2.2
SIOCADDRT: File exists
# ip -4 r show default
default via 192.0.2.2 dev dummy10
Audited the code paths to ensure there are no other paths that do not
properly initialize the route configuration structure when installing a
route.
Fixes: 5a56a0b3a4 ("net: Don't delete routes in different VRFs")
Fixes: f96a3d7455 ("ipv4: Fix incorrect route flushing when source address is deleted")
Reported-by: gaoxingwang <gaoxingwang1@huawei.com>
Link: https://lore.kernel.org/netdev/20230314144159.2354729-1-gaoxingwang1@huawei.com/
Tested-by: gaoxingwang <gaoxingwang1@huawei.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230315124009.4015212-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Steffen Klassert says:
====================
pull request (net): ipsec 2023-03-15
1) Fix an information leak when dumping algos and encap.
From Herbert Xu
2) Allow transport-mode states with AF_UNSPEC selector
to allow for nested transport-mode states.
From Herbert Xu.
* tag 'ipsec-2023-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: Allow transport-mode states with AF_UNSPEC selector
xfrm: Zero padding when dumping algos and encap
====================
Link: https://lore.kernel.org/r/20230315105623.1396491-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wolfram Sang says:
====================
net: renesas: set 'mac_managed_pm' at probe time
When suspending/resuming an interface which was not up, we saw mdiobus
related PM handling despite 'mac_managed_pm' being set for RAVB/SH_ETH.
Heiner kindly suggested the fix to set this flag at probe time, not at
init/open time. I implemented his suggestion and it works fine on these
two Renesas drivers.
====================
Link: https://lore.kernel.org/r/20230315074115.3008-1-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
SH_ETH doesn't need mdiobus suspend/resume, that's why it sets
'mac_managed_pm'. However, setting it needs to be moved from init to
probe, so mdiobus PM functions will really never be called (e.g. when
the interface is not up yet during suspend/resume).
Fixes: 6a1dbfefda ("net: sh_eth: Fix PHY state warning splat during system resume")
Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
RAVB doesn't need mdiobus suspend/resume, that's why it sets
'mac_managed_pm'. However, setting it needs to be moved from init to
probe, so mdiobus PM functions will really never be called (e.g. when
the interface is not up yet during suspend/resume).
Fixes: 4924c0cdce ("net: ravb: Fix PHY state warning splat during system resume")
Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On page fault, we find about the VMA that backs the page fault
early on, and quickly release the mmap_read_lock. However, using
the VMA pointer after the critical section is pretty dangerous,
as a teardown may happen in the meantime and the VMA be long gone.
Move the sampling of the MTE permission early, and NULL-ify the
VMA pointer after that, just to be on the safe side.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230316174546.3777507-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
We walk the userspace PTs to discover what mapping size was
used there. However, this can race against the userspace tables
being freed, and we end-up in the weeds.
Thankfully, the mm code is being generous and will IPI us when
doing so. So let's implement our part of the bargain and disable
interrupts around the walk. This ensures that nothing terrible
happens during that time.
We still need to handle the removal of the page tables before
the walk. For that, allow get_user_mapping_size() to return an
error, and make sure this error can be propagated all the way
to the the exit handler.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230316174546.3777507-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Pull cifs client fixes from Steve French:
"Seven cifs/smb3 client fixes, all also for stable:
- four DFS fixes
- multichannel reconnect fix
- fix smb1 stats for cancel command
- fix for set file size error path"
* tag '6.3-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: use DFS root session instead of tcon ses
cifs: return DFS root session id in DebugData
cifs: fix use-after-free bug in refresh_cache_worker()
cifs: set DFS root session in cifs_get_smb_ses()
cifs: generate signkey for the channel that's reconnecting
cifs: Fix smb2_set_path_size()
cifs: Move the in_send statistic to __smb_send_rqst()
The data->block[0] variable comes from user and is a number between
0-255. Without proper check, the variable may be very large to cause
an out-of-bounds when performing memcpy in slimpro_i2c_blkwr.
Fix this bug by checking the value of writelen.
Fixes: f6505fbabc ("i2c: add SLIMpro I2C device driver on APM X-Gene platform")
Signed-off-by: Wei Chen <harperchen1110@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
The controller will always generate a completion interrupt when the
transfer is finished normally or not. Currently we use either error or
completion interrupt to finish, this may result the completion
interrupt unhandled and corrupt the next transfer, especially at low
speed mode. Since on error case, the error interrupt will come first
then is the completion interrupt. So only use the completion interrupt
to finish the whole transfer process.
Fixes: d62fbdb99a ("i2c: add support for HiSilicon I2C controller")
Reported-by: Sheng Feng <fengsheng5@huawei.com>
Signed-off-by: Sheng Feng <fengsheng5@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
After issuing all the messages we can disable the TX_EMPTY interrupts
to avoid handling redundant interrupts. For doing a sinlge bus
detection (i2cdetect -y -r 0) we can reduce ~97% interrupts (before
~12000 after ~400).
Signed-off-by: Sheng Feng <fengsheng5@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
We found that after commit 9c46929e79
("ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems"), the
PCF85063 RTC driver stopped working on i.MX28 due to regmap_bulk_read()
reading bogus data into a stack buffer. This is caused by the i2c-mxs
driver using DMA transfers even for messages without the I2C_M_DMA_SAFE
flag, and the aforementioned commit enabling vmapped stacks.
As the MXS I2C controller requires DMA for reads of >4 bytes, DMA can't be
disabled, so the issue is fixed by using i2c_get_dma_safe_msg_buf() to
create a bounce buffer when needed.
Fixes: 9c46929e79 ("ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems")
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
When reading from I2C, the Tx watermark is set to 0. Unfortunately the
TDF (transmit data flag) is enabled when Tx FIFO entries is equal or less
than watermark. So it is set in every case, hence the reset default of 1.
This results in the MSR_RDF _and_ MSR_TDF flags to be set thus trying
to send Tx data on a read message.
Mask the IRQ status to filter for wanted flags only.
Fixes: a55fa9d0e4 ("i2c: imx-lpi2c: add low power i2c bus driver")
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Tested-by: Emanuele Ghidoli <emanuele.ghidoli@toradex.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Pull kvm fixes from Paolo Bonzini:
"ARM64:
- Address a rather annoying bug w.r.t. guest timer offsetting. The
synchronization of timer offsets between vCPUs was broken, leading
to inconsistent timer reads within the VM.
x86:
- New tests for the slow path of the EVTCHNOP_send Xen hypercall
- Add missing nVMX consistency checks for CR0 and CR4
- Fix bug that broke AMD GATag on 512 vCPU machines
Selftests:
- Skip hugetlb tests if huge pages are not available
- Sync KVM exit reasons"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: Sync KVM exit reasons in selftests
KVM: selftests: Add macro to generate KVM exit reason strings
KVM: selftests: Print expected and actual exit reason in KVM exit reason assert
KVM: selftests: Make vCPU exit reason test assertion common
KVM: selftests: Add EVTCHNOP_send slow path test to xen_shinfo_test
KVM: selftests: Use enum for test numbers in xen_shinfo_test
KVM: selftests: Add helpers to make Xen-style VMCALL/VMMCALL hypercalls
KVM: selftests: Move the guts of kvm_hypercall() to a separate macro
KVM: SVM: WARN if GATag generation drops VM or vCPU ID information
KVM: SVM: Modify AVIC GATag to support max number of 512 vCPUs
KVM: SVM: Fix a benign off-by-one bug in AVIC physical table mask
selftests: KVM: skip hugetlb tests if huge pages are not available
KVM: VMX: Use tabs instead of spaces for indentation
KVM: VMX: Fix indentation coding style issue
KVM: nVMX: remove unnecessary #ifdef
KVM: nVMX: add missing consistency checks for CR0 and CR4
KVM: arm64: timers: Convert per-vcpu virtual offset to a global value
Xuan Zhuo says:
====================
virtio_net: fix two bugs related to XDP
This patch set fixes two bugs related to XDP.
These two patch is not associated.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
It is preferred to use typed property access functions (i.e.
of_property_read_<type> functions) rather than low-level
of_get_property/of_find_property functions for reading properties.
Convert reading boolean properties to of_property_read_bool().
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can
Acked-by: Kalle Valo <kvalo@kernel.org>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean says:
====================
Fix MTU reporting for Marvell DSA switches where we can't change it
As explained in patch 2, the driver doesn't know how to change the MTU
on MV88E6165, MV88E6191, MV88E6220, MV88E6250 and MV88E6290, and there
is a regression where it actually reports an MTU value below the
Ethernet standard (1500).
Fixing that shows another issue where DSA is unprepared to be told that
a switch supports an MTU of only 1500, and still errors out. That is
addressed by patch 1.
Testing was not done on "real" hardware, but on a different Marvell DSA
switch, with code modified such that the driver doesn't know how to
change the MTU on that, either.
A key assumption is that these switches don't need any MTU configuration
to pass full MTU-sized, DSA-tagged packets, which seems like a
reasonable assumption to make. My 6390 and 6190 switches, with
.port_set_jumbo_size commented out, certainly don't seem to have any
problem passing MTU-sized traffic, as can be seen in this iperf3 session
captured with tcpdump on the DSA master:
$MAC > $MAC, Marvell DSA mode Forward, dev 2, port 8, untagged, VID 1000,
FPri 0, ethertype IPv4 (0x0800), length 1518:
10.0.0.69.49590 > 10.0.0.1.5201: Flags [.], seq 81088:82536,
ack 1, win 502, options [nop,nop,TS val 2221498829 ecr 3012859850],
length 1448
I don't want to go all the way and say that the adjustment made by
commit b9c587fed6 ("dsa: mv88e6xxx: Include tagger overhead when
setting MTU for DSA and CPU ports") is completely unnecessary, just that
there's an equally good chance that the switches with unknown MTU
configuration procedure "just work".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There are 3 classes of switch families that the driver is aware of, as
far as mv88e6xxx_change_mtu() is concerned:
- MTU configuration is available per port. Here, the
chip->info->ops->port_set_jumbo_size() method will be present.
- MTU configuration is global to the switch. Here, the
chip->info->ops->set_max_frame_size() method will be present.
- We don't know how to change the MTU. Here, none of the above methods
will be present.
Switch families MV88E6165, MV88E6191, MV88E6220, MV88E6250 and MV88E6290
fall in category 3.
The blamed commit has adjusted the MTU for all 3 categories by EDSA_HLEN
(8 bytes), resulting in a new maximum MTU of 1492 being reported by the
driver for these switches.
I don't have the hardware to test, but I do have a MV88E6390 switch on
which I can simulate this by commenting out its .port_set_jumbo_size
definition from mv88e6390_ops. The result is this set of messages at
probe time:
mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 1
mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 2
mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 3
mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 4
mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 5
mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 6
mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 7
mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 8
It is highly implausible that there exist Ethernet switches which don't
support the standard MTU of 1500 octets, and this is what the DSA
framework says as well - the error comes from dsa_slave_create() ->
dsa_slave_change_mtu(slave_dev, ETH_DATA_LEN).
But the error messages are alarming, and it would be good to suppress
them.
As a consequence of this unlikeliness, we reimplement mv88e6xxx_get_max_mtu()
and mv88e6xxx_change_mtu() on switches from the 3rd category as follows:
the maximum supported MTU is 1500, and any request to set the MTU to a
value larger than that fails in dev_validate_mtu().
Fixes: b9c587fed6 ("dsa: mv88e6xxx: Include tagger overhead when setting MTU for DSA and CPU ports")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, when dsa_slave_change_mtu() is called on a user port where
dev->max_mtu is 1500 (as returned by ds->ops->port_max_mtu()), the code
will stumble upon this check:
if (new_master_mtu > mtu_limit)
return -ERANGE;
because new_master_mtu is adjusted for the tagger overhead but mtu_limit
is not.
But it would be good if the logic went through, for example if the DSA
master really depends on an MTU adjustment to accept DSA-tagged frames.
To make the code pass through the check, we need to adjust mtu_limit for
the overhead as well, if the minimum restriction was caused by the DSA
user port's MTU (dev->max_mtu). A DSA user port MTU and a DSA master MTU
are always offset by the protocol overhead.
Currently no drivers return 1500 .port_max_mtu(), but this is only
temporary and a bug in itself - mv88e6xxx should have done that, but
since commit b9c587fed6 ("dsa: mv88e6xxx: Include tagger overhead when
setting MTU for DSA and CPU ports") it no longer does. This is a
preparation for fixing that.
Fixes: bfcb813203 ("net: dsa: configure the MTU for switch ports")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check alloc_precpu()'s return value and return an error from
dm_stats_init() if it fails. Update alloc_dev() to fail if
dm_stats_init() does.
Otherwise, a NULL pointer dereference will occur in dm_stats_cleanup()
even if dm-stats isn't being actively used.
Fixes: fd2ed4d252 ("dm: add statistics support")
Cc: stable@vger.kernel.org
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
This adds SOCK_STREAM and SOCK_SEQPACKET tests for invalid buffer case.
It tries to read data to NULL buffer (data already presents in socket's
queue), then uses valid buffer. For SOCK_STREAM second read must return
data, because skbuff is not dropped, but for SOCK_SEQPACKET skbuff will
be dropped by kernel, and 'recv()' will return EAGAIN.
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This returns behaviour of SOCK_STREAM read as before skbuff usage. When
copying to user fails current skbuff won't be dropped, but returned to
sockets's queue. Technically instead of 'skb_dequeue()', 'skb_peek()' is
called and when skbuff becomes empty, it is removed from queue by
'__skb_unlink()'.
Fixes: 71dc9ec9ac ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we now no longer use 'skb->len' to update credit, there is no sense
to update skbuff state, because it is used only once after dequeue to
copy data and then will be released.
Fixes: 71dc9ec9ac ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'skb->len' can vary when we partially read the data, this complicates the
calculation of credit to be updated in 'virtio_transport_inc_rx_pkt()/
virtio_transport_dec_rx_pkt()'.
Also in 'virtio_transport_dec_rx_pkt()' we were miscalculating the
credit since 'skb->len' was redundant.
For these reasons, let's replace the use of skbuff state to calculate new
'rx_bytes'/'fwd_cnt' values with explicit value as input argument. This
makes code more simple, because it is not needed to change skbuff state
before each call to update 'rx_bytes'/'fwd_cnt'.
Fixes: 71dc9ec9ac ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In porting his development branch to 6.3-rc1, yours truly has
repeatedly screwed up the args->pag being fed to the xfs_alloc_vextent*
functions. Add some debugging assertions to test the preconditions
required of the callers.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
The check introduced in the commit a5fd39464a ("igc: Lift TAPRIO schedule
restriction") can detect a false positive error in some corner case.
For instance,
tc qdisc replace ... taprio num_tc 4
...
sched-entry S 0x01 100000 # slot#1
sched-entry S 0x03 100000 # slot#2
sched-entry S 0x04 100000 # slot#3
sched-entry S 0x08 200000 # slot#4
flags 0x02 # hardware offload
Here the queue#0 (the first queue) is on at the slot#1 and #2,
and off at the slot#3 and #4. Under the current logic, when the slot#4
is examined, validate_schedule() returns *false* since the enablement
count for the queue#0 is two and it is already off at the previous slot
(i.e. #3). But this definition is truely correct.
Let's fix the logic to enforce a strict validation for consecutively-opened
slots.
Fixes: a5fd39464a ("igc: Lift TAPRIO schedule restriction")
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
vf reset nack actually represents the reset operation itself is
performed but no address is assigned. Therefore, e1000_reset_hw_vf
should fill the "perm_addr" with the zero address and return success on
such an occasion. This prevents its callers in netdev.c from saying PF
still resetting, and instead allows them to correctly report that no
address is assigned.
Fixes: 6ddbc4cf1f ("igb: Indicate failure on vf reset for empty mac address")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Enabling SR-IOV causes the virtual functions to make requests to the
PF via the mailbox. Notably, E1000_VF_RESET request will happen during
the initialization of the VF. However, unless the reinit is done, the
VMMB interrupt, which delivers mailbox interrupt from VF to PF will be
kept masked and such requests will be silently ignored.
Enable SR-IOV at the very end of the procedure to configure the device
for SR-IOV so that the PF is configured properly for SR-IOV when a VF is
activated.
Fixes: fa44f2f185 ("igb: Enable SR-IOV configuration via PCI sysfs interface")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When an interface with the maximum number of VLAN filters is brought up,
a spurious error is logged:
[257.483082] 8021q: adding VLAN 0 to HW filter on device enp0s3
[257.483094] iavf 0000:00:03.0 enp0s3: Max allowed VLAN filters 8. Remove existing VLANs or disable filtering via Ethtool if supported.
The VF driver complains that it cannot add the VLAN 0 filter.
On the other hand, the PF driver always adds VLAN 0 filter on VF
initialization. The VF does not need to ask the PF for that filter at
all.
Fix the error by not tracking VLAN 0 filters altogether. With that, the
check added by commit 0e710a3ffd ("iavf: Fix VF driver counting VLAN 0
filters") in iavf_virtchnl.c is useless and might be confusing if left as
it suggests that we track VLAN 0.
Fixes: 0e710a3ffd ("iavf: Fix VF driver counting VLAN 0 filters")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently, IAVF's decode_rx_desc_ptype() correctly reports payload type
of L4 for IPv4 UDP packets and IPv{4,6} TCP, but only L3 for IPv6 UDP.
Originally, i40e, ice and iavf were affected.
Commit 73df8c9e3e ("i40e: Correct UDP packet header for non_tunnel-ipv6")
fixed that in i40e, then
commit 638a0c8c88 ("ice: fix incorrect payload indicator on PTYPE")
fixed that for ice.
IPv6 UDP is L4 obviously. Fix it and make iavf report correct L4 hash
type for such packets, so that the stack won't calculate it on CPU when
needs it.
Fixes: 206812b5fc ("i40e/i40evf: i40e implementation for skb_set_hash")
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Condition, which checks whether the netdev has hashing enabled is
inverted. Basically, the tagged commit effectively disabled passing flow
hash from descriptor to skb, unless user *disables* it via Ethtool.
Commit a876c3ba59 ("i40e/i40evf: properly report Rx packet hash")
fixed this problem, but only for i40e.
Invert the condition now in iavf and unblock passing hash to skbs again.
Fixes: 857942fd1a ("i40e: Fix Rx hash reported to the stack by our driver")
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Before commit bdc1ddad3e ("compat_ioctl: block: move
blkdev_compat_ioctl() into ioctl.c"), the config BLOCK_COMPAT was used to
include compat_ioctl.c into the kernel build. With this commit, the code
is moved into ioctl.c and included with the config COMPAT. So, since then,
the config BLOCK_COMPAT has no effect and any further purpose.
Remove this obsolete config BLOCK_COMPAT.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230316111630.4897-1-lukas.bulwahn@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
| BUG: Bad page state in process kworker/u8:0 pfn:5c001
| page:00000000bfda61c8 refcount:0 mapcount:0 mapping:0000000000000000 index:0x20001 pfn:0x5c001
| head:0000000011409842 order:9 entire_mapcount:0 nr_pages_mapped:0 pincount:1
| anon flags: 0x3fffc00000b0004(uptodate|head|mappedtodisk|swapbacked|node=0|zone=0|lastcpupid=0xffff)
| raw: 03fffc0000000000 fffffc0000700001 ffffffff00700903 0000000100000000
| raw: 0000000000000200 0000000000000000 00000000ffffffff 0000000000000000
| head: 03fffc00000b0004 dead000000000100 dead000000000122 ffff00000a809dc1
| head: 0000000000020000 0000000000000000 00000000ffffffff 0000000000000000
| page dumped because: nonzero pincount
| CPU: 3 PID: 9 Comm: kworker/u8:0 Not tainted 6.3.0-rc2-00001-gc6811bf0cd87 #1
| Hardware name: linux,dummy-virt (DT)
| Workqueue: events_unbound io_ring_exit_work
| Call trace:
| dump_backtrace+0x13c/0x208
| show_stack+0x34/0x58
| dump_stack_lvl+0x150/0x1a8
| dump_stack+0x20/0x30
| bad_page+0xec/0x238
| free_tail_pages_check+0x280/0x350
| free_pcp_prepare+0x60c/0x830
| free_unref_page+0x50/0x498
| free_compound_page+0xcc/0x100
| free_transhuge_page+0x1f0/0x2b8
| destroy_large_folio+0x80/0xc8
| __folio_put+0xc4/0xf8
| gup_put_folio+0xd0/0x250
| unpin_user_page+0xcc/0x128
| io_buffer_unmap+0xec/0x2c0
| __io_sqe_buffers_unregister+0xa4/0x1e0
| io_ring_exit_work+0x68c/0x1188
| process_one_work+0x91c/0x1a58
| worker_thread+0x48c/0xe30
| kthread+0x278/0x2f0
| ret_from_fork+0x10/0x20
Mark reports an issue with the recent patches coalescing compound pages
while registering them in io_uring. The reason is that we try to drop
excessive references with folio_put_refs(), but pages were acquired
with pin_user_pages(), which has extra accounting and so should be put
down with matching unpin_user_pages() or at least gup_put_folio().
As a fix unpin_user_pages() all but first page instead, and let's figure
out a better API after.
Fixes: 57bebf807e ("io_uring/rsrc: optimise registered huge pages")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/10efd5507d6d1f05ea0f3c601830e08767e189bd.1678980230.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It is preferred to use typed property access functions (i.e.
of_property_read_<type> functions) rather than low-level
of_get_property/of_find_property functions for reading properties. As
part of this, convert of_get_property/of_find_property calls to the
recently added of_property_present() helper when we just want to test
for presence of a property and nothing more.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>
var->pixclock can be assigned to zero by user. Without
proper check, divide by zero would occur when invoking
macro PICOS2KHZ in au1200fb_fb_check_var.
Error out if var->pixclock is zero.
Signed-off-by: Wei Chen <harperchen1110@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
var->pixclock can be assigned to zero by user. Without proper
check, divide by zero would occur in lx_set_clock.
Error out if var->pixclock is zero.
Signed-off-by: Wei Chen <harperchen1110@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Variable var->pixclock is controlled by user and can be assigned
to zero. Without proper check, divide by zero would occur in
intelfbhw_validate_mode and intelfbhw_mode_to_hw.
Error out if var->pixclock is zero.
Signed-off-by: Wei Chen <harperchen1110@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
variable var->pixclock can be set by user. In case it
equals to zero, divide by zero would occur in nvidiafb_set_par.
Similar crashes have happened in other fbdev drivers. There
is no check and modification on var->pixclock along the call
chain to nvidia_check_var and nvidiafb_set_par. We believe it
could also be triggered in driver nvidia from user site.
Signed-off-by: Wei Chen <harperchen1110@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Find a valid modeline depending on the machine graphic card
configuration and add the fb_check_var() function to validate
Xorg provided graphics settings.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Prepare to add more files to the source RPM.
Also, fix the build error when KCONFIG_CONFIG is set:
error: Bad file: ./.config: No such file or directory
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
msg_ring requests transferring files support auto index selection via
IORING_FILE_INDEX_ALLOC, however they don't return the selected index
to the target ring and there is no other good way for the userspace to
know where is the receieved file.
Return the index for allocated slots and 0 otherwise, which is
consistent with other fixed file installing requests.
Cc: stable@vger.kernel.org # v6.0+
Fixes: e6130eba8a ("io_uring: add support for passing fixed file descriptors")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://github.com/axboe/liburing/issues/809
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 6.3
- avoid potential UAF in nvmet_req_complete (Damien Le Moal)
- more quirks (Elmer Miroslav Mosher Golovin, Philipp Geulen)
- fix a memory leak in the nvme-pci probe teardown path (Irvin Cote)
- repair the MAINTAINERS entry (Lukas Bulwahn)
- fix handling single range discard request (Ming Lei)
- show more opcode names in trace events (Minwoo Im)
- fix nvme-tcp timeout reporting (Sagi Grimberg)"
* tag 'nvme-6.3-2022-03-16' of git://git.infradead.org/nvme:
nvmet: avoid potential UAF in nvmet_req_complete()
nvme-trace: show more opcode names
nvme-tcp: add nvme-tcp pdu size build protection
nvme-tcp: fix opcode reporting in the timeout handler
nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620
nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV3000
nvme-pci: fixing memory leak in probe teardown path
nvme: fix handling single range discard request
MAINTAINERS: repair malformed T: entries in NVM EXPRESS DRIVERS
cmdline_find_option() may fail before doing any initialization of
the buffer array. This may lead to unpredictable results when the same
buffer is used later in calls to strncmp() function. Fix the issue by
returning early if cmdline_find_option() returns an error.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: aca20d5462 ("x86/mm: Add support to make use of Secure Memory Encryption")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230306160656.14844-1-n.zhandarovich@fintech.ru
When ida_alloc() fails, "pi" is not freed although the misleading
comment says otherwise.
Move the ida_alloc() call up so we really don't have to free "pi" in
case of ida_alloc() failure.
Also move ida_free() call from pi_remove_one() to
pata_parport_dev_release(). It was dereferencing already freed dev
pointer.
Testing revealed leak even in non-failure case which was tracked down
to missing put_device() call after bus_find_device_by_name(). As a
result, pata_parport_dev_release() was never called.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/202303111822.IHNchbkp-lkp@intel.com/
Signed-off-by: Ondrej Zary <linux@zary.sk>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
The full pagecache drop at the end of FS_IOC_ENABLE_VERITY is causing
performance problems and is hindering adoption of fsverity. It was
intended to solve a race condition where unverified pages might be left
in the pagecache. But actually it doesn't solve it fully.
Since the incomplete solution for this race condition has too much
performance impact for it to be worth it, let's remove it for now.
Fixes: 3fda4c617e ("fs-verity: implement FS_IOC_ENABLE_VERITY ioctl")
Cc: stable@vger.kernel.org
Reviewed-by: Victor Hsieh <victorhsieh@google.com>
Link: https://lore.kernel.org/r/20230314235332.50270-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Since the blamed commit, phy_ethtool_get_wol() and phy_ethtool_set_wol()
acquire phydev->lock, but the mscc phy driver implementations,
vsc85xx_wol_get() and vsc85xx_wol_set(), acquire the same lock as well,
resulting in a deadlock.
$ ip link set swp3 down
============================================
WARNING: possible recursive locking detected
mscc_felix 0000:00:00.5 swp3: Link is Down
--------------------------------------------
ip/375 is trying to acquire lock:
ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: vsc85xx_wol_get+0x2c/0xf4
but task is already holding lock:
ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&dev->lock);
lock(&dev->lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by ip/375:
#0: ffffd43b2a955788 (rtnl_mutex){+.+.}-{4:4}, at: rtnetlink_rcv_msg+0x144/0x58c
#1: ffff3d7e82e987a8 (&dev->lock){+.+.}-{4:4}, at: phy_ethtool_get_wol+0x3c/0x6c
Call trace:
__mutex_lock+0x98/0x454
mutex_lock_nested+0x2c/0x38
vsc85xx_wol_get+0x2c/0xf4
phy_ethtool_get_wol+0x50/0x6c
phy_suspend+0x84/0xcc
phy_state_machine+0x1b8/0x27c
phy_stop+0x70/0x154
phylink_stop+0x34/0xc0
dsa_port_disable_rt+0x2c/0xa4
dsa_slave_close+0x38/0xec
__dev_close_many+0xc8/0x16c
__dev_change_flags+0xdc/0x218
dev_change_flags+0x24/0x6c
do_setlink+0x234/0xea4
__rtnl_newlink+0x46c/0x878
rtnl_newlink+0x50/0x7c
rtnetlink_rcv_msg+0x16c/0x58c
Removing the mutex_lock(&phydev->lock) calls from the driver restores
the functionality.
Fixes: 2f987d4866 ("net: phy: Add locks to ethtool functions")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230314153025.2372970-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ltc2992 drivers uses a mutex and I2C bus access in its GPIO chip `set`
and `get` implementation. This means these functions can sleep and the GPIO
chip should set the `can_sleep` property to true.
This will ensure that a warning is printed when trying to set or get the
GPIO value from a context that potentially can't sleep.
Fixes: 9ca26df1ba ("hwmon: (ltc2992) Add support for GPIOs.")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Link: https://lore.kernel.org/r/20230314093146.2443845-2-lars@metafoo.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The adm1266 driver uses I2C bus access in its GPIO chip `set` and `get`
implementation. This means these functions can sleep and the GPIO chip
should set the `can_sleep` property to true.
This will ensure that a warning is printed when trying to set or get the
GPIO value from a context that potentially can't sleep.
Fixes: d98dfad35c ("hwmon: (pmbus/adm1266) Add support for GPIOs")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Link: https://lore.kernel.org/r/20230314093146.2443845-1-lars@metafoo.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The cited commit attempts to update the hw stats when dumping tc actions.
However, the driver may be called to update the stats of a police action
that may not be in hardware. In such cases the driver will fail to lookup
the police action object and will output an error message both to extack
and dmesg. The dmesg error is confusing as it may not indicate an actual
error.
Remove the dmesg error.
Fixes: 2b68d659a7 ("net/mlx5e: TC, support per action stats")
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently the cloned flow attr resets the original tc action cookies
count.
Fix that by resetting the cloned flow attribute.
Fixes: cca7eac138 ("net/mlx5e: TC, store tc action cookies per attr")
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Freed counters may be reused by fs core.
As such, raw counters may not be initialized to zero.
Cache the counter values when the action stats object is initialized to
have a proper base value for calculating the difference from the previous
query.
Fixes: 2b68d659a7 ("net/mlx5e: TC, support per action stats")
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
XSK redirecting XDP programs require linearity, hence applies
restrictions on the MTU. For PAGE_SIZE=4K, MTU shouldn't exceed 3498.
Features that contradict with XDP such HW-LRO and HW-GRO are enforced
by the driver in advance, during XSK params validation, except for MTU,
which was not enforced before this patch.
This has been spotted during test scenario described below:
Attaching xdpsock program (PAGE_SIZE=4K), with MTU < 3498, detaching
XDP program, changing the MTU to arbitrary value in the range
[3499, 3754], attaching XDP program again, which ended up with failure
since MTU is > 3498.
This commit lowers the XSK MTU limitation to be aligned with XDP MTU
limitation, since XSK socket is meaningless without XDP program.
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, BREAK_FW_WAIT flag is set after syncing with fw_reset.
However, fw_reset can call mlx5_load_one() which is waiting for fw
init bit and BREAK_FW_WAIT flag is intended to stop. e.g.: the driver
might wait on a loop it should exit.
Fix it by setting the flag before syncing with fw_reset.
Fixes: 8324a02c34 ("net/mlx5: Add exit route when waiting for FW")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Do proper error unwinding when adding an unsupported TX/RX cipher type.
Move the switch case prior to key creation so there's less to unwind,
and change the goto label name to describe the action performed instead
of what failed.
Fixes: 4960c414db ("net/mlx5e: Support 256 bit keys with kTLS device offload")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
During module is unloaded while a peer tc flow is still offloaded,
first the peer uplink rep profile is changed to a nic profile, and so
neigh encap lock is destroyed. Next during unload, the VF reps netdevs
are unregistered which causes the original non-peer tc flow to be deleted,
which deletes the peer flow. The peer flow deletion detaches the encap
entry and try to take the already destroyed encap lock, causing the
below trace.
Fix this by clearing peer flows during tc eswitch cleanup
(mlx5e_tc_esw_cleanup()).
Relevant trace:
[ 4316.837128] BUG: kernel NULL pointer dereference, address: 00000000000001d8
[ 4316.842239] RIP: 0010:__mutex_lock+0xb5/0xc40
[ 4316.851897] Call Trace:
[ 4316.852481] <TASK>
[ 4316.857214] mlx5e_rep_neigh_entry_release+0x93/0x790 [mlx5_core]
[ 4316.858258] mlx5e_rep_encap_entry_detach+0xa7/0xf0 [mlx5_core]
[ 4316.859134] mlx5e_encap_dealloc+0xa3/0xf0 [mlx5_core]
[ 4316.859867] clean_encap_dests.part.0+0x5c/0xe0 [mlx5_core]
[ 4316.860605] mlx5e_tc_del_fdb_flow+0x32a/0x810 [mlx5_core]
[ 4316.862609] __mlx5e_tc_del_fdb_peer_flow+0x1a2/0x250 [mlx5_core]
[ 4316.863394] mlx5e_tc_del_flow+0x(/0x630 [mlx5_core]
[ 4316.864090] mlx5e_flow_put+0x5f/0x100 [mlx5_core]
[ 4316.864771] mlx5e_delete_flower+0x4de/0xa40 [mlx5_core]
[ 4316.865486] tc_setup_cb_reoffload+0x20/0x80
[ 4316.865905] fl_reoffload+0x47c/0x510 [cls_flower]
[ 4316.869181] tcf_block_playback_offloads+0x91/0x1d0
[ 4316.869649] tcf_block_unbind+0xe7/0x1b0
[ 4316.870049] tcf_block_offload_cmd.isra.0+0x1ee/0x270
[ 4316.879266] tcf_block_offload_unbind+0x61/0xa0
[ 4316.879711] __tcf_block_put+0xa4/0x310
Fixes: 04de7dda73 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
Fixes: 1418ddd96a ("net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Rules with mirror actions are split to two FTEs when the actions after the mirror
action contains pedit, vlan push/pop or ct. Forward to ovs internal port adds
implicit header rewrite (pedit) but missing trigger to do split.
Fix by setting split_count when forwarding to ovs internal port which
will trigger split in mirror rules.
Fixes: 27484f7170 ("net/mlx5e: Offload tc rules that redirect to ovs internal port")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In few cases, rules with mirror use case are split to two FTEs, one which
do the mirror action and forward to second FTE which do the rest of the rule
actions and the second redirect action.
In case of mirror rules which do split and forward to ovs internal port or
VF stack devices, source port rewrite should be used in the second FTE but
it is wrongly also set in the first FTE which break the offload.
Fix this issue by removing the wrong check if source port rewrite is needed to
be used on the first FTE of the split and instead return EOPNOTSUPP which will
block offload of rules which mirror to ovs internal port or VF stack devices
which isn't supported.
Fixes: 10742efc20 ("net/mlx5e: VF tunnel TX traffic offloading")
Fixes: a508728a4c ("net/mlx5e: VF tunnel RX traffic offloading")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The offending commit changed the ordering of moving to legacy mode and
waiting for the VF pages. Moving to legacy mode is important in
bluefield, because it sends the host driver into error state, and frees
its pages. Without this transition we end up waiting 2 minutes for
pages that aren't coming before carrying on with the unload process.
Fixes: f019679ea5 ("net/mlx5: E-switch, Remove dependency between sriov and eswitch mode")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When ECPF is a page supplier, reclaim pages missed to honor the
ec_function bit provided by the firmware. It always used the ec_function
to true during driver unload flow for ECPF. This is incorrect.
Honor the ec_function bit provided by device during page allocation
request event.
Fixes: d6945242f4 ("net/mlx5: Hold pages RB tree per VF")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When mlx5e attaches again after device health recovery, the device
capabilities might have changed by the eswitch manager.
For example in one flow when ECPF changes the eswitch mode between
legacy and switchdev, it updates the flow table tunnel capability.
The cached value is only used in one place, so just check the capability
there instead.
Fixes: 5bef709d76 ("net/mlx5: Enable host PF HCA after eswitch is initialized")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently mlx5e_macsec_umr struct does not satisfy hardware memory
alignment requirement. Hence the result of querying advanced steering
operation (ASO) is not copied to the memory region as expected.
Fix by satisfying hardware memory alignment requirement and move
context to be first field in struct for better readability.
Fixes: 1f53da6764 ("net/mlx5e: Create advanced steering operation (ASO) object for MACsec")
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Handle case when module is unloaded (kfd_exit) before a process space
(mm_struct) is released.
v2: Fixed potential race conditions by removing all kfd_process from
the process table first, then working on releasing the resources.
v3: Fixed loop element access / synchronization. Fixed extra empty lines.
Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
As a temporary storage, staged_config[] in rdt_domain should be cleared
before and after it is used. The stale value in staged_config[] could
cause an MSR access error.
Here is a reproducer on a system with 16 usable CLOSIDs for a 15-way L3
Cache (MBA should be disabled if the number of CLOSIDs for MB is less than
16.) :
mount -t resctrl resctrl -o cdp /sys/fs/resctrl
mkdir /sys/fs/resctrl/p{1..7}
umount /sys/fs/resctrl/
mount -t resctrl resctrl /sys/fs/resctrl
mkdir /sys/fs/resctrl/p{1..8}
An error occurs when creating resource group named p8:
unchecked MSR access error: WRMSR to 0xca0 (tried to write 0x00000000000007ff) at rIP: 0xffffffff82249142 (cat_wrmsr+0x32/0x60)
Call Trace:
<IRQ>
__flush_smp_call_function_queue+0x11d/0x170
__sysvec_call_function+0x24/0xd0
sysvec_call_function+0x89/0xc0
</IRQ>
<TASK>
asm_sysvec_call_function+0x16/0x20
When creating a new resource control group, hardware will be configured
by the following process:
rdtgroup_mkdir()
rdtgroup_mkdir_ctrl_mon()
rdtgroup_init_alloc()
resctrl_arch_update_domains()
resctrl_arch_update_domains() iterates and updates all resctrl_conf_type
whose have_new_ctrl is true. Since staged_config[] holds the same values as
when CDP was enabled, it will continue to update the CDP_CODE and CDP_DATA
configurations. When group p8 is created, get_config_index() called in
resctrl_arch_update_domains() will return 16 and 17 as the CLOSIDs for
CDP_CODE and CDP_DATA, which will be translated to an invalid register -
0xca0 in this scenario.
Fix it by clearing staged_config[] before and after it is used.
[reinette: re-order commit tags]
Fixes: 75408e4350 ("x86/resctrl: Allow different CODE/DATA configurations to be staged")
Suggested-by: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/2fad13f49fbe89687fc40e9a5a61f23a28d1507a.1673988935.git.reinette.chatre%40intel.com
[Why]
In USB4 DP tunneling, it's possible to have this scenario that
the path becomes unavailable and CM tears down the path a little bit late.
So, in this case, the HPD is high but fails to read any DPCD register.
That causes the link connection type to be set to sst.
And not all sinks are removed behind the MST branch.
[How]
Restore the link connection type if it fails to read DPCD register.
Cc: stable@vger.kernel.org
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Cruise Hung <Cruise.Hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
Hot plugging and then hot unplugging leads to k1 and k2 values to
change, as signal is detected as a virtual signal on hot unplug. Writing
these values to OTG_PIXEL_RATE_DIV register might cause primary display
to blank (known hw bug).
[HOW]
No longer write k1 and k2 values to register if signal is virtual, we
have safe guards in place in the case that k1 and k2 is unassigned so
that an unknown value is not written to the register either.
Cc: stable@vger.kernel.org
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Samson Tam <Samson.Tam@amd.com>
Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Saaem Rizvi <SyedSaaem.Rizvi@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Events should only be added to a groups rb tree if they have not been
removed from their context by list_del_event(). Since remove_on_exec
made it possible to call list_del_event() on individual events before
they are detached from their group, perf_group_detach() should check each
sibling's attach_state before calling add_event_to_groups() on it.
Fixes: 2e498d0a74 ("perf: Add support for event removal on exec")
Signed-off-by: Budimir Markovic <markovicbudimir@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/ZBFzvQV9tEqoHEtH@gentoo
Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get
accurate time_enabled and time_running for an event. The difference between
ctx->timestamp and ctx->time is the among of time when the context is not
enabled. __update_context_time(ctx, false) is used to increase timestamp,
but not time. Therefore, it should only be called in ctx_sched_in() when
EVENT_TIME was not enabled.
Fixes: 09f5e7dc7a ("perf: Fix perf_event_read_local() time")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/r/20230313171608.298734-1-song@kernel.org
syzkaller reportes a KASAN issue with stack-out-of-bounds.
The call trace is as follows:
dump_stack+0x9c/0xd3
print_address_description.constprop.0+0x19/0x170
__kasan_report.cold+0x6c/0x84
kasan_report+0x3a/0x50
__perf_event_header__init_id+0x34/0x290
perf_event_header__init_id+0x48/0x60
perf_output_begin+0x4a4/0x560
perf_event_bpf_output+0x161/0x1e0
perf_iterate_sb_cpu+0x29e/0x340
perf_iterate_sb+0x4c/0xc0
perf_event_bpf_event+0x194/0x2c0
__bpf_prog_put.constprop.0+0x55/0xf0
__cls_bpf_delete_prog+0xea/0x120 [cls_bpf]
cls_bpf_delete_prog_work+0x1c/0x30 [cls_bpf]
process_one_work+0x3c2/0x730
worker_thread+0x93/0x650
kthread+0x1b8/0x210
ret_from_fork+0x1f/0x30
commit 267fb27352 ("perf: Reduce stack usage of perf_output_begin()")
use on-stack struct perf_sample_data of the caller function.
However, perf_event_bpf_output uses incorrect parameter to convert
small-sized data (struct perf_bpf_event) into large-sized data
(struct perf_sample_data), which causes memory overwriting occurs in
__perf_event_header__init_id.
Fixes: 267fb27352 ("perf: Reduce stack usage of perf_output_begin()")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230314044735.56551-1-yangjihong1@huawei.com
The space_info->active_total_bytes is no longer necessary as we now
count the region of newly allocated block group as zone_unusable. Drop
its usage.
Fixes: 6a921de589 ("btrfs: zoned: introduce space_info->active_total_bytes")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The naming of space_info->active_total_bytes is misleading. It counts
not only active block groups but also full ones which are previously
active but now inactive. That confusion results in a bug not counting
the full BGs into active_total_bytes on mount time.
For a background, there are three kinds of block groups in terms of
activation.
1. Block groups never activated
2. Block groups currently active
3. Block groups previously active and currently inactive (due to fully
written or zone finish)
What we really wanted to exclude from "total_bytes" is the total size of
BGs #1. They seem empty and allocatable but since they are not activated,
we cannot rely on them to do the space reservation.
And, since BGs #1 never get activated, they should have no "used",
"reserved" and "pinned" bytes.
OTOH, BGs #3 can be counted in the "total", since they are already full
we cannot allocate from them anyway. For them, "total_bytes == used +
reserved + pinned + zone_unusable" should hold.
Tracking #2 and #3 as "active_total_bytes" (current implementation) is
confusing. And, tracking #1 and subtract that properly from "total_bytes"
every time you need space reservation is cumbersome.
Instead, we can count the whole region of a newly allocated block group as
zone_unusable. Then, once that block group is activated, release
[0 .. zone_capacity] from the zone_unusable counters. With this, we can
eliminate the confusing ->active_total_bytes and the code will be common
among regular and the zoned mode. Also, no additional counter is needed
with this approach.
Fixes: 6a921de589 ("btrfs: zoned: introduce space_info->active_total_bytes")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We do
cache->space_info->counter += num_bytes;
everywhere in here. This is makes the lines longer than they need to
be, and will be especially noticeable when we add the active tracking in,
so add a temp variable for the space_info so this is cleaner.
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This flag only gets set when we're doing active zone tracking, and we're
going to need to use this flag for things related to this behavior.
Rename the flag to represent what it actually means for the file system
so it can be used in other ways and still make sense.
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_can_activate_zone() returns true if at least one device has one zone
available for activation. This is OK for the single profile, but not OK for
DUP profile. We need two zones to create a DUP block group. Fix it by
properly handling the case with the profile flags.
Fixes: 265f7237dd ("btrfs: zoned: allow DUP on meta-data block groups")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 1ec49744ba ("btrfs: turn on -Wmaybe-uninitialized") exposed
that on SPARC and PA-RISC, gcc is unaware that fscrypt_setup_filename()
only returns negative error values or 0. This ultimately results in a
maybe-uninitialized warning in btrfs_lookup_dentry().
Change to only return negative error values or 0 from
fscrypt_setup_filename() at the relevant call site, and assert that no
positive error codes are returned (which would have wider implications
involving other users).
Reported-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/all/481b19b5-83a0-4793-b4fd-194ad7b978c3@roeck-us.net/
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
During my scrub rework, I did a stupid thing like this:
bio->bi_iter.bi_sector = stripe->logical;
btrfs_submit_bio(fs_info, bio, stripe->mirror_num);
Above bi_sector assignment is using logical address directly, which
lacks ">> SECTOR_SHIFT".
This results a read on a range which has no chunk mapping.
This results the following crash:
BTRFS critical (device dm-1): unable to find logical 11274289152 length 65536
assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387
Sure this is all my fault, but this shows a possible problem in real
world, that some bit flip in file extents/tree block can point to
unmapped ranges, and trigger above ASSERT(), or if CONFIG_BTRFS_ASSERT
is not configured, cause invalid pointer access.
[PROBLEMS]
In the above call chain, we just don't handle the possible error from
btrfs_get_chunk_map() inside __btrfs_map_block().
[FIX]
The fix is straightforward, replace the ASSERT() with proper error
handling (callers handle errors already).
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull kselftest fixes from Shuah Khan:
"A fix to amd-pstate test Makefile and a fix to LLVM build for x86 in
kselftest common lib.mk"
* tag 'linux-kselftest-fixes-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: fix LLVM build for i386 and x86_64
selftests: amd-pstate: fix TEST_FILES
Pull MD fixes from Song:
"This set contains two fixes for old issues (by Neil) and one fix
for 6.3 (by Xiao)."
* 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md: select BLOCK_LEGACY_AUTOLOAD
md: avoid signed overflow in slot_store()
md: Free resources in __md_stop
When BLOCK_LEGACY_AUTOLOAD is not enable, mdadm is not able to
activate new arrays unless "CREATE names=yes" appears in
mdadm.conf
As this is a regression we need to always enable BLOCK_LEGACY_AUTOLOAD
for when MD is selected - at least until mdadm is updated and the
updates widely available.
Cc: stable@vger.kernel.org # v5.18+
Fixes: fbdee71bb5 ("block: deprecate autoloading based on dev_t")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Song Liu <song@kernel.org>
While using iostat for raid, I observed very strange 'await'
occasionally, and turns out it's due to that 'ios' and 'sectors' is
counted in bdev_start_io_acct(), while 'nsecs' is counted in
bdev_end_io_acct(). I'm not sure why they are ccounted like that
but I think this behaviour is obviously wrong because user will get
wrong disk stats.
Fix the problem by counting 'ios' and 'sectors' when io is done, like
what rq-based device does.
Fixes: 394ffa503b ("blk: introduce generic io stat accounting help function")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230223091226.1135678-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In general, if swiotlb is sufficient, the logic of index =
wrap_area_index(mem, index + 1) is fine, it will quickly take a slot and
release the area->lock; But if swiotlb is insufficient and the device
has min_align_mask requirements, such as NVME, we may not be able to
satisfy index == wrap and exit the loop properly. In this case, other
kernel threads will not be able to acquire the area->lock and release
the slot, resulting in a deadlock.
The current implementation of wrap_area_index does not involve a modulo
operation, so adjusting the wrap to ensure the loop ends is not trivial.
Introduce a new variable to record the number of loops and exit the loop
after completing the traversal.
Backtraces:
Other CPUs are waiting this core to exit the swiotlb_do_find_slots
loop.
[10199.924391] RIP: 0010:swiotlb_do_find_slots+0x1fe/0x3e0
[10199.924403] Call Trace:
[10199.924404] <TASK>
[10199.924405] swiotlb_tbl_map_single+0xec/0x1f0
[10199.924407] swiotlb_map+0x5c/0x260
[10199.924409] ? nvme_pci_setup_prps+0x1ed/0x340
[10199.924411] dma_direct_map_page+0x12e/0x1c0
[10199.924413] nvme_map_data+0x304/0x370
[10199.924415] nvme_prep_rq.part.0+0x31/0x120
[10199.924417] nvme_queue_rq+0x77/0x1f0
...
[ 9639.596311] NMI backtrace for cpu 48
[ 9639.596336] Call Trace:
[ 9639.596337]
[ 9639.596338] _raw_spin_lock_irqsave+0x37/0x40
[ 9639.596341] swiotlb_do_find_slots+0xef/0x3e0
[ 9639.596344] swiotlb_tbl_map_single+0xec/0x1f0
[ 9639.596347] swiotlb_map+0x5c/0x260
[ 9639.596349] dma_direct_map_sg+0x7a/0x280
[ 9639.596352] __dma_map_sg_attrs+0x30/0x70
[ 9639.596355] dma_map_sgtable+0x1d/0x30
[ 9639.596356] nvme_map_data+0xce/0x370
...
[ 9639.595665] NMI backtrace for cpu 50
[ 9639.595682] Call Trace:
[ 9639.595682]
[ 9639.595683] _raw_spin_lock_irqsave+0x37/0x40
[ 9639.595686] swiotlb_release_slots.isra.0+0x86/0x180
[ 9639.595688] dma_direct_unmap_sg+0xcf/0x1a0
[ 9639.595690] nvme_unmap_data.part.0+0x43/0xc0
Fixes: 1f221a0d0d ("swiotlb: respect min_align_mask")
Signed-off-by: GuoRui.Yu <GuoRui.Yu@linux.alibaba.com>
Signed-off-by: Xiaokang Hu <xiaokang.hxk@alibaba-inc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
An nvme target ->queue_response() operation implementation may free the
request passed as argument. Such implementation potentially could result
in a use after free of the request pointer when percpu_ref_put() is
called in nvmet_req_complete().
Avoid such problem by using a local variable to save the sq pointer
before calling __nvmet_req_complete(), thus avoiding dereferencing the
req pointer after that function call.
Fixes: a07b4970f4 ("nvmet: add a generic NVMe target")
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
We have more commands to show in the trace. Sync up.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Make sure that we don't somehow mess up the wire structures in the spec.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kkch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
For non in-capsule writes we reuse the request pdu space for a h2cdata
pdu in order to avoid over allocating space (either preallocate or
dynamically upon receving an r2t pdu). However if the request times out
the core expects to find the opcode in the start of the request, which
we override.
In order to prevent that, without sacrificing additional 24 bytes per
request, we just use the tail of the command pdu space instead (last
24 bytes from the 72 bytes command pdu). That should make the command
opcode always available, and we get away from allocating more space.
If in the future we would need the last 24 bytes of the nvme command
available we would need to allocate a dedicated space for it in the
request, but until then we can avoid doing so.
Reported-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kkch@nvidia.com>
Tested-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
In case the nvme_probe teardown path is triggered the ctrl ref count does
not reach 0 thus creating a memory leak upon failure of nvme_probe.
Signed-off-by: Irvin Cote <irvincoteg@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When investigating one customer report on warning in nvme_setup_discard,
we observed the controller(nvme/tcp) actually exposes
queue_max_discard_segments(req->q) == 1.
Obviously the current code can't handle this situation, since contiguity
merge like normal RW request is taken.
Fix the issue by building range from request sector/nr_sectors directly.
Fixes: b35ba01ea6 ("nvme: support ranged discard requests")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The T: entries shall be composed of a SCM tree type (git, hg, quilt, stgit
or topgit) and location.
Add the SCM tree type to the T: entry, and reorder the file entries in
alphabetical order.
Fixes: b508fc354f ("nvme: update maintainers information")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When injecting a fake timeout into the null_blk driver using
fail_io_timeout, the request timeout handler does not execute
blk_mq_complete_request(), so the complete callback is never executed
for a timedout request.
The null_blk driver also has a driver-specific fake timeout mechanism
which does not have this problem. Fix the problem with fail_io_timeout
by using the same meachanism as null_blk internal timeout feature, using
the fake_timeout field of null_blk commands.
Reported-by: Akinobu Mita <akinobu.mita@gmail.com>
Fixes: de3510e52b ("null_blk: fix command timeout completion handling")
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20230314041106.19173-2-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The driver can be compile tested with !CONFIG_OF making certain data
unused:
drivers/net/wireless/marvell/mwifiex/sdio.c:498:34: error: ‘mwifiex_sdio_of_match_table’ defined but not used [-Werror=unused-const-variable=]
drivers/net/wireless/marvell/mwifiex/pcie.c:175:34: error: ‘mwifiex_pcie_of_match_table’ defined but not used [-Werror=unused-const-variable=]
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230312132523.352182-1-krzysztof.kozlowski@linaro.org
To support detection of read faults with Radix execute-only memory, the
vma_is_accessible() check in access_error() (which checks for PROT_NONE)
was replaced with a check to see if VM_READ was missing, and if so,
returns true to assert the fault was caused by a bad read.
This is incorrect, as it ignores that both VM_WRITE and VM_EXEC imply
read on powerpc, as defined in protection_map[]. This causes mappings
containing VM_WRITE or VM_EXEC without VM_READ to misreport the cause of
page faults, since the MMU is still allowing reads.
Correct this by restoring the original vma_is_accessible() check for
PROT_NONE mappings, and adding a separate check for Radix PROT_EXEC-only
mappings.
Fixes: 395cac7752 ("powerpc/mm: Support execute-only memory on the Radix MMU")
Reported-by: Michal Suchánek <msuchanek@suse.de>
Link: https://lore.kernel.org/r/20230308152702.GR19419@kitsune.suse.cz
Tested-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230310050834.63105-1-ruscur@russell.cc
Daniel Golle says:
====================
net: ethernet: mtk_eth_soc: minor SGMII fixes
This small series brings two minor fixes for the SGMII unit found in
MediaTek's router SoCs.
The first patch resets the PCS internal state machine on major
configuration changes, just like it is also done in MediaTek's SDK.
The second patch makes sure we only write values and restart AN if
actually needed, thus preventing unnesseray loss of an existing link
in some cases.
Both patches have previously been submitted as part of the series
"net: ethernet: mtk_eth_soc: various enhancements" which grew a bit
too big and it has correctly been criticized that some of the patches
should rather go as fixes to net-next.
This new series tries to address this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Only restart auto-negotiation and write link timer if actually
necessary. This prevents losing the link in case of minor
changes.
Fixes: 7e53837269 ("net: ethernet: mediatek: Re-add support SGMII")
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reset the internal PCS state machine when changing interface mode.
This prevents confusing the state machine when changing interface
modes, e.g. from SGMII to 2500Base-X or vice-versa.
Fixes: 7e53837269 ("net: ethernet: mediatek: Re-add support SGMII")
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packet length retrieved from skb data may be larger than
the actual socket buffer length (up to 9026 bytes). In such
case the cloned skb passed up the network stack will leak
kernel memory contents.
Fixes: d0cad87170 ("smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver")
Signed-off-by: Szymon Heidrich <szymon.heidrich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wenjia Zhang says:
====================
net/smc: Fixes 2023-03-01
The 1st patch solves the problem that CLC message initialization was
not properly reversed in error handling path. And the 2nd one fixes
the possible deadlock triggered by cancel_delayed_work_sync().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Spectrum ASICs have a configurable limit on how deep into the packet
they parse. By default, the limit is 96 bytes.
There are several cases where this parsing depth is not enough and there
is a need to increase it. For example, timestamping of PTP packets and a
FIB multipath hash policy that requires hashing on inner fields. The
driver therefore maintains a reference count that reflects the number of
consumers that require an increased parsing depth.
During reload_down() the parsing depth reference count does not
necessarily drop to zero, but the parsing depth itself is restored to
the default during reload_up() when the firmware is reset. It is
therefore possible to end up in situations where the driver thinks that
the parsing depth was increased (reference count is non-zero), when it
is not.
Fix by making sure that all the consumers that increase the parsing
depth reference count also decrease it during reload_down().
Specifically, make sure that when the routing code is de-initialized it
drops the reference count if it was increased because of a FIB multipath
hash policy that requires hashing on inner fields.
Add a warning if the reference count is not zero after the driver was
de-initialized and explicitly reset it to zero during initialization for
good measures.
Fixes: 2d91f0803b ("mlxsw: spectrum: Add infrastructure for parsing configuration")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/9c35e1b3e6c1d8f319a2449d14e2b86373f3b3ba.1678727526.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This bug influences both st_nci_i2c_remove and st_nci_spi_remove.
Take st_nci_i2c_remove as an example.
In st_nci_i2c_probe, it called ndlc_probe and bound &ndlc->sm_work
with llt_ndlc_sm_work.
When it calls ndlc_recv or timeout handler, it will finally call
schedule_work to start the work.
When we call st_nci_i2c_remove to remove the driver, there
may be a sequence as follows:
Fix it by finishing the work before cleanup in ndlc_remove
CPU0 CPU1
|llt_ndlc_sm_work
st_nci_i2c_remove |
ndlc_remove |
st_nci_remove |
nci_free_device|
kfree(ndev) |
//free ndlc->ndev |
|llt_ndlc_rcv_queue
|nci_recv_frame
|//use ndlc->ndev
Fixes: 35630df68d ("NFC: st21nfcb: Add driver for STMicroelectronics ST21NFCB NFC chip")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230312160837.2040857-1-zyytlz.wz@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test checks if (IPv4, IPv6) address pair properly conflict or not.
* IPv4
* 0.0.0.0
* 127.0.0.1
* IPv6
* ::
* ::1
If the IPv6 address is [::], the second bind() always fails.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use dh_listpackages to get a list of all binary packages.
With this, debian/control lists which binary packages will be produced.
Previously, ARCH=um listed linux-libc-dev in debian/control, but it
was not generated because each of mkdebian and builddeb independently
maintained the if-conditionals.
Another motivation is to allow scripts/package/builddeb to get the
package name (linux-image-*, etc.) dynamically from debian/control.
This will also allow the BuildProfile to control the generation of
the binary packages.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Commit 3ab18a625c ("kbuild: deb-pkg: improve the usability of source
package") set needless CROSS_COMPILE.
For example, 'make allnoconfig bindeb-pkg' on a x86_64 system will set
CROSS_COMPILE=i686-linux-gnu-, where the biarch compiler 'gcc' should
work for building the i386 kernel.
$ uname -m
x86_64
$ make allnoconfig bindeb-pkg >/dev/null
dpkg-architecture: warning: specified GNU system type i686-linux-gnu does not match CC system type x86_64-linux-gnu, try setting a correct CC environment variable
dpkg-source --before-build .
debian/rules binary
scripts/Kconfig.include:39: C compiler 'i686-linux-gnu-gcc' not found
make[6]: *** [scripts/kconfig/Makefile:77: olddefconfig] Error 1
make[5]: *** [Makefile:693: olddefconfig] Error 2
make[4]: *** [Makefile:358: __build_one_by_one] Error 2
make[3]: *** [debian/rules:7: build-arch] Error 2
dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2
make[2]: *** [scripts/Makefile.package:127: bindeb-pkg] Error 2
make[1]: *** [Makefile:1657: bindeb-pkg] Error 2
make: *** [Makefile:358: __build_one_by_one] Error 2
Check whether CROSS_COMPILE is defined, instead of whether it is non-empty.
If you invoke debian/rules via Kbuild, CROSS_COMPILE is always defined
in the top Makefile.
Fixes: 3ab18a625c ("kbuild: deb-pkg: improve the usability of source package")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
KERNELRELEASE does not need to match the package version in changelog.
Rather, it conventially matches what is called 'ABINAME', which is a
part of the binary package names.
Both are the same by default, but the former might be overridden by
KDEB_PKGVERSION. In this case, the resulting package would not boot
because /lib/modules/$(uname -r) does not point the module directory.
Partially revert 3ab18a625c ("kbuild: deb-pkg: improve the usability
of source package").
Reported-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Fixes: 3ab18a625c ("kbuild: deb-pkg: improve the usability of source package")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Since commit c5bf2efb05 ("kbuild: deb-pkg: fix binary-arch and clean
in debian/rules"), the source package generated by 'make deb-pkg' fails
to build.
I terribly missed the fact that the intdeb-pkg target may regenerate
include/config/kernel.release due to the following in the top Makefile:
%pkg: include/config/kernel.release FORCE
Restore KERNELRELEASE= option to avoid the kernel.release disagreement
between build-arch and binary-arch.
Fixes: c5bf2efb05 ("kbuild: deb-pkg: fix binary-arch and clean in debian/rules")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
That commit required the use of filechk_kernel.release for the
kernelrelease Makefile target. It is currently only being set when
KBUILD_EXTMOD is not set. Make sure it is set in that case as well.
Fixes: 1cb86b6c31 ("kbuild: save overridden KERNELRELEASE in include/config/kernel.release")
Signed-off-by: Tzafrir Cohen <nvidia@cohens.org.il>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Use DFS root session whenever possible to get new DFS referrals
otherwise we might end up with an IPC tcon (tcon->ses->tcon_ipc) that
doesn't respond to them. It should be safe accessing
@ses->dfs_root_ses directly in cifs_inval_name_dfs_link_error() as it
has same lifetime as of @tcon.
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org # 6.2
Signed-off-by: Steve French <stfrench@microsoft.com>
Return the DFS root session id in /proc/fs/cifs/DebugData to make it
easier to track which IPC tcon was used to get new DFS referrals for a
specific connection, and aids in debugging.
A simple output of it would be
Sessions:
1) Address: 192.168.1.13 Uses: 1 Capability: 0x300067 Session Status: 1
Security type: RawNTLMSSP SessionId: 0xd80000000009
User: 0 Cred User: 0
DFS root session id: 0x128006c000035
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org # 6.2
Signed-off-by: Steve French <stfrench@microsoft.com>
The getaffinity() system call uses 'cpumask_size()' to decide how big
the CPU mask is - so far so good. It is indeed the allocation size of a
cpumask.
But the code also assumes that the whole allocation is initialized
without actually doing so itself. That's wrong, because we might have
fixed-size allocations (making copying and clearing more efficient), but
not all of it is then necessarily used if 'nr_cpu_ids' is smaller.
Having checked other users of 'cpumask_size()', they all seem to be ok,
either using it purely for the allocation size, or explicitly zeroing
the cpumask before using the size in bytes to copy it.
See for example the ublk_ctrl_get_queue_affinity() function that uses
the proper 'zalloc_cpumask_var()' to make sure that the whole mask is
cleared, whether the storage is on the stack or if it was an external
allocation.
Fix this by just zeroing the allocation before using it. Do the same
for the compat version of sched_getaffinity(), which had the same logic.
Also, for consistency, make sched_getaffinity() use 'cpumask_bits()' to
access the bits. For a cpumask_var_t, it ends up being a pointer to the
same data either way, but it's just a good idea to treat it like you
would a 'cpumask_t'. The compat case already did that.
Reported-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/lkml/7d026744-6bd6-6827-0471-b5e8eae0be3f@arm.com/
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Set the DFS root session pointer earlier when creating a new SMB
session to prevent racing with smb2_reconnect(), cifs_reconnect_tcon()
and DFS cache refresher.
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org # 6.2
Signed-off-by: Steve French <stfrench@microsoft.com>
When adapter is not found, pi->disconnect() is called without previous
pi->connect(). This results in error like this:
parport0: pata_parport tried to release parport when not owner
Add missing out_disconnect label and use it correctly.
Signed-off-by: Ondrej Zary <linux@zary.sk>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
The 'q' parameter of the macro __blk_mq_run_dispatch_ops may not be one
local variable, such as, it is rq->q, then request queue pointed by
this variable could be changed to another queue in case of
BLK_MQ_F_TAG_QUEUE_SHARED after 'dispatch_ops' returns, then
'bad unlock balance' is triggered.
Fixes the issue by adding one local variable for doing srcu lock/unlock.
Fixes: 2a904d0085 ("blk-mq: remove hctx_lock and hctx_unlock")
Cc: Marco Patalano <mpatalan@redhat.com>
Signed-off-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230310010913.1014789-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
do_req_filebacked() calls blk_mq_complete_request() synchronously or
asynchronously when using asynchronous I/O unless memory allocation fails.
Hence, modify loop_handle_cmd() such that it does not dereference 'cmd' nor
'rq' after do_req_filebacked() finished unless we are sure that the request
has not yet been completed. This patch fixes the following kernel crash:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000054
Call trace:
css_put.42938+0x1c/0x1ac
loop_process_work+0xc8c/0xfd4
loop_rootcg_workfn+0x24/0x34
process_one_work+0x244/0x558
worker_thread+0x400/0x8fc
kthread+0x16c/0x1e0
ret_from_fork+0x10/0x20
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
Fixes: c74d40e8b5 ("loop: charge i/o to mem and blk cg")
Fixes: bc07c10a36 ("block: loop: support DIO & AIO")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230314182155.80625-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull misc fixes from Andrew Morton:
"Eleven hotfixes.
Four of these are cc:stable and the remainder address post-6.2 issues
or aren't considered suitable for backporting.
Seven of these fixes are for MM"
* tag 'mm-hotfixes-stable-2023-03-14-16-51' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/damon/paddr: fix folio_nr_pages() after folio_put() in damon_pa_mark_accessed_or_deactivate()
mm/damon/paddr: fix folio_size() call after folio_put() in damon_pa_young()
ocfs2: fix data corruption after failed write
migrate_pages: try migrate in batch asynchronously firstly
migrate_pages: move split folios processing out of migrate_pages_batch()
migrate_pages: fix deadlock in batched migration
.mailmap: add Alexandre Ghiti personal email address
mailmap: correct Dikshita Agarwal's Qualcomm email address
mailmap: updates for Jarkko Sakkinen
mm/userfaultfd: propagate uffd-wp bit when PTE-mapping the huge zeropage
mm: teach mincore_hugetlb about pte markers
Pull tracing fixes from Steven Rostedt:
- Do not allow histogram values to have modifies. They can cause a NULL
pointer dereference if they do.
- Warn if hist_field_name() is passed a NULL. Prevent the NULL pointer
dereference mentioned above.
- Fix invalid address look up race in lookup_rec()
- Define ftrace_stub_graph conditionally to prevent linker errors
- Always check if RCU is watching at all tracepoint locations
* tag 'trace-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Make tracepoint lockdep check actually test something
ftrace,kcfi: Define ftrace_stub_graph conditionally
ftrace: Fix invalid address access in lookup_rec() when index is 0
tracing: Check field value in hist_field_name()
tracing: Do not let histogram values have some modifiers
Pull zstd fixes from Nick Terrell:
"A small number of fixes for zstd-v1.5.2.
I'm not pulling in zstd-v1.5.4 from upstream this release because it
didn't have any time to bake in linux-next, but I'm aiming for the
next update in v6.4"
* tag 'zstd-linus-v6.3-rc3' of https://github.com/terrelln/linux:
zstd: Fix definition of assert()
lib: zstd: Backport fix for in-place decompression
lib: zstd: Fix -Wstringop-overflow warning
Pull clk fixes from Stephen Boyd:
"A collection of clk driver fixes, and a couple OF clk patches to fix
regressions seen in the last few weeks. The fwnode patch broke the
build for one driver that isn't always compiled, so I waited over the
weekend to be certain no more build issues came up.
- Mark the firmware node (fwnode) that matches the compatible in
CLK_OF_DECLARE() as initialized to fix a regression on u8500 SoCs
after fw_devlink stopped checking parent nodes in
of_link_to_phandle()
- Remove a couple MODULE_LICENSE macros in non-modules
- Update the maintainers file for Microchip clk drivers
- Use 'select' instead of 'depend on' for the REGMAP config to fix
Kconfig issues
- Use div_u64() for portable 64-bit division in K210 clk driver"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: Avoid invalid function names in CLK_OF_DECLARE()
clk: k210: remove an implicit 64-bit division
MAINTAINERS: add missing clock driver coverage for Microchip FPGAs
clk: HI655X: select REGMAP instead of depending on it
kbuild, clk: remove MODULE_LICENSE in non-modules
kbuild, clk: bcm2835: remove MODULE_LICENSE in non-modules
clk: Mark a fwnode as initialized when using CLK_OF_DECLARE() macro
WQ_UNBOUND causes significant scheduler latency on ARM64/Android. This
is problematic for latency sensitive workloads, like I/O
post-processing.
Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related
scheduler latency and improves app cold startup times by ~30ms.
WQ_UNBOUND was also removed from the dm-verity workqueue for the same
reason [1].
This code was tested by running Android app startup benchmarks and
measuring how long the fsverity workqueue spent in the runnable state.
Before
Total workqueue scheduler latency: 553800us
After
Total workqueue scheduler latency: 18962us
[1]: https://lore.kernel.org/all/20230202012348.885402-1-nhuck@google.com/
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Fixes: 8a1d0f9cac ("fs-verity: add data verification hooks for ->readpages()")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230310193325.620493-1-nhuck@google.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Before my changes to how multichannel reconnects work, the
primary channel was always used to do a non-binding session
setup. With my changes, that is not the case anymore.
Missed this place where channel at index 0 was forcibly
updated with the signing key.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
If cifs_get_writable_path() finds a writable file, smb2_compound_op()
must use that file's FID and not the COMPOUND_FID.
Cc: stable@vger.kernel.org
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
A while ago where the trace events had the following:
rcu_read_lock_sched_notrace();
rcu_dereference_sched(...);
rcu_read_unlock_sched_notrace();
If the tracepoint is enabled, it could trigger RCU issues if called in
the wrong place. And this warning was only triggered if lockdep was
enabled. If the tracepoint was never enabled with lockdep, the bug would
not be caught. To handle this, the above sequence was done when lockdep
was enabled regardless if the tracepoint was enabled or not (although the
always enabled code really didn't do anything, it would still trigger a
warning).
But a lot has changed since that lockdep code was added. One is, that
sequence no longer triggers any warning. Another is, the tracepoint when
enabled doesn't even do that sequence anymore.
The main check we care about today is whether RCU is "watching" or not.
So if lockdep is enabled, always check if rcu_is_watching() which will
trigger a warning if it is not (tracepoints require RCU to be watching).
Note, that old sequence did add a bit of overhead when lockdep was enabled,
and with the latest kernel updates, would cause the system to slow down
enough to trigger kernel "stalled" warnings.
Link: http://lore.kernel.org/lkml/20140806181801.GA4605@redhat.com
Link: http://lore.kernel.org/lkml/20140807175204.C257CAC5@viggo.jf.intel.com
Link: https://lore.kernel.org/lkml/20230307184645.521db5c9@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20230310172856.77406446@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Fixes: e6753f23d9 ("tracepoint: Make rcuidle tracepoint callers use SRCU")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The user provides arbitrary non-numeic value to level and type,
which could bring unexpected behavior. In this case the expected
behavior would be to throw an error.
pfrut -h
usage: pfrut [OPTIONS]
code injection:
-l, --load
-s, --stage
-a, --activate
-u, --update [stage and activate]
-q, --query
-d, --revid
update telemetry:
-G, --getloginfo
-T, --type(0:execution, 1:history)
-L, --level(0, 1, 2, 4)
-R, --read
-D, --revid log
pfrut -T A
pfrut -G
log_level:0
log_type:0
log_revid:2
max_data_size:65536
chunk1_size:0
chunk2_size:1530
rollover_cnt:0
reset_cnt:17
Fix this by restricting the input to be in the expected range.
Reported-by: Hariganesh Govindarajulu <hariganesh.govindarajulu@intel.com>
Suggested-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 0c80f9e165 ("ACPI: PPTT: Leave the table mapped for the runtime usage")
enabled to map PPTT once on the first invocation of acpi_get_pptt() and
never unmapped the same allowing it to be used at runtime with out the
hassle of mapping and unmapping the table. This was needed to fetch LLC
information from the PPTT in the cpuhotplug path which is executed in
the atomic context as the acpi_get_table() might sleep waiting for a
mutex.
However it missed to handle the case when there is no PPTT on the system
which results in acpi_get_pptt() being called from all the secondary
CPUs attempting to fetch the LLC information in the atomic context
without knowing the absence of PPTT resulting in the splat like below:
| BUG: sleeping function called from invalid context at kernel/locking/semaphore.c:164
| in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
| preempt_count: 1, expected: 0
| RCU nest depth: 0, expected: 0
| no locks held by swapper/1/0.
| irq event stamp: 0
| hardirqs last enabled at (0): 0x0
| hardirqs last disabled at (0): copy_process+0x61c/0x1b40
| softirqs last enabled at (0): copy_process+0x61c/0x1b40
| softirqs last disabled at (0): 0x0
| CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.3.0-rc1 #1
| Call trace:
| dump_backtrace+0xac/0x138
| show_stack+0x30/0x48
| dump_stack_lvl+0x60/0xb0
| dump_stack+0x18/0x28
| __might_resched+0x160/0x270
| __might_sleep+0x58/0xb0
| down_timeout+0x34/0x98
| acpi_os_wait_semaphore+0x7c/0xc0
| acpi_ut_acquire_mutex+0x58/0x108
| acpi_get_table+0x40/0xe8
| acpi_get_pptt+0x48/0xa0
| acpi_get_cache_info+0x38/0x140
| init_cache_level+0xf4/0x118
| detect_cache_attributes+0x2e4/0x640
| update_siblings_masks+0x3c/0x330
| store_cpu_topology+0x88/0xf0
| secondary_start_kernel+0xd0/0x168
| __secondary_switched+0xb8/0xc0
Update acpi_get_pptt() to consider the fact that PPTT is once checked and
is not available on the system and return NULL avoiding any attempts to
fetch PPTT and thereby avoiding any possible sleep waiting for a mutex
in the atomic context.
Fixes: 0c80f9e165 ("ACPI: PPTT: Leave the table mapped for the runtime usage")
Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Pierre Gondois <pierre.gondois@arm.com>
Cc: 6.0+ <stable@vger.kernel.org> # 6.0+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When the user's login time is newer than the cache's timestamp,
the original entry in the RB-tree will be replaced by a new entry.
Currently, the timestamp is only set if the entry is not found in
the RB-tree, which can cause the timestamp to be undefined when
the entry exists. This may result in a significant increase in
ACCESS operations if the timestamp is set to zero.
Signed-off-by: Chengen Du <chengen.du@canonical.com>
Fixes: 0eb43812c0 ("NFS: Clear the file access cache upon login”)
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Pull documentation fixes from Jonathan Corbet:
"A handful of fixes and minor documentation updates"
* tag 'docs-6.3-fixes' of git://git.lwn.net/linux:
docs: vfio: fix header path
docs: process: typo fix
docs/mm: hugetlbfs_reserv: fix a reference to a file that doesn't exist
docs/mm: Physical Memory: fix a reference to a file that doesn't exist
docs: rebasing-and-merging: Drop wrong statement about git
docs: programming-language: add Rust programming language section
docs: programming-language: remove mention of the Intel compiler
docs: Correct missing "d_" prefix for dentry_operations member d_weak_revalidate
sched/doc: supplement CPU capacity with RISC-V
Commit 6930bcbfb6 dropped the setting of the file_lock range when
decoding a nlm_lock off the wire. This causes the client side grant
callback to miss matching blocks and reject the lock, only to rerequest
it 30s later.
Add a helper function to set the file_lock range from the start and end
values that the protocol uses, and have the nlm_lock decoder call that to
set up the file_lock args properly.
Fixes: 6930bcbfb6 ("lockd: detect and reject lock arguments that overflow")
Reported-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Amir Goldstein <amir73il@gmail.com>
Cc: stable@vger.kernel.org #6.0
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
A regression has occurred in the hid-sensor code where a device
name string has not been initialized to 0, and ends up without
a NULL char and is printed with %s. This includes random binary
data in the device name, which makes its way into the ftrace output
and ends up crashing sleepgraph because it expects the ftrace output
to be ASCII only.
For example: "HID-SENSOR-INT-020b?.39.auto" ends up in ftrace instead
of "HID-SENSOR-INT-020b.39.auto". It causes this crash in sleepgraph:
File "/usr/bin/sleepgraph", line 5579, in executeSuspend
for line in fp:
File "/usr/lib/python3.10/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position
1568: invalid start byte
The issue is present in 6.3-rc1 and is described in full here:
https://bugzilla.kernel.org/show_bug.cgi?id=217169
A separate fix has been submitted to have this issue repaired, but
it has also exposed a larger bug in sleepgraph, since nothing should
make sleepgraph crash. Sleepgraph needs to be able to handle binary
data showing up in ftrace gracefully.
Modify the ftrace processing code to treat it as potentially binary
and to filter out binary data and leave just the ASCII.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217169
Fixes: 98c062e824 ("HID: hid-sensor-custom: Allow more custom iio sensors")
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 21a3e6eed4 ("ARM: omap1: remove osk-mistral add-on board
support") removed the platform_device definition for the "lcd_osk"
device, so this driver is now unused and can be removed as well.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Helge Deller <deller@gmx.de>
According to commit 890cc39a87 ("drivers: provide
devm_platform_get_and_ioremap_resource()"), convert
platform_get_resource(), devm_ioremap_resource() to a single
call to devm_platform_get_and_ioremap_resource(), as this is exactly
what this function does.
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Helge Deller <deller@gmx.de>
According to commit 7945f929f1 ("drivers: provide
devm_platform_ioremap_resource()"), convert platform_get_resource(),
devm_ioremap_resource() to a single call to Use
devm_platform_ioremap_resource(), as this is exactly what this function
does.
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Helge Deller <deller@gmx.de>
According to commit 890cc39a87 ("drivers: provide
devm_platform_get_and_ioremap_resource()"), convert
platform_get_resource(), devm_ioremap_resource() to a single
call to devm_platform_get_and_ioremap_resource(), as this is exactly
what this function does.
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Helge Deller <deller@gmx.de>
It is preferred to use typed property access functions (i.e.
of_property_read_<type> functions) rather than low-level
of_get_property/of_find_property functions for reading properties.
Convert reading boolean properties to to of_property_read_bool().
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>
According to commit 890cc39a87 ("drivers: provide
devm_platform_get_and_ioremap_resource()"), convert
platform_get_resource(), devm_ioremap_resource() to a single call to
devm_platform_get_and_ioremap_resource(), as this is exactly what this
function does.
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Helge Deller <deller@gmx.de>
fb_set_var would by called when user invokes ioctl with cmd
FBIOPUT_VSCREENINFO. User-provided data would finally reach
tgafb_check_var. In case var->pixclock is assigned to zero,
divide by zero would occur when checking whether reciprocal
of var->pixclock is too high.
Similar crashes have happened in other fbdev drivers. There
is no check and modification on var->pixclock along the call
chain to tgafb_check_var. We believe it could also be triggered
in driver tgafb from user site.
Signed-off-by: Wei Chen <harperchen1110@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
This was triggered by the fact that the webpage:
http://www.winischhofer.net/linuxsisvga.shtml
cannot be reached anymore.
Thomas Winischhofer is still reachable at the given email address, but he
has not been active since 2005.
Mark the SIS FRAMEBUFFER DRIVER as orphan to reflect the current state.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Helge Deller <deller@gmx.de>
This cleans up the indentation according to the Linux kernel coding
style, and should fix the warning created by the kernel test robot.
Fixes: 8b08cf2b64 ("OMAP: add TI OMAP framebuffer driver")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Lucy Mielke <mielkesteven@icloud.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Prior to commit 8786fde842 ("Convert NFS from readpages to
readahead"), nfs_readpages() used the old mm interface read_cache_pages()
which called task_io_account_read() for each NFS page read. After
this commit, nfs_readpages() is converted to nfs_readahead(), which
now uses the new mm interface readahead_page(). The new interface
requires callers to call task_io_account_read() themselves.
In addition, to nfs_readahead() task_io_account_read() should also
be called from nfs_read_folio().
Fixes: 8786fde842 ("Convert NFS from readpages to readahead")
Link: https://lore.kernel.org/linux-nfs/CAPt2mGNEYUk5u8V4abe=5MM5msZqmvzCVrtCP4Qw1n=gCHCnww@mail.gmail.com/
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Read mmu_invalidate_seq before dropping the mmap_lock so that KVM can
detect if the results of vma_lookup() (e.g. vma_shift) become stale
before it acquires kvm->mmu_lock. This fixes a theoretical bug where a
VMA could be changed by userspace after vma_lookup() and before KVM
reads the mmu_invalidate_seq, causing KVM to install page table entries
based on a (possibly) no-longer-valid vma_shift.
Re-order the MMU cache top-up to earlier in user_mem_abort() so that it
is not done after KVM has read mmu_invalidate_seq (i.e. so as to avoid
inducing spurious fault retries).
This bug has existed since KVM/ARM's inception. It's unlikely that any
sane userspace currently modifies VMAs in such a way as to trigger this
race. And even with directed testing I was unable to reproduce it. But a
sufficiently motivated host userspace might be able to exploit this
race.
Fixes: 94f8e6418d ("KVM: ARM: Handle guest faults in KVM")
Cc: stable@vger.kernel.org
Reported-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230313235454.2964067-1-dmatlack@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Merge series from Bard Liao <yung-chuan.liao@linux.intel.com>:
Adding Intel 'Rooks County' NUC M15 support. To support 'Rooks County', we
also need the "soundwire: dmi-quirks: add remapping for Intel 'Rooks
County'" patch.
tuning_ctl_set() might have buffer overrun at (X) if it didn't break
from loop by matching (A).
static int tuning_ctl_set(...)
{
for (i = 0; i < TUNING_CTLS_COUNT; i++)
(A) if (nid == ca0132_tuning_ctls[i].nid)
break;
snd_hda_power_up(...);
(X) dspio_set_param(..., ca0132_tuning_ctls[i].mid, ...);
snd_hda_power_down(...); ^
return 1;
}
We will get below error by cppcheck
sound/pci/hda/patch_ca0132.c:4229:2: note: After for loop, i has value 12
for (i = 0; i < TUNING_CTLS_COUNT; i++)
^
sound/pci/hda/patch_ca0132.c:4234:43: note: Array index out of bounds
dspio_set_param(codec, ca0132_tuning_ctls[i].mid, 0x20,
^
This patch cares non match case.
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://lore.kernel.org/r/87sfe9eap7.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
If no frames has been exchanged with a node for HSR_NODE_FORGET_TIME, the
node will be deleted from the node_db list. If a frame is sent to the node
after it is deleted, a netdev_err message for each slave interface is
produced. This should not happen with dan nodes because of supervision
frames, but can happen often with san nodes, which clutters the kernel
log. Since the hsr protocol does not support sans, this is only relevant
for the prp protocol.
Signed-off-by: Kristian Overskeid <koverskeid@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 26fed4ac4e ("block: flush plug based on hardware and software
queue order") changed flushing of plug list to submit requests one
device at a time. However while doing that it also started using
list_add_tail() instead of list_add() used previously thus effectively
submitting requests in reverse order. Also when forming a rq_list with
remaining requests (in case two or more devices are used), we
effectively reverse the ordering of the plug list for each device we
process. Submitting requests in reverse order has negative impact on
performance for rotational disks (when BFQ is not in use). We observe
10-25% regression in random 4k write throughput, as well as ~20%
regression in MariaDB OLTP benchmark on rotational storage on btrfs
filesystem.
Fix the problem by preserving ordering of the plug list when inserting
requests into the queuelist as well as by appending to requeue_list
instead of prepending to it.
Fixes: 26fed4ac4e ("block: flush plug based on hardware and software queue order")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230313093002.11756-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The VCN firmware loading path enables the indirect SRAM mode if it's
advertised as supported. We might have some cases of FW issues that
prevents this mode to working properly though, ending-up in a failed
probe. An example below, observed in the Steam Deck:
[...]
[drm] failed to load ucode VCN0_RAM(0x3A)
[drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0000)
amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110)
[drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <vcn_v3_0> failed -110
amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_init failed
amdgpu 0000:04:00.0: amdgpu: Fatal error during GPU init
[...]
Disabling the VCN block circumvents this, but it's a very invasive
workaround that turns off the entire feature. So, let's add a quirk
on VCN loading that checks for known problematic BIOSes on Vangogh,
so we can proactively disable the indirect SRAM mode and allow the
HW proper probe and VCN IP block to work fine.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2385
Fixes: 82132ecc54 ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
Cc: stable@vger.kernel.org
Cc: James Zhu <James.Zhu@amd.com>
Cc: Leo Liu <leo.liu@amd.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When FB_DAMAGE_CLIPS are provided in a non-MPO scenario, the loop does
not use the counter i. This causes the fill_dc_dity_rect() to always
fill dirty_rects[0], causing graphical artifacts when a damage clip
aware DRM client sends more than 1 damage clip.
Instead, use the flip_addrs->dirty_rect_count which is incremented by
fill_dc_dirty_rect() on a successful fill.
Fixes: 30ebe41582 ("drm/amd/display: add FB_DAMAGE_CLIPS support")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2453
Signed-off-by: Benjamin Cheng <ben@bcheng.me>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
Some amd asics having reliable hotplug support don't call
drm_kms_helper_poll_init in driver init sequence. However,
due to the unified suspend/resume path for all asics, because
the output_poll_work->func is not set for these asics, a warning
arrives when suspending.
[ 90.656049] <TASK>
[ 90.656050] ? console_unlock+0x4d/0x100
[ 90.656053] ? __irq_work_queue_local+0x27/0x60
[ 90.656056] ? irq_work_queue+0x2b/0x50
[ 90.656057] ? __wake_up_klogd+0x40/0x60
[ 90.656059] __cancel_work_timer+0xed/0x180
[ 90.656061] drm_kms_helper_poll_disable.cold+0x1f/0x2c [drm_kms_helper]
[ 90.656072] amdgpu_device_suspend+0x81/0x170 [amdgpu]
[ 90.656180] amdgpu_pmops_runtime_suspend+0xb5/0x1b0 [amdgpu]
[ 90.656269] pci_pm_runtime_suspend+0x61/0x1b0
drm_kms_helper_poll_enable/disable is valid when poll_init is called in
amdgpu code, which is only used in non DC path. So move such codes into
non-DC path code to get rid of such warnings.
v1: introduce use_kms_poll flag in amdgpu as the poll stuff check
v2: use dc_enabled as the flag to simply code
v3: move code into non DC path instead of relying on any flag
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2411
Fixes: a4e771729a ("drm/probe_helper: sort out poll_running vs poll_enabled")
Reported-by: Bert Karwatzki <spasswolf@web.de>
Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
[Why]
On resume some displays are not ready for HDCP, so they will fail if we
start the hdcp authentintication too soon.
Add a delay so that the displays can be ready before we start.
NOTE: Previoulsy this delay was set to 3 seconds but it was causing
issues with compliance, 2 seconds should enough for compliance and the
s3 resume case.
[How]
Change the Delay to 2 seconds.
Reviewed-by: Aurabindo Pillai <Aurabindo.Pillai@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
kgd_mem pointers returned by kfd_process_device_translate_handle are
only guaranteed to be valid while p->mutex is held. As soon as the mutex
is unlocked, another thread can free the BO.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Always setup overdrive tables after resume. Preserve only some
user-defined settings in user_overdrive_table if they're set.
Copy restored user_overdrive_table into od_table to get correct
values.
On cold boot, BTC was triggered and GfxVfCurve was calibrated. We
got VfCurve settings (a). On resuming back, BTC will be triggered
again and GfxVfCurve will be recalibrated. VfCurve settings (b)
got may be different from those of cold boot. So if we reuse
those VfCurve settings (a) got on cold boot on suspend, we can
run into discrepencies.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1897
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2276
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Błażej Szczygieł <mumei6102@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
During miration to vram prange->offset is valid after vram buffer is located,
either use old one or allocate a new one. Move svm_range_vram_node_new before
migrate for each vma to get valid prange->offset.
v2: squash in warning fix
Fixes: b4ee960637 ("drm/amdkfd: Fix BO offset for multi-VMA page migration")
Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
svm_migrate_ram_to_vram migrates a prange from sys ram to vram. The prange may
cross multiple vma. Need remember current dst vram offset in the TTM resource for
each migration.
v2: squash in warning fix (Alex)
Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The xen_shinfo_test started off with very few iterations, and the numbers
we used in GUEST_SYNC() were precisely mapped to the RUNSTATE_xxx values
anyway to start with.
It has since grown quite a few more tests, and it's kind of awful to be
handling them all as bare numbers. Especially when I want to add a new
test in the middle. Define an enum for the test stages, and use it both
in the guest code and the host switch statement.
No functional change, if I can count to 24.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230204024151.1373296-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add wrappers to do hypercalls using VMCALL/VMMCALL and Xen's register ABI
(as opposed to full Xen-style hypercalls through a hypervisor provided
page). Using the common helpers dedups a pile of code, and uses the
native hypercall instruction when running on AMD.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230204024151.1373296-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Extract the guts of kvm_hypercall() to a macro so that Xen hypercalls,
which have a different register ABI, can reuse the VMCALL vs. VMMCALL
logic.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230204024151.1373296-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
WARN if generating a GATag given a VM ID and vCPU ID doesn't yield the
same IDs when pulling the IDs back out of the tag. Don't bother adding
error handling to callers, this is very much a paranoid sanity check as
KVM fully controls the VM ID and is supposed to reject too-big vCPU IDs.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20230207002156.521736-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Define AVIC_VCPU_ID_MASK based on AVIC_PHYSICAL_MAX_INDEX, i.e. the mask
that effectively controls the largest guest physical APIC ID supported by
x2AVIC, instead of hardcoding the number of bits to 8 (and the number of
VM bits to 24).
The AVIC GATag is programmed into the AMD IOMMU IRTE to provide a
reference back to KVM in case the IOMMU cannot inject an interrupt into a
non-running vCPU. In such a case, the IOMMU notifies software by creating
a GALog entry with the corresponded GATag, and KVM then uses the GATag to
find the correct VM+vCPU to kick. Dropping bit 8 from the GATag results
in kicking the wrong vCPU when targeting vCPUs with x2APIC ID > 255.
Fixes: 4d1d7942e3 ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Cc: stable@vger.kernel.org
Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20230207002156.521736-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Define the "physical table max index mask" as bits 8:0, not 9:0. x2AVIC
currently supports a max of 512 entries, i.e. the max index is 511, and
the inputs to GENMASK_ULL() are inclusive. The bug is benign as bit 9 is
reserved and never set by KVM, i.e. KVM is just clearing bits that are
guaranteed to be zero.
Note, as of this writing, APM "Rev. 3.39-October 2022" incorrectly states
that bits 11:8 are reserved in Table B-1. VMCB Layout, Control Area. I.e.
that table wasn't updated when x2AVIC support was added.
Opportunistically fix the comment for the max AVIC ID to align with the
code, and clean up comment formatting too.
Fixes: 4d1d7942e3 ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Cc: stable@vger.kernel.org
Cc: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20230207002156.521736-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Right now, if KVM memory stress tests are run with hugetlb sources but hugetlb is
not available (either in the kernel or because /proc/sys/vm/nr_hugepages is 0)
the test will fail with a memory allocation error.
This makes it impossible to add tests that default to hugetlb-backed memory,
because on a machine with a default configuration they will fail. Therefore,
check HugePages_Total as well and, if zero, direct the user to enable hugepages
in procfs. Furthermore, return KSFT_SKIP whenever hugetlb is not available.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
nested_vmx_check_controls() has already run by the time KVM checks host state,
so the "host address space size" exit control can only be set on x86-64 hosts.
Simplify the condition at the cost of adding some dead code to 32-bit kernels.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The effective values of the guest CR0 and CR4 registers may differ from
those included in the VMCS12. In particular, disabling EPT forces
CR4.PAE=1 and disabling unrestricted guest mode forces CR0.PG=CR0.PE=1.
Therefore, checks on these bits cannot be delegated to the processor
and must be performed by KVM.
Reported-by: Reima ISHII <ishiir@g.ecc.u-tokyo.ac.jp>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM/arm64 fixes for 6.3, part #1
A single patch to address a rather annoying bug w.r.t. guest timer
offsetting. Effectively the synchronization of timer offsets between
vCPUs was broken, leading to inconsistent timer reads within the VM.
The hw->formats may be set by snd_dmaengine_pcm_refine_runtime_hwparams()
in component's startup()/open(), but soc_pcm_hw_init() will init
hw->formats in dpcm_runtime_setup_fe() after component's startup()/open(),
which causes the valuable hw->formats to be cleared.
So need to store the hw->formats before initialization, then restore
it after initialization.
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Link: https://lore.kernel.org/r/1678346017-3660-1-git-send-email-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The RTAS work area allocator uses code that is built by
GENERIC_ALLOCATOR, so the PSERIES Kconfig should select the
required Kconfig symbol to fix multiple build errors.
powerpc64-linux-ld: arch/powerpc/platforms/pseries/rtas-work-area.o: in function `.rtas_work_area_allocator_init':
rtas-work-area.c:(.init.text+0x288): undefined reference to `.gen_pool_create'
powerpc64-linux-ld: rtas-work-area.c:(.init.text+0x2dc): undefined reference to `.gen_pool_set_algo'
powerpc64-linux-ld: rtas-work-area.c:(.init.text+0x310): undefined reference to `.gen_pool_add_owner'
powerpc64-linux-ld: rtas-work-area.c:(.init.text+0x43c): undefined reference to `.gen_pool_destroy'
powerpc64-linux-ld: arch/powerpc/platforms/pseries/rtas-work-area.o:(.toc+0x0): undefined reference to `gen_pool_first_fit_order_align'
powerpc64-linux-ld: arch/powerpc/platforms/pseries/rtas-work-area.o: in function `.__rtas_work_area_alloc':
rtas-work-area.c:(.ref.text+0x14c): undefined reference to `.gen_pool_alloc_algo_owner'
powerpc64-linux-ld: rtas-work-area.c:(.ref.text+0x238): undefined reference to `.gen_pool_alloc_algo_owner'
powerpc64-linux-ld: arch/powerpc/platforms/pseries/rtas-work-area.o: in function `.rtas_work_area_free':
rtas-work-area.c:(.ref.text+0x44c): undefined reference to `.gen_pool_free_owner'
Fixes: 43033bc62d ("powerpc/pseries: add RTAS work area allocator")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230223070116.660-2-rdunlap@infradead.org
Add them to the SoC .dtsi, so that not every board has to specify them.
Fixes: 1225396fef ("arm64: dts: imx93: add lpi2c nodes")
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
If bus type is other than imx50_weim_devtype and have no child devices,
variable 'ret' in function weim_parse_dt() will not be initialized, but
will be used as branch condition and return value. Fix this by
initializing 'ret' with 0.
This was discovered with help of clang-analyzer, but the situation is
quite possible in real life.
Fixes: 52c47b6341 ("bus: imx-weim: improve error handling upon child probe-failure")
Signed-off-by: Ivan Bornyakov <i.bornyakov@metrotek.ru>
Cc: stable@vger.kernel.org
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
usb@2184000: 'pinctrl-0' is a dependency of 'pinctrl-names'
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Fixes: c100ea86e6 ("ARM: dts: add Netronix E60K02 board common file")
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
usb@2184000: 'pinctrl-0' is a dependency of 'pinctrl-names'
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Fixes: 3bb3fd8565 ("ARM: dts: add Netronix E70K02 board common file")
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
'macirq' is supposed to be listed first. Also only 'snps,clk-csr' is
listed in the bindings while 'clk_csr' is only supported for legacy
reasons. See commit 83936ea8d8 ("net: stmmac: add a parse for new
property 'snps,clk-csr'")
Fixes: 1f4263ea6a ("arm64: dts: imx93: add eqos support")
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The 'axi' clock are the bus APB clock, the 'disp_axi' clock are the
pixel data AXI clock. The naming is confusing. Fix the clock order.
Fixes: 94e6197dad ("arm64: dts: imx8mp: Add LCDIF2 & LDB nodes")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The WM8960 Linux driver expects the clock to be named "mclk". Otherwise
the clock will be ignored and not prepared/enabled by the driver.
Fixes: 40ba2eda0a ("arm64: dts: imx8mm-nitrogen-r2: add audio")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The deprecated property is named snps,reset-gpio, but this devicetree
used snps,reset-gpios instead which results in the reset not being used
and the following make dtbs_check error:
./arch/arm64/boot/dts/freescale/imx8dxl-evk.dtb: ethernet@5b050000: 'snps,reset-gpio' is a dependency of 'snps,reset-delays-us'
From schema: ./Documentation/devicetree/bindings/net/snps,dwmac.yaml
Use the preferred method of defining the reset gpio in the phy node
itself. Note that this drops the 10 us pre-delay, but prior this wasn't
used at all and a pre-delay doesn't make much sense in this context so
it should be fine.
Fixes: 8dd495d123 ("arm64: dts: freescale: add support for i.MX8DXL EVK board")
Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
A recent commit eliminated a hack that adjusted the offset used for
many GSI registers. It became possible because we now specify all
GSI register offsets explicitly for every version of IPA.
Unfortunately, a large number of register offsets were *not* updated
as they should have been in that commit. For IPA v4.5+, the offset
for every GSI register *except* the two inter-EE interrupt masking
registers were supposed to have been reduced by 0xd000.
Tested-by: Luca Weiss <luca.weiss@fairphone.com>
Tested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> # SM8350-HDK
Fixes: 59b12b1d27 ("net: ipa: kill gsi->virt_raw")
Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20230310193709.1477102-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As my testing on the MCM MT7530 switch on MT7621 SoC shows, setting the PLL
frequency does not affect MII modes other than trgmii on port 5 and port 6.
So the assumption is that the operation here called "setting the PLL
frequency" actually sets the frequency of the TRGMII TX clock.
Make it so that it and the rest of the trgmii setup run only when the
trgmii mode is used.
Tested rgmii and trgmii modes of port 6 on MCM MT7530 on MT7621AT Unielec
U7621-06 and standalone MT7530 on MT7623NI Bananapi BPI-R2.
Fixes: b8f126a8d5 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/r/20230310073338.5836-2-arinc.unal@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Previously we would divide total_left_rate by zero if num_vports
happened to be 1 because non_requested_count is calculated as
num_vports - req_count. Guard against this by validating num_vports at
the beginning and returning an error otherwise.
Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.
Fixes: bcd197c81f ("qed: Add vport WFQ configuration APIs")
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230309201556.191392-1-d-tatianin@yandex-team.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When performing a stress test on SMC-R by rmmod mlx5_ib driver
during the wrk/nginx test, we found that there is a probability
of triggering a panic while terminating all link groups.
This issue dues to the race between smc_smcr_terminate_all()
and smc_buf_create().
smc_smcr_terminate_all
smc_buf_create
/* init */
conn->sndbuf_desc = NULL;
...
__smc_lgr_terminate
smc_conn_kill
smc_close_abort
smc_cdc_get_slot_and_msg_send
__softirqentry_text_start
smc_wr_tx_process_cqe
smc_cdc_tx_handler
READ(conn->sndbuf_desc->len);
/* panic dues to NULL sndbuf_desc */
conn->sndbuf_desc = xxx;
This patch tries to fix the issue by always to check the sndbuf_desc
before send any cdc msg, to make sure that no null pointer is
seen during cqe processing.
Fixes: 0b29ec6436 ("net/smc: immediate termination for SMCR link groups")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://lore.kernel.org/r/1678263432-17329-1-git-send-email-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When using a PHC in shared between multiple hosts, the previous
frequency value may not be reset and could lead to host being unable to
compensate the offset with timecounter adjustments. To avoid such state
reset the hardware frequency of PHC to zero on init. Some refactoring is
needed to make code readable.
Fixes: 85036aee19 ("bnxt_en: Add a non-real time mode to access NIC clock")
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20230310151356.678059-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 732ea9db9d ("efi: libstub: Move screen_info handling to common
code") reorganized the earlycon handling so that all architectures pass
the screen_info data via a EFI config table instead of populating struct
screen_info directly, as the latter is only possible when the EFI stub
is baked into the kernel (and not into the decompressor).
However, this means that struct screen_info may not have been populated
yet by the time the earlycon probe takes place, and this results in a
non-functional early console.
So let's probe again right after parsing the config tables and
populating struct screen_info. Note that this means that earlycon output
starts a bit later than before, and so it may fail to capture issues
that occur while doing the early EFI initialization.
Fixes: 732ea9db9d ("efi: libstub: Move screen_info handling to common code")
Reported-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Merge commit bf9bec4cb3 ("Merge branch 'bpf: Allow reads from uninit stack'")
from bpf-next to bpf tree to address verification issues in some programs
due to stack usage.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
slot_store() uses kstrtouint() to get a slot number, but stores the
result in an "int" variable (by casting a pointer).
This can result in a negative slot number if the unsigned int value is
very large.
A negative number means that the slot is empty, but setting a negative
slot number this way will not remove the device from the array. I don't
think this is a serious problem, but it could cause confusion and it is
best to fix it.
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Song Liu <song@kernel.org>
The current interconnect provider registration interface is inherently
racy as nodes are not added until the after adding the provider. This
can specifically cause racing DT lookups to trigger a NULL-pointer
deference when either a NULL pointer or not fully initialised node is
returned from exynos_generic_icc_xlate().
Switch to using the new API where the provider is not registered until
after it has been fully initialised.
Fixes: 2f95b9d5cf ("interconnect: Add generic interconnect driver for Exynos SoCs")
Cc: stable@vger.kernel.org # 5.11
Cc: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20230306075651.2449-16-johan+linaro@kernel.org
Signed-off-by: Georgi Djakov <djakov@kernel.org>
Prior to commit 5ee5465940 ("kconfig: change sym_change_count to a
boolean flag"), the conf_updated flag was set to the new value *before*
calling the callback. xconfig's save action depends on this behaviour,
because xconfig calls conf_get_changed() directly from the callback and
now sees the old value, thus never enabling the save button or the
shortcut.
Restore the previous behaviour.
Fixes: 5ee5465940 ("kconfig: change sym_change_count to a boolean flag")
Signed-off-by: Jurica Vukadin <jura@vukad.in>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Presently, when a guest writes 1 to PMCR_EL0.{C,P}, which is WO/RAZ,
KVM saves the register value, including these bits.
When userspace reads the register using KVM_GET_ONE_REG, KVM returns
the saved register value as it is (the saved value might have these
bits set). This could result in userspace setting these bits on the
destination during migration. Consequently, KVM may end up resetting
the vPMU counter registers (PMCCNTR_EL0 and/or PMEVCNTR<n>_EL0) to
zero on the first KVM_RUN after migration.
Fix this by not saving those bits when a guest writes 1 to those bits.
Fixes: ab9468340d ("arm64: KVM: Add access handler for PMCR register")
Cc: stable@vger.kernel.org
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Link: https://lore.kernel.org/r/20230313033234.1475987-1-reijiw@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Have KVM_GET_ONE_REG for vPMU counter (vPMC) registers (PMCCNTR_EL0
and PMEVCNTR<n>_EL0) return the sum of the register value in the sysreg
file and the current perf event counter value.
Values of vPMC registers are saved in sysreg files on certain occasions.
These saved values don't represent the current values of the vPMC
registers if the perf events for the vPMCs count events after the save.
The current values of those registers are the sum of the sysreg file
value and the current perf event counter value. But, when userspace
reads those registers (using KVM_GET_ONE_REG), KVM returns the sysreg
file value to userspace (not the sum value).
Fix this to return the sum value for KVM_GET_ONE_REG.
Fixes: 051ff581ce ("arm64: KVM: Add access handler for event counter register")
Cc: stable@vger.kernel.org
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Link: https://lore.kernel.org/r/20230313033208.1475499-1-reijiw@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
If md_run() fails after ->active_io is initialized, then percpu_ref_exit
is called in error path. However, later md_free_disk will call
percpu_ref_exit again which leads to a panic because of null pointer
dereference. It can also trigger this bug when resources are initialized
but are freed in error path, then will be freed again in md_free_disk.
BUG: kernel NULL pointer dereference, address: 0000000000000038
Oops: 0000 [#1] PREEMPT SMP
Workqueue: md_misc mddev_delayed_delete
RIP: 0010:free_percpu+0x110/0x630
Call Trace:
<TASK>
__percpu_ref_exit+0x44/0x70
percpu_ref_exit+0x16/0x90
md_free_disk+0x2f/0x80
disk_release+0x101/0x180
device_release+0x84/0x110
kobject_put+0x12a/0x380
kobject_put+0x160/0x380
mddev_delayed_delete+0x19/0x30
process_one_work+0x269/0x680
worker_thread+0x266/0x640
kthread+0x151/0x1b0
ret_from_fork+0x1f/0x30
For creating raid device, md raid calls do_md_run->md_run, dm raid calls
md_run. We alloc those memory in md_run. For stopping raid device, md raid
calls do_md_stop->__md_stop, dm raid calls md_stop->__md_stop. So we can
free those memory resources in __md_stop.
Fixes: 72adae23a7 ("md: Change active_io to percpu")
Reported-and-tested-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
Pull virtio fixes from Michael Tsirkin:
"Some virtio / vhost / vdpa fixes accumulated so far"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
tools/virtio: Ignore virtio-trace/trace-agent
vdpa_sim: set last_used_idx as last_avail_idx in vdpasim_queue_ready
vhost-vdpa: free iommu domain after last use during cleanup
vdpa/mlx5: should not activate virtq object when suspended
vp_vdpa: fix the crash in hot unplug with vp_vdpa
The Silicon Labs IFS-USB-DATACABLE is used in conjunction with for example
the Quint UPSes. It is used to enable Modbus communication with the UPS to
query configuration, power and battery status.
Signed-off-by: Kees Jan Koster <kjkoster@kjkoster.org>
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
A potentially malicious SEV guest can constantly hammer the hypervisor
using this driver to send down requests and thus prevent or at least
considerably hinder other guests from issuing requests to the secure
processor which is a shared platform resource.
Therefore, the host is permitted and encouraged to throttle such guest
requests.
Add the capability to handle the case when the hypervisor throttles
excessive numbers of requests issued by the guest. Otherwise, the VM
platform communication key will be disabled, preventing the guest from
attesting itself.
Realistically speaking, a well-behaved guest should not even care about
throttling. During its lifetime, it would end up issuing a handful of
requests which the hardware can easily handle.
This is more to address the case of a malicious guest. Such guest should
get throttled and if its VMPCK gets disabled, then that's its own
wrongdoing and perhaps that guest even deserves it.
To the implementation: the hypervisor signals with SNP_GUEST_REQ_ERR_BUSY
that the guest requests should be throttled. That error code is returned
in the upper 32-bit half of exitinfo2 and this is part of the GHCB spec
v2.
So the guest is given a throttling period of 1 minute in which it
retries the request every 2 seconds. This is a good default but if it
turns out to not pan out in practice, it can be tweaked later.
For safety, since the encryption algorithm in GHCBv2 is AES_GCM, control
must remain in the kernel to complete the request with the current
sequence number. Returning without finishing the request allows the
guest to make another request but with different message contents. This
is IV reuse, and breaks cryptographic protections.
[ bp:
- Rewrite commit message and do a simplified version.
- The stable tags are supposed to denote that a cleanup should go
upfront before backporting this so that any future fixes to this
can preserve the sanity of the backporter(s). ]
Fixes: d5af44dde5 ("x86/sev: Provide support for SNP guest request NAEs")
Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@kernel.org> # d6fd48eff7 ("virt/coco/sev-guest: Check SEV_SNP attribute at probe time")
Cc: <stable@kernel.org> # 970ab82374 (" virt/coco/sev-guest: Simplify extended guest request handling")
Cc: <stable@kernel.org> # c5a338274b ("virt/coco/sev-guest: Remove the disable_vmpck label in handle_guest_request()")
Cc: <stable@kernel.org> # 0fdb6cc7c8 ("virt/coco/sev-guest: Carve out the request issuing logic into a helper")
Cc: <stable@kernel.org> # d25bae7dc7 ("virt/coco/sev-guest: Do some code style cleanups")
Cc: <stable@kernel.org> # fa4ae42cc6 ("virt/coco/sev-guest: Convert the sw_exit_info_2 checking to a switch-case")
Link: https://lore.kernel.org/r/20230214164638.1189804-2-dionnaglaze@google.com
snp_issue_guest_request() checks the value returned by the hypervisor in
sw_exit_info_2 and returns a different error depending on it.
Convert those checks into a switch-case to make it more readable when
more error values are going to be checked in the future.
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230307192449.24732-8-bp@alien8.de
Return a specific error code - -ENOSPC - to signal the too small cert
data buffer instead of checking exit code and exitinfo2.
While at it, hoist the *fw_err assignment in snp_issue_guest_request()
so that a proper error value is returned to the callers.
[ Tom: check override_err instead of err. ]
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230307192449.24732-4-bp@alien8.de
When tunneling aggregated USB3 (20 Gb/s) the bandwidth values that are
programmed to the ADP_USB3_CS_2 go higher than 4096 and that does not
fit anymore to the 12-bit field. Fix this by scaling the value using
the scale field accordingly.
Fixes: 3b1d8d577c ("thunderbolt: Implement USB3 bandwidth negotiation routines")
Cc: stable@vger.kernel.org
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Current Intel USB4 host routers have hardware limitation that the USB3
bandwidth cannot go higher than 16376 Mb/s. Work this around by adding a
new quirk that limits the bandwidth for the affected host routers.
Cc: stable@vger.kernel.org
Signed-off-by: Gil Fine <gil.fine@linux.intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
In order to apply quirks based on certain adapter types move call to
tb_check_quirks() happen after the adapters are initialized. This should
not affect the existing quirks.
Cc: stable@vger.kernel.org
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
According to USB4 retimer specification, the process of firmware update
sequence requires issuing a SET_INBOUND_SBTX port operation that later
shall be followed by UNSET_INBOUND_SBTX port operation. This last step
is not currently issued by the driver but it is necessary to make sure
the retimers are put back to passthrough mode even during enumeration.
If this step is missing the link may not come up properly after
soft-reboot for example.
For this reason issue UNSET_INBOUND_SBTX after SET_INBOUND_SBTX for
enumeration and also when the NVM upgrade is run.
Reported-by: Christian Schaubschläger <christian.schaubschlaeger@gmx.at>
Link: https://lore.kernel.org/linux-usb/b556f5ed-5ee8-9990-9910-afd60db93310@gmx.at/
Cc: stable@vger.kernel.org
Signed-off-by: Gil Fine <gil.fine@linux.intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Memory for the usb4->margining needs to be relased for the upstream port
of the router as well, even though the debugfs directory gets released
with the router device removal. Fix this.
Fixes: d0f1e0c2a6 ("thunderbolt: Add support for receiver lane margining")
Cc: stable@vger.kernel.org
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Users reported oopses on list corruptions when using i915 perf with a
number of concurrently running graphics applications. Root cause analysis
pointed at an issue in barrier processing code -- a race among perf open /
close replacing active barriers with perf requests on kernel context and
concurrent barrier preallocate / acquire operations performed during user
context first pin / last unpin.
When adding a request to a composite tracker, we try to reuse an existing
fence tracker, already allocated and registered with that composite. The
tracker we obtain may already track another fence, may be an idle barrier,
or an active barrier.
If the tracker we get occurs a non-idle barrier then we try to delete that
barrier from a list of barrier tasks it belongs to. However, while doing
that we don't respect return value from a function that performs the
barrier deletion. Should the deletion ever fail, we would end up reusing
the tracker still registered as a barrier task. Since the same structure
field is reused with both fence callback lists and barrier tasks list,
list corruptions would likely occur.
Barriers are now deleted from a barrier tasks list by temporarily removing
the list content, traversing that content with skip over the node to be
deleted, then populating the list back with the modified content. Should
that intentionally racy concurrent deletion attempts be not serialized,
one or more of those may fail because of the list being temporary empty.
Related code that ignores the results of barrier deletion was initially
introduced in v5.4 by commit d8af05ff38 ("drm/i915: Allow sharing the
idle-barrier from other kernel requests"). However, all users of the
barrier deletion routine were apparently serialized at that time, then the
issue didn't exhibit itself. Results of git bisect with help of a newly
developed igt@gem_barrier_race@remote-request IGT test indicate that list
corruptions might start to appear after commit 311770173f ("drm/i915/gt:
Schedule request retirement when timeline idles"), introduced in v5.5.
Respect results of barrier deletion attempts -- mark the barrier as idle
only if successfully deleted from the list. Then, before proceeding with
setting our fence as the one currently tracked, make sure that the tracker
we've got is not a non-idle barrier. If that check fails then don't use
that tracker but go back and try to acquire a new, usable one.
v3: use unlikely() to document what outcome we expect (Andi),
- fix bad grammar in commit description.
v2: no code changes,
- blame commit 311770173f ("drm/i915/gt: Schedule request retirement
when timeline idles"), v5.5, not commit d8af05ff38 ("drm/i915: Allow
sharing the idle-barrier from other kernel requests"), v5.4,
- reword commit description.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6333
Fixes: 311770173f ("drm/i915/gt: Schedule request retirement when timeline idles")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org # v5.5
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230302120820.48740-1-janusz.krzysztofik@linux.intel.com
(cherry picked from commit 5060060557)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
It seems that commit bc3c5e0809 ("drm/i915/sseu: Don't try to store EU
mask internally in UAPI format") exposed a potential out-of-bounds
access, reported by UBSAN as following on a laptop with a gen 11 i915
card:
UBSAN: array-index-out-of-bounds in drivers/gpu/drm/i915/gt/intel_sseu.c:65:27
index 6 is out of range for type 'u16 [6]'
CPU: 2 PID: 165 Comm: systemd-udevd Not tainted 6.2.0-9-generic #9-Ubuntu
Hardware name: Dell Inc. XPS 13 9300/077Y9N, BIOS 1.11.0 03/22/2022
Call Trace:
<TASK>
show_stack+0x4e/0x61
dump_stack_lvl+0x4a/0x6f
dump_stack+0x10/0x18
ubsan_epilogue+0x9/0x3a
__ubsan_handle_out_of_bounds.cold+0x42/0x47
gen11_compute_sseu_info+0x121/0x130 [i915]
intel_sseu_info_init+0x15d/0x2b0 [i915]
intel_gt_init_mmio+0x23/0x40 [i915]
i915_driver_mmio_probe+0x129/0x400 [i915]
? intel_gt_probe_all+0x91/0x2e0 [i915]
i915_driver_probe+0xe1/0x3f0 [i915]
? drm_privacy_screen_get+0x16d/0x190 [drm]
? acpi_dev_found+0x64/0x80
i915_pci_probe+0xac/0x1b0 [i915]
...
According to the definition of sseu_dev_info, eu_mask->hsw is limited to
a maximum of GEN_MAX_SS_PER_HSW_SLICE (6) sub-slices, but
gen11_sseu_info_init() can potentially set 8 sub-slices, in the
!IS_JSL_EHL(gt->i915) case.
Fix this by reserving up to 8 slots for max_subslices in the eu_mask
struct.
Reported-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Fixes: bc3c5e0809 ("drm/i915/sseu: Don't try to store EU mask internally in UAPI format")
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230220171858.131416-1-andrea.righi@canonical.com
(cherry picked from commit 3cba09a6ac)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
On s390 PCI functions may be hotplugged individually even when they
belong to a multi-function device. In particular on an SR-IOV device VFs
may be removed and later re-added.
In commit a50297cf82 ("s390/pci: separate zbus creation from
scanning") it was missed however that struct pci_bus and struct
zpci_bus's resource list retained a reference to the PCI functions MMIO
resources even though those resources are released and freed on
hot-unplug. These stale resources may subsequently be claimed when the
PCI function re-appears resulting in use-after-free.
One idea of fixing this use-after-free in s390 specific code that was
investigated was to simply keep resources around from the moment a PCI
function first appeared until the whole virtual PCI bus created for
a multi-function device disappears. The problem with this however is
that due to the requirement of artificial MMIO addreesses (address
cookies) extra logic is then needed to keep the address cookies
compatible on re-plug. At the same time the MMIO resources semantically
belong to the PCI function so tying their lifecycle to the function
seems more logical.
Instead a simpler approach is to remove the resources of an individually
hot-unplugged PCI function from the PCI bus's resource list while
keeping the resources of other PCI functions on the PCI bus untouched.
This is done by introducing pci_bus_remove_resource() to remove an
individual resource. Similarly the resource also needs to be removed
from the struct zpci_bus's resource list. It turns out however, that
there is really no need to add the MMIO resources to the struct
zpci_bus's resource list at all and instead we can simply use the
zpci_bar_struct's resource pointer directly.
Fixes: a50297cf82 ("s390/pci: separate zbus creation from scanning")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20230306151014.60913-2-schnelle@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The code which handles the ipl report is searching for a free location
in memory where it could copy the component and certificate entries to.
It checks for intersection between the sections required for the kernel
and the component/certificate data area, but fails to check whether
the data structures linking these data areas together intersect.
This might cause the iplreport copy code to overwrite the iplreport
itself. Fix this by adding two addtional intersection checks.
Cc: <stable@vger.kernel.org>
Fixes: 9641b8cc73 ("s390/ipl: read IPL report at early boot")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Starting from an used_idx different than 0 is needed in use cases like
virtual machine migration. Not doing so and letting the caller set an
avail idx different than 0 causes destination device to try to use old
buffers that source driver already recover and are not available
anymore.
Since vdpa_sim does not support receive inflight descriptors as a
destination of a migration, let's set both avail_idx and used_idx the
same at vq start. This is how vhost-user works in a
VHOST_SET_VRING_BASE call.
Although the simple fix is to set last_used_idx at vdpasim_set_vq_state,
it would be reset at vdpasim_queue_ready. The last_avail_idx case is
fixed with commit 0e84f918fa ("vdpa_sim: not reset state in
vdpasim_queue_ready"). Since the only option is to make it equal to
last_avail_idx, adding the only change needed here.
This was discovered and tested live migrating the vdpa_sim_net device.
Fixes: 2c53d0f64c ("vdpasim: vDPA device simulator")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230302181857.925374-1-eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently vhost_vdpa_cleanup() unmaps the DMA mappings by calling
`iommu_unmap(v->domain, map->start, map->size);`
from vhost_vdpa_general_unmap() when the parent vDPA driver doesn't
provide DMA config operations.
However, the IOMMU domain referred to by `v->domain` is freed in
vhost_vdpa_free_domain() before vhost_vdpa_cleanup() in
vhost_vdpa_release() which results in NULL pointer de-reference.
Accordingly, moving the call to vhost_vdpa_free_domain() in
vhost_vdpa_cleanup() would makes sense. This will also help
detaching the dma device in error handling of vhost_vdpa_alloc_domain().
This issue was observed on terminating QEMU with SIGQUIT.
Fixes: 037d430556 ("vhost-vdpa: call vhost_vdpa_cleanup during the release")
Signed-off-by: Gautam Dawar <gautam.dawar@amd.com>
Message-Id: <20230301163203.29883-1-gautam.dawar@amd.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Pull tpm fixes from Jarkko Sakkinen:
"Two additional bug fixes for v6.3"
* tag 'tpm-v6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: disable hwrng for fTPM on some AMD designs
tpm/eventlog: Don't abort tpm_read_log on faulty ACPI address
In da9150_charger_probe, &charger->otg_work is bound with
da9150_charger_otg_work. da9150_charger_otg_ncb may be
called to start the work.
If we remove the module which will call da9150_charger_remove
to make cleanup, there may be a unfinished work. The possible
sequence is as follows:
Fix it by canceling the work before cleanup in the da9150_charger_remove
CPU0 CPUc1
|da9150_charger_otg_work
da9150_charger_remove |
power_supply_unregister |
device_unregister |
power_supply_dev_release|
kfree(psy) |
|
| power_supply_changed(charger->usb);
| //use
Fixes: c1a281e34d ("power: Add support for DA9150 Charger")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
AMD has issued an advisory indicating that having fTPM enabled in
BIOS can cause "stuttering" in the OS. This issue has been fixed
in newer versions of the fTPM firmware, but it's up to system
designers to decide whether to distribute it.
This issue has existed for a while, but is more prevalent starting
with kernel 6.1 because commit b006c439d5 ("hwrng: core - start
hwrng kthread also for untrusted sources") started to use the fTPM
for hwrng by default. However, all uses of /dev/hwrng result in
unacceptable stuttering.
So, simply disable registration of the defective hwrng when detecting
these faulty fTPM versions. As this is caused by faulty firmware, it
is plausible that such a problem could also be reproduced by other TPM
interactions, but this hasn't been shown by any user's testing or reports.
It is hypothesized to be triggered more frequently by the use of the RNG
because userspace software will fetch random numbers regularly.
Intentionally continue to register other TPM functionality so that users
that rely upon PCR measurements or any storage of data will still have
access to it. If it's found later that another TPM functionality is
exacerbating this problem a module parameter it can be turned off entirely
and a module parameter can be introduced to allow users who rely upon
fTPM functionality to turn it on even though this problem is present.
Link: https://www.amd.com/en/support/kb/faq/pa-410
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216989
Link: https://lore.kernel.org/all/20230209153120.261904-1-Jason@zx2c4.com/
Fixes: b006c439d5 ("hwrng: core - start hwrng kthread also for untrusted sources")
Cc: stable@vger.kernel.org
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Tested-by: reach622@mailcuk.com
Tested-by: Bell <1138267643@qq.com>
Co-developed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
The driver will match mostly by DT table (even thought there is regular
ID table) so there is little benefit in of_match_ptr (this also allows
ACPI matching via PRP0001, even though it might not be relevant here).
This also fixes !CONFIG_OF error:
drivers/hwmon/tmp513.c:610:34: error: ‘tmp51x_of_match’ defined but not used [-Werror=unused-const-variable=]
Fixes: 59dfa75e5d ("hwmon: Add driver for Texas Instruments TMP512/513 sensor chips.")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230312193723.478032-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
When probing the ucd90320 access to some of the registers randomly fails.
Sometimes it NACKs a transfer, sometimes it returns just random data and
the PEC check fails.
Experimentation shows that this seems to be triggered by a register access
directly back to back with a previous register write. Experimentation also
shows that inserting a small delay after register writes makes the issue go
away.
Use a similar solution to what the max15301 driver does to solve the same
problem. Create a custom set of bus read and write functions that make sure
that the delay is added.
Fixes: a470f11c5b ("hwmon: (pmbus/ucd9000) Add support for UCD90320 Power Sequencer")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Link: https://lore.kernel.org/r/20230312160312.2227405-1-lars@metafoo.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
A recent change introduced a flag to queue up errors found during
boot-time polling. These errors will be processed during late init once
the MCE subsystem is fully set up.
A number of sysfs updates call mce_restart() which goes through a subset
of the CPU init flow. This includes polling MCA banks and logging any
errors found. Since the same function is used as boot-time polling,
errors will be queued. However, the system is now past late init, so the
errors will remain queued until another error is found and the workqueue
is triggered.
Call mce_schedule_work() at the end of mce_restart() so that queued
errors are processed.
Fixes: 3bff147b18 ("x86/mce: Defer processing of early errors")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230301221420.2203184-1-yazen.ghannam@amd.com
In xgene_hwmon_probe, &ctx->workq is bound with xgene_hwmon_evt_work.
Then it will be started.
If we remove the driver which will call xgene_hwmon_remove to clean up,
there may be unfinished work.
The possible sequence is as follows:
Fix it by finishing the work before cleanup in xgene_hwmon_remove.
CPU0 CPU1
|xgene_hwmon_evt_work
xgene_hwmon_remove |
kfifo_free(&ctx->async_msg_fifo);|
|
|kfifo_out_spinlocked
|//use &ctx->async_msg_fifo
Fixes: 2ca492e22c ("hwmon: (xgene) Fix crash when alarm occurs before driver probe")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Link: https://lore.kernel.org/r/20230310084007.1403388-1-zyytlz.wz@163.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Pull xfs fixes from Darrick Wong:
- Fix a crash if mount time quotacheck fails when there are inodes
queued for garbage collection.
- Fix an off by one error when discarding folios after writeback
failure.
* tag 'xfs-6.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix off-by-one-block in xfs_discard_folio()
xfs: quotacheck failure can race with background inode inactivation
Pull staging driver fixes and removal from Greg KH:
"Here are four small staging driver fixes, and one big staging driver
deletion for 6.3-rc2.
The fixes are:
- rtl8192e driver fixes for where the driver was attempting to
execute various programs directly from the disk for unknown reasons
- rtl8723bs driver fixes for issues found by Hans in testing
The deleted driver is the removal of the r8188eu wireless driver as
now in 6.3-rc1 we have a "real" wifi driver for one that includes
support for many many more devices than this old driver did. So it's
time to remove it as it is no longer needed. The maintainers of this
driver all have acked its removal. Many thanks to them over the years
for working to clean it up and keep it working while the real driver
was being developed.
All of these have been in linux-next this week with no reported
problems"
* tag 'staging-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: r8188eu: delete driver
staging: rtl8723bs: Pass correct parameters to cfg80211_get_bss()
staging: rtl8723bs: Fix key-store index handling
staging: rtl8192e: Remove call_usermodehelper starting RadioPower.sh
staging: rtl8192e: Remove function ..dm_check_ac_dc_power calling a script
Pull x86 fix from Borislav Petkov:
"A single erratum fix for AMD machines:
- Disable XSAVES on AMD Zen1 and Zen2 machines due to an erratum. No
impact to anything as those machines will fallback to XSAVEC which
is equivalent there"
* tag 'x86_urgent_for_v6.3_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/CPU/AMD: Disable XSAVES on AMD family 0x17
Pull clone3 fix from Christian Brauner:
"A simple fix for the clone3() system call.
The CLONE_NEWTIME allows the creation of time namespaces. The flag
reuses a bit from the CSIGNAL bits that are used in the legacy clone()
system call to set the signal that gets sent to the parent after the
child exits.
The clone3() system call doesn't rely on CSIGNAL anymore as it uses a
dedicated .exit_signal field in struct clone_args. So we blocked all
CSIGNAL bits in clone3_args_valid(). When CLONE_NEWTIME was introduced
and reused a CSIGNAL bit we forgot to adapt clone3_args_valid()
causing CLONE_NEWTIME with clone3() to be rejected. Fix this"
* tag 'kernel.fork.v6.3-rc2' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
selftests/clone3: test clone3 with CLONE_NEWTIME
fork: allow CLONE_NEWTIME in clone3 flags
Pull vfs fixes from Christian Brauner:
- When allocating pages for a watch queue failed, we didn't return an
error causing userspace to proceed even though all subsequent
notifcations would be lost. Make sure to return an error.
- Fix a misformed tree entry for the idmapping maintainers entry.
- When setting file leases from an idmapped mount via
generic_setlease() we need to take the idmapping into account
otherwise taking a lease would fail from an idmapped mount.
- Remove two redundant assignments, one in splice code and the other in
locks code, that static checkers complained about.
* tag 'vfs.misc.v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
filelocks: use mount idmapping for setlease permission check
fs/locks: Remove redundant assignment to cmd
splice: Remove redundant assignment to ret
MAINTAINERS: repair a malformed T: entry in IDMAPPED MOUNTS
watch_queue: fix IOC_WATCH_QUEUE_SET_SIZE alloc error paths
Pull ext4 fixes from Ted Ts'o:
"Bug fixes and regressions for ext4, the most serious of which is a
potential deadlock during directory renames that was introduced during
the merge window discovered by a combination of syzbot and lockdep"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: zero i_disksize when initializing the bootloader inode
ext4: make sure fs error flag setted before clear journal error
ext4: commit super block if fs record error when journal record without error
ext4, jbd2: add an optimized bmap for the journal inode
ext4: fix WARNING in ext4_update_inline_data
ext4: move where set the MAY_INLINE_DATA flag is set
ext4: Fix deadlock during directory rename
ext4: Fix comment about the 64BIT feature
docs: ext4: modify the group desc size to 64
ext4: fix another off-by-one fsmap error on 1k block filesystems
ext4: fix RENAME_WHITEOUT handling for inline directories
ext4: make kobj_type structures constant
ext4: fix cgroup writeback accounting with fs-layer encryption
The cpumask_check() was unnecessarily tight, and causes problems for the
users of cpumask_next().
We have a number of users that take the previous return value of one of
the bit scanning functions and subtract one to keep it in "range". But
since the scanning functions end up returning up to 'small_cpumask_bits'
instead of the tighter 'nr_cpumask_bits', the range really needs to be
using that widened form.
[ This "previous-1" behavior is also the reason we have all those
comments about /* -1 is a legal arg here. */ and separate checks for
that being ok. So we could have just made "small_cpumask_bits-1"
be a similar special "don't check this" value.
Tetsuo Handa even suggested a patch that only does that for
cpumask_next(), since that seems to be the only actual case that
triggers, but that all makes it even _more_ magical and special. So
just relax the check ]
One example of this kind of pattern being the 'c_start()' function in
arch/x86/kernel/cpu/proc.c, but also duplicated in various forms on
other architectures.
Reported-by: syzbot+96cae094d90877641f32@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=96cae094d90877641f32
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Link: https://lore.kernel.org/lkml/c1f4cc16-feea-b83c-82cf-1a1f007b7eb9@I-love.SAKURA.ne.jp/
Fixes: 596ff4a09b ("cpumask: re-introduce constant-sized cpumask optimizations")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Even though we are passing 'ret' as stop condition for
read_poll_timeout(), that return code is still being ignored. The reason
is that the poll will stop if the passed condition is true which will
happen if the passed op() returns error. However, read_poll_timeout()
returns 0 if the *complete* condition evaluates to true. Therefore, the
error code returned by op() will be ignored.
To fix this we need to check for both error codes:
* The one returned by read_poll_timeout() which is either 0 or
ETIMEDOUT.
* The one returned by the passed op().
Fixes: a44ef7c460 ("iio: adc: add max11410 adc driver")
Signed-off-by: Nuno Sá <nuno.sa@analog.com>
Acked-by: Ibrahim Tilki <Ibrahim.Tilki@analog.com>
Link: https://lore.kernel.org/r/20230307095303.713251-1-nuno.sa@analog.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Pull i2c updates from Wolfram Sang:
"This marks the end of a transition to let I2C have the same probe
semantics as other subsystems. Uwe took care that no drivers in the
current tree nor in -next use the deprecated .probe call. So, it is a
good time to switch to the new, standard semantics now.
There is also a regression fix:
- regression fix for the notifier handling of the I2C core
- final coversions of drivers away from deprecated .probe
- make .probe_new the standard probe and convert I2C core to use it
* tag 'i2c-for-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: dev: Fix bus callback return values
i2c: Convert drivers to new .probe() callback
i2c: mux: Convert all drivers to new .probe() callback
i2c: Switch .probe() to not take an id parameter
media: i2c: ov2685: convert to i2c's .probe_new()
media: i2c: ov5695: convert to i2c's .probe_new()
w1: ds2482: Convert to i2c's .probe_new()
serial: sc16is7xx: Convert to i2c's .probe_new()
mtd: maps: pismo: Convert to i2c's .probe_new()
misc: ad525x_dpot-i2c: Convert to i2c's .probe_new()
The CIO-DAC series of devices only supports DAC values up to 12-bit
rather than 16-bit. Trying to write a 16-bit value results in only the
lower 12 bits affecting the DAC output which is not what the user
expects. Instead, adjust the DAC write value check to reject values
larger than 12-bit so that they fail explicitly as invalid for the user.
Fixes: 3b8df5fd52 ("iio: Add IIO support for the Measurement Computing CIO-DAC family")
Cc: stable@vger.kernel.org
Signed-off-by: William Breathitt Gray <william.gray@linaro.org>
Link: https://lore.kernel.org/r/20230311002248.8548-1-william.gray@linaro.org
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Having a per-vcpu virtual offset is a pain. It needs to be synchronized
on each update, and expands badly to a setup where different timers can
have different offsets, or have composite offsets (as with NV).
So let's start by replacing the use of the CNTVOFF_EL2 shadow register
(which we want to reclaim for NV anyway), and make the virtual timer
carry a pointer to a VM-wide offset.
This simplifies the code significantly. It also addresses two terrible bugs:
- The use of CNTVOFF_EL2 leads to some nice offset corruption
when the sysreg gets reset, as reported by Joey.
- The kvm mutex is taken from a vcpu ioctl, which goes against
the locking rules...
Reported-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230224173915.GA17407@e124191.cambridge.arm.com
Tested-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20230224191640.3396734-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) nft_parse_register_load() gets an incorrect datatype size
as input, from Jeremy Sowden.
2) incorrect maximum netlink attribute in nft_redir, also
from Jeremy.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_redir: correct value of inet type `.maxattrs`
netfilter: nft_redir: correct length for loading protocol registers
netfilter: nft_masq: correct length for loading protocol registers
netfilter: nft_nat: correct length for loading protocol registers
====================
Link: https://lore.kernel.org/r/20230309174655.69816-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the boot loader inode has never been used before, the
EXT4_IOC_SWAP_BOOT inode will initialize it, including setting the
i_size to 0. However, if the "never before used" boot loader has a
non-zero i_size, then i_disksize will be non-zero, and the
inconsistency between i_size and i_disksize can trigger a kernel
warning:
WARNING: CPU: 0 PID: 2580 at fs/ext4/file.c:319
CPU: 0 PID: 2580 Comm: bb Not tainted 6.3.0-rc1-00004-g703695902cfa
RIP: 0010:ext4_file_write_iter+0xbc7/0xd10
Call Trace:
vfs_write+0x3b1/0x5c0
ksys_write+0x77/0x160
__x64_sys_write+0x22/0x30
do_syscall_64+0x39/0x80
Reproducer:
1. create corrupted image and mount it:
mke2fs -t ext4 /tmp/foo.img 200
debugfs -wR "sif <5> size 25700" /tmp/foo.img
mount -t ext4 /tmp/foo.img /mnt
cd /mnt
echo 123 > file
2. Run the reproducer program:
posix_memalign(&buf, 1024, 1024)
fd = open("file", O_RDWR | O_DIRECT);
ioctl(fd, EXT4_IOC_SWAP_BOOT);
write(fd, buf, 1024);
Fix this by setting i_disksize as well as i_size to zero when
initiaizing the boot loader inode.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217159
Cc: stable@kernel.org
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Link: https://lore.kernel.org/r/20230308032643.641113-1-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Now, 'es->s_state' maybe covered by recover journal. And journal errno
maybe not recorded in journal sb as IO error. ext4_update_super() only
update error information when 'sbi->s_add_error_count' large than zero.
Then 'EXT4_ERROR_FS' flag maybe lost.
To solve above issue just recover 'es->s_state' error flag after journal
replay like error info.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230307061703.245965-2-yebin@huaweicloud.com
The only caller of ext4_find_inline_data_nolock() that needs setting of
EXT4_STATE_MAY_INLINE_DATA flag is ext4_iget_extra_inode(). In
ext4_write_inline_data_end() we just need to update inode->i_inline_off.
Since we are going to add one more caller that does not need to set
EXT4_STATE_MAY_INLINE_DATA, just move setting of EXT4_STATE_MAY_INLINE_DATA
out to ext4_iget_extra_inode().
Signed-off-by: Ye Bin <yebin10@huawei.com>
Cc: stable@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230307015253.2232062-2-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Matthieu Baerts says:
====================
mptcp: fixes for 6.3
Patch 1 fixes a possible deadlock in subflow_error_report() reported by
lockdep. The report was in fact a false positive but the modification
makes sense and silences lockdep to allow syzkaller to find real issues.
The regression has been introduced in v5.12.
Patch 2 is a refactoring needed to be able to fix the two next issues.
It improves the situation and can be backported up to v6.0.
Patches 3 and 4 fix UaF reported by KASAN. It fixes issues potentially
visible since v5.7 and v5.19 but only reproducible until recently
(v6.0). These two patches depend on patch 2/7.
Patch 5 fixes the order of the printed values: expected vs seen values.
The regression has been introduced recently: v6.3-rc1.
Patch 6 adds missing ro_after_init flags. A previous patch added them
for other functions but these two have been missed. This previous patch
has been backported to stable versions (up to v5.12) so probably better
to do the same here.
Patch 7 fixes tcp_set_state() being called twice in a row since v5.10.
Patch 8 fixes another lockdep false positive issue but this time in
MPTCP PM code. Same here, some modifications in the code has been made
to silence this issue and help finding real ones later. This issue can
be seen since v6.2.
v1: https://lore.kernel.org/r/20230227-upstream-net-20230227-mptcp-fixes-v1-0-070e30ae4a8e@tessares.net
====================
Link: https://lore.kernel.org/r/20230227-upstream-net-20230227-mptcp-fixes-v2-0-47c2e95eada9@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Christoph reports a lockdep splat in the mptcp_subflow_create_socket()
error path, when such function is invoked by
mptcp_pm_nl_create_listen_socket().
Such code path acquires two separates, nested socket lock, with the
internal lock operation lacking the "nested" annotation. Adding that
in sock_release() for mptcp's sake only could be confusing.
Instead just add a new lockclass to the in-kernel msk socket,
re-initializing the lockdep infra after the socket creation.
Fixes: ad2171009d ("mptcp: fix locking for in-kernel listener creation")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/354
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add __ro_after_init labels for the variables tcp_prot_override and
tcpv6_prot_override, just like other variables adjacent to them, to
indicate that they are initialised from the init hooks and no writes
occur afterwards.
Fixes: b19bc2945b ("mptcp: implement delegated actions")
Cc: stable@vger.kernel.org
Fixes: 51fa7f8ebf ("mptcp: mark ops structures as ro_after_init")
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case of errors, the printed message had the expected and the seen
value inverted.
This patch simply correct the order: first the expected value, then the
one that has been seen.
Fixes: 10d4273411 ("selftests: mptcp: userspace: print error details if any")
Cc: stable@vger.kernel.org
Acked-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As reported by Christoph after having refactored the passive
socket initialization, the mptcp listener shutdown path is prone
to an UaF issue.
BUG: KASAN: use-after-free in _raw_spin_lock_bh+0x73/0xe0
Write of size 4 at addr ffff88810cb23098 by task syz-executor731/1266
CPU: 1 PID: 1266 Comm: syz-executor731 Not tainted 6.2.0-rc59af4eaa31c1f6c00c8f1e448ed99a45c66340dd5 #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x6e/0x91
print_report+0x16a/0x46f
kasan_report+0xad/0x130
kasan_check_range+0x14a/0x1a0
_raw_spin_lock_bh+0x73/0xe0
subflow_error_report+0x6d/0x110
sk_error_report+0x3b/0x190
tcp_disconnect+0x138c/0x1aa0
inet_child_forget+0x6f/0x2e0
inet_csk_listen_stop+0x209/0x1060
__mptcp_close_ssk+0x52d/0x610
mptcp_destroy_common+0x165/0x640
mptcp_destroy+0x13/0x80
__mptcp_destroy_sock+0xe7/0x270
__mptcp_close+0x70e/0x9b0
mptcp_close+0x2b/0x150
inet_release+0xe9/0x1f0
__sock_release+0xd2/0x280
sock_close+0x15/0x20
__fput+0x252/0xa20
task_work_run+0x169/0x250
exit_to_user_mode_prepare+0x113/0x120
syscall_exit_to_user_mode+0x1d/0x40
do_syscall_64+0x48/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
The msk grace period can legitly expire in between the last
reference count dropped in mptcp_subflow_queue_clean() and
the later eventual access in inet_csk_listen_stop()
After the previous patch we don't need anymore special-casing
msk listener socket cleanup: the mptcp worker will process each
of the unaccepted msk sockets.
Just drop the now unnecessary code.
Please note this commit depends on the two parent ones:
mptcp: refactor passive socket initialization
mptcp: use the workqueue to destroy unaccepted sockets
Fixes: 6aeed90450 ("mptcp: fix race on unaccepted mptcp sockets")
Cc: stable@vger.kernel.org
Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/346
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Christoph reported a UaF at token lookup time after having
refactored the passive socket initialization part:
BUG: KASAN: use-after-free in __token_bucket_busy+0x253/0x260
Read of size 4 at addr ffff88810698d5b0 by task syz-executor653/3198
CPU: 1 PID: 3198 Comm: syz-executor653 Not tainted 6.2.0-rc59af4eaa31c1f6c00c8f1e448ed99a45c66340dd5 #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x6e/0x91
print_report+0x16a/0x46f
kasan_report+0xad/0x130
__token_bucket_busy+0x253/0x260
mptcp_token_new_connect+0x13d/0x490
mptcp_connect+0x4ed/0x860
__inet_stream_connect+0x80e/0xd90
tcp_sendmsg_fastopen+0x3ce/0x710
mptcp_sendmsg+0xff1/0x1a20
inet_sendmsg+0x11d/0x140
__sys_sendto+0x405/0x490
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
We need to properly clean-up all the paired MPTCP-level
resources and be sure to release the msk last, even when
the unaccepted subflow is destroyed by the TCP internals
via inet_child_forget().
We can re-use the existing MPTCP_WORK_CLOSE_SUBFLOW infra,
explicitly checking that for the critical scenario: the
closed subflow is the MPC one, the msk is not accepted and
eventually going through full cleanup.
With such change, __mptcp_destroy_sock() is always called
on msk sockets, even on accepted ones. We don't need anymore
to transiently drop one sk reference at msk clone time.
Please note this commit depends on the parent one:
mptcp: refactor passive socket initialization
Fixes: 58b0991962 ("mptcp: create msk early")
Cc: stable@vger.kernel.org
Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/347
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After commit 30e51b923e ("mptcp: fix unreleased socket in accept queue")
unaccepted msk sockets go throu complete shutdown, we don't need anymore
to delay inserting the first subflow into the subflow lists.
The reference counting deserve some extra care, as __mptcp_close() is
unaware of the request socket linkage to the first subflow.
Please note that this is more a refactoring than a fix but because this
modification is needed to include other corrections, see the following
commits. Then a Fixes tag has been added here to help the stable team.
Fixes: 30e51b923e ("mptcp: fix unreleased socket in accept queue")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Christoph reported a possible deadlock while the TCP stack
destroys an unaccepted subflow due to an incoming reset: the
MPTCP socket error path tries to acquire the msk-level socket
lock while TCP still owns the listener socket accept queue
spinlock, and the reverse dependency already exists in the
TCP stack.
Note that the above is actually a lockdep false positive, as
the chain involves two separate sockets. A different per-socket
lockdep key will address the issue, but such a change will be
quite invasive.
Instead, we can simply stop earlier the socket error handling
for orphaned or unaccepted subflows, breaking the critical
lockdep chain. Error handling in such a scenario is a no-op.
Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com>
Fixes: 15cc104533 ("mptcp: deliver ssk errors to msk")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/355
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lorenzo Bianconi says:
====================
update xdp_features flag according to NIC re-configuration
Changes since v1:
- rebase on top of net tree
- remove NETDEV_XDP_ACT_NDO_XMIT_SG support in mlx5e driver
- always enable NETDEV_XDP_ACT_NDO_XMIT support in mlx5e driver
====================
Link: https://lore.kernel.org/r/cover.1678364612.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
XDP is supported only if enough queues are present, so when reconfiguring
the queues set xdp_features accordingly.
Fixes: 66c0e13ad2 ("drivers: net: turn on XDP features")
Suggested-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Matteo Croce <teknoraver@meta.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Take into account LRO and GRO configuration setting device xdp_features
flag. Consider channel rq_wq_type enabling rx scatter-gatter support in
xdp_features flag and disable NETDEV_XDP_ACT_NDO_XMIT_SG since it is not
supported yet by the driver.
Moreover always enable NETDEV_XDP_ACT_NDO_XMIT as the ndo_xdp_xmit
callback does not require to load a dummy xdp program on the NIC.
Fixes: 66c0e13ad2 ("drivers: net: turn on XDP features")
Co-developed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Take into account tx/rx queues reconfiguration setting device
xdp_features flag. Moreover consider NETIF_F_GRO flag in order to enable
ndo_xdp_xmit callback.
Fixes: 66c0e13ad2 ("drivers: net: turn on XDP features")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ena nic allows xdp just if enough hw queues are available for XDP.
Take into account queues configuration setting xdp_features.
Fixes: 66c0e13ad2 ("drivers: net: turn on XDP features")
Reviewed-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
thunderx nic allows xdp just if enough hw queues are available for XDP.
Take into account queues configuration setting xdp_features.
Fixes: 66c0e13ad2 ("drivers: net: turn on XDP features")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce xdp_set_features_flag utility routine in order to update
dynamically xdp_features according to the dynamic hw configuration via
ethtool (e.g. changing number of hw rx/tx queues).
Add xdp_clear_features_flag() in order to clear all xdp_feature flag.
Reviewed-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix get_mask utility routine in order to take into account possible gaps
in the elements list.
Fixes: be5bea1cc0 ("net: add basic C code generators for Netlink")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Properly manage render-max property for flags definition type
introducing mask value and setting it to (last_element << 1) - 1
instead of adding max value set to last_element + 1
Fixes: be5bea1cc0 ("net: add basic C code generators for Netlink")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For l3s mode, skb->dev is set to ipvlan interface in ipvlan_nf_input():
skb->dev = addr->master->dev
but, skb->skb_iif remain unchanged, this will cause socket lookup failed
if a target socket is bound to a interface, like the following example:
ip link add ipvlan0 link eth0 type ipvlan mode l3s
ip addr add dev ipvlan0 192.168.124.111/24
ip link set ipvlan0 up
ping -c 1 -I ipvlan0 8.8.8.8
100% packet loss
This is because there is no match sk in __raw_v4_lookup() as sk->sk_bound_dev_if != dif(skb->skb_iif).
Fix this by make skb->skb_iif track skb->dev in ipvlan_nf_input().
Fixes: c675e06a98 ("ipvlan: decouple l3s mode dependencies from other modes")
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/29865b1f-6db7-c07a-de89-949d3721ea30@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull SCSI fixes from James Bottomley:
"Twenty fixes all in drivers except the one zone storage revalidation
fix to sd.
The megaraid_sas fixes are more on the level of a driver update
(enabling crash dump and increasing lun number) but I thought you
could let this slide on -rc1 and the next most extensive update is a
load of fixes to mpi3mr"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: Fix wrong zone_write_granularity value during revalidate
scsi: storvsc: Handle BlockSize change in Hyper-V VHD/VHDX file
scsi: megaraid_sas: Driver version update to 07.725.01.00-rc1
scsi: megaraid_sas: Add crash dump mode capability bit in MFI capabilities
scsi: megaraid_sas: Update max supported LD IDs to 240
scsi: mpi3mr: Bad drive in topology results kernel crash
scsi: mpi3mr: NVMe command size greater than 8K fails
scsi: mpi3mr: Return proper values for failures in firmware init path
scsi: mpi3mr: Wait for diagnostic save during controller init
scsi: mpi3mr: Driver unload crashes host when enhanced logging is enabled
scsi: mpi3mr: ioctl timeout when disabling/enabling interrupt
scsi: lpfc: Avoid usage of list iterator variable after loop
scsi: lpfc: Check kzalloc() in lpfc_sli4_cgn_params_read()
scsi: ufs: mcq: qcom: Clean the return path of ufs_qcom_mcq_config_resource()
scsi: ufs: mcq: qcom: Fix passing zero to PTR_ERR
scsi: ufs: ufs-qcom: Remove impossible check
scsi: ufs: core: Add soft dependency on governor_simpleondemand
scsi: hisi_sas: Check devm_add_action() return value
scsi: qla2xxx: Add option to disable FC2 Target support
scsi: target: iscsi: Fix an error message in iscsi_check_key()
Pull block fixes from Jens Axboe:
- Fix a regression in exclusive mode handling of the partition code,
introduced in this merge windoe (Yu)
- Fix for a use-after-free in BFQ (Yu)
- Add sysfs documentation for the 'hidden' attribute (Sagi)
* tag 'block-6.3-2023-03-09' of git://git.kernel.dk/linux:
block, bfq: fix uaf for 'stable_merge_bfqq'
docs: sysfs-block: document hidden sysfs entry
block: fix wrong mode for blkdev_put() from disk_scan_partitions()
Pull put_and_unmap_page() helper from Al Viro:
"kmap_local_page() conversions in local filesystems keep running into
kunmap_local_page()+put_page() combinations. We can keep inventing
names for identical inline helpers, but it's getting rather
inconvenient. I've added a trivial helper to linux/highmem.h instead.
I would've held that back until the merge window, if not for the mess
it causes in tree topology - I've several branches merging from that
one, and it's only going to get worse if e.g. ext2 stuff gets picked
by Jan"
* tag 'pull-highmem' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
new helper: put_and_unmap_page()
Pull misc fixes from Al Viro:
"pick_file() speculation fix + fix for alpha mis(merge,cherry-pick)
The fs/file.c one is a genuine missing speculation barrier in
pick_file() (reachable e.g. via close(2)). The alpha one is strictly
speaking not a bug fix, but only because confusion between
preempt_enable() and preempt_disable() is harmless on architecture
without CONFIG_PREEMPT.
Looks like alpha.git picked the wrong version of patch - that braino
used to be there in early versions, but it had been fixed quite a
while ago..."
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: prevent out-of-bounds array speculation when closing a file descriptor
alpha: fix lazy-FPU mis(merged/applied/whatnot)
Pull thermal control fix from Rafael Wysocki:
"Fix a recently introduced deadlock in the int340x thermal control
driver (Srinivas Pandruvada)"
* tag 'thermal-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel: int340x: processor_thermal: Fix deadlock
Johannes Berg says:
====================
Just a few fixes:
* MLO connection socket ownership didn't work
* basic rates validation was missing (reported by
by a private syzbot instances)
* puncturing bitmap netlink policy was completely broken
* properly check chandef for NULL channel, it can be
pointing to a chandef that's still uninitialized
* tag 'wireless-2023-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: cfg80211: fix MLO connection ownership
wifi: mac80211: check basic rates validity
wifi: nl80211: fix puncturing bitmap policy
wifi: nl80211: fix NULL-ptr deref in offchan check
====================
Link: https://lore.kernel.org/r/20230310114647.35422-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xuan Zhuo says:
====================
add checking sq is full inside xdp xmit
If the queue of xdp xmit is not an independent queue, then when the xdp
xmit used all the desc, the xmit from the __dev_queue_xmit() may encounter
the following error.
net ens4: Unexpected TXQ (0) queue failure: -28
This patch adds a check whether sq is full in XDP Xmit.
====================
Link: https://lore.kernel.org/r/20230308024935.91686-1-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the queue of xdp xmit is not an independent queue, then when the xdp
xmit used all the desc, the xmit from the __dev_queue_xmit() may encounter
the following error.
net ens4: Unexpected TXQ (0) queue failure: -28
This patch adds a check whether sq is full in xdp xmit.
Fixes: 56434a01b1 ("virtio_net: add XDP_TX support")
Reported-by: Yichun Zhang <yichun@openresty.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The purpose of this is to facilitate the subsequent addition of new
functions without introducing a separate declaration.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
struct pn533_out_arg used as a temporary context for out_urb is not
initialized properly. Its uninitialized 'phy' field can be dereferenced in
error cases inside pn533_out_complete() callback function. It causes the
following failure:
general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.2.0-rc3-next-20230110-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
RIP: 0010:pn533_out_complete.cold+0x15/0x44 drivers/nfc/pn533/usb.c:441
Call Trace:
<IRQ>
__usb_hcd_giveback_urb+0x2b6/0x5c0 drivers/usb/core/hcd.c:1671
usb_hcd_giveback_urb+0x384/0x430 drivers/usb/core/hcd.c:1754
dummy_timer+0x1203/0x32d0 drivers/usb/gadget/udc/dummy_hcd.c:1988
call_timer_fn+0x1da/0x800 kernel/time/timer.c:1700
expire_timers+0x234/0x330 kernel/time/timer.c:1751
__run_timers kernel/time/timer.c:2022 [inline]
__run_timers kernel/time/timer.c:1995 [inline]
run_timer_softirq+0x326/0x910 kernel/time/timer.c:2035
__do_softirq+0x1fb/0xaf6 kernel/softirq.c:571
invoke_softirq kernel/softirq.c:445 [inline]
__irq_exit_rcu+0x123/0x180 kernel/softirq.c:650
irq_exit_rcu+0x9/0x20 kernel/softirq.c:662
sysvec_apic_timer_interrupt+0x97/0xc0 arch/x86/kernel/apic/apic.c:1107
Initialize the field with the pn533_usb_phy currently used.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 9dab880d67 ("nfc: pn533: Wait for out_urb's completion in pn533_usb_send_frame()")
Reported-by: syzbot+1e608ba4217c96d1952f@syzkaller.appspotmail.com
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230309165050.207390-1-pchelkin@ispras.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The following build error can be seen:
progs/test_deny_namespace.c:22:19: error: call to undeclared function 'BIT_LL'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
__u64 cap_mask = BIT_LL(CAP_SYS_ADMIN);
The struct kernel_cap_struct no longer exists in the kernel as well.
Adjust bpf prog to fix both issues.
Fixes: f122a08b19 ("capability: just use a 'u64' instead of a 'u32[2]' array")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The commit 11e456cae9 ("selftests/bpf: Fix compilation errors: Assign a value to a constant")
fixed the issue cleanly in bpf-next.
This is an alternative fix in bpf tree to avoid merge conflict between bpf and bpf-next.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This reverts commit 6d0c4b11e743("libbpf: Poison strlcpy()").
It added the pragma poison directive to libbpf_internal.h to protect
against accidental usage of strlcpy but ended up breaking the build for
toolchains based on libcs which provide the strlcpy() declaration from
string.h (e.g. uClibc-ng). The include order which causes the issue is:
string.h,
from Iibbpf_common.h:12,
from libbpf.h:20,
from libbpf_internal.h:26,
from strset.c:9:
Fixes: 6d0c4b11e7 ("libbpf: Poison strlcpy()")
Signed-off-by: Jesus Sanchez-Palencia <jesussanp@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230309004836.2808610-1-jesussanp@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The tenkeyless version of the Realforce R2 has the same issue of the
full size one, the report fixup is needed to make n-key rollover
work instead of 6 key rollover
Signed-off-by: Alessandro Manca <crizan.git@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Pull RISC-V fixes from Palmer Dabbelt:
- RISC-V architecture-specific ELF attributes have been disabled in the
kernel builds
- A fix for a locking failure while during errata patching that
manifests on SiFive-based systems
- A fix for a KASAN failure during stack unwinding
- A fix for some lockdep failures during text patching
* tag 'riscv-for-linus-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Don't check text_mutex during stop_machine
riscv: Use READ_ONCE_NOCHECK in imprecise unwinding stack mode
RISC-V: fix taking the text_mutex twice during sifive errata patching
RISC-V: Stop emitting attributes
Fix the nid_t field so that its size is correctly reported in the text
format embedded in trace.dat files. As it stands, it is reported as
being of size 4:
field:nid_t nid[3]; offset:24; size:4; signed:0;
Instead of 12:
field:nid_t nid[3]; offset:24; size:12; signed:0;
This also fixes the reported offset of subsequent fields so that they
match with the actual struct layout.
Signed-off-by: Douglas Raillard <douglas.raillard@arm.com>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Pull drm fixes from Dave Airlie:
"Weekly fixes.
msm and amdgpu are the vast majority of these, otherwise some
straggler misc from last week for nouveau and cirrus and a mailmap
update for a drm developer.
mailmap:
- add an entry
nouveau:
- fix system shutdown regression
- build warning fix
cirrus:
- NULL ptr deref fix
msm:
- fix invalid ptr free in syncobj cleanup
- sync GMU removal in teardown
- a5xx preemption fixes
- fix runpm imbalance
- DPU hw fixes
- stack corruption fix
- clear DSPP reservation
amdgpu:
- Misc display fixes
- UMC 8.10 fixes
- Driver unload fixes
- NBIO 7.3.0 fix
- Error checking fixes for soc15, nv, soc21 read register interface
- Fix video cap query for VCN 4.0.4
amdkfd:
- Fix return check in doorbell handling"
* tag 'drm-fixes-2023-03-10' of git://anongit.freedesktop.org/drm/drm: (42 commits)
drm/amdgpu/soc21: Add video cap query support for VCN_4_0_4
drm/amdgpu: fix error checking in amdgpu_read_mm_registers for nv
drm/amdgpu: fix error checking in amdgpu_read_mm_registers for soc21
drm/amdgpu: fix error checking in amdgpu_read_mm_registers for soc15
drm/amdgpu: Fix the warning info when removing amdgpu device
drm/amdgpu: fix return value check in kfd
drm/amd: Fix initialization mistake for NBIO 7.3.0
drm/amdgpu: Fix call trace warning and hang when removing amdgpu device
mailmap: add mailmap entries for Faith.
drm/msm: DEVFREQ_GOV_SIMPLE_ONDEMAND is no longer needed
drm/amd/display: Update clock table to include highest clock setting
drm/amd/pm: Enable ecc_info table support for smu v13_0_10
drm/amdgpu: Support umc node harvest config on umc v8_10
drm/connector: print max_requested_bpc in state debugfs
drm/display: Don't block HDR_OUTPUT_METADATA on unknown EOTF
drm/msm/dpu: clear DSPP reservations in rm release
drm/msm/disp/dpu: fix sc7280_pp base offset
drm/msm/dpu: fix stack smashing in dpu_hw_ctl_setup_blendstage
drm/msm/dpu: don't use DPU_CLK_CTRL_CURSORn for DMA SSPP clocks
drm/msm/dpu: fix clocks settings for msm8998 SSPP blocks
...
Pull erofs fixes from Gao Xiang:
"The most important one reverts an improper fix which can cause an
unexpected warning more often on specific images, and another one
fixes LZMA decompression on 32-bit platforms. The others are minor
fixes and cleanups.
- Fix LZMA decompression failure on HIGHMEM platforms
- Revert an inproper fix since it is actually an implementation issue
of vmalloc()
- Avoid a wrong DBG_BUGON since it could be triggered with -EINTR
- Minor cleanups"
* tag 'erofs-for-6.3-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: use wrapper i_blocksize() in erofs_file_read_iter()
erofs: get rid of a useless DBG_BUGON
erofs: Revert "erofs: fix kvcalloc() misuse with __GFP_NOFAIL"
erofs: fix wrong kunmap when using LZMA on HIGHMEM platforms
erofs: mark z_erofs_lzma_init/erofs_pcpubuf_init w/ __init
Pull nfsd fixes from Chuck Lever:
- Protect NFSD writes against filesystem freezing
- Fix a potential memory leak during server shutdown
* tag 'nfsd-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
SUNRPC: Fix a server shutdown leak
NFSD: Protect against filesystem freezing
Pull btrfs fixes from David Sterba:
"First batch of fixes. Among them there are two updates to sysfs and
ioctl which are not strictly fixes but are used for testing so there's
no reason to delay them.
- fix block group item corruption after inserting new block group
- fix extent map logging bit not cleared for split maps after
dropping range
- fix calculation of unusable block group space reporting bogus
values due to 32/64b division
- fix unnecessary increment of read error stat on write error
- improve error handling in inode update
- export per-device fsid in DEV_INFO ioctl to distinguish seeding
devices, needed for testing
- allocator size classes:
- fix potential dead lock in size class loading logic
- print sysfs stats for the allocation classes"
* tag 'for-6.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix block group item corruption after inserting new block group
btrfs: fix extent map logging bit not cleared for split maps after dropping range
btrfs: fix percent calculation for bg reclaim message
btrfs: fix unnecessary increment of read error stat on write error
btrfs: handle btrfs_del_item errors in __btrfs_update_delayed_inode
btrfs: ioctl: return device fsid from DEV_INFO ioctl
btrfs: fix potential dead lock in size class loading logic
btrfs: sysfs: add size class stats
Pull io_uring fixes from Jens Axboe:
- Stop setting PF_NO_SETAFFINITY on io-wq workers.
This has been reported in the past as it confuses some applications,
as some of their threads will fail with -1/EINVAL if attempted
affinitized. Most recent report was on cpusets, where enabling that
with io-wq workers active will fail.
Just deal with the mask changing by checking when a worker times out,
and then exit if we have no work pending.
- Fix an issue with passthrough support where we don't properly check
if the file type has pollable uring_cmd support.
- Fix a reported W=1 warning on a variable being set and unused. Add a
special helper for iterating these lists that doesn't save the
previous list element, if that iterator never ends up using it.
* tag 'io_uring-6.3-2023-03-09' of git://git.kernel.dk/linux:
io_uring: silence variable ‘prev’ set but not used warning
io_uring/uring_cmd: ensure that device supports IOPOLL
io_uring/io-wq: stop setting PF_NO_SETAFFINITY on io-wq workers
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Add Adrian Hunter to MAINTAINERS as a perf tools reviewer
- Sync various tools/ copies of kernel headers with the kernel sources,
this time trying to avoid first merging with upstream to then update
but instead copy from upstream so that a merge is avoided and the end
result after merging this pull request is the one expected,
tools/perf/check-headers.sh (mostly) happy, less warnings while
building tools/perf/
- Fix counting when initial delay configured by setting
perf_attr.enable_on_exec when starting workloads from the perf
command line
- Don't avoid emitting a PERF_RECORD_MMAP2 in 'perf inject
--buildid-all' when that record comes with a build-id, otherwise we
end up not being able to resolve symbols
- Don't use comma as the CSV output separator the "stat+csv_output"
test, as comma can appear on some tests as a modifier for an event,
use @ instead, ditto for the JSON linter test
- The offcpu test was looking for some bits being set on
task_struct->prev_state without masking other bits not important for
this specific 'perf test', fix it
* tag 'perf-tools-fixes-for-v6.3-1-2023-03-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf tools: Add Adrian Hunter to MAINTAINERS as a reviewer
tools headers UAPI: Sync linux/perf_event.h with the kernel sources
tools headers x86 cpufeatures: Sync with the kernel sources
tools include UAPI: Sync linux/vhost.h with the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
tools headers kvm: Sync uapi/{asm/linux} kvm.h headers with the kernel sources
tools include UAPI: Synchronize linux/fcntl.h with the kernel sources
tools headers: Synchronize {linux,vdso}/bits.h with the kernel sources
tools headers UAPI: Sync linux/prctl.h with the kernel sources
tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench'
perf stat: Fix counting when initial delay configured
tools headers svm: Sync svm headers with the kernel sources
perf test: Avoid counting commas in json linter
perf tests stat+csv_output: Switch CSV separator to @
perf inject: Fix --buildid-all not to eat up MMAP2
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf test: Fix offcpu test prev_state check
Similar to many other devices using the Synopsys Designware Elantech
hardware, HP Spectre x360 13t-aw100 and 14t-ea100 report an empty
battery devices, supposedly for the active stylus.
Apply the HID_BATTERY_QUIRK_IGNORE quirk to ignore the battery reports
from these devices. Note that there are multiple versions of the panel
installed in the 14t-ea100.
Signed-off-by: Philippe Troin <phil@fifi.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Geert reports that:
> On v6.2, "make ARCH=m68k defconfig" gives you
> CONFIG_RPCSEC_GSS_KRB5=m
> On v6.3, it became builtin, due to dropping the dependencies on
> the individual crypto modules.
>
> $ grep -E "CRYPTO_(MD5|DES|CBC|CTS|ECB|HMAC|SHA1|AES)" .config
> CONFIG_CRYPTO_AES=y
> CONFIG_CRYPTO_AES_TI=m
> CONFIG_CRYPTO_DES=m
> CONFIG_CRYPTO_CBC=m
> CONFIG_CRYPTO_CTS=m
> CONFIG_CRYPTO_ECB=m
> CONFIG_CRYPTO_HMAC=m
> CONFIG_CRYPTO_MD5=m
> CONFIG_CRYPTO_SHA1=m
This behavior is triggered by the "default y" in the definition of
RPCSEC_GSS.
The "default y" was added in 2010 by commit df486a2590 ("NFS: Fix
the selection of security flavours in Kconfig"). However,
svc_gss_principal was removed in 2012 by commit 03a4e1f6dd
("nfsd4: move principal name into svc_cred"), so the 2010 fix is
no longer necessary. We can safely change the NFS_V4 and NFSD_V4
dependencies back to RPCSEC_GSS_KRB5 to get the nicer v6.2
behavior back.
Selecting KRB5 symbolically represents the true requirement here:
that all spec-compliant NFSv4 implementations must have Kerberos
available to use.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: dfe9a12345 ("SUNRPC: Enable rpcsec_gss_krb5.ko to be built without CRYPTO_DES")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The recent fix for the deferred I/O by the commit
3efc61d952 ("fbdev: Fix invalid page access after closing deferred I/O devices")
caused a regression when the same fb device is opened/closed while
it's being used. It resulted in a frozen screen even if something
is redrawn there after the close. The breakage is because the patch
was made under a wrong assumption of a single open; in the current
code, fb_deferred_io_release() cleans up the page mapping of the
pageref list and it calls cancel_delayed_work_sync() unconditionally,
where both are no correct behavior for multiple opens.
This patch adds a refcount for the opens of the device, and applies
the cleanup only when all files get closed.
As both fb_deferred_io_open() and _close() are called always in the
fb_info lock (mutex), it's safe to use the normal int for the
refcounting.
Also, a useless BUG_ON() is dropped.
Fixes: 3efc61d952 ("fbdev: Fix invalid page access after closing deferred I/O devices")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230308105012.1845-1-tiwai@suse.de
The PE/COFF header has a NX compat flag which informs the firmware that
the application does not rely on memory regions being mapped with both
executable and writable permissions at the same time.
This is typically used by the firmware to decide whether it can set the
NX attribute on all allocations it returns, but going forward, it may be
used to enforce a policy that only permits applications with the NX flag
set to be loaded to begin wiht in some configurations, e.g., when Secure
Boot is in effect.
Even though the arm64 version of the EFI stub may relocate the kernel
before executing it, it always did so after disabling the MMU, and so we
were always in line with what the NX compat flag conveys, we just never
bothered to set it.
So let's set the flag now.
Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
After relocating the executable image, use the EFI memory attributes
protocol to remap the code and data regions with the appropriate
permissions.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Now that the zboot loader will invoke the EFI memory attributes protocol
to remap the decompressed code and rodata as read-only/executable, we
can set the PE/COFF header flag that indicates to the firmware that the
application does not rely on writable memory being executable at the
same time.
Cc: <stable@vger.kernel.org> # v6.2+
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
When userspace sets basic rates, it might send us some rates
list that's empty or consists of invalid values only. We're
currently ignoring invalid values and then may end up with a
rates bitmap that's empty, which later results in a warning.
Reject the call if there were no valid rates.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This was meant to be a u32, and while applying the patch
I tried to use policy validation for it. However, not only
did I copy/paste it to u8 instead of u32, but also used
the policy range erroneously. Fix both of these issues.
Fixes: d7c1a9a0ed ("wifi: nl80211: validate and configure puncturing bitmap")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Otherwise the virtqueue object to instate could point to invalid address
that was unmapped from the MTT:
mlx5_core 0000:41:04.2: mlx5_cmd_out_err:782:(pid 8321):
CREATE_GENERAL_OBJECT(0xa00) op_mod(0xd) failed, status
bad parameter(0x3), syndrome (0x5fa1c), err(-22)
Fixes: cae15c2ed8 ("vdpa/mlx5: Implement susupend virtqueue callback")
Cc: Eli Cohen <elic@nvidia.com>
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Message-Id: <1676424640-11673-1-git-send-email-si-wei.liu@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Xu Yilun <yilun.xu@intel.com>
Link: https://lore.kernel.org/r/20230307201937.880084-1-helgaas@kernel.org
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
tcp_rtx_synack() now could be called in process context as explained in
0a375c8224 ("tcp: tcp_rtx_synack() can be called from process
context").
tcp_rtx_synack() might call tcp_make_synack(), which will touch per-CPU
variables with preemption enabled. This causes the following BUG:
BUG: using __this_cpu_add() in preemptible [00000000] code: ThriftIO1/5464
caller is tcp_make_synack+0x841/0xac0
Call Trace:
<TASK>
dump_stack_lvl+0x10d/0x1a0
check_preemption_disabled+0x104/0x110
tcp_make_synack+0x841/0xac0
tcp_v6_send_synack+0x5c/0x450
tcp_rtx_synack+0xeb/0x1f0
inet_rtx_syn_ack+0x34/0x60
tcp_check_req+0x3af/0x9e0
tcp_rcv_state_process+0x59b/0x2030
tcp_v6_do_rcv+0x5f5/0x700
release_sock+0x3a/0xf0
tcp_sendmsg+0x33/0x40
____sys_sendmsg+0x2f2/0x490
__sys_sendmsg+0x184/0x230
do_syscall_64+0x3d/0x90
Avoid calling __TCP_INC_STATS() with will touch per-cpu variables. Use
TCP_INC_STATS() which is safe to be called from context switch.
Fixes: 8336886f78 ("tcp: TCP Fast Open Server - support TFO listeners")
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230308190745.780221-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
KASAN reported follow problem:
BUG: KASAN: use-after-free in lookup_rec
Read of size 8 at addr ffff000199270ff0 by task modprobe
CPU: 2 Comm: modprobe
Call trace:
kasan_report
__asan_load8
lookup_rec
ftrace_location
arch_check_ftrace_location
check_kprobe_address_safe
register_kprobe
When checking pg->records[pg->index - 1].ip in lookup_rec(), it can get a
pg which is newly added to ftrace_pages_start in ftrace_process_locs().
Before the first pg->index++, index is 0 and accessing pg->records[-1].ip
will cause this problem.
Don't check the ip when pg->index is 0.
Link: https://lore.kernel.org/linux-trace-kernel/20230309080230.36064-1-chenzhongjin@huawei.com
Cc: stable@vger.kernel.org
Fixes: 9644302e33 ("ftrace: Speed up search by skipping pages by address")
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The function hist_field_name() cannot handle being passed a NULL field
parameter. It should never be NULL, but due to a previous bug, NULL was
passed to the function and the kernel crashed due to a NULL dereference.
Mark Rutland reported this to me on IRC.
The bug was fixed, but to prevent future bugs from crashing the kernel,
check the field and add a WARN_ON() if it is NULL.
Link: https://lkml.kernel.org/r/20230302020810.762384440@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Fixes: c6afad49d1 ("tracing: Add hist trigger 'sym' and 'sym-offset' modifiers")
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Some storage, such as AIX VDASD (virtual storage) and IBM 2076 (front
end), fail as a result of commit c92a6b5d63 ("scsi: core: Query VPD
size before getting full page").
That commit changed getting SCSI VPD pages so that we now read just
enough of the page to get the actual page size, then read the whole
page in a second read. The problem is that the above mentioned
hardware returns zero for the page size, because of a firmware
error. In such cases, until the firmware is fixed, this new blacklist
flag says to revert to the original method of reading the VPD pages,
i.e. try to read a whole buffer's worth on the first try.
[mkp: reworked somewhat]
Fixes: c92a6b5d63 ("scsi: core: Query VPD size before getting full page")
Reported-by: Martin Wilck <mwilck@suse.com>
Suggested-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Lee Duncan <lduncan@suse.com>
Link: https://lore.kernel.org/r/20220928181350.9948-1-leeman.duncan@gmail.com
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After commit c28cd1f343 ("clk: Mark a fwnode as initialized when using
CLK_OF_DECLARE() macro"), drivers/clk/mvebu/kirkwood.c fails to build:
drivers/clk/mvebu/kirkwood.c:358:1: error: expected identifier or '('
CLK_OF_DECLARE(98dx1135_clk, "marvell,mv98dx1135-core-clock",
^
include/linux/clk-provider.h:1367:21: note: expanded from macro 'CLK_OF_DECLARE'
static void __init name##_of_clk_init_declare(struct device_node *np) \
^
<scratch space>:124:1: note: expanded from here
98dx1135_clk_of_clk_init_declare
^
drivers/clk/mvebu/kirkwood.c:358:1: error: invalid digit 'd' in decimal constant
include/linux/clk-provider.h:1372:34: note: expanded from macro 'CLK_OF_DECLARE'
OF_DECLARE_1(clk, name, compat, name##_of_clk_init_declare)
^
<scratch space>:125:3: note: expanded from here
98dx1135_clk_of_clk_init_declare
^
drivers/clk/mvebu/kirkwood.c:358:1: error: invalid digit 'd' in decimal constant
include/linux/clk-provider.h:1372:34: note: expanded from macro 'CLK_OF_DECLARE'
OF_DECLARE_1(clk, name, compat, name##_of_clk_init_declare)
^
<scratch space>:125:3: note: expanded from here
98dx1135_clk_of_clk_init_declare
^
drivers/clk/mvebu/kirkwood.c:358:1: error: invalid digit 'd' in decimal constant
include/linux/clk-provider.h:1372:34: note: expanded from macro 'CLK_OF_DECLARE'
OF_DECLARE_1(clk, name, compat, name##_of_clk_init_declare)
^
<scratch space>:125:3: note: expanded from here
98dx1135_clk_of_clk_init_declare
^
C function names must start with either an alphabetic letter or an
underscore. To avoid generating invalid function names from clock names,
add two underscores to the beginning of the identifier.
Fixes: c28cd1f343 ("clk: Mark a fwnode as initialized when using CLK_OF_DECLARE() macro")
Suggested-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230308-clk_of_declare-fix-v1-1-317b741e2532@kernel.org
Reviewed-by: Saravana Kannan <saravanak@google.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
In bq24190_probe, &bdi->input_current_limit_work is bound
with bq24190_input_current_limit_work. When external power
changed, it will call bq24190_charger_external_power_changed
to start the work.
If we remove the module which will call bq24190_remove to make
cleanup, there may be a unfinished work. The possible
sequence is as follows:
CPU0 CPUc1
|bq24190_input_current_limit_work
bq24190_remove |
power_supply_unregister |
device_unregister |
power_supply_dev_release|
kfree(psy) |
|
| power_supply_get_property_from_supplier
| //use
Fix it by finishing the work before cleanup in the bq24190_remove
Fixes: 9777467257 ("power_supply: Initialize changed_work before calling device_add")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Sergey Matyukevich <geomatsi@gmail.com> says:
Some time ago two different patches have been posted to fix stale TLB
entries that caused applications crashes.
The patch [0] suggested 'aggregating' mm_cpumask, i.e. current cpu is not
cleared for the switched-out task in switch_mm function. For additional
explanations see the commit message by Guo Ren. The same approach is
used by arc architecture, so another good comment is for switch_mm
in arch/arc/include/asm/mmu_context.h.
The patch [1] attempted to reduce the number of TLB flushes by deferring
(and possibly avoiding) them for CPUs not running the task.
Patch [1] has been merged. However we already have two bug reports from
different vendors. So apparently something is missing in the approach
suggested in [1]. In both cases the patch [0] fixed the issue.
This patch series reverts [1] and replaces it by [0].
[0] https://lore.kernel.org/linux-riscv/20221111075902.798571-1-guoren@kernel.org/
[1] https://lore.kernel.org/linux-riscv/20220829205219.283543-1-geomatsi@gmail.com/
* b4-shazam-merge:
riscv: asid: Fixup stale TLB entry cause application crash
Revert "riscv: mm: notify remote harts about mmu cache updates"
Link: https://lore.kernel.org/r/20230226150137.1919750-1-geomatsi@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
After use_asid_allocator is enabled, the userspace application will
crash by stale TLB entries. Because only using cpumask_clear_cpu without
local_flush_tlb_all couldn't guarantee CPU's TLB entries were fresh.
Then set_mm_asid would cause the user space application to get a stale
value by stale TLB entry, but set_mm_noasid is okay.
Here is the symptom of the bug:
unhandled signal 11 code 0x1 (coredump)
0x0000003fd6d22524 <+4>: auipc s0,0x70
0x0000003fd6d22528 <+8>: ld s0,-148(s0) # 0x3fd6d92490
=> 0x0000003fd6d2252c <+12>: ld a5,0(s0)
(gdb) i r s0
s0 0x8082ed1cc3198b21 0x8082ed1cc3198b21
(gdb) x /2x 0x3fd6d92490
0x3fd6d92490: 0xd80ac8a8 0x0000003f
The core dump file shows that register s0 is wrong, but the value in
memory is correct. Because 'ld s0, -148(s0)' used a stale mapping entry
in TLB and got a wrong result from an incorrect physical address.
When the task ran on CPU0, which loaded/speculative-loaded the value of
address(0x3fd6d92490), then the first version of the mapping entry was
PTWed into CPU0's TLB.
When the task switched from CPU0 to CPU1 (No local_tlb_flush_all here by
asid), it happened to write a value on the address (0x3fd6d92490). It
caused do_page_fault -> wp_page_copy -> ptep_clear_flush ->
ptep_get_and_clear & flush_tlb_page.
The flush_tlb_page used mm_cpumask(mm) to determine which CPUs need TLB
flush, but CPU0 had cleared the CPU0's mm_cpumask in the previous
switch_mm. So we only flushed the CPU1 TLB and set the second version
mapping of the PTE. When the task switched from CPU1 to CPU0 again, CPU0
still used a stale TLB mapping entry which contained a wrong target
physical address. It raised a bug when the task happened to read that
value.
CPU0 CPU1
- switch 'task' in
- read addr (Fill stale mapping
entry into TLB)
- switch 'task' out (no tlb_flush)
- switch 'task' in (no tlb_flush)
- write addr cause pagefault
do_page_fault() (change to
new addr mapping)
wp_page_copy()
ptep_clear_flush()
ptep_get_and_clear()
& flush_tlb_page()
write new value into addr
- switch 'task' out (no tlb_flush)
- switch 'task' in (no tlb_flush)
- read addr again (Use stale
mapping entry in TLB)
get wrong value from old phyical
addr, BUG!
The solution is to keep all CPUs' footmarks of cpumask(mm) in switch_mm,
which could guarantee to invalidate all stale TLB entries during TLB
flush.
Fixes: 65d4b9c530 ("RISC-V: Implement ASID allocator")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Zong Li <zong.li@sifive.com>
Tested-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Cc: Anup Patel <apatel@ventanamicro.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: stable@vger.kernel.org
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230226150137.1919750-3-geomatsi@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Variable 'pirq', which may receive negative value
in platform_get_irq().
Used as an index in a function regmap_irq_get_virq().
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Denis Arefev <arefev@swemel.ru>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
This doesn't need to be printed every second as an error:
...
<3>[17438.628385] cros-usbpd-charger cros-usbpd-charger.3.auto: Port 1: default case!
<3>[17439.634176] cros-usbpd-charger cros-usbpd-charger.3.auto: Port 1: default case!
<3>[17440.640298] cros-usbpd-charger cros-usbpd-charger.3.auto: Port 1: default case!
...
Reduce priority from ERROR to DEBUG.
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
The tmp is defined as u32 type, which results in invalid processing of
tmp<0 in function rk817_read_or_set_full_charge_on_boot(). Therefore,
drop the comparison.
drivers/power/supply/rk817_charger.c:828 rk817_read_or_set_full_charge_on_boot() warn: unsigned 'tmp' is never less than zero.
drivers/power/supply/rk817_charger.c:788 rk817_read_or_set_full_charge_on_boot() warn: unsigned 'tmp' is never less than zero.
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3444
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Tested-by: Chris Morgan <macromorgan@hotmail.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
We're currently using stop_machine() to update ftrace & kprobes, which
means that the thread that takes text_mutex during may not be the same
as the thread that eventually patches the code. This isn't actually a
race because the lock is still held (preventing any other concurrent
accesses) and there is only one thread running during stop_machine(),
but it does trigger a lockdep failure.
This patch just elides the lockdep check during stop_machine.
Fixes: c15ac4fd60 ("riscv/ftrace: Add dynamic function tracer support")
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230303143754.4005217-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
A user should be allowed to take out a lease via an idmapped mount if
the fsuid matches the mapped uid of the inode. generic_setlease() is
checking the unmapped inode uid, causing these operations to be denied.
Fix this by comparing against the mapped inode uid instead of the
unmapped uid.
Fixes: 9caccd4154 ("fs: introduce MOUNT_ATTR_IDMAP")
Cc: stable@vger.kernel.org
Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
The i2cdev_{at,de}tach_adapter() callbacks are used for two purposes:
1. As notifier callbacks, when (un)registering I2C adapters created or
destroyed after i2c_dev_init(),
2. As bus iterator callbacks, for registering already existing
adapters from i2c_dev_init(), and for cleanup.
Unfortunately both use cases expect different return values: the former
expects NOTIFY_* return codes, while the latter expects zero or error
codes, and aborts in case of error.
Hence in case 2, as soon as i2cdev_{at,de}tach_adapter() returns
(non-zero) NOTIFY_OK, the bus iterator aborts. This causes (a) only the
first already existing adapter to be registered, leading to missing
/dev/i2c-* entries, and (b) a failure to unregister all but the first
I2C adapter during cleanup.
Fix this by introducing separate callbacks for the bus iterator,
wrapping the notifier functions, and always returning succes.
Any errors inside these callback functions are unlikely to happen, and
are fatal anyway.
Fixes: cddf70d0bc ("i2c: dev: fix notifier return values")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Now that .probe() was changed not to get the id parameter, drivers can
be converted back to that with the eventual goal to drop .probe_new().
Implement that for the i2c drivers that are part of the i2c core.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Now that .probe() was changed not to get the id parameter, drivers can
be converted back to that with the eventual goal to drop .probe_new().
Implement that for the i2c mux drivers.
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
The second LPASS pin controller IO address is supposed to be the MCC
range which contains the slew rate registers. The Linux driver then
accesses slew rate register with hard-coded offset (0xa000). However
the DTS contained the address of slew rate register as the second IO
address, thus any reads were effectively pass the memory space and lead
to "Internal error: synchronous external aborts" when applying pin
configuration.
Fixes: 6de7f9c343 ("arm64: dts: qcom: sm8550: add GPR and LPASS pin controller")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230302154724.856062-1-krzysztof.kozlowski@linaro.org
Commit b8a1a4cd5a ("i2c: Provide a temporary .probe_new() call-back
type") introduced a new probe callback to convert i2c init routines to
not take an i2c_device_id parameter. Now that all in-tree drivers are
converted to the temporary .probe_new() callback, .probe() can be
modified to match the desired prototype.
Now that .probe() and .probe_new() have the same semantic, they can be
defined as members of an anonymous union to save some memory and
simplify the core code a bit.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
VA dmics 0, 1, 2 micbias on X13s are connected to WCD MICBIAS1, WCD MICBIAS1
and WCD MICBIAS3 respectively. Reflect this in dt to get dmics working.
Also fix dmics to go via VA Macro instead of TX macro to fix device switching.
Fixes: 8c1ea87e80b4 ("arm64: dts: qcom: sc8280xp-x13s: Add soundcard support")
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230302115741.7726-5-srinivas.kandagatla@linaro.org
msm-fixes for v6.3-rc2
- Fix for possible invalid ptr free in submit ioctl syncobj cleanup path.
- Synchronize GMU removal in driver teardown path
- a5xx preemption fixes
- Fix runpm imbalance at unbind
- DPU hw catalog fixes:
- set DPU_MDP_PERIPH_0_REMOVED for sc8280xp as this is another chipset
where the PERIPH_0 block of registers is not there
- fix the DPU features supported in QCM2290 by comparing it with the
downstream device tree
- fix the length of registers in the sc7180_ctl from 0xe4 to 0x1dc
- fix the max mixer line width for sm6115 and qcm2290 chipsets in the
DPU catalog
- fix the scaler version on sm8550, sc8280xp, sm8450, sm8250, sm8350
and sm6115. This was incorrectly populated on the SW version of the
scaler library and not the scaler HW version
- Drop dim layer support for msm8998 as its not indicated to be
supported in the downstream DTSI
- fix the DPU_CLK_CTRL bits for msm 8998 sspp blocks
- Use DPU_CLK_CTRL_DMA* prefix instead of DPU_CLK_CTRL_CURSOR*
for all chipsets for the DMA sspp blocks
- fix the ping-pong block base address for sc7280 in the DPU HW catalog
- Fix stack corruption issue in the dpu_hw_ctl_setup_blendstage() function
as it was causing a negative left shift by protecting against an invalid
index
- Clear the DSPP reservations in dpu_rm_release(). This was missed out and
as as result the DSPP was not released from the resource manager global
state.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGvH+VH_Wx3mFMG51CMnoiU06CM-+-WMhM73M42Qx7Bp4A@mail.gmail.com
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter and bpf.
Current release - regressions:
- core: avoid skb end_offset change in __skb_unclone_keeptruesize()
- sched:
- act_connmark: handle errno on tcf_idr_check_alloc
- flower: fix fl_change() error recovery path
- ieee802154: prevent user from crashing the host
Current release - new code bugs:
- eth: bnxt_en: fix the double free during device removal
- tools: ynl:
- fix enum-as-flags in the generic CLI
- fully inherit attrs in subsets
- re-license uniformly under GPL-2.0 or BSD-3-clause
Previous releases - regressions:
- core: use indirect calls helpers for sk_exit_memory_pressure()
- tls:
- fix return value for async crypto
- avoid hanging tasks on the tx_lock
- eth: ice: copy last block omitted in ice_get_module_eeprom()
Previous releases - always broken:
- core: avoid double iput when sock_alloc_file fails
- af_unix: fix struct pid leaks in OOB support
- tls:
- fix possible race condition
- fix device-offloaded sendpage straddling records
- bpf:
- sockmap: fix an infinite loop error
- test_run: fix &xdp_frame misplacement for LIVE_FRAMES
- fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR
- netfilter: tproxy: fix deadlock due to missing BH disable
- phylib: get rid of unnecessary locking
- eth: bgmac: fix *initial* chip reset to support BCM5358
- eth: nfp: fix csum for ipsec offload
- eth: mtk_eth_soc: fix RX data corruption issue
Misc:
- usb: qmi_wwan: add telit 0x1080 composition"
* tag 'net-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
tools: ynl: fix enum-as-flags in the generic CLI
tools: ynl: move the enum classes to shared code
net: avoid double iput when sock_alloc_file fails
af_unix: fix struct pid leaks in OOB support
eth: fealnx: bring back this old driver
net: dsa: mt7530: permit port 5 to work without port 6 on MT7621 SoC
net: microchip: sparx5: fix deletion of existing DSCP mappings
octeontx2-af: Unlock contexts in the queue context cache in case of fault detection
net/smc: fix fallback failed while sendmsg with fastopen
ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause
mailmap: update entries for Stephen Hemminger
mailmap: add entry for Maxim Mikityanskiy
nfc: change order inside nfc_se_io error path
ethernet: ice: avoid gcc-9 integer overflow warning
ice: don't ignore return codes in VSI related code
ice: Fix DSCP PFC TLV creation
net: usb: qmi_wwan: add Telit 0x1080 composition
net: usb: cdc_mbim: avoid altsetting toggling for Telit FE990
netfilter: conntrack: adopt safer max chain length
net: tls: fix device-offloaded sendpage straddling records
...
Pull HID fixes from Benjamin Tissoires:
- fix potential out of bound write of zeroes in HID core with a
specially crafted uhid device (Lee Jones)
- fix potential use-after-free in work function in intel-ish-hid (Reka
Norman)
- selftests config fixes (Benjamin Tissoires)
- few device small fixes and support
* tag 'for-linus-2023030901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: intel-ish-hid: ipc: Fix potential use-after-free in work function
HID: logitech-hidpp: Add support for Logitech MX Master 3S mouse
HID: cp2112: Fix driver not registering GPIO IRQ chip as threaded
selftest: hid: fix hid_bpf not set in config
HID: uhid: Over-ride the default maximum data buffer value with our own
HID: core: Provide new max_buffer_size attribute to over-ride the default
Pull m68k fixes from Geert Uytterhoeven:
- Fix systems with memory at end of 32-bit address space
- Fix initrd on systems where memory does not start at address zero
- Fix 68030 handling of bus errors for addresses in exception tables
* tag 'm68k-for-v6.3-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: Only force 030 bus error if PC not in exception table
m68k: mm: Move initrd phys_to_virt handling after paging_init()
m68k: mm: Fix systems with memory at end of 32-bit address space
We fetch %SR value from sigframe; it might have been modified by signal
handler, so we can't trust it with any bits that are not modifiable in
user mode.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Rich Felker <dalias@libc.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If io_uring.o is built with W=1, it triggers a warning:
io_uring/io_uring.c: In function ‘__io_submit_flush_completions’:
io_uring/io_uring.c:1502:40: warning: variable ‘prev’ set but not used [-Wunused-but-set-variable]
1502 | struct io_wq_work_node *node, *prev;
| ^~~~
which is due to the wq_list_for_each() iterator always keeping a 'prev'
variable. Most users need this to remove an entry from a list, for
example, but __io_submit_flush_completions() never does that.
Add a basic helper that doesn't track prev instead, and use that in
that function.
Reported-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In ioctl(KD_FONT_OP_GET_TALL), userland tells through op->height which
vpitch should be used to copy over the font. In con_font_get, we were
not checking that it is within the maximum height value, and thus
userland could make the vc->vc_sw->con_font_get(vc, &font, vpitch);
call possibly overflow the allocated max_font_size bytes, and the
copy_to_user(op->data, font.data, c) call possibly read out of that
allocated buffer.
By checking vpitch against max_font_height, the max_font_size buffer
will always be large enough for the vc->vc_sw->con_font_get(vc, &font,
vpitch) call (since we already prevent loading a font larger than that),
and c = (font.width+7)/8 * vpitch * font.charcount will always remain
below max_font_size.
Fixes: 24d69384bc ("VT: Add KD_FONT_OP_SET/GET_TALL operations")
Reported-by: syzbot+3af17071816b61e807ed@syzkaller.appspotmail.com
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20230306094921.tik5ewne4ft6mfpo@begin
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A recent commit added back the calls top stop tx and rx to shutdown()
which had previously been removed by commit e83766334f ("tty: serial:
qcom_geni_serial: No need to stop tx/rx on UART shutdown") in order to
be able to use kgdb after stopping the getty.
Not only did this again break kgdb, but it also broke serial consoles
more generally by hanging TX when stopping the getty during reboot.
The underlying problem has been there since the driver was first merged
and fixing it is going to be a bit involved so simply stop calling the
broken stop functions during shutdown for consoles for now.
Fixes: d8aca2f968 ("tty: serial: qcom-geni-serial: stop operations in progress at shutdown")
Cc: stable <stable@kernel.org>
Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride
Link: https://lore.kernel.org/r/20230307164405.14218-2-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
From time to time DMA completion can come in the middle of DMA shutdown:
<process ctx>: <IRQ>:
lpuart32_shutdown()
lpuart_dma_shutdown()
del_timer_sync()
lpuart_dma_rx_complete()
lpuart_copy_rx_to_tty()
mod_timer()
lpuart_dma_rx_free()
When the timer fires a bit later, sport->dma_rx_desc is NULL:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004
pc : lpuart_copy_rx_to_tty+0xcc/0x5bc
lr : lpuart_timer_func+0x1c/0x2c
Call trace:
lpuart_copy_rx_to_tty
lpuart_timer_func
call_timer_fn
__run_timers.part.0
run_timer_softirq
__do_softirq
__irq_exit_rcu
irq_exit
handle_domain_irq
gic_handle_irq
call_on_irq_stack
do_interrupt_handler
...
To fix this fold del_timer_sync() into lpuart_dma_rx_free() after
dmaengine_terminate_sync() to make sure timer will not be re-started in
lpuart_copy_rx_to_tty() <= lpuart_dma_rx_complete().
Fixes: 4a8588a1cf ("serial: fsl_lpuart: delete timer on shutdown")
Cc: stable <stable@kernel.org>
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Link: https://lore.kernel.org/r/20230309134302.74940-2-alexander.sverdlin@siemens.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to LPUART RM, Transmission Complete Flag becomes 0 if queuing
a break character by writing 1 to CTRL[SBK], so here need to skip
waiting for transmission complete when UARTCTRL_SBK is asserted,
otherwise the kernel may stuck here.
And actually set_termios() adds transmission completion waiting to avoid
data loss or data breakage when changing the baud rate, but we don't
need to worry about this when queuing break characters.
Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Cc: stable <stable@kernel.org>
Link: https://lore.kernel.org/r/20230223093941.31790-1-sherry.sun@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 5779a072c2.
This results in a link error of
ld: drivers/tty/serial/earlycon.o: in function `parse_options':
drivers/tty/serial/earlycon.c:99: undefined reference to `uart_parse_earlycon'
When the config is in this state
CONFIG_SERIAL_CORE=m
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_SERIAL_EARLYCON=y
CONFIG_SERIAL_FSL_LPUART=m
CONFIG_SERIAL_FSL_LPUART_CONSOLE=y
Fixes: 5779a072c2 ("tty: serial: fsl_lpuart: adjust SERIAL_FSL_LPUART_CONSOLE config dependency")
Cc: stable <stable@kernel.org>
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20230226173846.236691-1-trix@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When neither "no_read_workqueue" nor "no_write_workqueue" are enabled,
tasklet_trylock() in crypt_dec_pending() may still return false due to
an uninitialized state, and dm-crypt will unnecessarily do io completion
in io_queue workqueue instead of current context.
Fix this by adding an 'in_tasklet' flag to dm_crypt_io struct and
initialize it to false in crypt_io_init(). Set this flag to true in
kcryptd_queue_crypt() before calling tasklet_schedule(). If set
crypt_dec_pending() will punt io completion to a workqueue.
This also nicely avoids the tasklet_trylock/unlock hack when tasklets
aren't in use.
Fixes: 8e14f61015 ("dm crypt: do not call bio_endio() from the dm-crypt tasklet")
Cc: stable@vger.kernel.org
Reported-by: Hou Tao <houtao1@huawei.com>
Suggested-by: Ignat Korchagin <ignat@cloudflare.com>
Reviewed-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Some boards might use USB-A female connector for USB ports, however,
the port could be connected to a dual-mode USB controller, making it
also behaves as a peripheral device if male-to-male cable is connected.
In this case, the dts looks like this:
&usb0 {
status = "okay";
dr_mode = "otg";
usb-role-switch;
role-switch-default-mode = "host";
};
After boot, dwc2_ovr_init() sets GOTGCTL to GOTGCTL_AVALOVAL and call
dwc2_force_mode() with parameter host=false, which causes inconsistent
mode - The hardware is in peripheral mode while the kernel status is
in host mode.
What we can do now is to call dwc2_drd_role_sw_set() to switch to
device mode, and everything should work just fine now, even switching
back to none(default) mode afterwards.
Fixes: e14acb8769 ("usb: dwc2: drd: add role-switch-default-node support")
Cc: stable <stable@kernel.org>
Signed-off-by: Ziyang Huang <hzyitc@outlook.com>
Tested-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com>
Acked-by: Minas Harutyunyan <hminas@synopsys.com>
Reviewed-by: Amelie Delaunay <amelie.delaunay@foss.st.com>
Link: https://lore.kernel.org/r/SG2PR01MB204837BF68EDB0E343D2A375C9A59@SG2PR01MB2048.apcprd01.prod.exchangelabs.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The kernel will dump in the below cases:
sysfs: cannot create duplicate filename
'/devices/virtual/usb_power_delivery/pd1/source-capabilities'
1. After soft reset has completed, an Explicit Contract negotiation occurs.
The sink device will receive source capabilitys again. This will cause
a duplicate source-capabilities file be created.
2. Power swap twice on a device that is initailly sink role.
This will unregister existing capabilities when above cases occurs.
Fixes: 8203d26905 ("usb: typec: tcpm: Register USB Power Delivery Capabilities")
cc: <stable@vger.kernel.org>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230215054951.238394-1-xu.yang_2@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the unbind callback for f_uac1 and f_uac2, a call to snd_card_free()
via g_audio_cleanup() will disconnect the card and then wait for all
resources to be released, which happens when the refcount falls to zero.
Since userspace can keep the refcount incremented by not closing the
relevant file descriptor, the call to unbind may block indefinitely.
This can cause a deadlock during reboot, as evidenced by the following
blocked task observed on my machine:
task:reboot state:D stack:0 pid:2827 ppid:569 flags:0x0000000c
Call trace:
__switch_to+0xc8/0x140
__schedule+0x2f0/0x7c0
schedule+0x60/0xd0
schedule_timeout+0x180/0x1d4
wait_for_completion+0x78/0x180
snd_card_free+0x90/0xa0
g_audio_cleanup+0x2c/0x64
afunc_unbind+0x28/0x60
...
kernel_restart+0x4c/0xac
__do_sys_reboot+0xcc/0x1ec
__arm64_sys_reboot+0x28/0x30
invoke_syscall+0x4c/0x110
...
The issue can also be observed by opening the card with arecord and
then stopping the process through the shell before unbinding:
# arecord -D hw:UAC2Gadget -f S32_LE -c 2 -r 48000 /dev/null
Recording WAVE '/dev/null' : Signed 32 bit Little Endian, Rate 48000 Hz, Stereo
^Z[1]+ Stopped arecord -D hw:UAC2Gadget -f S32_LE -c 2 -r 48000 /dev/null
# echo gadget.0 > /sys/bus/gadget/drivers/configfs-gadget/unbind
(observe that the unbind command never finishes)
Fix the problem by using snd_card_free_when_closed() instead, which will
still disconnect the card as desired, but defer the task of freeing the
resources to the core once userspace closes its file descriptor.
Fixes: 132fcb4608 ("usb: gadget: Add Audio Class 2.0 Driver")
Cc: stable@vger.kernel.org
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>
Reviewed-by: John Keeping <john@metanate.com>
Link: https://lore.kernel.org/r/20230302163648.3349669-1-alvin@pqrs.dk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Previously, there was a 100uS delay inserted after issuing an end transfer
command for specific controller revisions. This was due to the fact that
there was a GUCTL2 bit field which enabled synchronous completion of the
end transfer command once the CMDACT bit was cleared in the DEPCMD
register. Since this bit does not exist for all controller revisions and
the current implementation heavily relies on utizling the EndTransfer
command completion interrupt, add the delay back in for uses where the
interrupt on completion bit is not set, and increase the duration to 1ms
for the controller to complete the command.
An issue was seen where the USB request buffer was unmapped while the DWC3
controller was still accessing the TRB. However, it was confirmed that the
end transfer command was successfully submitted. (no end transfer timeout)
In situations, such as dwc3_gadget_soft_disconnect() and
__dwc3_gadget_ep_disable(), the dwc3_remove_request() is utilized, which
will issue the end transfer command, and follow up with
dwc3_gadget_giveback(). At least for the USB ep disable path, it is
required for any pending and started requests to be completed and returned
to the function driver in the same context of the disable call. Without
the GUCTL2 bit, it is not ensured that the end transfer is completed before
the buffers are unmapped.
Fixes: cf2f8b63f7 ("usb: dwc3: gadget: Remove END_TRANSFER delay")
Cc: stable <stable@kernel.org>
Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Link: https://lore.kernel.org/r/20230306200557.29387-1-quic_wcheng@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently only one stream is supported. This isn't usally a problem
until you have a multi codec audio card. Because the audio card will run
startup and shutdown on both capture and playback streams. So if your
hdmi-codec only support either playback or capture. Then ALSA can't open
for playback and capture.
This patch will ignore if startup and shutdown are called with a non
supported stream. Thus, allowing an audio card like this:
+-+
cpu1 <--@-| |-> codec1 (HDMI-CODEC)
| |<- codec2 (NOT HDMI-CODEC)
+-+
Signed-off-by: Emil Svendsen <emas@bang-olufsen.dk>
Link: https://lore.kernel.org/r/20230309065432.4150700-2-emas@bang-olufsen.dk
Signed-off-by: Mark Brown <broonie@kernel.org>
Commit 130a96d698 ("usb: typec: ucsi: acpi: Increase command
completion timeout value") increased the timeout from 5 seconds
to 60 seconds due to issues related to alternate mode discovery.
After the alternate mode discovery switch to polled mode
the timeout was reduced, but instead of being set back to
5 seconds it was reduced to 1 second.
This is causing problems when using a Lenovo ThinkPad X1 yoga gen7
connected over Type-C to a LG 27UL850-W (charging DP over Type-C).
When the monitor is already connected at boot the following error
is logged: "PPM init failed (-110)", /sys/class/typec is empty and
on unplugging the NULL pointer deref fixed earlier in this series
happens.
When the monitor is connected after boot the following error
is logged instead: "GET_CONNECTOR_STATUS failed (-110)".
Setting the timeout back to 5 seconds fixes both cases.
Fixes: e08065069f ("usb: typec: ucsi: acpi: Reduce the command completion timeout")
Cc: stable@vger.kernel.org
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230308154244.722337-4-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ucsi_init() which runs from a workqueue sets ucsi->connector and
on an error will clear it again.
ucsi->connector gets dereferenced by ucsi_resume(), this checks for
ucsi->connector being NULL in case ucsi_init() has not finished yet;
or in case ucsi_init() has failed.
ucsi_init() setting ucsi->connector and then clearing it again on
an error creates a race where the check in ucsi_resume() may pass,
only to have ucsi->connector free-ed underneath it when ucsi_init()
hits an error.
Fix this race by making ucsi_init() store the connector array in
a local variable and only assign it to ucsi->connector on success.
Fixes: bdc62f2bae ("usb: typec: ucsi: Simplified registration and I/O API")
Cc: stable@vger.kernel.org
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230308154244.722337-3-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-03-07 (ice)
This series contains updates to ice driver only.
Dave removes masking from pfcena field as it was incorrectly preventing
valid traffic classes from being enabled.
Michal resolves various smatch issues such as not propagating error
codes and returning 0 explicitly.
Arnd Bergmann resolves gcc-9 warning for integer overflow.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ethernet: ice: avoid gcc-9 integer overflow warning
ice: don't ignore return codes in VSI related code
ice: Fix DSCP PFC TLV creation
====================
Link: https://lore.kernel.org/r/20230307220714.3997294-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski says:
====================
tools: ynl: fix enum-as-flags in the generic CLI
The CLI needs to use proper classes when looking at Enum definitions
rather than interpreting the YAML spec ad-hoc, because we have more
than on format of the definition supported.
====================
Link: https://lore.kernel.org/r/20230308003923.445268-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lorenzo points out that the generic CLI is broken for the netdev
family. When I added the support for documentation of enums
(and sparse enums) the client script was not updated.
It expects the values in enum to be a list of names,
now it can also be a dict (YAML object).
Reported-by: Lorenzo Bianconi <lorenzo@kernel.org>
Fixes: e4b48ed460 ("tools: ynl: add a completely generic client")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The MT7530 switch from the MT7621 SoC has 2 ports which can be set up as
internal: port 5 and 6. Arınç reports that the GMAC1 attached to port 5
receives corrupted frames, unless port 6 (attached to GMAC0) has been
brought up by the driver. This is true regardless of whether port 5 is
used as a user port or as a CPU port (carrying DSA tags).
Offline debugging (blind for me) which began in the linked thread showed
experimentally that the configuration done by the driver for port 6
contains a step which is needed by port 5 as well - the write to
CORE_GSWPLL_GRP2 (note that I've no idea as to what it does, apart from
the comment "Set core clock into 500Mhz"). Prints put by Arınç show that
the reset value of CORE_GSWPLL_GRP2 is RG_GSWPLL_POSDIV_500M(1) |
RG_GSWPLL_FBKDIV_500M(40) (0x128), both on the MCM MT7530 from the
MT7621 SoC, as well as on the standalone MT7530 from MT7623NI Bananapi
BPI-R2. Apparently, port 5 on the standalone MT7530 can work under both
values of the register, while on the MT7621 SoC it cannot.
The call path that triggers the register write is:
mt753x_phylink_mac_config() for port 6
-> mt753x_pad_setup()
-> mt7530_pad_clk_setup()
so this fully explains the behavior noticed by Arınç, that bringing port
6 up is necessary.
The simplest fix for the problem is to extract the register writes which
are needed for both port 5 and 6 into a common mt7530_pll_setup()
function, which is called at mt7530_setup() time, immediately after
switch reset. We can argue that this mirrors the code layout introduced
in mt7531_setup() by commit 42bc4fafe3 ("net: mt7531: only do PLL once
after the reset"), in that the PLL setup has the exact same positioning,
and further work to consolidate the separate setup() functions is not
hindered.
Testing confirms that:
- the slight reordering of writes to MT7530_P6ECR and to
CORE_GSWPLL_GRP1 / CORE_GSWPLL_GRP2 introduced by this change does not
appear to cause problems for the operation of port 6 on MT7621 and on
MT7623 (where port 5 also always worked)
- packets sent through port 5 are not corrupted anymore, regardless of
whether port 6 is enabled by phylink or not (or even present in the
device tree)
My algorithm for determining the Fixes: tag is as follows. Testing shows
that some logic from mt7530_pad_clk_setup() is needed even for port 5.
Prior to commit ca366d6c88 ("net: dsa: mt7530: Convert to PHYLINK
API"), a call did exist for all phy_is_pseudo_fixed_link() ports - so
port 5 included. That commit replaced it with a temporary "Port 5 is not
supported!" comment, and the following commit 38f790a805 ("net: dsa:
mt7530: Add support for port 5") replaced that comment with a
configuration procedure in mt7530_setup_port5() which was insufficient
for port 5 to work. I'm laying the blame on the patch that claimed
support for port 5, although one would have also needed the change from
commit c3b8e07909 ("net: dsa: mt7530: setup core clock even in TRGMII
mode") for the write to be performed completely independently from port
6's configuration.
Thanks go to Arınç for describing the problem, for debugging and for
testing.
Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/netdev/f297c2c4-6e7c-57ac-2394-f6025d309b9d@arinc9.com/
Fixes: 38f790a805 ("net: dsa: mt7530: Add support for port 5")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230307155411.868573-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull udf fixes from Jan Kara:
"Fix bugs in UDF caused by the big pile of changes that went in during
the merge window"
* tag 'fs_for_v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Warn if block mapping is done for in-ICB files
udf: Fix reading of in-ICB files
udf: Fix lost writes in udf_adinicb_writepage()
Pull x86 platform driver fixes from Hans de Goede:
"A small set of assorted bug and build/warning fixes"
* tag 'platform-drivers-x86-v6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform: mellanox: mlx-platform: Initialize shift variable to 0
platform/x86: int3472: Add GPIOs to Surface Go 3 Board data
platform/x86: ISST: Fix kernel documentation warnings
platform: x86: MLX_PLATFORM: select REGMAP instead of depending on it
platform: mellanox: select REGMAP instead of depending on it
platform/x86/intel/tpmi: Fix double free reported by Smatch
platform/x86: ISST: Increase range of valid mail box commands
platform/x86: dell-ddv: Fix temperature scaling
platform/x86: dell-ddv: Fix cache invalidation on resume
platform/x86/amd: pmc: remove CONFIG_SUSPEND checks
The implementation of 'current' on x86 is very intentionally special: it
is a very common thing to look up, and it uses 'this_cpu_read_stable()'
to get the current thread pointer efficiently from per-cpu storage.
And the keyword in there is 'stable': the current thread pointer never
changes as far as a single thread is concerned. Even if when a thread
is preempted, or moved to another CPU, or even across an explicit call
'schedule()' that thread will still have the same value for 'current'.
It is, after all, the kernel base pointer to thread-local storage.
That's why it's stable to begin with, but it's also why it's important
enough that we have that special 'this_cpu_read_stable()' access for it.
So this is all done very intentionally to allow the compiler to treat
'current' as a value that never visibly changes, so that the compiler
can do CSE and combine multiple different 'current' accesses into one.
However, there is obviously one very special situation when the
currently running thread does actually change: inside the scheduler
itself.
So the scheduler code paths are special, and do not have a 'current'
thread at all. Instead there are _two_ threads: the previous and the
next thread - typically called 'prev' and 'next' (or prev_p/next_p)
internally.
So this is all actually quite straightforward and simple, and not all
that complicated.
Except for when you then have special code that is run in scheduler
context, that code then has to be aware that 'current' isn't really a
valid thing. Did you mean 'prev'? Did you mean 'next'?
In fact, even if then look at the code, and you use 'current' after the
new value has been assigned to the percpu variable, we have explicitly
told the compiler that 'current' is magical and always stable. So the
compiler is quite free to use an older (or newer) value of 'current',
and the actual assignment to the percpu storage is not relevant even if
it might look that way.
Which is exactly what happened in the resctl code, that blithely used
'current' in '__resctrl_sched_in()' when it really wanted the new
process state (as implied by the name: we're scheduling 'into' that new
resctl state). And clang would end up just using the old thread pointer
value at least in some configurations.
This could have happened with gcc too, and purely depends on random
compiler details. Clang just seems to have been more aggressive about
moving the read of the per-cpu current_task pointer around.
The fix is trivial: just make the resctl code adhere to the scheduler
rules of using the prev/next thread pointer explicitly, instead of using
'current' in a situation where it just wasn't valid.
That same code is then also used outside of the scheduler context (when
a thread resctl state is explicitly changed), and then we will just pass
in 'current' as that pointer, of course. There is no ambiguity in that
case.
The fix may be trivial, but noticing and figuring out what went wrong
was not. The credit for that goes to Stephane Eranian.
Reported-by: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/lkml/20230303231133.1486085-1-eranian@google.com/
Link: https://lore.kernel.org/lkml/alpine.LFD.2.01.0908011214330.3304@localhost.localdomain/
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Stephane Eranian <eranian@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To last 2 parameters to cfg80211_get_bss() should be of
the enum ieee80211_bss_type resp. enum ieee80211_privacy types,
which WLAN_CAPABILITY_ESS very much is not.
Fix both cfg80211_get_bss() calls in ioctl_cfg80211.c to pass
the right parameters.
Note that the second call was already somewhat fixed by commenting
out WLAN_CAPABILITY_ESS and passing in 0 instead. This was still
not entirely correct though since that would limit returned
BSS-es to ESS type BSS-es with privacy on.
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230306153512.162104-2-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are 2 issues with the key-store index handling
1. The non WEP key stores can store keys with indexes 0 - BIP_MAX_KEYID,
this means that they should be an array with BIP_MAX_KEYID + 1
entries. But some of the arrays where just BIP_MAX_KEYID entries
big. While one other array was hardcoded to a size of 6 entries,
instead of using the BIP_MAX_KEYID define.
2. The rtw_cfg80211_set_encryption() and wpa_set_encryption() functions
index check where checking that the passed in key-index would fit
inside both the WEP key store (which only has 4 entries) as well as
in the non WEP key stores. This breaks any attempts to set non WEP
keys with index 4 or 5.
Issue 2. specifically breaks wifi connection with some access points
which advertise PMF support. Without this fix connecting to these
access points fails with the following wpa_supplicant messages:
nl80211: kernel reports: key addition failed
wlan0: WPA: Failed to configure IGTK to the driver
wlan0: RSN: Failed to configure IGTK
wlan0: CTRL-EVENT-DISCONNECTED bssid=... reason=1 locally_generated=1
Fix 1. by using the right size for the key-stores. After this 2. can
safely be fixed by checking the right max-index value depending on the
used algorithm, fixing wifi not working with some PMF capable APs.
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230306153512.162104-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
AMD Erratum 1386 is summarised as:
XSAVES Instruction May Fail to Save XMM Registers to the Provided
State Save Area
This piece of accidental chronomancy causes the %xmm registers to
occasionally reset back to an older value.
Ignore the XSAVES feature on all AMD Zen1/2 hardware. The XSAVEC
instruction (which works fine) is equivalent on affected parts.
[ bp: Typos, move it into the F17h-specific function. ]
Reported-by: Tavis Ormandy <taviso@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230307174643.1240184-1-andrew.cooper3@citrix.com
Every now and then reports come in that are puzzled on why changing
affinity on the io-wq workers fails with EINVAL. This happens because they
set PF_NO_SETAFFINITY as part of their creation, as io-wq organizes
workers into groups based on what CPU they are running on.
However, this is purely an optimization and not a functional requirement.
We can allow setting affinity, and just lazily update our worker to wqe
mappings. If a given io-wq thread times out, it normally exits if there's
no more work to do. The exception is if it's the last worker available.
For the timeout case, check the affinity of the worker against group mask
and exit even if it's the last worker. New workers should be created with
the right mask and in the right location.
Reported-by:Daniel Dao <dqminh@cloudflare.com>
Link: https://lore.kernel.org/io-uring/CA+wXwBQwgxB3_UphSny-yAP5b26meeOu1W4TwYVcD_+5gOhvPw@mail.gmail.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When mailboxes are used as a transport it is possible to setup the SCMI
transport layer, depending on the underlying channels configuration, to use
one or two mailboxes, associated, respectively, to one or two, distinct,
shared memory areas: any other combination should be treated as invalid.
Add more strict checking of SCMI mailbox transport device node descriptors.
Fixes: 5c8a47a5a9 ("firmware: arm_scmi: Make scmi core independent of the transport type")
Cc: <stable@vger.kernel.org> # 4.19
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20230307162324.891866-1-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
The T: entries shall be composed of a SCM tree type (git, hg, quilt, stgit
or topgit) and location.
Add the SCM tree type to the T: entry and reorder the file entries in
alphabetical order.
Fixes: ddc84c9053 ("MAINTAINERS: update idmapping tree")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Before commit fd571df0ac ("block, bfq: turn bfqq_data into an array
in bfq_io_cq"), process reference is read before bfq_put_stable_ref(),
and it's safe if bfq_put_stable_ref() put the last reference, because
process reference will be 0 and 'stable_merge_bfqq' won't be accessed
in this case. However, the commit changed the order and will cause
uaf for 'stable_merge_bfqq'.
In order to emphasize that bfq_put_stable_ref() can drop the last
reference, fix the problem by moving bfq_put_stable_ref() to the end of
bfq_setup_stable_merge().
Fixes: fd571df0ac ("block, bfq: turn bfqq_data into an array in bfq_io_cq")
Reported-and-tested-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/linux-block/20230307071448.rzihxbm4jhbf5krj@shindev/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fix a race where kthread_stop() may prevent the threadfn from ever getting
called. If that happens the svc_rqst will not be cleaned up.
Fixes: ed6473ddc7 ("NFSv4: Fix callback server shutdown")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Fix deletion of existing DSCP mappings in the APP table.
Adding and deleting DSCP entries are replicated per-port, since the
mapping table is global for all ports in the chip. Whenever a mapping
for a DSCP value already exists, the old mapping is deleted first.
However, it is only deleted for the specified port. Fix this by calling
sparx5_dcb_ieee_delapp() instead of dcb_ieee_delapp() as it ought to be.
Reproduce:
// Map and remap DSCP value 63
$ dcb app add dev eth0 dscp-prio 63:1
$ dcb app add dev eth0 dscp-prio 63:2
$ dcb app show dev eth0 dscp-prio
dscp-prio 63:2
$ dcb app show dev eth1 dscp-prio
dscp-prio 63:1 63:2 <-- 63:1 should not be there
Fixes: 8dcf69a641 ("net: microchip: sparx5: add support for offloading dscp table")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NDC caches contexts of frequently used queue's (Rx and Tx queues)
contexts. Due to a HW errata when NDC detects fault/poision while
accessing contexts it could go into an illegal state where a cache
line could get locked forever. To makesure all cache lines in NDC
are available for optimum performance upon fault/lockerror/posion
errors scan through all cache lines in NDC and clear the lock bit.
Fixes: 4a3581cd59 ("octeontx2-af: NPA AQ instruction enqueue support")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before determining whether the msg has unsupported options, it has been
prematurely terminated by the wrong status check.
For the application, the general usages of MSG_FASTOPEN likes
fd = socket(...)
/* rather than connect */
sendto(fd, data, len, MSG_FASTOPEN)
Hence, We need to check the flag before state check, because the sock
state here is always SMC_INIT when applications tries MSG_FASTOPEN.
Once we found unsupported options, fallback it to TCP.
Fixes: ee9dfbef02 ("net/smc: handle sockopts forcing fallback")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
v2 -> v1: Optimize code style
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, calling clone3() with CLONE_NEWTIME in clone_args->flags
fails with -EINVAL. This is because CLONE_NEWTIME intersects with
CSIGNAL. However, CSIGNAL was deprecated when clone3 was introduced in
commit 7f192e3cd3 ("fork: add clone3"), allowing re-use of that part
of clone flags.
Fix this by explicitly allowing CLONE_NEWTIME in clone3_args_valid. This
is also in line with the respective check in check_unshare_flags which
allow CLONE_NEWTIME for unshare().
Fixes: 769071ac9f ("ns: Introduce Time Namespace")
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
`nft_redir_inet_type.maxattrs` was being set, presumably because of a
cut-and-paste error, to `NFTA_MASQ_MAX`, instead of `NFTA_REDIR_MAX`.
Fixes: 63ce3940f3 ("netfilter: nft_redir: add inet support")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The values in the protocol registers are two bytes wide. However, when
parsing the register loads, the code currently uses the larger 16-byte
size of a `union nf_inet_addr`. Change it to use the (correct) size of
a `union nf_conntrack_man_proto` instead.
Fixes: d07db9884a ("netfilter: nf_tables: introduce nft_validate_register_load()")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The values in the protocol registers are two bytes wide. However, when
parsing the register loads, the code currently uses the larger 16-byte
size of a `union nf_inet_addr`. Change it to use the (correct) size of
a `union nf_conntrack_man_proto` instead.
Fixes: 8a6bf5da1a ("netfilter: nft_masq: support port range")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The values in the protocol registers are two bytes wide. However, when
parsing the register loads, the code currently uses the larger 16-byte
size of a `union nf_inet_addr`. Change it to use the (correct) size of
a `union nf_conntrack_man_proto` instead.
Fixes: d07db9884a ("netfilter: nf_tables: introduce nft_validate_register_load()")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The watch_queue_set_size() allocation error paths return the ret value
set via the prior pipe_resize_ring() call, which will always be zero.
As a result, IOC_WATCH_QUEUE_SET_SIZE callers such as "keyctl watch"
fail to detect kernel wqueue->notes allocation failures and proceed to
KEYCTL_WATCH_KEY, with any notifications subsequently lost.
Fixes: c73be61ced ("pipe: Add general notification queue support")
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
CONTROLLER_IN_GPU() is clearly intended to match only Intel devices, but
previously it checked only the PCI Device ID, not the Vendor ID, so it
could match devices from other vendors that happened to use the same Device
ID.
Update CONTROLLER_IN_GPU() so it matches only Intel devices.
Fixes: 535115b5ff ("ALSA: hda - Abort the probe without i915 binding for HSW/B")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20230307214054.886721-1-helgaas@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Port is allocated by sas_port_alloc_num() and rphy is allocated by either
sas_end_device_alloc() or sas_expander_alloc(), all of which may return
NULL. So we need to check the rphy to avoid possible NULL pointer access.
If sas_rphy_add() returned with failure, rphy is set to NULL. We would
access the rphy in the following lines which would also result NULL pointer
access.
Fixes: 78316e9dfc ("scsi: mpt3sas: Fix possible resource leaks in mpt3sas_transport_port_add()")
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20230225100135.2109330-1-haowenchao2@huawei.com
Acked-by: Sathya Prakash Veerichetty <sathya.prakash@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Apparently syzbot figured out that issuing this FSMAP call:
struct fsmap_head cmd = {
.fmh_count = ...;
.fmh_keys = {
{ .fmr_device = /* ext4 dev */, .fmr_physical = 0, },
{ .fmr_device = /* ext4 dev */, .fmr_physical = 0, },
},
...
};
ret = ioctl(fd, FS_IOC_GETFSMAP, &cmd);
Produces this crash if the underlying filesystem is a 1k-block ext4
filesystem:
kernel BUG at fs/ext4/ext4.h:3331!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 3 PID: 3227965 Comm: xfs_io Tainted: G W O 6.2.0-rc8-achx
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:ext4_mb_load_buddy_gfp+0x47c/0x570 [ext4]
RSP: 0018:ffffc90007c03998 EFLAGS: 00010246
RAX: ffff888004978000 RBX: ffffc90007c03a20 RCX: ffff888041618000
RDX: 0000000000000000 RSI: 00000000000005a4 RDI: ffffffffa0c99b11
RBP: ffff888012330000 R08: ffffffffa0c2b7d0 R09: 0000000000000400
R10: ffffc90007c03950 R11: 0000000000000000 R12: 0000000000000001
R13: 00000000ffffffff R14: 0000000000000c40 R15: ffff88802678c398
FS: 00007fdf2020c880(0000) GS:ffff88807e100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd318a5fe8 CR3: 000000007f80f001 CR4: 00000000001706e0
Call Trace:
<TASK>
ext4_mballoc_query_range+0x4b/0x210 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
ext4_getfsmap_datadev+0x713/0x890 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
ext4_getfsmap+0x2b7/0x330 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
ext4_ioc_getfsmap+0x153/0x2b0 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
__ext4_ioctl+0x2a7/0x17e0 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
__x64_sys_ioctl+0x82/0xa0
do_syscall_64+0x2b/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fdf20558aff
RSP: 002b:00007ffd318a9e30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000000200c0 RCX: 00007fdf20558aff
RDX: 00007fdf1feb2010 RSI: 00000000c0c0583b RDI: 0000000000000003
RBP: 00005625c0634be0 R08: 00005625c0634c40 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fdf1feb2010
R13: 00005625be70d994 R14: 0000000000000800 R15: 0000000000000000
For GETFSMAP calls, the caller selects a physical block device by
writing its block number into fsmap_head.fmh_keys[01].fmr_device.
To query mappings for a subrange of the device, the starting byte of the
range is written to fsmap_head.fmh_keys[0].fmr_physical and the last
byte of the range goes in fsmap_head.fmh_keys[1].fmr_physical.
IOWs, to query what mappings overlap with bytes 3-14 of /dev/sda, you'd
set the inputs as follows:
fmh_keys[0] = { .fmr_device = major(8, 0), .fmr_physical = 3},
fmh_keys[1] = { .fmr_device = major(8, 0), .fmr_physical = 14},
Which would return you whatever is mapped in the 12 bytes starting at
physical offset 3.
The crash is due to insufficient range validation of keys[1] in
ext4_getfsmap_datadev. On 1k-block filesystems, block 0 is not part of
the filesystem, which means that s_first_data_block is nonzero.
ext4_get_group_no_and_offset subtracts this quantity from the blocknr
argument before cracking it into a group number and a block number
within a group. IOWs, block group 0 spans blocks 1-8192 (1-based)
instead of 0-8191 (0-based) like what happens with larger blocksizes.
The net result of this encoding is that blocknr < s_first_data_block is
not a valid input to this function. The end_fsb variable is set from
the keys that are copied from userspace, which means that in the above
example, its value is zero. That leads to an underflow here:
blocknr = blocknr - le32_to_cpu(es->s_first_data_block);
The division then operates on -1:
offset = do_div(blocknr, EXT4_BLOCKS_PER_GROUP(sb)) >>
EXT4_SB(sb)->s_cluster_bits;
Leaving an impossibly large group number (2^32-1) in blocknr.
ext4_getfsmap_check_keys checked that keys[0].fmr_physical and
keys[1].fmr_physical are in increasing order, but
ext4_getfsmap_datadev adjusts keys[0].fmr_physical to be at least
s_first_data_block. This implies that we have to check it again after
the adjustment, which is the piece that I forgot.
Reported-by: syzbot+6be2b977c89f79b6b153@syzkaller.appspotmail.com
Fixes: 4a4956249d ("ext4: fix off-by-one fsmap error on 1k block filesystems")
Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
Cc: stable@vger.kernel.org
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/Y+58NPTH7VNGgzdd@magnolia
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
A significant number of xfstests can cause ext4 to log one or more
warning messages when they are run on a test file system where the
inline_data feature has been enabled. An example:
"EXT4-fs warning (device vdc): ext4_dirblock_csum_set:425: inode
#16385: comm fsstress: No space for directory leaf checksum. Please
run e2fsck -D."
The xfstests include: ext4/057, 058, and 307; generic/013, 051, 068,
070, 076, 078, 083, 232, 269, 270, 390, 461, 475, 476, 482, 579, 585,
589, 626, 631, and 650.
In this situation, the warning message indicates a bug in the code that
performs the RENAME_WHITEOUT operation on a directory entry that has
been stored inline. It doesn't detect that the directory is stored
inline, and incorrectly attempts to compute a dirent block checksum on
the whiteout inode when creating it. This attempt fails as a result
of the integrity checking in get_dirent_tail (usually due to a failure
to match the EXT4_FT_DIR_CSUM magic cookie), and the warning message
is then emitted.
Fix this by simply collecting the inlined data state at the time the
search for the source directory entry is performed. Existing code
handles the rest, and this is sufficient to eliminate all spurious
warning messages produced by the tests above. Go one step further
and do the same in the code that resets the source directory entry in
the event of failure. The inlined state should be present in the
"old" struct, but given the possibility of a race there's no harm
in taking a conservative approach and getting that information again
since the directory entry is being reread anyway.
Fixes: b7ff91fd03 ("ext4: find old entry again if failed to rename whiteout")
Cc: stable@kernel.org
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230210173244.679890-1-enwlinux@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When writing a page from an encrypted file that is using
filesystem-layer encryption (not inline encryption), ext4 encrypts the
pagecache page into a bounce page, then writes the bounce page.
It also passes the bounce page to wbc_account_cgroup_owner(). That's
incorrect, because the bounce page is a newly allocated temporary page
that doesn't have the memory cgroup of the original pagecache page.
This makes wbc_account_cgroup_owner() not account the I/O to the owner
of the pagecache page as it should.
Fix this by always passing the pagecache page to
wbc_account_cgroup_owner().
Fixes: 001e4a8775 ("ext4: implement cgroup writeback support")
Cc: stable@vger.kernel.org
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230203005503.141557-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When buffered write fails to copy data into underlying page cache page,
ocfs2_write_end_nolock() just zeroes out and dirties the page. This can
leave dirty page beyond EOF and if page writeback tries to write this page
before write succeeds and expands i_size, page gets into inconsistent
state where page dirty bit is clear but buffer dirty bits stay set
resulting in page data never getting written and so data copied to the
page is lost. Fix the problem by invalidating page beyond EOF after
failed write.
Link: https://lkml.kernel.org/r/20230302153843.18499-1-jack@suse.cz
Fixes: 6dbf7bb555 ("fs: Don't invalidate page buffers in block_write_full_page()")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "migrate_pages: fix deadlock in batched synchronous
migration", v2.
Two deadlock bugs were reported for the migrate_pages() batching series.
Thanks Hugh and Pengfei. Analysis shows that if we have locked some other
folios except the one we are migrating, it's not safe in general to wait
synchronously, for example, to wait the writeback to complete or wait to
lock the buffer head.
So 1/3 fixes the deadlock in a simple way, where the batching support for
the synchronous migration is disabled. The change is straightforward and
easy to be understood. While 3/3 re-introduce the batching for
synchronous migration via trying to migrate asynchronously in batch
optimistically, then fall back to migrate synchronously one by one for
fail-to-migrate folios. Test shows that this can restore the TLB flushing
batching performance for synchronous migration effectively.
This patch (of 3):
Two deadlock bugs were reported for the migrate_pages() batching series.
Thanks Hugh and Pengfei! For example, in the following deadlock trace
snippet,
INFO: task kworker/u4:0:9 blocked for more than 147 seconds.
Not tainted 6.2.0-rc4-kvm+ #1314
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u4:0 state:D stack:0 pid:9 ppid:2 flags:0x00004000
Workqueue: loop4 loop_rootcg_workfn
Call Trace:
<TASK>
__schedule+0x43b/0xd00
schedule+0x6a/0xf0
io_schedule+0x4a/0x80
folio_wait_bit_common+0x1b5/0x4e0
? __pfx_wake_page_function+0x10/0x10
__filemap_get_folio+0x73d/0x770
shmem_get_folio_gfp+0x1fd/0xc80
shmem_write_begin+0x91/0x220
generic_perform_write+0x10e/0x2e0
__generic_file_write_iter+0x17e/0x290
? generic_write_checks+0x12b/0x1a0
generic_file_write_iter+0x97/0x180
? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
do_iter_readv_writev+0x13c/0x210
? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
do_iter_write+0xf6/0x330
vfs_iter_write+0x46/0x70
loop_process_work+0x723/0xfe0
loop_rootcg_workfn+0x28/0x40
process_one_work+0x3cc/0x8d0
worker_thread+0x66/0x630
? __pfx_worker_thread+0x10/0x10
kthread+0x153/0x190
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x50
</TASK>
INFO: task repro:1023 blocked for more than 147 seconds.
Not tainted 6.2.0-rc4-kvm+ #1314
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:repro state:D stack:0 pid:1023 ppid:360 flags:0x00004004
Call Trace:
<TASK>
__schedule+0x43b/0xd00
schedule+0x6a/0xf0
io_schedule+0x4a/0x80
folio_wait_bit_common+0x1b5/0x4e0
? compaction_alloc+0x77/0x1150
? __pfx_wake_page_function+0x10/0x10
folio_wait_bit+0x30/0x40
folio_wait_writeback+0x2e/0x1e0
migrate_pages_batch+0x555/0x1ac0
? __pfx_compaction_alloc+0x10/0x10
? __pfx_compaction_free+0x10/0x10
? __this_cpu_preempt_check+0x17/0x20
? lock_is_held_type+0xe6/0x140
migrate_pages+0x100e/0x1180
? __pfx_compaction_free+0x10/0x10
? __pfx_compaction_alloc+0x10/0x10
compact_zone+0xe10/0x1b50
? lock_is_held_type+0xe6/0x140
? check_preemption_disabled+0x80/0xf0
compact_node+0xa3/0x100
? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
? _find_first_bit+0x7b/0x90
sysctl_compaction_handler+0x5d/0xb0
proc_sys_call_handler+0x29d/0x420
proc_sys_write+0x2b/0x40
vfs_write+0x3a3/0x780
ksys_write+0xb7/0x180
__x64_sys_write+0x26/0x30
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f3a2471f59d
RSP: 002b:00007ffe567f7288 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3a2471f59d
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
RBP: 00007ffe567f72a0 R08: 0000000000000010 R09: 0000000000000010
R10: 0000000000000010 R11: 0000000000000217 R12: 00000000004012e0
R13: 00007ffe567f73e0 R14: 0000000000000000 R15: 0000000000000000
</TASK>
The page migration task has held the lock of the shmem folio A, and is
waiting the writeback of the folio B of the file system on the loop block
device to complete. While the loop worker task which writes back the
folio B is waiting to lock the shmem folio A, because the folio A backs
the folio B in the loop device. Thus deadlock is triggered.
In general, if we have locked some other folios except the one we are
migrating, it's not safe to wait synchronously, for example, to wait the
writeback to complete or wait to lock the buffer head.
To fix the deadlock, in this patch, we avoid to batch the page migration
except for MIGRATE_ASYNC mode. In MIGRATE_ASYNC mode, synchronous waiting
is avoided.
The fix can be improved further. We will do that as soon as possible.
Link: https://lkml.kernel.org/r/20230303030155.160983-1-ying.huang@intel.com
Link: https://lore.kernel.org/linux-mm/87a6c8c-c5c1-67dc-1e32-eb30831d6e3d@google.com/
Link: https://lore.kernel.org/linux-mm/874jrg7kke.fsf@yhuang6-desk2.ccr.corp.intel.com/
Link: https://lore.kernel.org/linux-mm/20230227110614.dngdub2j3exr6dfp@quack3/
Link: https://lkml.kernel.org/r/20230303030155.160983-2-ying.huang@intel.com
Fixes: 5dfab109d5 ("migrate_pages: batch _unmap and _move")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reported-by: Hugh Dickins <hughd@google.com>
Reported-by: "Xu, Pengfei" <pengfei.xu@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Stefan Roesch <shr@devkernel.io>
Cc: Tejun Heo <tj@kernel.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We can often end up inserting a block group item, for a new block group,
with a wrong value for the used bytes field.
This happens if for the new allocated block group, in the same transaction
that created the block group, we have tasks allocating extents from it as
well as tasks removing extents from it.
For example:
1) Task A creates a metadata block group X;
2) Two extents are allocated from block group X, so its "used" field is
updated to 32K, and its "commit_used" field remains as 0;
3) Transaction commit starts, by some task B, and it enters
btrfs_start_dirty_block_groups(). There it tries to update the block
group item for block group X, which currently has its "used" field with
a value of 32K. But that fails since the block group item was not yet
inserted, and so on failure update_block_group_item() sets the
"commit_used" field of the block group back to 0;
4) The block group item is inserted by task A, when for example
btrfs_create_pending_block_groups() is called when releasing its
transaction handle. This results in insert_block_group_item() inserting
the block group item in the extent tree (or block group tree), with a
"used" field having a value of 32K, but without updating the
"commit_used" field in the block group, which remains with value of 0;
5) The two extents are freed from block X, so its "used" field changes
from 32K to 0;
6) The transaction commit by task B continues, it enters
btrfs_write_dirty_block_groups() which calls update_block_group_item()
for block group X, and there it decides to skip the block group item
update, because "used" has a value of 0 and "commit_used" has a value
of 0 too.
As a result, we end up with a block item having a 32K "used" field but
no extents allocated from it.
When this issue happens, a btrfs check reports an error like this:
[1/7] checking root items
[2/7] checking extents
block group [1104150528 1073741824] used 39796736 but extent items used 0
ERROR: errors found in extent allocation tree or chunk allocation
(...)
Fix this by making insert_block_group_item() update the block group's
"commit_used" field.
Fixes: 7248e0cebb ("btrfs: skip update of block group item if used bytes are the same")
CC: stable@vger.kernel.org # 6.2+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With older compilers like gcc-9, the calculation of the vlan
priority field causes a false-positive warning from the byteswap:
In file included from drivers/net/ethernet/intel/ice/ice_tc_lib.c:4:
drivers/net/ethernet/intel/ice/ice_tc_lib.c: In function 'ice_parse_cls_flower':
include/uapi/linux/swab.h:15:15: error: integer overflow in expression '(int)(short unsigned int)((int)match.key-><U67c8>.<U6698>.vlan_priority << 13) & 57344 & 255' of type 'int' results in '0' [-Werror=overflow]
15 | (((__u16)(x) & (__u16)0x00ffU) << 8) | \
| ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
include/uapi/linux/swab.h:106:2: note: in expansion of macro '___constant_swab16'
106 | ___constant_swab16(x) : \
| ^~~~~~~~~~~~~~~~~~
include/uapi/linux/byteorder/little_endian.h:42:43: note: in expansion of macro '__swab16'
42 | #define __cpu_to_be16(x) ((__force __be16)__swab16((x)))
| ^~~~~~~~
include/linux/byteorder/generic.h:96:21: note: in expansion of macro '__cpu_to_be16'
96 | #define cpu_to_be16 __cpu_to_be16
| ^~~~~~~~~~~~~
drivers/net/ethernet/intel/ice/ice_tc_lib.c:1458:5: note: in expansion of macro 'cpu_to_be16'
1458 | cpu_to_be16((match.key->vlan_priority <<
| ^~~~~~~~~~~
After a change to be16_encode_bits(), the code becomes more
readable to both people and compilers, which avoids the warning.
Fixes: 34800178b3 ("ice: Add support for VLAN priority filters in switchdev")
Suggested-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There were few smatch warnings reported by Dan:
- ice_vsi_cfg_xdp_txqs can return 0 instead of ret, which is cleaner
- return values in ice_vsi_cfg_def were ignored
- in ice_vsi_rebuild return value was ignored in case rebuild failed,
it was a never reached code, however, rewrite it for clarity.
- ice_vsi_cfg_tc can return 0 instead of ret
Fixes: 6624e780a5 ("ice: split ice_vsi_setup into smaller functions")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When creating the TLV to send to the FW for configuring DSCP mode PFC,the
PFCENABLE field was being masked with a 4 bit mask (0xF), but this is an 8
bit bitmask for enabled classes for PFC. This means that traffic classes
4-7 could not be enabled for PFC.
Remove the mask completely, as it is not necessary, as we are assigning 8
bits to an 8 bit field.
Fixes: 2a87bd73e5 ("ice: Add DSCP support")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Karen Ostrowska <karen.ostrowska@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
[Why]
Currently, the clk manager matches SocVoltage with voltage from
fused settings (dfPstate clock table). And then corresponding clocks
are selected.
However in certain situations, this leads to clk manager not
including at least one entry with highest supported clock setting.
[How]
Update the clk manager to include at least one entry with highest
supported clock setting.
Reviewed-by: Pavle Kotarac <pavle.kotarac@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Swapnil Patel <Swapnil.Patel@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Don't need to query error count and error address on harvest umc nodes.
v2: Fix code bug, use active_mask instead of harvsest_config
and remove unnecessary argument in LOOP macro.
v3: Leave adev->gmc.num_umc unchanged.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The EDID of an HDR display defines EOTFs that are supported
by the display and can be set in the HDR metadata infoframe.
Userspace is expected to read the EDID and set an appropriate
HDR_OUTPUT_METADATA.
In drm_parse_hdr_metadata_block the kernel reads the supported
EOTFs from the EDID and stores them in the
drm_connector->hdr_sink_metadata. While doing so it also
filters the EOTFs to the EOTFs the kernel knows about.
When an HDR_OUTPUT_METADATA is set it then checks to
make sure the EOTF is a supported EOTF. In cases where
the kernel doesn't know about a new EOTF this check will
fail, even if the EDID advertises support.
Since it is expected that userspace reads the EDID to understand
what the display supports it doesn't make sense for DRM to block
an HDR_OUTPUT_METADATA if it contains an EOTF the kernel doesn't
understand.
This comes with the added benefit of future-proofing metadata
support. If the spec defines a new EOTF there is no need to
update DRM and an compositor can immediately make use of it.
Bug: https://gitlab.freedesktop.org/wayland/weston/-/issues/609
v2: Distinguish EOTFs defind in kernel and ones defined
in EDID in the commit description (Pekka)
v3: Rebase; drm_hdmi_infoframe_set_hdr_metadata moved
to drm_hdmi_helper.c
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Pekka Paalanen <ppaalanen@gmail.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Vitaly.Prosyak@amd.com
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: dri-devel@lists.freedesktop.org
Cc: amd-gfx@lists.freedesktop.org
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Reviewed-By: Joshua Ashton <joshua@froggi.es>
Link: https://patchwork.freedesktop.org/patch/msgid/20230113162428.33874-2-harry.wentland@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
The current interconnect provider registration interface is inherently
racy as nodes are not added until the after adding the provider. This
can specifically cause racing DT lookups to fail:
of_icc_xlate_onecell: invalid index 0
cpu cpu0: error -EINVAL: error finding src node
cpu cpu0: dev_pm_opp_of_find_icc_paths: Unable to get path0: -22
qcom-cpufreq-hw: probe of 18591000.cpufreq failed with error -22
Switch to using the new API where the provider is not registered until
after it has been fully initialised.
Fixes: 5bc9900add ("interconnect: qcom: Add OSM L3 interconnect provider support")
Cc: stable@vger.kernel.org # 5.7
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20230306075651.2449-6-johan+linaro@kernel.org
Signed-off-by: Georgi Djakov <djakov@kernel.org>
The current interconnect provider interface is inherently racy as
providers are expected to be added before being fully initialised.
Specifically, nodes are currently not added and the provider data is not
initialised until after registering the provider which can cause racing
DT lookups to fail.
Add a new provider API which will be used to fix up the interconnect
drivers.
The old API is reimplemented using the new interface and will be removed
once all drivers have been fixed.
Fixes: 11f1ceca70 ("interconnect: Add generic on-chip interconnect API")
Fixes: 87e3031b6f ("interconnect: Allow endpoints translation via DT")
Cc: stable@vger.kernel.org # 5.1
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Luca Ceresoli <luca.ceresoli@bootlin.com> # i.MX8MP MSC SM2-MB-EP1 Board
Link: https://lore.kernel.org/r/20230306075651.2449-4-johan+linaro@kernel.org
Signed-off-by: Georgi Djakov <djakov@kernel.org>
The interconnect framework currently expects that providers are only
removed when there are no users and after all nodes have been removed.
There is currently nothing that guarantees this to be the case and the
framework does not do any reference counting, but refusing to remove the
provider is never correct as that would leave a dangling pointer to a
resource that is about to be released in the global provider list (e.g.
accessible through debugfs).
Replace the current sanity checks with WARN_ON() so that the provider is
always removed.
Fixes: 11f1ceca70 ("interconnect: Add generic on-chip interconnect API")
Cc: stable@vger.kernel.org # 5.1: 680f8666ba: interconnect: Make icc_provider_del() return void
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Luca Ceresoli <luca.ceresoli@bootlin.com> # i.MX8MP MSC SM2-MB-EP1 Board
Link: https://lore.kernel.org/r/20230306075651.2449-3-johan+linaro@kernel.org
Signed-off-by: Georgi Djakov <djakov@kernel.org>
Commit 596ff4a09b ("cpumask: re-introduce constant-sized cpumask
optimizations") changed cpumask_setall() to use "bitmap_set()" instead
of "bitmap_fill()", because bitmap_fill() would explicitly set all the
bits of a constant sized small bitmap, and that's exactly what we don't
want: we want to only set bits up to 'nr_cpu_ids', which is what
"bitmap_set()" does.
However, Yury correctly points out that while "bitmap_set()" does indeed
only set bits up to the required bitmap size, it doesn't _clear_ bits
above that size, so the upper bits would still not have well-defined
values.
Now, none of this should really matter, since any bits set past
'nr_cpu_ids' should always be ignored in the first place. Yes, the bit
scanning functions might return them as a result, but since users should
always consider the ">= nr_cpu_ids" condition to mean "no more bits",
that shouldn't have any actual effect (see previous commit 8ca09d5fa3
"cpumask: fix incorrect cpumask scanning result checks").
But let's just do it right, the way the code was _intended_ to work. We
have had enough lazy code that works but bites us in the *rse later
(again, see previous commit) that there's no reason to not just do this
properly.
It turns out that "bitmap_fill()" gets this all right for the complex
case, and really only fails for the inlined optimized case that just
fills the whole word. And while we could just fix bitmap_fill() to use
the proper last word mask, there's two issues with that:
- the cpumask case wants to do the _optimization_ based on "NR_CPUS is
a small constant", but then wants to do the actual bit _fill_ based
on "nr_cpu_ids" that isn't necessarily that same constant
- we have lots of non-cpumask users of bitmap_fill(), and while they
hopefully don't care, and probably would want the proper semantics
anyway ("only set bits up to the limit"), I do not want the cpumask
changes to impact other parts
So this ends up just doing the single-word optimization by hand in the
cpumask code. If our cpumask is fundamentally limited to a single word,
just do the proper "fill in that word" exactly. And if it's the more
complex multi-word case, then the generic bitmap_fill() will DTRT.
This is all an example of how our bitmap function optimizations really
are somewhat broken. They conflate the "this is size of the bitmap"
optimizations with the actual bit(s) we want to set.
In many cases we really want to have the two be separate things:
sometimes we base our optimizations on the size of the whole bitmap ("I
know this whole bitmap fits in a single word, so I'll just use
single-word accesses"), and sometimes we base them on the bit we are
looking at ("this is just acting on bits that are in the first word, so
I'll use single-word accesses").
Notice how the end result of the two optimizations are the same, but the
way we get to them are quite different.
And all our cpumask optimization games are really about that fundamental
distinction, and we'd often really want to pass in both the "this is the
bit I'm working on" (which _can_ be a small constant but might be
variable), and "I know it's in this range even if it's variable" (based
on CONFIG_NR_CPUS).
So this cpumask_setall() implementation just makes that explicit. It
checks the "I statically know the size is small" using the known static
size of the cpumask (which is what that 'small_cpumask_bits' is all
about), but then sets the actual bits using the exact number of cpus we
have (ie 'nr_cpumask_bits')
Of course, in a perfect world, the compiler would have done all the
range analysis (possibly with help from us just telling it that
"this value is always in this range"), and would do all of this for us.
But that is not the world we live in.
While we dream of that perfect world, this does that manual logic to
make it all work out. And this was a very long explanation for a small
code change that shouldn't even matter.
Reported-by: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/lkml/ZAV9nGG9e1%2FrV+L%2F@yury-laptop/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge series from Peter Ujfalusi <peter.ujfalusi@linux.intel.com>:
We have recently noticed that the ops_free callback was missed for the device
descriptions on Intel platforms.
Following the C text in the file, add a mention about the Rust
programming language, the currently supported compiler and
the edition used (similar to the "dialect" mention for C).
Similarly, add a mention about the unstable features used (similar
to the "extensions" mentions for C).
In addition, add some links to complement the information.
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://lore.kernel.org/r/20230306191712.230658-2-ojeda@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
MicroSD card slot in the Pinebook Pro is located on a separate
daughterboard that's connected to the mainboard using a rather
long flat cable. The resulting signal degradation causes many
perfectly fine microSD cards not to work in the Pinebook Pro,
which is a common source of frustration among the owners.
Changing the mode and lowering the speed reportedly fixes this
issue and makes many microSD cards work as expected.
Co-developed-by: Dragan Simic <dragan.simic@gmail.com>
Signed-off-by: Dragan Simic <dragan.simic@gmail.com>
Tested-by: JR Gonzalez <jrg@scientiam.org>
Signed-off-by: Dan Johansen <strit@manjaro.org>
Link: https://lore.kernel.org/r/20230305104730.15849-1-strit@manjaro.org
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Just like the Quartz64 Model B the previously stated speed of sdr-104
in soquartz is too high for the hardware to reliably communicate with
some fast SD cards.
Especially on some carrierboards.
Lower this to sd-uhs-sdr50 to fix this.
Fixes: 5859b5a9c3 ("arm64: dts: rockchip: add SoQuartz CM4IO dts")
Signed-off-by: Dan Johansen <strit@manjaro.org>
Acked-by: Peter Geis <pgwipeout@gmail.com>
Link: https://lore.kernel.org/r/20230304164135.28430-1-strit@manjaro.org
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
With the removal of widget setup during BE hw_params, the DAI config IPC
is never sent with the SOF_DAI_CONFIG_FLAGS_HW_PARAMS. This means that
the early bit clock feature required for certain codecs will be broken.
Fix this by saving the config flags sent during BE DAI hw_params and
reusing it when the DAI_CONFIG IPC is sent after the DAI widget is set
up. Also, free the DAI config before the widget is freed.
The DAI_CONFIG IPC sent during the sof_widget_free() does not have the
DAI index information. So, save the dai_index in the config during
hw_params and reuse it during hw_free.
For IPC4, do not clear the node ID during hw_free. It will be needed for
freeing the group_ida during unprepare.
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Rander Wang <rander.wang@intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://lore.kernel.org/r/20230307114639.4553-1-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The Lenovo Yoga Book X90 is a x86 tablet which ships with Android x86
as factory OS. The Android x86 kernel fork ignores I2C devices described
in the DSDT, except for the PMIC and Audio codecs.
As usual the Lenovo Yoga Book X90's DSDT contains a bunch of extra I2C
devices which are not actually there, causing various resource conflicts.
Add an ACPI_QUIRK_SKIP_I2C_CLIENTS quirk for the Lenovo Yoga Book X90
to the acpi_quirk_skip_dmi_ids table to woraround this.
The DSDT also contains broken ACPI GPIO event handlers, disable those too.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rjw@rjwysocki.net>
The Acer Iconia One 7 B1-750 is a x86 tablet which ships with Android x86
as factory OS. The Android x86 kernel fork ignores I2C devices described
in the DSDT, except for the PMIC and Audio codecs.
As usual the Acer Iconia One 7 B1-750's DSDT contains a bunch of extra I2C
devices which are not actually there, causing various resource conflicts.
Add an ACPI_QUIRK_SKIP_I2C_CLIENTS quirk for the Acer Iconia One 7 B1-750
to the acpi_quirk_skip_dmi_ids table to woraround this.
The DSDT also contains broken ACPI GPIO event handlers, disable those too.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rjw@rjwysocki.net>
x86 ACPI boards which ship with only Android as their factory image usually
have pretty broken ACPI tables, relying on everything being hardcoded in
the factory kernel image and often disabling parts of the ACPI enumeration
kernel code to avoid the broken tables causing issues.
Part of this broken ACPI code is that sometimes these boards have _AEI
ACPI GPIO event handlers which are broken.
So far this has been dealt with in the platform/x86/x86-android-tablets.c
module, which contains various workarounds for these devices, by it calling
acpi_gpiochip_free_interrupts() on gpiochip-s with troublesome handlers to
disable the handlers.
But in some cases this is too late, if the handlers are of the edge type
then gpiolib-acpi.c's code will already have run them at boot.
This can cause issues such as GPIOs ending up as owned by "ACPI:OpRegion",
making them unavailable for drivers which actually need them.
Boards with these broken ACPI tables are already listed in
drivers/acpi/x86/utils.c for e.g. acpi_quirk_skip_i2c_client_enumeration().
Extend the quirks mechanism for a new acpi_quirk_skip_gpio_event_handlers()
helper, this re-uses the DMI-ids rather then having to duplicate the same
DMI table in gpiolib-acpi.c .
Also add the new ACPI_QUIRK_SKIP_GPIO_EVENT_HANDLERS quirk to existing
boards with troublesome ACPI gpio event handlers, so that the current
acpi_gpiochip_free_interrupts() hack can be removed from
x86-android-tablets.c .
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Sometimes the system boots up with a acpi_video0 backlight interface
which doesn't work. So add Dell Vostro 15 3535 into the
video_detect_dmi_table to set it to native explicitly.
Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Signed-off-by: Rafael J. Wysocki <rjw@rjwysocki.net>
I²C peripheral devices that are connected to the controller are
represented in the Linux kernel as objects of the struct i2c_client.
Fix this in the documentation.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Restore ctnetlink zero mark in events and dump, from Ivan Delalande.
2) Fix deadlock due to missing disabled bh in tproxy, from Florian Westphal.
3) Safer maximum chain load in conntrack, from Eric Dumazet.
* 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: conntrack: adopt safer max chain length
netfilter: tproxy: fix deadlock due to missing BH disable
netfilter: ctnetlink: revert to dumping mark regardless of event type
====================
Link: https://lore.kernel.org/r/20230307100424.2037-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
REGMAP is a hidden (not user visible) symbol. Users cannot set it
directly thru "make *config", so drivers should select it instead of
depending on it if they need it.
Consistently using "select" or "depends on" can also help reduce
Kconfig circular dependency issues.
Therefore, change the use of "depends on REGMAP" to "select REGMAP".
For NVSW_SN2201, select REGMAP_I2C instead of depending on it.
Fixes: c6acad68eb ("platform/mellanox: mlxreg-hotplug: Modify to use a regmap interface")
Fixes: 5ec4a8ace0 ("platform/mellanox: Introduce support for Mellanox register access driver")
Fixes: 62f9529b8d ("platform/mellanox: mlxreg-lc: Add initial support for Nvidia line card devices")
Fixes: 662f24826f ("platform/mellanox: Add support for new SN2201 system")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Michael Shych <michaelsh@nvidia.com>
Cc: Mark Gross <markgross@kernel.org>
Cc: Vadim Pasternak <vadimp@nvidia.com>
Cc: platform-driver-x86@vger.kernel.org
Link: https://lore.kernel.org/r/20230226053953.4681-6-rdunlap@infradead.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
After using the built-in UEFI hardware diagnostics to compare
the measured battery temperature, i noticed that the temperature
is actually expressed in tenth degree kelvin, similar to the
SBS-Data standard. For example, a value of 2992 is displayed as
26 degrees celsius.
Fix the scaling so that the correct values are being displayed.
Tested on a Dell Inspiron 3505.
Fixes: a77272c160 ("platform/x86: dell: Add new dell-wmi-ddv driver")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20230218115318.20662-2-W_Armin@gmx.de
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
If one or both sensor buffers could not be initialized, either
due to missing hardware support or due to some error during probing,
the resume handler will encounter undefined behaviour when
attempting to lock buffers then protected by an uninitialized or
destroyed mutex.
Fix this by introducing a "active" flag which is set during probe,
and only invalidate buffers which where flaged as "active".
Tested on a Dell Inspiron 3505.
Fixes: 3b7eeff93d ("platform/x86: dell-ddv: Add hwmon support")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20230218115318.20662-1-W_Armin@gmx.de
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
The amd_pmc_write_stb() function was previously hidden in an
ifdef to avoid a warning when CONFIG_SUSPEND is disabled, but
now there is an additional caller:
drivers/platform/x86/amd/pmc.c: In function 'amd_pmc_stb_debugfs_open_v2':
drivers/platform/x86/amd/pmc.c:256:8: error: implicit declaration of function 'amd_pmc_write_stb'; did you mean 'amd_pmc_read_stb'? [-Werror=implicit-function-declaration]
256 | ret = amd_pmc_write_stb(dev, AMD_PMC_STB_DUMMY_PC);
| ^~~~~~~~~~~~~~~~~
| amd_pmc_read_stb
There is now an easier way to handle this using DEFINE_SIMPLE_DEV_PM_OPS()
to replace all the #ifdefs, letting gcc drop any of the unused functions
silently.
Fixes: b0d4bb9735 ("platform/x86/amd: pmc: Write dummy postcode into the STB DRAM")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230214152512.806188-1-arnd@kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Customers using GKE 1.25 and 1.26 are facing conntrack issues
root caused to commit c9c3b6811f ("netfilter: conntrack: make
max chain length random").
Even if we assume Uniform Hashing, a bucket often reachs 8 chained
items while the load factor of the hash table is smaller than 0.5
With a limit of 16, we reach load factors of 3.
With a limit of 32, we reach load factors of 11.
With a limit of 40, we reach load factors of 15.
With a limit of 50, we reach load factors of 24.
This patch changes MIN_CHAINLEN to 50, to minimize risks.
Ideally, we could in the future add a cushion based on expected
load factor (2 * nf_conntrack_max / nf_conntrack_buckets),
because some setups might expect unusual values.
Fixes: c9c3b6811f ("netfilter: conntrack: make max chain length random")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Daniel Borkmann says:
====================
pull-request: bpf 2023-03-06
We've added 8 non-merge commits during the last 7 day(s) which contain
a total of 9 files changed, 64 insertions(+), 18 deletions(-).
The main changes are:
1) Fix BTF resolver for DATASEC sections when a VAR points at a modifier,
that is, keep resolving such instances instead of bailing out,
from Lorenz Bauer.
2) Fix BPF test framework with regards to xdp_frame info misplacement
in the "live packet" code, from Alexander Lobakin.
3) Fix an infinite loop in BPF sockmap code for TCP/UDP/AF_UNIX,
from Liu Jian.
4) Fix a build error for riscv BPF JIT under PERF_EVENTS=n,
from Randy Dunlap.
5) Several BPF doc fixes with either broken links or external instead
of internal doc links, from Bagas Sanjaya.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: check that modifier resolves after pointer
btf: fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR
bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES
bpf, doc: Link to submitting-patches.rst for general patch submission info
bpf, doc: Do not link to docs.kernel.org for kselftest link
bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser()
riscv, bpf: Fix patch_text implicit declaration
bpf, docs: Fix link to BTF doc
====================
Link: https://lore.kernel.org/r/20230306215944.11981-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The previous commit mistakenly introduced sim_ctrl_default as pinctrl,
this is incorrect, the interface for sim card selection varies between
different devices and should not be placed in the dtsi.
This commit selects external SIM card slot for ufi001c as default.
uf896 selects the correct SIM card slot automatically, thus does not need
this pinctrl node.
Fixes: faf6943146 ("arm64: dts: qcom: msm8916-thwc: Add initial device trees")
Signed-off-by: Yang Xiwen <forbidden405@foxmail.com>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/tencent_7036BCA256055D05F8C49D86DF7F0E2D1A05@qq.com
My randconfig build setup ran into another kallsyms warning:
Inconsistent kallsyms data
Try make KALLSYMS_EXTRA_PASS=1 as a workaround
After adding some debugging code to kallsyms.c, I saw that the recently
added kallsyms_seqs_of_names symbol can sometimes cause the second stage
table to be slightly longer than the first stage, which makes the
build inconsistent.
Add it to the exception table that contains all other kallsyms-generated
symbols.
Fixes: 60443c88f3 ("kallsyms: Improve the performance of kallsyms_lookup_name()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Looks like a braino that used to be fixed in e.g. #next.alpha
had gotten into alpha.git cherry-picked version of that patch.
Sure, alpha has no preempt, but preempt_enable() in place of
preempt_disable() is actively confusing the readers...
Other than that, the cherry-picked variant matches what I have.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Bring back the Python scripts that were initially added with
TEST_GEN_FILES but now with TEST_FILES to avoid having them deleted
when doing a clean. Also fix the way the architecture is being
determined as they should also be installed when ARCH=x86_64 is
provided explicitly. Then also append extra files to TEST_FILES and
TEST_PROGS with += so they don't get discarded.
Fixes: ba2d788aa8 ("selftests: amd-pstate: Trigger tbench benchmark and test cpus")
Fixes: a49fb7218e ("selftests: amd-pstate: Don't delete source files via Makefile")
Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
assert(x) should emit a warning if x is false. WARN_ON(x) emits a
warning if x is true. Thus, assert(x) should be defined as WARN_ON(!x)
rather than WARN_ON(x).
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Nick Terrell <terrelln@fb.com>
Backport the relevant part of upstream commit 5b266196 [0].
This fixes in-place decompression for x86-64 kernel decompression. It
uses a bound of 131072 + (uncompressed_size >> 8), which can be violated
after upstream commit 6a7ede3d [1], as zstd can use part of the output
buffer as temporary storage, and without this patch needs a bound of
~262144.
The fix is for zstd to detect that the input and output buffers overlap,
so that zstd knows it can't use the overlapping portion of the output
buffer as tempoary storage. If the margin is not large enough, this will
ensure that zstd will fail the decompression, rather than overwriting
part of the input data, and causing corruption.
This fix has been landed upstream and is in release v1.5.4. That commit
also adds unit and fuzz tests to verify that the margin we use is
respected, and correct. That means that the fix is well tested upstream.
I have not been able to reproduce the potential bug in x86-64 kernel
decompression locally, nor have I recieved reports of failures to
decompress the kernel. It is possible that compression saves enough
space to make it very hard for the issue to appear.
I've boot tested the zstd compressed kernel on x86-64 and i386 with this
patch, which uses in-place decompression, and sanity tested zstd compression
in btrfs / squashfs to make sure that we don't see any issues, but other
uses of zstd shouldn't be affected, because they don't use in-place
decompression.
Thanks to Vasily Gorbik <gor@linux.ibm.com> for debugging a related issue
on s390, which was triggered by the same commit, but was a bug in how
__decompress() was called [2]. And to Sasha Levin <sashal@kernel.org>
for the CC alerting me of the issue.
[0] 5b266196a4
[1] 6a7ede3dfc
[2] https://lore.kernel.org/r/patch-1.thread-41c676.git-41c676c2d153.your-ad-here.call-01675030179-ext-9637@work.hours
CC: Vasily Gorbik <gor@linux.ibm.com>
CC: Heiko Carstens <hca@linux.ibm.com>
CC: Sasha Levin <sashal@kernel.org>
CC: Yann Collet <cyan@fb.com>
Signed-off-by: Nick Terrell <terrelln@fb.com>
Fix the following -Wstringop-overflow warning when building with GCC 11+:
lib/zstd/decompress/huf_decompress.c: In function ‘HUF_readDTableX2_wksp’:
lib/zstd/decompress/huf_decompress.c:700:5: warning: ‘HUF_fillDTableX2.constprop’ accessing 624 bytes in a region of size 52 [-Wstringop-overflow=]
700 | HUF_fillDTableX2(dt, maxTableLog,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
701 | wksp->sortedSymbol, sizeOfSort,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
702 | wksp->rankStart0, wksp->rankVal, maxW,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
703 | tableLog+1,
| ~~~~~~~~~~~
704 | wksp->calleeWksp, sizeof(wksp->calleeWksp) / sizeof(U32));
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lib/zstd/decompress/huf_decompress.c:700:5: note: referencing argument 6 of type ‘U32 (*)[13]’ {aka ‘unsigned int (*)[13]’}
lib/zstd/decompress/huf_decompress.c:571:13: note: in a call to function ‘HUF_fillDTableX2.constprop’
571 | static void HUF_fillDTableX2(HUF_DEltX2* DTable, const U32 targetLog,
| ^~~~~~~~~~~~~~~~
by using pointer notation instead of array notation.
This is one of the last remaining warnings to be fixed before globally
enabling -Wstringop-overflow.
Co-developed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Nick Terrell <terrelln@fb.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nick Terrell <terrelln@fb.com>
When the sd driver revalidates host-managed SMR disks, it calls
disk_set_zoned() which changes the zone_write_granularity attribute value
to the logical block size regardless of the device type. After that, the sd
driver overwrites the value in sd_zbc_read_zone() with the physical block
size, since ZBC/ZAC requires this for host-managed disks. Between the calls
to disk_set_zoned() and sd_zbc_read_zone(), there exists a window where the
attribute shows the logical block size as the zone_write_granularity value,
which is wrong for host-managed disks. The duration of the window is from
20ms to 200ms, depending on report zone command execution time.
To avoid the wrong zone_write_granularity value between disk_set_zoned()
and sd_zbc_read_zone(), modify the value not in sd_zbc_read_zone() but
just after disk_set_zoned() call.
Fixes: a805a4fa4f ("block: introduce zone_write_granularity limit")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20230306063024.3376959-1-shinichiro.kawasaki@wdc.com
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Hyper-V uses a VHD or VHDX file on the host as the underlying storage for a
virtual disk. The VHD/VHDX file format is a sparse format where real disk
space on the host is assigned in chunks that the VHD/VHDX file format calls
the BlockSize. This BlockSize is not to be confused with the 512-byte (or
4096-byte) sector size of the underlying storage device. The default block
size for a new VHD/VHDX file is 32 Mbytes. When a guest VM touches any
disk space within a 32 Mbyte chunk of the VHD/VHDX file, Hyper-V allocates
32 Mbytes of real disk space for that section of the VHD/VHDX. Similarly,
if a discard operation is done that covers an entire 32 Mbyte chunk,
Hyper-V will free the real disk space for that portion of the VHD/VHDX.
This BlockSize is surfaced in Linux as the "discard_granularity" in
/sys/block/sd<x>/queue, which makes sense.
Hyper-V also has differencing disks that can overlay a VHD/VHDX file to
capture changes to the VHD/VHDX while preserving the original VHD/VHDX.
One example of this differencing functionality is for VM snapshots. When a
snapshot is created, a differencing disk is created. If the snapshot is
rolled back, Hyper-V can just delete the differencing disk, and the VM will
see the original disk contents at the time the snapshot was taken.
Differencing disks are used in other scenarios as well.
The BlockSize for a differencing disk defaults to 2 Mbytes, not 32 Mbytes.
The smaller default is used because changes to differencing disks are
typically scattered all over, and Hyper-V doesn't want to allocate 32
Mbytes of real disk space for a stray write here or there. The smaller
BlockSize provides more efficient use of real disk space.
When a differencing disk is added to a VHD/VHDX, Hyper-V reports
UNIT_ATTENTION with a sense code indicating "Operating parameters have
changed", because the value of discard_granularity should be changed to 2
Mbytes. When the differencing disk is removed, discard_granularity should
be changed back to 32 Mbytes. However, current code simply reports a
message from scsi_report_sense() and the value of
/sys/block/sd<x>/queue/discard_granularity is not updated. The message
isn't very actionable by a sysadmin.
Fix this by having the storvsc driver check for the sense code indicating
that the underly VHD/VHDX block size has changed, and do a rescan of the
device to pick up the new discard_granularity. With this change the entire
transition to/from differencing disks is handled automatically and
transparently, with no confusing messages being output.
Link: https://lore.kernel.org/r/1677516514-86060-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In kdump kernel mode, the driver works in reduced functionality mode with
some features disabled such as reduced MSI-X count and RDPQ disabled, etc.
However, the firmware is not aware of this mode in some cases, which
results in undefined behavior.
To address this, the driver informs the firmware about the kdump mode
through MPI capabilities bit during driver initialization. This allows
firmware to adjust its behavior accordingly.
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Link: https://lore.kernel.org/r/20230302105342.34933-3-chandrakanth.patil@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The firmware only supports Logical Disk IDs up to 240 and LD ID 255 (0xFF)
is reserved for deleted LDs. However, in some cases, firmware was assigning
LD ID 254 (0xFE) to deleted LDs and this was causing the driver to mark the
wrong disk as deleted. This in turn caused the wrong disk device to be
taken offline by the SCSI midlayer.
To address this issue, limit the LD ID range from 255 to 240. This ensures
the deleted LD ID is properly identified and removed by the driver without
accidently deleting any valid LDs.
Fixes: ae6874ba4b ("scsi: megaraid_sas: Early detection of VD deletion through RaidMap update")
Reported-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Link: https://lore.kernel.org/r/20230302105342.34933-2-chandrakanth.patil@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the SAS Transport Layer support is enabled and a device exposed to
the OS by the driver fails INQUIRY commands, the driver frees up the memory
allocated for an internal HBA port data structure. However, in some places,
the reference to the freed memory is not cleared. When the firmware sends
the Device Info change event for the same device again, the freed memory is
accessed and that leads to memory corruption and OS crash.
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Link: https://lore.kernel.org/r/20230228140835.4075-7-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The "dev_req_params" pointer points to inside the middle of a struct so it
can't be NULL. Removing this impossible condition is nice because now we
don't need to consider the correct error code for that situation.
Link: https://lore.kernel.org/r/Y/yA3niWUcGYgBU8@kili
Fixes: f06fcc7155 ("scsi: ufs-qcom: add QUniPro hardware support and power optimizations")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For uniquely identifying the vadc channels, label property has to be used.
The initial commit adding vadc support assumed that the driver will use the
unit address along with the node name to identify the channels. But this
assumption is now broken by,
commit 701c875ade ("iio: adc: qcom-spmi-adc5: Fix the channel name") that
stripped unit address from channel names. This results in probe failure of
the vadc driver:
[ 8.380370] iio iio:device0: tried to double register : in_temp_pmic-die-temp_input
[ 8.380383] qcom-spmi-adc5 c440000.spmi:pmic@0:adc@3100: Failed to register sysfs interfaces
[ 8.380386] qcom-spmi-adc5: probe of c440000.spmi:pmic@0:adc@3100 failed with error -16
Hence, let's get rid of the assumption about drivers and rely on label
property to uniquely identify the channels.
The labels are derived from the schematics for each PMIC. For internal adc
channels such as die and xo, the PMIC names are used as a prefix.
Fixes: 7c01513474 ("arm64: dts: qcom: sc8280xp-x13s: Add PM8280_{1/2} ADC_TM5 channels")
Fixes: 9d41cd1739 ("arm64: dts: qcom: sc8280xp-x13s: Add PMR735A VADC channel")
Fixes: 3375151a71 ("arm64: dts: qcom: sc8280xp-x13s: Add PM8280_{1/2} VADC channels")
Fixes: 9a6b3042c5 ("arm64: dts: qcom: sc8280xp-x13s: Add PMK8280 VADC channels")
Reported-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230211052415.14581-1-manivannan.sadhasivam@linaro.org
This is an already known issue that dm-thin volume cannot be used as
swap, otherwise a deadlock may happen when dm-thin internal memory
demand triggers swap I/O on the dm-thin volume itself.
But thanks to commit a666e5c05e ("dm: fix deadlock when swapping to
encrypted device"), the limit_swap_bios target flag can also be used
for dm-thin to avoid the recursive I/O when it is used as swap.
Fix is to simply set ti->limit_swap_bios to true in both pool_ctr()
and thin_ctr().
In my test, I create a dm-thin volume /dev/vg/swap and use it as swap
device. Then I run fio on another dm-thin volume /dev/vg/main and use
large --blocksize to trigger swap I/O onto /dev/vg/swap.
The following fio command line is used in my test,
fio --name recursive-swap-io --lockmem 1 --iodepth 128 \
--ioengine libaio --filename /dev/vg/main --rw randrw \
--blocksize 1M --numjobs 32 --time_based --runtime=12h
Without this fix, the whole system can be locked up within 15 seconds.
With this fix, there is no any deadlock or hung task observed after
2 hours of running fio.
Furthermore, if blocksize is changed from 1M to 128M, after around 30
seconds fio has no visible I/O, and the out-of-memory killer message
shows up in kernel message. After around 20 minutes all fio processes
are killed and the whole system is back to being alive.
This is exactly what is expected when recursive I/O happens on dm-thin
volume when it is used as swap.
Depends-on: a666e5c05e ("dm: fix deadlock when swapping to encrypted device")
Cc: stable@vger.kernel.org
Signed-off-by: Coly Li <colyli@suse.de>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Fix data corruption issue with SerDes connected PHYs operating at 1.25
Gbps speed where we could previously observe about 30% packet loss while
the bad packet counter was increasing.
As almost all boards with MediaTek MT7622 or MT7986 use either the MT7531
switch IC operating at 3.125Gbps SerDes rate or single-port PHYs using
rate-adaptation to 2500Base-X mode, this issue only got exposed now when
we started trying to use SFP modules operating with 1.25 Gbps with the
BananaPi R3 board.
The fix is to set bit 12 which disables the RX FIFO clear function when
setting up MAC MCR, MediaTek SDK did the same change stating:
"If without this patch, kernel might receive invalid packets that are
corrupted by GMAC."[1]
[1]: d8a2975939
Fixes: 42c03844e9 ("net-next: mediatek: add support for MediaTek MT7622 SoC")
Tested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/138da2735f92c8b6f8578ec2e5a794ee515b665f.1677937317.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently link up can't be detected in forced mode if polling
isn't used. Only link up interrupt source we have is aneg
complete which isn't applicable in forced mode. Therefore we
have to use energy-on as link up indicator.
Fixes: 7365494550 ("net: phy: smsc: skip ENERGYON interrupt if disabled")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge series from Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>:
Series of adjustments to machine board files. Use fixed format in boards
that were not using one. Fix clock handling.
It turns out that commit 596ff4a09b ("cpumask: re-introduce
constant-sized cpumask optimizations") exposed a number of cases of
drivers not checking the result of "cpumask_next()" and friends
correctly.
The documented correct check for "no more cpus in the cpumask" is to
check for the result being equal or larger than the number of possible
CPU ids, exactly _because_ we've always done those constant-sized
cpumask scans using a widened type before. So the return value of a
cpumask scan should be checked with
if (cpu >= nr_cpu_ids)
...
because the cpumask scan did not necessarily stop exactly *at* that
maximum CPU id.
But a few cases ended up instead using checks like
if (cpu == nr_cpumask_bits)
...
which used that internal "widened" number of bits. And that used to
work pretty much by accident (ok, in this case "by accident" is simply
because it matched the historical internal implementation of the cpumask
scanning, so it was more of a "intentionally using implementation
details rather than an accident").
But the extended constant-sized optimizations then did that internal
implementation differently, and now that code that did things wrong but
matched the old implementation no longer worked at all.
Which then causes subsequent odd problems due to using what ends up
being an invalid CPU ID.
Most of these cases require either unusual hardware or special uses to
hit, but the random.c one triggers quite easily.
All you really need is to have a sufficiently small CONFIG_NR_CPUS value
for the bit scanning optimization to be triggered, but not enough CPUs
to then actually fill that widened cpumask. At that point, the cpumask
scanning will return the NR_CPUS constant, which is _not_ the same as
nr_cpumask_bits.
This just does the mindless fix with
sed -i 's/== nr_cpumask_bits/>= nr_cpu_ids/'
to fix the incorrect uses.
The ones in the SCSI lpfc driver in particular could probably be fixed
more cleanly by just removing that repeated pattern entirely, but I am
not emptionally invested enough in that driver to care.
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/lkml/481b19b5-83a0-4793-b4fd-194ad7b978c3@roeck-us.net/
Reported-and-tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/lkml/CAMuHMdUKo_Sf7TjKzcNDa8Ve+6QrK+P8nSQrSQ=6LTRmcBKNww@mail.gmail.com/
Reported-by: Vernon Yang <vernon2gm@gmail.com>
Link: https://lore.kernel.org/lkml/20230306160651.2016767-1-vernon2gm@gmail.com/
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lorenz Bauer says:
====================
See the first patch for a detailed explanation.
v2:
- Move RESOLVE_TBD assignment out of the loop (Martin)
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
btf_datasec_resolve contains a bug that causes the following BTF
to fail loading:
[1] DATASEC a size=2 vlen=2
type_id=4 offset=0 size=1
type_id=7 offset=1 size=1
[2] INT (anon) size=1 bits_offset=0 nr_bits=8 encoding=(none)
[3] PTR (anon) type_id=2
[4] VAR a type_id=3 linkage=0
[5] INT (anon) size=1 bits_offset=0 nr_bits=8 encoding=(none)
[6] TYPEDEF td type_id=5
[7] VAR b type_id=6 linkage=0
This error message is printed during btf_check_all_types:
[1] DATASEC a size=2 vlen=2
type_id=7 offset=1 size=1 Invalid type
By tracing btf_*_resolve we can pinpoint the problem:
btf_datasec_resolve(depth: 1, type_id: 1, mode: RESOLVE_TBD) = 0
btf_var_resolve(depth: 2, type_id: 4, mode: RESOLVE_TBD) = 0
btf_ptr_resolve(depth: 3, type_id: 3, mode: RESOLVE_PTR) = 0
btf_var_resolve(depth: 2, type_id: 4, mode: RESOLVE_PTR) = 0
btf_datasec_resolve(depth: 1, type_id: 1, mode: RESOLVE_PTR) = -22
The last invocation of btf_datasec_resolve should invoke btf_var_resolve
by means of env_stack_push, instead it returns EINVAL. The reason is that
env_stack_push is never executed for the second VAR.
if (!env_type_is_resolve_sink(env, var_type) &&
!env_type_is_resolved(env, var_type_id)) {
env_stack_set_next_member(env, i + 1);
return env_stack_push(env, var_type, var_type_id);
}
env_type_is_resolve_sink() changes its behaviour based on resolve_mode.
For RESOLVE_PTR, we can simplify the if condition to the following:
(btf_type_is_modifier() || btf_type_is_ptr) && !env_type_is_resolved()
Since we're dealing with a VAR the clause evaluates to false. This is
not sufficient to trigger the bug however. The log output and EINVAL
are only generated if btf_type_id_size() fails.
if (!btf_type_id_size(btf, &type_id, &type_size)) {
btf_verifier_log_vsi(env, v->t, vsi, "Invalid type");
return -EINVAL;
}
Most types are sized, so for example a VAR referring to an INT is not a
problem. The bug is only triggered if a VAR points at a modifier. Since
we skipped btf_var_resolve that modifier was also never resolved, which
means that btf_resolved_type_id returns 0 aka VOID for the modifier.
This in turn causes btf_type_id_size to return NULL, triggering EINVAL.
To summarise, the following conditions are necessary:
- VAR pointing at PTR, STRUCT, UNION or ARRAY
- Followed by a VAR pointing at TYPEDEF, VOLATILE, CONST, RESTRICT or
TYPE_TAG
The fix is to reset resolve_mode to RESOLVE_TBD before attempting to
resolve a VAR from a DATASEC.
Fixes: 1dc9285184 ("bpf: kernel side support for BTF Var and DataSec")
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/r/20230306112138.155352-2-lmb@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
We already mark fwnodes as initialized when they are registered as clock
providers. We do this so that fw_devlink can tell when a clock driver
doesn't use the driver core framework to probe/initialize its device.
This ensures fw_devlink doesn't block the consumers of such a clock
provider indefinitely.
However, some users of CLK_OF_DECLARE() macros don't use the same node
that matches the macro as the node for the clock provider, but they
initialize the entire node. To cover these cases, also mark the nodes
that match the macros as initialized when the init callback function is
called.
An example of this is "stericsson,u8500-clks" that's handled using
CLK_OF_DECLARE() and looks something like this:
clocks {
compatible = "stericsson,u8500-clks";
prcmu_clk: prcmu-clock {
#clock-cells = <1>;
};
prcc_pclk: prcc-periph-clock {
#clock-cells = <2>;
};
prcc_kclk: prcc-kernel-clock {
#clock-cells = <2>;
};
prcc_reset: prcc-reset-controller {
#reset-cells = <2>;
};
...
};
This patch makes sure that "clocks" is marked as initialized so that
fw_devlink knows that all nodes under it have been initialized. If the
driver creates struct devices for some of the subnodes, fw_devlink is
smart enough to know to wait for those devices to probe, so no special
handling is required for those cases.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reported-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/lkml/CACRpkdamxDX6EBVjKX5=D3rkHp17f5pwGdBVhzFU90-0MHY6dQ@mail.gmail.com/
Fixes: 4a032827da ("of: property: Simplify of_link_to_phandle()")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20230302014639.297514-1-saravanak@google.com
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
&xdp_buff and &xdp_frame are bound in a way that
xdp_buff->data_hard_start == xdp_frame
It's always the case and e.g. xdp_convert_buff_to_frame() relies on
this.
IOW, the following:
for (u32 i = 0; i < 0xdead; i++) {
xdpf = xdp_convert_buff_to_frame(&xdp);
xdp_convert_frame_to_buff(xdpf, &xdp);
}
shouldn't ever modify @xdpf's contents or the pointer itself.
However, "live packet" code wrongly treats &xdp_frame as part of its
context placed *before* the data_hard_start. With such flow,
data_hard_start is sizeof(*xdpf) off to the right and no longer points
to the XDP frame.
Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several
places and praying that there are no more miscalcs left somewhere in the
code, unionize ::frm with ::data in a flex array, so that both starts
pointing to the actual data_hard_start and the XDP frame actually starts
being a part of it, i.e. a part of the headroom, not the context.
A nice side effect is that the maximum frame size for this mode gets
increased by 40 bytes, as xdp_buff::frame_sz includes everything from
data_hard_start (-> includes xdpf already) to the end of XDP/skb shared
info.
Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it
hardcoded for 64 bit && 4k pages, it can be made more flexible later on.
Minor: align `&head->data` with how `head->frm` is assigned for
consistency.
Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for
clarity.
(was found while testing XDP traffic generator on ice, which calls
xdp_convert_frame_to_buff() for each XDP frame)
Fixes: b530e9e106 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230224163607.2994755-1-aleksander.lobakin@intel.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
At btrfs_drop_extent_map_range() we are clearing the EXTENT_FLAG_LOGGING
bit on a 'flags' variable that was not initialized. This makes static
checkers complain about it, so initialize the 'flags' variable before
clearing the bit.
In practice this has no consequences, because EXTENT_FLAG_LOGGING should
not be set when btrfs_drop_extent_map_range() is called, as an fsync locks
the inode in exclusive mode, locks the inode's mmap semaphore in exclusive
mode too and it always flushes all delalloc.
Also add a comment about why we clear EXTENT_FLAG_LOGGING on a copy of the
flags of the split extent map.
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/Y%2FyipSVozUDEZKow@kili/
Fixes: db21370bff ("btrfs: drop extent map range more efficiently")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have a report, that the info message for block-group reclaim is
crossing the 100% used mark.
This is happening as we were truncating the divisor for the division
(the block_group->length) to a 32bit value.
Fix this by using div64_u64() to not truncate the divisor.
In the worst case, it can lead to a div by zero error and should be
possible to trigger on 4 disks RAID0, and each device is large enough:
$ mkfs.btrfs -f /dev/test/scratch[1234] -m raid1 -d raid0
btrfs-progs v6.1
[...]
Filesystem size: 40.00GiB
Block group profiles:
Data: RAID0 4.00GiB <<<
Metadata: RAID1 256.00MiB
System: RAID1 8.00MiB
Reported-by: Forza <forza@tnonline.net>
Link: https://lore.kernel.org/linux-btrfs/e99483.c11a58d.1863591ca52@tnonline.net/
Fixes: 5f93e776c6 ("btrfs: zoned: print unusable percentage when reclaiming block groups")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add Qu's note ]
Signed-off-by: David Sterba <dsterba@suse.com>
Current btrfs_log_dev_io_error() increases the read error count even if the
erroneous IO is a WRITE request. This is because it forget to use "else
if", and all the error WRITE requests counts as READ error as there is (of
course) no REQ_RAHEAD bit set.
Fixes: c3a62baf21 ("btrfs: use chained bios when cloning")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Even if the slot is already read out, we may still need to re-balance
the tree, thus it can cause error in that btrfs_del_item() call and we
need to handle it properly.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: void0red <void0red@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently user space utilizes dev info ioctl to grab the info of a
certain devid, this includes its device uuid. But the returned info is
not enough to determine if a device is a seed.
Commit a26d60dedf ("btrfs: sysfs: add devinfo/fsid to retrieve actual
fsid from the device") exports the same value in sysfs so this is for
parity with ioctl. Add a new member, fsid, into
btrfs_ioctl_dev_info_args, and populate the member with fsid value.
This should not cause any compatibility problem, following the
combinations:
- Old user space, old kernel
- Old user space, new kernel
User space tool won't even check the new member.
- New user space, old kernel
The kernel won't touch the new member, and user space tool should
zero out its argument, thus the new member is all zero.
User space tool can then know the kernel doesn't support this fsid
reporting, and falls back to whatever they can.
- New user space, new kernel
Go as planned.
Would find the fsid member is no longer zero, and trust its value.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To pick the changes from:
8415a74852 ("x86/cpu, kvm: Add support for CPUID_80000021_EAX")
This only causes these perf files to be rebuilt:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses these perf build warnings:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/required-features.h' differs from latest version at 'arch/x86/include/asm/required-features.h'
diff -u tools/arch/x86/include/asm/required-features.h arch/x86/include/asm/required-features.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZAYlS2XTJ5hRtss7@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The link for patch submission information in general refers to index
page for "Working with the kernel development community" section of
kernel docs, whereas the link should have been
Documentation/process/submitting-patches.rst instead.
Fix it by replacing the index target with the appropriate doc.
Fixes: 5422283848 ("bpf, doc: convert bpf_devel_QA.rst to use RST formatting")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230228074523.11493-3-bagasdotme@gmail.com
The question on how to run BPF selftests have a reference link to kernel
selftest documentation (Documentation/dev-tools/kselftest.rst). However,
it uses external link to the documentation at kernel.org/docs (aka
docs.kernel.org) instead, which requires Internet access.
Fix this and replace the link with internal linking, by using :doc: directive
while keeping the anchor text.
Fixes: b7a27c3aaf ("bpf, doc: howto use/run the BPF selftests")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230228074523.11493-2-bagasdotme@gmail.com
Now that address space operations are merge dfor in-ICB and normal
files, it is more likely some code mistakenly tries to map blocks for
in-ICB files. WARN and return error instead of silently returning
garbage.
Signed-off-by: Jan Kara <jack@suse.cz>
After merging address space operations of normal and in-ICB files,
readahead could get called for in-ICB files which resulted in
udf_get_block() being called for these files. udf_get_block() is not
prepared to be called for in-ICB files and ends up returning garbage
results as it interprets file data as extent list. Fix the problem by
skipping readahead for in-ICB files.
Fixes: 37a8a39f7a ("udf: Switch to single address_space_operations")
Signed-off-by: Jan Kara <jack@suse.cz>
The patch converting udf_adinicb_writepage() to avoid manually kmapping
the page used memcpy_to_page() however that copies in the wrong
direction (effectively overwriting file data with the old contents).
What we should be using is memcpy_from_page() to copy data from the page
into the inode and then mark inode dirty to store the data.
Fixes: 5cfc45321a ("udf: Convert udf_adinicb_writepage() to memcpy_to_page()")
Signed-off-by: Jan Kara <jack@suse.cz>
This structure must be zeroed, because it's field 'hw->core' is used as
'parent' in 'clk_core_fill_parent_index()', but it will be uninitialized.
This happens, because when this struct is not zeroed, pointer 'hw' is
"initialized" by garbage, which is valid pointer, but points to some
garbage. So 'hw' will be dereferenced, but 'core' contains some random
data which will be interpreted as a pointer. The following backtrace is
result of dereference of such pointer:
[ 1.081319] __clk_register+0x414/0x820
[ 1.085113] devm_clk_register+0x64/0xd0
[ 1.088995] meson_nfc_probe+0x258/0x6ec
[ 1.092875] platform_probe+0x70/0xf0
[ 1.096498] really_probe+0xc8/0x3e0
[ 1.100034] __driver_probe_device+0x84/0x190
[ 1.104346] driver_probe_device+0x44/0x120
[ 1.108487] __driver_attach+0xb4/0x220
[ 1.112282] bus_for_each_dev+0x78/0xd0
[ 1.116077] driver_attach+0x2c/0x40
[ 1.119613] bus_add_driver+0x184/0x240
[ 1.123408] driver_register+0x80/0x140
[ 1.127203] __platform_driver_register+0x30/0x40
[ 1.131860] meson_nfc_driver_init+0x24/0x30
Fixes: 1e4d3ba668 ("mtd: rawnand: meson: fix the clock")
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230227102425.793841-1-AVKrasnov@sberdevices.ru
relid2channel() assumes vmbus channel array to be allocated when called.
However, in cases such as kdump/kexec, not all relids will be reset by the host.
When the second kernel boots and if the guest receives a vmbus interrupt during
vmbus driver initialization before vmbus_connect() is called, before it finishes,
or if it fails, the vmbus interrupt service routine is called which in turn calls
relid2channel() and can cause a null pointer dereference.
Print a warning and error out in relid2channel() for a channel id that's invalid
in the second kernel.
Fixes: 8b6a877c06 ("Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels")
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/r/20230217204411.212709-1-mgamal@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
__copy_to_user_memcpy() and __clear_user_memset() had been calling
memcpy() and memset() respectively, leading to false-positive KASAN
reports when starting userspace:
[ 10.707901] Run /init as init process
[ 10.731892] process '/bin/busybox' started with executable stack
[ 10.745234] ==================================================================
[ 10.745796] BUG: KASAN: user-memory-access in __clear_user_memset+0x258/0x3ac
[ 10.747260] Write of size 2687 at addr 000de581 by task init/1
Use __memcpy() and __memset() instead to allow userspace access, which
is of course the intent of these functions.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
This is a struct with a trailing zero-length array of icc_node pointers
but it's allocated as if it were a single array of icc_nodes instead.
Fortunately this overallocates memory rather then allocating less memory
than required.
Fix by replacing devm_kcalloc() with devm_kzalloc() and struct_size()
macro.
Fixes: 5bc9900add ("interconnect: qcom: Add OSM L3 interconnect provider support")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20230105002221.1416479-2-dmitry.baryshkov@linaro.org
Signed-off-by: Georgi Djakov <djakov@kernel.org>
Since commit 502df79b86 ("gpiolib: Warn on drivers still using static
gpiobase allocation"), one or more warnings are printed during boot on
systems where static allocation of GPIO base is used:
[ 0.197707] gpio gpiochip0: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.199942] stm32f429-pinctrl soc:pinctrl@40020000: GPIOA bank added
[ 0.200711] gpio gpiochip1: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.202855] stm32f429-pinctrl soc:pinctrl@40020000: GPIOB bank added
[ 0.203591] gpio gpiochip2: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.205704] stm32f429-pinctrl soc:pinctrl@40020000: GPIOC bank added
[ 0.206338] gpio gpiochip3: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.208448] stm32f429-pinctrl soc:pinctrl@40020000: GPIOD bank added
[ 0.209182] gpio gpiochip4: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.211282] stm32f429-pinctrl soc:pinctrl@40020000: GPIOE bank added
[ 0.212094] gpio gpiochip5: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.214270] stm32f429-pinctrl soc:pinctrl@40020000: GPIOF bank added
[ 0.215005] gpio gpiochip6: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.217110] stm32f429-pinctrl soc:pinctrl@40020000: GPIOG bank added
[ 0.217845] gpio gpiochip7: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.219959] stm32f429-pinctrl soc:pinctrl@40020000: GPIOH bank added
[ 0.220602] gpio gpiochip8: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.222714] stm32f429-pinctrl soc:pinctrl@40020000: GPIOI bank added
[ 0.223483] gpio gpiochip9: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.225594] stm32f429-pinctrl soc:pinctrl@40020000: GPIOJ bank added
[ 0.226336] gpio gpiochip10: Static allocation of GPIO base is deprecated, use dynamic allocation.
[ 0.228490] stm32f429-pinctrl soc:pinctrl@40020000: GPIOK bank added
So let's follow the suggestion and use dynamic allocation.
Tested on STM32F429I-DISC1 board.
Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Link: https://lore.kernel.org/r/20230227205131.2104082-1-dario.binacchi@amarulasolutions.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
In case the driver was trying to set an alternate mode for gpio
0 or 32 then the mode was not set correctly. The reason is that
there is computation error inside the function ocelot_pinmux_set_mux
because in this case it was trying to shift to left by -1.
Fix this by actually shifting the function bits and not the position.
Fixes: 4b36082e2e ("pinctrl: ocelot: fix pinmuxing for pins after 31")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20230206203720.1177718-1-horatiu.vultur@microchip.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
__get_kernel_nofault() does copy data in supervisor mode when
forcing a task backtrace log through /proc/sysrq_trigger.
This is expected cause a bus error exception on e.g. NULL
pointer dereferencing when logging a kernel task has no
workqueue associated. This bus error ought to be ignored.
Our 030 bus error handler is ill equipped to deal with this:
Whenever ssw indicates a kernel mode access on a data fault,
we don't even attempt to handle the fault and instead always
send a SEGV signal (or panic). As a result, the check
for exception handling at the fault PC (buried in
send_sig_fault() which gets called from do_page_fault()
eventually) is never used.
In contrast, both 040 and 060 access error handlers do not
care whether a fault happened on supervisor mode access,
and will call do_page_fault() on those, ultimately honoring
the exception table.
Add a check in bus_error030 to call do_page_fault() in case
we do have an entry for the fault PC in our exception table.
I had attempted a fix for this earlier in 2019 that did rely
on testing pagefault_disabled() (see link below) to achieve
the same thing, but this patch should be more generic.
Tested on 030 Atari Falcon.
Reported-by: Eero Tamminen <oak@helsinkinet.fi>
Link: https://lore.kernel.org/r/alpine.LNX.2.21.1904091023540.25@nippy.intranet
Link: https://lore.kernel.org/r/63130691-1984-c423-c1f2-73bfd8d3dcd3@gmail.com
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/20230301021107.26307-1-schmitzmic@gmail.com
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
The xtables packet traverser performs an unconditional local_bh_disable(),
but the nf_tables evaluation loop does not.
Functions that are called from either xtables or nftables must assume
that they can be called in process context.
inet_twsk_deschedule_put() assumes that no softirq interrupt can occur.
If tproxy is used from nf_tables its possible that we'll deadlock
trying to aquire a lock already held in process context.
Add a small helper that takes care of this and use it.
Link: https://lore.kernel.org/netfilter-devel/401bd6ed-314a-a196-1cdc-e13c720cc8f2@balasys.hu/
Fixes: 4ed8eb6570 ("netfilter: nf_tables: Add native tproxy support")
Reported-and-tested-by: Major Dávid <major.david@balasys.hu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It seems that change was unintentional, we have userspace code that
needs the mark while listening for events like REPLY, DESTROY, etc.
Also include 0-marks in requested dumps, as they were before that fix.
Fixes: 1feeae0715 ("netfilter: ctnetlink: fix compilation warning after data race fixes in ct mark")
Signed-off-by: Ivan Delalande <colona@arista.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
REGMAP is a hidden (not user visible) symbol. Users cannot set it
directly thru "make *config", so drivers should select it instead of
depending on it if they need it.
Consistently using "select" or "depends on" can also help reduce
Kconfig circular dependency issues.
Therefore, change the use of "depends on REGMAP" to "select REGMAP".
Fixes: ebe363197e ("gpio: add a reusable generic gpio_chip using regmap")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Michael Walle <michael@walle.cc>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Bartosz Golaszewski <brgl@bgdev.pl>
Cc: linux-gpio@vger.kernel.org
Acked-by: Michael Walle <michael@walle.cc>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Add QUIRK_NO_CLX to disable the CLx state for hardware which
doesn't supports it.
AMD Yellow Carp and Pink Sardine don't support CLx state,
hence disabling it using QUIRK_NO_CLX.
Cc: stable@vger.kernel.org
Signed-off-by: Sanjay R Mehta <sanju.mehta@amd.com>
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
[mw: added debug log when the quirk is run]
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
The driver needs to keep track of all the possible concurrent TPA (GRO/LRO)
completions on the aggregation ring. On P5 chips, the maximum number
of concurrent TPA is 256 and the amount of memory we allocate is order-5
on systems using 4K pages. Memory allocation failure has been reported:
NetworkManager: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0-1
CPU: 15 PID: 2995 Comm: NetworkManager Kdump: loaded Not tainted 5.10.156 #1
Hardware name: Dell Inc. PowerEdge R660/0M1CC5, BIOS 0.2.25 08/12/2022
Call Trace:
dump_stack+0x57/0x6e
warn_alloc.cold.120+0x7b/0xdd
? _cond_resched+0x15/0x30
? __alloc_pages_direct_compact+0x15f/0x170
__alloc_pages_slowpath.constprop.108+0xc58/0xc70
__alloc_pages_nodemask+0x2d0/0x300
kmalloc_order+0x24/0xe0
kmalloc_order_trace+0x19/0x80
bnxt_alloc_mem+0x1150/0x15c0 [bnxt_en]
? bnxt_get_func_stat_ctxs+0x13/0x60 [bnxt_en]
__bnxt_open_nic+0x12e/0x780 [bnxt_en]
bnxt_open+0x10b/0x240 [bnxt_en]
__dev_open+0xe9/0x180
__dev_change_flags+0x1af/0x220
dev_change_flags+0x21/0x60
do_setlink+0x35c/0x1100
Instead of allocating this big chunk of memory and dividing it up for the
concurrent TPA instances, allocate each small chunk separately for each
TPA instance. This will reduce it to order-0 allocations.
Fixes: 79632e9ba3 ("bnxt_en: Expand bnxt_tpa_info struct to support 57500 chips.")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When SCMI raw coexistence mode is enabled make the core stack probe
successfully even when the initial base protocol exchanges with the
platform/server failed.
This behaviour enables the system to boot with a broken regular SCMI
stack but with a fully functional and accessible SCMI raw debugfs
interface that can be used to further debug the issue.
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20230223152330.2707260-1-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
When MAC is not support PMT, driver will check PHY's WoL capability
and set device wakeup capability in stmmac_init_phy(). We can enable
the WoL through ethtool, the driver would enable the device wake up
flag. Now the device_may_wakeup() return true.
But if there is a way which enable the PHY's WoL capability derectly,
like in BIOS. The driver would not know the enable thing and would not
set the device wake up flag. The phy_suspend may failed like this:
[ 32.409063] PM: dpm_run_callback(): mdio_bus_phy_suspend+0x0/0x50 returns -16
[ 32.409065] PM: Device stmmac-1:00 failed to suspend: error -16
[ 32.409067] PM: Some devices failed to suspend, or early wake event detected
Add to set the device wakeup enable flag according to the get_wol
function result in PHY can fix the error in this scene.
v2: add a Fixes tag.
Fixes: 1d8e5b0f3f ("net: stmmac: Support WOL with phy")
Signed-off-by: Rongguang Wei <weirongguang@kylinos.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
With commit b203e6f1e8 ("arm64: dts: ls1028a: sl28: get MAC addresses
from VPD"), the network adapter now depends on the nvmem device to be
present, which isn't the case and thus breaks networking on this board.
Revert it.
Fixes: b203e6f1e8 ("arm64: dts: ls1028a: sl28: get MAC addresses from VPD")
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The correct clock order is "fspi_en" and "fspi". As they are identical
just reordering the names is sufficient.
Fixes: 6276d66984 ("arm64: dts: imx8dxl: add flexspi0 support")
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
When send SMB_COM_NT_CANCEL and RFC1002_SESSION_REQUEST, the
in_send statistic was lost.
Let's move the in_send statistic to the send function to avoid
this scenario.
Fixes: 7ee1af765d ("[CIFS]")
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When we run syzkaller we get below Out of Bound.
"KASAN: slab-out-of-bounds Read in regcache_flat_read"
Below is the backtrace of the issue:
dump_backtrace+0x0/0x4c8
show_stack+0x34/0x44
dump_stack_lvl+0xd8/0x118
print_address_description+0x30/0x2d8
kasan_report+0x158/0x198
__asan_report_load4_noabort+0x44/0x50
regcache_flat_read+0x10c/0x110
regcache_read+0xf4/0x180
_regmap_read+0xc4/0x278
_regmap_update_bits+0x130/0x290
regmap_update_bits_base+0xc0/0x15c
snd_soc_component_update_bits+0xa8/0x22c
snd_soc_component_write_field+0x68/0xd4
tx_macro_digital_mute+0xec/0x140
Actually There is no need to have decimator with 32 bits.
By limiting the variable with short type u8 issue is resolved.
Signed-off-by: Ravulapati Vishnu Vardhan Rao <quic_visr@quicinc.com>
Link: https://lore.kernel.org/r/20230304080702.609-1-quic_visr@quicinc.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Both SND_SOC_IMX_SGTL5000 and SND_SOC_FSL_ASOC_CARD implement the
fsl,imx-audio-sgtl5000 compatible string, which is confusing. It took a
little research to find out that the latter is much newer and it is
supposed to be the preferred choice since several years.
Add a clarification note to avoid wasting time for future readers.
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Link: https://lore.kernel.org/r/20230303093410.357621-1-luca.ceresoli@bootlin.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The recent writeback corruption fixes changed the code in
xfs_discard_folio() to calculate a byte range to for punching
delalloc extents. A mistake was made in using round_up(pos) for the
end offset, because when pos points at the first byte of a block, it
does not get rounded up to point to the end byte of the block. hence
the punch range is short, and this leads to unexpected behaviour in
certain cases in xfs_bmap_punch_delalloc_range.
e.g. pos = 0 means we call xfs_bmap_punch_delalloc_range(0,0), so
there is no previous extent and it rounds up the punch to the end of
the delalloc extent it found at offset 0, not the end of the range
given to xfs_bmap_punch_delalloc_range().
Fix this by handling the zero block offset case correctly.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217030
Link: https://lore.kernel.org/linux-xfs/Y+vOfaxIWX1c%2Fyy9@bfoster/
Fixes: 7348b32233 ("xfs: xfs_bmap_punch_delalloc_range() should take a byte range")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Found-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
The background inode inactivation can attached dquots to inodes, but
this can race with a foreground quotacheck failure that leads to
disabling quotas and freeing the mp->m_quotainfo structure. The
background inode inactivation then tries to allocate a quota, tries
to dereference mp->m_quotainfo, and crashes like so:
XFS (loop1): Quotacheck: Unsuccessful (Error -5): Disabling quotas.
xfs filesystem being mounted at /root/syzkaller.qCVHXV/0/file0 supports timestamps until 2038 (0x7fffffff)
BUG: kernel NULL pointer dereference, address: 00000000000002a8
....
CPU: 0 PID: 161 Comm: kworker/0:4 Not tainted 6.2.0-c9c3395d5e3d #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: xfs-inodegc/loop1 xfs_inodegc_worker
RIP: 0010:xfs_dquot_alloc+0x95/0x1e0
....
Call Trace:
<TASK>
xfs_qm_dqread+0x46/0x440
xfs_qm_dqget_inode+0x154/0x500
xfs_qm_dqattach_one+0x142/0x3c0
xfs_qm_dqattach_locked+0x14a/0x170
xfs_qm_dqattach+0x52/0x80
xfs_inactive+0x186/0x340
xfs_inodegc_worker+0xd3/0x430
process_one_work+0x3b1/0x960
worker_thread+0x52/0x660
kthread+0x161/0x1a0
ret_from_fork+0x29/0x50
</TASK>
....
Prevent this race by flushing all the queued background inode
inactivations pending before purging all the cached dquots when
quotacheck fails.
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
During the processing of the bgt, if the sync_erase() return -EBUSY
or some other error code in __erase_worker(),schedule_erase() called
again lead to the down_read(ubi->work_sem) hold twice and may get
block by down_write(ubi->work_sem) in ubi_update_fastmap(),
which cause deadlock.
ubi bgt other task
do_work
down_read(&ubi->work_sem) ubi_update_fastmap
erase_worker # Blocked by down_read
__erase_worker down_write(&ubi->work_sem)
schedule_erase
schedule_ubi_work
down_read(&ubi->work_sem)
Fix this by changing input parameter @nested of the schedule_erase() to
'true' to avoid recursively acquiring the down_read(&ubi->work_sem).
Also, fix the incorrect comment about @nested parameter of the
schedule_erase() because when down_write(ubi->work_sem) is held, the
@nested is also need be true.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217093
Fixes: 2e8f08deab ("ubi: Fix races around ubi_refill_pools()")
Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
To pick up the changes in:
e7862eda30 ("x86/cpu: Support AMD Automatic IBRS")
0125acda7d ("x86/bugs: Reset speculation control settings on init")
38aaf921e9 ("perf/x86: Add Meteor Lake support")
5b6fac3fa4 ("x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation")
dc2a3e8579 ("x86/resctrl: Add interface to read mbm_total_bytes_config")
Addressing these tools/perf build warnings:
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
That makes the beautification scripts to pick some new entries:
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
$ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
$ diff -u before after
--- before 2023-03-03 18:26:51.766923522 -0300
+++ after 2023-03-03 18:27:09.987415481 -0300
@@ -267,9 +267,11 @@
[0xc000010e - x86_64_specific_MSRs_offset] = "AMD64_LBR_SELECT",
[0xc000010f - x86_64_specific_MSRs_offset] = "AMD_DBG_EXTN_CFG",
[0xc0000200 - x86_64_specific_MSRs_offset] = "IA32_MBA_BW_BASE",
+ [0xc0000280 - x86_64_specific_MSRs_offset] = "IA32_SMBA_BW_BASE",
[0xc0000300 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS",
[0xc0000301 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_CTL",
[0xc0000302 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS_CLR",
+ [0xc0000400 - x86_64_specific_MSRs_offset] = "IA32_EVT_CFG_BASE",
};
#define x86_AMD_V_KVM_MSRs_offset 0xc0010000
$
Now one can trace systemwide asking to see backtraces to where that MSR
is being read/written, see this example with a previous update:
# perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
^C#
If we use -v (verbose mode) we can see what it does behind the scenes:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
Using CPUID AuthenticAMD-25-21-0
0x6a0
0x6a8
New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
0x6a0
0x6a8
New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
mmap size 528384B
^C#
Example with a frequent msr:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
Using CPUID AuthenticAMD-25-21-0
0x48
New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
0x48
New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
mmap size 528384B
Looking at the vmlinux_path (8 entries long)
symsrc__init: build id mismatch for vmlinux.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule ([kernel.kallsyms])
futex_wait_queue_me ([kernel.kallsyms])
futex_wait ([kernel.kallsyms])
do_futex ([kernel.kallsyms])
__x64_sys_futex ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
__futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule_idle ([kernel.kallsyms])
do_idle ([kernel.kallsyms])
cpu_startup_entry ([kernel.kallsyms])
secondary_startup_64_no_verify ([kernel.kallsyms])
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Babu Moger <babu.moger@amd.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Breno Leitao <leitao@debian.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nikunj A Dadhania <nikunj@amd.com>
Link: https://lore.kernel.org/lkml/ZAJoaZ41+rU5H0vL@kernel.org
[ I had published the perf-tools branch before with the sync with ]
[ 8c29f01654 ("x86/sev: Add SEV-SNP guest feature negotiation support") ]
[ I removed it from this new sync ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes in:
89b0e7de34 ("KVM: arm64: nv: Introduce nested virtualization VCPU feature")
14329b825f ("KVM: x86/pmu: Introduce masked events to the pmu event filter")
6213b701a9 ("KVM: x86: Replace 0-length arrays with flexible arrays")
3fd49805d1 ("KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg")
14329b825f ("KVM: x86/pmu: Introduce masked events to the pmu event filter")
That don't change functionality in tools/perf, as no new ioctl is added
for the 'perf trace' scripts to harvest.
This addresses these perf build warnings:
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kees Kook <keescook@chromium.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/lkml/ZAJlg7%2FfWDVGX0F3@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes in:
6fd7353829 ("mm/memfd: add F_SEAL_EXEC")
That doesn't add or change any perf tools functionality, only addresses
these build warnings:
Warning: Kernel ABI header at 'tools/include/uapi/linux/fcntl.h' differs from latest version at 'include/uapi/linux/fcntl.h'
diff -u tools/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes in this cset:
cbdb1f163a ("vdso/bits.h: Add BIT_ULL() for the sake of consistency")
That just causes perf to rebuild, the macro included doesn't clash with
anything in tools/{perf,objtool,bpf}.
This addresses this perf build warning:
Warning: Kernel ABI header at 'tools/include/linux/bits.h' differs from latest version at 'include/linux/bits.h'
diff -u tools/include/linux/bits.h include/linux/bits.h
Warning: Kernel ABI header at 'tools/include/vdso/bits.h' differs from latest version at 'include/vdso/bits.h'
diff -u tools/include/vdso/bits.h include/vdso/bits.h
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We also continue with SYM_TYPED_FUNC_START() in util/include/linux/linkage.h
and with an exception in tools/perf/check_headers.sh's diff check to ignore
the include cfi_types.h line when checking if the kernel original files drifted
from the copies we carry.
This is to get the changes from:
69d4c0d321 ("entry, kasan, x86: Disallow overriding mem*() functions")
That addresses these perf tools build warning:
Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S'
diff -u tools/arch/x86/lib/memcpy_64.S arch/x86/lib/memcpy_64.S
Warning: Kernel ABI header at 'tools/arch/x86/lib/memset_64.S' differs from latest version at 'arch/x86/lib/memset_64.S'
diff -u tools/arch/x86/lib/memset_64.S arch/x86/lib/memset_64.S
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/ZAH%2FjsioJXGIOrkf@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When user space updates the trip point there is a deadlock, which results
in caller gets blocked forever.
Commit 05eeee2b51 ("thermal/core: Protect sysfs accesses to thermal
operations with thermal zone mutex"), added a mutex for tz->lock in the
function trip_point_temp_store(). Hence, trip set callback() can't
call any thermal zone API as they are protected with the same mutex lock.
The callback here calling thermal_zone_device_enable(), which will result
in deadlock.
Move the thermal_zone_device_enable() to proc_thermal_pci_probe() to
avoid this deadlock.
Fixes: 05eeee2b51 ("thermal/core: Protect sysfs accesses to thermal operations with thermal zone mutex")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@intel.com>
Cc: 6.2+ <stable@vger.kernel.org> # 6.2+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When a reset notify IPC message is received, the ISR schedules a work
function and passes the ISHTP device to it via a global pointer
ishtp_dev. If ish_probe() fails, the devm-managed device resources
including ishtp_dev are freed, but the work is not cancelled, causing a
use-after-free when the work function tries to access ishtp_dev. Use
devm_work_autocancel() instead, so that the work is automatically
cancelled if probe fails.
Signed-off-by: Reka Norman <rekanorman@chromium.org>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Simon Horman says:
====================
nfp: fix incorrect IPsec checksum handling
this short series resolves two problems with IPsec checksum handling
in the nfp driver.
* PATCH 1/3, 2/3: Correct setting of checksum flags.
One patch for each of the nfd3 and nfdk datapaths.
* Patch 3/3: Correct configuration of NETIF_F_CSUM_MASK
so that the stack does not unecessarily calculate csums for
IPsec offload packets.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When esp-tx-csum-offload is set to on, the protocol stack shouldn't
calculate the IPsec offload packet's csum, but it does. Because the
callback `.ndo_features_check` incorrectly masked NETIF_F_CSUM_MASK bit.
Fixes: 57f273adbc ("nfp: add framework to support ipsec offloading")
Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The csum flag of IPsec packet are set repeatedly. Therefore, the csum
flag set of IPsec and non-IPsec packet need to be distinguished.
As the ipv6 header does not have a csum field, so l3-csum flag is not
required to be set for ipv6 case.
Fixes: 436396f26d ("nfp: support IPsec offloading for NFP3800")
Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The csum flag of IPsec packet are set repeatedly. Therefore, the csum
flag set of IPsec and non-IPsec packet need to be distinguished.
As the ipv6 header does not have a csum field, so l3-csum flag is not
required to be set for ipv6 case.
L4-csum flag include the tcp csum flag and udp csum flag, we shouldn't
set the udp and tcp csum flag at the same time for one packet, should
set l4-csum flag according to the transport layer is tcp or udp.
Fixes: 57f273adbc ("nfp: add framework to support ipsec offloading")
Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
tools: ynl: fix subset use and change default value for attrs/ops
Fix a problem in subsetting, which will become apparent when
the devlink family comes after the merge window. Even tho none
of the existing families need this, we don't want someone to
get "inspired" by the current, incorrect code when using specs
in other languages.
Change the default value for the first attr/op. This is a slight
behavior change so needs to go in now. The diffstat of the last
patch should serve as the clearest justification there..
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that the codegen rules had been changed we can update
the specs to reflect the new default.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pretty much all families use value: 1 or reserve as unspec
the first entry in attribute set and the first operation.
Make this the default. Update documentation (the doc for
values of operations just refers back to doc for attrs
so updating only attrs).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To avoid having to repeat the entire definition of an attribute
(including the value) use the Attr object from the original set.
In fact this is already the documented expectation.
Fixes: be5bea1cc0 ("net: add basic C code generators for Netlink")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Schmidt says:
====================
ieee802154 for net 2023-03-02
Two small fixes this time.
Alexander Aring fixed a potential negative array access in the ca8210
driver.
Miquel Raynal fixed a crash that could have been triggered through
the extended netlink API for 802154. This only came in this merge window.
Found by syzkaller.
* tag 'ieee802154-for-net-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan:
ieee802154: Prevent user from crashing the host
ca8210: fix mac_len negative array access
====================
Link: https://lore.kernel.org/r/20230302153032.1312755-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot reported use-after-free in cfusbl_device_notify() [1]. This
causes a stack trace like below:
BUG: KASAN: use-after-free in cfusbl_device_notify+0x7c9/0x870 net/caif/caif_usb.c:138
Read of size 8 at addr ffff88807ac4e6f0 by task kworker/u4:6/1214
CPU: 0 PID: 1214 Comm: kworker/u4:6 Not tainted 5.19.0-rc3-syzkaller-00146-g92f20ff72066 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description.constprop.0.cold+0xeb/0x467 mm/kasan/report.c:313
print_report mm/kasan/report.c:429 [inline]
kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491
cfusbl_device_notify+0x7c9/0x870 net/caif/caif_usb.c:138
notifier_call_chain+0xb5/0x200 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1945
call_netdevice_notifiers_extack net/core/dev.c:1983 [inline]
call_netdevice_notifiers net/core/dev.c:1997 [inline]
netdev_wait_allrefs_any net/core/dev.c:10227 [inline]
netdev_run_todo+0xbc0/0x10f0 net/core/dev.c:10341
default_device_exit_batch+0x44e/0x590 net/core/dev.c:11334
ops_exit_list+0x125/0x170 net/core/net_namespace.c:167
cleanup_net+0x4ea/0xb00 net/core/net_namespace.c:594
process_one_work+0x996/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
</TASK>
When unregistering a net device, unregister_netdevice_many_notify()
sets the device's reg_state to NETREG_UNREGISTERING, calls notifiers
with NETDEV_UNREGISTER, and adds the device to the todo list.
Later on, devices in the todo list are processed by netdev_run_todo().
netdev_run_todo() waits devices' reference count become 1 while
rebdoadcasting NETDEV_UNREGISTER notification.
When cfusbl_device_notify() is called with NETDEV_UNREGISTER multiple
times, the parent device might be freed. This could cause UAF.
Processing NETDEV_UNREGISTER multiple times also causes inbalance of
reference count for the module.
This patch fixes the issue by accepting only first NETDEV_UNREGISTER
notification.
Fixes: 7ad65bf68d ("caif: Add support for CAIF over CDC NCM USB interface")
CC: sjur.brandeland@stericsson.com <sjur.brandeland@stericsson.com>
Reported-by: syzbot+b563d33852b893653a9e@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=c3bfd8e2450adab3bffe4d80821fbbced600407f [1]
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Link: https://lore.kernel.org/r/20230301163913.391304-1-syoshida@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move the LAN7800 internal phy (phy ID 0x0007c132) specific register
accesses to the phy driver (microchip.c).
Fix the error reported by Enguerrand de Ribaucourt in December 2022,
"Some operations during the cable switch workaround modify the register
LAN88XX_INT_MASK of the PHY. However, this register is specific to the
LAN8835 PHY. For instance, if a DP8322I PHY is connected to the LAN7801,
that register (0x19), corresponds to the LED and MAC address
configuration, resulting in unapropriate behavior."
I did not test with the DP8322I PHY, but I tested with an EVB-LAN7800
with the internal PHY.
Fixes: 14437e3fa2 ("lan78xx: workaround of forced 100 Full/Half duplex mode error")
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230301154307.30438-1-yuiko.oshino@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When creating counters with initial delay configured, the enable_on_exec
field is not set. So we need to enable the counters later. The problem
is, when a workload is specified the target__none() is true. So we also
need to check stat_config.initial_delay.
In this change, we add a new field 'initial_delay' for struct target
which could be shared by other subcommands. And define
target__enable_on_exec() which returns whether enable_on_exec should be
set on normal cases.
Before this fix the event is not counted:
$ ./perf stat -e instructions -D 100 sleep 2
Events disabled
Events enabled
Performance counter stats for 'sleep 2':
<not counted> instructions
1.901661124 seconds time elapsed
0.001602000 seconds user
0.000000000 seconds sys
After fix it works:
$ ./perf stat -e instructions -D 100 sleep 2
Events disabled
Events enabled
Performance counter stats for 'sleep 2':
404,214 instructions
1.901743475 seconds time elapsed
0.001617000 seconds user
0.000000000 seconds sys
Fixes: c587e77e10 ("perf stat: Do not delay the workload with --delay")
Signed-off-by: Changbin Du <changbin.du@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hui Wang <hw.huiwang@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230302031146.2801588-2-changbin.du@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
8c29f01654 ("x86/sev: Add SEV-SNP guest feature negotiation support")
That triggers:
CC /tmp/build/perf-tools/arch/x86/util/kvm-stat.o
CC /tmp/build/perf-tools/util/header.o
LD /tmp/build/perf-tools/arch/x86/util/perf-in.o
LD /tmp/build/perf-tools/arch/x86/perf-in.o
LD /tmp/build/perf-tools/arch/perf-in.o
LD /tmp/build/perf-tools/util/perf-in.o
LD /tmp/build/perf-tools/perf-in.o
LINK /tmp/build/perf-tools/perf
But this time causes no changes in tooling results, as the introduced
SVM_VMGEXIT_TERM_REQUEST exit reason wasn't added to SVM_EXIT_REASONS,
that is used in kvm-stat.c.
And addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/svm.h' differs from latest version at 'arch/x86/include/uapi/asm/svm.h'
diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nikunj A Dadhania <nikunj@amd.com>
Link: http://lore.kernel.org/lkml/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Florian reported a regression and sent a patch with the following
changelog:
<quote>
There is a noticeable tcp performance regression (loopback or cross-netns),
seen with iperf3 -Z (sendfile mode) when generic retpolines are needed.
With SK_RECLAIM_THRESHOLD checks gone number of calls to enter/leave
memory pressure happen much more often. For TCP indirect calls are
used.
We can't remove the if-set-return short-circuit check in
tcp_enter_memory_pressure because there are callers other than
sk_enter_memory_pressure. Doing a check in the sk wrapper too
reduces the indirect calls enough to recover some performance.
Before,
0.00-60.00 sec 322 GBytes 46.1 Gbits/sec receiver
After:
0.00-60.04 sec 359 GBytes 51.4 Gbits/sec receiver
"iperf3 -c $peer -t 60 -Z -f g", connected via veth in another netns.
</quote>
It seems we forgot to upstream this indirect call mitigation we
had for years, lets do this instead.
[edumazet] - It seems we forgot to upstream this indirect call
mitigation we had for years, let's do this instead.
- Changed to INDIRECT_CALL_INET_1() to avoid bots reports.
Fixes: 4890b686f4 ("net: keep sk->sk_forward_alloc as small as possible")
Reported-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/netdev/20230227152741.4a53634b@kernel.org/T/
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230301133247.2346111-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Fix bogus error report in selftests/netfilter/nft_nat.sh,
from Hangbin Liu.
2) Initialize last and quota expressions from template when
expr_ops::clone is called, otherwise, states are not restored
accordingly when loading a dynamic set with elements using
these two expressions.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_quota: copy content when cloning expression
netfilter: nft_last: copy content when cloning expression
selftests: nft_nat: ensuring the listening side is up before starting the client
====================
Link: https://lore.kernel.org/r/20230301222021.154670-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Both Xavier (Tegra194) and Orin (Tegra234) support a 40-bit address map,
so bump the CBB ranges property to cover all of the 1 TiB address space.
This fixes an issue where some of the PCIe regions could not be remapped
because of they were outside the memory specified by the CBB's ranges
property.
Reported-by: Jonathan Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Make it possible to see the distribution of size classes for block
groups. Helpful for testing and debugging the allocator w.r.t. to size
classes.
The new stats can be found at the path:
/sys/fs/btrfs/<FSID>/allocation/<bg-type>/size_class
but they will only be non-zero for bg-type = data.
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If the ruleset contains consumed quota, restore them accordingly.
Otherwise, listing after restoration shows never used items.
Restore the user-defined quota and flags too.
Fixes: ed0a0c60f0 ("netfilter: nft_quota: move stateful fields out of expression data")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If the ruleset contains last timestamps, restore them accordingly.
Otherwise, listing after restoration shows never used items.
Fixes: 33a24de37e ("netfilter: nft_last: move stateful fields out of expression data")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The test_local_dnat_portonly() function initiates the client-side as
soon as it sets the listening side to the background. This could lead to
a race condition where the server may not be ready to listen. To ensure
that the server-side is up and running before initiating the
client-side, a delay is introduced to the test_local_dnat_portonly()
function.
Before the fix:
# ./nft_nat.sh
PASS: netns routing/connectivity: ns0-rthlYrBU can reach ns1-rthlYrBU and ns2-rthlYrBU
PASS: ping to ns1-rthlYrBU was ip NATted to ns2-rthlYrBU
PASS: ping to ns1-rthlYrBU OK after ip nat output chain flush
PASS: ipv6 ping to ns1-rthlYrBU was ip6 NATted to ns2-rthlYrBU
2023/02/27 04:11:03 socat[6055] E connect(5, AF=2 10.0.1.99:2000, 16): Connection refused
ERROR: inet port rewrite
After the fix:
# ./nft_nat.sh
PASS: netns routing/connectivity: ns0-9sPJV6JJ can reach ns1-9sPJV6JJ and ns2-9sPJV6JJ
PASS: ping to ns1-9sPJV6JJ was ip NATted to ns2-9sPJV6JJ
PASS: ping to ns1-9sPJV6JJ OK after ip nat output chain flush
PASS: ipv6 ping to ns1-9sPJV6JJ was ip6 NATted to ns2-9sPJV6JJ
PASS: inet port rewrite without l3 address
Fixes: 282e5f8fe9 ("netfilter: nat: really support inet nat without l3 address")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When the police was removed from the port, then it was trying to
remove the police from the police id and not from the actual
police index.
The police id represents the id of the police and police index
represents the position in HW where the police is situated.
The port police id can be any number while the port police index
is a number based on the port chip port.
Fix this by deleting the police from HW that is situated at the
police index and not police id.
Fixes: 5390334b59 ("net: lan966x: Add port police support using tc-matchall")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Smatch reports that 'ci' can be used uninitialized.
The current code ignores errno coming from tcf_idr_check_alloc, which
will lead to the incorrect usage of 'ci'. Handle the errno as it should.
Fixes: 288864effe ("net/sched: act_connmark: transition to percpu stats and rcu")
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gaurav reports that TLS Rx is broken with async crypto
accelerators. The commit under fixes missed updating
the retval byte counting logic when updating how records
are stored. Even tho both before and after the change
'decrypted' was updated inside the main loop, it was
completely overwritten when processing the async
completions. Now that the rx_list only holds
non-zero-copy records we need to add, not overwrite.
Reported-and-bisected-by: Gaurav Jain <gaurav.jain@nxp.com>
Fixes: cbbdee9918 ("tls: rx: async: don't put async zc on the list")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217064
Tested-by: Gaurav Jain <gaurav.jain@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230227181201.1793772-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean says:
====================
Freescale T1040RDB DTS updates
This contains a fix for the new device tree for the T1040RDB rev A
board, which never worked, and an update to enable multiple CPU port
support for all revisions of the T1040RDB.
====================
Link: https://lore.kernel.org/r/20230224155941.514638-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit eca70102cf ("net: dsa: felix: add support for changing
DSA master") included in kernel v6.1, the driver supports 2 CPU ports,
and they can be put in a LAG, for example (see
Documentation/networking/dsa/configuration.rst for more details).
Defining the second CPU port in the device tree should not cause any
compatibility issue, because the default CPU port was &seville_port8
before this change, and still is &seville_port8 now (the numerically
first CPU port is used by default).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It looks like U-Boot fails to start the kernel properly when the
compatible string of the board isn't fsl,T1040RDB, so stop overriding it
from the rev-a.dts.
Fixes: 5ebb747492 ("powerpc: dts: t1040rdb: fix ports names for Seville Ethernet switch")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a potential race condition in amdtee_open_session that may
lead to use-after-free. For instance, in amdtee_open_session() after
sess->sess_mask is set, and before setting:
sess->session_info[i] = session_info;
if amdtee_close_session() closes this same session, then 'sess' data
structure will be released, causing kernel panic when 'sess' is
accessed within amdtee_open_session().
The solution is to set the bit sess->sess_mask as the last step in
amdtee_open_session().
Fixes: 757cc3e9ff ("tee: add AMD-TEE driver")
Cc: stable@vger.kernel.org
Signed-off-by: Rijo Thomas <Rijo-john.Thomas@amd.com>
Acked-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
While bringing hardware up we should perform a full reset including the
switch bit (BGMAC_BCMA_IOCTL_SW_RESET aka SICF_SWRST). It's what
specification says and what reference driver does.
This seems to be critical for the BCM5358. Without this hardware doesn't
get initialized properly and doesn't seem to transmit or receive any
packets.
Originally bgmac was calling bgmac_chip_reset() before setting
"has_robosw" property which resulted in expected behaviour. That has
changed as a side effect of adding platform device support which
regressed BCM5358 support.
Fixes: f6a95a2495 ("net: ethernet: bgmac: Add platform device support")
Cc: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230227091156.19509-1-zajec5@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Consider this scenario:
1. APP1 continuously creates lots of small GEMs
2. APP2 triggers `drop_caches`
3. Shrinker starts to evict APP1 GEMs, while APP1 produces new purgeable
GEMs
4. msm_gem_shrinker_scan() returns non-zero number of freed pages
and causes shrinker to try shrink more
5. msm_gem_shrinker_scan() returns non-zero number of freed pages again,
goto 4
6. The APP2 is blocked in `drop_caches` until APP1 stops producing
purgeable GEMs
To prevent this blocking scenario, check number of remaining pages
that GPU shrinker couldn't release due to a GEM locking contention
or shrinking rejection. If there are no remaining pages left to shrink,
then there is no need to free up more pages and shrinker may break out
from the loop.
This problem was found during shrinker/madvise IOCTL testing of
virtio-gpu driver. The MSM driver is affected in the same way.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: b352ba54a8 ("drm/msm/gem: Convert to using drm_gem_lru")
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://lore.kernel.org/all/20230108210445.3948344-2-dmitry.osipenko@collabora.com/
The "vdev->dev.parent" should be used instead of "vdev->dev" as a device
for which to perform the DMA operation in both
virtio_gpu_cmd_transfer_to_host_2d(3d).
Because the virtio-gpu device "vdev->dev" doesn't really have DMA OPS
assigned to it, but parent (virtio-pci or virtio-mmio) device
"vdev->dev.parent" has. The more, the sgtable in question the code is
trying to sync here was mapped for the parent device (by using its DMA OPS)
previously at:
virtio_gpu_object_shmem_init()->drm_gem_shmem_get_pages_sgt()->
dma_map_sgtable(), so should be synced here for the same parent device.
Fixes: b5c9ed70d1 ("drm/virtio: Improve DMA API usage for shmem BOs")
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230224153450.526222-1-olekstysh@gmail.com
xfrm state selectors are matched against the inner-most flow
which can be of any address family. Therefore middle states
in nested configurations need to carry a wildcard selector in
order to work at all.
However, this is currently forbidden for transport-mode states.
Fix this by removing the unnecessary check.
Fixes: 13996378e6 ("[IPSEC]: Rename mode to outer_mode and add inner_mode")
Reported-by: David George <David.George@sophos.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The CP2112 generates interrupts from a polling routine on a thread,
and can only support threaded interrupts. This patch configures the
gpiochip irq chip with this flag, disallowing consumers to request
a hard IRQ from this driver, which resulted in a segfault previously.
Signed-off-by: Danny Kaehn <kaehndan@gmail.com>
Link: https://lore.kernel.org/r/20230210170044.11835-1-kaehndan@gmail.com
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Now that CONFIG_HID_BPF is not automatically implied by HID, we need
to set it properly in the selftests config.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
When MMAP2 has the PERF_RECORD_MISC_MMAP_BUILD_ID flag, it means the
record already has the build-id info. So it marks the DSO as hit, to
skip if the same DSO is not processed if it happens to miss the build-id
later.
But it missed to copy the MMAP2 record itself so it'd fail to symbolize
samples for those regions.
For example, the following generates 249 MMAP2 events.
$ perf record --buildid-mmap -o- true | perf report --stat -i- | grep MMAP2
MMAP2 events: 249 (86.8%)
Adding perf inject should not change the number of events like this
$ perf record --buildid-mmap -o- true | perf inject -b | \
> perf report --stat -i- | grep MMAP2
MMAP2 events: 249 (86.5%)
But when --buildid-all is used, it eats most of the MMAP2 events.
$ perf record --buildid-mmap -o- true | perf inject -b --buildid-all | \
> perf report --stat -i- | grep MMAP2
MMAP2 events: 1 ( 2.5%)
With this patch, it shows the original number now.
$ perf record --buildid-mmap -o- true | perf inject -b --buildid-all | \
> perf report --stat -i- | grep MMAP2
MMAP2 events: 249 (86.5%)
Committer testing:
Before:
$ perf record --buildid-mmap -o- perf stat --null sleep 1 2> /dev/null | perf inject -b | perf report --stat -i- | grep MMAP2
MMAP2 events: 58 (36.2%)
$ perf record --buildid-mmap -o- perf stat --null sleep 1 2> /dev/null | perf report --stat -i- | grep MMAP2
MMAP2 events: 58 (36.2%)
$ perf record --buildid-mmap -o- perf stat --null sleep 1 2> /dev/null | perf inject -b --buildid-all | perf report --stat -i- | grep MMAP2
MMAP2 events: 2 ( 1.9%)
$
After:
$ perf record --buildid-mmap -o- perf stat --null sleep 1 2> /dev/null | perf inject -b | perf report --stat -i- | grep MMAP2
MMAP2 events: 58 (29.3%)
$ perf record --buildid-mmap -o- perf stat --null sleep 1 2> /dev/null | perf report --stat -i- | grep MMAP2
MMAP2 events: 58 (34.3%)
$ perf record --buildid-mmap -o- perf stat --null sleep 1 2> /dev/null | perf inject -b --buildid-all | perf report --stat -i- | grep MMAP2
MMAP2 events: 58 (38.4%)
$
Fixes: f7fc0d1c91 ("perf inject: Do not inject BUILD_ID record if MMAP2 has it")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230223070155.54251-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The default maximum data buffer size for this interface is UHID_DATA_MAX
(4k). When data buffers are being processed, ensure this value is used
when ensuring the sanity, rather than a value between the user provided
value and HID_MAX_BUFFER_SIZE (16k).
Signed-off-by: Lee Jones <lee@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Presently, when a report is processed, its proposed size, provided by
the user of the API (as Report Size * Report Count) is compared against
the subsystem default HID_MAX_BUFFER_SIZE (16k). However, some
low-level HID drivers allocate a reduced amount of memory to their
buffers (e.g. UHID only allocates UHID_DATA_MAX (4k) buffers), rending
this check inadequate in some cases.
In these circumstances, if the received report ends up being smaller
than the proposed report size, the remainder of the buffer is zeroed.
That is, the space between sizeof(csize) (size of the current report)
and the rsize (size proposed i.e. Report Size * Report Count), which can
be handled up to HID_MAX_BUFFER_SIZE (16k). Meaning that memset()
shoots straight past the end of the buffer boundary and starts zeroing
out in-use values, often resulting in calamity.
This patch introduces a new variable into 'struct hid_ll_driver' where
individual low-level drivers can over-ride the default maximum value of
HID_MAX_BUFFER_SIZE (16k) with something more sympathetic to the
interface.
Signed-off-by: Lee Jones <lee@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Eduard Zingerman says:
====================
This patch-set modifies BPF verifier to accept programs that read from
uninitialized stack locations, but only if executed in privileged mode.
This provides significant verification performance gains: 30% to 70% less
processed states for big number of test programs.
The reason for performance gains comes from treating STACK_MISC and
STACK_INVALID as compatible, when cached state is compared to current state
in verifier.c:stacksafe().
The change should not affect safety, because any value read from STACK_MISC
location has full binary range (e.g. 0x00-0xff for byte-sized reads).
Details and measurements are provided in the description for the patch #1.
The change was suggested by Andrii Nakryiko, the initial patch was created
by Alexei Starovoitov. The discussion could be found at [1].
Changes v1 -> v2 (v1 available at [2]):
- Calls to helper functions now convert STACK_INVALID to STACK_MISC
(suggested by Andrii);
- The test case progs/test_global_func10.c is updated to expect new
error message. Before recent commit [3] exact content of error
messages was not verified for this test.
- Replaced incorrect '//'-style comments in test case asm blocks by
'/*...*/'-style comments in order to fix compilation issues;
- Changed the tag from "Suggested-By" to "Co-developed-by" for Alexei
on patch #1, please let me know if this is appropriate use of the tag.
[1] https://lore.kernel.org/bpf/CAADnVQKs2i1iuZ5SUGuJtxWVfGYR9kDgYKhq3rNV+kBLQCu7rA@mail.gmail.com/
[2] https://lore.kernel.org/bpf/20230216183606.2483834-1-eddyz87@gmail.com/
[3] 95ebb37617 ("selftests/bpf: Convert test_global_funcs test to test_loader framework")
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Three testcases to make sure that stack reads from uninitialized
locations are accepted by verifier when executed in privileged mode:
- read from a fixed offset;
- read from a variable offset;
- passing a pointer to stack to a helper converts
STACK_INVALID to STACK_MISC.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230219200427.606541-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This commits updates the following functions to allow reads from
uninitialized stack locations when env->allow_uninit_stack option is
enabled:
- check_stack_read_fixed_off()
- check_stack_range_initialized(), called from:
- check_stack_read_var_off()
- check_helper_mem_access()
Such change allows to relax logic in stacksafe() to treat STACK_MISC
and STACK_INVALID in a same way and make the following stack slot
configurations equivalent:
| Cached state | Current state |
| stack slot | stack slot |
|------------------+------------------|
| STACK_INVALID or | STACK_INVALID or |
| STACK_MISC | STACK_SPILL or |
| | STACK_MISC or |
| | STACK_ZERO or |
| | STACK_DYNPTR |
This leads to significant verification speed gains (see below).
The idea was suggested by Andrii Nakryiko [1] and initial patch was
created by Alexei Starovoitov [2].
Currently the env->allow_uninit_stack is allowed for programs loaded
by users with CAP_PERFMON or CAP_SYS_ADMIN capabilities.
A number of test cases from verifier/*.c were expecting uninitialized
stack access to be an error. These test cases were updated to execute
in unprivileged mode (thus preserving the tests).
The test progs/test_global_func10.c expected "invalid indirect read
from stack" error message because of the access to uninitialized
memory region. This error is no longer possible in privileged mode.
The test is updated to provoke an error "invalid indirect access to
stack" because of access to invalid stack address (such error is not
verified by progs/test_global_func*.c series of tests).
The following tests had to be removed because these can't be made
unprivileged:
- verifier/sock.c:
- "sk_storage_get(map, skb->sk, &stack_value, 1): partially init
stack_value"
BPF_PROG_TYPE_SCHED_CLS programs are not executed in unprivileged mode.
- verifier/var_off.c:
- "indirect variable-offset stack access, max_off+size > max_initialized"
- "indirect variable-offset stack access, uninitialized"
These tests verify that access to uninitialized stack values is
detected when stack offset is not a constant. However, variable
stack access is prohibited in unprivileged mode, thus these tests
are no longer valid.
* * *
Here is veristat log comparing this patch with current master on a
set of selftest binaries listed in tools/testing/selftests/bpf/veristat.cfg
and cilium BPF binaries (see [3]):
$ ./veristat -e file,prog,states -C -f 'states_pct<-30' master.log current.log
File Program States (A) States (B) States (DIFF)
-------------------------- -------------------------- ---------- ---------- ----------------
bpf_host.o tail_handle_ipv6_from_host 349 244 -105 (-30.09%)
bpf_host.o tail_handle_nat_fwd_ipv4 1320 895 -425 (-32.20%)
bpf_lxc.o tail_handle_nat_fwd_ipv4 1320 895 -425 (-32.20%)
bpf_sock.o cil_sock4_connect 70 48 -22 (-31.43%)
bpf_sock.o cil_sock4_sendmsg 68 46 -22 (-32.35%)
bpf_xdp.o tail_handle_nat_fwd_ipv4 1554 803 -751 (-48.33%)
bpf_xdp.o tail_lb_ipv4 6457 2473 -3984 (-61.70%)
bpf_xdp.o tail_lb_ipv6 7249 3908 -3341 (-46.09%)
pyperf600_bpf_loop.bpf.o on_event 287 145 -142 (-49.48%)
strobemeta.bpf.o on_event 15915 4772 -11143 (-70.02%)
strobemeta_nounroll2.bpf.o on_event 17087 3820 -13267 (-77.64%)
xdp_synproxy_kern.bpf.o syncookie_tc 21271 6635 -14636 (-68.81%)
xdp_synproxy_kern.bpf.o syncookie_xdp 23122 6024 -17098 (-73.95%)
-------------------------- -------------------------- ---------- ---------- ----------------
Note: I limited selection by states_pct<-30%.
Inspection of differences in pyperf600_bpf_loop behavior shows that
the following patch for the test removes almost all differences:
- a/tools/testing/selftests/bpf/progs/pyperf.h
+ b/tools/testing/selftests/bpf/progs/pyperf.h
@ -266,8 +266,8 @ int __on_event(struct bpf_raw_tracepoint_args *ctx)
}
if (event->pthread_match || !pidData->use_tls) {
- void* frame_ptr;
- FrameData frame;
+ void* frame_ptr = 0;
+ FrameData frame = {};
Symbol sym = {};
int cur_cpu = bpf_get_smp_processor_id();
W/o this patch the difference comes from the following pattern
(for different variables):
static bool get_frame_data(... FrameData *frame ...)
{
...
bpf_probe_read_user(&frame->f_code, ...);
if (!frame->f_code)
return false;
...
bpf_probe_read_user(&frame->co_name, ...);
if (frame->co_name)
...;
}
int __on_event(struct bpf_raw_tracepoint_args *ctx)
{
FrameData frame;
...
get_frame_data(... &frame ...) // indirectly via a bpf_loop & callback
...
}
SEC("raw_tracepoint/kfree_skb")
int on_event(struct bpf_raw_tracepoint_args* ctx)
{
...
ret |= __on_event(ctx);
ret |= __on_event(ctx);
...
}
With regards to value `frame->co_name` the following is important:
- Because of the conditional `if (!frame->f_code)` each call to
__on_event() produces two states, one with `frame->co_name` marked
as STACK_MISC, another with it as is (and marked STACK_INVALID on a
first call).
- The call to bpf_probe_read_user() does not mark stack slots
corresponding to `&frame->co_name` as REG_LIVE_WRITTEN but it marks
these slots as BPF_MISC, this happens because of the following loop
in the check_helper_call():
for (i = 0; i < meta.access_size; i++) {
err = check_mem_access(env, insn_idx, meta.regno, i, BPF_B,
BPF_WRITE, -1, false);
if (err)
return err;
}
Note the size of the write, it is a one byte write for each byte
touched by a helper. The BPF_B write does not lead to write marks
for the target stack slot.
- Which means that w/o this patch when second __on_event() call is
verified `if (frame->co_name)` will propagate read marks first to a
stack slot with STACK_MISC marks and second to a stack slot with
STACK_INVALID marks and these states would be considered different.
[1] https://lore.kernel.org/bpf/CAEf4BzY3e+ZuC6HUa8dCiUovQRg2SzEk7M-dSkqNZyn=xEmnPA@mail.gmail.com/
[2] https://lore.kernel.org/bpf/CAADnVQKs2i1iuZ5SUGuJtxWVfGYR9kDgYKhq3rNV+kBLQCu7rA@mail.gmail.com/
[3] git@github.com:anakryiko/cilium.git
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Co-developed-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230219200427.606541-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
To pick up the changes from these csets:
8c29f01654 ("x86/sev: Add SEV-SNP guest feature negotiation support")
That cause no changes to tooling:
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
$ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
$ diff -u before after
$
Just silences this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Nikunj A Dadhania <nikunj@amd.com>
Link: https://lore.kernel.org/lkml/Y%2FZrNvtcijPWagCp@kernel.org/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If, for whatever reason, we're trying process adreno_runtime_resume()
at the same time that a6xx_destroy() is running then things can go
boom. Specifically adreno_runtime_resume() will eventually call
a6xx_pm_resume() and that may try to resume the gmu.
Let's grab the GMU lock as we're destroying the GMU. That will solve
the race because a6xx_pm_resume() grabs the same lock. That makes the
access of `gmu->initialized` in a6xx_gmu_resume() safe.
We'll also return an error code in a6xx_gmu_resume() if we see that
`gmu->initialized` was false. If this happens we'll bail out of the
rest of a6xx_pm_resume(), which is good because the rest of that
function is also not good to do if we're racing with a6xx_destroy().
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/521232/
Link: https://lore.kernel.org/r/20230202104822.1.I0e49003bf4dd1dead9be4a29dbee41f3b1236e48@changeid
Signed-off-by: Rob Clark <robdclark@chromium.org>
gcc-13 warns about mismatching types for enums. That revealed switched
arguments of nv50_wndw_new_():
drivers/gpu/drm/nouveau/dispnv50/wndw.c:696:1: error: conflicting types for 'nv50_wndw_new_' due to enum/integer mismatch; have 'int(const struct nv50_wndw_func *, struct drm_device *, enum drm_plane_type, const char *, int, const u32 *, u32, enum nv50_disp_interlock_type, u32, struct nv50_wndw **)'
drivers/gpu/drm/nouveau/dispnv50/wndw.h:36:5: note: previous declaration of 'nv50_wndw_new_' with type 'int(const struct nv50_wndw_func *, struct drm_device *, enum drm_plane_type, const char *, int, const u32 *, enum nv50_disp_interlock_type, u32, u32, struct nv50_wndw **)'
It can be barely visible, but the declaration says about the parameters
in the middle:
enum nv50_disp_interlock_type,
u32 interlock_data,
u32 heads,
While the definition states differently:
u32 heads,
enum nv50_disp_interlock_type interlock_type,
u32 interlock_data,
Unify/fix the declaration to match the definition.
Fixes: 53e0a3e70d ("drm/nouveau/kms/nv50-: simplify tracking of channel interlocks")
Cc: Martin Liska <mliska@suse.cz>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221031114229.10289-1-jirislaby@kernel.org
For output buffers, there's no guarantee that the buffer won't be full
in the first iteration of the loop in which case we would block
independently of userspace passing O_NONBLOCK or not. Fix it by always
checking the flag before going to sleep.
While at it (and as it's a bit related), refactored the loop so that the
stop condition is 'written != n', i.e, run the loop until all data has
been copied into the IIO buffers. This makes the code a bit simpler.
Fixes: 9eeee3b0bf ("iio: Add output buffer support")
Signed-off-by: Nuno Sá <nuno.sa@analog.com>
Reviewed-by: Lars-Peter Clausen <lars@metafoo.de>
Link: https://lore.kernel.org/r/20230216101452.591805-3-nuno.sa@analog.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
After having been compared to NULL value at cirrus.c:455, pointer
'pipe->plane.state->fb' is passed as 1st parameter in call to function
'cirrus_fb_blit_rect' at cirrus.c:461, where it is dereferenced at
cirrus.c:316.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
v2:
* aligned commit message to line-length limits
Signed-off-by: Alexandr Sapozhnikov <alsp705@gmail.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230215171549.16305-1-alsp705@gmail.com
When copying data to user-space we should ensure that only valid
data is copied over. Padding in structures may be filled with
random (possibly sensitve) data and should never be given directly
to user-space.
This patch fixes the copying of xfrm algorithms and the encap
template in xfrm_user so that padding is zeroed.
Reported-by: syzbot+fa5414772d5c445dac3c@syzkaller.appspotmail.com
Reported-by: Hyunwoo Kim <v4bel@theori.io>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
There are different init functions for the sensors in this driver in
which only one initializes the generic vcnl4000_lock. With commit
e21b5b1f26 ("iio: light: vcnl4000: Preserve conf bits when toggle power")
the vcnl4040 sensor started to depend on the lock, but it was missed to
initialize it in vcnl4040's init function. This has not been visible
until we run lockdep on it:
DEBUG_LOCKS_WARN_ON(lock->magic != lock)
at kernel/locking/mutex.c:575 __mutex_lock+0x4f8/0x890
Call trace:
__mutex_lock
mutex_lock_nested
vcnl4200_set_power_state
vcnl4200_init
vcnl4000_probe
Fix this by initializing the lock in the probe function instead of doing
it in the chip specific init functions.
Fixes: e21b5b1f26 ("iio: light: vcnl4000: Preserve conf bits when toggle power")
Signed-off-by: Mårten Lindahl <marten.lindahl@axis.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230131140109.2067577-1-marten.lindahl@axis.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Correct the "sub_lsb" shift for the ltc2497 and drop the sub_lsb element
which is now constant.
An earlier version of the code shifted by 14 but this was a consequence
of reading three bytes into a __be32 buffer and using be32_to_cpu(), so
eight extra bits needed to be skipped. Now we use get_unaligned_be24()
and thus the additional skip is wrong.
Fixes: 2187cfeb36 ("drivers: iio: adc: ltc2497: LTC2499 support")
Signed-off-by: Ian Ray <ian.ray@ge.com>
Link: https://lore.kernel.org/r/20230127125714.44608-1-ian.ray@ge.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.