Pull kvm fix from Paolo Bonzini:
"Fix for the SLS mitigation, which makes a 'SETcc/RET' pair grow
to 'SETcc/RET/INT3'.
This doesn't fit in 4 bytes any more, so the alignment has to
change to 8 for this case"
* tag 'for-linus-5.17' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm/emulate: Fix SETcc emulation function offsets with SLS
Pull input fixes from Dmitry Torokhov:
"Two driver fixes:
- a fix for zinitix touchscreen to properly report contacts
- a fix for aiptek tablet driver to be more resilient to devices with
incorrect descriptors"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: aiptek - properly check endpoint type
Input: zinitix - do not report shadow fingers
The commit in Fixes started adding INT3 after RETs as a mitigation
against straight-line speculation.
The fastop SETcc implementation in kvm's insn emulator uses macro magic
to generate all possible SETcc functions and to jump to them when
emulating the respective instruction.
However, it hardcodes the size and alignment of those functions to 4: a
three-byte SETcc insn and a single-byte RET. BUT, with SLS, there's an
INT3 that gets slapped after the RET, which brings the whole scheme out
of alignment:
15: 0f 90 c0 seto %al
18: c3 ret
19: cc int3
1a: 0f 1f 00 nopl (%rax)
1d: 0f 91 c0 setno %al
20: c3 ret
21: cc int3
22: 0f 1f 00 nopl (%rax)
25: 0f 92 c0 setb %al
28: c3 ret
29: cc int3
and this explodes like this:
int3: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 2435 Comm: qemu-system-x86 Not tainted 5.17.0-rc8-sls #1
Hardware name: Dell Inc. Precision WorkStation T3400 /0TP412, BIOS A14 04/30/2012
RIP: 0010:setc+0x5/0x8 [kvm]
Code: 00 00 0f 1f 00 0f b6 05 43 24 06 00 c3 cc 0f 1f 80 00 00 00 00 0f 90 c0 c3 cc 0f \
1f 00 0f 91 c0 c3 cc 0f 1f 00 0f 92 c0 c3 cc <0f> 1f 00 0f 93 c0 c3 cc 0f 1f 00 \
0f 94 c0 c3 cc 0f 1f 00 0f 95 c0
Call Trace:
<TASK>
? x86_emulate_insn [kvm]
? x86_emulate_instruction [kvm]
? vmx_handle_exit [kvm_intel]
? kvm_arch_vcpu_ioctl_run [kvm]
? kvm_vcpu_ioctl [kvm]
? __x64_sys_ioctl
? do_syscall_64
? entry_SYSCALL_64_after_hwframe
</TASK>
Raise the alignment value when SLS is enabled and use a macro for that
instead of hard-coding naked numbers.
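The offset arithmetic behind the breakage can be sketched in plain userspace C. This is an illustration only, not the kernel code: SLS_ENABLED, SETCC_LEN, setcc_align() and setcc_offset() are hypothetical names, and the sketch assumes the emulator reaches the stub for condition code cc at cc times the alignment from the start of the table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch standing in for the kernel's SLS config option. */
#define SLS_ENABLED 1

/* SETcc %al is 3 bytes, RET is 1 byte, and SLS adds a 1-byte INT3. */
#define SETCC_LEN (3 + 1 + SLS_ENABLED)

/* Smallest power of two that can hold one stub: 4 without SLS, 8 with it. */
static size_t setcc_align(size_t len)
{
    size_t align = 1;

    while (align < len)
        align <<= 1;
    return align;
}

/* Byte offset of the stub for condition code cc, assuming the stubs sit
 * at consecutive multiples of align. */
static size_t setcc_offset(unsigned int cc, size_t align)
{
    return (size_t)(cc & 0xf) * align;
}
```

With the addresses from the dump above, a stride of 4 sends condition code 1 to 0x15 + 4 = 0x19, which is the INT3, while a stride of 8 reaches setno at 0x1d.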
Fixes: e463a09af2 ("x86: Add straight-line-speculation mitigation")
Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Jamie Heilman <jamie@audible.transient.net>
Link: https://lore.kernel.org/r/YjGzJwjrvxg5YZ0Z@audible.transient.net
[Add a comment and a bit of safety checking, since this is going to be changed
again for IBT support. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull ARM SoC fix from Arnd Bergmann:
"Here is one last regression fix for 5.17, reverting a patch that went
into 5.16 as a cleanup that ended up breaking external interrupts on
Layerscape chips.
The revert makes it work again, but also reintroduces a build time
warning about the nonstandard DT binding that will have to be dealt
with in the future"
* tag 'soc-fixes-5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
Revert "arm64: dts: freescale: Fix 'interrupt-map' parent address cells"
Pull SCSI fixes from James Bottomley:
"Two small(ish) fixes, both in drivers"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: fnic: Finish scsi_cmnd before dropping the spinlock
scsi: mpt3sas: Page fault in reply q processing
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Avoid iterating empty evlist, fixing a segfault with 'perf stat --null'
- Ignore case in topdown.slots check, fixing issue with Intel Icelake
JSON metrics.
- Fix symbol size calculation condition for fixing up corner case
symbol end address obtained from Kallsyms.
* tag 'perf-tools-fixes-for-v5.17-2022-03-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf parse-events: Ignore case in topdown.slots check
perf evlist: Avoid iteration for empty evlist.
perf symbols: Fix symbol size calculation condition
Pull char/misc driver fix from Greg KH:
"Here is a single driver fix for 5.17-final that has been submitted
many times but I somehow missed it in my patch queue:
- fix for counter sysfs code for reported problem
This has been in linux-next all week with no reported issues"
* tag 'char-misc-5.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
counter: Stop using dev_get_drvdata() to get the counter device
Pull USB fixes from Greg KH:
"Here are some small remaining USB fixes for 5.17-final.
They include:
- two USB gadget driver fixes for reported problems
- usbtmc driver fix for syzbot found issues
- musb patch partial revert to resolve a reported regression.
All of these have been in linux-next this week with no reported
problems"
* tag 'usb-5.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: gadget: Fix use-after-free bug by not setting udc->dev.driver
usb: usbtmc: Fix bug in pipe direction for control transfers
partially Revert "usb: musb: Set the DT node on the child device"
usb: gadget: rndis: prevent integer overflow in rndis_set_response()
Before this patch, two conditions had to be met for the symbol end
address fixup to be called:
if (prev->end == prev->start && prev->end != curr->start)
Where
"prev->end == prev->start" means that prev is zero-long
(and thus needs a fixup)
and
"prev->end != curr->start" means that fixup hasn't been applied yet
However, this logic is incorrect in the following situation:
*curr = {rb_node = {__rb_parent_color = 278218928,
rb_right = 0x0, rb_left = 0x0},
start = 0xc000000000062354,
end = 0xc000000000062354, namelen = 40, type = 2 '\002',
binding = 0 '\000', idle = 0 '\000', ignore = 0 '\000',
inlined = 0 '\000', arch_sym = 0 '\000', annotate2 = false,
name = 0x1159739e "kprobe_optinsn_page\t[__builtin__kprobes]"}
*prev = {rb_node = {__rb_parent_color = 278219041,
rb_right = 0x109548b0, rb_left = 0x109547c0},
start = 0xc000000000062354,
end = 0xc000000000062354, namelen = 12, type = 2 '\002',
binding = 1 '\001', idle = 0 '\000', ignore = 0 '\000',
inlined = 0 '\000', arch_sym = 0 '\000', annotate2 = false,
name = 0x1095486e "optinsn_slot"}
In this case, prev->start == prev->end == curr->start == curr->end,
thus the condition above thinks that "we need a fixup due to zero
length of prev symbol, but it has been probably done, since the
prev->end == curr->start", which is wrong.
After the patch, the execution path proceeds to arch__symbols__fixup_end
function which fixes up the size of prev symbol by adding page_size to
its end offset.
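The failing case can be replayed with a reduced model. This is a sketch under stated assumptions: struct sym_range is a hypothetical stand-in for perf's symbol structure, the first predicate is the condition quoted above, and the second is one possible relaxed form that lets the case through, not necessarily the exact condition used by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for perf's symbol: only the two fields used here. */
struct sym_range {
    uint64_t start;
    uint64_t end;
};

/* Values taken from the debugger dump above. */
static const struct sym_range example_prev = {
    0xc000000000062354ULL, 0xc000000000062354ULL
};
static const struct sym_range example_curr = {
    0xc000000000062354ULL, 0xc000000000062354ULL
};

/* The condition quoted above: both tests must hold. */
static bool needs_fixup_old(const struct sym_range *prev,
                            const struct sym_range *curr)
{
    return prev->end == prev->start && prev->end != curr->start;
}

/* Assumed relaxed form: a zero-length prev alone is enough. */
static bool needs_fixup_relaxed(const struct sym_range *prev,
                                const struct sym_range *curr)
{
    return prev->end == prev->start || prev->end != curr->start;
}
```

For the dumped pair, the old predicate is false because prev->end equals curr->start, so the zero-length symbol is never fixed up.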
Fixes: 3b01a413c1 ("perf symbols: Improve kallsyms symbol end addr calculation")
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20220317135536.805-1-mpetlan@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull arm64 fixes from Catalin Marinas:
"Fix two compiler warnings introduced by recent commits: pointer
arithmetic and double initialisation of struct field"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: errata: avoid duplicate field initializer
arm64: fix clang warning about TRAMP_VALIAS
Pull cifs fix from Steve French:
"Small fix for regression in multiuser mounts.
The additional improvements suggested by Ronnie to make the server and
session status handling code easier to read can wait for the 5.18
merge window."
* tag '5.17-rc8-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6:
smb3: fix incorrect session setup check for multiuser mounts
Pull block fixes from Jens Axboe:
- Revert of a nvme target feature (Hannes)
- Fix a memory leak with rq-qos (Ming)
* tag 'block-5.17-2022-03-18' of git://git.kernel.dk/linux-block:
nvmet: revert "nvmet: make discovery NQN configurable"
block: release rq qos structures for queue without disk
Pull drm fixes from Dave Airlie:
"A few minor changes to finish things off, one mgag200 regression, imx
fix and couple of panel changes.
imx:
- Don't test bus flags in atomic check
mgag200:
- Fix PLL setup on some models
panel:
- Fix bpp settings on Innolux G070Y2-L01
- Fix DRM_PANEL_EDP Kconfig dependencies"
* tag 'drm-fixes-2022-03-18' of git://anongit.freedesktop.org/drm/drm:
drm: Don't make DRM_PANEL_BRIDGE dependent on DRM_KMS_HELPERS
drm/panel: simple: Fix Innolux G070Y2-L01 BPP settings
drm/imx: parallel-display: Remove bus flags check in imx_pd_bridge_atomic_check()
drm/mgag200: Fix PLL setup for g200wb and g200ew
The '.type' field is initialized both in place and in the macro
as reported by this W=1 warning:
arch/arm64/include/asm/cpufeature.h:281:9: error: initialized field overwritten [-Werror=override-init]
281 | (ARM64_CPUCAP_SCOPE_LOCAL_CPU | ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU)
| ^
arch/arm64/kernel/cpu_errata.c:136:17: note: in expansion of macro 'ARM64_CPUCAP_LOCAL_CPU_ERRATUM'
136 | .type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM, \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/arm64/kernel/cpu_errata.c:145:9: note: in expansion of macro 'ERRATA_MIDR_RANGE'
145 | ERRATA_MIDR_RANGE(m, var, r_min, var, r_max)
| ^~~~~~~~~~~~~~~~~
arch/arm64/kernel/cpu_errata.c:613:17: note: in expansion of macro 'ERRATA_MIDR_REV_RANGE'
613 | ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A510, 0, 0, 2),
| ^~~~~~~~~~~~~~~~~~~~~
arch/arm64/include/asm/cpufeature.h:281:9: note: (near initialization for 'arm64_errata[18].type')
281 | (ARM64_CPUCAP_SCOPE_LOCAL_CPU | ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU)
| ^
Remove the extraneous initializer.
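The shape of the problem can be shown with a small table. This is a minimal sketch with hypothetical names (struct cap_entry, ERRATUM_REV_RANGE and the CAP_* flags are not the arm64 definitions): a helper macro already sets .type, so each entry built with it must not set .type again.

```c
#include <assert.h>

/* Hypothetical capability flags, loosely modelled on the warning above. */
#define CAP_SCOPE_LOCAL_CPU        0x1
#define CAP_OPTIONAL_FOR_LATE_CPU  0x2
#define CAP_LOCAL_CPU_ERRATUM \
    (CAP_SCOPE_LOCAL_CPU | CAP_OPTIONAL_FOR_LATE_CPU)

struct cap_entry {
    const char *desc;
    int type;
    int rev_max;
};

/* The helper macro already sets .type for every entry built with it. */
#define ERRATUM_REV_RANGE(max) \
    .type = CAP_LOCAL_CPU_ERRATUM, \
    .rev_max = (max)

static const struct cap_entry errata[] = {
    {
        .desc = "example erratum",
        /* A second ".type = CAP_LOCAL_CPU_ERRATUM," on this line would
         * initialize the same field twice and trip -Woverride-init. */
        ERRATUM_REV_RANGE(2),
    },
};
```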
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 1dd498e5e2 ("KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata")
Link: https://lore.kernel.org/r/20220316183800.1546731-1-arnd@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The newly introduced TRAMP_VALIAS definition causes a build warning
with clang-14:
arch/arm64/include/asm/vectors.h:66:31: error: arithmetic on a null pointer treated as a cast from integer to pointer is a GNU extension [-Werror,-Wnull-pointer-arithmetic]
return (char *)TRAMP_VALIAS + SZ_2K * slot;
Change the addition to something clang does not complain about.
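One way to avoid this class of warning is to do the arithmetic on integers and cast once at the end. The sketch below assumes that approach and uses hypothetical values: BASE_ALIAS stands in for TRAMP_VALIAS as a plain integer address and SLOT_SIZE for SZ_2K.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values, chosen only for illustration. */
#define BASE_ALIAS 0x10000UL
#define SLOT_SIZE  0x800UL

/* Compute the address as an integer first, then cast once. Casting the
 * base to a pointer and adding to it afterwards is the shape the warning
 * above points at. */
static const char *slot_address(unsigned int slot)
{
    return (const char *)(uintptr_t)(BASE_ALIAS + SLOT_SIZE * slot);
}
```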
Fixes: bd09128d16 ("arm64: Add percpu vectors for EL1")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20220316183833.1563139-1-arnd@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, ipsec, and wireless.
A few last minute revert / disable and fix patches came down from our
sub-trees. We're not waiting for any fixes at this point.
Current release - regressions:
- Revert "netfilter: nat: force port remap to prevent shadowing
well-known ports", restore working conntrack on asymmetric paths
- Revert "ath10k: drop beacon and probe response which leak from
other channel", restore working AP and mesh mode on QCA9984
- eth: intel: fix hang during reboot/shutdown
Current release - new code bugs:
- netfilter: nf_tables: disable register tracking, it needs more work
to cover all corner cases
Previous releases - regressions:
- ipv6: fix skb_over_panic in __ip6_append_data when (admin-only)
extension headers get specified
- esp6: fix ESP over TCP/UDP, interpret ipv6_skip_exthdr's return
value more selectively
- bnx2x: fix driver load failure when FW not present in initrd
Previous releases - always broken:
- vsock: stop destroying unrelated sockets in nested virtualization
- packet: fix slab-out-of-bounds access in packet_recvmsg()
Misc:
- add Paolo Abeni to networking maintainers!"
* tag 'net-5.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (26 commits)
iavf: Fix hang during reboot/shutdown
net: mscc: ocelot: fix backwards compatibility with single-chain tc-flower offload
net: bcmgenet: skip invalid partial checksums
bnx2x: fix built-in kernel driver load failure
net: phy: mscc: Add MODULE_FIRMWARE macros
net: dsa: Add missing of_node_put() in dsa_port_parse_of
net: handle ARPHRD_PIMREG in dev_is_mac_header_xmit()
Revert "ath10k: drop beacon and probe response which leak from other channel"
hv_netvsc: Add check for kvmalloc_array
iavf: Fix double free in iavf_reset_task
ice: destroy flow director filter mutex after releasing VSIs
ice: fix NULL pointer dereference in ice_update_vsi_tx_ring_stats()
Add Paolo Abeni to networking maintainers
atm: eni: Add check for dma_map_single
net/packet: fix slab-out-of-bounds access in packet_recvmsg()
net: mdio: mscc-miim: fix duplicate debugfs entry
net: phy: marvell: Fix invalid comparison in the resume and suspend functions
esp6: fix check on ipv6_skip_exthdr's return value
net: dsa: microchip: add spi_device_id tables
netfilter: nf_tables: disable register tracking
...
Pull ACPI fix from Rafael Wysocki:
"Revert recent commit that caused multiple systems to misbehave due to
firmware issues"
* tag 'acpi-5.17-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI: scan: Do not add device IDs from _CID if _HID is not valid"
Merge misc fixes from Andrew Morton:
"Four patches.
Subsystems affected by this patch series: mm/swap, kconfig, ocfs2, and
selftests"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
selftests: vm: fix clang build error multiple output files
ocfs2: fix crash when initialize filecheck kobj fails
configs/debug: restore DEBUG_INFO=y for overriding
mm: swap: get rid of livelock in swapin readahead
When building the vm selftests using clang, some errors are seen due to
having headers in the compilation command:
clang -Wall -I ../../../../usr/include -no-pie gup_test.c ../../../../mm/gup_test.h -lrt -lpthread -o .../tools/testing/selftests/vm/gup_test
clang: error: cannot specify -o when generating multiple output files
make[1]: *** [../lib.mk:146: .../tools/testing/selftests/vm/gup_test] Error 1
Rework to add the header files to LOCAL_HDRS before including ../lib.mk,
since the dependency is evaluated in '$(OUTPUT)/%:%.c $(LOCAL_HDRS)' in
file lib.mk.
Link: https://lkml.kernel.org/r/20220304000645.1888133-1-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In our testing, a livelocked task was found. Sysrq output showed the
same stack every time, as follows:
__swap_duplicate+0x58/0x1a0
swapcache_prepare+0x24/0x30
__read_swap_cache_async+0xac/0x220
read_swap_cache_async+0x58/0xa0
swapin_readahead+0x24c/0x628
do_swap_page+0x374/0x8a0
__handle_mm_fault+0x598/0xd60
handle_mm_fault+0x114/0x200
do_page_fault+0x148/0x4d0
do_translation_fault+0xb0/0xd4
do_mem_abort+0x50/0xb0
The reason for the livelock is that swapcache_prepare() always returns
EEXIST, indicating that SWAP_HAS_CACHE has not been cleared, so that it
cannot jump out of the loop. We suspect that the task that clears the
SWAP_HAS_CACHE flag never gets a chance to run. We tried lowering the
priority of the task stuck in the livelock so that the task that clears
the SWAP_HAS_CACHE flag could run. The results show that the system
returns to normal after the priority is lowered.
In our testing, multiple real-time tasks are bound to the same core, and
the task in the livelock is the highest priority task of the core, so
the livelocked task cannot be preempted.
Although cond_resched() is used by __read_swap_cache_async, it is an
empty function in the preemptive system and cannot achieve the purpose
of releasing the CPU. A high-priority task cannot release the CPU
unless preempted by a higher-priority task. But when this task is
already the highest priority task on this core, other tasks will not be
able to be scheduled. So we think we should replace cond_resched() with
schedule_timeout_uninterruptible(1). schedule_timeout_uninterruptible()
calls set_current_state() first to set the task state, so the task is
removed from the run queue. This gives up the CPU and prevents the task
from running in kernel mode for too long.
(akpm: ugly hack becomes uglier. But it fixes the issue in a
backportable-to-stable fashion while we hopefully work on something
better)
Link: https://lkml.kernel.org/r/20220221111749.1928222-1-cgel.zte@gmail.com
Signed-off-by: Guo Ziliang <guo.ziliang@zte.com.cn>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Reviewed-by: Jiang Xuexin <jiang.xuexin@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roger Quadros <rogerq@kernel.org>
Cc: Ziliang Guo <guo.ziliang@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent commit 974578017f ("iavf: Add waiting so the port is
initialized in remove") adds a wait-loop at the beginning of
iavf_remove() to ensure that port initialization is finished
prior to unregistering the net device. This causes a regression
in the reboot/shutdown scenario because in this case the callback
iavf_shutdown() is called; this callback detaches the device,
brings it down if it is running and sets its state to __IAVF_REMOVE.
Later the shutdown callback of the associated PF driver (e.g. ice_shutdown)
is called. That callback calls, among other things, sriov_disable(),
which indirectly calls iavf_remove() (see stack trace below).
As the adapter state is already __IAVF_REMOVE, the mentioned
loop is endless and the shutdown process hangs.
The patch fixes this by checking the adapter's state at the beginning
of iavf_remove() and skipping the rest of the function if the adapter
is already in the remove state (shutdown is in progress).
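The effect of the early check can be modelled without any driver code. This is a sketch with hypothetical names: enum adapter_state, enum remove_outcome and model_remove() are not from the iavf driver, and the model assumes that only an adapter already in the remove state can never satisfy the wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states; ADAPTER_REMOVE mirrors __IAVF_REMOVE from the text. */
enum adapter_state {
    ADAPTER_INIT,     /* initialization still in progress */
    ADAPTER_RUNNING,  /* initialization finished */
    ADAPTER_REMOVE    /* shutdown already detached the device */
};

enum remove_outcome { REMOVE_DONE, REMOVE_SKIPPED, REMOVE_WOULD_HANG };

/* Models the remove path with and without the state check up front. */
static enum remove_outcome model_remove(enum adapter_state state,
                                        bool check_state_first)
{
    /* The guard: shutdown already ran, so there is nothing to wait for. */
    if (check_state_first && state == ADAPTER_REMOVE)
        return REMOVE_SKIPPED;

    /* Without the guard, nothing will ever move the state on again, so
     * the wait for initialization never ends. */
    if (state == ADAPTER_REMOVE)
        return REMOVE_WOULD_HANG;

    /* Assumed: INIT eventually becomes RUNNING, so the wait terminates. */
    return REMOVE_DONE;
}
```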
Reproducer:
1. Create VF on PF driven by ice or i40e driver
2. Ensure that the VF is bound to iavf driver
3. Reboot
[52625.981294] sysrq: SysRq : Show Blocked State
[52625.988377] task:reboot state:D stack: 0 pid:17359 ppid: 1 f2
[52625.996732] Call Trace:
[52625.999187] __schedule+0x2d1/0x830
[52626.007400] schedule+0x35/0xa0
[52626.010545] schedule_hrtimeout_range_clock+0x83/0x100
[52626.020046] usleep_range+0x5b/0x80
[52626.023540] iavf_remove+0x63/0x5b0 [iavf]
[52626.027645] pci_device_remove+0x3b/0xc0
[52626.031572] device_release_driver_internal+0x103/0x1f0
[52626.036805] pci_stop_bus_device+0x72/0xa0
[52626.040904] pci_stop_and_remove_bus_device+0xe/0x20
[52626.045870] pci_iov_remove_virtfn+0xba/0x120
[52626.050232] sriov_disable+0x2f/0xe0
[52626.053813] ice_free_vfs+0x7c/0x340 [ice]
[52626.057946] ice_remove+0x220/0x240 [ice]
[52626.061967] ice_shutdown+0x16/0x50 [ice]
[52626.065987] pci_device_shutdown+0x34/0x60
[52626.070086] device_shutdown+0x165/0x1c5
[52626.074011] kernel_restart+0xe/0x30
[52626.077593] __do_sys_reboot+0x1d2/0x210
[52626.093815] do_syscall_64+0x5b/0x1a0
[52626.097483] entry_SYSCALL_64_after_hwframe+0x65/0xca
Fixes: 974578017f ("iavf: Add waiting so the port is initialized in remove")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://lore.kernel.org/r/20220317104524.2802848-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ACL rules can be offloaded to VCAP IS2 either through chain 0, or, since
the blamed commit, through a chain index whose number encodes a specific
PAG (Policy Action Group) and lookup number.
The chain number is translated through ocelot_chain_to_pag() into a PAG,
and through ocelot_chain_to_lookup() into a lookup number.
The problem with the blamed commit is that the above 2 functions don't
have special treatment for chain 0. So ocelot_chain_to_pag(0) returns
filter->pag = 224, which is in fact -32, but the "pag" field is a u8.
So we end up programming the hardware with VCAP IS2 entries having a PAG
of 224. But the way in which the PAG works is that it defines a subset
of VCAP IS2 filters which should match on a packet. The default PAG is
0, and previous VCAP IS1 rules (which we offload using 'goto') can
modify it. So basically, we are installing filters with a PAG on which
no packet will ever match. This is the hardware equivalent of adding
filters to a chain which has no 'goto' to it.
Restore the previous functionality by making ACL filters offloaded to
chain 0 go to PAG 0 and lookup number 0. The choice of PAG is clearly
correct, but the choice of lookup number isn't "as before" (which was to
leave the lookup a "don't care"). However, lookup 0 should be fine,
since even though there are ACL actions (policers) which have a
requirement to be used in a specific lookup, that lookup is 0.
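The wrap-around and the special case can be sketched as follows. The names are hypothetical (struct acl_filter, to_u8() and make_filter() are not the ocelot driver's), and the real chain encoding is deliberately not reproduced: for chains other than 0 the caller passes already-decoded values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical filter with the narrow field described above. */
struct acl_filter {
    uint8_t pag;
    int lookup;
};

/* Storing a negative int in a u8 wraps modulo 256, so -32 becomes 224. */
static uint8_t to_u8(int value)
{
    return (uint8_t)value;
}

/* Chain 0 gets PAG 0 and lookup 0; any other chain uses the decoded
 * values supplied by the caller. */
static struct acl_filter make_filter(int chain, int decoded_pag,
                                     int decoded_lookup)
{
    struct acl_filter f;

    if (chain == 0) {
        f.pag = 0;
        f.lookup = 0;
    } else {
        f.pag = to_u8(decoded_pag);
        f.lookup = decoded_lookup;
    }
    return f;
}
```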
Fixes: 226e9cd82a ("net: mscc: ocelot: only install TCAM entries into a specific lookup and PAG")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220316192117.2568261-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The RXCHK block will return a partial checksum of 0 if it encounters
a problem while receiving a packet. Since a 1's complement sum can
only produce this result if no bits are set in the received data
stream it is fair to treat it as an invalid partial checksum and
not pass it up the stack.
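The arithmetic claim can be checked with a small one's complement adder. This is a sketch, assuming a 16-bit partial checksum; ones_sum16() and partial_csum_usable() are hypothetical helpers, not bcmgenet functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's complement sum with end-around carry. */
static uint16_t ones_sum16(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        sum += words[i];
        /* Fold the carry back into the low 16 bits after every addition. */
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t)sum;
}

/* Hypothetical receive-path check: a partial checksum of 0 is treated as
 * "hardware reported a problem" and is not passed up as a usable value. */
static int partial_csum_usable(uint16_t partial)
{
    return partial != 0;
}
```

Because the carry is added back in, any sum that includes a non-zero word ends up between 0x0001 and 0xffff; only all-zero input yields 0.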
Fixes: 8101553978 ("net: bcmgenet: use CHECKSUM_COMPLETE for NETIF_F_RXCSUM")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220317012812.1313196-1-opendmb@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit b7a49f7305 ("bnx2x: Utilize firmware 7.13.21.0")
added request_firmware() logic in probe() which caused
load failure when firmware file is not present in initrd (below),
as access to firmware file is not feasible during probe.
Direct firmware load for bnx2x/bnx2x-e2-7.13.15.0.fw failed with error -2
Direct firmware load for bnx2x/bnx2x-e2-7.13.21.0.fw failed with error -2
This patch fixes this issue by:
1. Removing the request_firmware() logic from probe()
so that .ndo_open() handles it, as it did earlier
2. Relaxing the driver's FW version comparison against
the FW version already loaded (by other PFs of the same
adapter). With request_firmware() removed from probe(), the
PF in the probe flow no longer knows which firmware file
version it will initialize the device with in ndo_open(), so
different compatible/close enough FW versions must be allowed
for multiple PFs (in different environments)
Link: https://lore.kernel.org/all/46f2d9d9-ae7f-b332-ddeb-b59802be2bab@molgen.mpg.de/
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Fixes: b7a49f7305 ("bnx2x: Utilize firmware 7.13.21.0")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Link: https://lore.kernel.org/r/20220316214613.6884-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix a number of undefined references to drm_kms_helper.ko in
drm_dp_helper.ko:
arm-suse-linux-gnueabi-ld: drivers/gpu/drm/dp/drm_dp_mst_topology.o: in function `drm_dp_mst_duplicate_state':
drm_dp_mst_topology.c:(.text+0x2df0): undefined reference to `__drm_atomic_helper_private_obj_duplicate_state'
arm-suse-linux-gnueabi-ld: drivers/gpu/drm/dp/drm_dp_mst_topology.o: in function `drm_dp_delayed_destroy_work':
drm_dp_mst_topology.c:(.text+0x370c): undefined reference to `drm_kms_helper_hotplug_event'
arm-suse-linux-gnueabi-ld: drivers/gpu/drm/dp/drm_dp_mst_topology.o: in function `drm_dp_mst_up_req_work':
drm_dp_mst_topology.c:(.text+0x7938): undefined reference to `drm_kms_helper_hotplug_event'
arm-suse-linux-gnueabi-ld: drivers/gpu/drm/dp/drm_dp_mst_topology.o: in function `drm_dp_mst_link_probe_work':
drm_dp_mst_topology.c:(.text+0x82e0): undefined reference to `drm_kms_helper_hotplug_event'
This happens if panel-edp.ko has been configured with
DRM_PANEL_EDP=y
DRM_DP_HELPER=y
DRM_KMS_HELPER=m
which builds DP helpers into the kernel and KMS helpers as a module.
Making DRM_PANEL_EDP select DRM_KMS_HELPER resolves this problem.
To avoid a resulting cyclic dependency with DRM_PANEL_BRIDGE, don't
make the latter depend on DRM_KMS_HELPER and fix the one DRM bridge
driver that doesn't already select DRM_KMS_HELPER. As KMS helpers
cannot be selected directly by the user, config symbols should avoid
depending on it anyway.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 3755d35ee1 ("drm/panel: Select DRM_DP_HELPER for DRM_PANEL_EDP")
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Tested-by: Brian Masney <bmasney@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Linux Kernel Functional Testing <lkft@linaro.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: dri-devel@lists.freedesktop.org
Cc: Dave Airlie <airlied@redhat.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Link: https://patchwork.freedesktop.org/patch/478296/
Backmerging drm/drm-fixes for commit 3755d35ee1 ("drm/panel: Select
DRM_DP_HELPER for DRM_PANEL_EDP").
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
A recent change to how the SMB3 server (socket) and session status
is managed regressed multiuser mounts by moving the check
for whether session setup is needed to the socket (TCP_Server_info)
structure instead of the session struct (cifs_ses). Add an additional
check in cifs_setup_session to fix this.
Fixes: 73f9bfbe3d ("cifs: maintain a state machine for tcp/smb/tcon sessions")
Reported-by: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull EFI fix from Ard Biesheuvel:
"Avoid spurious warnings about unknown boot parameters"
* tag 'efi-urgent-for-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: fix return value of __setup handlers
Pull crypto fix from Herbert Xu:
"This fixes a bug where qcom-rng can return a buffer that is not
completely filled with random data"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: qcom-rng - ensure buffer for generate is completely filled
This reverts commit 869f0ec048. That
updated the expected device tree binding format for the ls-extirq
driver, without also updating the parsing code (ls_extirq_parse_map)
to the new format.
The context is that the ls-extirq driver uses the standard
"interrupt-map" OF property in a non-standard way, as suggested by
Rob Herring during review:
https://lore.kernel.org/lkml/20190927161118.GA19333@bogus/
This has turned out to be problematic, as Marc Zyngier discovered
through commit 0412841812 ("of/irq: Allow matching of an interrupt-map
local to an interrupt controller"), later fixed through commit
de4adddcbc ("of/irq: Add a quirk for controllers with their own
definition of interrupt-map"). Marc's position, expressed on multiple
occasions, is that:
(a) [ making private use of the reserved "interrupt-map" name in a
driver ] "is wrong, by the very letter of what an interrupt-map
means. If the interrupt map points to an interrupt controller,
that's the target for the interrupt."
https://lore.kernel.org/lkml/87k0g8jlmg.wl-maz@kernel.org/
(b) [ updating the driver's bindings to accept a non-reserved name for
this property, as an alternative ] "is totally pointless. These
machines have been in the wild for years, and existing DTs will be
there *forever*."
https://lore.kernel.org/lkml/87ilvrk1r0.wl-maz@kernel.org/
Considering the above, the Linux kernel has quirks in place to deal with
the ls-extirq's non-standard use of the "interrupt-map". These quirks
may be needed in other operating systems that consume this device tree,
yet this is seen as the only viable solution.
Therefore, the premise of the patch being reverted here is invalid.
It doesn't matter whether the driver, in its non-standard use of the
property, complies with the standard format or not, since this property
isn't expected to be used for interrupt translation by the core.
This change restores LS1088A, LS2088A/LS2085A and LX2160A to their
previous bindings, which allows these systems to continue to use
external interrupt lines with the correct polarity.
Fixes: 869f0ec048 ("arm64: dts: freescale: Fix 'interrupt-map' parent address cells")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Steffen Klassert says:
====================
pull request (net): ipsec 2022-03-16
1) Fix a kernel-info-leak in pfkey.
From Haimin Zhang.
2) Fix an incorrect check of the return value of ipv6_skip_exthdr.
From Sabrina Dubroca.
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
esp6: fix check on ipv6_skip_exthdr's return value
af_key: add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register
====================
Link: https://lore.kernel.org/r/20220316121142.3142336-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kalle Valo says:
====================
wireless fixes for v5.17
Third set of fixes for v5.17. We have only one revert to fix an ath10k
regression.
* tag 'wireless-2022-03-16' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
Revert "ath10k: drop beacon and probe response which leak from other channel"
====================
Link: https://lore.kernel.org/r/20220316130249.B5225C340EC@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull NVMe fix from Christoph:
"nvme fix for Linux 5.17
- last minute revert of a nvmet feature added in Linux 5.16
(Hannes Reinecke)"
* tag 'nvme-5.17-2022-03-16' of git://git.infradead.org/nvme:
nvmet: revert "nvmet: make discovery NQN configurable"
This reverts commit 3bf2537ec2.
It was reported to me privately that this commit breaks AP and mesh mode on QCA9984
(firmware 10.4-3.9.0.2-00156). So revert the commit to fix the regression.
There was a conflict due to cfg80211 API changes but that was easy to fix.
Fixes: 3bf2537ec2 ("ath10k: drop beacon and probe response which leak from other channel")
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220315155455.20446-1-kvalo@kernel.org
Revert commit e38f9ff63e ("ACPI: scan: Do not add device IDs from _CID
if _HID is not valid"), because it has introduced regressions on
multiple systems, even though it only has effect on clearly invalid
firmware.
Reported-by: Pierre-Louis Bossart <notifications@github.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
====================
Intel Wired LAN Driver Updates 2022-03-15
This series contains updates to ice and iavf drivers.
Maciej adjusts null check logic on Tx ring to prevent possible NULL
pointer dereference for ice.
Sudheer moves destruction of Flow Director lock as it was being accessed
after destruction for ice.
Przemyslaw removes an excess mutex unlock as it was being double
unlocked for iavf.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a possible double free (double unlock) of crit_lock in
iavf_disable_vf, as crit_lock is released in the caller,
iavf_reset_task. Add kernel-doc for iavf_disable_vf.
Remove mutex_unlock in iavf_disable_vf.
Without this patch there is a double free scenario when calling
iavf_reset_task.
Fixes: e85ff9c631 ("iavf: Fix deadlock in iavf_reset_task")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently fdir_fltr_lock is accessed in the ice_vsi_release_all() function
after it is destroyed. Instead, destroy the mutex after ice_vsi_release_all().
Fixes: 40319796b7 ("ice: Add flow director support for channel mode")
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
It is possible to dereference a NULL pointer in the routine that updates
Tx ring stats. Currently only stats and bytes are updated when the ring
pointer is valid, but later on the ring is accessed to propagate the
gathered Tx stats onto the VSI stats.
Change the existing logic to move on to the next ring when the ring is NULL.
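The skip-on-NULL logic described above can be illustrated with a minimal userspace sketch. The structure and function names here (tx_ring_sketch, vsi_stats_sketch, sum_tx_rings_sketch) are hypothetical stand-ins, not the ice driver's own definitions:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's ring and VSI stats structures. */
struct tx_ring_sketch {
	uint64_t pkts;
	uint64_t bytes;
};

struct vsi_stats_sketch {
	uint64_t tx_packets;
	uint64_t tx_bytes;
};

/*
 * Accumulate per-ring counters into the VSI totals. A NULL slot is skipped
 * entirely, so no code after the check can dereference it.
 */
static void sum_tx_rings_sketch(struct tx_ring_sketch **rings, size_t count,
				struct vsi_stats_sketch *out)
{
	for (size_t i = 0; i < count; i++) {
		struct tx_ring_sketch *ring = rings[i];

		if (!ring)
			continue;	/* move on to the next ring */

		out->tx_packets += ring->pkts;
		out->tx_bytes += ring->bytes;
	}
}
```

Using a single early "continue" keeps both the per-ring update and the later propagation behind the same NULL check.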
Fixes: e72bba2135 ("ice: split ice_ring onto Tx/Rx separate structs")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When aborting a SCSI command through fnic, there is a race with the fnic
interrupt handler which can result in the SCSI command and its request
being completed twice. If the interrupt handler claims the command by
setting CMD_SP to NULL first, the abort handler assumes the interrupt
handler has completed the command and returns SUCCESS, causing the request
for the scsi_cmnd to be re-queued.
But the interrupt handler may not have finished the command yet. After it
drops the spinlock protecting CMD_SP, it does memory cleanup before finally
calling scsi_done() to complete the scsi_cmnd. If the call to scsi_done
occurs after the abort handler finishes and re-queues the request, the
completion of the scsi_cmnd will advance and try to double complete a
request already queued for retry.
This patch fixes the issue by moving scsi_done() and any other use of
scsi_cmnd to before the spinlock is released by the interrupt handler.
Link: https://lore.kernel.org/r/20220311184359.2345319-1-djeffery@redhat.com
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The syzbot fuzzer found a use-after-free bug:
BUG: KASAN: use-after-free in dev_uevent+0x712/0x780 drivers/base/core.c:2320
Read of size 8 at addr ffff88802b934098 by task udevd/3689
CPU: 2 PID: 3689 Comm: udevd Not tainted 5.17.0-rc4-syzkaller-00229-g4f12b742eb2b #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description.constprop.0.cold+0x8d/0x303 mm/kasan/report.c:255
__kasan_report mm/kasan/report.c:442 [inline]
kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
dev_uevent+0x712/0x780 drivers/base/core.c:2320
uevent_show+0x1b8/0x380 drivers/base/core.c:2391
dev_attr_show+0x4b/0x90 drivers/base/core.c:2094
Although the bug manifested in the driver core, the real cause was a
race with the gadget core. dev_uevent() does:
if (dev->driver)
add_uevent_var(env, "DRIVER=%s", dev->driver->name);
and between the test and the dereference of dev->driver, the gadget
core sets dev->driver to NULL.
The race wouldn't occur if the gadget core registered its devices on
a real bus, using the standard synchronization techniques of the
driver core. However, it's not necessary to make such a large change
in order to fix this bug; all we need to do is make sure that
udc->dev.driver is always NULL.
In fact, there is no reason for udc->dev.driver ever to be set to
anything, let alone to the value it currently gets: the address of the
gadget's driver. After all, a gadget driver only knows how to manage
a gadget, not how to manage a UDC.
This patch simply removes the statements in the gadget core that touch
udc->dev.driver.
Fixes: 2ccea03a8f ("usb: gadget: introduce UDC Class")
CC: <stable@vger.kernel.org>
Reported-and-tested-by: syzbot+348b571beb5eeb70a582@syzkaller.appspotmail.com
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/YiQgukfFFbBnwJ/9@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The syzbot fuzzer reported a minor bug in the usbtmc driver:
usb 5-1: BOGUS control dir, pipe 80001e80 doesn't match bRequestType 0
WARNING: CPU: 0 PID: 3813 at drivers/usb/core/urb.c:412
usb_submit_urb+0x13a5/0x1970 drivers/usb/core/urb.c:410
Modules linked in:
CPU: 0 PID: 3813 Comm: syz-executor122 Not tainted
5.17.0-rc5-syzkaller-00306-g2293be58d6a1 #0
...
Call Trace:
<TASK>
usb_start_wait_urb+0x113/0x530 drivers/usb/core/message.c:58
usb_internal_control_msg drivers/usb/core/message.c:102 [inline]
usb_control_msg+0x2a5/0x4b0 drivers/usb/core/message.c:153
usbtmc_ioctl_request drivers/usb/class/usbtmc.c:1947 [inline]
The problem is that usbtmc_ioctl_request() uses usb_rcvctrlpipe() for
all of its transfers, whether they are in or out. It's easy to fix.
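As a rough sketch of the idea, the transfer direction can be derived from bit 7 of bRequestType, and the result used to choose between usb_rcvctrlpipe() and usb_sndctrlpipe(). The helper and enum names below are hypothetical and this is not the driver's code:

```c
#include <stdint.h>

/* Bit 7 of bRequestType: set for device-to-host (IN) transfers. */
#define SKETCH_USB_DIR_IN 0x80u

enum sketch_pipe_dir {
	SKETCH_PIPE_OUT = 0,	/* would map to usb_sndctrlpipe() */
	SKETCH_PIPE_IN = 1	/* would map to usb_rcvctrlpipe() */
};

/* Pick the control pipe direction that matches the request type. */
static enum sketch_pipe_dir sketch_ctrl_pipe_dir(uint8_t bRequestType)
{
	return (bRequestType & SKETCH_USB_DIR_IN) ? SKETCH_PIPE_IN
						  : SKETCH_PIPE_OUT;
}
```

With bRequestType 0, as in the warning above, this sketch selects the OUT direction, which is what the USB core expects.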
CC: <stable@vger.kernel.org>
Reported-and-tested-by: syzbot+a48e3d1a875240cab5de@syzkaller.appspotmail.com
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/YiEsYTPEE6lOCOA5@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts the omap2430 changes of
commit cf081d009c ("usb: musb: Set the DT node on the child device")
Since v5.17-rc1, musb is broken on the gta04 and openpandora devices
(omap3530/dm3730). BeagleBone Black (am335x) seems to work.
Symptoms of this bug are
a) main symptom
[ 21.336517] using random host ethernet address
[ 21.341430] using host ethernet address: 32:70:05:18:ff:78
[ 21.341461] using self ethernet address: 46:10:3a:b3:af:d9
[ 21.358184] usb0: HOST MAC 32:70:05:18:ff:78
[ 21.376678] usb0: MAC 46:10:3a:b3:af:d9
[ 21.388305] using random self ethernet address
[ 21.393371] using random host ethernet address
[ 21.398162] g_ether gadget: Ethernet Gadget, version: Memorial Day 2008
[ 21.421081] g_ether gadget: g_ether ready
[ 21.492156] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 21.691345] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 21.803192] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 21.819427] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 22.124450] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 22.168518] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 22.179382] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 23.213592] musb-hdrc musb-hdrc.1.auto: pm runtime get failed in musb_gadget_queue
[ 23.221832] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 23.227905] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 23.239440] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 23.401000] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 23.407073] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 23.426361] musb-hdrc musb-hdrc.1.auto: Could not enable: -22
[ 23.734466] musb-hdrc musb-hdrc.1.auto: pm runtime get failed in musb_gadget_queue
[ 23.742462] musb-hdrc musb-hdrc.1.auto: pm runtime get failed in musb_gadget_queue
[ 23.750396] musb-hdrc musb-hdrc.1.auto: pm runtime get failed in musb_gadget_queue
... (repeats with high frequency)
This stops if the USB cable is unplugged and restarts if it is plugged in again.
b) also found in the log
[ 6.498107] ------------[ cut here ]------------
[ 6.502960] WARNING: CPU: 0 PID: 868 at arch/arm/mach-omap2/omap_hwmod.c:1885 _enable+0x50/0x234
[ 6.512207] omap_hwmod: usb_otg_hs: enabled state can only be entered from initialized, idle, or disabled state
[ 6.522766] Modules linked in: omap2430(+) bmp280_i2c bmp280 itg3200 at24 tsc2007 leds_tca6507 bma180 hmc5843_i2c hmc5843_core industrialio_triggered_buffer lis3lv02d_i2c kfifo_buf lis3lv02d phy_twl4030_usb snd_soc_omap_mcbsp snd_soc_ti_sdma musb_hdrc snd_soc_twl4030 gnss_sirf twl4030_vibra twl4030_madc twl4030_charger twl4030_pwrbutton gnss industrialio ehci_omap omapdrm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm drm_panel_orientation_quirks cec
[ 6.566436] CPU: 0 PID: 868 Comm: udevd Not tainted 5.16.0-rc5-letux+ #8251
[ 6.573730] Hardware name: Generic OMAP36xx (Flattened Device Tree)
[ 6.580322] [<c010ed30>] (unwind_backtrace) from [<c010a1d0>] (show_stack+0x10/0x14)
[ 6.588470] [<c010a1d0>] (show_stack) from [<c0897c14>] (dump_stack_lvl+0x40/0x4c)
[ 6.596405] [<c0897c14>] (dump_stack_lvl) from [<c0130cc4>] (__warn+0xb4/0xdc)
[ 6.604003] [<c0130cc4>] (__warn) from [<c0130d5c>] (warn_slowpath_fmt+0x70/0x9c)
[ 6.611846] [<c0130d5c>] (warn_slowpath_fmt) from [<c011f4d4>] (_enable+0x50/0x234)
[ 6.619903] [<c011f4d4>] (_enable) from [<c012081c>] (omap_hwmod_enable+0x28/0x40)
[ 6.627838] [<c012081c>] (omap_hwmod_enable) from [<c0120ff4>] (omap_device_enable+0x4c/0x78)
[ 6.636779] [<c0120ff4>] (omap_device_enable) from [<c0121030>] (_od_runtime_resume+0x10/0x3c)
[ 6.645812] [<c0121030>] (_od_runtime_resume) from [<c05c688c>] (__rpm_callback+0x3c/0xf4)
[ 6.654510] [<c05c688c>] (__rpm_callback) from [<c05c6994>] (rpm_callback+0x50/0x54)
[ 6.662628] [<c05c6994>] (rpm_callback) from [<c05c66b0>] (rpm_resume+0x448/0x4e4)
[ 6.670593] [<c05c66b0>] (rpm_resume) from [<c05c6784>] (__pm_runtime_resume+0x38/0x50)
[ 6.678985] [<c05c6784>] (__pm_runtime_resume) from [<bf14ab20>] (musb_init_controller+0x350/0xa5c [musb_hdrc])
[ 6.689727] [<bf14ab20>] (musb_init_controller [musb_hdrc]) from [<c05bccb8>] (platform_probe+0x58/0xa8)
[ 6.699737] [<c05bccb8>] (platform_probe) from [<c05badf0>] (really_probe+0x170/0x2fc)
[ 6.708068] [<c05badf0>] (really_probe) from [<c05bb040>] (__driver_probe_device+0xc4/0xd8)
[ 6.716827] [<c05bb040>] (__driver_probe_device) from [<c05bb084>] (driver_probe_device+0x30/0xac)
[ 6.726226] [<c05bb084>] (driver_probe_device) from [<c05bb3d0>] (__device_attach_driver+0x94/0xb4)
[ 6.735717] [<c05bb3d0>] (__device_attach_driver) from [<c05b93f8>] (bus_for_each_drv+0xa0/0xb4)
[ 6.744934] [<c05b93f8>] (bus_for_each_drv) from [<c05bb248>] (__device_attach+0xc0/0x134)
[ 6.753631] [<c05bb248>] (__device_attach) from [<c05b9fcc>] (bus_probe_device+0x28/0x80)
[ 6.762207] [<c05b9fcc>] (bus_probe_device) from [<c05b7e40>] (device_add+0x5fc/0x788)
[ 6.770507] [<c05b7e40>] (device_add) from [<c05bd240>] (platform_device_add+0x70/0x1bc)
[ 6.779022] [<c05bd240>] (platform_device_add) from [<bf177830>] (omap2430_probe+0x260/0x2d4 [omap2430])
[ 6.789001] [<bf177830>] (omap2430_probe [omap2430]) from [<c05bccb8>] (platform_probe+0x58/0xa8)
[ 6.798309] [<c05bccb8>] (platform_probe) from [<c05badf0>] (really_probe+0x170/0x2fc)
[ 6.806610] [<c05badf0>] (really_probe) from [<c05bb040>] (__driver_probe_device+0xc4/0xd8)
[ 6.815399] [<c05bb040>] (__driver_probe_device) from [<c05bb084>] (driver_probe_device+0x30/0xac)
[ 6.824798] [<c05bb084>] (driver_probe_device) from [<c05bb4b4>] (__driver_attach+0xc4/0xd8)
[ 6.833648] [<c05bb4b4>] (__driver_attach) from [<c05b9308>] (bus_for_each_dev+0x64/0xa0)
[ 6.842224] [<c05b9308>] (bus_for_each_dev) from [<c05ba248>] (bus_add_driver+0x148/0x1a4)
[ 6.850891] [<c05ba248>] (bus_add_driver) from [<c05bbd1c>] (driver_register+0xb4/0xf8)
[ 6.859313] [<c05bbd1c>] (driver_register) from [<c0101f54>] (do_one_initcall+0x90/0x1c8)
[ 6.867889] [<c0101f54>] (do_one_initcall) from [<c0893968>] (do_init_module+0x4c/0x204)
[ 6.876373] [<c0893968>] (do_init_module) from [<c01b4c30>] (load_module+0x13f0/0x1928)
[ 6.884796] [<c01b4c30>] (load_module) from [<c01b53a0>] (sys_finit_module+0xa0/0xc0)
[ 6.893005] [<c01b53a0>] (sys_finit_module) from [<c0100080>] (ret_fast_syscall+0x0/0x54)
[ 6.901580] Exception stack(0xc2807fa8 to 0xc2807ff0)
[ 6.906890] 7fa0: b6e517d4 00052068 00000006 b6e509f8 00000000 b6e5131c
[ 6.915466] 7fc0: b6e517d4 00052068 cd718000 0000017b 00020000 00037f78 00050048 00063368
[ 6.924011] 7fe0: bed8fef0 bed8fee0 b6e4ac4b b6f55a42
[ 6.929321] ---[ end trace d715ff121b58763c ]---
c) git bisect result on testing for "musb-hdrc" in the console log:
cf081d009c is the first bad commit
commit cf081d009c
Author: Rob Herring <robh@kernel.org>
Date: Wed Dec 15 17:07:57 2021 -0600
usb: musb: Set the DT node on the child device
The musb glue drivers just copy the glue resources to the musb child device.
Instead, set the musb child device's DT node pointer to the parent device's
node so that platform_get_irq_byname() can find the resources in the DT.
This removes the need for statically populating the IRQ resources from the
DT which has been deprecated for some time.
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20211215230756.2009115-3-robh@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/usb/musb/am35x.c | 2 ++
drivers/usb/musb/da8xx.c | 2 ++
drivers/usb/musb/jz4740.c | 1 +
drivers/usb/musb/mediatek.c | 2 ++
drivers/usb/musb/omap2430.c | 1 +
drivers/usb/musb/ux500.c | 1 +
6 files changed, 9 insertions(+)
Reverting this patch makes musb work again as before.
Fixes: cf081d009c ("usb: musb: Set the DT node on the child device")
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Link: https://lore.kernel.org/r/f62f5fc11f9ecae7e57f3fd66939e051bd3b11fc.1646744166.git.hns@goldelico.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Because dma_map_single() can fail, check its result and
return an error if the mapping fails.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Revert commit 626851e922 ("nvmet: make discovery NQN configurable");
the interface was deemed incorrect and will be replaced with a different
one.
Fixes: 626851e922 ("nvmet: make discovery NQN configurable")
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This driver can have up to two regmaps. If the second one is registered
its debugfs entry will have the same name as the first one and the
following error will be printed:
[ 3.833521] debugfs: Directory 'e200413c.mdio' with parent 'regmap' already present!
Give the second regmap a name to avoid this.
Fixes: a27a762828 ("net: mdio: mscc-miim: convert to a regmap implementation")
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220312224140.4173930-1-michael@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Syzbot reported warning in usb_submit_urb() which is caused by wrong
endpoint type. There was a check for the number of endpoints, but not
for the type of endpoint.
Fix it by replacing the old desc.bNumEndpoints check with the
usb_find_common_endpoints() helper for finding endpoints.
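To illustrate why counting endpoints is not enough, here is a minimal userspace sketch that looks for an interrupt-IN endpoint by type and direction. The struct and helper names are hypothetical; in the kernel this job is done by usb_find_common_endpoints():

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down endpoint descriptor. */
struct sketch_endpoint {
	uint8_t bEndpointAddress;	/* bit 7 set: IN endpoint */
	uint8_t bmAttributes;		/* bits 1:0: transfer type */
};

#define SKETCH_DIR_IN		0x80u
#define SKETCH_XFERTYPE_MASK	0x03u
#define SKETCH_XFER_INT		0x03u

/* Return the first interrupt-IN endpoint, or NULL if there is none. */
static const struct sketch_endpoint *
sketch_find_int_in(const struct sketch_endpoint *eps, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		bool is_in = (eps[i].bEndpointAddress & SKETCH_DIR_IN) != 0;
		bool is_int = (eps[i].bmAttributes & SKETCH_XFERTYPE_MASK) ==
			      SKETCH_XFER_INT;

		if (is_in && is_int)
			return &eps[i];
	}
	return NULL;
}
```

A device that exposes one bulk endpoint passes a pure count check but yields NULL here, so probing can fail cleanly instead of submitting a mismatched URB.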
Fail log:
usb 5-1: BOGUS urb xfer, pipe 1 != type 3
WARNING: CPU: 2 PID: 48 at drivers/usb/core/urb.c:502 usb_submit_urb+0xed2/0x18a0 drivers/usb/core/urb.c:502
Modules linked in:
CPU: 2 PID: 48 Comm: kworker/2:2 Not tainted 5.17.0-rc6-syzkaller-00226-g07ebd38a0da2 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Workqueue: usb_hub_wq hub_event
...
Call Trace:
<TASK>
aiptek_open+0xd5/0x130 drivers/input/tablet/aiptek.c:830
input_open_device+0x1bb/0x320 drivers/input/input.c:629
kbd_connect+0xfe/0x160 drivers/tty/vt/keyboard.c:1593
Fixes: 8e20cf2bce ("Input: aiptek - fix crash on detecting device without endpoints")
Reported-and-tested-by: syzbot+75cccf2b7da87fb6f84b@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Link: https://lore.kernel.org/r/20220308194328.26220-1-paskripkin@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net coming late
in the 5.17-rc process:
1) Revert port remap to mitigate shadowing service ports, this is causing
problems in existing setups and this mitigation can be achieved with
explicit ruleset, eg.
... tcp sport < 16386 tcp dport >= 32768 masquerade random
This patch provided a built-in policy similar to the one described above.
2) Disable register tracking infrastructure in nf_tables. Florian reported
two issues:
- Existing expressions that store data in a register but do not
implement the .reduce interface should cancel the tracking.
- Register clobbering might be possible when storing data in registers
that are larger than 32 bits.
This might lead to generating incorrect ruleset bytecode. These two
issues are scheduled to be addressed in the next release cycle.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: disable register tracking
Revert "netfilter: conntrack: tag conntracks picked up in local out hook"
Revert "netfilter: nat: force port remap to prevent shadowing well-known ports"
====================
Link: https://lore.kernel.org/r/20220312220315.64531-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This bug resulted in only the current mode being resumed and suspended when
the PHY supported both fiber and copper modes. When the PHY supported only
copper mode, a resume and suspend of the fiber mode would incorrectly be
attempted.
Fixes: 3758be3dc1 ("Marvell phy: add functions to suspend and resume both interfaces: fiber and copper links.")
Signed-off-by: Kurt Cancemi <kurt@x64architecture.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220312201512.326047-1-kurt@x64architecture.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
blkcg_init_queue() may add rq qos structures to the request queue. Previously
blk_cleanup_queue() called rq_qos_exit() to release them, but commit
8e141f9eb8 ("block: drain file system I/O on del_gendisk")
moved rq_qos_exit() into del_gendisk(), so memory is leaked
because queues may not have a disk, such as un-present scsi luns, nvme
admin queue, ...
Fix the issue by adding rq_qos_exit() back to blk_cleanup_queue().
BTW, v5.18 won't need this patch any more since we move
blkcg_init_queue()/blkcg_exit_queue() into disk allocation/release
handler, and patches have been in for-5.18/block.
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Fixes: 8e141f9eb8 ("block: drain file system I/O on del_gendisk")
Reported-by: syzbot+b42749a851a47a0f581b@syzkaller.appspotmail.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220314043018.177141-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull virtio fix from Michael Tsirkin:
"A last minute regression fix.
I thought we did a lot of testing, but a regression still managed to
sneak in. The fix seems trivial"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost: allow batching hint without size
Commit 5f9c55c806 ("ipv6: check return value of ipv6_skip_exthdr")
introduced an incorrect check, which leads to all ESP packets over
either TCPv6 or UDPv6 encapsulation being dropped. In this particular
case, offset is negative, since skb->data points to the ESP header in
the following chain of headers, while skb->network_header points to
the IPv6 header:
IPv6 | ext | ... | ext | UDP | ESP | ...
That doesn't seem to be a problem, especially considering that if we
reach esp6_input_done2, we're guaranteed to have a full set of headers
available (otherwise the packet would have been dropped earlier in the
stack). However, it means that the return value will (intentionally)
be negative. We can make the test more specific, as the expected
return value of ipv6_skip_exthdr will be the (negated) size of either
a UDP header, or a TCP header with possible options.
In the future, we should probably either make ipv6_skip_exthdr
explicitly accept negative offsets (and adjust its return value for
error cases), or make ipv6_skip_exthdr only take non-negative
offsets (and audit all callers).
Fixes: 5f9c55c806 ("ipv6: check return value of ipv6_skip_exthdr")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Add spi_device_id tables to avoid logs like "SPI driver ksz9477-switch
has no spi_device_id".
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The generate function in struct rng_alg expects that the destination
buffer is completely filled if the function returns 0. qcom_rng_read()
can run into a situation where the buffer is partially filled with
randomness and the remaining part of the buffer is zeroed since
qcom_rng_generate() doesn't check the return value. This issue can
be reproduced by running the following from libkcapi:
kcapi-rng -b 9000000 > OUTFILE
The generated OUTFILE will have three huge sections that contain all
zeros, and this is caused by the code where the test
'val & PRNG_STATUS_DATA_AVAIL' fails.
Let's fix this issue by ensuring that qcom_rng_read() always returns
with a full buffer if the function returns success. Let's also have
qcom_rng_generate() return the correct value.
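The "full buffer or error" contract described above can be sketched in userspace as follows. The word-source callback, the retry limit, and all names are hypothetical; the real driver polls a hardware status register instead:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical source: returns 1 and stores a word when data is available. */
typedef int (*sketch_word_src)(void *ctx, uint32_t *word);

/*
 * Fill all `max` bytes of `data` or fail. Returns 0 only when the buffer is
 * completely filled, and -1 if the source stays unavailable for `max_polls`
 * consecutive polls.
 */
static int sketch_rng_fill(sketch_word_src src, void *ctx, uint8_t *data,
			   size_t max, unsigned int max_polls)
{
	size_t filled = 0;
	unsigned int idle = 0;

	while (filled < max) {
		uint32_t word;
		size_t n;

		if (!src(ctx, &word)) {
			if (++idle >= max_polls)
				return -1;
			continue;	/* retry instead of leaving a zeroed gap */
		}
		idle = 0;
		n = max - filled < sizeof(word) ? max - filled : sizeof(word);
		memcpy(data + filled, &word, n);
		filled += n;
	}
	return 0;
}

/* Demo source for illustration only: unavailable on every other poll. */
struct sketch_flaky {
	unsigned int calls;
};

static int sketch_flaky_src(void *ctx, uint32_t *word)
{
	struct sketch_flaky *s = ctx;

	if (s->calls++ % 2 == 0)
		return 0;
	*word = 0xA5A5A5A5u;
	return 1;
}
```

The key property is that a "data not available" poll never advances the fill position, so a successful return cannot hide zeroed regions.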
Here are some statistics from the ent project
(https://www.fourmilab.ch/random/) that show information about the
quality of the generated numbers:
$ ent -c qcom-random-before
Value Char Occurrences Fraction
0 606748 0.067416
1 33104 0.003678
2 33001 0.003667
...
253 ý 32883 0.003654
254 þ 33035 0.003671
255 ÿ 33239 0.003693
Total: 9000000 1.000000
Entropy = 7.811590 bits per byte.
Optimum compression would reduce the size
of this 9000000 byte file by 2 percent.
Chi square distribution for 9000000 samples is 9329962.81, and
randomly would exceed this value less than 0.01 percent of the
times.
Arithmetic mean value of data bytes is 119.3731 (127.5 = random).
Monte Carlo value for Pi is 3.197293333 (error 1.77 percent).
Serial correlation coefficient is 0.159130 (totally uncorrelated =
0.0).
Without this patch, the result of the chi-square test is less than 0.01%,
and the numbers are certainly not random according to ent's project page.
The results improve with this patch:
$ ent -c qcom-random-after
Value Char Occurrences Fraction
0 35432 0.003937
1 35127 0.003903
2 35424 0.003936
...
253 ý 35201 0.003911
254 þ 34835 0.003871
255 ÿ 35368 0.003930
Total: 9000000 1.000000
Entropy = 7.999979 bits per byte.
Optimum compression would reduce the size
of this 9000000 byte file by 0 percent.
Chi square distribution for 9000000 samples is 258.77, and randomly
would exceed this value 42.24 percent of the times.
Arithmetic mean value of data bytes is 127.5006 (127.5 = random).
Monte Carlo value for Pi is 3.141277333 (error 0.01 percent).
Serial correlation coefficient is 0.000468 (totally uncorrelated =
0.0).
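For readers unfamiliar with the statistic, the chi-square value over byte counts can be sketched as below, assuming the usual formulation (sum of squared deviations from the expected count, divided by the expected count); the function name is hypothetical. With 9000000 samples the expected count per byte value is 9000000 / 256 = 35156.25, so the 606748 zero bytes in the "before" run alone contribute (606748 - 35156.25)^2 / 35156.25, which is about 9.29 million of the reported 9329962.81.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Chi-square statistic for a histogram of byte values against a uniform
 * distribution: sum over all 256 bins of (observed - expected)^2 / expected.
 */
static double sketch_chi_square(const uint64_t counts[256])
{
	uint64_t total = 0;
	double expected, chisq = 0.0;

	for (size_t i = 0; i < 256; i++)
		total += counts[i];
	if (total == 0)
		return 0.0;

	expected = (double)total / 256.0;
	for (size_t i = 0; i < 256; i++) {
		double d = (double)counts[i] - expected;

		chisq += d * d / expected;
	}
	return chisq;
}
```

A perfectly flat histogram gives 0, and any heavily over-represented value, such as the zero bytes above, dominates the sum.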
This change was tested on a Nexus 5 phone (msm8974 SoC).
Signed-off-by: Brian Masney <bmasney@redhat.com>
Fixes: ceec5f5b59 ("crypto: qcom-rng - Add Qcom prng driver")
Cc: stable@vger.kernel.org # 4.19+
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
commit f86c3ed559 ("drm/mgag200: Split PLL setup into compute and
update functions") introduced a regression for g200wb and g200ew.
The PLLs are not set up properly, and VGA screen stays
black, or displays "out of range" message.
MGA1064_WB_PIX_PLLC_N/M/P was mistakenly replaced with
MGA1064_PIX_PLLC_N/M/P which have different addresses.
Patch tested on a Dell T310 with g200wb
Fixes: f86c3ed559 ("drm/mgag200: Split PLL setup into compute and update functions")
Cc: stable@vger.kernel.org
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20220308174321.225606-1-jfalempe@redhat.com
Pull x86 fixes from Borislav Petkov:
- Free shmem backing storage for SGX enclave pages when those are
swapped back into EPC memory
- Prevent do_int3() from being kprobed, to avoid recursion
- Remap setup_data and setup_indirect structures properly when
accessing their members
- Correct the alternatives patching order for modules too
* tag 'x86_urgent_for_v5.17_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Free backing memory after faulting the enclave page
x86/traps: Mark do_int3() NOKPROBE_SYMBOL
x86/boot: Add setup_indirect support in early_memremap_is_setup_data()
x86/boot: Fix memremap of setup_indirect structures
x86/module: Fix the paravirt vs alternative order
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix event parser error for hybrid systems
- Fix NULL check against wrong variable in 'perf bench' and in the
parsing code
- Update arm64 KVM headers from the kernel sources
- Sync cpufeatures header with the kernel sources
* tag 'perf-tools-fixes-for-v5.17-2022-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf parse: Fix event parser error for hybrid systems
perf bench: Fix NULL check against wrong variable
perf parse-events: Fix NULL check against wrong variable
tools headers cpufeatures: Sync with the kernel sources
tools kvm headers arm64: Update KVM headers from the kernel sources
Pull drm kconfig fix from Dave Airlie:
"Thorsten pointed out this had fallen down the cracks and was in -next
only, I've picked it out, fixed up its Fixes: line.
- fix regression in Kconfig"
* tag 'drm-fixes-2022-03-12' of git://anongit.freedesktop.org/drm/drm:
drm/panel: Select DRM_DP_HELPER for DRM_PANEL_EDP
The register tracking infrastructure is incomplete and might lead to
generating incorrect ruleset bytecode. Disable it for now, given that we
are late in the release process.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This bug happened on hybrid systems when both cpu_core and cpu_atom
have the same event name, such as "UOPS_RETIRED.MS", while their event
terms are different. During perf stat, parsing the event for cpu_atom
then fails and there is no output for cpu_atom.
UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/
UOPS_RETIRED.MS -> cpu_atom/period=0x1e8483,umask=0x1,event=0xc2/
This is because the event terms in the "head" of parse_events_multi_pmu_add
are changed to the event terms for cpu_core after parsing UOPS_RETIRED.MS
for cpu_core. When the same event is then parsed for cpu_atom, it still
uses the event terms for cpu_core, but the event terms for cpu_atom
differ from those of cpu_core, so parsing the event for cpu_atom fails.
This patch fixes it: the event terms should be parsed from the original
event.
This patch can work for the hybrid systems that have the same event
in more than 2 PMUs. It also can work in non-hybrid systems.
Before:
# perf stat -v -e UOPS_RETIRED.MS -a sleep 1
Using CPUID GenuineIntel-6-97-1
UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/
Control descriptor is not initialized
UOPS_RETIRED.MS: 2737845 16068518485 16068518485
Performance counter stats for 'system wide':
2,737,845 cpu_core/UOPS_RETIRED.MS/
1.002553850 seconds time elapsed
After:
# perf stat -v -e UOPS_RETIRED.MS -a sleep 1
Using CPUID GenuineIntel-6-97-1
UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/
UOPS_RETIRED.MS -> cpu_atom/period=0x1e8483,umask=0x1,event=0xc2/
Control descriptor is not initialized
UOPS_RETIRED.MS: 1977555 16076950711 16076950711
UOPS_RETIRED.MS: 568684 8038694234 8038694234
Performance counter stats for 'system wide':
1,977,555 cpu_core/UOPS_RETIRED.MS/
568,684 cpu_atom/UOPS_RETIRED.MS/
1.004758259 seconds time elapsed
Fixes: fb0811535e ("perf parse-events: Allow config on kernel PMU events")
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220307151627.30049-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
d45476d983 ("x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE")
It's just a comment fixup.
This only causes these perf files to be rebuilt:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Cc: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/YiyiHatGaJQM7l/Y@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
a5905d6af4 ("KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated")
That doesn't cause any changes in tooling (when built on x86), it only
addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
Cc: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/lkml/YiyhAK6sVPc83FaI@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When iterating over sockets using vsock_for_each_connected_socket, make
sure that a transport filters out sockets that don't belong to the
transport.
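A minimal sketch of the filtering idea follows, assuming the iterator takes the calling transport as an extra argument. All type and function names here are hypothetical simplifications, not the vsock core's definitions:

```c
#include <stddef.h>

/* Hypothetical, simplified transport and socket types. */
struct sketch_transport {
	const char *name;
};

struct sketch_sock {
	const struct sketch_transport *transport;
	int reset;
};

/* Visit connected sockets, but only those that belong to `transport`. */
static void sketch_for_each_connected(struct sketch_sock *socks, size_t count,
				      const struct sketch_transport *transport,
				      void (*fn)(struct sketch_sock *sk))
{
	for (size_t i = 0; i < count; i++) {
		if (socks[i].transport != transport)
			continue;	/* owned by another transport */
		fn(&socks[i]);
	}
}

/* Example callback: mark a socket as reset. */
static void sketch_reset(struct sketch_sock *sk)
{
	sk->reset = 1;
}
```

Putting the filter inside the iterator means each callback no longer has to remember to check ownership itself.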
There actually was an issue caused by this; in a nested VM
configuration, destroying the nested VM (which often involves the
closing of /dev/vhost-vsock if there were h2g connections to the nested
VM) kills not only the h2g connections, but also all existing g2h
connections to the (outermost) host which are totally unrelated.
Tested: Executed the following steps on Cuttlefish (Android running on a
VM) [1]: (1) Enter into an `adb shell` session - to have a g2h
connection inside the VM, (2) open and then close /dev/vhost-vsock by
`exec 3< /dev/vhost-vsock && exec 3<&-`, (3) observe that the adb
session is not reset.
[1] https://android.googlesource.com/device/google/cuttlefish/
Fixes: c0cfa2d8a7 ("vsock: add multi-transports support")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jiyong Park <jiyong@google.com>
Link: https://lore.kernel.org/r/20220311020017.1509316-1-jiyong@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
alx_reinit has a lockdep assertion that the alx->mtx mutex must be held.
alx_reinit is called from two places: alx_reset and alx_change_mtu.
alx_reset does acquire alx->mtx before calling alx_reinit.
alx_change_mtu does not acquire this mutex, nor do its callers or any
path towards alx_change_mtu.
Acquire the mutex in alx_change_mtu.
The issue was introduced when the fine-grained locking was introduced
to the code to replace the RTNL. The same commit also introduced the
lockdep assertion.
Fixes: 4a5fe57e77 ("alx: use fine-grained locking instead of RTNL")
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Link: https://lore.kernel.org/r/20220310232707.44251-1-dossche.niels@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When CONFIG_GENERIC_CPU_VULNERABILITIES is not set, references
to spectre_v2_update_state() cause a build error, so provide an
empty stub for that function when the Kconfig option is not set.
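The stub pattern described above typically looks like the sketch below. The parameter types are an assumption made for illustration, and sketch_bugs_init() is a hypothetical caller; the config symbol is deliberately left undefined so that the stub branch is the one compiled:

```c
/* CONFIG_GENERIC_CPU_VULNERABILITIES is deliberately not defined here. */
#ifdef CONFIG_GENERIC_CPU_VULNERABILITIES
/* Real implementation lives elsewhere when the option is enabled. */
void spectre_v2_update_state(unsigned int state, unsigned int methods);
#else
/* Empty stub: callers compile and link without any #ifdef of their own. */
static inline void spectre_v2_update_state(unsigned int state,
					   unsigned int methods)
{
	(void)state;
	(void)methods;
}
#endif

/* Hypothetical caller that references the function unconditionally. */
static int sketch_bugs_init(void)
{
	spectre_v2_update_state(0, 0);
	return 0;
}
```

Because the stub is static inline, the call disappears at compile time and no undefined reference reaches the linker.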
Fixes this build error:
arm-linux-gnueabi-ld: arch/arm/mm/proc-v7-bugs.o: in function `cpu_v7_bugs_init':
proc-v7-bugs.c:(.text+0x52): undefined reference to `spectre_v2_update_state'
arm-linux-gnueabi-ld: proc-v7-bugs.c:(.text+0x82): undefined reference to `spectre_v2_update_state'
Fixes: b9baf5c8c5 ("ARM: Spectre-BHB workaround")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: patches@armlinux.org.uk
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull RISC-V fixes from Palmer Dabbelt:
- prevent users from enabling the alternatives framework (and thus
errata handling) on XIP kernels, where runtime code patching does not
function correctly.
- properly detect offset overflow for AUIPC-based relocations in
modules. This may manifest as modules calling arbitrary invalid
addresses, depending on the address allocated when a module is
loaded.
* tag 'riscv-for-linus-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix auipc+jalr relocation range checks
riscv: alternative only works on !XIP_KERNEL
Pull powerpc fix from Michael Ellerman:
"Fix STACKTRACE=n build, in particular for skiroot_defconfig"
* tag 'powerpc-5.17-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Fix STACKTRACE=n build
When building for Thumb2, the vectors make use of a local label. Sadly,
the Spectre BHB code also uses a local label with the same number which
results in the Thumb2 reference pointing at the wrong place. Fix this
by changing the number used for the Spectre BHB local label.
Fixes: b9baf5c8c5 ("ARM: Spectre-BHB workaround")
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Restore (mostly) the busy polling for MMC_SEND_OP_COND
MMC host:
- meson-gx: Fix DMA usage of meson_mmc_post_req()"
* tag 'mmc-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: core: Restore (almost) the busy polling for MMC_SEND_OP_COND
mmc: meson: Fix usage of meson_mmc_post_req()
There is a limited amount of SGX memory (EPC) on each system. When that
memory is used up, SGX has its own swapping mechanism which is similar
in concept but totally separate from the core mm/* code. Instead of
swapping to disk, SGX swaps from EPC to normal RAM. That normal RAM
comes from a shared memory pseudo-file and can itself be swapped by the
core mm code. There is a hierarchy like this:
EPC <-> shmem <-> disk
After data is swapped back in from shmem to EPC, the shmem backing
storage needs to be freed. Currently, the backing shmem is not freed.
This effectively wastes the shmem while the enclave is running. The
memory is recovered when the enclave is destroyed and the backing
storage freed.
Sort this out by freeing memory with shmem_truncate_range(), as soon as
a page is faulted back to the EPC. In addition, free the memory for
PCMD pages as soon as all PCMDs in a page have been marked as unused
by zeroing its contents.
Cc: stable@vger.kernel.org
Fixes: 1728ab54b4 ("x86/sgx: Add a page reclaimer")
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20220303223859.273187-1-jarkko@kernel.org
Merge misc fixes from David Howells:
"A set of patches for watch_queue filter issues noted by Jann. I've
added in a cleanup patch from Christophe Jaillet to convert to using
formal bitmap specifiers for the note allocation bitmap.
Also two filesystem fixes (afs and cachefiles)"
* emailed patches from David Howells <dhowells@redhat.com>:
cachefiles: Fix volume coherency attribute
afs: Fix potential thrashing in afs writeback
watch_queue: Make comment about setting ->defunct more accurate
watch_queue: Fix lack of barrier/sync/lock between post and read
watch_queue: Free the alloc bitmap when the watch_queue is torn down
watch_queue: Fix the alloc bitmap size to reflect notes allocated
watch_queue: Use the bitmap API when applicable
watch_queue: Fix to always request a pow-of-2 pipe ring size
watch_queue: Fix to release page in ->release()
watch_queue, pipe: Free watchqueue state after clearing pipe ring
watch_queue: Fix filter limit check
A network filesystem may set coherency data on a volume cookie, and if
given, cachefiles will store this in an xattr on the directory in the
cache corresponding to the volume.
The function that sets the xattr just stores the contents of the volume
coherency buffer directly into the xattr, with nothing added; the
checking function, on the other hand, has a cut'n'paste error whereby it
tries to interpret the xattr contents as if they were the xattr on an
ordinary file (using the cachefiles_xattr struct). This results in a
failure to match the coherency data because the buffer ends up being
shifted by 18 bytes.
Fix this by defining a structure specifically for the volume xattr and
making both the setting and checking functions use it.
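The mismatch can be sketched in Python. The header layout below (two
big-endian 64-bit fields plus two single bytes, packed) is an assumption
chosen to match the 18-byte shift described above, and the function names
are illustrative only. The "fixed" checker is reduced to "use the same
layout as the setter"; it does not model the dedicated structure the fix
introduces.

```python
import struct

# Assumed layout of the ordinary-file xattr header: two big-endian 64-bit
# fields followed by two single bytes, packed, which adds up to 18 bytes.
FILE_XATTR_HEADER = struct.Struct(">QQBB")


def set_volume_xattr(coherency):
    """Old setter: store the coherency buffer as-is, with no header."""
    return bytes(coherency)


def check_volume_xattr_buggy(xattr, coherency):
    """Old checker: skips a file-style header that was never written."""
    return xattr[FILE_XATTR_HEADER.size:] == bytes(coherency)


def check_volume_xattr_fixed(xattr, coherency):
    """Checker that uses the same layout as the setter."""
    return xattr == bytes(coherency)
```

With any non-empty coherency buffer the buggy checker compares data that
starts 18 bytes too late, so the match always fails.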
Since the volume coherency doesn't work if used, take the opportunity to
insert a reserved field for future use, set it to 0 and check that it is
0. Log mismatch through the appropriate tracepoint.
Note that this only affects cifs; 9p, afs, ceph and nfs don't use the
volume coherency data at the moment.
Fixes: 32e150037d ("fscache, cachefiles: Store the volume coherency data")
Reported-by: Rohith Surabattula <rohiths.msft@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Steve French <smfrench@gmail.com>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In afs_writepages_region(), if the dirty page we find is undergoing
writeback or write to cache, but the sync_mode is WB_SYNC_NONE, we go
round the loop trying the same page again and again with no pausing or
waiting unless and until another thread manages to clear the writeback
and fscache flags.
Fix this with three measures:
(1) Advance start to after the page we found.
(2) Break out of the loop and return if rescheduling is requested.
(3) Arbitrarily give up after a maximum of 5 skips.
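The three measures can be sketched as a toy Python loop. This is a model
for illustration, not the afs code: `pages`, `writepages_region` and
`MAX_SKIPS` are hypothetical names, and the exact point at which the skip
counter is tested is an assumption.

```python
MAX_SKIPS = 5  # measure (3): give up after this many busy pages


def writepages_region(pages, start, need_resched=lambda: False):
    """Toy model of the fixed loop.

    `pages` maps a page index to True if that dirty page is busy
    (under writeback or being written to the cache).
    Returns (indices written, index to resume from).
    """
    written = []
    skips = 0
    for index in sorted(i for i in pages if i >= start):
        if pages[index]:
            start = index + 1          # measure (1): step past the busy page
            skips += 1
            if skips >= MAX_SKIPS or need_resched():
                break                  # measures (2) and (3)
            continue
        written.append(index)
        start = index + 1
    return written, start
```

Because start always moves past the page just examined, the same busy
page is never retried within one call.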
Fixes: 31143d5d51 ("AFS: implement basic file write support")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Acked-by: Marc Dionne <marc.dionne@auristor.com>
Link: https://lore.kernel.org/r/164692725757.2097000.2060513769492301854.stgit@warthog.procyon.org.uk/ # v1
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
watch_queue_clear() has a comment stating that setting ->defunct to true
prevents new additions as well as preventing notifications. Whilst
the latter is true, the first bit is superfluous since at the time this
function is called, the pipe cannot be accessed to add new event
sources.
Remove the "new additions" bit from the comment.
Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There's nothing to synchronise post_one_notification() versus
pipe_read(). Whilst posting is done under pipe->rd_wait.lock, the
reader only takes pipe->mutex which cannot bar notification posting as
that may need to be made from contexts that cannot sleep.
Fix this by setting pipe->head with a barrier in post_one_notification()
and reading pipe->head with a barrier in pipe_read().
If that's not sufficient, the rd_wait.lock will need to be taken,
possibly in a ->confirm() op so that it only applies to notifications.
The lock would, however, have to be dropped before copy_page_to_iter()
is invoked.
Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, watch_queue_set_size() sets the number of notes available in
wqueue->nr_notes according to the number of notes allocated, but sets
the size of the bitmap to the unrounded number of notes originally asked
for.
Fix this by setting the bitmap size to the number of notes we're
actually going to make available (ie. the number allocated).
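A small Python sketch of the mismatch, assuming 4096-byte pages and
128-byte notes (both values are assumptions for illustration, as is the
helper name):

```python
PAGE_SIZE = 4096                       # assumption
NOTE_SIZE = 128                        # assumption
NOTES_PER_PAGE = PAGE_SIZE // NOTE_SIZE


def notes_allocated(requested):
    """Notes actually backed by memory: whole pages' worth."""
    nr_pages = -(-requested // NOTES_PER_PAGE)   # ceiling division
    return nr_pages * NOTES_PER_PAGE
```

Under these assumptions a request for 33 notes allocates 64, so a bitmap
sized for 33 bits would be smaller than the number of notes advertised.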
Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use bitmap_alloc() to simplify code, improve the semantics and reduce
some open-coded arithmetic in allocator arguments.
Also change a memset(0xff) into an equivalent bitmap_fill() to keep
consistency.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The pipe ring size must always be a power of 2 as the head and tail
pointers are masked off by AND'ing with the size of the ring - 1.
watch_queue_set_size(), however, lets you specify any number of notes
between 1 and 511. This number is passed through to pipe_resize_ring()
without checking/forcing its alignment.
Fix this by rounding the number of slots required up to the nearest
power of two. The request is meant to guarantee that at least that many
notifications can be generated before the queue is full, so rounding
down isn't an option, but, alternatively, it may be better to give an
error if we aren't allowed to allocate that much ring space.
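Why masking needs a power-of-two size can be seen in a short Python
sketch. The helper names are illustrative; `roundup_pow_of_two` here is a
plain Python function written for this example.

```python
def roundup_pow_of_two(n):
    """Smallest power of two >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def slot(index, ring_size):
    """Map a free-running head/tail index to a ring slot by masking."""
    return index & (ring_size - 1)
```

With a ring size of 6 the mask is 0b101, so only slots 0, 1, 4 and 5 are
ever reached; after rounding up to 8 every slot is reachable.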
Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a pipe ring descriptor points to a notification message, the
refcount on the backing page is incremented by the generic get function,
but the release function, which marks the bitmap, doesn't drop the page
ref.
Fix this by calling generic_pipe_buf_release() at the end of
watch_queue_pipe_buf_release().
Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In free_pipe_info(), free the watchqueue state after clearing the pipe
ring as each pipe ring descriptor has a release function, and in the
case of a notification message, this is watch_queue_pipe_buf_release()
which tries to mark the allocation bitmap that was previously released.
Fix this by moving the put of the pipe's ref on the watch queue to after
the ring has been cleared. We still need to call watch_queue_clear()
before doing that to make sure that the pipe is disconnected from any
notification sources first.
Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In watch_queue_set_filter(), there are a couple of places where we check
that the filter type value does not exceed what the type_filter bitmap
can hold. One place calculates the number of bits by:
if (tf[i].type >= sizeof(wfilter->type_filter) * 8)
which is fine, but the second does:
if (tf[i].type >= sizeof(wfilter->type_filter) * BITS_PER_LONG)
which is not. This can lead to a couple of out-of-bounds writes due to
a too-large type:
(1) __set_bit() on wfilter->type_filter
(2) Writing more elements in wfilter->filters[] than we allocated.
Fix this by just using the proper WATCH_TYPE__NR instead, which is the
number of types we actually know about.
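The difference between the two limits can be worked through in Python.
The sketch assumes a 64-bit build in which type_filter occupies a single
unsigned long (8 bytes); both figures and all names below are assumptions
for illustration.

```python
BITS_PER_BYTE = 8
BITS_PER_LONG = 64            # assumption: 64-bit build
SIZEOF_TYPE_FILTER = 8        # assumption: bitmap stored in one unsigned long

# Number of bits the bitmap can really hold.
capacity_bits = SIZEOF_TYPE_FILTER * BITS_PER_BYTE

# Limits applied by the two checks quoted above.
first_limit = SIZEOF_TYPE_FILTER * 8
second_limit = SIZEOF_TYPE_FILTER * BITS_PER_LONG


def passes(filter_type, limit):
    """Mirror 'if (type >= limit) reject' from the commit message."""
    return filter_type < limit


# Types the second check lets through although they do not fit the bitmap.
escaped = [t for t in range(second_limit)
           if passes(t, second_limit) and not passes(t, capacity_bits)]
```

Under these assumptions the second check accepts types 64 to 511, none of
which fit in the bitmap.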
The bug may cause an oops looking something like:
BUG: KASAN: slab-out-of-bounds in watch_queue_set_filter+0x659/0x740
Write of size 4 at addr ffff88800d2c66bc by task watch_queue_oob/611
...
Call Trace:
<TASK>
dump_stack_lvl+0x45/0x59
print_address_description.constprop.0+0x1f/0x150
...
kasan_report.cold+0x7f/0x11b
...
watch_queue_set_filter+0x659/0x740
...
__x64_sys_ioctl+0x127/0x190
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Allocated by task 611:
kasan_save_stack+0x1e/0x40
__kasan_kmalloc+0x81/0xa0
watch_queue_set_filter+0x23a/0x740
__x64_sys_ioctl+0x127/0x190
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
The buggy address belongs to the object at ffff88800d2c66a0
which belongs to the cache kmalloc-32 of size 32
The buggy address is located 28 bytes inside of
32-byte region [ffff88800d2c66a0, ffff88800d2c66c0)
Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull drm fixes from Dave Airlie:
"As expected at this stage it's pretty quiet, one sun4i mixer fix and
one i915 display flicker fix:
i915:
- fix psr screen flicker
sun4i:
- mixer format fix"
* tag 'drm-fixes-2022-03-11' of git://anongit.freedesktop.org/drm/drm:
drm/sun4i: mixer: Fix P010 and P210 format numbers
drm/i915/psr: Set "SF Partial Frame Enable" also on full update
RISC-V can do PC-relative jumps with a 32bit range using the following
two instructions:
auipc t0, imm20 ; t0 = PC + imm20 * 2^12
jalr ra, t0, imm12 ; ra = PC + 4, PC = t0 + imm12
Crucially both the 20bit immediate imm20 and the 12bit immediate imm12
are treated as two's-complement signed values. For this reason the
immediates are usually calculated like this:
imm20 = (offset + 0x800) >> 12
imm12 = offset & 0xfff
where offset is the signed offset from the auipc instruction. When
bit 11 of offset is 0, the addition of 0x800 doesn't change the top
20 bits and imm12 is considered positive. When bit 11 is 1, the carry
from the addition of 0x800 means imm20 is one higher, but since imm12 is
then considered negative the two's complement representation means it
all cancels out nicely.
However, this addition of 0x800 (2^11) means an offset greater than or
equal to 2^31 - 2^11 would overflow, so imm20 is considered negative and
results in a backwards jump. Similarly the lower range of offset is also
moved down by 2^11 and hence the true 32bit range is
[-2^31 - 2^11, 2^31 - 2^11)
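The arithmetic above can be checked with a small Python sketch. The
helper names are illustrative and do not come from the kernel source; the
sketch only models the immediate encoding described in this message.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def split_offset(offset):
    """Split a signed byte offset into the raw (imm20, imm12) fields."""
    imm20 = ((offset + 0x800) >> 12) & 0xFFFFF
    imm12 = offset & 0xFFF
    return imm20, imm12


def jump_target_offset(imm20, imm12):
    """Offset actually reached by auipc+jalr with these immediates."""
    return sign_extend(imm20, 20) * 4096 + sign_extend(imm12, 12)


def offset_in_range(offset):
    """True if offset lies in [-2^31 - 2^11, 2^31 - 2^11)."""
    return -(1 << 31) - (1 << 11) <= offset < (1 << 31) - (1 << 11)
```

For any offset inside the range, splitting and recombining gives the
offset back. At 2^31 - 2^11 the recombined value is negative, which is
the backwards jump described above.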
Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Fixes: e2c0cdfba7 ("RISC-V: User-facing API")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Pull tracing fixes from Steven Rostedt:
"Minor tracing fixes:
- Fix unregistering the same event twice. A user could disable the
same event that osnoise will disable on unregistering.
- Inform RCU of a quiescent state in the osnoise testing thread.
- Fix some kerneldoc comments"
* tag 'trace-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix some W=1 warnings in kernel doc comments
tracing/osnoise: Force quiescent states while tracing
tracing/osnoise: Do not unregister events twice
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, and ipsec.
Current release - regressions:
- Bluetooth: fix unbalanced unlock in set_device_flags()
- Bluetooth: fix not processing all entries on cmd_sync_work, make
connect with qualcomm and intel adapters reliable
- Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0"
- xdp: xdp_mem_allocator can be NULL in trace_mem_connect()
- eth: ice: fix race condition and deadlock during interface enslave
Current release - new code bugs:
- tipc: fix incorrect order of state message data sanity check
Previous releases - regressions:
- esp: fix possible buffer overflow in ESP transformation
- dsa: unlock the rtnl_mutex when dsa_master_setup() fails
- phy: meson-gxl: fix interrupt handling in forced mode
- smsc95xx: ignore -ENODEV errors when device is unplugged
Previous releases - always broken:
- xfrm: fix tunnel mode fragmentation behavior
- esp: fix inter address family tunneling on GSO
- tipc: fix null-deref due to race when enabling bearer
- sctp: fix kernel-infoleak for SCTP sockets
- eth: macb: fix lost RX packet wakeup race in NAPI receive
- eth: intel stop disabling VFs due to PF error responses
- eth: bcmgenet: don't claim WOL when it's not available"
* tag 'net-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
xdp: xdp_mem_allocator can be NULL in trace_mem_connect().
ice: Fix race condition during interface enslave
net: phy: meson-gxl: improve link-up behavior
net: bcmgenet: Don't claim WOL when its not available
net: arc_emac: Fix use after free in arc_mdio_probe()
sctp: fix kernel-infoleak for SCTP sockets
net: phy: correct spelling error of media in documentation
net: phy: DP83822: clear MISR2 register to disable interrupts
gianfar: ethtool: Fix refcount leak in gfar_get_ts_info
selftests: pmtu.sh: Kill nettest processes launched in subshell.
selftests: pmtu.sh: Kill tcpdump processes launched by subshell.
NFC: port100: fix use-after-free in port100_send_complete
net/mlx5e: SHAMPO, reduce TIR indication
net/mlx5e: Lag, Only handle events from highest priority multipath entry
net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE
net/mlx5: Fix a race on command flush flow
net/mlx5: Fix size field in bufferx_reg struct
ax25: Fix NULL pointer dereference in ax25_kill_by_device
net: marvell: prestera: Add missing of_node_put() in prestera_switch_set_base_mac_addr
net: ethernet: lpc_eth: Handle error for clk_enable
...
Since the commit mentioned below __xdp_reg_mem_model() can return a NULL
pointer. This pointer is dereferenced in trace_mem_connect() which leads
to a segfault.
The trace points (mem_connect + mem_disconnect) were put in place to
pair connect/disconnect using the IDs. The ID is only assigned if
__xdp_reg_mem_model() does not return NULL. That connect trace point is
of no use if there is no ID.
Skip that connect trace point if xdp_alloc is NULL.
[ Toke Høiland-Jørgensen delivered the reasoning for skipping the trace
point ]
Fixes: 4a48ef70b9 ("xdp: Allow registering memory model without rxq reference")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/YikmmXsffE+QajTB@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 5dbbbd01cb ("ice: Avoid RTNL lock when re-creating
auxiliary device") changed the re-creation of the aux device
so that ice_plug_aux_dev() is called from the ice_service_task() context.
This unfortunately opens a race window that can result in a deadlock
when an interface has left a LAG and immediately enters a LAG again.
Reproducer:
```
#!/bin/bash
# bash is needed for the {1..10} brace expansion below
ip link add lag0 type bond mode 1 miimon 100
ip link set lag0
for n in {1..10}; do
	echo Cycle: $n
	ip link set ens7f0 master lag0
	sleep 1
	ip link set ens7f0 nomaster
done
```
This results in:
[20976.208697] Workqueue: ice ice_service_task [ice]
[20976.213422] Call Trace:
[20976.215871] __schedule+0x2d1/0x830
[20976.219364] schedule+0x35/0xa0
[20976.222510] schedule_preempt_disabled+0xa/0x10
[20976.227043] __mutex_lock.isra.7+0x310/0x420
[20976.235071] enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core]
[20976.251215] ib_enum_roce_netdev+0xa4/0xe0 [ib_core]
[20976.256192] ib_cache_setup_one+0x33/0xa0 [ib_core]
[20976.261079] ib_register_device+0x40d/0x580 [ib_core]
[20976.266139] irdma_ib_register_device+0x129/0x250 [irdma]
[20976.281409] irdma_probe+0x2c1/0x360 [irdma]
[20976.285691] auxiliary_bus_probe+0x45/0x70
[20976.289790] really_probe+0x1f2/0x480
[20976.298509] driver_probe_device+0x49/0xc0
[20976.302609] bus_for_each_drv+0x79/0xc0
[20976.306448] __device_attach+0xdc/0x160
[20976.310286] bus_probe_device+0x9d/0xb0
[20976.314128] device_add+0x43c/0x890
[20976.321287] __auxiliary_device_add+0x43/0x60
[20976.325644] ice_plug_aux_dev+0xb2/0x100 [ice]
[20976.330109] ice_service_task+0xd0c/0xed0 [ice]
[20976.342591] process_one_work+0x1a7/0x360
[20976.350536] worker_thread+0x30/0x390
[20976.358128] kthread+0x10a/0x120
[20976.365547] ret_from_fork+0x1f/0x40
...
[20976.438030] task:ip state:D stack: 0 pid:213658 ppid:213627 flags:0x00004084
[20976.446469] Call Trace:
[20976.448921] __schedule+0x2d1/0x830
[20976.452414] schedule+0x35/0xa0
[20976.455559] schedule_preempt_disabled+0xa/0x10
[20976.460090] __mutex_lock.isra.7+0x310/0x420
[20976.464364] device_del+0x36/0x3c0
[20976.467772] ice_unplug_aux_dev+0x1a/0x40 [ice]
[20976.472313] ice_lag_event_handler+0x2a2/0x520 [ice]
[20976.477288] notifier_call_chain+0x47/0x70
[20976.481386] __netdev_upper_dev_link+0x18b/0x280
[20976.489845] bond_enslave+0xe05/0x1790 [bonding]
[20976.494475] do_setlink+0x336/0xf50
[20976.502517] __rtnl_newlink+0x529/0x8b0
[20976.543441] rtnl_newlink+0x43/0x60
[20976.546934] rtnetlink_rcv_msg+0x2b1/0x360
[20976.559238] netlink_rcv_skb+0x4c/0x120
[20976.563079] netlink_unicast+0x196/0x230
[20976.567005] netlink_sendmsg+0x204/0x3d0
[20976.570930] sock_sendmsg+0x4c/0x50
[20976.574423] ____sys_sendmsg+0x1eb/0x250
[20976.586807] ___sys_sendmsg+0x7c/0xc0
[20976.606353] __sys_sendmsg+0x57/0xa0
[20976.609930] do_syscall_64+0x5b/0x1a0
[20976.613598] entry_SYSCALL_64_after_hwframe+0x65/0xca
1. Command 'ip link ... set nomaster' causes ice_plug_aux_dev() to
be called from ice_service_task() context; the aux device is created
and the associated device->lock is taken.
2. Command 'ip link ... set master...' calls ice's notifier under
RTNL lock and that notifier calls ice_unplug_aux_dev(). That
function tries to take the aux device->lock but this is already taken
by ice_plug_aux_dev() in step 1.
3. Later ice_plug_aux_dev() tries to take RTNL lock but this is already
taken in step 2.
4. Deadlock.
The patch fixes this issue with the following changes:
- Bit ICE_FLAG_PLUG_AUX_DEV is kept set during the ice_plug_aux_dev()
call in ice_service_task().
- The bit is checked in ice_clear_rdma_cap() and only if it is not set
is ice_unplug_aux_dev() called. If it is set (in other words
plugging of the aux device was requested and ice_plug_aux_dev() is
potentially running) then the function only clears the bit.
- Once the ice_plug_aux_dev() call (in ice_service_task) is finished
the bit ICE_FLAG_PLUG_AUX_DEV is cleared, but it is also checked
whether it was already cleared by ice_clear_rdma_cap(). If so then
the aux device is unplugged.
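The flag handshake listed above can be modelled in a few lines of Python.
This is a single-threaded toy model, not driver code: the class, its
attributes and the `during_plug` hook (used to interleave the notifier
with the plug call) are all hypothetical.

```python
class IceModel:
    """Toy model of the ICE_FLAG_PLUG_AUX_DEV handshake; not driver code."""

    def __init__(self):
        self.plug_flag = False   # stands in for ICE_FLAG_PLUG_AUX_DEV
        self.aux_plugged = False

    def request_plug(self):
        self.plug_flag = True

    def clear_rdma_cap(self):
        # Called under RTNL from the notifier.
        if self.plug_flag:
            # Plugging may be in flight: only clear the bit, take no lock.
            self.plug_flag = False
        else:
            self.aux_plugged = False   # ice_unplug_aux_dev()

    def service_task(self, during_plug=None):
        if not self.plug_flag:
            return
        self.aux_plugged = True        # ice_plug_aux_dev(), flag still set
        if during_plug:
            during_plug()              # lets a test interleave the notifier
        was_set, self.plug_flag = self.plug_flag, False
        if not was_set:
            self.aux_plugged = False   # unplug requested while plugging
```

In this model the notifier never has to wait for the plug path: it either
unplugs directly or leaves the unplug to the service task.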
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Co-developed-by: Petr Oros <poros@redhat.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
Link: https://lore.kernel.org/r/20220310171641.3863659-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some of the bcmgenet platforms don't correctly support WOL, yet
ethtool returns:
"Supports Wake-on: gsf"
which is false.
Ideally if there isn't a wol_irq, or there is something else that
keeps the device from being able to wake up, it should display:
"Supports Wake-on: d"
This patch checks whether the device can wake up before using the
hard-coded supported flags. This corrects the ethtool reporting, as
well as the WOL configuration because ethtool verifies that the mode
is supported before attempting it.
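The intended reporting can be sketched in Python. The function name is
hypothetical, and the letters follow the ethtool output quoted above.

```python
WAKE_MAGIC = "g"
WAKE_MAGICSECURE = "s"
WAKE_FILTER = "f"
HARDCODED_WOL_FLAGS = WAKE_MAGIC + WAKE_MAGICSECURE + WAKE_FILTER


def supported_wol(can_wakeup):
    """ethtool-style 'Supports Wake-on' string; 'd' means disabled."""
    return HARDCODED_WOL_FLAGS if can_wakeup else "d"
```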
Fixes: c51de7f397 ("net: bcmgenet: add Wake-on-LAN support code")
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Peter Robinson <pbrobinson@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220310045535.224450-1-jeremy.linton@arm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If bus->state is equal to MDIOBUS_ALLOCATED, mdiobus_free(bus) will free
the "bus". But bus->name is still used in the next line, which will lead
to a use after free.
We can fix it by putting the name in a local variable and making
bus->name point to the rodata section "name", then using the name in the
error message without referring to bus to avoid the UAF.
Fixes: 95b5fc03c1 ("net: arc_emac: Make use of the helper function dev_err_probe()")
Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
Link: https://lore.kernel.org/r/20220309121824.36529-1-niejianglei2021@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Saeed Mahameed says:
====================
mlx5 fixes 2022-03-09
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2022-03-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: SHAMPO, reduce TIR indication
net/mlx5e: Lag, Only handle events from highest priority multipath entry
net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE
net/mlx5: Fix a race on command flush flow
net/mlx5: Fix size field in bufferx_reg struct
====================
Link: https://lore.kernel.org/r/20220309201517.589132-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull block fix from Jens Axboe:
"Just a single fix for a regression that occurred in this merge window"
* tag 'block-5.17-2022-03-10' of git://git.kernel.dk/linux-block:
block: fix blk_mq_attempt_bio_merge and rq_qos_throttle protection
Pull staging driver fixes from Greg KH:
"Here are three small fixes for staging drivers for 5.17-rc8 or -final,
whichever comes next.
They resolve some reported problems:
- rtl8723bs wifi driver deadlock fix for reported problem that is a
revert of a previous patch. Also a documentation fix is added so
that the same problem hopefully can not come back again.
- gdm724x driver use-after-free fix for a reported problem.
All of these have been in linux-next for a while with no reported
problems"
* tag 'staging-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: rtl8723bs: Improve the comment explaining the locking rules
staging: rtl8723bs: Fix access-point mode deadlock
staging: gdm724x: fix use after free in gdm_lte_rx()
Pull ARM SoC fixes from Arnd Bergmann:
"Here is a third set of fixes for the soc tree, well within the
expected set of changes.
Maintainer list changes:
- Krzysztof Kozlowski and Jisheng Zhang both have new email addresses
- Broadcom iProc has a new git tree
Regressions:
- Robert Foss sends a revert for a Mediatek DPI bridge patch that
caused an inadvertent break in the DT binding
- mstar timers need to be included in Kconfig
Devicetree fixes for:
- Aspeed ast2600 spi pinmux
- Tegra eDP panels on Nyan FHD
- Tegra display IOMMU
- Qualcomm sm8350 UFS clocks
- minor DT changes for Marvell Armada, Qualcomm sdx65, Qualcomm
sm8450, and Broadcom BCM2711"
* tag 'soc-fixes-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
arm64: dts: marvell: armada-37xx: Remap IO space to bus address 0x0
MAINTAINERS: Update Jisheng's email address
Revert "arm64: dts: mt8183: jacuzzi: Fix bus properties in anx's DSI endpoint"
dt-bindings: drm/bridge: anx7625: Revert DPI support
ARM: dts: aspeed: Fix AST2600 quad spi group
MAINTAINERS: update Krzysztof Kozlowski's email
MAINTAINERS: Update git tree for Broadcom iProc SoCs
ARM: tegra: Move Nyan FHD panels to AUX bus
arm64: dts: armada-3720-turris-mox: Add missing ethernet0 alias
ARM: mstar: Select HAVE_ARM_ARCH_TIMER
soc: mediatek: mt8192-mmsys: Fix dither to dsi0 path's input sel
arm64: dts: mt8183: jacuzzi: Fix bus properties in anx's DSI endpoint
ARM: boot: dts: bcm2711: Fix HVS register range
arm64: dts: qcom: c630: disable crypto due to serror
arm64: dts: qcom: sm8450: fix apps_smmu interrupts
arm64: dts: qcom: sm8450: enable GCC_USB3_0_CLKREF_EN for usb
arm64: dts: qcom: sm8350: Correct UFS symbol clocks
arm64: tegra: Disable ISO SMMU for Tegra194
Revert "dt-bindings: arm: qcom: Document SDX65 platform and boards"
Instead of using GUP, make fault_in_safe_writeable() actually force a
'handle_mm_fault()' using the same fixup_user_fault() machinery that
futexes already use.
Using the GUP machinery meant that fault_in_safe_writeable() did not do
everything that a real fault would do, ranging from not auto-expanding
the stack segment, to not updating accessed or dirty flags in the page
tables (GUP sets those flags on the pages themselves).
The latter causes problems on architectures (like s390) that do accessed
bit handling in software, which meant that fault_in_safe_writeable()
didn't actually do all the fault handling it needed to, and trying to
access the user address afterwards would still cause faults.
Reported-and-tested-by: Andreas Gruenbacher <agruenba@redhat.com>
Fixes: cdd591fc86 ("iov_iter: Introduce fault_in_iov_iter_writeable")
Link: https://lore.kernel.org/all/CAHc6FU5nP+nziNGG0JAF1FUx-GV7kKFvM7aZuU_XD2_1v4vnvg@mail.gmail.com/
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The alternative mechanism needs runtime code patching, so it can't work
on XIP_KERNEL. The errata workarounds are implemented via the
alternative mechanism, so add a !XIP_KERNEL dependency for alternatives
and errata.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Fixes: 44c9225729 ("RISC-V: enable XIP")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
mvebu fixes for 5.17 (part 2)
Allow using old PCIe card on Armada 37xx
* tag 'mvebu-fixes-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu:
arm64: dts: marvell: armada-37xx: Remap IO space to bus address 0x0
Link: https://lore.kernel.org/r/87bkydj4fn.fsf@BL-laptop
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Legacy and old PCI I/O based cards do not support 32-bit I/O addressing.
Since commit 64f160e19e ("PCI: aardvark: Configure PCIe resources from
'ranges' DT property"), the kernel can set a PCIe address on the bus that
differs from the CPU address for a given A37xx address mapping, without any
firmware support, provided the bus address does not conflict with another
A37xx mapping.
So remap I/O space to the bus address 0x0 to enable support for old legacy
I/O port based cards which have hardcoded I/O ports in low address space.
Note that DDR on A37xx is mapped to bus address 0x0. And mapping of I/O
space can be set to address 0x0 too because MEM space and I/O space are
separate and so do not conflict.
Remapping IO space on Turris Mox to a different address is not possible
due to a bootloader bug.
Signed-off-by: Pali Rohár <pali@kernel.org>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 76f6386b25 ("arm64: dts: marvell: Add Aardvark PCIe support for Armada 3700")
Cc: stable@vger.kernel.org # 64f160e19e ("PCI: aardvark: Configure PCIe resources from 'ranges' DT property")
Cc: stable@vger.kernel.org # 514ef1e62d ("arm64: dts: marvell: armada-37xx: Extend PCIe MEM space")
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Commit e2ae38cf3d ("vhost: fix hung thread due to erroneous iotlb
entries") tries to reject the IOTLB message whose size is zero. But
the size is not necessarily meaningful, one example is the batching
hint, so the commit breaks that.
Fix this by rejecting a zero-size message only if the message is used to
update/invalidate the IOTLB.
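The narrowed check can be sketched in Python. The message type numbers
are assumed to match the vhost UAPI header and should be verified against
it; the function name is hypothetical.

```python
# Message type numbers as assumed from the vhost UAPI header.
VHOST_IOTLB_MISS = 1
VHOST_IOTLB_UPDATE = 2
VHOST_IOTLB_INVALIDATE = 3
VHOST_IOTLB_ACCESS_FAIL = 4
VHOST_IOTLB_BATCH_BEGIN = 5
VHOST_IOTLB_BATCH_END = 6


def reject_message(msg_type, size):
    """True if a zero-size message should be rejected under the new rule."""
    return size == 0 and msg_type in (VHOST_IOTLB_UPDATE, VHOST_IOTLB_INVALIDATE)
```

A batching hint carries no meaningful size, so it passes even when the
size field is zero.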
Fixes: e2ae38cf3d ("vhost: fix hung thread due to erroneous iotlb entries")
Reported-by: Eli Cohen <elic@nvidia.com>
Cc: Anirudh Rayabharam <mail@anirudhrb.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20220310075211.4801-1-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Eli Cohen <elic@nvidia.com>
Pull spi fix from Mark Brown:
"One fix for type conversion issues when working out maximum
scatter/gather segment sizes.
It caused problems for some systems where the limits overflow
due to the type conversion"
* tag 'spi-fix-v5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: Fix invalid sgs value
The kernel test robot discovered that building without
HARDEN_BRANCH_PREDICTOR issues a warning due to a missing
argument to pr_info().
Add the missing argument.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 9dd78194a3 ("ARM: report Spectre v2 status through sysfs")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull gpio fixes from Bartosz Golaszewski:
- fix a probe failure for Tegra241 GPIO controller in gpio-tegra186
- revert changes that caused a regression in the sysfs user-space
interface
- correct the debounce time conversion in GPIO ACPI
- statify a struct in gpio-sim and fix a typo
- update registers in correct order (hardware quirk) in gpio-ts4900
* tag 'gpio-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: sim: fix a typo
gpio: ts4900: Do not set DAT and OE together
gpio: sim: Declare gpio_sim_hog_config_item_ops static
gpiolib: acpi: Convert ACPI value of debounce to microseconds
gpio: Revert regression in sysfs-gpio (gpiolib.c)
gpio: tegra186: Add IRQ per bank for Tegra241
Just noticed this when applying Andy's patch. s/childred/children/
Fixes: cb8c474e79 ("gpio: sim: new testing module")
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
This works around an issue with the hardware where both OE and
DAT are exposed in the same register. If both are updated
simultaneously, the hardware makes no guarantees that OE or DAT
will actually change in any given order and may result in a
glitch of a few ns on a GPIO pin when changing direction and value
in a single write.
Setting direction to input now only affects OE bit. Setting
direction to output updates DAT first, then OE.
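The ordering described above can be illustrated with a small userspace
model. The register is simulated, and the bit positions and helper names
are assumptions made for this sketch, not taken from the gpio-ts4900
driver:

```c
#include <assert.h>
#include <stdint.h>

#define BIT_OE  (1u << 0)	/* hypothetical bit positions */
#define BIT_DAT (1u << 2)

/* Simulated register plus a log of the values written to it. */
struct fake_reg {
	uint8_t val;
	uint8_t log[4];
	int nwrites;
};

static void reg_write(struct fake_reg *r, uint8_t v)
{
	assert(r->nwrites < 4);
	r->val = v;
	r->log[r->nwrites++] = v;
}

/* Input: touch only OE. */
static void set_input(struct fake_reg *r)
{
	reg_write(r, (uint8_t)(r->val & ~BIT_OE));
}

/* Output: first write DAT alone, then enable OE in a second write. */
static void set_output(struct fake_reg *r, int value)
{
	uint8_t v = r->val;

	if (value)
		v |= BIT_DAT;
	else
		v &= (uint8_t)~BIT_DAT;
	reg_write(r, v);			/* DAT updated, OE unchanged */
	reg_write(r, (uint8_t)(v | BIT_OE));	/* OE updated, DAT unchanged */
}
```

Each write changes at most one of the two bits, so the model never
depends on the order in which the hardware applies OE and DAT.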
Fixes: 9c6686322d ("gpio: add Technologic I2C-FPGA gpio support")
Signed-off-by: Mark Featherston <mark@embeddedTS.com>
Signed-off-by: Kris Bahnsen <kris@embeddedTS.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register
to initialize the buffer of supp_skb to fix a kernel-info-leak issue.
1) Function pfkey_register calls compose_sadb_supported to request
a sk_buff. 2) compose_sadb_supported calls alloc_skb to allocate
a sk_buff, but it doesn't zero it. 3) If auth_len is greater than 0, then
compose_sadb_supported treats the memory as a struct sadb_supported and
begins to initialize. But it just initializes the field sadb_supported_len
and field sadb_supported_exttype without field sadb_supported_reserved.
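A userspace analogue of the leak, as a sketch only: the struct is a
simplified stand-in for struct sadb_supported, stale heap contents are
simulated with a 0xAA pattern, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct sadb_supported. */
struct supported_hdr {
	uint16_t len;
	uint16_t exttype;
	uint32_t reserved;
};

/*
 * Allocate a header. Stale heap contents are simulated with a 0xAA
 * pattern; with zero != 0 the buffer is cleared first, which is the
 * userspace analogue of passing __GFP_ZERO.
 */
static struct supported_hdr *hdr_alloc(int zero)
{
	struct supported_hdr *h = malloc(sizeof(*h));

	if (!h)
		return NULL;
	memset(h, zero ? 0 : 0xAA, sizeof(*h));
	return h;
}

/* Like the code described above: 'reserved' is never written. */
static void hdr_fill(struct supported_hdr *h, uint16_t len, uint16_t exttype)
{
	assert(h != NULL);
	h->len = len;
	h->exttype = exttype;
}
```

Without zeroing, the untouched 'reserved' field still holds the old
pattern when the buffer is handed out; with zeroing it reads as 0.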
Reported-by: TCS Robot <tcs_robot@tencent.com>
Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Pull clk fixes from Stephen Boyd:
"One more small batch of clk driver fixes:
- A fix for the Qualcomm GDSC power domain delays that avoids black
screens at boot on some more recent SoCs that use a different delay
than the hard-coded delays in the driver.
- A build fix for the LAN966X clk driver that lets it be built on
architectures that don't have IOMEM"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: lan966x: Fix linking error
clk: qcom: dispcc: Update the transition delay for MDSS GDSC
clk: qcom: gdsc: Add support to update GDSC transition delay
Pull xen fixes from Juergen Gross:
"Several Linux PV device frontends are using the grant table interfaces
for removing access rights of the backends in ways being subject to
race conditions, resulting in potential data leaks, data corruption by
malicious backends, and denial of service triggered by malicious
backends:
- blkfront, netfront, scsifront and the gntalloc driver are testing
whether a grant reference is still in use. If this is not the case,
they assume that a following removal of the granted access will
always succeed, which is not true in case the backend has mapped
the granted page between those two operations.
As a result the backend can keep access to the memory page of the
guest no matter how the page will be used after the frontend I/O
has finished. The xenbus driver has a similar problem, as it
doesn't check the success of removing the granted access of a
shared ring buffer.
- blkfront, netfront, scsifront, usbfront, dmabuf, xenbus, 9p,
kbdfront, and pvcalls are using a functionality to delay freeing a
grant reference until it is no longer in use, but the freeing of
the related data page is not synchronized with dropping the granted
access.
As a result the backend can keep access to the memory page even
after it has been freed and then re-used for a different purpose.
- netfront will fail a BUG_ON() assertion if it fails to revoke
access in the rx path.
This will result in a Denial of Service (DoS) situation of the
guest which can be triggered by the backend"
* tag 'xsa396-5.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/netfront: react properly to failing gnttab_end_foreign_access_ref()
xen/gnttab: fix gnttab_end_foreign_access() without page specified
xen/pvcalls: use alloc/free_pages_exact()
xen/9p: use alloc/free_pages_exact()
xen/usb: don't use gnttab_end_foreign_access() in xenhcd_gnttab_done()
xen: remove gnttab_query_foreign_access()
xen/gntalloc: don't use gnttab_query_foreign_access()
xen/scsifront: don't use gnttab_query_foreign_access() for mapped status
xen/netfront: don't use gnttab_query_foreign_access() for mapped status
xen/blkfront: don't use gnttab_query_foreign_access() for mapped status
xen/grant-table: add gnttab_try_end_foreign_access()
xen/xenbus: don't let xenbus_grant_ring() remove grants in error case
Guillaume Nault says:
====================
selftests: pmtu.sh: Fix cleanup of processes launched in subshell.
Depending on the options used, pmtu.sh may launch tcpdump and nettest
processes in the background. However it fails to clean them up after
the tests complete.
Patch 1 allows the cleanup() function to read the list of PIDs launched
by the tests.
Patch 2 fixes the way the nettest PIDs are retrieved.
====================
Link: https://lore.kernel.org/r/cover.1646776561.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When using "run_cmd <command> &", then "$!" refers to the PID of the
subshell used to run <command>, not the command itself. Therefore
nettest_pids actually doesn't contain the list of the nettest commands
running in the background. So cleanup() can't kill them and the nettest
processes run until completion (fortunately they have a 5s timeout).
Fix this by defining a new command for running processes in the
background, for which "$!" really refers to the PID of the command run.
Also, double quote variables on the modified lines, to avoid shellcheck
warnings.
Fixes: ece1278a9b ("selftests: net: add ESP-in-UDP PMTU test")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The cleanup() function takes care of killing processes launched by the
test functions. It relies on variables like ${tcpdump_pids} to get the
relevant PIDs. But tests are run in their own subshell, so updated
*_pids values are invisible to other shells. Therefore cleanup() never
sees any process to kill:
$ ./tools/testing/selftests/net/pmtu.sh -t pmtu_ipv4_exception
TEST: ipv4: PMTU exceptions [ OK ]
TEST: ipv4: PMTU exceptions - nexthop objects [ OK ]
$ pgrep -af tcpdump
6084 tcpdump -s 0 -i veth_A-R1 -w pmtu_ipv4_exception_veth_A-R1.pcap
6085 tcpdump -s 0 -i veth_R1-A -w pmtu_ipv4_exception_veth_R1-A.pcap
6086 tcpdump -s 0 -i veth_R1-B -w pmtu_ipv4_exception_veth_R1-B.pcap
6087 tcpdump -s 0 -i veth_B-R1 -w pmtu_ipv4_exception_veth_B-R1.pcap
6088 tcpdump -s 0 -i veth_A-R2 -w pmtu_ipv4_exception_veth_A-R2.pcap
6089 tcpdump -s 0 -i veth_R2-A -w pmtu_ipv4_exception_veth_R2-A.pcap
6090 tcpdump -s 0 -i veth_R2-B -w pmtu_ipv4_exception_veth_R2-B.pcap
6091 tcpdump -s 0 -i veth_B-R2 -w pmtu_ipv4_exception_veth_B-R2.pcap
6228 tcpdump -s 0 -i veth_A-R1 -w pmtu_ipv4_exception_veth_A-R1.pcap
6229 tcpdump -s 0 -i veth_R1-A -w pmtu_ipv4_exception_veth_R1-A.pcap
6230 tcpdump -s 0 -i veth_R1-B -w pmtu_ipv4_exception_veth_R1-B.pcap
6231 tcpdump -s 0 -i veth_B-R1 -w pmtu_ipv4_exception_veth_B-R1.pcap
6232 tcpdump -s 0 -i veth_A-R2 -w pmtu_ipv4_exception_veth_A-R2.pcap
6233 tcpdump -s 0 -i veth_R2-A -w pmtu_ipv4_exception_veth_R2-A.pcap
6234 tcpdump -s 0 -i veth_R2-B -w pmtu_ipv4_exception_veth_R2-B.pcap
6235 tcpdump -s 0 -i veth_B-R2 -w pmtu_ipv4_exception_veth_B-R2.pcap
Fix this by running cleanup() in the context of the test subshell.
Now that each test cleans the environment after completion, there's no
need for calling cleanup() again when the next test starts. So let's
drop it from the setup() function. This is okay because cleanup() is
also called when pmtu.sh starts, so even the first test starts in a
clean environment.
Also, use tcpdump's immediate mode. Otherwise it might not have time to
process buffered packets, resulting in missing packets or even empty
pcap files for short tests.
Note: PAUSE_ON_FAIL is still evaluated before cleanup(), so one can
still inspect the test environment upon failure when using -p.
Fixes: a92a0a7b8e ("selftests: pmtu: Simplify cleanup and namespace names")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull arm64 build fix from Catalin Marinas:
"Fix kernel build with clang LTO after the inclusion of the Spectre BHB
arm64 mitigations"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Do not include __READ_ONCE() block in assembly files
ld.lld does not support the NOCROSSREFS directive at the moment, which
breaks the build after commit b9baf5c8c5 ("ARM: Spectre-BHB
workaround"):
ld.lld: error: ./arch/arm/kernel/vmlinux.lds:34: AT expected, but got NOCROSSREFS
Support for this directive will eventually be implemented, at which
point a version check can be added. To avoid breaking the build in the
meantime, just define NOCROSSREFS to nothing when using ld.lld, with a
link to the issue for tracking.
Cc: stable@vger.kernel.org
Fixes: b9baf5c8c5 ("ARM: Spectre-BHB workaround")
Link: https://github.com/ClangBuiltLinux/linux/issues/1609
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When building arm64 defconfig + CONFIG_LTO_CLANG_{FULL,THIN}=y after
commit 558c303c97 ("arm64: Mitigate spectre style branch history side
channels"), the following error occurs:
<instantiation>:4:2: error: invalid fixup for movz/movk instruction
mov w0, #ARM_SMCCC_ARCH_WORKAROUND_3
^
Marc figured out that moving "#include <linux/init.h>" in
include/linux/arm-smccc.h into a !__ASSEMBLY__ block resolves it. The
full include chain with CONFIG_LTO=y from include/linux/arm-smccc.h:
include/linux/init.h
include/linux/compiler.h
arch/arm64/include/asm/rwonce.h
arch/arm64/include/asm/alternative-macros.h
arch/arm64/include/asm/assembler.h
The asm/alternative-macros.h include in asm/rwonce.h only happens when
CONFIG_LTO is set, which ultimately causes asm/assembler.h to be
included before the definition of ARM_SMCCC_ARCH_WORKAROUND_3. As a
result, the preprocessor does not expand ARM_SMCCC_ARCH_WORKAROUND_3 in
__mitigate_spectre_bhb_fw, which results in the error above.
Avoid this problem by just avoiding the CONFIG_LTO=y __READ_ONCE() block
in asm/rwonce.h with assembly files, as nothing in that block is useful
to assembly files, which allows ARM_SMCCC_ARCH_WORKAROUND_3 to be
properly expanded with CONFIG_LTO=y builds.
Fixes: e35123d83e ("arm64: lto: Strengthen READ_ONCE() to acquire when CONFIG_LTO=y")
Cc: <stable@vger.kernel.org> # 5.11.x
Link: https://lore.kernel.org/r/20220309155716.3988480-1-maz@kernel.org/
Reported-by: Marc Zyngier <maz@kernel.org>
Acked-by: James Morse <james.morse@arm.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20220309191633.2307110-1-nathan@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull HID fixes from Jiri Kosina:
- sysfs attributes leak fix for Google Vivaldi driver (Dmitry Torokhov)
- fix for potential out-of-bounds read in Thrustmaster driver (Pavel
Skripkin)
- error handling reference leak in Elo driver (Jiri Kosina)
- a few new device IDs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: nintendo: check the return value of alloc_workqueue()
HID: vivaldi: fix sysfs attributes leak
HID: hid-thrustmaster: fix OOB read in thrustmaster_interrupts
HID: elo: Revert USB reference counting
HID: Add support for open wheel and no attachment to T300
HID: logitech-dj: add new lightspeed receiver id
Pull arm64 fixes from Catalin Marinas:
- Fix compilation of eBPF object files that indirectly include
mte-kasan.h.
- Fix test for execute-only permissions with EPAN (Enhanced Privileged
Access Never, ARMv8.7 feature).
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kasan: fix include error in MTE functions
arm64: Ensure execute-only permissions are not allowed without EPAN
In the recent Spectre BHB patches, there was a typo that is only
exposed in certain configurations: mcr p15,0,XX,c7,r5,4 should have
been mcr p15,0,XX,c7,c5,4
Reported-by: kernel test robot <lkp@intel.com>
Fixes: b9baf5c8c5 ("ARM: Spectre-BHB workaround")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
SHAMPO is an RQ / WQ feature. An indication was originally added to the
TIR to enforce suitability between the connected TIR and RQ, but this
enforcement does not exist in the current Firmware implementation and was
redundant in the first place.
Fixes: 83439f3c37 ("net/mlx5e: Add HW-GRO offload")
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
There could be multiple multipath entries but changing the port affinity
for each one doesn't make much sense and there should be a default one.
So only track the entry with lowest priority value.
The commit doesn't affect existing users with a single entry.
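The selection rule can be sketched as below. The struct is a
hypothetical, reduced view of a multipath entry and the function name is
invented for illustration; the mlx5 code tracks FIB entries differently
in detail:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of a multipath route entry. */
struct mp_entry {
	uint32_t priority;	/* lower value means preferred */
	int port;
};

/* Return the entry with the lowest priority value, or NULL if n == 0. */
static const struct mp_entry *mp_pick_tracked(const struct mp_entry *e, size_t n)
{
	const struct mp_entry *best = NULL;
	size_t i;

	assert(e != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (!best || e[i].priority < best->priority)
			best = &e[i];
	return best;
}
```

With a single entry the function returns that entry, which matches the
statement that existing single-entry users are unaffected.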
Fixes: 544fe7c2e6 ("net/mlx5e: Activate HW multipath and handle port affinity based on FIB events")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Only prio 1 is supported for nic mode when there is no ignore flow level
support in firmware. But for switchdev mode, which supports fixed number
of statically pre-allocated prios, this restriction is not relevant so
it can be relaxed.
Fixes: d671e109bd ("net/mlx5: Fix tc max supported prio for nic mode")
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Fix a refcount use after free warning due to a race on command entry.
Such race occurs when one of the commands releases its last refcount and
frees its index and entry while another process running command flush
flow takes refcount to this command entry. The process which handles
commands flush may see this command as needed to be flushed if the other
process released its refcount but didn't release the index yet. Fix it
by adding the needed spin lock.
It fixes the following warning trace:
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 11 PID: 540311 at lib/refcount.c:25 refcount_warn_saturate+0x80/0xe0
...
RIP: 0010:refcount_warn_saturate+0x80/0xe0
...
Call Trace:
<TASK>
mlx5_cmd_trigger_completions+0x293/0x340 [mlx5_core]
mlx5_cmd_flush+0x3a/0xf0 [mlx5_core]
enter_error_state+0x44/0x80 [mlx5_core]
mlx5_fw_fatal_reporter_err_work+0x37/0xe0 [mlx5_core]
process_one_work+0x1be/0x390
worker_thread+0x4d/0x3d0
? rescuer_thread+0x350/0x350
kthread+0x141/0x160
? set_kthread_struct+0x40/0x40
ret_from_fork+0x1f/0x30
</TASK>
Fixes: 50b2412b7e ("net/mlx5: Avoid possible free of command entry while timeout comp handler")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
According to HW spec the field "size" should be 16 bits
in bufferx register.
Fixes: e281682bf2 ("net/mlx5_core: HW data structs/types definitions cleanup")
Signed-off-by: Mohammad Kabat <mohammadkab@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
At the moment running osnoise on a nohz_full CPU or uncontested FIFO
priority and a PREEMPT_RCU kernel might have the side effect of
extending grace periods too much. This will entice RCU to force a
context switch on the wayward CPU to end the grace period, all while
introducing unwarranted noise into the tracer. This behaviour is
unavoidable as overly extending grace periods might exhaust the system's
memory.
This exact problem is what extended quiescent states (EQS) were
created for, and rcu_momentary_dyntick_idle() emulates them by
performing a zero duration EQS. So let's make use of it.
In the common case rcu_momentary_dyntick_idle() is fairly inexpensive:
atomically incrementing a local per-CPU counter and doing a store. So it
shouldn't affect osnoise's measurements (which have a 1us granularity),
so we'll call it unconditionally.
The uncommon case involves calling rcu_momentary_dyntick_idle() after
having the osnoise process:
- Receive an expedited quiescent state IPI with preemption disabled or
during an RCU critical section. (activates rdp->cpu_no_qs.b.exp
code-path).
- Being preempted within an RCU critical section and having the
subsequent outermost rcu_read_unlock() called with interrupts
disabled. (t->rcu_read_unlock_special.b.blocked code-path).
Neither of those are possible at the moment, and are unlikely to be in
the future given the osnoise's loop design. On top of this, the noise
generated by the situations described above is unavoidable, and if not
exposed by rcu_momentary_dyntick_idle() will be eventually seen in
subsequent rcu_read_unlock() calls or schedule operations.
Link: https://lkml.kernel.org/r/20220307180740.577607-1-nsaenzju@redhat.com
Cc: stable@vger.kernel.org
Fixes: bce29ac9ce ("trace: Add osnoise tracer")
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Nicolas reported that using:
# trace-cmd record -e all -M 10 -p osnoise --poll
Resulted in the following kernel warning:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1217 at kernel/tracepoint.c:404 tracepoint_probe_unregister+0x280/0x370
[...]
CPU: 0 PID: 1217 Comm: trace-cmd Not tainted 5.17.0-rc6-next-20220307-nico+ #19
RIP: 0010:tracepoint_probe_unregister+0x280/0x370
[...]
CR2: 00007ff919b29497 CR3: 0000000109da4005 CR4: 0000000000170ef0
Call Trace:
<TASK>
osnoise_workload_stop+0x36/0x90
tracing_set_tracer+0x108/0x260
tracing_set_trace_write+0x94/0xd0
? __check_object_size.part.0+0x10a/0x150
? selinux_file_permission+0x104/0x150
vfs_write+0xb5/0x290
ksys_write+0x5f/0xe0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7ff919a18127
[...]
---[ end trace 0000000000000000 ]---
The warning complains about an attempt to unregister an
unregistered tracepoint.
This happens on trace-cmd because it first stops tracing, and
then switches the tracer to nop. Which is equivalent to:
# cd /sys/kernel/tracing/
# echo osnoise > current_tracer
# echo 0 > tracing_on
# echo nop > current_tracer
The osnoise tracer stops the workload when no trace instance
is actually collecting data. This can be caused either by
disabling tracing or by disabling the tracer itself.
To avoid unregistering events twice, use the existing
trace_osnoise_callback_enabled variable to check if the events
(and the workload) are actually active before trying to
deactivate them.
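The guard can be modelled in userspace as follows. This is a sketch
under stated assumptions: callbacks_enabled plays the role of
trace_osnoise_callback_enabled, the counter stands in for the registered
tracepoint probes, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

static bool callbacks_enabled;	/* role of trace_osnoise_callback_enabled */
static int registered;		/* number of registered (hypothetical) hooks */

static void workload_start(void)
{
	if (callbacks_enabled)
		return;
	registered++;
	callbacks_enabled = true;
}

static void workload_stop(void)
{
	/* Second and later calls find the flag cleared and do nothing. */
	if (!callbacks_enabled)
		return;
	callbacks_enabled = false;
	registered--;
	assert(registered >= 0);
}
```

Calling workload_stop() twice, as happens when tracing is disabled and
the tracer is then switched to nop, unregisters only once.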
Link: https://lore.kernel.org/all/c898d1911f7f9303b7e14726e7cc9678fbfb4a0e.camel@redhat.com/
Link: https://lkml.kernel.org/r/938765e17d5a781c2df429a98f0b2e7cc317b022.1646823913.git.bristot@kernel.org
Cc: stable@vger.kernel.org
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Fixes: 2fac8d6486 ("tracing/osnoise: Allow multiple instances of the same tracer")
Reported-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steffen Klassert says:
====================
pull request (net): ipsec 2022-03-09
1) Fix IPv6 PMTU discovery for xfrm interfaces.
From Lina Wang.
2) Revert failing for policies and states that are
configured with XFRMA_IF_ID 0. It broke a
user configuration. From Kai Lueke.
3) Fix a possible buffer overflow in the ESP output path.
4) Fix ESP GSO for tunnel and BEET mode on inter address
family tunnels.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When two ax25 devices attempt to establish a connection, the requester uses ax25_create(),
ax25_bind() and ax25_connect() to initiate the connection. The receiver uses ax25_rcv() to
accept the connection and calls ax25_create_cb() in ax25_rcv() to create an ax25_cb, but the
ax25_cb->sk is NULL. When the receiver is detaching, a NULL pointer dereference
caused by sock_hold(sk) in ax25_kill_by_device() will happen. The corresponding
fail log is shown below:
===============================================================
BUG: KASAN: null-ptr-deref in ax25_device_event+0xfd/0x290
Call Trace:
...
ax25_device_event+0xfd/0x290
raw_notifier_call_chain+0x5e/0x70
dev_close_many+0x174/0x220
unregister_netdevice_many+0x1f7/0xa60
unregister_netdevice_queue+0x12f/0x170
unregister_netdev+0x13/0x20
mkiss_close+0xcd/0x140
tty_ldisc_release+0xc0/0x220
tty_release_struct+0x17/0xa0
tty_release+0x62d/0x670
...
This patch adds a condition check in ax25_kill_by_device(). If s->sk is
NULL, it takes the if branch to kill the device.
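The shape of the check, as a userspace sketch: the structs and the
function are simplified, hypothetical stand-ins, and the reference count
updates only mimic sock_hold()/sock_put():

```c
#include <assert.h>
#include <stddef.h>

struct fake_sock {
	int refcnt;
};

struct fake_cb {
	struct fake_sock *sk;	/* NULL for control blocks created on receive */
	int disconnected;
};

/* Tear down one control block; only touch the socket if there is one. */
static void kill_one(struct fake_cb *cb)
{
	assert(cb != NULL);

	if (!cb->sk) {
		cb->disconnected = 1;	/* no socket: just disconnect */
		return;
	}
	cb->sk->refcnt++;		/* analogue of sock_hold() */
	cb->disconnected = 1;
	cb->sk->refcnt--;		/* analogue of sock_put() */
}
```

A control block without a socket is disconnected without ever
dereferencing the NULL pointer.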
Fixes: 4e0f718daf ("ax25: improve the incomplete fix to avoid UAF and NPD bugs")
Reported-by: Thomas Osterried <thomas@osterried.de>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
This node pointer is returned by of_find_compatible_node() with its
refcount incremented. Call of_node_put() to avoid the refcount leak.
Fixes: 501ef3066c ("net: marvell: prestera: Add driver for Prestera family ASIC devices")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because clk_enable() can fail,
its return value should be checked and the error returned
on failure.
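The pattern is sketched below in userspace C. fake_clk_enable() is a
hypothetical stand-in that follows the usual convention of returning 0 on
success and a negative errno on failure; fake_open() is likewise invented
for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for clk_enable(): 0 on success, negative errno on failure. */
static int fake_clk_enable(int should_fail)
{
	return should_fail ? -EIO : 0;
}

static int fake_open(int clk_broken, int *hw_started)
{
	int ret;

	assert(hw_started != NULL);

	ret = fake_clk_enable(clk_broken);
	if (ret)
		return ret;	/* propagate instead of touching unclocked hardware */

	*hw_started = 1;
	return 0;
}
```

The caller sees the original error code and the hardware setup is
skipped when the clock could not be enabled.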
Fixes: b7370112f5 ("lpc32xx: Added ethernet driver")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not recommended to use platform_get_resource(pdev, IORESOURCE_IRQ)
for requesting IRQ resources any more, as they may not be ready yet in
case of DT-booting.
platform_get_irq() is instead the recommended way of getting an IRQ even if
it was not retrieved earlier.
It also makes code simpler because we're getting "int" value right away
and no conversion from resource to int is required.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi (CGEL ZTE) <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because clk_enable() can fail,
its return value should be checked and the error returned
on failure.
Fixes: 8a2c9a5ab4 ("net: ethernet: ti: cpts: rework initialization/deinitialization")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
As documented, the setup_indirect structure is nested inside
the setup_data structures in the setup_data list. The code currently
accesses the fields inside the setup_indirect structure but only
the sizeof(struct setup_data) is being memremapped. No crash
occurred but this is just due to how the area is remapped under the
covers.
Properly memremap both the setup_data and setup_indirect structures
in these cases before accessing them.
Fixes: b3c72fc9a7 ("x86/boot: Introduce setup_indirect")
Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/1645668456-22036-2-git-send-email-ross.philipson@oracle.com
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-03-08
This series contains updates to iavf, i40e, and ice drivers.
Michal ensures netdev features are properly updated to reflect VLAN
changes received from PF and adds an additional flag for MSI-X
reinitialization as further differentiation of reinitialization
operations is needed for iavf.
Jake stops disabling of VFs due to failed virtchannel responses for
i40e and ice driver.
Dave moves MTU event notification to the service task to prevent issues
with RTNL lock for ice.
Christophe Jaillet corrects an allocation to GFP_ATOMIC instead of
GFP_KERNEL for ice.
Jedrzej fixes the value for link speed comparison which was preventing
the requested value from being set for ice.
---
Note: This will conflict when merging with net-next. Resolution:
diff --cc drivers/net/ethernet/intel/ice/ice.h
index dc42ff92dbad,3121f9b04f59..000000000000
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@@ -484,10 -481,9 +484,11 @@@ enum ice_pf_flags
ICE_FLAG_LEGACY_RX,
ICE_FLAG_VF_TRUE_PROMISC_ENA,
ICE_FLAG_MDD_AUTO_RESET_VF,
+ ICE_FLAG_VF_VLAN_PRUNING,
ICE_FLAG_LINK_LENIENT_MODE_ENA,
ICE_FLAG_PLUG_AUX_DEV,
+ ICE_FLAG_MTU_CHANGED,
+ ICE_FLAG_GNSS, /* GNSS successfully initialized */
ICE_PF_FLAGS_NBITS /* must be last */
};
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When receiving a state message, function tipc_link_validate_msg()
is called to validate its header portion. Then, its data portion
is validated before it can be accessed correctly. However, the current
data sanity check is done after the message header is accessed to
update some link variables.
This commit fixes this issue by moving the data sanity check to
the beginning of state message handling and right after the header
sanity check.
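The ordering can be illustrated with the following sketch. The structs,
field names and length rules are assumptions made up for this example
and do not reflect the TIPC message layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_state_msg {
	size_t buf_len;		/* bytes actually received */
	size_t hdr_len;		/* claimed header length */
	size_t data_len;	/* claimed payload length */
	unsigned int ack;	/* header field used to update the link */
};

struct fake_link {
	unsigned int last_ack;
};

/* Validate header, then data, and only then update link state. */
static bool handle_state_msg(struct fake_link *l, const struct fake_state_msg *m)
{
	assert(l != NULL && m != NULL);

	if (m->hdr_len > m->buf_len)
		return false;
	if (m->data_len > m->buf_len - m->hdr_len)
		return false;

	l->last_ack = m->ack;
	return true;
}
```

A message with an inconsistent data length is dropped before any link
variable is modified.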
Fixes: 9aa422ad32 ("tipc: improve size validations for received domain records")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/20220308021200.9245-1-tung.q.nguyen@dektech.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 9d497e2941 ("block: don't protect submit_bio_checks by
q_usage_counter") moved blk_mq_attempt_bio_merge and rq_qos_throttle
calls out of q_usage_counter protection. However, these functions require
q_usage_counter protection. The blk_mq_attempt_bio_merge call without
the protection resulted in blktests block/005 failure with KASAN null-
ptr-deref or use-after-free at bio merge. The rq_qos_throttle call
without the protection caused kernel hang at qos throttle.
To fix the failures, move the blk_mq_attempt_bio_merge and
rq_qos_throttle calls back to q_usage_counter protection.
Fixes: 9d497e2941 ("block: don't protect submit_bio_checks by q_usage_counter")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20220308080915.3473689-1-shinichiro.kawasaki@wdc.com
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Change the curr_link_speed advertised speed, because
link_info.link_speed is not equal to phy.curr_user_speed_req.
Without this patch it is impossible to set the advertised
speed to the same value as link_speed.
Testing Hints: Try to set advertised speed
to 25G only with 25G default link (use ethtool -s 0x80000000)
Fixes: 48cb27f2fd ("ice: Implement handlers for ethtool PHY/link operations")
Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When a bonded interface is destroyed, .ndo_change_mtu can be called
during the tear-down process while the RTNL lock is held. This is a
problem since the auxiliary driver linked to the LAN driver needs to be
notified of the MTU change, and this requires grabbing a device_lock on
the auxiliary_device's dev. Currently this is being attempted in the
same execution context as the call to .ndo_change_mtu which is causing a
dead-lock.
Move the notification of the changed MTU to a separate execution context
(watchdog service task) and eliminate the "before" notification.
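The deferral can be modelled as below. This is a userspace sketch: the
mtu_changed field plays the role of the ICE_FLAG_MTU_CHANGED bit, while
the struct and function names are hypothetical and no real locking is
performed:

```c
#include <assert.h>
#include <stdbool.h>

struct fake_pf {
	bool mtu_changed;	/* plays the role of ICE_FLAG_MTU_CHANGED */
	int mtu;
	int notifications;	/* events delivered to the auxiliary driver */
};

/* Called with the (simulated) RTNL lock held: only record the change. */
static void change_mtu(struct fake_pf *pf, int new_mtu)
{
	assert(new_mtu > 0);
	pf->mtu = new_mtu;
	pf->mtu_changed = true;
}

/* Runs later in its own context, where taking the device lock is safe. */
static void service_task(struct fake_pf *pf)
{
	if (!pf->mtu_changed)
		return;
	pf->mtu_changed = false;
	pf->notifications++;
}
```

The MTU handler itself never notifies anyone; exactly one notification
is sent the next time the service task runs.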
Fixes: 348048e724 ("ice: Implement iidc operations")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Jonathan Toppins <jtoppins@redhat.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_vc_send_msg_to_vf function has logic to detect "failure"
responses being sent to a VF. If a VF is sent more than
ICE_DFLT_NUM_INVAL_MSGS_ALLOWED then the VF is marked as disabled.
Almost identical logic also existed in the i40e driver.
This logic was added to the ice driver in commit 1071a8358a ("ice:
Implement virtchnl commands for AVF support") which itself copied from
the i40e implementation in commit 5c3c48ac6b ("i40e: implement virtual
device interface").
Neither commit provides a proper explanation or justification of the
check. In fact, later commits to i40e changed the logic to allow
bypassing the check in some specific instances.
The "logic" for this seems to be that error responses somehow indicate a
malicious VF. This is not really true. The PF might be sending an error
for any number of reasons such as lack of resources, etc.
Additionally, this causes the PF to log an info message for every failed
VF response which may confuse users, and can spam the kernel log.
This behavior is not documented as part of any requirement for our
products and other operating system drivers such as the FreeBSD
implementation of our drivers do not include this type of check.
In fact, the change from dev_err to dev_info in i40e commit 18b7af57d9
("i40e: Lower some message levels") explains that these messages
typically don't actually indicate a real issue. It is quite likely that
a user who hits this in practice will be very confused as the VF will be
disabled without an obvious way to recover.
We already have robust malicious driver detection logic using actual
hardware detection mechanisms that detect and prevent invalid device
usage. Remove the logic since it's not a documented requirement and the
behavior is not intuitive.
Fixes: 1071a8358a ("ice: Implement virtchnl commands for AVF support")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The i40e_vc_send_msg_to_vf_ex (and its wrapper i40e_vc_send_msg_to_vf)
function has logic to detect "failure" responses sent to the VF. If a VF
is sent more than I40E_DEFAULT_NUM_INVALID_MSGS_ALLOWED, then the VF is
marked as disabled. In either case, a dev_info message is printed
stating that a VF opcode failed.
This logic originates from the early implementation of VF support in
commit 5c3c48ac6b ("i40e: implement virtual device interface").
That commit did not go far enough. The "logic" for this behavior seems
to be that error responses somehow indicate a malicious VF. This is not
really true. The PF might be sending an error for any number of reasons
such as lacking resources, an unsupported operation, etc. This does not
indicate a malicious VF. We already have a separate robust malicious VF
detection which relies on hardware logic to detect and prevent a variety
of behaviors.
There is no justification for this behavior in the original
implementation. In fact, a later commit 18b7af57d9 ("i40e: Lower some
message levels") reduced the opcode failure message from a dev_err to a
dev_info. In addition, recent commit 01cbf50877 ("i40e: Fix to not
show opcode msg on unsuccessful VF MAC change") changed the logic to
allow quieting it for expected failures.
That commit prevented this logic from kicking in for specific
circumstances. This change did not go far enough. The behavior is not
documented nor is it part of any requirement for our products. Other
operating systems such as the FreeBSD implementation of our driver do
not include this logic.
It is clear this check does not make sense, and causes problems which
led to ugly workarounds.
Fix this by just removing the entire logic and the need for the
i40e_vc_send_msg_to_vf_ex function.
Fixes: 01cbf50877 ("i40e: Fix to not show opcode msg on unsuccessful VF MAC change")
Fixes: 5c3c48ac6b ("i40e: implement virtual device interface")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In some cases the overloaded flag IAVF_FLAG_REINIT_ITR_NEEDED,
which should indicate that interrupts need to be completely
reinitialized during reset, leads to RTNL deadlocks when using
ethtool -C while a reset is in progress.
To fix this, add a new flag, IAVF_FLAG_REINIT_MSIX_NEEDED,
used to trigger MSI-X reinit.
Also make the new combined setting take effect after a VF reset.
This is implemented by calling the interrupt scheme reinit
during VF reset.
Without this fix the new combined setting was never adopted.
Fixes: 209f2f9c71 ("iavf: Add support for VIRTCHNL_VF_OFFLOAD_VLAN_V2 negotiation")
Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Michal Maloszewski <michal.maloszewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Modify netdev->features for VLAN stripping based on virtual
channel messages received from the PF. This change is needed
to synchronize VLAN strip status between PF sysfs and iavf ethtool.
Fixes: 5951a2b981 ("iavf: Fix VLAN feature flags after VFR")
Signed-off-by: Norbert Ciosek <norbertx.ciosek@intel.com>
Signed-off-by: Michal Maloszewski <michal.maloszewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Pull devicetree fixes from Rob Herring:
- Fix pinctrl node name warnings in examples
- Add missing 'mux-states' property in ti,tcan104x-can binding
* tag 'devicetree-fixes-for-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: phy: ti,tcan104x-can: Document mux-states property
dt-bindings: mfd: Fix pinctrl node name warnings
Pull fuse fixes from Miklos Szeredi:
- Fix an issue with splice on the fuse device
- Fix a regression in the fileattr API conversion
- Add a small userspace API improvement
* tag 'fuse-fixes-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: fix pipe buffer lifetime for direct_io
fuse: move FUSE_SUPER_MAGIC definition to magic.h
fuse: fix fileattr op failure
Pull arm64 spectre fixes from James Morse:
"ARM64 Spectre-BHB mitigations:
- Make EL1 vectors per-cpu
- Add mitigation sequences to the EL1 and EL2 vectors on vulnerable
CPUs
- Implement ARCH_WORKAROUND_3 for KVM guests
- Report Vulnerable when unprivileged eBPF is enabled"
* tag 'arm64-spectre-bhb-for-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
arm64: Use the clearbhb instruction in mitigations
KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
arm64: Mitigate spectre style branch history side channels
arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
arm64: Add percpu vectors for EL1
arm64: entry: Add macro for reading symbol addresses from the trampoline
arm64: entry: Add vectors that have the bhb mitigation sequences
arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
arm64: entry: Allow the trampoline text to occupy multiple pages
arm64: entry: Make the kpti trampoline's kpti sequence optional
arm64: entry: Move trampoline macros out of ifdef'd section
arm64: entry: Don't assume tramp_vectors is the start of the vectors
arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
arm64: entry: Move the trampoline data page before the text page
arm64: entry: Free up another register on kpti's tramp_exit path
arm64: entry: Make the trampoline cleanup optional
KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit
arm64: entry.S: Add ventry overflow sanity checks
Pull ARM spectre fixes from Russell King:
"ARM Spectre BHB mitigations.
These patches add Spectre BHB mitigations for the following Arm CPUs
to the 32-bit ARM kernels:
- Cortex A15
- Cortex A57
- Cortex A72
- Cortex A73
- Cortex A75
- Brahma B15
for CVE-2022-23960"
* tag 'for-linus-bhb' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: include unprivileged BPF status in Spectre V2 reporting
ARM: Spectre-BHB workaround
ARM: use LOADADDR() to get load address of sections
ARM: early traps initialisation
ARM: report Spectre v2 status through sysfs
The recent addition of pinctrl.yaml in commit c09acbc499 ("dt-bindings:
pinctrl: use pinctrl.yaml") resulted in some node name warnings:
Documentation/devicetree/bindings/mfd/cirrus,lochnagar.example.dt.yaml: \
lochnagar-pinctrl: $nodename:0: 'lochnagar-pinctrl' does not match '^(pinctrl|pinmux)(@[0-9a-f]+)?$'
Documentation/devicetree/bindings/mfd/cirrus,madera.example.dt.yaml: \
codec@1a: $nodename:0: 'codec@1a' does not match '^(pinctrl|pinmux)(@[0-9a-f]+)?$'
Documentation/devicetree/bindings/mfd/brcm,cru.example.dt.yaml: \
pin-controller@1c0: $nodename:0: 'pin-controller@1c0' does not match '^(pinctrl|pinmux)(@[0-9a-f]+)?$'
Fix the node names to the preferred 'pinctrl'. For cirrus,madera,
nothing from pinctrl.yaml schema is used, so just drop the reference.
Fixes: c09acbc499 ("dt-bindings: pinctrl: use pinctrl.yaml")
Cc: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220303232350.2591143-1-robh@kernel.org
This was a prerequisite for the ill-fated
"netfilter: nat: force port remap to prevent shadowing well-known ports".
As this has been reverted, this change can be backed out too.
Signed-off-by: Florian Westphal <fw@strlen.de>
The mitigations for Spectre-BHB are only applied when an exception
is taken, but when unprivileged BPF is enabled, userspace can
load BPF programs that can be used to exploit the problem.
When unprivileged BPF is enabled, report the vulnerable status via
the spectre_v2 sysfs file.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Revert DPI support from binding.
DPI support relies on the bus-type enum which does not yet support
MIPI DPI, since no v4l2_fwnode_bus_type has been defined for this
bus type.
When DPI for anx7625 was initially added, it assumed that
V4L2_FWNODE_BUS_TYPE_PARALLEL was the correct bus type for
representing DPI, which it is not.
In order to prevent adding this misuse to the ABI, let's revert
the support.
Signed-off-by: Robert Foss <robert.foss@linaro.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This reverts commit 878aed8db3.
This change breaks existing setups where conntrack is used with
asymmetric paths.
In these cases, the NAT transformation occurs on the syn-ack instead of
the syn:
1. SYN x:12345 -> y -> 443 // sent by initiator, received by responder
2. SYNACK y:443 -> x:12345 // First packet seen by conntrack, as sent by responder
3. tuple_force_port_remap() gets called, sees:
'tcp from 443 to port 12345 NAT' -> pick a new source port, initiator receives
4. SYNACK y:$RANDOM -> x:12345 // connection is never established
While it's possible to avoid the breakage with NOTRACK rules, a kernel
update should not break working setups.
An alternative to the revert is to augment conntrack to tag
mid-stream connections plus more code in the nat core to skip NAT
for such connections, however, this leads to more interaction/integration
between conntrack and NAT.
Therefore, revert; users will need to add explicit NAT rules to avoid
port shadowing.
Link: https://lore.kernel.org/netfilter-devel/20220302105908.GA5852@breakpoint.cc/#R
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2051413
Signed-off-by: Florian Westphal <fw@strlen.de>
Requesting quad mode for the FMC resulted in an error:
&fmc {
status = "okay";
+ pinctrl-names = "default";
+ pinctrl-0 = <&pinctrl_fwqspi_default>;
[ 0.742963] aspeed-g6-pinctrl 1e6e2000.syscon:pinctrl: invalid function FWQSPID in map table

This is because the quad mode pins are a group of pins, not a function.
After applying this patch we can request the pins and the QSPI data
lines are muxed:
# cat /sys/kernel/debug/pinctrl/1e6e2000.syscon\:pinctrl-aspeed-g6-pinctrl/pinmux-pins |grep 1e620000.spi
pin 196 (AE12): device 1e620000.spi function FWSPID group FWQSPID
pin 197 (AF12): device 1e620000.spi function FWSPID group FWQSPID
pin 240 (Y1): device 1e620000.spi function FWSPID group FWQSPID
pin 241 (Y2): device 1e620000.spi function FWSPID group FWQSPID
pin 242 (Y3): device 1e620000.spi function FWSPID group FWQSPID
pin 243 (Y4): device 1e620000.spi function FWSPID group FWQSPID
Fixes: f510f04c8c ("ARM: dts: aspeed: Add AST2600 pinmux nodes")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Link: https://lore.kernel.org/r/20220304011010.974863-1-joel@jms.id.au
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
ARM: tegra: Device tree fixes for v5.17
One more patch to fix up eDP panels on Nyan FHD models.
* tag 'tegra-for-5.17-arm-dt-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
ARM: tegra: Move Nyan FHD panels to AUX bus
ARM: tegra: Move panels to AUX bus
Link: https://lore.kernel.org/r/20220308084339.2199400-1-thierry.reding@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Commit 18107f8a2d ("arm64: Support execute-only permissions with
Enhanced PAN") re-introduced execute-only permissions when EPAN is
available. When EPAN is not available, arch_filter_pgprot() is supposed
to change a PAGE_EXECONLY permission into PAGE_READONLY_EXEC. However,
if BTI or MTE are present, this check does not detect the execute-only
pgprot in the presence of PTE_GP (BTI) or MT_NORMAL_TAGGED (MTE),
allowing the user to request PROT_EXEC with PROT_BTI or PROT_MTE.
Remove the arch_filter_pgprot() function, change the default VM_EXEC
permissions to PAGE_READONLY_EXEC and update the protection_map[] array
at core_initcall() if EPAN is detected.
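To illustrate why such a check can miss the execute-only case, here is a
minimal user-space sketch. It assumes the check is an exact-value
comparison against the execute-only protection; the DEMO_* names and bit
values are hypothetical stand-ins, not the real arm64 encodings.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only (not real arm64 encodings). */
#define DEMO_PAGE_EXECONLY 0x01u /* stand-in for the execute-only protection */
#define DEMO_PTE_GP        0x10u /* stand-in for the BTI guarded-page bit */

/*
 * Exact comparison: true only when no other bit is set, so an
 * execute-only request that also carries an extra bit is not detected.
 */
static bool demo_is_execonly_exact(uint32_t prot)
{
	return prot == DEMO_PAGE_EXECONLY;
}
```

With these stand-in values, a plain execute-only request is detected, but
the same request with the extra bit set is not.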
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 18107f8a2d ("arm64: Support execute-only permissions with Enhanced PAN")
Cc: <stable@vger.kernel.org> # 5.13.x
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Compiler is not happy:
warning: symbol 'gpio_sim_hog_config_item_ops' was not declared. Should it be static?
Fixes: cb8c474e79 ("gpio: sim: new testing module")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Pull x86 spectre fixes from Borislav Petkov:
- Mitigate Spectre v2-type Branch History Buffer attacks on machines
which support eIBRS, i.e., the hardware-assisted speculation
restriction, after it has been shown that such machines are vulnerable
even with the hardware mitigation.
- Do not use the default LFENCE-based Spectre v2 mitigation on AMD as
it is insufficient to mitigate such attacks. Instead, switch to
retpolines on all AMD by default.
- Update the docs and add some warnings for the obviously vulnerable
cmdline configurations.
* tag 'x86_bugs_for_v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
x86/speculation: Warn about Spectre v2 LFENCE mitigation
x86/speculation: Update link to AMD speculation whitepaper
x86/speculation: Use generic retpoline by default on AMD
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
Documentation/hw-vuln: Update spectre doc
x86/speculation: Add eIBRS + Retpoline options
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
arm64: tegra: Device tree fixes for v5.17
This contains a single, last-minute fix to disable the display SMMU by
default because under some circumstances leaving it enabled by default
can cause SMMU faults on boot.
* tag 'tegra-for-5.17-arm64-dt-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
arm64: tegra: Disable ISO SMMU for Tegra194
Link: https://lore.kernel.org/r/20220307182120.2169598-1-thierry.reding@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
I observed the following problem with the BT404 touch pad
running the Phosh UI:
When typing on the virtual keyboard, for example, pressing "g" would
produce "ggg".
After some analysis it turns out the firmware reports that three
fingers hit that coordinate at the same time, finger 0, 2 and
4 (of the five available 0,1,2,3,4).
DOWN
Zinitix-TS 3-0020: finger 0 down (246, 395)
Zinitix-TS 3-0020: finger 1 up (0, 0)
Zinitix-TS 3-0020: finger 2 down (246, 395)
Zinitix-TS 3-0020: finger 3 up (0, 0)
Zinitix-TS 3-0020: finger 4 down (246, 395)
UP
Zinitix-TS 3-0020: finger 0 up (246, 395)
Zinitix-TS 3-0020: finger 2 up (246, 395)
Zinitix-TS 3-0020: finger 4 up (246, 395)
This is one touch and release: i.e. this is all reported on
touch (down) and release.
There is a field in the struct touch_event called finger_cnt
which is actually a bitmask of the fingers active in the
event.
Rename this field finger_mask as this matches its contents
better, then use for_each_set_bit() to iterate over just the
fingers that are actually active.
Factor out a finger reporting function zinitix_report_fingers()
to handle all fingers.
Also be more careful in reporting finger down/up: we were
reporting every event with input_mt_report_slot_state(..., true);
but this should only be reported on finger down or move,
not on finger up, so also add code to check p->sub_status
to see what is happening and report correctly.
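The bitmask iteration described above can be sketched in plain user-space
C as follows. This is only an illustration of walking the set bits of a
finger mask; it is not the driver code, and the demo_* and DEMO_* names
are hypothetical.

```c
#include <stdint.h>

#define DEMO_MAX_FINGERS 5 /* five fingers, 0..4, as in the log above */

/*
 * Store the indices of the bits set in finger_mask into out[] and
 * return how many were found. Bits at or above DEMO_MAX_FINGERS are
 * ignored.
 */
static int demo_active_fingers(uint8_t finger_mask, int out[DEMO_MAX_FINGERS])
{
	int n = 0;

	for (int i = 0; i < DEMO_MAX_FINGERS; i++) {
		if (finger_mask & (1u << i))
			out[n++] = i;
	}
	return n;
}
```

A mask of 0x15 (binary 10101) has bits 0, 2 and 4 set, so it yields exactly
those three finger indices.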
After this my Zinitix BT404 touchscreen reports fingers
flawlessly.
The vendor driver I have notably does not use the "finger_cnt"
and contains obviously incorrect code like this:
if (touch_dev->touch_info.finger_cnt > MAX_SUPPORTED_FINGER_NUM)
touch_dev->touch_info.finger_cnt = MAX_SUPPORTED_FINGER_NUM;
As MAX_SUPPORTED_FINGER_NUM is an ordinal and the field is
a bitmask this seems quite confused.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20220228233017.2270599-1-linus.walleij@linaro.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Current git tree for Broadcom iProc SoCs is pretty outdated as it has
not been updated for a long time. Fix the reference.
Signed-off-by: Kuldeep Singh <singh.kuldeep87k@gmail.com>
Pull MTD fix from Miquel Raynal:
"As part of a previous changeset introducing support for the K3
architecture, the OMAP_GPMC (a non-visible symbol) got selected by the
selection of MTD_NAND_OMAP2 instead of doing so from the architecture
directly (like for the other users of these two drivers). Indeed, from
a hardware perspective, the OMAP NAND controller needs the GPMC to
work.
This led to a robot error which got addressed in a fix merged into -rc4.
Unfortunately, the approach at this time still used "select" and led
to further build error reports (sparc64:allmodconfig).
This time we switch to 'depends on' in order to prevent random
misconfigurations. The different dependencies will however need a
future cleanup"
* tag 'mtd/fixes-for-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: omap2: Actually prevent invalid configuration and build error
Pull virtio fixes from Michael Tsirkin:
"Some last minute fixes that took a while to get ready. Not
regressions, but they look safe and seem to be worth having"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
tools/virtio: handle fallout from folio work
tools/virtio: fix virtio_test execution
vhost: remove avail_event arg from vhost_update_avail_event()
virtio: drop default for virtio-mem
vdpa: fix use-after-free on vp_vdpa_remove
virtio-blk: Remove BUG_ON() in virtio_queue_rq()
virtio-blk: Don't use MAX_DISCARD_SEGMENTS if max_discard_seg is zero
vhost: fix hung thread due to erroneous iotlb entries
vduse: Fix returning wrong type in vduse_domain_alloc_iova()
vdpa/mlx5: add validation for VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command
vdpa/mlx5: should verify CTRL_VQ feature exists for MQ
vdpa: factor out vdpa_set_features_unlocked for vdpa internal use
virtio_console: break out of buf poll on remove
virtio: document virtio_reset_device
virtio: acknowledge all features before access
virtio: unexport virtio_finalize_features
Unfortunately, we ended up merging an old version of the patch "fix info
leak with DMA_FROM_DEVICE" instead of merging the latest one. Christoph
(the swiotlb maintainer) asked me to create an incremental fix
(after I pointed out the mix-up and asked him for guidance).
So here we go.
The main differences between what we got and what was agreed are:
* swiotlb_sync_single_for_device is also required to do an extra bounce
* We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters
* The implementation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE
must take precedence over DMA_ATTR_SKIP_CPU_SYNC
Thus this patch removes DMA_ATTR_OVERWRITE, and makes
swiotlb_sync_single_for_device() bounce unconditionally (that is, also
when dir == DMA_TO_DEVICE) in order to avoid synchronising back stale
data from the swiotlb buffer.
Let me note, that if the size used with dma_sync_* API is less than the
size used with dma_[un]map_*, under certain circumstances we may still
end up with swiotlb not being transparent. In that sense, this is no
perfect fix either.
To get this bulletproof, we would have to bounce the entire
mapping/bounce buffer. For that we would have to figure out the starting
address, and the size of the mapping in
swiotlb_sync_single_for_device(). While this does seem possible, there
seems to be no firm consensus on how things are supposed to work.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Fixes: ddbd89deb7 ("swiotlb: fix info leak with DMA_FROM_DEVICE")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Similarly to what was earlier done for other Nyan variants, move the eDP
panel on the FHD models to the AUX bus as well.
Suggested-by: Dmitry Osipenko <digetx@gmail.com>
Fixes: ef6fb9875c ("ARM: tegra: Add device-tree for 1080p version of Nyan Big")
Signed-off-by: Thierry Reding <treding@nvidia.com>
The mitigations for Spectre-BHB are only applied when an exception is
taken from user-space. The mitigation status is reported via the spectre_v2
sysfs vulnerabilities file.
When unprivileged eBPF is enabled the mitigation in the exception vectors
can be avoided by an eBPF program.
When unprivileged eBPF is enabled, print a warning and report vulnerable
via the sysfs vulnerabilities file.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
The root of the problem is that we are selecting symbols that have
dependencies. This can cause random configurations that can fail.
The cleanest solution is to avoid using select.
This driver uses interfaces from the OMAP_GPMC driver so we have to
depend on it instead.
Fixes: 4cd335dae3 ("mtd: rawnand: omap2: Prevent invalid configuration and build error")
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/linux-mtd/20220219193600.24892-1-rogerq@kernel.org
In FOPEN_DIRECT_IO mode, fuse_file_write_iter() calls
fuse_direct_write_iter(), which normally calls fuse_direct_io(), which then
imports the write buffer with fuse_get_user_pages(), which uses
iov_iter_get_pages() to grab references to userspace pages instead of
actually copying memory.
On the filesystem device side, these pages can then either be read to
userspace (via fuse_dev_read()), or splice()d over into a pipe using
fuse_dev_splice_read() as pipe buffers with &nosteal_pipe_buf_ops.
This is wrong because after fuse_dev_do_read() unlocks the FUSE request,
the userspace filesystem can mark the request as completed, causing write()
to return. At that point, the userspace filesystem should no longer have
access to the pipe buffer.
Fix by copying pages coming from the user address space to new pipe
buffers.
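The difference between handing out a reference and handing out a copy can
be sketched in user-space C. This is a minimal illustration of the copying
idea only, assuming a hypothetical demo_pipe_buf type; it does not show the
actual fuse or pipe code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical pipe buffer that owns a private copy of the data. */
struct demo_pipe_buf {
	char *data;
	size_t len;
};

/*
 * Fill pb with a snapshot of user[0..len). Returns 0 on success and -1
 * if no memory is available. Later changes to user[] are not visible
 * through pb, because pb does not alias the caller's memory.
 */
static int demo_pipe_buf_from_user(struct demo_pipe_buf *pb,
				   const char *user, size_t len)
{
	pb->data = malloc(len);
	if (!pb->data)
		return -1;
	memcpy(pb->data, user, len);
	pb->len = len;
	return 0;
}
```

Because the buffer holds its own copy, the writer may reuse its memory as
soon as the write returns without the reader observing the change.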
Reported-by: Jann Horn <jannh@google.com>
Fixes: c3021629a0 ("fuse: support splice() reading from fuse device")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Currently we are observing occasional screen flickering when
PSR2 selective fetch is enabled. More specifically, the glitch seems
to happen on a full frame update when the cursor moves to coords
x = -1 or y = -1.
According to Bspec SF Single full frame should not be set if
SF Partial Frame Enable is not set. This happened to be true for
ADLP as PSR2_MAN_TRK_CTL_ENABLE is always set and for ADL_P it's
actually "SF Partial Frame Enable" (Bit 31).
Setting "SF Partial Frame Enable" bit also on full update seems to
fix screen flickering.
Also make the code clearer by setting PSR2_MAN_TRK_CTL_ENABLE
only if not on ADL_P. Bit 31 has different meaning in ADL_P.
Bspec: 49274
v2: Fix Mihai Harpau email address
v3: Modify commit message and remove unnecessary comment
Tested-by: Lyude Paul <lyude@redhat.com>
Fixes: 7f6002e580 ("drm/i915/display: Enable PSR2 selective fetch by default")
Reported-by: Lyude Paul <lyude@redhat.com>
Cc: Mihai Harpau <mharpau@gmail.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Bugzilla: https://gitlab.freedesktop.org/drm/intel/-/issues/5077
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225070228.855138-1-jouni.hogander@intel.com
(cherry picked from commit 8d5516d18b)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Some GPIO lines have stopped working after the patch
commit 2ab73c6d83 ("gpio: Support GPIO controllers without pin-ranges")
And this has supposedly been fixed in the following patches
commit 89ad556b7f ("gpio: Avoid using pin ranges with !PINCTRL")
commit 6dbbf84603 ("gpiolib: Don't free if pin ranges are not defined")
But an erratic behavior where some GPIO lines work while others do not work
has been introduced.
This patch reverts those changes so that the sysfs-gpio interface works
properly again.
Signed-off-by: Marcelo Roberto Jimenez <marcelo.jimenez@gmail.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Add the number of interrupts per bank for Tegra241 (Grace) to
fix the probe failure.
Fixes: d1056b771d ("gpio: tegra186: Add support for Tegra241")
Signed-off-by: Akhil R <akhilrajeev@nvidia.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
According to Documentation/driver-api/usb/URB.rst when a device
is unplugged usb_submit_urb() returns -ENODEV.
This error code propagates all the way up to usbnet_read_cmd() and
usbnet_write_cmd() calls inside the smsc95xx.c driver during
Ethernet cable unplug, unbind or reboot.
This causes the following errors to be shown on reboot, for example:
ci_hdrc ci_hdrc.1: remove, state 1
usb usb2: USB disconnect, device number 1
usb 2-1: USB disconnect, device number 2
usb 2-1.1: USB disconnect, device number 3
smsc95xx 2-1.1:1.0 eth1: unregister 'smsc95xx' usb-ci_hdrc.1-1.1, smsc95xx USB 2.0 Ethernet
smsc95xx 2-1.1:1.0 eth1: Failed to read reg index 0x00000114: -19
smsc95xx 2-1.1:1.0 eth1: Error reading MII_ACCESS
smsc95xx 2-1.1:1.0 eth1: __smsc95xx_mdio_read: MII is busy
smsc95xx 2-1.1:1.0 eth1: Failed to read reg index 0x00000114: -19
smsc95xx 2-1.1:1.0 eth1: Error reading MII_ACCESS
smsc95xx 2-1.1:1.0 eth1: __smsc95xx_mdio_read: MII is busy
smsc95xx 2-1.1:1.0 eth1: hardware isn't capable of remote wakeup
usb 2-1.4: USB disconnect, device number 4
ci_hdrc ci_hdrc.1: USB bus 2 deregistered
ci_hdrc ci_hdrc.0: remove, state 4
usb usb1: USB disconnect, device number 1
ci_hdrc ci_hdrc.0: USB bus 1 deregistered
imx2-wdt 30280000.watchdog: Device shutdown: Expect reboot!
reboot: Restarting system
Ignore the -ENODEV errors inside __smsc95xx_mdio_read() and
__smsc95xx_phy_wait_not_busy() and do not print error messages
when -ENODEV is returned.
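The quieting rule described above can be sketched as a small predicate.
This is a user-space illustration, not the driver code;
demo_should_log_error() is a hypothetical helper name.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether a register access result deserves
 * an error message. Success is never logged, and -ENODEV is treated as
 * an expected outcome of the device having gone away.
 */
static bool demo_should_log_error(int ret)
{
	if (ret >= 0)
		return false;
	return ret != -ENODEV;
}
```

Any other negative error code, for example -EIO, would still be reported.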
Fixes: a049a30fc2 ("net: usb: Correct PHY handling of smsc95xx")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clang static analysis reports this issue:
qed_sriov.c:4727:19: warning: Assigned value is
garbage or undefined
ivi->max_tx_rate = tx_rate ? tx_rate : link.speed;
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
link is only sometimes set by the call to qed_iov_get_link().
qed_iov_get_link() can fail without setting link or returning a
status. So change the declaration to return a status.
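The pattern can be sketched in user-space C as below. All demo_* names,
the fixed speed value and the -1 error code are hypothetical; the sketch
only shows a lookup that reports failure so that the caller never reads an
unset struct.

```c
#include <stdbool.h>

struct demo_link {
	unsigned int speed;
};

/*
 * Hypothetical stand-in for the link lookup: returns 0 and fills *link
 * on success, or returns -1 and leaves *link untouched on failure.
 */
static int demo_get_link(bool vf_present, struct demo_link *link)
{
	if (!vf_present)
		return -1;
	link->speed = 25000; /* arbitrary example value */
	return 0;
}

/* Only reads link.speed after the lookup has reported success. */
static int demo_max_tx_rate(bool vf_present, unsigned int tx_rate,
			    unsigned int *out)
{
	struct demo_link link;

	if (demo_get_link(vf_present, &link))
		return -1;
	*out = tx_rate ? tx_rate : link.speed;
	return 0;
}
```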
Fixes: 73390ac9d8 ("qed*: support ndo_get_vf_config")
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The esp tunnel GSO handlers use skb_mac_gso_segment to
push the inner packet to the segmentation handlers.
However, skb_mac_gso_segment takes the Ethernet Protocol
ID from 'skb->protocol' which is wrong for inter address
family tunnels. We fix this by introducing a new
skb_eth_gso_segment function.
This function can be used if it is necessary to pass the
Ethernet Protocol ID directly to the segmentation handler.
First users of this function will be the esp4 and esp6
tunnel segmentation handlers.
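The difference between the two entry points can be sketched in user-space
C. The demo_* functions are hypothetical and only model the handler
lookup; the two EtherType values are the standard ones for IPv4 and IPv6.

```c
#include <stdint.h>

#define DEMO_ETH_P_IP   0x0800 /* EtherType for IPv4 */
#define DEMO_ETH_P_IPV6 0x86DD /* EtherType for IPv6 */

struct demo_skb {
	uint16_t protocol; /* protocol of the outer packet */
};

/* Hypothetical handler lookup: returns 4, 6, or -1 for unknown types. */
static int demo_handler_for(uint16_t type)
{
	switch (type) {
	case DEMO_ETH_P_IP:
		return 4;
	case DEMO_ETH_P_IPV6:
		return 6;
	default:
		return -1;
	}
}

/* Dispatches on the packet's own protocol field. */
static int demo_mac_segment(const struct demo_skb *skb)
{
	return demo_handler_for(skb->protocol);
}

/* Dispatches on a protocol ID supplied by the caller. */
static int demo_eth_segment(const struct demo_skb *skb, uint16_t type)
{
	(void)skb;
	return demo_handler_for(type);
}
```

For an IPv4 packet carried in an IPv6 tunnel, the first variant picks the
IPv6 handler while the second can be told to use the IPv4 one.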
Fixes: c35fe4106b ("xfrm: Add mode handlers for IPsec on layer 2")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The xfrm{4,6}_beet_gso_segment() functions did not correctly set the
SKB_GSO_IPXIP4 and SKB_GSO_IPXIP6 gso types for the address family
tunneling case. Fix this by setting these gso types.
Fixes: 384a46ea7b ("esp4: add gso_segment for esp4 beet mode")
Fixes: 7f9e40eb18 ("esp6: add gso_segment for esp6 beet mode")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The maximum message size that can be sent is bigger than
the maximum size that skb_page_frag_refill can allocate.
So it is possible to write beyond the allocated buffer.
Fix this by doing a fallback to COW in that case.
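A size guard of this kind can be sketched as follows. DEMO_FRAG_MAX is a
hypothetical cap chosen for illustration and is not taken from the patch;
the sketch only shows the decision between the fast path and the fallback.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound on what one page-fragment refill can provide. */
#define DEMO_FRAG_MAX 4096u

/*
 * Returns true when the request fits into a page fragment. A false
 * result means the caller should take the copy-on-write fallback
 * instead of writing past the fragment.
 */
static bool demo_can_use_page_frag(size_t alloc_size)
{
	return alloc_size <= DEMO_FRAG_MAX;
}
```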
v2:
Avoid get_order() costs as suggested by Linus Torvalds.
Fixes: cac2661c53 ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f6a ("esp6: Avoid skb_cow_data whenever possible")
Reported-by: valis <sec@valis.email>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
When the driver fails to register net device, it should free the DMA
region first, and then do other cleanup.
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The skb->len field is read after the packet is sent to the network
stack. In the meantime, skb can be freed. This patch fixes this bug.
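The message does not spell out how the fix is done. One common way to
avoid this class of bug is to read the field before ownership is handed
over, as in the user-space sketch below. The demo_* names are hypothetical
and free() stands in for the consumer releasing the packet.

```c
#include <stdlib.h>

struct demo_pkt {
	size_t len;
};

/* Hypothetical consumer that takes ownership and frees the packet. */
static void demo_deliver(struct demo_pkt *pkt)
{
	free(pkt);
}

/*
 * Returns the number of bytes to account for one received packet, or 0
 * if allocation fails. The length is read while the packet is still
 * owned by this function.
 */
static size_t demo_rx_one(size_t len)
{
	struct demo_pkt *pkt = malloc(sizeof(*pkt));
	size_t saved_len;

	if (!pkt)
		return 0;
	pkt->len = len;
	saved_len = pkt->len; /* read before handing the packet off */
	demo_deliver(pkt);    /* pkt must not be touched after this call */
	return saved_len;
}
```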
Fixes: c3e6b2c35b ("net: lantiq_xrx200: add ingress SG DMA support")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function dma_alloc_coherent() in qed_vf_hw_prepare() can fail, so
its return value should be checked.
Fixes: 1408cc1fa4 ("qed: Introduce VFs")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function dma_set_mask() in setup_hw() can fail, so its return value
should be checked.
Fixes: 1700fe1a10 ("Add mISDN HFC PCI driver")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 76bfc7ccc2 ("mmc: core: adjust polling interval for CMD1"),
significantly decreased the polling period from ~10-12ms to just a couple
of us. The purpose was to decrease the total time spent in the busy polling
loop, but unfortunately it has led to problems that cause eMMC cards to
never get out of busy and thus fail to be initialized.
To fix the problem, but also to try to keep some of the new improved
behaviour, let's start by using a polling period of 1-2ms, which then
increases for each loop, according to the common polling loop in
__mmc_poll_for_busy().
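A growing polling period can be sketched as below. The message only says
the period increases for each loop; the doubling step and the cap used
here are assumptions made for illustration, and the demo_* and DEMO_* names
are hypothetical.

```c
/* Assumed values: start at 1000 us, double each loop, cap at 32000 us. */
#define DEMO_POLL_START_US 1000u
#define DEMO_POLL_MAX_US   32000u

/* Next delay: twice the current one, limited to DEMO_POLL_MAX_US. */
static unsigned int demo_next_poll_delay(unsigned int cur_us)
{
	unsigned int next = cur_us * 2;

	return next > DEMO_POLL_MAX_US ? DEMO_POLL_MAX_US : next;
}

/* Total time slept after n polls, starting from DEMO_POLL_START_US. */
static unsigned int demo_total_wait_us(unsigned int n)
{
	unsigned int delay = DEMO_POLL_START_US;
	unsigned int total = 0;

	for (unsigned int i = 0; i < n; i++) {
		total += delay;
		delay = demo_next_poll_delay(delay);
	}
	return total;
}
```

Under these assumptions the first three polls sleep 1000, 2000 and 4000 us,
so a card that needs a few milliseconds is given them after very few
iterations.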
Reported-by: Jean Rene Dawin <jdawin@math.uni-bielefeld.de>
Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
Cc: Huijin Park <huijin.park@samsung.com>
Fixes: 76bfc7ccc2 ("mmc: core: adjust polling interval for CMD1")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Jean Rene Dawin <jdawin@math.uni-bielefeld.de>
Tested-by: H. Nikolaus Schaller <hns@goldelico.com>
Link: https://lore.kernel.org/r/20220304105656.149281-1-ulf.hansson@linaro.org
When calling gnttab_end_foreign_access_ref() the returned value must
be tested and the reaction to that value should be appropriate.
In case of failure in xennet_get_responses() the reaction should not be
to crash the system, but to disable the network device.
The calls in setup_netfront() can be replaced by calls of
gnttab_end_foreign_access(). While at it avoid double free of ring
pages and grant references via xennet_disconnect_backend() in this case.
This is CVE-2022-23042 / part of XSA-396.
Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V2:
- avoid double free
V3:
- remove pointless initializer (Jan Beulich)
gnttab_end_foreign_access() is used to free a grant reference and
optionally to free the associated page. In case the grant is still in
use by the other side, processing is deferred. This leads to a
problem in case no page to be freed is specified by the caller: the
caller doesn't know that the page is still mapped by the other side
and thus should not be used for other purposes.
The correct way to handle this situation is to take an additional
reference to the granted page in case handling is being deferred and
to drop that reference when the grant reference could be freed
finally.
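The reference handling described above can be modelled in user-space C as
follows. This is a simplified sketch with hypothetical demo_* types and
functions; it does not reflect the real grant table data structures.

```c
#include <stdbool.h>

struct demo_page {
	int refcount;
};

struct demo_grant {
	bool in_use_by_peer;    /* the other side still maps the page */
	struct demo_page *page;
	bool deferred;          /* ending access has been postponed */
};

/* End access: if the peer still uses the grant, pin the page and defer. */
static void demo_end_access(struct demo_grant *g)
{
	if (g->in_use_by_peer) {
		g->page->refcount++; /* extra reference blocks reuse */
		g->deferred = true;
		return;
	}
	g->deferred = false;
}

/* Called later; drops the extra reference once the peer is done. */
static void demo_deferred_done(struct demo_grant *g)
{
	if (g->deferred && !g->in_use_by_peer) {
		g->page->refcount--;
		g->deferred = false;
	}
}
```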
This requires that there are no users of gnttab_end_foreign_access()
left directly repurposing the granted page after the call, as this
might result in clobbered data or information leaks via the not yet
freed grant reference.
This is part of CVE-2022-23041 / XSA-396.
Reported-by: Simon Gaiser <simon@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V4:
- expand comment in header
V5:
- get page ref in case of kmalloc() failure, too
Instead of __get_free_pages() and free_pages() use alloc_pages_exact()
and free_pages_exact(). This is in preparation of a change of
gnttab_end_foreign_access() which will prohibit use of high-order
pages.
This is part of CVE-2022-23041 / XSA-396.
Reported-by: Simon Gaiser <simon@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V4:
- new patch
Instead of __get_free_pages() and free_pages() use alloc_pages_exact()
and free_pages_exact(). This is in preparation of a change of
gnttab_end_foreign_access() which will prohibit use of high-order
pages.
By using the local variable "order" instead of ring->intf->ring_order
in the error path of xen_9pfs_front_alloc_dataring() another bug is
fixed, as the error path can be entered before ring->intf->ring_order
is being set.
By using alloc_pages_exact() the size in bytes is specified for the
allocation, which fixes another bug for the case of
order < (PAGE_SHIFT - XEN_PAGE_SHIFT).
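The sizing problem can be illustrated with a small arithmetic sketch in
Python. It assumes the replaced code derived a page order as
order - (PAGE_SHIFT - XEN_PAGE_SHIFT); that shape is an assumption for
illustration, not a quote of the old code, and both function names are
hypothetical.

```python
XEN_PAGE_SHIFT = 12  # Xen grant pages are 4 KiB


def old_page_order(order, page_shift):
    """Order-based sizing (assumed shape of the old computation)."""
    return order - (page_shift - XEN_PAGE_SHIFT)


def exact_size_bytes(order):
    """Byte-based sizing, as handed to an exact-size allocator."""
    return 1 << (order + XEN_PAGE_SHIFT)


# With 64 KiB kernel pages (PAGE_SHIFT = 16) and a ring order of 2,
# the order-based value goes negative, while the byte count stays valid.
negative_order = old_page_order(2, 16)
ring_bytes = exact_size_bytes(2)
```

With 4 KiB kernel pages the two shifts are equal and the subtraction is
harmless, so the problem would only show up with larger page sizes.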
This is part of CVE-2022-23041 / XSA-396.
Reported-by: Simon Gaiser <simon@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V4:
- new patch
The usage of gnttab_end_foreign_access() in xenhcd_gnttab_done() is
not safe against a malicious backend, as the backend could keep the
I/O page mapped and modify it even after the granted memory page has
been reused for entirely different purposes in the local system.
So replace that use case with gnttab_try_end_foreign_access() and
disable the PV host adapter in case the backend didn't stop using the
granted page.
In xenhcd_urb_request_done() immediately return in case of setting
the device state to "error" instead of looking into further backend
responses.
Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V2:
- use gnttab_try_end_foreign_access()
Remove gnttab_query_foreign_access(), as it is unused and unsafe to
use.
All previous use cases assumed a grant would not be in use after
gnttab_query_foreign_access() returned 0. This information is useless
at best, as it only describes a past state, which could already have
changed.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Using gnttab_query_foreign_access() is unsafe, as it is racy by design.
The use case in the gntalloc driver is not needed at all. While at it
replace the call of gnttab_end_foreign_access_ref() with a call of
gnttab_end_foreign_access(), which is what is really wanted there. In
case the grant wasn't used due to an allocation failure, just free the
grant via gnttab_free_grant_reference().
This is CVE-2022-23039 / part of XSA-396.
Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V3:
- fix __del_gref() (Jan Beulich)
It isn't enough to check whether a grant is still in use by calling
gnttab_query_foreign_access(), as the other side could establish a
mapping just after that function has been called.
In case the call was done in preparation of revoking a grant it is
better to do so via gnttab_try_end_foreign_access() and check the
success of that operation instead.
This is CVE-2022-23038 / part of XSA-396.
Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V2:
- use gnttab_try_end_foreign_access()
It isn't enough to check whether a grant is still in use by calling
gnttab_query_foreign_access(), as the other side could establish a
mapping just after that function has been called.
In case the call was done in preparation of revoking a grant it is
better to do so via gnttab_end_foreign_access_ref() and check the
success of that operation instead.
This is CVE-2022-23037 / part of XSA-396.
Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V2:
- use gnttab_try_end_foreign_access()
V3:
- don't use gnttab_try_end_foreign_access()
It isn't enough to check whether a grant is still in use by calling
gnttab_query_foreign_access(), as the other side could establish a
mapping just after that function has been called.
In case the call was done in preparation of revoking a grant it is
better to do so via gnttab_end_foreign_access_ref() and check the
success of that operation instead.
For the ring allocation use alloc_pages_exact() in order to avoid
high order pages in case of a multi-page ring.
If a grant wasn't unmapped by the backend without persistent grants
being used, set the device state to "error".
This is CVE-2022-23036 / part of XSA-396.
Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
V2:
- use gnttab_try_end_foreign_access()
V4:
- use alloc_pages_exact() and free_pages_exact()
- set state to error if backend didn't unmap (Roger Pau Monné)
Add a new grant table function gnttab_try_end_foreign_access(), which
will remove and free a grant if it is not in use.
Its main use case is to either free a grant if it is no longer in use,
or to take some other action if it is still in use. This other action
can be an error exit, or (e.g. in the case of blkfront persistent grant
feature) some special handling.
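The intended calling pattern can be sketched with a toy model in Python.
The Grant class and release() helper are hypothetical and only stand in
for the kernel's grant entry and its callers; the point is that the
check and the revocation happen in one step, so the caller acts on the
result of the operation rather than on an earlier snapshot.

```python
class Grant:
    """Toy model of one grant entry; not the kernel data structure."""

    def __init__(self):
        self.granted = True
        self.mapped_by_peer = False

    def query_in_use(self):
        # Snapshot only: the peer may map the page right after this returns.
        return self.mapped_by_peer

    def try_end_access(self):
        # Check and revoke in one step, so the result cannot go stale.
        if self.mapped_by_peer:
            return False
        self.granted = False
        return True


def release(grant, on_busy):
    """Free the grant if it is idle, otherwise run the caller's fallback."""
    if grant.try_end_access():
        return "freed"
    return on_busy(grant)
```

A caller would pass an error exit or some special handling as on_busy,
for example release(grant, lambda g: "error").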
This is CVE-2022-23036, CVE-2022-23038 / part of XSA-396.
Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
V2:
- new patch
V4:
- add comments to header (Jan Beulich)
Letting xenbus_grant_ring() tear down grants in the error case is
problematic, as the other side could already have used these grants.
Calling gnttab_end_foreign_access_ref() without checking for success
leaves any caller of xenbus_grant_ring() in an unclear situation, as
in the error case the ring pages might be partially mapped. Freeing
them would risk unwanted foreign access to them, while not freeing
them would leak memory.
In order to remove the need to undo any gnttab_grant_foreign_access()
calls, use gnttab_alloc_grant_references() to make sure no further
error can occur in the loop granting access to the ring pages.
It should be noted that this way of handling removes leaking of
grant entries in the error case, too.
This is CVE-2022-23040 / part of XSA-396.
Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Our skiroot_defconfig doesn't enable FTRACE, and so doesn't get
STACKTRACE enabled either. That leads to a build failure since commit
1614b2b11f ("arch: Make ARCH_STACKWALK independent of STACKTRACE")
made stacktrace.c build even when STACKTRACE=n.
arch/powerpc/kernel/stacktrace.c: In function ‘handle_backtrace_ipi’:
arch/powerpc/kernel/stacktrace.c:171:2: error: implicit declaration of function ‘nmi_cpu_backtrace’
171 | nmi_cpu_backtrace(regs);
| ^~~~~~~~~~~~~~~~~
arch/powerpc/kernel/stacktrace.c: In function ‘arch_trigger_cpumask_backtrace’:
arch/powerpc/kernel/stacktrace.c:226:2: error: implicit declaration of function ‘nmi_trigger_cpumask_backtrace’
226 | nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace_ipi);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This happens because our headers haven't defined
arch_trigger_cpumask_backtrace, which causes lib/nmi_backtrace.c not to
build nmi_cpu_backtrace().
The code in question doesn't actually depend on STACKTRACE=y; that
dependency was only added because arch_trigger_cpumask_backtrace()
lived in stacktrace.c for convenience. So drop the dependency on
CONFIG_STACKTRACE, which causes lib/nmi_backtrace.c to build
nmi_cpu_backtrace() etc. and fixes the build.
Fixes: 1614b2b11f ("arch: Make ARCH_STACKWALK independent of STACKTRACE")
[mpe: Cherry pick of 5a72345e6a from next into fixes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220212111349.2806972-1-mpe@ellerman.id.au
Pull btrfs fixes from David Sterba:
"A few more fixes for various problems that have user visible effects
or seem to be urgent:
- fix corruption when combining DIO and non-blocking io_uring over
multiple extents (seen on MariaDB)
- fix relocation crash due to premature return from commit
- fix quota deadlock between rescan and qgroup removal
- fix item data bounds checks in tree-checker (found on a fuzzed
image)
- fix fsync of prealloc extents after EOF
- add missing run of delayed items after unlink during log replay
- don't start relocation until snapshot drop is finished
- fix reversed condition for subpage writers locking
- fix warning on page error"
* tag 'for-5.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fallback to blocking mode when doing async dio over multiple extents
btrfs: add missing run of delayed items after unlink during log replay
btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
btrfs: fix relocation crash due to premature return from btrfs_commit_transaction()
btrfs: do not start relocation until in progress drops are done
btrfs: tree-checker: use u64 for item data end to avoid overflow
btrfs: do not WARN_ON() if we have PageError set
btrfs: fix lost prealloc extents beyond eof after full fsync
btrfs: subpage: fix a wrong check on subpage->writers
Pull kvm fixes from Paolo Bonzini:
"x86 guest:
- Tweaks to the paravirtualization code, to avoid using them when
they're pointless or harmful
x86 host:
- Fix for SRCU lockdep splat
- Brown paper bag fix for the propagation of errno"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: pull kvm->srcu read-side to kvm_arch_vcpu_ioctl_run
KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots()
KVM: x86: Yield to IPI target vCPU only if it is busy
x86/kvmclock: Fix Hyper-V Isolated VM's boot issue when vCPUs > 64
x86/kvm: Don't waste memory if kvmclock is disabled
x86/kvm: Don't use PV TLB/yield when mwait is advertised
Pull powerpc fix from Michael Ellerman:
"Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set.
Thanks to Murilo Opsfelder Araujo, and Erhard F"
* tag 'powerpc-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set
Pull tracing fixes from Steven Rostedt:
- Fix sorting on old "cpu" value in histograms
- Fix return value of __setup() boot parameter handlers
* tag 'trace-v5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix return value of __setup handlers
tracing/histogram: Fix sorting on old "cpu" value
There's no special reason why virtio-mem needs a default that's
different from what kconfig provides, any more than e.g. virtio blk.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
When the vp_vdpa driver is unbound, vp_vdpa is freed in
vdpa_unregister_device() and then vp_vdpa->mdev.pci_dev is
dereferenced in vp_modern_remove(), triggering a use-after-free.
Call trace of the driver unbind that frees vp_vdpa:
do_syscall_64
vfs_write
kernfs_fop_write_iter
device_release_driver_internal
pci_device_remove
vp_vdpa_remove
vdpa_unregister_device
kobject_release
device_release
kfree
Call trace of the vp_vdpa->mdev.pci_dev dereference:
vp_modern_remove
pci_release_selected_regions
pci_release_region
pci_resource_len
pci_resource_end
(dev)->resource[(bar)].end
Signed-off-by: Zhang Min <zhang.min9@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Link: https://lore.kernel.org/r/20220301091059.46869-1-wang.yi59@zte.com.cn
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: 64b9f64f80 ("vdpa: introduce virtio pci driver")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Currently we have a BUG_ON() in virtio_queue_rq() to make sure the
number of sg list entries does not exceed queue_max_segments().
However, the block layer uses queue_max_discard_segments() instead
of queue_max_segments() to limit the sg list for discard requests.
So the BUG_ON() might be triggered if the virtio-blk device reports
a larger value for max discard segments than queue_max_segments().
To fix it, simply remove the BUG_ON(), which has become unnecessary
after commit 02746e26c39e ("virtio-blk: avoid preallocating big SGL
for data"). The now unused vblk->sg_elems can be removed as well.
Fixes: 1f23816b8e ("virtio_blk: add discard and write zeroes support")
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20220304100058.116-2-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently the value of max_discard_segment is set to
MAX_DISCARD_SEGMENTS (256), with no basis in hardware, if the device
sets max_discard_seg to 0 in its configuration space. This is
incorrect, since the device might not be able to handle such large
descriptors. To fix it, follow the max_segments restriction in this
case.
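The selection rule described above can be sketched as follows. This is a
minimal Python model, not the driver code: the function name is
hypothetical, and the final clamp to MAX_DISCARD_SEGMENTS is an
assumption about how the upper bound is kept.

```python
MAX_DISCARD_SEGMENTS = 256


def max_discard_segments(reported, max_segments):
    """Pick the discard segment limit for the queue.

    reported:     max_discard_seg read from the device configuration space
    max_segments: limit already used for ordinary requests
    """
    if reported == 0:
        # The device gave no usable value: follow the max_segments limit
        # instead of assuming the largest possible value.
        reported = max_segments
    return min(reported, MAX_DISCARD_SEGMENTS)
```

For a device reporting 0 with max_segments of 126, this yields 126
rather than 256.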
Fixes: 1f23816b8e ("virtio_blk: add discard and write zeroes support")
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20220304100058.116-1-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In vhost_iotlb_add_range_ctx(), range size can overflow to 0 when
start is 0 and last is ULONG_MAX. One instance where it can happen
is when userspace sends an IOTLB message with iova=size=uaddr=0
(vhost_process_iotlb_msg). So, an entry with size = 0, start = 0,
last = ULONG_MAX ends up in the iotlb. Next time a packet is sent,
iotlb_access_ok() loops indefinitely due to that erroneous entry.
Call Trace:
<TASK>
iotlb_access_ok+0x21b/0x3e0 drivers/vhost/vhost.c:1340
vq_meta_prefetch+0xbc/0x280 drivers/vhost/vhost.c:1366
vhost_transport_do_send_pkt+0xe0/0xfd0 drivers/vhost/vsock.c:104
vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
Reported by syzbot at:
https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87
To fix this, do two things:
1. Return -EINVAL in vhost_chr_write_iter() when userspace asks to map
a range with size 0.
2. Fix vhost_iotlb_add_range_ctx() to handle the range [0, ULONG_MAX]
by splitting it into two entries.
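The overflow and the split can be checked with a short Python model. It
assumes a 64-bit unsigned long and emulates the wraparound with a mask;
the function names are hypothetical and the midpoint split is one
straightforward way to obtain two representable sizes.

```python
ULONG_MAX = (1 << 64) - 1  # assuming a 64-bit unsigned long


def range_size(start, last):
    """Size of [start, last] in unsigned long arithmetic (wraps at 2**64)."""
    return (last - start + 1) & ULONG_MAX


def split_full_range(start, last):
    """Split [0, ULONG_MAX] into two ranges whose sizes are representable."""
    if start == 0 and last == ULONG_MAX:
        mid = last // 2
        return [(start, mid), (mid + 1, last)]
    return [(start, last)]
```

Here range_size(0, ULONG_MAX) wraps to 0, which is the erroneous entry
described above, while each half returned by split_full_range() has a
size of 2**63.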
Fixes: 0bbe30668d ("vhost: factor out IOTLB")
Reported-by: syzbot+0abd373e2e50d704db87@syzkaller.appspotmail.com
Tested-by: syzbot+0abd373e2e50d704db87@syzkaller.appspotmail.com
Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com>
Link: https://lore.kernel.org/r/20220305095525.5145-1-mail@anirudhrb.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
After the blamed commit, dsa_tree_setup_master() may exit without
calling rtnl_unlock(). Fix that.
Fixes: c146f9bc19 ("net: dsa: hold rtnl_mutex when calling dsa_master_{setup,teardown}")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull input updates from Dmitry Torokhov:
- a fixup for Goodix touchscreen driver allowing it to work on certain
Cherry Trail devices
- a fix for imbalanced enable/disable regulator in Elan touchpad driver
that became apparent when used with Asus TF103C 2-in-1 dock
- a couple new input keycodes used on newer keyboards
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
HID: add mapping for KEY_ALL_APPLICATIONS
HID: add mapping for KEY_DICTATE
Input: elan_i2c - fix regulator enable count imbalance after suspend/resume
Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()
Input: goodix - workaround Cherry Trail devices with a bogus ACPI Interrupt() resource
Input: goodix - use the new soc_intel_is_byt() helper
Input: samsung-keypad - properly state IOMEM dependency
Merge misc fixes from Andrew Morton:
"8 patches.
Subsystems affected by this patch series: mm (hugetlb, pagemap, and
userfaultfd), memfd, selftests, and kconfig"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
configs/debug: set CONFIG_DEBUG_INFO=y properly
proc: fix documentation and description of pagemap
kselftest/vm: fix tests build with old libc
memfd: fix F_SEAL_WRITE after shmem huge page allocated
mm: fix use-after-free when anon vma name is used after vma is freed
mm: prevent vm_area_struct::anon_name refcount saturation
mm: refactor vm_area_struct::anon_vma_name usage code
selftests/vm: cleanup hugetlb file after mremap test
Pull s390 fixes from Vasily Gorbik:
- Fix HAVE_DYNAMIC_FTRACE_WITH_ARGS implementation by providing correct
switching between ftrace_caller/ftrace_regs_caller and supplying
pt_regs only when ftrace_regs_caller is activated.
- Fix exception table sorting.
- Fix breakage of kdump tooling by preserving metadata it cannot
function without.
* tag 's390-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/extable: fix exception table sorting
s390/ftrace: fix arch_ftrace_get_regs implementation
s390/ftrace: fix ftrace_caller/ftrace_regs_caller generation
s390/setup: preserve memory at OLDMEM_BASE and OLDMEM_SIZE
Building the vm tests on Debian 10 (glibc 2.28) fails with:
userfaultfd.c: In function `userfaultfd_pagemap_test':
userfaultfd.c:1393:37: error: `MADV_PAGEOUT' undeclared (first use
in this function); did you mean `MADV_RANDOM'?
if (madvise(area_dst, test_pgsize, MADV_PAGEOUT))
^~~~~~~~~~~~
MADV_RANDOM
This patch includes these newer definitions from the UAPI
linux/mman.h, which fixes the tests build on systems without these
definitions in the glibc sys/mman.h.
Link: https://lkml.kernel.org/r/20220227055330.43087-2-zhouchengming@bytedance.com
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wangyong reports: after enabling tmpfs filesystem to support transparent
hugepage with the following command:
echo always > /sys/kernel/mm/transparent_hugepage/shmem_enabled
the docker program tries to add F_SEAL_WRITE through the following
command, but it fails unexpectedly with errno EBUSY:
fcntl(5, F_ADD_SEALS, F_SEAL_WRITE) = -1.
That is because memfd_tag_pins() and memfd_wait_for_pins() were never
updated for shmem huge pages: checking page_mapcount() against
page_count() is hopeless on THP subpages - they need to check
total_mapcount() against page_count() on THP heads only.
Make memfd_tag_pins() (compared > 1) as strict as memfd_wait_for_pins()
(compared != 1): either can be justified, but given the non-atomic
total_mapcount() calculation, it is better now to be strict. Bear in
mind that total_mapcount() itself scans all of the THP subpages, when
choosing to take an XA_CHECK_SCHED latency break.
Also fix the unlikely xa_is_value() case in memfd_wait_for_pins(): if a
page has been swapped out since memfd_tag_pins(), then its refcount must
have fallen, and so it can safely be untagged.
Link: https://lkml.kernel.org/r/a4f79248-df75-2c8c-3df-ba3317ccb5da@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Reported-by: wangyong <wang.yong12@zte.com.cn>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: CGEL ZTE <cgel.zte@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In a deep process chain with many vmas, the anon_vma_name refcount
could grow really high. With
default sysctl_max_map_count (64k) and default pid_max (32k) the max
number of vmas in the system is 2147450880 and the refcounter has
headroom of 1073774592 before it reaches REFCOUNT_SATURATED
(3221225472).
Therefore it's unlikely that an anonymous name refcounter will overflow
with these defaults. Currently the max for pid_max is PID_MAX_LIMIT
(4194304) and for sysctl_max_map_count it's INT_MAX (2147483647). In
this configuration anon_vma_name refcount overflow becomes theoretically
possible (though that still requires heavy sharing of that anon_vma_name
between processes).
The kref refcounting interface used in the anon_vma_name structure will
detect a counter overflow when it reaches the REFCOUNT_SATURATED value,
but will only generate a warning and freeze the ref counter. This would
lead to the refcounted object never being freed. A determined attacker
could leak memory like that, but it would be a rather expensive and
inefficient way to do so.
To ensure anon_vma_name refcount does not overflow, stop anon_vma_name
sharing when the refcount reaches REFCOUNT_MAX (2147483647), which still
leaves INT_MAX/2 (1073741823) values before the counter reaches
REFCOUNT_SATURATED. This should provide enough headroom for raising the
refcounts temporarily.
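The figures quoted above can be reproduced with a few lines of Python.
The constants are taken from the text; the "64k" map count is written as
65535 because that is the value that yields the quoted product, which is
an inference rather than something the text states.

```python
INT_MAX = 2**31 - 1
REFCOUNT_MAX = INT_MAX
REFCOUNT_SATURATED = 3221225472   # value quoted above (0xC0000000)

DEFAULT_MAX_MAP_COUNT = 65535     # "64k", inferred from the quoted product
DEFAULT_PID_MAX = 32768           # "32k"
PID_MAX_LIMIT = 4194304

# Upper bound on vmas, and thus on name references, with the defaults.
max_vmas_default = DEFAULT_MAX_MAP_COUNT * DEFAULT_PID_MAX
headroom_default = REFCOUNT_SATURATED - max_vmas_default

# Upper bound with both limits raised to their maximum values.
max_vmas_limit = PID_MAX_LIMIT * INT_MAX

# Room left for temporary references once sharing stops at REFCOUNT_MAX.
headroom_after_cap = REFCOUNT_SATURATED - REFCOUNT_MAX
```

With the defaults the bound stays below REFCOUNT_SATURATED, with the
maximum limits it exceeds it by a wide margin, and the cap leaves a
little over INT_MAX/2 of headroom.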
Link: https://lkml.kernel.org/r/20220223153613.835563-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Colin Cross <ccross@google.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dsp_pipeline_build() allocates the dup pointer with kstrdup(cfg),
but then advances the dup variable itself via strsep(&dup, "|").
As a result, by the time it calls kfree(dup), dup contains NULL and
the duplicated string is never freed.
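The pointer behaviour can be modelled in Python. The strsep() function
below is a simplified stand-in for the C routine (it does not modify the
buffer in place), and parse_pipeline() is a hypothetical helper; keeping
a separate cursor is shown as one possible way to preserve the original
pointer, not necessarily the way this fix does it.

```python
def strsep(cursor, delim):
    """Model of C strsep(): return (token, new_cursor).

    new_cursor is None once the last token has been consumed, which is
    the state the C pointer is left in after the parsing loop.
    """
    if cursor is None:
        return None, None
    head, sep, tail = cursor.partition(delim)
    return head, (tail if sep else None)


def parse_pipeline(cfg):
    dup = cfg        # stands in for kstrdup(cfg)
    cursor = dup     # iterate with a second variable, keep dup intact
    tokens = []
    while True:
        token, cursor = strsep(cursor, "|")
        if token is None:
            break
        tokens.append(token)
    # cursor is None here; dup still refers to the allocation to free.
    return tokens, cursor, dup
```

If the loop advanced dup directly, dup would end up as None in the same
way cursor does here.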
Found by Linux Driver Verification project (linuxtesting.org) with SVACE.
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Fixes: 960366cf8d ("Add mISDN DSP")
Signed-off-by: David S. Miller <davem@davemloft.net>
Work around the Spectre BHB issues for Cortex-A15, Cortex-A57,
Cortex-A72, Cortex-A73 and Cortex-A75. To be safe, we also include
Brahma B15, which is affected by Spectre V2 in the same ways as
Cortex-A15.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Use the linker's LOADADDR() macro to get the load address of the
sections, and provide a macro to set the start and end symbols.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Provide a couple of helpers to copy the vectors and stubs, and also
to flush the copied vectors and stubs.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
As on other architectures, add support for reporting the Spectre
vulnerability status via the sysfs CPU interface.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
The following build failure occurs when CONFIG_PPC_64S_HASH_MMU is not
set:
arch/powerpc/kernel/setup_64.c: In function ‘setup_per_cpu_areas’:
arch/powerpc/kernel/setup_64.c:811:21: error: ‘mmu_linear_psize’ undeclared (first use in this function); did you mean ‘mmu_virtual_psize’?
811 | if (mmu_linear_psize == MMU_PAGE_4K)
| ^~~~~~~~~~~~~~~~
| mmu_virtual_psize
arch/powerpc/kernel/setup_64.c:811:21: note: each undeclared identifier is reported only once for each function it appears in
Move the declaration of mmu_linear_psize outside of
CONFIG_PPC_64S_HASH_MMU ifdef.
After the above is fixed, it fails later with the following error:
ld: arch/powerpc/kexec/file_load_64.o: in function `.arch_kexec_kernel_image_probe':
file_load_64.c:(.text+0x1c1c): undefined reference to `.add_htab_mem_range'
Fix that, too, by making the add_htab_mem_range() symbol conditional
on CONFIG_PPC_64S_HASH_MMU.
Fixes: 387e220a2e ("powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU")
Reported-by: Erhard F. <erhard_f@mailbox.org>
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215567
Link: https://lore.kernel.org/r/20220301204743.45133-1-muriloo@linux.ibm.com
The commit
44a3918c82 ("x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting")
added a warning for the "eIBRS + unprivileged eBPF" combination, which
has been shown to be vulnerable against Spectre v2 BHB-based attacks.
However, there's no warning about the "eIBRS + LFENCE retpoline +
unprivileged eBPF" combo. The LFENCE adds more protection by shortening
the speculation window after a mispredicted branch. That makes an attack
significantly more difficult, even with unprivileged eBPF. So at least
for now the logic doesn't warn about that combination.
But if you then add SMT into the mix, the SMT attack angle weakens the
effectiveness of the LFENCE considerably.
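The decision this reasoning leads to can be sketched as a small
predicate in Python. The function name and the mitigation labels are
hypothetical; the sketch only encodes the combinations discussed in this
message and is not the kernel's reporting code.

```python
def should_warn(mitigation, unprivileged_ebpf, smt_active):
    """Return True when the unprivileged eBPF warning applies.

    mitigation: "eibrs" or "eibrs_lfence" (hypothetical labels)
    """
    if not unprivileged_ebpf:
        return False
    if mitigation == "eibrs":
        # Plain eIBRS with unprivileged eBPF is known to be vulnerable.
        return True
    if mitigation == "eibrs_lfence":
        # LFENCE narrows the speculation window, but SMT weakens it.
        return smt_active
    return False
```

Under this sketch, "eibrs_lfence" with unprivileged eBPF warns only when
SMT is active.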
So extend the "eIBRS + unprivileged eBPF" warning to also include the
"eIBRS + LFENCE + unprivileged eBPF + SMT" case.
[ bp: Massage commit message. ]
Suggested-by: Alyssa Milburn <alyssa.milburn@linux.intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
With:
f8a66d608a ("x86,bugs: Unconditionally allow spectre_v2=retpoline,amd")
it became possible to enable the LFENCE "retpoline" on Intel. However,
Intel doesn't recommend it, as it has some weaknesses compared to
retpoline.
Now AMD doesn't recommend it either.
It can still be left available as a cmdline option. It's faster than
retpoline but is weaker in certain scenarios -- particularly SMT, but
even non-SMT may be vulnerable in some cases.
So just unconditionally warn if the user requests it on the cmdline.
[ bp: Massage commit message. ]
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
This PHY doesn't support a link-up interrupt source. If aneg is enabled
we use the "aneg complete" interrupt for this purpose, but if aneg is
disabled link-up isn't signaled currently.
According to a vendor driver there's an additional "energy detect"
interrupt source that can be used to signal link-up if aneg is disabled.
We can safely ignore this interrupt source if aneg is enabled.
This patch was tested on a TX3 Mini TV box with S905W (even though
boot message says it's a S905D).
This issue has existed for longer, but due to changes in phylib and
the driver, the patch applies only from the commit named in the Fixes
tag onward.
Fixes: 84c8f773d2 ("net: phy: meson-gxl: remove the use of .ack_callback()")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/04cac530-ea1b-850e-6cfa-144a55c4d75d@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull block fix from Jens Axboe:
"Just a small UAF fix for blktrace"
* tag 'block-5.17-2022-03-04' of git://git.kernel.dk/linux-block:
blktrace: fix use after free for struct blk_trace
Pull RISC-V fixes from Palmer Dabbelt:
- Fixes for a handful of KASAN-related crashes.
- A fix to avoid a crash during boot for SPARSEMEM &&
!SPARSEMEM_VMEMMAP configurations.
- A fix to stop reporting some incorrect errors under DEBUG_VIRTUAL.
- A fix for the K210's device tree to properly populate the interrupt
map, so hart1 will get interrupts again.
* tag 'riscv-for-linus-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: dts: k210: fix broken IRQs on hart1
riscv: Fix kasan pud population
riscv: Move high_memory initialization to setup_bootmem
riscv: Fix config KASAN && DEBUG_VIRTUAL
riscv: Fix DEBUG_VIRTUAL false warnings
riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP
riscv: Fix is_linear_mapping with recent move of KASAN region
Pull thermal control fix from Rafael Wysocki:
"Fix NULL pointer dereference in the thermal netlink interface (Nicolas
Cavallari)"
* tag 'thermal-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: core: Fix TZ_GET_TRIP NULL pointer dereference
Pull sound fixes from Takashi Iwai:
"Hopefully the last PR for 5.17, including just a few small changes:
an additional fix for ASoC ops boundary check and other minor
device-specific fixes"
* tag 'sound-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: intel_hdmi: Fix reference to PCM buffer address
ASoC: cs4265: Fix the duplicated control name
ASoC: ops: Shift tested values in snd_soc_put_volsw() by +min
Pull drm fixes from Dave Airlie:
"Things are quieting down as expected, just a small set of fixes, i915,
exynos, amdgpu, vrr, bridge and hdlcd. Nothing scary at all.
i915:
- Fix GuC SLPC unset command
- Fix misidentification of some Apple MacBook Pro laptops as Jasper Lake
amdgpu:
- Suspend regression fix
exynos:
- irq handling fixes
- Fix two regressions to TE-gpio handling
arm/hdlcd:
- Select DRM_GEM_CMA_HELPER for HDLCD
bridge:
- ti-sn65dsi86: Properly undo autosuspend
vrr:
- Fix potential NULL-pointer deref"
* tag 'drm-fixes-2022-03-04' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: fix suspend/resume hang regression
drm/vrr: Set VRR capable prop only if it is attached to connector
drm/arm: arm hdlcd select DRM_GEM_CMA_HELPER
drm/bridge: ti-sn65dsi86: Properly undo autosuspend
drm/i915: s/JSP2/ICP2/ PCH
drm/i915/guc/slpc: Correct the param count for unset param
drm/exynos: Search for TE-gpio in DSI panel's node
drm/exynos: Don't fail if no TE-gpio is defined for DSI driver
drm/exynos: gsc: Use platform_get_irq() to get the interrupt
drm/exynos/fimc: Use platform_get_irq() to get the interrupt
drm/exynos/exynos_drm_fimd: Use platform_get_irq_byname() to get the interrupt
drm/exynos: mixer: Use platform_get_irq() to get the interrupt
drm/exynos/exynos7_drm_decon: Use platform_get_irq_byname() to get the interrupt
Pull pin control fixes from Linus Walleij:
"These two fixes should fix the issues seen on the OrangePi, first we
needed the correct offset when calling pinctrl_gpio_direction(), and
fixing that made a lockdep issue explode in our face. Both now fixed"
* tag 'pinctrl-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: sunxi: Use unique lockdep classes for IRQs
pinctrl-sunxi: sunxi_pinctrl_gpio_direction_in/output: use correct offset
__setup() handlers should generally return 1 to indicate that the
boot options have been handled.
Using invalid option values causes the entire kernel boot option
string to be reported as Unknown and added to init's environment
strings, polluting it.
Unknown kernel command line parameters "BOOT_IMAGE=/boot/bzImage-517rc6
kprobe_event=p,syscall_any,$arg1 trace_options=quiet
trace_clock=jiffies", will be passed to user space.
Run /sbin/init as init process
with arguments:
/sbin/init
with environment:
HOME=/
TERM=linux
BOOT_IMAGE=/boot/bzImage-517rc6
kprobe_event=p,syscall_any,$arg1
trace_options=quiet
trace_clock=jiffies
Return 1 from the __setup() handlers so that init's environment is not
polluted with kernel boot options.
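The effect of the return value can be illustrated with a simplified
Python model of boot option dispatch. The split_cmdline() helper and its
per-parameter handling are hypothetical simplifications; the only
behaviour taken from the text is that a handler returning 1 marks the
option as handled, while anything else leaves it to be passed on to
init.

```python
def split_cmdline(cmdline, handlers):
    """Return the parameters that would be passed on to init.

    handlers maps a parameter name to a function that returns 1 when the
    option was handled and 0 otherwise, mirroring __setup() handlers.
    """
    leftover = []
    for param in cmdline.split():
        name, _, value = param.partition("=")
        handler = handlers.get(name)
        if handler is None or handler(value) == 0:
            leftover.append(param)
    return leftover


cmdline = "trace_options=quiet trace_clock=jiffies"
returns_zero = {"trace_options": lambda v: 0, "trace_clock": lambda v: 0}
returns_one = {"trace_options": lambda v: 1, "trace_clock": lambda v: 1}
```

With returns_zero both options end up in the leftover list, and with
returns_one the list is empty.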
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Link: https://lkml.kernel.org/r/20220303031744.32356-1-rdunlap@infradead.org
Cc: stable@vger.kernel.org
Fixes: 7bcfaf54f5 ("tracing: Add trace_options kernel command line parameter")
Fixes: e1e232ca6b ("tracing: Add trace_clock=<clock> kernel parameter")
Fixes: 970988e19e ("tracing/kprobe: Add kprobe_event= boot parameter")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
syzkaller was recently triggering an oversized kvmalloc() warning via
xdp_umem_create().
The triggered warning was added back in 7661809d49 ("mm: don't allow
oversized kvmalloc() calls"). The rationale for the warning for huge
kvmalloc sizes was as a reaction to a security bug where the size was
more than UINT_MAX but not everything was prepared to handle unsigned
long sizes.
Anyway, the AF_XDP related call trace from this syzkaller report was:
kvmalloc include/linux/mm.h:806 [inline]
kvmalloc_array include/linux/mm.h:824 [inline]
kvcalloc include/linux/mm.h:829 [inline]
xdp_umem_pin_pages net/xdp/xdp_umem.c:102 [inline]
xdp_umem_reg net/xdp/xdp_umem.c:219 [inline]
xdp_umem_create+0x6a5/0xf00 net/xdp/xdp_umem.c:252
xsk_setsockopt+0x604/0x790 net/xdp/xsk.c:1068
__sys_setsockopt+0x1fd/0x4e0 net/socket.c:2176
__do_sys_setsockopt net/socket.c:2187 [inline]
__se_sys_setsockopt net/socket.c:2184 [inline]
__x64_sys_setsockopt+0xb5/0x150 net/socket.c:2184
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Björn mentioned that requests for >2GB allocation can still be valid:
The structure that is being allocated is the page-pinning accounting.
AF_XDP has an internal limit of U32_MAX pages, which is *a lot*, but
still fewer than what memcg allows (PAGE_COUNTER_MAX is a LONG_MAX/
PAGE_SIZE on 64 bit systems). [...]
I could just change from U32_MAX to INT_MAX, but as I stated earlier
that has a hacky feeling to it. [...] From my perspective, the code
isn't broken, with the memcg limits in consideration. [...]
Linus says:
[...] Pretty much every time this has come up, the kernel warning has
shown that yes, the code was broken and there really wasn't a reason
for doing allocations that big.
Of course, some people would be perfectly fine with the allocation
failing, they just don't want the warning. I didn't want __GFP_NOWARN
to shut it up originally because I wanted people to see all those
cases, but these days I think we can just say "yeah, people can shut
it up explicitly by saying 'go ahead and fail this allocation, don't
warn about it'".
So enough time has passed that by now I'd certainly be ok with [it].
Thus allow call-sites to silence such userspace triggered splats if the
allocation requests have __GFP_NOWARN. For xdp_umem_pin_pages()'s call
to kvcalloc() this is already the case, so nothing else needed there.
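The resulting behaviour can be sketched in userspace as follows. This is
a model under stated assumptions, not the kernel code: model_kvmalloc(),
MODEL_GFP_NOWARN and warn_count are hypothetical names, and malloc()
stands in for the real allocator. Oversized requests always fail; the
flag only decides whether a warning is recorded.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_GFP_NOWARN 0x1u  /* hypothetical flag bit, not the kernel value */

static unsigned int warn_count; /* stands in for the kernel warning splat */

static void *model_kvmalloc(size_t size, unsigned int flags)
{
	assert(size != 0); /* zero-size requests are out of scope for this model */

	if (size > (size_t)INT_MAX) {
		/* Fail either way; warn only if the caller did not opt out. */
		if (!(flags & MODEL_GFP_NOWARN))
			warn_count++;
		return NULL;
	}
	return malloc(size);
}
```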
Fixes: 7661809d49 ("mm: don't allow oversized kvmalloc() calls")
Reported-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
Cc: Björn Töpel <bjorn@kernel.org>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Link: https://lore.kernel.org/bpf/CAJ+HfNhyfsT5cS_U9EC213ducHs9k9zNxX9+abqC0kTrPbQ0gg@mail.gmail.com
Link: https://lore.kernel.org/bpf/20211201202905.b9892171e3f5b9a60f9da251@linux-foundation.org
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the control vq receives a VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command
request from the driver, there is presently no validation of the
number of queue pairs to configure, nor any check that multiqueue was
negotiated at all. A bogus request sent down by an untrusted driver may
therefore lead to a kernel panic due to uninitialized resources for the
queues. Tie up the loose ends there.
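A minimal sketch of the kind of check described, assuming a device that
supports max_qps queue pairs. mq_pairs_request_ok() and its parameters
are hypothetical names; only the VIRTIO_NET_F_MQ bit number is taken from
the VIRTIO specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_NET_F_MQ 22 /* feature bit number from the VIRTIO spec */

/*
 * Hypothetical validation of a VQ_PAIRS_SET request: multiqueue must have
 * been negotiated and the requested count must lie within 1..max_qps.
 */
static bool mq_pairs_request_ok(uint64_t negotiated, uint16_t requested,
				uint16_t max_qps)
{
	assert(max_qps >= 1);

	if (!(negotiated & (1ULL << VIRTIO_NET_F_MQ)))
		return false;
	return requested >= 1 && requested <= max_qps;
}
```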
Fixes: 52893733f2 ("vdpa/mlx5: Add multiqueue support")
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Link: https://lore.kernel.org/r/1642206481-30721-4-git-send-email-si-wei.liu@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Per VIRTIO v1.1 specification, section 5.1.3.1 Feature bit requirements:
"VIRTIO_NET_F_MQ Requires VIRTIO_NET_F_CTRL_VQ".
There's an assumption in the mlx5_vdpa multiqueue code that MQ must come
together with CTRL_VQ. However, nothing in the upper layer guarantees
that this assumption holds. Were an untrusted driver to send down MQ
without CTRL_VQ, it would compromise various spots, e.g.
is_index_valid() and is_ctrl_vq_idx(). Although this doesn't end up in
an immediate panic or security loophole with today's code, the chance
of it being taken advantage of after a future code change is not zero.
Harden this brittle assumption by failing the set_driver_features() call
when seeing (MQ && !CTRL_VQ). To that end, verify_min_features() is
renamed to verify_driver_features() to reflect the fact that it now does
more than just validate the minimum features. verify_driver_features()
is now used to accommodate various checks against the driver features
for set_driver_features().
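The added condition can be sketched as below. This is an illustration
only: model_verify_driver_features() and feature_mask() are hypothetical
names, the pre-existing minimum-feature check is left out, and the
-EINVAL return value is an assumption. The two bit numbers come from the
VIRTIO specification.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define VIRTIO_NET_F_CTRL_VQ 17 /* feature bit numbers from the VIRTIO spec */
#define VIRTIO_NET_F_MQ      22

static uint64_t feature_mask(unsigned int bit)
{
	assert(bit < 64); /* shifting by 64 or more would be undefined */
	return 1ULL << bit;
}

/* Reject a driver feature set that has MQ but lacks CTRL_VQ. */
static int model_verify_driver_features(uint64_t features)
{
	if ((features & feature_mask(VIRTIO_NET_F_MQ)) &&
	    !(features & feature_mask(VIRTIO_NET_F_CTRL_VQ)))
		return -EINVAL; /* assumed error code */
	return 0;
}
```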
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Link: https://lore.kernel.org/r/1642206481-30721-3-git-send-email-si-wei.liu@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Some users recently reported that MariaDB was getting a read corruption
when using io_uring on top of btrfs. This started to happen in 5.16,
after commit 51bd9563b6 ("btrfs: fix deadlock due to page faults
during direct IO reads and writes"). That changed btrfs to use the new
iomap flag IOMAP_DIO_PARTIAL and to disable page faults before calling
iomap_dio_rw(). This was necessary to fix deadlocks when the iovector
corresponds to a memory mapped file region. That type of scenario is
exercised by test case generic/647 from fstests.
For this MariaDB scenario, we attempt to read 16K from file offset X
using IOCB_NOWAIT and io_uring. In that range we have 4 extents, each
with a size of 4K, and what happens is the following:
1) btrfs_direct_read() disables page faults and calls iomap_dio_rw();
2) iomap creates a struct iomap_dio object, its reference count is
initialized to 1 and its ->size field is initialized to 0;
3) iomap calls btrfs_dio_iomap_begin() with file offset X, which finds
the first 4K extent, and sets up an iomap for this extent consisting
of a single page;
4) At iomap_dio_bio_iter(), we are able to access the first page of the
buffer (struct iov_iter) with bio_iov_iter_get_pages() without
triggering a page fault;
5) iomap submits a bio for this 4K extent
(iomap_dio_submit_bio() -> btrfs_submit_direct()) and increments
the refcount on the struct iomap_dio object to 2; The ->size field
of the struct iomap_dio object is incremented to 4K;
6) iomap calls btrfs_dio_iomap_begin() again, this time with a file
offset of X + 4K. There we set up an iomap for the next extent
that also has a size of 4K;
7) Then at iomap_dio_bio_iter() we call bio_iov_iter_get_pages(),
which tries to access the next page (2nd page) of the buffer.
This triggers a page fault and returns -EFAULT;
8) At __iomap_dio_rw() we see the -EFAULT, but we reset the error
to 0 because we passed the flag IOMAP_DIO_PARTIAL to iomap and
the struct iomap_dio object has a ->size value of 4K (we submitted
a bio for an extent already). The 'wait_for_completion' variable
is not set to true, because our iocb has IOCB_NOWAIT set;
9) At the bottom of __iomap_dio_rw(), we decrement the reference count
of the struct iomap_dio object from 2 to 1. Because we were not
the only ones holding a reference on it and 'wait_for_completion' is
set to false, -EIOCBQUEUED is returned to btrfs_direct_read(), which
just returns it up the callchain, up to io_uring;
10) The bio submitted for the first extent (step 5) completes and its
bio endio function, iomap_dio_bio_end_io(), decrements the last
reference on the struct iomap_dio object, resulting in calling
iomap_dio_complete_work() -> iomap_dio_complete().
11) At iomap_dio_complete() we adjust the iocb->ki_pos from X to X + 4K
and return 4K (the amount of io done) to iomap_dio_complete_work();
12) iomap_dio_complete_work() calls the iocb completion callback,
iocb->ki_complete() with a second argument value of 4K (total io
done) and the iocb with the adjusted ki_pos of X + 4K. This results
in completing the read request for io_uring, leaving it with a
result of 4K bytes read, and only the first page of the buffer
filled in, while the remaining 3 pages, corresponding to the other
3 extents, were not filled;
13) For the application, the result is unexpected because if we ask
to read N bytes, it expects to get N bytes read as long as those
N bytes don't cross the EOF (i_size).
MariaDB reports this as an error, as it's not expecting a short read,
since it knows it's asking for read operations fully within the i_size
boundary. This is typical in many applications, but it may also be
questionable if they should react to such short reads by issuing more
read calls to get the remaining data. Nevertheless, the short read
happened due to a change in btrfs regarding how it deals with page
faults while in the middle of a read operation, and there's no reason
why btrfs can't have the previous behaviour of returning the whole data
that was requested by the application.
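For reference, an application that does want to tolerate short reads
would typically retry until the requested length or EOF is reached. The
helper below is a generic sketch using plain pread(); pread_full() is a
hypothetical name, and this is neither what MariaDB does nor part of this
fix.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to len bytes at offset off, retrying after short reads.
 * Returns the number of bytes read (less than len only at EOF),
 * or -1 on error with errno set.
 */
static ssize_t pread_full(int fd, void *buf, size_t len, off_t off)
{
	char *p = buf;
	size_t done = 0;

	assert(buf != NULL);
	while (done < len) {
		ssize_t n = pread(fd, p + done, len - done, off + (off_t)done);

		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted, try again */
			return -1;
		}
		if (n == 0)
			break; /* EOF */
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```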
The problem can also be triggered with the following simple program:
/* Get O_DIRECT */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <liburing.h>
int main(int argc, char *argv[])
{
char *foo_path;
struct io_uring ring;
struct io_uring_sqe *sqe;
struct io_uring_cqe *cqe;
struct iovec iovec;
int fd;
long pagesize;
void *write_buf;
void *read_buf;
ssize_t ret;
int i;
if (argc != 2) {
fprintf(stderr, "Use: %s <directory>\n", argv[0]);
return 1;
}
foo_path = malloc(strlen(argv[1]) + 5);
if (!foo_path) {
fprintf(stderr, "Failed to allocate memory for file path\n");
return 1;
}
strcpy(foo_path, argv[1]);
strcat(foo_path, "/foo");
/*
* Create file foo with 2 extents, each with a size matching
* the page size. Then allocate a buffer to read both extents
* with io_uring, using O_DIRECT and IOCB_NOWAIT. Before doing
* the read with io_uring, access the first page of the buffer
* to fault it in, so that during the read we only trigger a
* page fault when accessing the second page of the buffer.
*/
fd = open(foo_path, O_CREAT | O_TRUNC | O_WRONLY |
O_DIRECT, 0666);
if (fd == -1) {
fprintf(stderr,
"Failed to create file 'foo': %s (errno %d)\n",
strerror(errno), errno);
return 1;
}
pagesize = sysconf(_SC_PAGE_SIZE);
ret = posix_memalign(&write_buf, pagesize, 2 * pagesize);
if (ret) {
fprintf(stderr, "Failed to allocate write buffer\n");
return 1;
}
memset(write_buf, 0xab, pagesize);
memset(write_buf + pagesize, 0xcd, pagesize);
/* Create 2 extents, each with a size matching page size. */
for (i = 0; i < 2; i++) {
ret = pwrite(fd, write_buf + i * pagesize, pagesize,
i * pagesize);
if (ret != pagesize) {
fprintf(stderr,
"Failed to write to file, ret = %ld errno %d (%s)\n",
ret, errno, strerror(errno));
return 1;
}
ret = fsync(fd);
if (ret != 0) {
fprintf(stderr, "Failed to fsync file\n");
return 1;
}
}
close(fd);
fd = open(foo_path, O_RDONLY | O_DIRECT);
if (fd == -1) {
fprintf(stderr,
"Failed to open file 'foo': %s (errno %d)\n",
strerror(errno), errno);
return 1;
}
ret = posix_memalign(&read_buf, pagesize, 2 * pagesize);
if (ret) {
fprintf(stderr, "Failed to allocate read buffer\n");
return 1;
}
/*
* Fault in only the first page of the read buffer.
* We want to trigger a page fault for the 2nd page of the
* read buffer during the read operation with io_uring
* (O_DIRECT and IOCB_NOWAIT).
*/
memset(read_buf, 0, 1);
ret = io_uring_queue_init(1, &ring, 0);
if (ret != 0) {
fprintf(stderr, "Failed to create io_uring queue\n");
return 1;
}
sqe = io_uring_get_sqe(&ring);
if (!sqe) {
fprintf(stderr, "Failed to get io_uring sqe\n");
return 1;
}
iovec.iov_base = read_buf;
iovec.iov_len = 2 * pagesize;
io_uring_prep_readv(sqe, fd, &iovec, 1, 0);
ret = io_uring_submit_and_wait(&ring, 1);
if (ret != 1) {
fprintf(stderr,
"Failed at io_uring_submit_and_wait()\n");
return 1;
}
ret = io_uring_wait_cqe(&ring, &cqe);
if (ret < 0) {
fprintf(stderr, "Failed at io_uring_wait_cqe()\n");
return 1;
}
printf("io_uring read result for file foo:\n\n");
printf(" cqe->res == %d (expected %ld)\n", cqe->res, 2 * pagesize);
printf(" memcmp(read_buf, write_buf) == %d (expected 0)\n",
memcmp(read_buf, write_buf, 2 * pagesize));
io_uring_cqe_seen(&ring, cqe);
io_uring_queue_exit(&ring);
return 0;
}
When running it on an unpatched kernel:
$ gcc io_uring_test.c -luring
$ mkfs.btrfs -f /dev/sda
$ mount /dev/sda /mnt/sda
$ ./a.out /mnt/sda
io_uring read result for file foo:
cqe->res == 4096 (expected 8192)
memcmp(read_buf, write_buf) == -205 (expected 0)
After this patch, the read always returns 8192 bytes, with the buffer
filled with the correct data. Although that reproducer always triggers
the bug in my test vms, it's possible that it will not be so reliable
on other environments, as that can happen if the bio for the first
extent completes and decrements the reference on the struct iomap_dio
object before we do the atomic_dec_and_test() on the reference at
__iomap_dio_rw().
Fix this in btrfs by having btrfs_dio_iomap_begin() return -EAGAIN
whenever we try to satisfy a non blocking IO request (IOMAP_NOWAIT flag
set) over a range that spans multiple extents (or a mix of extents and
holes). This avoids returning success to the caller when we only did
partial IO, which is not optimal for writes and for reads it's actually
incorrect, as the caller doesn't expect to get fewer bytes read than it has
requested (unless EOF is crossed), as previously mentioned. This is also
the type of behaviour that xfs follows (xfs_direct_write_iomap_begin()),
even though it doesn't use IOMAP_DIO_PARTIAL.
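The decision can be modelled as follows. This is a simplified sketch, not
the btrfs code: model_dio_iomap_begin() and its parameters are
hypothetical, and a single extent is described only by its start and
length.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * The request covers [start, start + len); the extent found at start
 * covers [extent_start, extent_start + extent_len). A non-blocking
 * request that reaches past the end of that extent is refused with
 * -EAGAIN so that the caller retries in a blocking context.
 */
static int model_dio_iomap_begin(uint64_t start, uint64_t len,
				 uint64_t extent_start, uint64_t extent_len,
				 bool nowait)
{
	uint64_t extent_end = extent_start + extent_len;

	assert(start >= extent_start && start < extent_end);
	if (nowait && start + len > extent_end)
		return -EAGAIN;
	return 0;
}
```

For the MariaDB case above, a 16K non-blocking read over a 4K extent now
yields -EAGAIN instead of a 4K partial result.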
A test case for fstests will follow soon.
Link: https://lore.kernel.org/linux-btrfs/CABVffEM0eEWho+206m470rtM0d9J8ue85TtR-A_oVTuGLWFicA@mail.gmail.com/
Link: https://lore.kernel.org/linux-btrfs/CAHF2GV6U32gmqSjLe=XKgfcZAmLCiH26cJ2OnHGp5x=VAH4OHQ@mail.gmail.com/
CC: stable@vger.kernel.org # 5.16+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
A common pattern for device reset is currently:
vdev->config->reset(vdev);
.. cleanup ..
reset prevents new interrupts from arriving and waits for interrupt
handlers to finish.
However if - as is common - the handler queues a work request which is
flushed during the cleanup stage, we have code adding buffers / trying
to get buffers while device is reset. Not good.
This was reproduced by running
modprobe virtio_console
modprobe -r virtio_console
in a loop.
Fix this up by calling virtio_break_device + flush before reset.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1786239
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The feature negotiation was designed in a way that
makes it possible for devices to know which config
fields will be accessed by drivers.
This is broken since commit 404123c2db ("virtio: allow drivers to
validate features") with fallout in at least block and net. We have a
partial work-around in commit 2f9a174f91 ("virtio: write back
F_VERSION_1 before validate") which at least lets devices find out which
format config space should have, but this is a partial fix: guests
should not access config space without acknowledging features since
otherwise we'll never be able to change the config space format.
To fix, split finalize_features from virtio_finalize_features and
call finalize_features with all feature bits before validation,
and then - if validation changed any bits - once again after.
Since virtio_finalize_features no longer writes out features
rename it to virtio_features_ok - since that is what it does:
checks that features are ok with the device.
As a side effect, this also reduces the amount of hypervisor accesses -
we now only acknowledge features once unless we are clearing any
features when validating (which is uncommon).
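The new ordering can be sketched with a small model. All names here
(struct model_vdev, model_negotiate() and the two sample validators) are
hypothetical; the sketch only counts how often features are written to
the device, assuming a validator that may clear bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_vdev {
	uint64_t features;            /* what the device has been told */
	unsigned int finalize_calls;  /* number of feature writes to the device */
};

/* Hypothetical driver hook: returns the feature set it can live with. */
typedef uint64_t (*model_validate_fn)(uint64_t features);

static uint64_t model_validate_keep_all(uint64_t f) { return f; }
static uint64_t model_validate_clear_bit0(uint64_t f) { return f & ~1ULL; }

static void model_finalize_features(struct model_vdev *dev, uint64_t features)
{
	dev->features = features;
	dev->finalize_calls++;
}

static void model_negotiate(struct model_vdev *dev, uint64_t features,
			    model_validate_fn validate)
{
	assert(dev != NULL);

	/* Acknowledge all bits first, before validation touches config space. */
	model_finalize_features(dev, features);
	if (validate) {
		uint64_t validated = validate(features);

		/* Write again only if validation changed any bits. */
		if (validated != features)
			model_finalize_features(dev, validated);
	}
}
```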
IIRC this was more or less always the intent in the spec, but
unfortunately the way the spec is worded does not say so explicitly. I
plan to address this at the spec level, too.
Acked-by: Jason Wang <jasowang@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 404123c2db ("virtio: allow drivers to validate features")
Fixes: 2f9a174f91 ("virtio: write back F_VERSION_1 before validate")
Cc: "Halil Pasic" <pasic@linux.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio_finalize_features is only used internally within virtio.
No reason to export it.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
There is an oddity in the way the RSR register flags propagate to the
ISR register (and the actual interrupt output) on this hardware: it
appears that RSR register bits only result in ISR being asserted if the
interrupt was actually enabled at the time, so enabling interrupts with
RSR bits already set doesn't trigger an interrupt to be raised. There
was already a partial fix for this race in the macb_poll function where
it checked for RSR bits being set and re-triggered NAPI receive.
However, there was still a race window between checking RSR and
actually enabling interrupts, where a lost wakeup could happen. It's
necessary to check again after enabling interrupts to see if RSR was set
just prior to the interrupt being enabled, and re-trigger receive in that
case.
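The race and the added re-check can be modelled in userspace. This is a
sketch with hypothetical names (struct model_hw, model_poll_done()), not
the macb driver code; it assumes only the hardware behaviour described
above, i.e. that a status bit set while the interrupt is disabled never
raises an interrupt later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_hw {
	bool rsr_pending;  /* a receive status bit is set */
	bool irq_enabled;  /* RX interrupt is enabled */
	bool irq_raised;   /* interrupt output is asserted */
};

/* A packet lands: an interrupt is raised only if it is enabled right now. */
static void model_packet_arrives(struct model_hw *hw)
{
	hw->rsr_pending = true;
	if (hw->irq_enabled)
		hw->irq_raised = true;
}

/*
 * End of a poll cycle. Returns true when receive must be re-triggered.
 * race_window, if non-NULL, runs between the first check and the enable.
 */
static bool model_poll_done(struct model_hw *hw,
			    void (*race_window)(struct model_hw *))
{
	assert(hw != NULL);

	if (hw->rsr_pending)
		return true;            /* the pre-existing check */
	if (race_window)
		race_window(hw);
	hw->irq_enabled = true;
	if (hw->rsr_pending) {          /* the added re-check */
		hw->irq_enabled = false;
		return true;
	}
	return false;
}
```

Without the second check, a packet arriving in the window would leave
rsr_pending set with no interrupt raised, which is the lost wakeup.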
This issue was noticed in a point-to-point UDP request-response protocol
which periodically saw timeouts or abnormally high response times due to
received packets not being processed in a timely fashion. In many
applications, more packets arriving, including TCP retransmissions, would
cause the original packet to be processed, thus masking the issue.
Fixes: 02f7a34f34 ("net: macb: Re-enable RX interrupt only when RX is done")
Cc: stable@vger.kernel.org
Co-developed-by: Scott McNutt <scott.mcnutt@siriusxm.com>
Signed-off-by: Scott McNutt <scott.mcnutt@siriusxm.com>
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix regression with processing of MGMT commands
- Fix unbalanced unlock in Set Device Flags
* tag 'for-net-2022-03-03' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_sync: Fix not processing all entries on cmd_sync_work
Bluetooth: hci_core: Fix unbalanced unlock in set_device_flags()
====================
Link: https://lore.kernel.org/r/20220303210743.314679-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 67d96729a9 ("riscv: Update Canaan Kendryte K210 device tree")
incorrectly removed two entries from the PLIC interrupt-controller node's
interrupts-extended property.
The PLIC driver cannot know the mapping between hart contexts and hart ids,
so this information has to be provided by device tree, as specified by the
PLIC device tree binding.
The PLIC driver uses the interrupts-extended property, and initializes the
hart context registers in the exact same order as provided by the
interrupts-extended property.
In other words, if we don't specify the S-mode interrupts, the PLIC driver
will simply initialize the hart0 S-mode hart context with the hart1 M-mode
configuration. It is therefore essential to specify the S-mode IRQs even
though the system itself will only ever be running in M-mode.
Re-add the S-mode interrupts, so that we get working IRQs on hart1 again.
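The positional mapping can be illustrated with a small model. The names
below are hypothetical, and the two-hart context layout (M-mode then
S-mode for each hart) is an assumption inferred from the description
above rather than taken from the PLIC driver.

```c
#include <assert.h>
#include <stddef.h>

enum model_mode { MODEL_M_MODE, MODEL_S_MODE };

struct model_ctx {
	int hart;
	enum model_mode mode;
};

/* Hardware context order assumed for a two-hart PLIC in this model. */
static const struct model_ctx model_hw_layout[] = {
	{ 0, MODEL_M_MODE }, { 0, MODEL_S_MODE },
	{ 1, MODEL_M_MODE }, { 1, MODEL_S_MODE },
};

/*
 * The i-th device tree entry configures the i-th hardware context.
 * Returns how many entries land on a context they do not describe.
 */
static size_t model_count_mismatches(const struct model_ctx *dt, size_t n)
{
	size_t bad = 0;

	assert(n <= sizeof(model_hw_layout) / sizeof(model_hw_layout[0]));
	for (size_t i = 0; i < n; i++)
		if (dt[i].hart != model_hw_layout[i].hart ||
		    dt[i].mode != model_hw_layout[i].mode)
			bad++;
	return bad;
}
```

Listing only the two M-mode entries gives one mismatch: the hart1 M-mode
entry ends up configuring the hart0 S-mode context.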
Cc: <stable@vger.kernel.org>
Fixes: 67d96729a9 ("riscv: Update Canaan Kendryte K210 device tree")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
In sv48, the kasan inner regions are not aligned on PGDIR_SIZE and then
when we populate the kasan linear mapping region, we clear the kasan
vmalloc region which is in the same PGD.
Fix this by copying the content of the kasan early pud after allocating a
new PGD for the first time.
Fixes: e8a62cc26d ("riscv: Implement sv48 support")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
high_memory used to be initialized in mem_init, way after setup_bootmem.
But a call to dma_contiguous_reserve in this function gives rise to the
below warning because high_memory is equal to 0 and is used at the very
beginning at cma_declare_contiguous_nid.
It went unnoticed since the move of the kasan region redefined
KERN_VIRT_SIZE so that it does not encompass -1 anymore.
Fix this by initializing high_memory in setup_bootmem.
------------[ cut here ]------------
virt_to_phys used for non-linear address: ffffffffffffffff (0xffffffffffffffff)
WARNING: CPU: 0 PID: 0 at arch/riscv/mm/physaddr.c:14 __virt_to_phys+0xac/0x1b8
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-rc1-00007-ga68b89289e26 #27
Hardware name: riscv-virtio,qemu (DT)
epc : __virt_to_phys+0xac/0x1b8
ra : __virt_to_phys+0xac/0x1b8
epc : ffffffff80014922 ra : ffffffff80014922 sp : ffffffff84a03c30
gp : ffffffff85866c80 tp : ffffffff84a3f180 t0 : ffffffff86bce657
t1 : fffffffef09406e8 t2 : 0000000000000000 s0 : ffffffff84a03c70
s1 : ffffffffffffffff a0 : 000000000000004f a1 : 00000000000f0000
a2 : 0000000000000002 a3 : ffffffff8011f408 a4 : 0000000000000000
a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffff84a03747
s2 : ffffffd800000000 s3 : ffffffff86ef4000 s4 : ffffffff8467f828
s5 : fffffff800000000 s6 : 8000000000006800 s7 : 0000000000000000
s8 : 0000000480000000 s9 : 0000000080038ea0 s10: 0000000000000000
s11: ffffffffffffffff t3 : ffffffff84a035c0 t4 : fffffffef09406e8
t5 : fffffffef09406e9 t6 : ffffffff84a03758
status: 0000000000000100 badaddr: 0000000000000000 cause: 0000000000000003
[<ffffffff8322ef4c>] cma_declare_contiguous_nid+0xf2/0x64a
[<ffffffff83212a58>] dma_contiguous_reserve_area+0x46/0xb4
[<ffffffff83212c3a>] dma_contiguous_reserve+0x174/0x18e
[<ffffffff83208fc2>] paging_init+0x12c/0x35e
[<ffffffff83206bd2>] setup_arch+0x120/0x74e
[<ffffffff83201416>] start_kernel+0xce/0x68c
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<0000000000000000>] 0x0
softirqs last enabled at (0): [<0000000000000000>] 0x0
softirqs last disabled at (0): [<0000000000000000>] 0x0
---[ end trace 0000000000000000 ]---
Fixes: f7ae02333d ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The __virt_to_phys function is called very early in the boot process
(i.e. from kasan_early_init), so it must not be instrumented by KASAN;
otherwise it triggers a bug.
Fix this by declaring physaddr.c as non-KASAN-instrumentable.
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Fixes: 8ad8b72721 ("riscv: Add KASAN support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
KERN_VIRT_SIZE used to encompass the kernel mapping before it was
redefined when moving the kasan mapping next to the kernel mapping to only
match the maximum amount of physical memory.
Then, kernel mapping addresses that go through __virt_to_phys are now
declared as wrong which is not true, one can use __virt_to_phys on such
addresses.
Fix this by redefining the condition that matches wrong addresses.
Fixes: f7ae02333d ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
In order to get the pfn of a struct page* when sparsemem is enabled
without vmemmap, the mem_section structures need to be initialized which
happens in sparse_init.
But kasan_early_init calls pfn_to_page way before sparse_init is called,
which then tries to dereference a null mem_section pointer.
Fix this by removing the usage of this function in kasan_early_init.
Fixes: 8ad8b72721 ("riscv: Add KASAN support")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The KASAN region was recently moved between the linear mapping and the
kernel mapping, is_linear_mapping used to check the validity of an
address by using the start of the kernel mapping, which is now wrong.
Fix this by using the maximum size of the physical memory.
Fixes: f7ae02333d ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The patchwork link is dead. It says:
404: File not found
The page URL requested (/project/LKML/list/) does not exist.
Remove it.
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from can, xfrm, wifi, bluetooth, and netfilter.
Lots of various size fixes, the length of the tag speaks for itself.
Most of the 5.17-relevant stuff comes from xfrm, wifi and bt trees
which had been lagging as you pointed out previously. But there's also
a larger than we'd like portion of fixes for bugs from previous
releases.
Three more fixes still under discussion, including an xfrm revert for
uAPI error.
Current release - regressions:
- iwlwifi: don't advertise TWT support, prevent FW crash
- xfrm: fix the if_id check in changelink
- xen/netfront: destroy queues before real_num_tx_queues is zeroed
- bluetooth: fix not checking MGMT cmd pending queue, make scanning
work again
Current release - new code bugs:
- mptcp: make SIOCOUTQ accurate for fallback socket
- bluetooth: access skb->len after null check
- bluetooth: hci_sync: fix not using conn_timeout
- smc: fix cleanup when register ULP fails
- dsa: restore error path of dsa_tree_change_tag_proto
- iwlwifi: fix build error for IWLMEI
- iwlwifi: mvm: propagate error from request_ownership to the user
Previous releases - regressions:
- xfrm: fix pMTU regression when reported pMTU is too small
- xfrm: fix TCP MSS calculation when pMTU is close to 1280
- bluetooth: fix bt_skb_sendmmsg not allocating partial chunks
- ipv6: ensure we call ipv6_mc_down() at most once, prevent leaks
- ipv6: prevent leaks in igmp6 when input queues get full
- fix up skbs delta_truesize in UDP GRO frag_list
- eth: e1000e: fix possible HW unit hang after an s0ix exit
- eth: e1000e: correct NVM checksum verification flow
- ptp: ocp: fix large time adjustments
Previous releases - always broken:
- tcp: make tcp_read_sock() more robust in presence of urgent data
- xfrm: distinguishing SAs and SPs by if_id in xfrm_migrate
- xfrm: fix xfrm_migrate issues when address family changes
- dcb: flush lingering app table entries for unregistered devices
- smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
- mac80211: fix EAPoL rekey fail in 802.3 rx path
- mac80211: fix forwarded mesh frames AC & queue selection
- netfilter: nf_queue: fix socket access races and bugs
- batman-adv: fix ToCToU iflink problems and check the result belongs
to the expected net namespace
- can: gs_usb, etas_es58x: fix opened_channel_cnt's accounting
- can: rcar_canfd: register the CAN device when fully ready
- eth: igb, igc: phy: drop premature return leaking HW semaphore
- eth: ixgbe: xsk: change !netif_carrier_ok() handling in
ixgbe_xmit_zc(), prevent live lock when link goes down
- eth: stmmac: only enable DMA interrupts when ready
- eth: sparx5: move vlan checks before any changes are made
- eth: iavf: fix races around init, removal, resets and vlan ops
- ibmvnic: more reset flow fixes
Misc:
- eth: fix return value of __setup handlers"
* tag 'net-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()
net: dsa: make dsa_tree_change_tag_proto actually unwind the tag proto change
ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()
selftests: mlxsw: resource_scale: Fix return value
selftests: mlxsw: tc_police_scale: Make test more robust
net: dcb: disable softirqs in dcbnl_flush_dev()
bnx2: Fix an error message
sfc: extend the locking on mcdi->seqno
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()
tcp: make tcp_read_sock() more robust
bpf, sockmap: Do not ignore orig_len parameter
net: ipa: add an interconnect dependency
net: fix up skbs delta_truesize in UDP GRO frag_list
iwlwifi: mvm: return value for request_ownership
nl80211: Update bss channel on channel switch for P2P_CLIENT
iwlwifi: fix build error for IWLMEI
ptp: ocp: Add ptp_ocp_adjtime_coarse for large adjustments
batman-adv: Don't expect inter-netns unique iflink indices
...
Pull MIPS fixes from Thomas Bogendoerfer:
- Fix memory detection for MT7621 devices
- Fix setnocoherentio kernel option
- Fix warning when CONFIG_SCHED_CORE is enabled
* tag 'mips-fixes-5.17_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: ralink: mt7621: use bitwise NOT instead of logical
mips: setup: fix setnocoherentio() boolean setting
MIPS: smp: fill in sibling and core maps earlier
MIPS: ralink: mt7621: do memory detection on KSEG1
Pull auxdisplay fixes from Miguel Ojeda:
"A few lcd2s fixes from Andy Shevchenko"
* tag 'auxdisplay-for-linus-v5.17-rc7' of git://github.com/ojeda/linux:
auxdisplay: lcd2s: Use proper API to free the instance of charlcd object
auxdisplay: lcd2s: Fix memory leak in ->remove()
auxdisplay: lcd2s: Fix lcd2s_redefine_char() feature
The blamed commit said one thing but did another. It explains that we
should restore the "return err" to the original "goto out_unwind_tagger",
but instead it replaced it with "goto out_unlock".
When DSA_NOTIFIER_TAG_PROTO fails after the first switch of a
multi-switch tree, the switches would end up not using the same tagging
protocol.
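The intended unwind can be sketched like this. It is a userspace model
with hypothetical names (struct model_switch, model_change_tag_proto()),
not the DSA code, and the error code is chosen for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_switch {
	int proto;            /* current tagging protocol */
	bool rejects_change;  /* simulate a failing notifier on this switch */
};

static int model_change_tag_proto(struct model_switch *sw, size_t n,
				  int old_proto, int new_proto)
{
	size_t i;

	assert(sw != NULL);
	for (i = 0; i < n; i++) {
		if (sw[i].rejects_change)
			goto out_unwind_tagger;
		sw[i].proto = new_proto;
	}
	return 0;

out_unwind_tagger:
	/* Roll back the switches that were already converted. */
	while (i-- > 0)
		sw[i].proto = old_proto;
	return -EINVAL; /* error code chosen for illustration */
}
```

Jumping straight to an unlock label instead would skip the rollback loop
and leave the first switch on the new protocol.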
Fixes: 0b0e2ff103 ("net: dsa: restore error path of dsa_tree_change_tag_proto")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220303154249.1854436-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit c685c69fba ("ixgbe: don't do any AF_XDP zero-copy transmit if
netif is not OK") addressed the ring transient state when
MEM_TYPE_XSK_BUFF_POOL was being configured which in turn caused the
interface to go through down/up. Maurice reported that when carrier is not
ok and xsk_pool is present on ring pair, ksoftirqd will consume 100% CPU
cycles due to the constant NAPI rescheduling as ixgbe_poll() states that
there is still some work to be done.
To fix this, do not set work_done to false for a !netif_carrier_ok().
Fixes: c685c69fba ("ixgbe: don't do any AF_XDP zero-copy transmit if netif is not OK")
Reported-by: Maurice Baijens <maurice.baijens@ellips.com>
Tested-by: Maurice Baijens <maurice.baijens@ellips.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel says:
====================
selftests: mlxsw: A couple of fixes
Patch #1 fixes a breakage due to a change in iproute2 output. The real
problem is not iproute2, but the fact that the check was not strict
enough. Fixed by using JSON output instead. Targeting at net so that the
test will pass as part of old and new kernels regardless of iproute2
version.
Patch #2 fixes an issue uncovered by the first one.
====================
Link: https://lore.kernel.org/r/20220302161447.217447-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test runs several test cases and is supposed to return an error in
case at least one of them failed.
Currently, the check of the return value of each test case is in the
wrong place, which can result in the wrong return value. For example:
# TESTS='tc_police' ./resource_scale.sh
TEST: 'tc_police' [default] 968 [FAIL]
tc police offload count failed
Error: mlxsw_spectrum: Failed to allocate policer index.
We have an error talking to the kernel
Command failed /tmp/tmp.i7Oc5HwmXY:969
TEST: 'tc_police' [default] overflow 969 [ OK ]
...
TEST: 'tc_police' [ipv4_max] overflow 969 [ OK ]
$ echo $?
0
Fix this by moving the check to be done after each test case.
Fixes: 059b18e21c ("selftests: mlxsw: Return correct error code in resource scale test")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test adds tc filters and checks how many of them were offloaded by
grepping for 'in_hw'.
iproute2 commit f4cd4f127047 ("tc: add skip_hw and skip_sw to control
action offload") added offload indication to tc actions, producing the
following output:
$ tc filter show dev swp2 ingress
...
filter protocol ipv6 pref 1000 flower chain 0 handle 0x7c0
eth_type ipv6
dst_ip 2001:db8:1::7bf
skip_sw
in_hw in_hw_count 1
action order 1: police 0x7c0 rate 10Mbit burst 100Kb mtu 2Kb action drop overhead 0b
ref 1 bind 1
not_in_hw
used_hw_stats immediate
The current grep expression matches on both 'in_hw' and 'not_in_hw',
resulting in incorrect results.
Fix that by using JSON output instead.
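The substring problem is easy to reproduce outside of the test. The
following is a minimal sketch in C, not the selftest itself (which is a
shell script and now relies on tc's JSON output). The helpers
count_substr() and count_token() are hypothetical names introduced only
for illustration: the first behaves like a plain grep for a substring,
the second only accepts whole whitespace-delimited tokens.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts every occurrence of needle, like a plain grep. */
static size_t count_substr(const char *text, const char *needle)
{
	size_t n = 0;
	size_t len = strlen(needle);

	if (len == 0)
		return 0;
	for (const char *p = strstr(text, needle); p; p = strstr(p + len, needle))
		n++;
	return n;
}

/* Hypothetical helper: counts only whole tokens that are equal to word. */
static size_t count_token(const char *text, const char *word)
{
	size_t n = 0;
	size_t len = strlen(word);
	const char *p = text;

	while (*p) {
		p += strspn(p, " \t\n");            /* skip separators */
		size_t tok = strcspn(p, " \t\n");   /* length of the current token */

		if (tok == len && strncmp(p, word, len) == 0)
			n++;
		p += tok;
	}
	return n;
}
```

On the output quoted above, the substring count also picks up
'in_hw_count' and 'not_in_hw', while the token count only sees the one
filter that is really offloaded.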
Fixes: 5061e77326 ("selftests: mlxsw: Add scale test for tc-police")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel points out that since commit 52cff74eef ("dcbnl : Disable
software interrupts before taking dcb_lock"), the DCB API can be called
by drivers from softirq context.
One such in-tree example is the chelsio cxgb4 driver:
dcb_rpl
-> cxgb4_dcb_handle_fw_update
-> dcb_ieee_setapp
If the firmware for this driver happened to send an event which resulted
in a call to dcb_ieee_setapp() at the exact same time as another
DCB-enabled interface was unregistering on the same CPU, the softirq
would deadlock, because the interrupted process was already holding the
dcb_lock in dcbnl_flush_dev().
Fix this unlikely event by using spin_lock_bh() in dcbnl_flush_dev() as
in the rest of the dcbnl code.
Fixes: 91b0383fef ("net: dcb: flush lingering app table entries for unregistered devices")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220302193939.1368823-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
seqno could be read as a stale value outside of the lock. The lock is
already acquired to protect the modification of seqno against a possible
race condition. Move the read of this value inside the locked section as
well, so that it is protected against the same race.
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is only one "goto done;" in set_device_flags() and this happens
*before* hci_dev_lock() is called. Move the done label to after the
hci_dev_unlock() to fix the following unlock imbalance:
[ 31.493567] =====================================
[ 31.493571] WARNING: bad unlock balance detected!
[ 31.493576] 5.17.0-rc2+ #13 Tainted: G C E
[ 31.493581] -------------------------------------
[ 31.493584] bluetoothd/685 is trying to release lock (&hdev->lock) at:
[ 31.493594] [<ffffffffc07603f5>] set_device_flags+0x65/0x1f0 [bluetooth]
[ 31.493684] but there are no more locks to release!
Note this bug has been around for a couple of years, but before
commit fe92ee6425 ("Bluetooth: hci_core: Rework hci_conn_params flags")
supported_flags was hardcoded to "((1U << HCI_CONN_FLAG_MAX) - 1)" so
the check for unsupported flags which does the "goto done;" never
triggered.
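The effect of the label position can be shown with a toy model. This is
only a sketch: struct toy_mutex and its helpers are hypothetical and
merely count the hold depth, they are not the hci_dev_lock() API, and
the two functions only mimic the control flow described above.

```c
#include <stdbool.h>

/* Toy lock that only tracks its hold depth; it stands in for hdev->lock. */
struct toy_mutex {
	int depth;
};

static void toy_acquire(struct toy_mutex *m) { m->depth++; }
static void toy_release(struct toy_mutex *m) { m->depth--; }

/*
 * Buggy shape: "done" sits before the unlock, so the early exit releases
 * a lock that was never taken. Returns the final hold depth.
 */
static int set_flags_buggy(struct toy_mutex *m, bool flags_supported)
{
	if (!flags_supported)
		goto done;              /* jumps before the lock is acquired */

	toy_acquire(m);
	/* ... update the flags under the lock ... */
done:
	toy_release(m);
	return m->depth;
}

/* Fixed shape: "done" sits after the unlock, so both paths are balanced. */
static int set_flags_fixed(struct toy_mutex *m, bool flags_supported)
{
	if (!flags_supported)
		goto done;

	toy_acquire(m);
	/* ... update the flags under the lock ... */
	toy_release(m);
done:
	return m->depth;
}
```

A final depth of -1 in the buggy variant corresponds to the "no more
locks to release" report from lockdep.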
Fixes: fe92ee6425 ("Bluetooth: hci_core: Rework hci_conn_params flags")
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
D. Wythe says:
====================
fix unexpected SMC_CLC_DECL_ERR_REGRMB error
We can easily trigger the SMC_CLC_DECL_ERR_REGRMB exception with the
following script:
server: smc_run nginx
client: smc_run ./wrk -c 2000 -t 8 -d 20 http://smc-server
The error shows up in two forms:
1. 0x09990003
2. 0x05000000/0x09990003
Both have the same root cause, but their immediate causes differ.
The root cause is that removing a connection from the link group is
not synchronized with adding/deleting its rtoken entry. Even if the
number of connections is less than SMC_RMBS_PER_LGR_MAX, this does not
mean that a new connection can register its rtoken successfully later.
In other words, the rtoken entry may not have been released yet. This
causes an unexpected SMC_CLC_DECL_ERR_REGRMB to be reported, and the SMC
connection then has to fall back to TCP.
This patch set handles two types of SMC_CLC_DECL_ERR_REGRMB exceptions
from different perspectives.
Patch 1: fix the 0x05000000/0x09990003 error.
Patch 2: fix the 0x09990003 error.
After these patches, there are no SMC_CLC_DECL_ERR_REGRMB exceptions in
my test case any more.
v1 -> v2:
- add bugfix patch for SMC_CLC_DECL_ERR_REGRMB caused by server side
v2 -> v3:
- fix incorrect mail thread
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The cause of SMC_CLC_DECL_ERR_REGRMB on the server is clear: whether a
new SMC connection can be accepted depends not only on the limit on the
number of connections, but also on the available rtoken entries. Since
the rtoken release is triggered by the peer, while the connection count
is decreased locally, many things can happen in the window between the
two.
The only thing that needs to be mentioned is that all connection
creations are now completely protected by the smc_server_lgr_pending
lock, so it is enough to check only the available entries in
rtokens_used_mask.
Fixes: cd6851f303 ("smc: remote memory buffers (RMBs)")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The main reason for this unexpected SMC_CLC_DECL_ERR_REGRMB on the
client is the following execution sequence:
Server Conn A: Server Conn B: Client Conn B:
smc_lgr_unregister_conn
smc_lgr_register_conn
smc_clc_send_accept ->
smc_rtoken_add
smcr_buf_unuse
-> Client Conn A:
smc_rtoken_delete
smc_lgr_unregister_conn() makes the current link available to be
assigned to a new incoming connection while smcr_buf_unuse() has not
executed yet, which means that smc_rtoken_add() may fail because of
insufficient rtoken entries. Reversing their execution order avoids
this problem.
Fixes: 3e034725c0 ("net/smc: common functions for RMBs and send buffers")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During driver initialization, the pointer of card info, i.e. the
variable 'ci' is required. However, the definition of
'com20020pci_id_table' reveals that this field is empty for some
devices, which will cause null pointer dereference when initializing
these devices.
The following log reveals it:
[ 3.973806] KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
[ 3.973819] RIP: 0010:com20020pci_probe+0x18d/0x13e0 [com20020_pci]
[ 3.975181] Call Trace:
[ 3.976208] local_pci_probe+0x13f/0x210
[ 3.977248] pci_device_probe+0x34c/0x6d0
[ 3.977255] ? pci_uevent+0x470/0x470
[ 3.978265] really_probe+0x24c/0x8d0
[ 3.978273] __driver_probe_device+0x1b3/0x280
[ 3.979288] driver_probe_device+0x50/0x370
Fix this by checking whether the 'ci' is a null pointer first.
Fixes: 8c14f9c703 ("ARCNET: add com20020 PCI IDs with metadata")
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to function, the IPA driver very clearly requires the
interconnect framework to be enabled in the kernel configuration.
State that dependency in the Kconfig file.
This became a problem when CONFIG_COMPILE_TEST support was added.
Non-Qualcomm platforms won't necessarily enable CONFIG_INTERCONNECT.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 38a4066f59 ("net: ipa: support COMPILE_TEST")
Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20220301113440.257916-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The truesize of a UDP GRO packet is the sum of the truesize of the main
skb and of the skbs in the main skb's frag_list:
skb_gro_receive_list
p->truesize += skb->truesize;
Commit 53475c5dd8 ("net: fix use-after-free when UDP GRO with
shared fraglist") introduced a truesize increase for frag_list skbs.
When uncloning an skb, it calls pskb_expand_head() and the truesize of
frag_list skbs may increase. This can occur when the allocator uses
__netdev_alloc_skb() and does not go through __alloc_skb(). That flow
does not use ksize(len) to calculate truesize, while pskb_expand_head()
does.
skb_segment_list
err = skb_unclone(nskb, GFP_ATOMIC);
pskb_expand_head
if (!skb->sk || skb->destructor == sock_edemux)
skb->truesize += size - osize;
If we use the increased truesize as delta_truesize, it will be larger
than before, and even larger than the previous total truesize if there
are many skbs in the frag_list. The main skb truesize then becomes
smaller, and may even go negative, which is a huge value for an
unsigned int field. The following memory check then drops this
abnormal skb.
To avoid this error we should use the original truesize to segment the
main skb.
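The wrap-around can be illustrated with plain unsigned arithmetic. This
is a minimal sketch, not the skb_segment_list() code: the helper name
head_truesize_after() and all sizes used with it are made up for
illustration, and only the unsigned subtraction mirrors what the text
describes.

```c
#include <stdint.h>

/*
 * Hypothetical helper: subtracts the truesize of every segmented
 * frag_list skb from the head skb's truesize, using unsigned 32-bit
 * arithmetic like an unsigned int truesize field.
 */
static uint32_t head_truesize_after(uint32_t head_truesize,
				    const uint32_t *seg_truesize, int nsegs)
{
	uint32_t delta_truesize = 0;

	for (int i = 0; i < nsegs; i++)
		delta_truesize += seg_truesize[i];

	/* Wraps to a huge value if delta_truesize exceeds head_truesize. */
	return head_truesize - delta_truesize;
}
```

With three segments that were accounted at 768 bytes each on top of a
1000 byte head (3304 in total), subtracting the original sizes leaves
1000, while subtracting sizes that grew to 1280 after the expansion
wraps around to a value close to 2^32.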
Fixes: 53475c5dd8 ("net: fix use-after-free when UDP GRO with shared fraglist")
Signed-off-by: lena wang <lena.wang@mediatek.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1646133431-8948-1-git-send-email-lena.wang@mediatek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- Remove redundant iflink requests, by Sven Eckelmann (2 patches)
- Don't expect inter-netns unique iflink indices, by Sven Eckelmann
* tag 'batadv-net-pullrequest-20220302' of git://git.open-mesh.org/linux-merge:
batman-adv: Don't expect inter-netns unique iflink indices
batman-adv: Request iflink once in batadv_get_real_netdevice
batman-adv: Request iflink once in batadv-on-batadv check
====================
Link: https://lore.kernel.org/r/20220302163049.101957-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Johannes Berg says:
====================
Three more fixes:
- fix build issue in iwlwifi, now that I understood
what's going on there
- propagate error in iwlwifi/mvm to userspace so it
can figure out what's happening
- fix channel switch related updates in P2P-client
in cfg80211
* tag 'wireless-for-net-2022-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
iwlwifi: mvm: return value for request_ownership
nl80211: Update bss channel on channel switch for P2P_CLIENT
iwlwifi: fix build error for IWLMEI
====================
Link: https://lore.kernel.org/r/20220302214444.100180-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull ucounts fix from Eric Biederman:
"Etienne Dechamps recently found a regression caused by enforcing
RLIMIT_NPROC for root where the rlimit was not previously enforced.
Michal Koutný had previously pointed out the inconsistency in
enforcing the RLIMIT_NPROC that had been on the root owned process
after the root user creates a user namespace.
Which makes the fix for the regression simply removing the
inconsistency"
* 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ucounts: Fix systemd LimitNPROC with private users regression
Pull ARM fixes from Russell King:
- Fix kgdb breakpoint for Thumb2
- Fix dependency for BITREVERSE kconfig
- Fix nommu early_params and __setup returns
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions
ARM: 9178/1: fix unmet dependency on BITREVERSE for HAVE_ARCH_BITREVERSE
ARM: Fix kgdb breakpoint for Thumb2
While it might work, the current approach is fragile in a few ways:
- whenever members in the structure are shuffled, the pointer will be wrong
- the resource freeing may include more than covered by kfree()
Fix this by using charlcd_free() call instead of kfree().
Fixes: 8c9108d014 ("auxdisplay: add a driver for lcd2s character display")
Cc: Lars Poeschel <poeschel@lemonage.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Once allocated the struct lcd2s_data is never freed.
Fix the memory leak by switching to devm_kzalloc().
Fixes: 8c9108d014 ("auxdisplay: add a driver for lcd2s character display")
Cc: Lars Poeschel <poeschel@lemonage.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
It seems that the lcd2s_redefine_char() has never been properly
tested. The buffer is filled by DEF_CUSTOM_CHAR command followed
by the character number (from 0 to 7), but immediately after that
these bytes are rewritten by the decoded hex stream.
Fix the index to fill the buffer after the command and number.
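The intended buffer layout can be sketched in userspace C. This is an
illustration under assumptions, not the driver code: the opcode value
TOY_CMD_DEF_CUSTOM_CHAR is a placeholder, and hex_nibble() and
build_custom_char() are hypothetical helpers. The point is only that
the decoded bytes start at index 2, after the command and the character
number.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_CMD_DEF_CUSTOM_CHAR 0x01	/* placeholder, not the real opcode */

/* Returns the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Fills buf as [command][char number][decoded bytes...].
 * Returns the number of bytes written, or 0 on malformed input or if
 * the buffer is too small.
 */
static size_t build_custom_char(uint8_t *buf, size_t buflen,
				uint8_t number, const char *hex)
{
	size_t i = 2;	/* payload starts after command and number */

	if (buflen < 2)
		return 0;
	buf[0] = TOY_CMD_DEF_CUSTOM_CHAR;
	buf[1] = number;

	while (hex[0] && hex[1]) {
		int hi = hex_nibble(hex[0]);
		int lo = hex_nibble(hex[1]);

		if (hi < 0 || lo < 0 || i >= buflen)
			return 0;
		buf[i++] = (uint8_t)((hi << 4) | lo);
		hex += 2;
	}
	if (hex[0])	/* odd number of hex digits */
		return 0;
	return i;
}
```

Starting the payload index at 0 instead of 2 would reproduce the bug:
the first two decoded bytes would overwrite the command and the
character number.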
Fixes: 8c9108d014 ("auxdisplay: add a driver for lcd2s character display")
Cc: Lars Poeschel <poeschel@lemonage.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
[fixed typo in commit message]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
The wdev channel information is updated post channel switch only for
the station mode and not for the other modes. Due to this, the P2P client
still points to the old value though it moved to the new channel
when the channel change is induced from the P2P GO.
Update the bss channel after CSA channel switch completion for P2P client
interface as well.
Signed-off-by: Sreeramya Soratkal <quic_ssramya@quicinc.com>
Link: https://lore.kernel.org/r/1646114600-31479-1-git-send-email-quic_ssramya@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pull erofs fix from Gao Xiang:
"A one-line patch to fix the new ztailpacking feature on > 4GiB
filesystems because z_idataoff can get trimmed improperly.
ztailpacking is still a brand new EXPERIMENTAL feature, but it'd be
better to fix the issue as soon as possible to avoid unnecessary
backporting.
Summary:
- Fix ztailpacking z_idataoff getting trimmed on > 4GiB filesystems"
* tag 'erofs-for-5.17-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix ztailpacking on > 4GiB filesystems
Pull NTB fixes from Jon Mason:
"Bug fixes for sparse warning, intel port config offset, and a new
mailing list"
* tag 'ntb-5.17-bugfixes' of git://github.com/jonmason/ntb:
MAINTAINERS: update mailing list address for NTB subsystem
ntb: intel: fix port config status offset for SPR
NTB/msi: Use struct_size() helper in devm_kzalloc()
In commit 6d59d4fa17 ("ptp: ocp: Have FPGA fold in ns adjustment for
adjtime."), the ns adjustment was written to the FPGA register, so the
clock could accurately perform adjustments.
However, the adjtime() call passes in a s64, while the clock adjustment
registers use a s32. When trying to perform adjustments with a large
value (37 sec), things fail.
Examine the incoming delta, and if larger than 1 sec, use the original
(coarse) adjustment method. If smaller than 1 sec, then allow the
FPGA to fold in the changes over a 1 second window.
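The range problem and the resulting decision can be sketched as
follows. This is not the driver code: fits_s32() and use_fine_adjust()
are hypothetical helpers, and treating a delta of exactly one second as
coarse is an assumption, since the text only covers "larger" and
"smaller" than 1 sec.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* True if a nanosecond delta can be stored in a signed 32-bit register. */
static bool fits_s32(int64_t delta_ns)
{
	return delta_ns >= INT32_MIN && delta_ns <= INT32_MAX;
}

/*
 * True if the delta is below one second in magnitude, so that the fine
 * (folded in) adjustment can be used; otherwise the coarse method applies.
 */
static bool use_fine_adjust(int64_t delta_ns)
{
	return delta_ns > -NSEC_PER_SEC && delta_ns < NSEC_PER_SEC;
}
```

A 37 second delta is 37,000,000,000 ns, well above the s32 maximum of
2,147,483,647, whereas any delta below one second always fits.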
Fixes: 6d59d4fa17 ("ptp: ocp: Have FPGA fold in ns adjustment for adjtime.")
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20220228203957.367371-1-jonathan.lemon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
kvm_arch_vcpu_ioctl_run is already doing srcu_read_lock/unlock in two
places, namely vcpu_run and post_kvm_run_save, and a third is actually
needed around the call to vcpu->arch.complete_userspace_io to avoid
the following splat:
WARNING: suspicious RCU usage
arch/x86/kvm/pmu.c:190 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by CPU 28/KVM/370841:
#0: ff11004089f280b8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x87/0x730 [kvm]
Call Trace:
<TASK>
dump_stack_lvl+0x59/0x73
reprogram_fixed_counter+0x15d/0x1a0 [kvm]
kvm_pmu_trigger_event+0x1a3/0x260 [kvm]
? free_moved_vector+0x1b4/0x1e0
complete_fast_pio_in+0x8a/0xd0 [kvm]
This splat is not at all unexpected, since complete_userspace_io callbacks
can execute similar code to vmexits. For example, SVM with nrips=false
will call into the emulator from svm_skip_emulated_instruction().
While it's tempting to never acquire kvm->srcu for an uninitialized vCPU,
practically speaking there's no penalty to acquiring kvm->srcu "early"
as the KVM_MP_STATE_UNINITIALIZED path is a one-time thing per vCPU. On
the other hand, seemingly innocuous helpers like kvm_apic_accept_events()
and sync_regs() can theoretically reach code that might access
SRCU-protected data structures, e.g. sync_regs() can trigger forced
exiting of nested mode via kvm_vcpu_ioctl_x86_set_vcpu_events().
Reported-by: Like Xu <likexu@tencent.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Just like on the optional mmu_alloc_direct_roots() path, once shadow
path reaches "r = -EIO" somewhere, the caller needs to know the actual
state in order to enter error handling and avoid something worse.
Fixes: 4a38162ee9 ("KVM: MMU: load PDPTRs outside mmu_lock")
Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220301124941.48412-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
During log replay, whenever we need to check if a name (dentry) exists in
a directory we do searches on the subvolume tree for inode references or
directory entries (BTRFS_DIR_INDEX_KEY keys, and BTRFS_DIR_ITEM_KEY
keys as well, before kernel 5.17). However when during log replay we
unlink a name, through btrfs_unlink_inode(), we may not delete inode
references and dir index keys from a subvolume tree and instead just add
the deletions to the delayed inode's delayed items, which will only be
run when we commit the transaction used for log replay. This means that
after an unlink operation during log replay, if we attempt to search for
the same name during log replay, we will not see that the name was already
deleted, since the deletion is recorded only on the delayed items.
We run delayed items after every unlink operation during log replay,
except at unlink_old_inode_refs() and at add_inode_ref(). This was due
to an oversight, as delayed items should be run after every unlink, for
the reasons stated above.
So fix those two cases.
Fixes: 0d836392ca ("Btrfs: fix mount failure after fsync due to hard link recreation")
Fixes: 1f250e929a ("Btrfs: fix log replay failure after unlink and link combination")
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit e804861bd4 ("btrfs: fix deadlock between quota disable and
qgroup rescan worker") by Kawasaki resolves a deadlock between quota
disable and the qgroup rescan worker. But there is another, similar
deadlock case. It involves enabling or disabling quota and creating or
removing a qgroup. It can be reproduced with the simple script below.
for i in {1..100}
do
btrfs quota enable /mnt &
btrfs qgroup create 1/0 /mnt &
btrfs qgroup destroy 1/0 /mnt &
btrfs quota disable /mnt &
done
Here's why the deadlock happens:
1) The quota rescan task is running.
2) Task A calls btrfs_quota_disable(), locks the qgroup_ioctl_lock
mutex, and then calls btrfs_qgroup_wait_for_completion(), to wait for
the quota rescan task to complete.
3) Task B calls btrfs_remove_qgroup() and it blocks when trying to lock
the qgroup_ioctl_lock mutex, because it's being held by task A. At that
point task B is holding a transaction handle for the current transaction.
4) The quota rescan task calls btrfs_commit_transaction(). This results
in it waiting for all other tasks to release their handles on the
transaction, but task B is blocked on the qgroup_ioctl_lock mutex
while holding a handle on the transaction, and that mutex is being held
by task A, which is waiting for the quota rescan task to complete,
resulting in a deadlock between these 3 tasks.
To resolve this issue, the thread disabling quota should unlock
qgroup_ioctl_lock before waiting for rescan completion. Move
btrfs_qgroup_wait_for_completion() after unlock of qgroup_ioctl_lock.
Fixes: e804861bd4 ("btrfs: fix deadlock between quota disable and qgroup rescan worker")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Sidong Yang <realwakka@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We hit a bug with a recovering relocation on mount for one of our file
systems in production. I reproduced this locally by injecting errors
into snapshot delete with balance running at the same time. This
presented as an error while looking up an extent item
WARNING: CPU: 5 PID: 1501 at fs/btrfs/extent-tree.c:866 lookup_inline_extent_backref+0x647/0x680
CPU: 5 PID: 1501 Comm: btrfs-balance Not tainted 5.16.0-rc8+ #8
RIP: 0010:lookup_inline_extent_backref+0x647/0x680
RSP: 0018:ffffae0a023ab960 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
RBP: ffff943fd2a39b60 R08: 0000000000000000 R09: 0000000000000001
R10: 0001434088152de0 R11: 0000000000000000 R12: 0000000001d05000
R13: ffff943fd2a39b60 R14: ffff943fdb96f2a0 R15: ffff9442fc923000
FS: 0000000000000000(0000) GS:ffff944e9eb40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1157b1fca8 CR3: 000000010f092000 CR4: 0000000000350ee0
Call Trace:
<TASK>
insert_inline_extent_backref+0x46/0xd0
__btrfs_inc_extent_ref.isra.0+0x5f/0x200
? btrfs_merge_delayed_refs+0x164/0x190
__btrfs_run_delayed_refs+0x561/0xfa0
? btrfs_search_slot+0x7b4/0xb30
? btrfs_update_root+0x1a9/0x2c0
btrfs_run_delayed_refs+0x73/0x1f0
? btrfs_update_root+0x1a9/0x2c0
btrfs_commit_transaction+0x50/0xa50
? btrfs_update_reloc_root+0x122/0x220
prepare_to_merge+0x29f/0x320
relocate_block_group+0x2b8/0x550
btrfs_relocate_block_group+0x1a6/0x350
btrfs_relocate_chunk+0x27/0xe0
btrfs_balance+0x777/0xe60
balance_kthread+0x35/0x50
? btrfs_balance+0xe60/0xe60
kthread+0x16b/0x190
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30
</TASK>
Normally snapshot deletion and relocation are excluded from running at
the same time by the fs_info->cleaner_mutex. However if we had a
pending balance waiting to get the ->cleaner_mutex, and a snapshot
deletion was running, and then the box crashed, we would come up in a
state where we have a half deleted snapshot.
Again, in the normal case the snapshot deletion needs to complete before
relocation can start, but in this case relocation could very well start
before the snapshot deletion completes, as we simply add the root to the
dead roots list and wait for the next time the cleaner runs to clean up
the snapshot.
Fix this by setting a bit on the fs_info if we have any DEAD_ROOT's that
had a pending drop_progress key. If they do then we know we were in the
middle of the drop operation and set a flag on the fs_info. Then
balance can wait until this flag is cleared to start up again.
If there are DEAD_ROOT's that don't have a drop_progress set then we're
safe to start balance right away as we'll be properly protected by the
cleaner_mutex.
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
A user reported an array-index-out-of-bounds access while mounting a
crafted image:
[350.411942 ] loop0: detected capacity change from 0 to 262144
[350.427058 ] BTRFS: device fsid a62e00e8-e94e-4200-8217-12444de93c2e devid 1 transid 8 /dev/loop0 scanned by systemd-udevd (1044)
[350.428564 ] BTRFS info (device loop0): disk space caching is enabled
[350.428568 ] BTRFS info (device loop0): has skinny extents
[350.429589 ]
[350.429619 ] UBSAN: array-index-out-of-bounds in fs/btrfs/struct-funcs.c:161:1
[350.429636 ] index 1048096 is out of range for type 'page *[16]'
[350.429650 ] CPU: 0 PID: 9 Comm: kworker/u8:1 Not tainted 5.16.0-rc4
[350.429652 ] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[350.429653 ] Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs]
[350.429772 ] Call Trace:
[350.429774 ] <TASK>
[350.429776 ] dump_stack_lvl+0x47/0x5c
[350.429780 ] ubsan_epilogue+0x5/0x50
[350.429786 ] __ubsan_handle_out_of_bounds+0x66/0x70
[350.429791 ] btrfs_get_16+0xfd/0x120 [btrfs]
[350.429832 ] check_leaf+0x754/0x1a40 [btrfs]
[350.429874 ] ? filemap_read+0x34a/0x390
[350.429878 ] ? load_balance+0x175/0xfc0
[350.429881 ] validate_extent_buffer+0x244/0x310 [btrfs]
[350.429911 ] btrfs_validate_metadata_buffer+0xf8/0x100 [btrfs]
[350.429935 ] end_bio_extent_readpage+0x3af/0x850 [btrfs]
[350.429969 ] ? newidle_balance+0x259/0x480
[350.429972 ] end_workqueue_fn+0x29/0x40 [btrfs]
[350.429995 ] btrfs_work_helper+0x71/0x330 [btrfs]
[350.430030 ] ? __schedule+0x2fb/0xa40
[350.430033 ] process_one_work+0x1f6/0x400
[350.430035 ] ? process_one_work+0x400/0x400
[350.430036 ] worker_thread+0x2d/0x3d0
[350.430037 ] ? process_one_work+0x400/0x400
[350.430038 ] kthread+0x165/0x190
[350.430041 ] ? set_kthread_struct+0x40/0x40
[350.430043 ] ret_from_fork+0x1f/0x30
[350.430047 ] </TASK>
[350.430047 ]
[350.430077 ] BTRFS warning (device loop0): bad eb member start: ptr 0xffe20f4e start 20975616 member offset 4293005178 size 2
btrfs check reports:
corrupt leaf: root=3 block=20975616 physical=20975616 slot=1, unexpected
item end, have 4294971193 expect 3897
The first slot item offset is 4293005033 and the size is 1966160.
In check_leaf(), we use btrfs_item_end() to check the item boundary
against the extent_buffer data size. However, the return type of
btrfs_item_end() is u32.
(u32)(4293005033 + 1966160) == 3897, so an overflow happens and the
result 3897 happens to equal the leaf data size, which passes the check.
Fix it by using a u64 variable to store the item data end in
check_leaf() to avoid the u32 overflow.
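The arithmetic can be checked in isolation. This is a minimal sketch
that only reproduces the two ways of computing the item end; the helper
names item_end_u32() and item_end_u64() are hypothetical and are not
the btrfs accessors.

```c
#include <stdint.h>

/* Item end computed in 32 bits, which wraps modulo 2^32. */
static uint32_t item_end_u32(uint32_t offset, uint32_t size)
{
	return offset + size;
}

/* Item end computed in 64 bits, which keeps the real value. */
static uint64_t item_end_u64(uint32_t offset, uint32_t size)
{
	return (uint64_t)offset + size;
}
```

For the offset 4293005033 and size 1966160 from the report, the 32-bit
result is 3897 while the 64-bit result is 4294971193, matching the
"have 4294971193 expect 3897" message from btrfs check.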
This commit does solve the invalid memory access shown by the stack
trace. However, its metadata profile is DUP and another copy of the
leaf is fine. So the image can be mounted successfully. But when umount
is called, the ASSERT btrfs_mark_buffer_dirty() will be triggered
because the only node in extent tree has 0 item and invalid owner. It's
solved by another commit
"btrfs: check extent buffer owner against the owner rootid".
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299
Reported-by: Wenqing Liu <wenqingliu0120@gmail.com>
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Whenever we do any extent buffer operations we call
assert_eb_page_uptodate() to complain loudly if we're operating on a
non-uptodate page. Our overnight tests caught this warning earlier this
week:
WARNING: CPU: 1 PID: 553508 at fs/btrfs/extent_io.c:6849 assert_eb_page_uptodate+0x3f/0x50
CPU: 1 PID: 553508 Comm: kworker/u4:13 Tainted: G W 5.17.0-rc3+ #564
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Workqueue: btrfs-cache btrfs_work_helper
RIP: 0010:assert_eb_page_uptodate+0x3f/0x50
RSP: 0018:ffffa961440a7c68 EFLAGS: 00010246
RAX: 0017ffffc0002112 RBX: ffffe6e74453f9c0 RCX: 0000000000001000
RDX: ffffe6e74467c887 RSI: ffffe6e74453f9c0 RDI: ffff8d4c5efc2fc0
RBP: 0000000000000d56 R08: ffff8d4d4a224000 R09: 0000000000000000
R10: 00015817fa9d1ef0 R11: 000000000000000c R12: 00000000000007b1
R13: ffff8d4c5efc2fc0 R14: 0000000001500000 R15: 0000000001cb1000
FS: 0000000000000000(0000) GS:ffff8d4dbbd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff31d3448d8 CR3: 0000000118be8004 CR4: 0000000000370ee0
Call Trace:
extent_buffer_test_bit+0x3f/0x70
free_space_test_bit+0xa6/0xc0
load_free_space_tree+0x1f6/0x470
caching_thread+0x454/0x630
? rcu_read_lock_sched_held+0x12/0x60
? rcu_read_lock_sched_held+0x12/0x60
? rcu_read_lock_sched_held+0x12/0x60
? lock_release+0x1f0/0x2d0
btrfs_work_helper+0xf2/0x3e0
? lock_release+0x1f0/0x2d0
? finish_task_switch.isra.0+0xf9/0x3a0
process_one_work+0x26d/0x580
? process_one_work+0x580/0x580
worker_thread+0x55/0x3b0
? process_one_work+0x580/0x580
kthread+0xf0/0x120
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
This was partially fixed by c2e3930529 ("btrfs: clear extent buffer
uptodate when we fail to write it"), however all that fix did was keep
us from finding extent buffers after a failed writeout. It didn't keep
us from continuing to use a buffer that we already had found.
In this case we're searching the commit root to cache the block group,
so we can start committing the transaction and switch the commit root
and then start writing. After the switch we can look up an extent
buffer that hasn't been written yet and start processing that block
group. Then we fail to write that block out and clear Uptodate on the
page, and then we start spewing these errors.
Normally we're protected by the tree lock to a certain degree here. If
we read a block we have that block read locked, and we block the writer
from locking the block before we submit it for the write. However this
isn't necessarily foolproof because the read could happen before we do
the submit_bio and after we locked and unlocked the extent buffer.
Also in this particular case we have path->skip_locking set, so that
won't save us here. We'll simply get a block that was valid when we
read it, but became invalid while we were using it.
What we really want is to catch the case where we've "read" a block but
it's not marked Uptodate. On read we ClearPageError(), so if we're
!Uptodate and !Error we know we didn't do the right thing for reading
the page.
Fix this by checking !Uptodate && !Error, this way we will not complain
if our buffer gets invalidated while we're using it, and we'll maintain
the spirit of the check which is to make sure we have a fully in-cache
block while we're messing with it.
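The new condition can be written as a small predicate. This is only a
sketch: struct toy_page is a hypothetical stand-in for the page flags,
and it assumes, as the description implies, that a failed write leaves
the error flag set. It is not the code in assert_eb_page_uptodate().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two page flags discussed above. */
struct toy_page {
	bool uptodate;
	bool error;
};

/*
 * Warn only when the page is neither uptodate nor marked as failed:
 * that combination means the block was never read properly. A page
 * invalidated by a failed write has the error flag set and is skipped.
 */
static bool should_warn(const struct toy_page *page)
{
	return !page->uptodate && !page->error;
}
```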
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When doing a full fsync, if we have prealloc extents beyond (or at) eof,
and the leaves that contain them were not modified in the current
transaction, we end up not logging them. This results in losing those
extents when we replay the log after a power failure, since the inode is
truncated to the current value of the logged i_size.
Just like for the fast fsync path, we need to always log all prealloc
extents starting at or beyond i_size. The fast fsync case was fixed in
commit 471d557afe ("Btrfs: fix loss of prealloc extents past i_size
after fsync log replay") but it missed the full fsync path. The problem
exists since the very early days, when the log tree was added by
commit e02119d5a7 ("Btrfs: Add a write ahead tree log to optimize
synchronous operations").
Example reproducer:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
# Create our test file with many file extent items, so that they span
# several leaves of metadata, even if the node/page size is 64K. Use
# direct IO and not fsync/O_SYNC because it's both faster and it avoids
# clearing the full sync flag from the inode - we want the fsync below
# to trigger the slow full sync code path.
$ xfs_io -f -d -c "pwrite -b 4K 0 16M" /mnt/foo
# Now add two preallocated extents to our file without extending the
# file's size. One right at i_size, and another further beyond, leaving
# a gap between the two prealloc extents.
$ xfs_io -c "falloc -k 16M 1M" /mnt/foo
$ xfs_io -c "falloc -k 20M 1M" /mnt/foo
# Make sure everything is durably persisted and the transaction is
# committed. This makes all created extents have a generation lower
# than the generation of the transaction used by the next write and
# fsync.
$ sync
# Now overwrite only the first extent, which will result in modifying
# only the first leaf of metadata for our inode. Then fsync it. This
# fsync will use the slow code path (inode full sync bit is set) because
# it's the first fsync since the inode was created/loaded.
$ xfs_io -c "pwrite 0 4K" -c "fsync" /mnt/foo
# Extent list before power failure.
$ xfs_io -c "fiemap -v" /mnt/foo
/mnt/foo:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..7]: 2178048..2178055 8 0x0
1: [8..16383]: 26632..43007 16376 0x0
2: [16384..32767]: 2156544..2172927 16384 0x0
3: [32768..34815]: 2172928..2174975 2048 0x800
4: [34816..40959]: hole 6144
5: [40960..43007]: 2174976..2177023 2048 0x801
<power fail>
# Mount fs again, trigger log replay.
$ mount /dev/sdc /mnt
# Extent list after power failure and log replay.
$ xfs_io -c "fiemap -v" /mnt/foo
/mnt/foo:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..7]: 2178048..2178055 8 0x0
1: [8..16383]: 26632..43007 16376 0x0
2: [16384..32767]: 2156544..2172927 16384 0x1
# The prealloc extents at file offsets 16M and 20M are missing.
So fix this by calling btrfs_log_prealloc_extents() when we are doing a
full fsync, so that we always log all prealloc extents beyond eof.
A test case for fstests will follow soon.
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When looping btrfs/074 with 64K page size and 4K sectorsize, there is a
low chance (1/50~1/100) to crash with the following ASSERT() triggered
in btrfs_subpage_start_writer():
ret = atomic_add_return(nbits, &subpage->writers);
ASSERT(ret == nbits); <<< This one <<<
[CAUSE]
With more debugging output on the parameters of
btrfs_subpage_start_writer(), it shows a very concerning error:
ret=29 nbits=13 start=393216 len=53248
The value of @nbits is correct, but @ret, the value returned from
atomic_add_return(), is not only larger than nbits but also larger
than the maximum number of sectors per page (16 for 64K page size and
4K sector size).
This indicates that some call sites are not properly decreasing the value.
And that's exactly the case, in btrfs_page_unlock_writer(), due to the
fact that we can have page locked either by lock_page() or
process_one_page(), we have to check if the subpage has any writer.
If no writers, it's locked by lock_page() and we only need to unlock it.
But unfortunately the check for writers is inverted:
if (atomic_read(&subpage->writers))
/* No writers, locked by plain lock_page() */
return unlock_page(page);
We directly unlock the page if it has writers, which is the exact
opposite of what we want.
Thankfully the only affected call site is
extent_write_locked_range(), so this mostly affects compressed writes.
[FIX]
Just fix the wrong check condition to fix the bug.
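A minimal user-space sketch of the corrected condition, using C11 atomics. All toy_* names are hypothetical and the model only tracks a writer count and a lock flag; it is meant to show why the negation matters, not to mirror the btrfs subpage code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model of a subpage with a writer count (hypothetical names). */
struct toy_subpage {
	atomic_int writers;	/* sectors currently locked by writers */
	bool page_locked;	/* models the page lock itself */
};

/* Start side: account nbits sectors and return the new count. */
static int toy_start_writer(struct toy_subpage *sp, int nbits)
{
	assert(nbits > 0);
	sp->page_locked = true;
	return atomic_fetch_add(&sp->writers, nbits) + nbits;
}

/*
 * Corrected unlock side.  With no writers the page was locked by a plain
 * lock_page() and only needs unlocking; otherwise the writer count is
 * dropped and the page is unlocked once the last writer is gone.
 */
static void toy_unlock_writer(struct toy_subpage *sp, int nbits)
{
	if (!atomic_load(&sp->writers)) {	/* note the negation */
		sp->page_locked = false;
		return;
	}
	if (atomic_fetch_sub(&sp->writers, nbits) == nbits)
		sp->page_locked = false;
}
```

With the inverted check the count is never decremented, so the next toy_start_writer() call returns more than nbits, which is the situation the ASSERT() reported.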
Fixes: e55a0de185 ("btrfs: rework page locking in __extent_writepage()")
CC: stable@vger.kernel.org # 5.16
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 54659ca026 ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in an attempt to fix a lockdep-reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.
But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783c33 ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.
The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:
[Feb20 23:47] ======================================================
[ +0.074085] WARNING: possible circular locking dependency detected
[ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E
[ +0.064710] ------------------------------------------------------
[ +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.114921]
but task is already holding lock:
[ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[ +0.116976]
which lock already depends on the new lock.
[ +0.098037]
the existing dependency chain (in reverse order) is:
[ +0.089704]
-> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[ +0.077232] _raw_spin_lock_bh+0x34/0x40
[ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs]
[ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs]
[ +0.062755] dev_hard_start_xmit+0xf1/0x320
[ +0.056381] sch_direct_xmit+0x9e/0x360
[ +0.052212] __dev_queue_xmit+0xce4/0x1080
[ +0.055334] ip6_finish_output2+0x18f/0x6e0
[ +0.056378] ndisc_send_skb+0x2c8/0x870
[ +0.052209] ndisc_send_ns+0xd3/0x210
[ +0.050130] addrconf_dad_work+0x3df/0x5a0
[ +0.055338] process_one_work+0x274/0x5a0
[ +0.054296] worker_thread+0x52/0x3b0
[ +0.050124] kthread+0x16c/0x1a0
[ +0.044925] ret_from_fork+0x1f/0x30
[ +0.049092]
-> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[ +0.074101] __lock_acquire+0x10f5/0x1d80
[ +0.054298] lock_acquire+0xd7/0x300
[ +0.049088] _raw_spin_lock_bh+0x34/0x40
[ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs]
[ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110
[ +0.070966] __do_softirq+0x16f/0x50a
[ +0.050134] __irq_exit_rcu+0xeb/0x140
[ +0.051172] irq_exit_rcu+0xa/0x20
[ +0.047006] common_interrupt+0xb8/0xd0
[ +0.052214] asm_common_interrupt+0x1e/0x40
[ +0.056381] finish_task_switch.isra.0+0x100/0x3a0
[ +0.063670] __schedule+0x3ad/0xd20
[ +0.048047] schedule+0x4e/0xc0
[ +0.043880] smpboot_thread_fn+0xc4/0x220
[ +0.054298] kthread+0x16c/0x1a0
[ +0.044922] ret_from_fork+0x1f/0x30
[ +0.049088]
other info that might help us debug this:
[ +0.095950] Possible unsafe locking scenario:
[ +0.070952] CPU0 CPU1
[ +0.054282] ---- ----
[ +0.054285] lock(&psta->sleep_q.lock);
[ +0.047004] lock(&pxmitpriv->lock);
[ +0.074082] lock(&psta->sleep_q.lock);
[ +0.077209] lock(&pxmitpriv->lock);
[ +0.043873]
*** DEADLOCK ***
[ +0.070950] 1 lock held by ksoftirqd/3/29:
[ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.
Note this is a straightforward revert done with git revert; the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.
Fixes: 54659ca026 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ifindex doesn't have to be unique for multiple network namespaces on
the same machine.
$ ip netns add test1
$ ip -net test1 link add dummy1 type dummy
$ ip netns add test2
$ ip -net test2 link add dummy2 type dummy
$ ip -net test1 link show dev dummy1
6: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 96:81:55:1e:dd:85 brd ff:ff:ff:ff:ff:ff
$ ip -net test2 link show dev dummy2
6: dummy2: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 5a:3c:af:35:07:c3 brd ff:ff:ff:ff:ff:ff
But the batman-adv code that walks through the various layers of virtual
interfaces relies on this assumption because dev_get_iflink handles it
internally and doesn't return the actual netns of the iflink. And
dev_get_iflink only documents the situation where ifindex == iflink for
physical devices.
But only checking for dev->netdev_ops->ndo_get_iflink is also not an option
because ipoib_get_iflink implements it even when it sometimes returns an
iflink != ifindex and sometimes iflink == ifindex. The caller must
therefore make sure itself to check both netns and iflink + ifindex for
equality. Only when they are equal, a "physical" interface was detected
which should stop the traversal. On the other hand, vxcan_get_iflink can
also return 0 in case there was currently no valid peer. In this case, it
is still necessary to stop.
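The stop condition described above can be sketched as follows. The struct toy_netdev layout is hypothetical (the namespaces are plain opaque pointers) and the function is a model of the decision only, not the batman-adv implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy device: only the fields the check needs (hypothetical layout). */
struct toy_netdev {
	const void *netns;	/* namespace the device lives in */
	int ifindex;		/* index, unique only within netns */
	int iflink;		/* index of the lower device, 0 if none */
	const void *link_netns;	/* namespace the lower device lives in */
};

/*
 * Decide whether walking down the stack of virtual interfaces should stop
 * at this device: either there is no valid lower device (iflink == 0), or
 * the device links to itself in its own namespace, which marks it as
 * "physical".  Equal indexes in different namespaces do not count.
 */
static bool toy_stop_traversal(const struct toy_netdev *dev)
{
	assert(dev != NULL);
	if (dev->iflink == 0)
		return true;
	return dev->iflink == dev->ifindex && dev->link_netns == dev->netns;
}
```

In terms of the ip example above, dummy1 and dummy2 share index 6 but live in different namespaces, so an index match alone says nothing about whether the two are the same device.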
Fixes: b7eddd0b39 ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface")
Fixes: 5ed4a460a1 ("batman-adv: additional checks for virtual interfaces on top of WiFi")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
There is no need to call dev_get_iflink multiple times for the same
net_device in batadv_get_real_netdevice. And since some of the
ndo_get_iflink callbacks are dynamic (for example via RCUs like in
vxcan_get_iflink), it could easily happen that the returned values are not
stable. The pre-checks before __dev_get_by_index are then of course bogus.
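As an illustration of why the value should be read only once, here is a small sketch with a callback whose answer changes on every call. The callback type and all toy_* names are assumptions made for the example, and the netns handling is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type standing in for a dynamic ndo_get_iflink. */
typedef int (*toy_get_iflink_fn)(void *ctx);

/*
 * Look up the lower device index exactly once and make every decision on
 * that cached value.  Returns -1 when there is nothing to look up (no
 * lower device, or the device links to itself), else the index to hand
 * to the device lookup.
 */
static int toy_lower_index(toy_get_iflink_fn get_iflink, void *ctx, int ifindex)
{
	int iflink;

	assert(get_iflink != NULL);
	iflink = get_iflink(ctx);	/* single call, value is now stable */
	if (iflink == 0 || iflink == ifindex)
		return -1;
	return iflink;
}

/* Example callback whose answer changes on every call. */
static int toy_unstable_iflink(void *ctx)
{
	int *calls = ctx;

	return ++(*calls);
}
```

If the pre-checks and the lookup each called the callback themselves, they could see different values and the checks would guard nothing.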
Fixes: 5ed4a460a1 ("batman-adv: additional checks for virtual interfaces on top of WiFi")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
There is no need to call dev_get_iflink multiple times for the same
net_device in batadv_is_on_batman_iface. And since some of the
.ndo_get_iflink callbacks are dynamic (for example via RCUs like in
vxcan_get_iflink), it could easily happen that the returned values are not
stable. The pre-checks before __dev_get_by_index are then of course bogus.
Fixes: b7eddd0b39 ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Before these changes elan_suspend() would only disable the regulator
when device_may_wakeup() returns false; whereas elan_resume() would
unconditionally enable it, leading to an enable count imbalance when
device_may_wakeup() returns true.
This triggers the "WARN_ON(regulator->enable_count)" in regulator_put()
when the elan_i2c driver gets unbound. This happens e.g. with the
hot-pluggable dock with Elan I2C touchpad for the Asus TF103C 2-in-1.
Fix this by making the regulator_enable() call also be conditional
on device_may_wakeup() returning false.
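The enable-count balance can be modelled in a few lines. The toy_* types below are hypothetical and only track a counter; they are a sketch of the bookkeeping, not the elan_i2c or regulator code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy regulator that only tracks its enable count (hypothetical). */
struct toy_regulator {
	int enable_count;
};

struct toy_touchpad {
	struct toy_regulator vcc;
	bool may_wakeup;	/* models device_may_wakeup() */
};

static void toy_suspend(struct toy_touchpad *tp)
{
	/* A wakeup source stays powered, so the regulator is left alone. */
	if (!tp->may_wakeup) {
		assert(tp->vcc.enable_count > 0);
		tp->vcc.enable_count--;
	}
}

static void toy_resume(struct toy_touchpad *tp)
{
	/* Mirror the suspend condition so the count stays balanced. */
	if (!tp->may_wakeup)
		tp->vcc.enable_count++;
}
```

With an unconditional increment in toy_resume(), every suspend/resume cycle of a wakeup-capable device would raise the count by one, leaving it above what the driver releases at unbind time.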
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20220131135436.29638-2-hdegoede@redhat.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
elan_disable_power() is called conditionally on suspend, whereas
elan_enable_power() is always called on resume. This leads to
an imbalance in the regulator's enable count.
Move the regulator_[en|dis]able() calls out of elan_[en|dis]able_power()
in preparation of fixing this.
No functional changes intended.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20220131135436.29638-1-hdegoede@redhat.com
[dtor: consolidate elan_[en|dis]able() into elan_set_power()]
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
When trying to add a histogram against an event with the "cpu" field, it
was impossible due to "cpu" being a keyword to key off of the running CPU.
So to fix this, it was changed to "common_cpu" to match the other generic
fields (like "common_pid"). But since some scripts used "cpu" for keying
off of the CPU (for events that did not have "cpu" as a field, which is
most of them), a backward compatibility trick was added such that if "cpu"
was used as a key, and the event did not have "cpu" as a field name, then
it would fall back and switch over to "common_cpu".
This fix has a couple of subtle bugs. One was that when switching over to
"common_cpu", it did not change the field name, it just set a flag. But
the code still found a "cpu" field. The "cpu" field is used for filtering
and is returned when the event does not have a "cpu" field.
This was found by:
# cd /sys/kernel/tracing
# echo hist:key=cpu,pid:sort=cpu > events/sched/sched_wakeup/trigger
# cat events/sched/sched_wakeup/hist
Which showed the histogram unsorted:
{ cpu: 19, pid: 1175 } hitcount: 1
{ cpu: 6, pid: 239 } hitcount: 2
{ cpu: 23, pid: 1186 } hitcount: 14
{ cpu: 12, pid: 249 } hitcount: 2
{ cpu: 3, pid: 994 } hitcount: 5
Instead of hard coding the "cpu" checks, take advantage of the fact that
trace_find_event_field() returns a special field for "cpu" and "CPU" if
the event does not have "cpu" as a field. This special field has the
"filter_type" of "FILTER_CPU". Check that to test if the returned field is
of the CPU type instead of doing the string compare.
Also, fix the sorting bug by testing for the hist_field flag of
HIST_FIELD_FL_CPU when setting up the sort routine. Otherwise it will use
the special CPU field to know what compare routine to use, and since that
special field does not have a size, it returns tracing_map_cmp_none.
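The ordering problem in the sort setup can be sketched like this. The flag value, the comparison helpers and toy_pick_cmp() are hypothetical; the real code selects among the tracing_map comparison routines, which are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_HIST_FL_CPU 0x1u	/* hypothetical flag value */

typedef int (*toy_cmp_fn)(const void *a, const void *b);

/* Fallback that treats every pair as equal, so sorting has no effect. */
static int toy_cmp_none(const void *a, const void *b)
{
	(void)a;
	(void)b;
	return 0;
}

static int toy_cmp_int(const void *a, const void *b)
{
	assert(a != NULL && b != NULL);
	int x = *(const int *)a;
	int y = *(const int *)b;

	return (x > y) - (x < y);
}

/*
 * Pick the sort comparison for a key.  The CPU flag is tested before the
 * field size because the special CPU field has no size, which would
 * otherwise select the no-op comparison.
 */
static toy_cmp_fn toy_pick_cmp(unsigned int flags, size_t field_size)
{
	if (flags & TOY_HIST_FL_CPU)
		return toy_cmp_int;
	if (field_size == sizeof(int))
		return toy_cmp_int;
	return toy_cmp_none;
}
```

A key that ends up with the no-op comparison leaves the histogram in insertion order, which matches the unsorted output shown above.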
Cc: stable@vger.kernel.org
Fixes: 1e3bac71c5 ("tracing/histogram: Rename "cpu" to "common_cpu"")
Reported-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
When the DSA_NOTIFIER_TAG_PROTO returns an error, the user space process
which initiated the protocol change exits the kernel processing while
still holding the rtnl_mutex. So any other process attempting to lock
the rtnl_mutex would deadlock after such an event.
The error handling of DSA_NOTIFIER_TAG_PROTO was inadvertently changed
by the blamed commit, introducing this regression. We must still call
rtnl_unlock(), and we must still call DSA_NOTIFIER_TAG_PROTO for the old
protocol. The latter is due to the limiting design of notifier chains
for cross-chip operations, which don't have a built-in error recovery
mechanism - we should look into using notifier_call_chain_robust for that.
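The two requirements above (always unlock, and announce the old protocol again on failure) can be sketched as a toy model. The struct, the failing-protocol knob and all toy_* names are hypothetical; the lock is just a flag standing in for rtnl_lock()/rtnl_unlock().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state: a lock flag and the active protocol (hypothetical names). */
struct toy_switch {
	bool rtnl_held;
	int proto;
	int fail_proto;	/* protocol the toy notifier refuses */
};

/* Stand-in for a cross-chip notifier; it has no rollback of its own. */
static int toy_notify_proto(struct toy_switch *sw, int proto)
{
	if (proto == sw->fail_proto)
		return -1;
	sw->proto = proto;
	return 0;
}

/*
 * Change the protocol under the lock.  On failure the old protocol is
 * announced again, and the lock is dropped on every path out.
 */
static int toy_change_proto(struct toy_switch *sw, int new_proto)
{
	int old_proto = sw->proto;
	int err;

	assert(!sw->rtnl_held);
	sw->rtnl_held = true;		/* models rtnl_lock() */
	err = toy_notify_proto(sw, new_proto);
	if (err)
		toy_notify_proto(sw, old_proto);	/* manual rollback */
	sw->rtnl_held = false;		/* models rtnl_unlock() */
	return err;
}
```

Returning straight from the error branch would leave rtnl_held set, which is the toy equivalent of the deadlock described above.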
Fixes: dc452a471d ("net: dsa: introduce tagger-owned storage for private and shared data")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220228141715.146485-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix regression with scanning not working in some systems.
* tag 'for-net-2022-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: Fix not checking MGMT cmd pending queue
====================
Link: https://lore.kernel.org/r/20220302004330.125536-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In a number of places in the MGMT handlers we examine the command queue for
other commands (in progress but not yet complete) that will interact
with the process being performed. However, not all commands go into the
queue if one of:
1. There is no negative side effect of consecutive or redundant commands
2. The command is entirely performed "inline".
This change examines each "pending command" check, and if it is not
needed, deletes the check. Of the remaining pending command checks, we
make sure that the command is in the pending queue by using the
mgmt_pending_add/mgmt_pending_remove pair rather than the
mgmt_pending_new/mgmt_pending_free pair.
Link: https://lore.kernel.org/linux-bluetooth/f648f2e11bb3c2974c32e605a85ac3a9fac944f1.camel@redhat.com/T/
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Use kfree_rcu(ptr, rcu) variant, using kfree_rcu(ptr) was not
intentional. From Eric Dumazet.
2) Use-after-free in netfilter hook core, from Eric Dumazet.
3) Missing rcu read lock side for netfilter egress hook,
from Florian Westphal.
4) nf_queue assumes state->sk is full socket while it might not be.
Invoke sock_gen_put(), from Florian Westphal.
5) Add selftest to exercise the reported KASAN splat in 4)
6) Fix possible use-after-free in nf_queue in case sk_refcnt is 0.
Also from Florian.
7) Use input interface index only for hardware offload, not for
the software plane. This breaks tc ct action. Patch from Paul Blakey.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
net/sched: act_ct: Fix flow table lookup failure with no originating ifindex
netfilter: nf_queue: handle socket prefetch
netfilter: nf_queue: fix possible use-after-free
selftests: netfilter: add nfqueue TCP_NEW_SYN_RECV socket race test
netfilter: nf_queue: don't assume sk is full socket
netfilter: egress: silence egress hook lockdep splats
netfilter: fix use-after-free in __nf_register_net_hook()
netfilter: nf_tables: prefer kfree_rcu(ptr, rcu) variant
====================
Link: https://lore.kernel.org/r/20220301215337.378405-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After the cited commit optimized hw insertion, flow table entries are
populated with ifindex information which was intended to only be used
for HW offload. This tuple ifindex is hashed in the flow table key, so
it must be filled for lookup to be successful. But tuple ifindex is only
relevant for the netfilter flowtables (nft), so it's not filled in
act_ct flow table lookup, resulting in lookup failure, and no SW
offload and no offload teardown for TCP connection FIN/RST packets.
To fix this, add new tc ifindex field to tuple, which will
only be used for offloading, not for lookup, as it will not be
part of the tuple hash.
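A sketch of the idea of keeping an offload-only field out of the lookup key. The tuple layout, the field names and the FNV-1a style hash are all assumptions made for illustration; how the kernel actually lays out and hashes the tuple is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy flow tuple; field names are illustrative, not the kernel layout. */
struct toy_tuple {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
	int tc_ifidx;	/* offload-only hint, deliberately not hashed */
};

/* FNV-1a style mixing of one 32-bit value into the running hash. */
static uint32_t toy_mix(uint32_t hash, uint32_t value)
{
	for (int i = 0; i < 4; i++) {
		hash ^= (value >> (8 * i)) & 0xff;
		hash *= 16777619u;
	}
	return hash;
}

/* Hash only the lookup key; tc_ifidx never contributes. */
static uint32_t toy_tuple_hash(const struct toy_tuple *t)
{
	uint32_t hash = 2166136261u;

	assert(t != NULL);
	hash = toy_mix(hash, t->src_ip);
	hash = toy_mix(hash, t->dst_ip);
	hash = toy_mix(hash, ((uint32_t)t->src_port << 16) | t->dst_port);
	return hash;
}
```

Because the hint is not part of the key, a lookup that leaves it unfilled still hashes to the same bucket as the stored entry.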
Fixes: 9795ded7f9 ("net/sched: act_ct: Fill offloading tuple iifidx")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull kvm fixes from Paolo Bonzini:
"The bigger part of the change is a revert for x86 hosts. Here the
second patch was supposed to fix the first, but in reality it was just
as broken, so both have to go.
x86 host:
- Revert incorrect assumption that cr3 changes come with preempt
notifier callbacks (they don't when static branches are changed,
for example)
ARM host:
- Correctly synchronise PMR and co on PSCI CPU_SUSPEND
- Skip tests that depend on GICv3 when the HW isn't available"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: aarch64: Skip tests if we can't create a vgic-v3
Revert "KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()"
Revert "KVM: VMX: Save HOST_CR3 in vmx_set_host_fs_gs()"
KVM: arm64: Don't miss pending interrupts for suspended vCPU
s390 has a swap_ex_entry_fixup function, however it is not being used
since common code expects a swap_ex_entry_fixup define. If it is not
defined the default implementation will be used. So fix this by adding
a proper define.
However also the implementation of the function must be fixed, since a
NULL value for handler has a special meaning and must not be adjusted.
Luckily all of this doesn't fix a real bug currently: the main extable
is correctly sorted during build time, and for runtime sorting there
is currently no case where the handler field is not NULL.
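The NULL handling described above can be sketched with a toy exception table entry that stores self-relative offsets. The struct layout and names are hypothetical, and delta is assumed to be the byte distance from entry a to entry b.

```c
#include <assert.h>

/* Toy exception table entry with self-relative offsets (hypothetical). */
struct toy_extable_entry {
	int insn;
	int fixup;
	long handler;	/* 0 models the special NULL value */
};

/*
 * Swap the fixup and handler parts of two entries that sit `delta` bytes
 * apart; `tmp` is a copy of `a` taken before the swap.  Relative values
 * are adjusted for their new position, but a zero handler is copied
 * unchanged so that it keeps its special meaning.
 */
static void toy_swap_ex_entry_fixup(struct toy_extable_entry *a,
				    struct toy_extable_entry *b,
				    struct toy_extable_entry tmp,
				    int delta)
{
	assert(a != b);
	a->fixup = b->fixup + delta;
	b->fixup = tmp.fixup - delta;
	a->handler = b->handler;
	if (a->handler)
		a->handler += delta;
	b->handler = tmp.handler;
	if (b->handler)
		b->handler -= delta;
}
```

Without the two if statements a zero handler would be turned into a non-zero offset by the adjustment and would no longer be recognised as "no handler".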
Fixes: 05a68e892e ("s390/kernel: expand exception table logic to allow new handling options")
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
arch_ftrace_get_regs is supposed to return a struct pt_regs pointer
only if the pt_regs structure contains all register contents, which
means it must have been populated when created via ftrace_regs_caller.
If it was populated via ftrace_caller the contents are not complete
(the psw mask part is missing), and therefore a NULL pointer needs to be
returned.
The current code incorrectly always returns a struct pt_regs pointer.
Fix this by adding another pt_regs flag which indicates if the
contents are complete, and fix arch_ftrace_get_regs accordingly.
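A minimal sketch of the flag-gated accessor. The flag name TOY_PIF_FULL_REGS, its value and the struct layouts are assumptions for the example and do not claim to match the s390 definitions.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PIF_FULL_REGS 0x1UL	/* hypothetical "contents complete" flag */

struct toy_pt_regs {
	unsigned long flags;
	unsigned long gprs[16];
};

struct toy_ftrace_regs {
	struct toy_pt_regs regs;
};

/* Hand out the pt_regs only when the entry code filled in everything. */
static struct toy_pt_regs *toy_ftrace_get_regs(struct toy_ftrace_regs *fregs)
{
	assert(fregs != NULL);
	if (fregs->regs.flags & TOY_PIF_FULL_REGS)
		return &fregs->regs;
	return NULL;
}
```

Callers can then treat a NULL return as "only a partial register set is available" instead of reading fields that were never populated.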
Fixes: 894979689d ("s390/ftrace: provide separate ftrace_caller/ftrace_regs_caller implementations")
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reported-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
ftrace_caller was used for both ftrace_caller and ftrace_regs_caller,
which means that the target address of the hotpatch trampoline was
never updated.
With commit 894979689d ("s390/ftrace: provide separate
ftrace_caller/ftrace_regs_caller implementations") a separate
ftrace_regs_caller entry point was implemented, however it was
forgotten to implement the necessary changes for ftrace_modify_call
and ftrace_make_call, where the branch target has to be modified
accordingly.
Therefore add the missing code now.
Fixes: 894979689d ("s390/ftrace: provide separate ftrace_caller/ftrace_regs_caller implementations")
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
We need to preserve the values at OLDMEM_BASE and OLDMEM_SIZE which are
used by zgetdump in case kdump crashes. In that case zgetdump will
attempt to read OLDMEM_BASE and OLDMEM_SIZE in order to find out where
the memory range [0 - OLDMEM_SIZE] belonging to the production kernel is.
Fixes: f1a5469474 ("s390/setup: don't reserve memory that occupied decompressor's head")
Cc: stable@vger.kernel.org # 5.15+
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Partially revert commit 5f501d5556 ("binfmt_elf: reintroduce using
MAP_FIXED_NOREPLACE"), which applied the ET_DYN "total_mapping_size"
logic also to ET_EXEC.
At least ia64 has ET_EXEC PT_LOAD segments that are not virtual-address
contiguous (but _are_ file-offset contiguous). This would result in a
giant mapping attempting to cover the entire span, including the virtual
address range hole, and well beyond the size of the ELF file itself,
causing the kernel to refuse to load it. For example:
$ readelf -lW /usr/bin/gcc
...
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz ...
...
LOAD 0x000000 0x4000000000000000 0x4000000000000000 0x00b5a0 0x00b5a0 ...
LOAD 0x00b5a0 0x600000000000b5a0 0x600000000000b5a0 0x0005ac 0x000710 ...
...
^^^^^^^^ ^^^^^^^^^^^^^^^^^^ ^^^^^^^^ ^^^^^^^^
File offset range : 0x000000-0x00bb4c
0x00bb4c bytes
Virtual address range : 0x4000000000000000-0x600000000000bcb0
0x200000000000bcb0 bytes
Remove the total_mapping_size logic for ET_EXEC, which reduces the
ET_EXEC MAP_FIXED_NOREPLACE coverage to only the first PT_LOAD (better
than nothing), and retains it for ET_DYN.
Ironically, this is the reverse of the problem that originally caused
problems with MAP_FIXED_NOREPLACE: overlapping PT_LOAD segments. Future
work could restore full coverage if load_elf_binary() were to perform
mappings in a separate phase from the loading (where it could resolve
both overlaps and holes).
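The arithmetic behind the readelf example above can be checked with a short sketch. The struct and helper names are hypothetical, the segments are assumed to be sorted by ascending address, and page alignment is ignored, so this is not how the ELF loader computes its mapping size.

```c
#include <assert.h>
#include <stdint.h>

/* The subset of a PT_LOAD program header used here. */
struct toy_load {
	uint64_t offset;
	uint64_t vaddr;
	uint64_t filesz;
	uint64_t memsz;
};

/* Bytes of the file covered from the first segment to the end of the last. */
static uint64_t toy_file_span(const struct toy_load *seg, int n)
{
	assert(n > 0);
	return seg[n - 1].offset + seg[n - 1].filesz - seg[0].offset;
}

/* Bytes of address space from the first segment to the end of the last. */
static uint64_t toy_vaddr_span(const struct toy_load *seg, int n)
{
	assert(n > 0);
	return seg[n - 1].vaddr + seg[n - 1].memsz - seg[0].vaddr;
}
```

Fed with the two LOAD headers quoted above, the file span is 0x00bb4c bytes while the address span is 0x200000000000bcb0 bytes, which is the mismatch that makes a single covering mapping impossible.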
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Reported-by: matoro <matoro_bugzilla_kernel@matoro.tk>
Fixes: 5f501d5556 ("binfmt_elf: reintroduce using MAP_FIXED_NOREPLACE")
Link: https://lore.kernel.org/r/a3edd529-c42d-3b09-135c-7e98a15b150f@leemhuis.info
Tested-by: matoro <matoro_mailinglist_kernel@matoro.tk>
Link: https://lore.kernel.org/lkml/ce8af9c13bcea9230c7689f3c1e0e2cd@matoro.tk
Tested-By: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/lkml/49182d0d-708b-4029-da5f-bc18603440a6@physik.fu-berlin.de
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
The function alloc_workqueue() in nintendo_hid_probe() can fail, but
there is no check of its return value. To fix this bug, its return value
should be checked with new error handling code.
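The missing check follows a common pattern, sketched here with a hypothetical allocator hook so that the failure can be simulated. The toy_* names are assumptions and the error code chosen (-ENOMEM) is the conventional one for allocation failures, not a statement about the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_ctlr {
	void *rumble_queue;
};

/* Hypothetical allocator hook so that a failure can be simulated. */
typedef void *(*toy_alloc_fn)(void);

/*
 * Probe-style setup: stop with -ENOMEM as soon as the queue allocation
 * fails instead of carrying a NULL pointer into later code.
 */
static int toy_probe(struct toy_ctlr *ctlr, toy_alloc_fn alloc_queue)
{
	assert(ctlr != NULL);
	ctlr->rumble_queue = alloc_queue();
	if (!ctlr->rumble_queue)
		return -ENOMEM;
	return 0;
}

/* Two example allocators: one that always fails, one that succeeds. */
static void *toy_alloc_fail(void)
{
	return NULL;
}

static void *toy_alloc_ok(void)
{
	static int queue;

	return &queue;
}
```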
Fixes: c4eae84fef ("HID: nintendo: add rumble support")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Silvan Jegen <s.jegen@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Johannes Berg says:
====================
Some last-minute fixes:
* rfkill
- add missing rfkill_soft_blocked() when disabled
* cfg80211
- handle a nla_memdup() failure correctly
- fix CONFIG_CFG80211_EXTRA_REGDB_KEYDIR typo in
Makefile
* mac80211
- fix EAPOL handling in 802.3 RX path
- reject setting up aggregation sessions before
connection is authorized to avoid timeouts or
similar
- handle some SAE authentication steps correctly
- fix AC selection in mesh forwarding
* iwlwifi
- remove TWT support as it causes firmware crashes
when the AP isn't behaving correctly
- check debugfs pointer before dereferencing it
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver creates the top row map sysfs attribute in input_configured()
method; unfortunately we do not have a callback that is executed when HID
interface is unbound, thus we are leaking these sysfs attributes, for
example when device is disconnected.
To fix it let's switch to managed version of adding sysfs attributes which
will ensure that they are destroyed when the driver is unbound.
Fixes: 14c9c014ba ("HID: add vivaldi HID driver")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Tested-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
In tunnel mode, if the outer (IPv4) interface has a small MTU, the inner
IPv6 MTU can easily end up below 1280. If so, a Packet Too Big ICMPv6
message is received. When the packets are sent again, fragmented at 1280,
they are still rejected with ICMPv6 (Packet Too Big) by xfrmi_xmit2().
According to RFC4213 Section3.2.2:
if (IPv4 path MTU - 20) is less than 1280
if packet is larger than 1280 bytes
Send ICMPv6 "packet too big" with MTU=1280
Drop packet
else
Encapsulate but do not set the Don't Fragment
flag in the IPv4 header. The resulting IPv4
packet might be fragmented by the IPv4 layer
on the encapsulator or by some router along
the IPv4 path.
endif
else
if packet is larger than (IPv4 path MTU - 20)
Send ICMPv6 "packet too big" with
MTU = (IPv4 path MTU - 20).
Drop packet.
else
Encapsulate and set the Don't Fragment flag
in the IPv4 header.
endif
endif
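The pseudocode quoted above translates directly into a small decision function. The enum, struct and function names are hypothetical; the constants 1280 and 20 are the ones used in the quoted text.

```c
#include <assert.h>

#define TOY_IPV6_MIN_MTU 1280
#define TOY_IPV4_HDR_LEN 20

enum toy_action {
	TOY_SEND_PTB_AND_DROP,	/* ICMPv6 "packet too big", then drop */
	TOY_ENCAP_DF_CLEAR,	/* encapsulate, IPv4 may fragment */
	TOY_ENCAP_DF_SET,	/* encapsulate with Don't Fragment set */
};

struct toy_verdict {
	enum toy_action action;
	int reported_mtu;	/* MTU carried in the ICMPv6 error, else 0 */
};

/* Direct transcription of the quoted pseudocode. */
static struct toy_verdict toy_tunnel_decide(int ipv4_path_mtu, int packet_len)
{
	int tunnel_mtu = ipv4_path_mtu - TOY_IPV4_HDR_LEN;
	struct toy_verdict v = { TOY_ENCAP_DF_SET, 0 };

	assert(ipv4_path_mtu > TOY_IPV4_HDR_LEN && packet_len > 0);
	if (tunnel_mtu < TOY_IPV6_MIN_MTU) {
		if (packet_len > TOY_IPV6_MIN_MTU) {
			v.action = TOY_SEND_PTB_AND_DROP;
			v.reported_mtu = TOY_IPV6_MIN_MTU;
		} else {
			v.action = TOY_ENCAP_DF_CLEAR;
		}
	} else if (packet_len > tunnel_mtu) {
		v.action = TOY_SEND_PTB_AND_DROP;
		v.reported_mtu = tunnel_mtu;
	}
	return v;
}
```

For a 1200 byte IPv4 path MTU, a 1280 byte inner packet falls into the TOY_ENCAP_DF_CLEAR branch: it is encapsulated and left to the IPv4 layer to fragment rather than being rejected again.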
Packets should be fragmented by the IPv4 outer interface, so change it.
After a packet is fragmented by IPv4 there is double fragmentation:
No.48 and No.51 are IPv6 fragment packets; No.48 is fragmented twice and
then tunneled with IPv4 (No.49 and No.50), which obeys the spec. The
receiving peer cannot decrypt it correctly.
48 2002::10 2002::11 1296(length) IPv6 fragment (off=0 more=y ident=0xa20da5bc nxt=50)
49 0x0000 (0) 2002::10 2002::11 1304 IPv6 fragment (off=0 more=y ident=0x7448042c nxt=44)
50 0x0000 (0) 2002::10 2002::11 200 ESP (SPI=0x00035000)
51 2002::10 2002::11 180 Echo (ping) request
52 0x56dc 2002::10 2002::11 248 IPv6 fragment (off=1232 more=n ident=0xa20da5bc nxt=50)
xfrm6_noneed_fragment fixes the above issues. With it, the traffic looks like this:
1 0x6206 192.168.1.138 192.168.1.1 1316 Fragmented IP protocol (proto=Encap Security Payload 50, off=0, ID=6206) [Reassembled in #2]
2 0x6206 2002::10 2002::11 88 IPv6 fragment (off=0 more=y ident=0x1f440778 nxt=50)
3 0x0000 2002::10 2002::11 248 ICMPv6 Echo (ping) request
Signed-off-by: Lina Wang <lina.wang@mediatek.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In case someone combines bpf socket assign and nf_queue, then we will
queue an skb that references a struct sock that did not have its
reference count incremented.
As we leave rcu protection, there is no guarantee that skb->sk is still
valid.
For refcount-less skb->sk case, try to increment the reference count
and then override the destructor.
In case of failure we have two choices: orphan the skb and 'delete'
preselect or let nf_queue() drop the packet.
Do the latter, it should not happen during normal operation.
Fixes: cf7fbe660f ("bpf: Add socket assign support")
Acked-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Florian Westphal <fw@strlen.de>
Eric Dumazet says:
The sock_hold() side seems suspect, because there is no guarantee
that sk_refcnt is not already 0.
On failure, we cannot queue the packet and need to indicate an
error. The packet will be dropped by the caller.
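The concern about a count that is already 0 is usually addressed with an increment-unless-zero operation. The sketch below shows that idiom with C11 atomics; the toy_* names are hypothetical and this is a user-space model, not the nf_queue code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Take a reference only if the object is still alive.  A plain increment
 * could resurrect a count that has already dropped to zero; the
 * compare-and-swap loop refuses to do that.
 */
static bool toy_refcount_inc_not_zero(atomic_int *refcnt)
{
	int old = atomic_load(refcnt);

	while (old != 0) {
		/* On failure `old` is reloaded with the current value. */
		if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Returns 0 when the packet may be queued, -1 when it must be dropped. */
static int toy_queue_get_sk_ref(atomic_int *sk_refcnt)
{
	assert(sk_refcnt != NULL);
	return toy_refcount_inc_not_zero(sk_refcnt) ? 0 : -1;
}
```

The -1 result corresponds to the failure case above, where the packet cannot be queued and is left for the caller to drop.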
v2: split skb prefetch hunk into separate change
Fixes: 271b72c7fa ("udp: RCU handling for Unicast packets.")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
causes:
BUG: KASAN: slab-out-of-bounds in sk_free+0x25/0x80
Write of size 4 at addr ffff888106df0284 by task nf-queue/1459
sk_free+0x25/0x80
nf_queue_entry_release_refs+0x143/0x1a0
nf_reinject+0x233/0x770
... without 'netfilter: nf_queue: don't assume sk is full socket'.
Signed-off-by: Florian Westphal <fw@strlen.de>
There is no guarantee that state->sk refers to a full socket.
If refcount transitions to 0, sock_put calls sk_free which then ends up
with garbage fields.
I'd like to thank Oleksandr Natalenko and Jiri Benc for considerable
debug work and pointing out state->sk oddities.
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Tested-by: Oleksandr Natalenko <oleksandr@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
When we get anti-clogging token required (added by the commit
mentioned below), or the other status codes added by the later
commit 4e56cde15f ("mac80211: Handle special status codes in
SAE commit") we currently just pretend (towards the internal
state machine of authentication) that we didn't receive anything.
This has the undesirable consequence of retransmitting the prior
frame, which is not expected, because the timer is still armed.
If we just disarm the timer at that point, it would result in
the undesirable side effect of being in this state indefinitely
if userspace crashes, or so.
So to fix this, reset the timer and set a new auth_data->waiting
in order to have no more retransmissions, but to have the data
destroyed when the timer actually fires, which will only happen
if userspace didn't continue (i.e. crashed or abandoned it.)
Fixes: a4055e74a2 ("mac80211: Don't destroy auth data in case of anti-clogging")
Reported-by: Jouni Malinen <j@w1.fi>
Link: https://lore.kernel.org/r/20220224103932.75964e1d7932.Ia487f91556f29daae734bf61f8181404642e1eec@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If CONFIG_RFKILL is not set, the Intel WiFi driver will not build
the iw_mvm driver part due to the missing rfkill_soft_blocked()
call. Adding an inline declaration of rfkill_soft_blocked() if
CONFIG_RFKILL=n fixes the following error:
drivers/net/wireless/intel/iwlwifi/mvm/mvm.h: In function 'iwl_mvm_mei_set_sw_rfkill_state':
drivers/net/wireless/intel/iwlwifi/mvm/mvm.h:2215:38: error: implicit declaration of function 'rfkill_soft_blocked'; did you mean 'rfkill_blocked'? [-Werror=implicit-function-declaration]
2215 | mvm->hw_registered ? rfkill_soft_blocked(mvm->hw->wiphy->rfkill) : false;
| ^~~~~~~~~~~~~~~~~~~
| rfkill_blocked
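A minimal userspace sketch of the stub pattern, assuming CONFIG_RFKILL is
visible as a preprocessor symbol and using a placeholder struct rfkill. The
sw_rfkill_state() helper is hypothetical and only mirrors the expression in
the error output above; the actual header may be laid out differently:

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder type for this sketch; drivers only ever hold a pointer. */
struct rfkill;

#ifdef CONFIG_RFKILL
/* With rfkill support built in, the real implementation is linked. */
bool rfkill_soft_blocked(struct rfkill *rfkill);
#else
/* Without rfkill support, nothing can be soft-blocked. */
static inline bool rfkill_soft_blocked(struct rfkill *rfkill)
{
    (void)rfkill;
    return false;
}
#endif

/* Hypothetical caller, modelled on the expression in the error output. */
static inline bool sw_rfkill_state(bool hw_registered, struct rfkill *rfkill)
{
    return hw_registered ? rfkill_soft_blocked(rfkill) : false;
}
```

With the stub in place, callers compile unchanged in both configurations
and simply see "not blocked" when rfkill support is absent.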
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reported-by: Neill Whillans <neill.whillans@codethink.co.uk>
Fixes: 5bc9a9dd75 ("rfkill: allow to get the software rfkill state")
Link: https://lore.kernel.org/r/20220218093858.1245677-1-ben.dooks@codethink.co.uk
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
- Set display pipeline to DSI on mt8183 kukui jacuzzi
- Fix display for mt8192 based boards by fixing the routing table
* tag 'v5.17-fixes-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux:
soc: mediatek: mt8192-mmsys: Fix dither to dsi0 path's input sel
arm64: dts: mt8183: jacuzzi: Fix bus properties in anx's DSI endpoint
Link: https://lore.kernel.org/r/8eb8510d-c597-4fee-e4b3-924b6d4bb3be@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Qualcomm DeviceTree fixes for v5.17
The SDX65 platform and MTP device were added twice to the DT binding;
this drops one of the occurrences.
* tag 'qcom-dts-fixes-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
Revert "dt-bindings: arm: qcom: Document SDX65 platform and boards"
Link: https://lore.kernel.org/r/20220301033838.1801689-1-bjorn.andersson@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Qualcomm ARM64 DeviceTree fixes for 5.17
This starts off by fixing an issue introduced in a bug fix in the
global clock controller, where the symbol clocks for UFS would
end up picking the wrong parent clock which breaks UFS.
It then makes sure that the reference clock for the USB blocks is
enabled, even when booting without clk_ignore_unused.
It corrects the apps SMMU interrupts definition by adding a missing
interrupt in the list.
Lastly it disables the Qualcomm crypto hardware (for now) on the Lenovo
Yoga C630, to prevent the cryptomanager tests during boot from crashing
the device.
* tag 'qcom-arm64-fixes-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: dts: qcom: c630: disable crypto due to serror
arm64: dts: qcom: sm8450: fix apps_smmu interrupts
arm64: dts: qcom: sm8450: enable GCC_USB3_0_CLKREF_EN for usb
arm64: dts: qcom: sm8350: Correct UFS symbol clocks
Link: https://lore.kernel.org/r/20220301033526.1801295-1-bjorn.andersson@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-02-28
This series contains updates to igc and e1000e drivers.
Corinna Vinschen ensures release of hardware semaphore on failed
register read in igc_read_phy_reg_gpy().
Sasha does the same for the write variant, igc_write_phy_reg_gpy(). On
e1000e, he resolves an issue with hardware unit hang on s0ix exit
by disabling some bits and LAN connected device reset during power
management flows. Lastly, he allows TGP platforms to correct their
NVM checksum.
v2: Fix Fixes tag on patch 3
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When "dump_apple_properties" is used on the kernel boot command line,
it causes an Unknown parameter message and the string is added to init's
argument strings:
Unknown kernel command line parameters "dump_apple_properties
BOOT_IMAGE=/boot/bzImage-517rc6 efivar_ssdt=newcpu_ssdt", will be
passed to user space.
Run /sbin/init as init process
with arguments:
/sbin/init
dump_apple_properties
with environment:
HOME=/
TERM=linux
BOOT_IMAGE=/boot/bzImage-517rc6
efivar_ssdt=newcpu_ssdt
Similarly when "efivar_ssdt=somestring" is used, it is added to the
Unknown parameter message and to init's environment strings, polluting
them (see examples above).
Change the return value of the __setup functions to 1 to indicate
that the __setup options have been handled.
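The effect of the return value can be illustrated with a simplified
userspace model of the option dispatch. This is a sketch, not the kernel's
actual code: struct setup_entry, dispatch_option() and both handlers are
hypothetical names introduced only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one __setup() table entry. */
struct setup_entry {
    const char *name;
    int (*handler)(const char *arg);
};

/* Buggy variant: returning 0 reports the option as not handled. */
static int dump_props_buggy(const char *arg) { (void)arg; return 0; }

/* Fixed variant: returning 1 reports the option as handled. */
static int dump_props_fixed(const char *arg) { (void)arg; return 1; }

/*
 * Simplified model of the boot-time dispatch: returns 1 if the option
 * was consumed, 0 if it would be passed on to init.
 */
static int dispatch_option(const struct setup_entry *tbl, size_t n,
                           const char *opt)
{
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(tbl[i].name);

        if (strncmp(opt, tbl[i].name, len) == 0 &&
            tbl[i].handler(opt + len))
            return 1;
    }
    return 0;
}
```

In this model, an entry using dump_props_buggy() leaves the option
unconsumed, which corresponds to the "Unknown kernel command line
parameters" message above, while dump_props_fixed() consumes it.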
Fixes: 58c5475aba ("x86/efi: Retrieve and assign Apple device properties")
Fixes: 475fb4e8b2 ("efi / ACPI: load SSTDs from EFI variables")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-efi@vger.kernel.org
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Octavian Purdila <octavian.purdila@intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Link: https://lore.kernel.org/r/20220301041851.12459-1-rdunlap@infradead.org
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
In commit d687e056a1 ("soc: mediatek: mmsys: Add mt8192 mmsys routing table"),
the mmsys routing table for mt8192 was introduced but the input selector
for DITHER->DSI0 has no value assigned to it.
This means that we are clearing bit 0 instead of setting it, blocking
communication between these two blocks; due to that, any display that
is connected to DSI0 will not work, as no data will go through.
The effect of that issue is that, during bootup, the DRM will block for
some time, while atomically waiting for a vblank that never happens;
later, the situation doesn't get better, leaving the display in a
non-functional state.
To fix this issue, fix the route entry in the table by assigning the
dither input selector to MT8192_DISP_DSI0_SEL_IN.
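A small sketch of why a missing value clears the bit, assuming each route
entry is applied as "clear the masked field, then OR in the value". The
struct layout, the macro names and the single-bit field are assumptions
for illustration, not the driver's actual definitions:

```c
#include <stdint.h>

/* Hypothetical routing entry: a register mask plus the value to program. */
struct route {
    uint32_t mask;
    uint32_t val;
};

#define DSI0_SEL_IN_MASK   0x1u  /* assumed single-bit selector field */
#define DSI0_SEL_IN_DITHER 0x1u  /* assumed value selecting the dither input */

/* Clear the masked field, then set the requested value. */
static uint32_t apply_route(uint32_t reg, const struct route *r)
{
    return (reg & ~r->mask) | r->val;
}

/* Entry with the value left out: val defaults to 0, so bit 0 is cleared. */
static const struct route route_missing_val = { .mask = DSI0_SEL_IN_MASK };

/* Entry with the value assigned: bit 0 is set. */
static const struct route route_fixed = {
    .mask = DSI0_SEL_IN_MASK,
    .val  = DSI0_SEL_IN_DITHER,
};
```

Because an omitted initializer is zero in C, the incomplete entry compiles
without warnings and silently programs the opposite selection.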
Fixes: d687e056a1 ("soc: mediatek: mmsys: Add mt8192 mmsys routing table")
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Tested-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Link: https://lore.kernel.org/r/20220128142056.359900-1-angelogioacchino.delregno@collabora.com
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
This driver, like several others, uses a chained IRQ for each GPIO bank,
and forwards .irq_set_wake to the GPIO bank's upstream IRQ. As a result,
a call to irq_set_irq_wake() needs to lock both the upstream and
downstream irq_desc's. Lockdep considers this to be a possible deadlock
when the irq_desc's share lockdep classes, which they do by default:
============================================
WARNING: possible recursive locking detected
5.17.0-rc3-00394-gc849047c2473 #1 Not tainted
--------------------------------------------
init/307 is trying to acquire lock:
c2dfe27c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0
but task is already holding lock:
c3c0ac7c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&irq_desc_lock_class);
lock(&irq_desc_lock_class);
*** DEADLOCK ***
May be due to missing lock nesting notation
4 locks held by init/307:
#0: c1f29f18 (system_transition_mutex){+.+.}-{3:3}, at: __do_sys_reboot+0x90/0x23c
#1: c20f7760 (&dev->mutex){....}-{3:3}, at: device_shutdown+0xf4/0x224
#2: c2e804d8 (&dev->mutex){....}-{3:3}, at: device_shutdown+0x104/0x224
#3: c3c0ac7c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0
stack backtrace:
CPU: 0 PID: 307 Comm: init Not tainted 5.17.0-rc3-00394-gc849047c2473 #1
Hardware name: Allwinner sun8i Family
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x68/0x90
dump_stack_lvl from __lock_acquire+0x1680/0x31a0
__lock_acquire from lock_acquire+0x148/0x3dc
lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c
_raw_spin_lock_irqsave from __irq_get_desc_lock+0x58/0xa0
__irq_get_desc_lock from irq_set_irq_wake+0x2c/0x19c
irq_set_irq_wake from irq_set_irq_wake+0x13c/0x19c
[tail call from sunxi_pinctrl_irq_set_wake]
irq_set_irq_wake from gpio_keys_suspend+0x80/0x1a4
gpio_keys_suspend from gpio_keys_shutdown+0x10/0x2c
gpio_keys_shutdown from device_shutdown+0x180/0x224
device_shutdown from __do_sys_reboot+0x134/0x23c
__do_sys_reboot from ret_fast_syscall+0x0/0x1c
However, this can never deadlock because the upstream and downstream
IRQs are never the same (nor do they even involve the same irqchip).
Silence this erroneous lockdep splat by applying what appears to be the
usual fix of moving the GPIO IRQs to separate lockdep classes.
Fixes: a59c99d9ea ("pinctrl: sunxi: Forward calls to irq_set_irq_wake")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20220216040037.22730-1-samuel@sholland.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Netfilter assumes it is called with rcu_read_lock held, but in the egress
hook case it may be called with the BH read lock held instead.
This triggers a lockdep splat.
To avoid changing every rcu_dereference() to
rcu_dereference_check(..., rcu_read_lock_bh_held()), wrap nf_hook_slow
with a read lock/unlock pair.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull ARM SoC fixes from Arnd Bergmann:
"The code changes address mostly minor problems:
- Several NXP/FSL SoC driver fixes, addressing issues with error
handling and compilation
- Fix a clock disabling imbalance in gpcv2 driver.
- Arm Juno DMA coherency issue
- Trivial firmware driver fixes for op-tee and scmi firmware
The remaining changes address issues in the devicetree files:
- A timer regression for the OMAP devkit8000, which has to use the
alternative timer.
- A hang in the i.MX8MM power domain configuration
- Multiple fixes for the Rockchip RK3399 addressing issues with sound
and eMMC
- Cosmetic fixes for i.MX8ULP, RK3xxx, and Tegra124"
* tag 'soc-fixes-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (32 commits)
ARM: tegra: Move panels to AUX bus
soc: imx: gpcv2: Fix clock disabling imbalance in error path
soc: fsl: qe: Check of ioremap return value
soc: fsl: qe: fix typo in a comment
soc: fsl: guts: Add a missing memory allocation failure check
soc: fsl: guts: Revert commit 3c0d64e867
soc: fsl: Correct MAINTAINERS database (SOC)
soc: fsl: Correct MAINTAINERS database (QUICC ENGINE LIBRARY)
soc: fsl: Replace kernel.h with the necessary inclusions
dt-bindings: fsl,layerscape-dcfg: add missing compatible for lx2160a
dt-bindings: qoriq-clock: add missing compatible for lx2160a
ARM: dts: Use 32KiHz oscillator on devkit8000
ARM: dts: switch timer config to common devkit8000 devicetree
tee: optee: fix error return code in probe function
arm64: dts: imx8ulp: Set #thermal-sensor-cells to 1 as required
arm64: dts: imx8mm: Fix VPU Hanging
ARM: dts: rockchip: fix a typo on rk3288 crypto-controller
ARM: dts: rockchip: reorder rk322x hmdi clocks
firmware: arm_scmi: Remove space in MODULE_ALIAS name
arm64: dts: agilex: use the compatible "intel,socfpga-agilex-hsotg"
...
Pull EFI fixes from Ard Biesheuvel:
- don't treat valid hartid U32_MAX as a failure return code (RISC-V)
- avoid blocking query_variable_info() call when blocking is not
allowed
* tag 'efi-urgent-for-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efivars: Respect "block" flag in efivar_entry_set_safe()
riscv/efi_stub: Fix get_boot_hartid_from_fdt() return value
Update the link to the "Software Techniques for Managing Speculation
on AMD Processors" whitepaper.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
AMD retpoline may be susceptible to speculation. The speculation
execution window for an incorrect indirect branch prediction using
LFENCE/JMP sequence may potentially be large enough to allow
exploitation using Spectre V2.
By default, don't use retpoline,lfence on AMD. Instead, use the
generic retpoline.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Similar to "igc_read_phy_reg_gpy: drop premature return" patch.
igc_write_phy_reg_gpy checks the return value from igc_write_phy_reg_mdic
and if it's not 0, returns immediately. By doing this, it leaves the HW
semaphore in the acquired state.
Drop this premature return statement; the function returns immediately
after releasing the semaphore anyway.
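The difference between the two shapes can be sketched in userspace as
follows. All names here (hw_sem_held, acquire_sem(), release_sem(),
mdic_write(), and the -5 error value) are hypothetical stand-ins, not the
igc driver's functions:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the hardware semaphore. */
static bool hw_sem_held;

static int acquire_sem(void) { hw_sem_held = true; return 0; }
static void release_sem(void) { hw_sem_held = false; }

/* Hypothetical low-level access that can be forced to fail. */
static int mdic_write(bool fail) { return fail ? -5 : 0; }

/* Buggy shape: the early return skips release_sem(). */
static int write_reg_buggy(bool fail)
{
    int ret = acquire_sem();

    if (ret)
        return ret;
    ret = mdic_write(fail);
    if (ret)
        return ret;    /* semaphore is still held here */
    release_sem();
    return ret;
}

/* Fixed shape: always release, then report the access result. */
static int write_reg_fixed(bool fail)
{
    int ret = acquire_sem();

    if (ret)
        return ret;
    ret = mdic_write(fail);
    release_sem();
    return ret;
}
```

Both variants return the same error code to the caller; only the fixed
one leaves the semaphore free for the next access.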
Fixes: 5586838fe9 ("igc: Add code for PHY support")
Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Reported-by: Corinna Vinschen <vinschen@redhat.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
igc_read_phy_reg_gpy checks the return value from igc_read_phy_reg_mdic
and if it's not 0, returns immediately. By doing this, it leaves the HW
semaphore in the acquired state.
Drop this premature return statement; the function returns immediately
after releasing the semaphore anyway.
Fixes: 5586838fe9 ("igc: Add code for PHY support")
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
early_param() handlers should return 0 on success.
__setup() handlers should return 1 on success, i.e., the parameter
has been handled. A return of 0 would cause the "option=value" string
to be added to init's environment strings, polluting it.
../arch/arm/mm/mmu.c: In function 'test_early_cachepolicy':
../arch/arm/mm/mmu.c:215:1: error: no return statement in function returning non-void [-Werror=return-type]
../arch/arm/mm/mmu.c: In function 'test_noalign_setup':
../arch/arm/mm/mmu.c:221:1: error: no return statement in function returning non-void [-Werror=return-type]
Fixes: b849a60e09 ("ARM: make cr_alignment read-only #ifndef CONFIG_CPU_CP15")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: patches@armlinux.org.uk
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
When enabling VMD and IOMMU scalable mode, the following kernel panic
call trace/kernel log is shown on the Eagle Stream platform (Sapphire
Rapids CPU) during booting:
pci 0000:59:00.5: Adding to iommu group 42
...
vmd 0000:59:00.5: PCI host bridge to bus 10000:80
pci 10000:80:01.0: [8086:352a] type 01 class 0x060400
pci 10000:80:01.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
pci 10000:80:01.0: enabling Extended Tags
pci 10000:80:01.0: PME# supported from D0 D3hot D3cold
pci 10000:80:01.0: DMAR: Setup RID2PASID failed
pci 10000:80:01.0: Failed to add to iommu group 42: -16
pci 10000:80:03.0: [8086:352b] type 01 class 0x060400
pci 10000:80:03.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
pci 10000:80:03.0: enabling Extended Tags
pci 10000:80:03.0: PME# supported from D0 D3hot D3cold
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:29!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.17.0-rc3+ #7
Hardware name: Lenovo ThinkSystem SR650V3/SB27A86647, BIOS ESE101Y-1.00 01/13/2022
Workqueue: events work_for_cpu_fn
RIP: 0010:__list_add_valid.cold+0x26/0x3f
Code: 9a 4a ab ff 4c 89 c1 48 c7 c7 40 0c d9 9e e8 b9 b1 fe ff 0f
0b 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 f0 0c d9 9e e8 a2 b1
fe ff <0f> 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 98 0c d9
9e e8 8b b1 fe
RSP: 0000:ff5ad434865b3a40 EFLAGS: 00010246
RAX: 0000000000000058 RBX: ff4d61160b74b880 RCX: ff4d61255e1fffa8
RDX: 0000000000000000 RSI: 00000000fffeffff RDI: ffffffff9fd34f20
RBP: ff4d611d8e245c00 R08: 0000000000000000 R09: ff5ad434865b3888
R10: ff5ad434865b3880 R11: ff4d61257fdc6fe8 R12: ff4d61160b74b8a0
R13: ff4d61160b74b8a0 R14: ff4d611d8e245c10 R15: ff4d611d8001ba70
FS: 0000000000000000(0000) GS:ff4d611d5ea00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ff4d611fa1401000 CR3: 0000000aa0210001 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
intel_pasid_alloc_table+0x9c/0x1d0
dmar_insert_one_dev_info+0x423/0x540
? device_to_iommu+0x12d/0x2f0
intel_iommu_attach_device+0x116/0x290
__iommu_attach_device+0x1a/0x90
iommu_group_add_device+0x190/0x2c0
__iommu_probe_device+0x13e/0x250
iommu_probe_device+0x24/0x150
iommu_bus_notifier+0x69/0x90
blocking_notifier_call_chain+0x5a/0x80
device_add+0x3db/0x7b0
? arch_memremap_can_ram_remap+0x19/0x50
? memremap+0x75/0x140
pci_device_add+0x193/0x1d0
pci_scan_single_device+0xb9/0xf0
pci_scan_slot+0x4c/0x110
pci_scan_child_bus_extend+0x3a/0x290
vmd_enable_domain.constprop.0+0x63e/0x820
vmd_probe+0x163/0x190
local_pci_probe+0x42/0x80
work_for_cpu_fn+0x13/0x20
process_one_work+0x1e2/0x3b0
worker_thread+0x1c4/0x3a0
? rescuer_thread+0x370/0x370
kthread+0xc7/0xf0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
...
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x1ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]---
The following 'lspci' output shows devices '10000:80:*' are subdevices of
the VMD device 0000:59:00.5:
$ lspci
...
0000:59:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 20)
...
10000:80:01.0 PCI bridge: Intel Corporation Device 352a (rev 03)
10000:80:03.0 PCI bridge: Intel Corporation Device 352b (rev 03)
10000:80:05.0 PCI bridge: Intel Corporation Device 352c (rev 03)
10000:80:07.0 PCI bridge: Intel Corporation Device 352d (rev 03)
10000:81:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
10000:82:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
The symptom 'list_add double add' is caused by the following failure
message:
pci 10000:80:01.0: DMAR: Setup RID2PASID failed
pci 10000:80:01.0: Failed to add to iommu group 42: -16
pci 10000:80:03.0: [8086:352b] type 01 class 0x060400
Device 10000:80:01.0 is the subdevice of the VMD device 0000:59:00.5,
so invoking intel_pasid_alloc_table() gets the pasid_table of the VMD
device 0000:59:00.5. Here is call path:
intel_pasid_alloc_table
pci_for_each_dma_alias
get_alias_pasid_table
search_pasid_table
pci_real_dma_dev() in pci_for_each_dma_alias() gets the real dma device
which is the VMD device 0000:59:00.5. However, pte of the VMD device
0000:59:00.5 has been configured during this message "pci 0000:59:00.5:
Adding to iommu group 42". So, the status -EBUSY is returned when
configuring pasid entry for device 10000:80:01.0.
It then invokes dmar_remove_one_dev_info() to release
'struct device_domain_info *' from iommu_devinfo_cache. But, the pasid
table is not released because of the following statement in
__dmar_remove_one_dev_info():
if (info->dev && !dev_is_real_dma_subdevice(info->dev)) {
...
intel_pasid_free_table(info->dev);
}
The subsequent dmar_insert_one_dev_info() operation of device
10000:80:03.0 allocates 'struct device_domain_info *' from
iommu_devinfo_cache. The allocated address is the same address that
is released previously for device 10000:80:01.0. Finally, invoking
device_attach_pasid_table() causes the issue.
`git bisect` points to the offending commit 474dd1c650 ("iommu/vt-d:
Fix clearing real DMA device's scalable-mode context entries"), which
releases the pasid table if the device is not the subdevice by
checking the returned status of dev_is_real_dma_subdevice().
Reverting the offending commit can work around the issue.
The solution is to avoid allocating a pasid table for devices that
are subdevices of the VMD device.
Fixes: 474dd1c650 ("iommu/vt-d: Fix clearing real DMA device's scalable-mode context entries")
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Link: https://lore.kernel.org/r/20220216091307.703-1-adrianhuang0701@gmail.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220221053348.262724-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
An IPA build problem arose in the linux-next tree the other day.
The problem is that a recent commit adds a new dependency on some
code, and the Kconfig file for IPA doesn't reflect that dependency.
As a result, some configurations can fail to build (particularly
when COMPILE_TEST is enabled).
The recent patch adds calls to qmp_get(), qmp_put(), and qmp_send(),
and those are built based on the QCOM_AOSS_QMP config option. If
that symbol is not defined, stubs are defined, so we just need to
ensure QCOM_AOSS_QMP is compatible with QCOM_IPA, or it's not
defined.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: 34a081761e ("net: ipa: request IPA register values be retained")
Signed-off-by: Alex Elder <elder@linaro.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
main.h uses NUM_TARGETS from main_regs.h, but
the missing include never causes any errors
because everywhere main.h is (currently)
included, main_regs.h is included before.
But since it is dependent on main_regs.h
it should always be included.
Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Reviewed-by: Joacim Zetterling <joacim.zetterling@westermo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch calls smc_ib_unregister_client() when tcp_register_ulp()
fails, to make sure the IB client is cleaned up.
Fixes: d7cd421da9 ("net/smc: Introduce TCP ULP support")
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mt8183-kukui-jacuzzi has an anx7625 bridge connected to the output of
its DSI host. However, after commit fd0310b6fe ("drm/bridge: anx7625:
add MIPI DPI input feature"), a bus-type property started being required
in the endpoint node by the driver to indicate whether it is DSI or DPI.
Add the missing bus-type property and set it to 5
(V4L2_FWNODE_BUS_TYPE_PARALLEL) so that the driver has its input
configured to DSI and the display pipeline can probe correctly.
While at it, also set the data-lanes property that was also introduced
in that same commit, so that we don't rely on the default value.
Fixes: fd0310b6fe ("drm/bridge: anx7625: add MIPI DPI input feature")
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Link: https://lore.kernel.org/r/20220214200507.2500693-1-nfraprado@collabora.com
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
There are two reasons for addrconf_notify() to be called with NETDEV_DOWN:
either the network device is actually going down, or IPv6 was disabled
on the interface.
If either of them stays down while the other is toggled, we repeatedly
call the code for NETDEV_DOWN, including ipv6_mc_down(), while never
calling the corresponding ipv6_mc_up() in between. This will cause a
new entry in idev->mc_tomb to be allocated for each multicast group
the interface is subscribed to, which in turn leaks one struct ifmcaddr6
per nontrivial multicast group the interface is subscribed to.
The following reproducer will leak at least $n objects:
ip addr add ff2e::4242/32 dev eth0 autojoin
sysctl -w net.ipv6.conf.eth0.disable_ipv6=1
for i in $(seq 1 $n); do
ip link set up eth0; ip link set down eth0
done
Joining groups with IPV6_ADD_MEMBERSHIP (unprivileged) or setting the
sysctl net.ipv6.conf.eth0.forwarding to 1 (=> subscribing to ff02::2)
can also be used to create a nontrivial idev->mc_list, which will then
leak objects with the right up-down sequence.
Based on both sources for NETDEV_DOWN events the interface IPv6 state
should be considered:
- not ready if the network interface is not ready OR IPv6 is disabled
for it
- ready if the network interface is ready AND IPv6 is enabled for it
The functions ipv6_mc_up() and ipv6_mc_down() should only be run when
this state changes.
Implement this by remembering when the IPv6 state is ready, and only
run ipv6_mc_down() if it actually changed from ready to not ready.
The other direction (not ready -> ready) already works correctly, as:
- the interface notification triggered codepath for NETDEV_UP /
NETDEV_CHANGE returns early if ipv6 is disabled, and
- the disable_ipv6=0 triggered codepath skips fully initializing the
interface as long as addrconf_link_ready(dev) returns false
- calling ipv6_mc_up() repeatedly does not leak anything
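The ready/not-ready tracking described above can be sketched as a small
state machine. This is a userspace model under stated assumptions: struct
idev_model, its fields and both handlers are hypothetical, and the counter
merely stands in for a call to ipv6_mc_down().

```c
#include <stdbool.h>

/* Hypothetical per-interface state; field names are illustrative only. */
struct idev_model {
    bool link_ready;     /* network device is up */
    bool ipv6_disabled;  /* models the disable_ipv6 sysctl */
    bool ipv6_ready;     /* remembered combined state */
    int mc_down_calls;   /* counts teardown invocations */
};

/* Ready only if the link is ready AND IPv6 is enabled. */
static bool want_ready(const struct idev_model *d)
{
    return d->link_ready && !d->ipv6_disabled;
}

/* Called for every "down" style event from either source. */
static void handle_down_event(struct idev_model *d)
{
    if (d->ipv6_ready && !want_ready(d)) {
        d->ipv6_ready = false;
        d->mc_down_calls++;    /* stands in for ipv6_mc_down() */
    }
}

/* Called when either source may have come back up. */
static void handle_up_event(struct idev_model *d)
{
    if (want_ready(d))
        d->ipv6_ready = true;
}
```

In this model, toggling the link while IPv6 stays disabled triggers the
teardown path once, on the first ready to not-ready transition, and never
again until the interface has actually become ready in between.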
Fixes: 3ce62a84d5 ("ipv6: exit early in addrconf_notify() if IPv6 is disabled")
Signed-off-by: Johannes Nixdorf <j.nixdorf@avm.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the "block" flag is false, the old code would sometimes still call
check_var_size(), which wrongly tells ->query_variable_store() that it can
block.
As far as I can tell, this can't really materialize as a bug at the moment,
because ->query_variable_store only does something on X86 with generic EFI,
and in that configuration we always take the efivar_entry_set_nonblocking()
path.
Fixes: ca0e30dcaa ("efi: Add nonblocking option to efi_query_variable_store()")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220218180559.1432559-1-jannh@google.com
The get_boot_hartid_from_fdt() function currently returns U32_MAX
in the failure case, which is not correct because U32_MAX is a valid
hartid value. This patch fixes the issue by returning an error code.
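A minimal sketch of the sentinel problem and the status-code alternative.
The function names, the ERR_NOT_FOUND value and the out-parameter are
assumptions made for illustration; the actual patch may pass the ID back
differently:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error value; the kernel uses its own status codes. */
#define ERR_NOT_FOUND (-1)

/*
 * Sentinel style: UINT32_MAX doubles as "failure", so a hart whose ID
 * really is UINT32_MAX cannot be told apart from an error.
 */
static uint32_t get_hartid_sentinel(const uint32_t *prop)
{
    return prop ? *prop : UINT32_MAX;
}

/*
 * Status style: the return value only reports success or failure and
 * the ID travels through an out-parameter, so every 32-bit value is valid.
 */
static int get_hartid_status(const uint32_t *prop, uint32_t *hartid)
{
    if (!prop)
        return ERR_NOT_FOUND;
    *hartid = *prop;
    return 0;
}
```

Separating the status from the payload removes the need to reserve any
value of the ID space as an error marker.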
Cc: <stable@vger.kernel.org>
Fixes: d7071743db ("RISC-V: Add EFI stub support.")
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Make the samsung-keypad driver explicitly depend on CONFIG_HAS_IOMEM, as it
calls devm_ioremap(). This prevents compile errors in some configs (e.g.,
allyesconfig/randconfig under UML):
/usr/bin/ld: drivers/input/keyboard/samsung-keypad.o: in function `samsung_keypad_probe':
samsung-keypad.c:(.text+0xc60): undefined reference to `devm_ioremap'
Signed-off-by: David Gow <davidgow@google.com>
Acked-by: anton ivanov <anton.ivanov@cambridgegreys.com>
Link: https://lore.kernel.org/r/20220225041727.1902850-1-davidgow@google.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Pull irq fix from Thomas Gleixner:
"A single fix for a regression caused by the recent PCI/MSI rework
which resulted in a recursive locking problem in the VMD driver.
The cure is to cache the relevant information upfront instead of
retrieving it at runtime"
* tag 'irq-urgent-2022-02-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
PCI: vmd: Prevent recursive locking on interrupt allocation
Pull dma-mapping fix from Christoph Hellwig:
- fix a swiotlb info leak (Halil Pasic)
* tag 'dma-mapping-5.17-1' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: fix info leak with DMA_FROM_DEVICE
Pull pin control fixes from Linus Walleij:
- Fix some drive strength and pull-up code in the K210 driver.
- Add the Alder Lake-M ACPI ID so it starts to work properly.
- Use a static name for the StarFive GPIO irq_chip, forestalling an
upcoming fixes series from Marc Zyngier.
- Fix an ages old bug in the Tegra 186 driver where we were indexing at
random into struct and being lucky getting the right member.
* tag 'pinctrl-v5-17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
gpio: tegra186: Fix chip_data type confusion
pinctrl: starfive: Use a static name for the GPIO irq_chip
pinctrl: tigerlake: Revert "Add Alder Lake-M ACPI ID"
pinctrl: k210: Fix bias-pull-up
pinctrl: fix loop in k210_pinconf_get_drive()
Pull tracing fixes from Steven Rostedt:
- rtla (Real-Time Linux Analysis tool):
- fix typo in man page
- Update API -e to -E before it is released
- Error message fix and memory leak fix
- Partially uninline trace event soft disable to shrink text
- Fix function graph start up test
- Have triggers affect the trace instance they are in and not top level
- Have osnoise sleep in the units it says it uses
- Remove unused ftrace stub function
- Remove event probe redundant info from event in the buffer
- Fix group ownership setting in tracefs
- Ensure trace buffer is minimum size to prevent crashes
* tag 'trace-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
rtla/osnoise: Fix error message when failing to enable trace instance
rtla/osnoise: Free params at the exit
rtla/hist: Make -E the short version of --entries
tracing: Fix selftest config check for function graph start up test
tracefs: Set the group ownership in apply_options() not parse_options()
tracing/osnoise: Make osnoise_main to sleep for microseconds
ftrace: Remove unused ftrace_startup_enable() stub
tracing: Ensure trace buffer is at least 4096 bytes large
tracing: Uninline trace_trigger_soft_disabled() partly
eprobes: Remove redundant event type information
tracing: Have traceon and traceoff trigger honor the instance
tracing: Dump stacktrace trigger to the corresponding instance
rtla: Fix systme -> system typo on man page
Pull memblock fix from Mike Rapoport:
"Use kfree() to release kmalloced memblock regions
memblock.{reserved,memory}.regions may be allocated using kmalloc()
in memblock_double_array(). Use kfree() to release these kmalloced
regions"
* tag 'fixes-2022-02-26' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock: use kfree() to release kmalloced memblock regions
Merge misc fixes from Andrew Morton:
"12 patches.
Subsystems affected by this patch series: MAINTAINERS, mailmap, memfd,
and mm (hugetlb, kasan, hugetlbfs, pagemap, selftests, memcg, and
slab)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
selftests/memfd: clean up mapping in mfd_fail_write
mailmap: update Roman Gushchin's email
MAINTAINERS, SLAB: add Roman as reviewer, git tree
MAINTAINERS: add Shakeel as a memcg co-maintainer
MAINTAINERS: remove Vladimir from memcg maintainers
MAINTAINERS: add Roman as a memcg co-maintainer
selftest/vm: fix map_fixed_noreplace test failure
mm: fix use-after-free bug when mm->mmap is reused after being freed
hugetlbfs: fix a truncation issue in hugepages parameter
kasan: test: prevent cache merging in kmem_cache_double_destroy
mm/hugetlb: fix kernel crash with hugetlb mremap
MAINTAINERS: add sysctl-next git tree
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for the K210 sdcard defconfig, to avoid using a
fixed delay for the root FS
- A fix to make sure there's a proper call frame for
trace_hardirqs_{on,off}().
* tag 'riscv-for-linus-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: fix oops caused by irqsoff latency tracer
riscv: fix nommu_k210_sdcard_defconfig
Pull xfs fixes from Darrick Wong:
"Nothing exciting, just more fixes for not returning sync_filesystem
error values (and eliding it when it's not necessary).
Summary:
- Only call sync_filesystem when we're remounting the filesystem
readonly, and actually check its return value"
* tag 'xfs-5.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: only bother with sync_filesystem during readonly remount
Running the memfd script ./run_hugetlbfs_test.sh will often end in error
as follows:
memfd-hugetlb: CREATE
memfd-hugetlb: BASIC
memfd-hugetlb: SEAL-WRITE
memfd-hugetlb: SEAL-FUTURE-WRITE
memfd-hugetlb: SEAL-SHRINK
fallocate(ALLOC) failed: No space left on device
./run_hugetlbfs_test.sh: line 60: 166855 Aborted (core dumped) ./memfd_test hugetlbfs
opening: ./mnt/memfd
fuse: DONE
If no hugetlb pages have been preallocated, run_hugetlbfs_test.sh will
allocate 'just enough' pages to run the test. In the SEAL-FUTURE-WRITE
test the mfd_fail_write routine maps the file, but does not unmap. As a
result, two hugetlb pages remain reserved for the mapping. When the
fallocate call in the SEAL-SHRINK test attempts to allocate all hugetlb
pages, it is short by the two reserved pages.
Fix by making sure to unmap in mfd_fail_write.
Link: https://lkml.kernel.org/r/20220219004340.56478-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we specify a large number for node in hugepages parameter, it may
be parsed to another number due to truncation in this statement:
node = tmp;
For example, add the following parameters to the command line:
hugepagesz=1G hugepages=4294967297:5
and kernel will allocate 5 hugepages for node 1 instead of ignoring it.
Move the validation check earlier to fix this issue, and slightly
simplify the condition.
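The truncation can be reproduced in user space. The following is a minimal sketch, not the kernel code: MAX_NODES_SKETCH and both function names are hypothetical, and unsigned long long is used so the parsed value is 64 bits wide on any platform. The narrowing cast is implementation-defined in C; GCC and Clang keep the low 32 bits.

```c
#include <stdlib.h>

#define MAX_NODES_SKETCH 4ULL	/* hypothetical node count, illustration only */

/* Old ordering: narrow first, validate afterwards. */
int parse_node_unchecked(const char *s)
{
	unsigned long long tmp = strtoull(s, NULL, 10);
	int node = (int)tmp;	/* upper bits are dropped here */

	return (node >= 0 && (unsigned long long)node < MAX_NODES_SKETCH) ? node : -1;
}

/* New ordering: validate the wide value, then narrow. */
int parse_node_checked(const char *s)
{
	unsigned long long tmp = strtoull(s, NULL, 10);

	if (tmp >= MAX_NODES_SKETCH)
		return -1;	/* rejected before any truncation can happen */
	return (int)tmp;
}
```

With the unchecked ordering, 4294967297 (2^32 + 1) is accepted as node 1; with the checked ordering it is rejected.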
Link: https://lkml.kernel.org/r/20220209134018.8242-1-liuyuntao10@huawei.com
Fixes: b5389086ad ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Signed-off-by: Liu Yuntao <liuyuntao10@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes the below crash:
kernel BUG at include/linux/mm.h:2373!
cpu 0x5d: Vector: 700 (Program Check) at [c00000003c6e76e0]
pc: c000000000581a54: pmd_to_page+0x54/0x80
lr: c00000000058d184: move_hugetlb_page_tables+0x4e4/0x5b0
sp: c00000003c6e7980
msr: 9000000000029033
current = 0xc00000003bd8d980
paca = 0xc000200fff610100 irqmask: 0x03 irq_happened: 0x01
pid = 9349, comm = hugepage-mremap
kernel BUG at include/linux/mm.h:2373!
move_hugetlb_page_tables+0x4e4/0x5b0 (link register)
move_hugetlb_page_tables+0x22c/0x5b0 (unreliable)
move_page_tables+0xdbc/0x1010
move_vma+0x254/0x5f0
sys_mremap+0x7c0/0x900
system_call_exception+0x160/0x2c0
The kernel can't use huge_pte_offset before it sets the pte entry because
a page table lookup checks for the huge PTE bit in the page table to
differentiate between a huge pte entry and a pointer to a pte page. A
huge_pte_alloc won't mark the page table entry huge and hence the kernel
should not use huge_pte_offset after a huge_pte_alloc.
Link: https://lkml.kernel.org/r/20220211063221.99293-1-aneesh.kumar@linux.ibm.com
Fixes: 550a7d60bd ("mm, hugepages: add mremap() support for hugepage backed vma")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-02-25
This series contains updates to iavf driver only.
Slawomir fixes stability issues that can be seen when stressing the
driver using a large number of VFs with a multitude of operations.
Among the fixes are reworking mutexes to provide more effective locking,
ensuring initialization is complete before teardown, preventing
operations which could race while removing the driver, stopping certain
tasks from being queued when the device is down, and adding a missing
mutex unlock.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, --entries uses -e as the short version in the hist mode of
timerlat and osnoise tools. But -e is already used to enable events
on trace sessions by other tools, so let's keep it available for the
same usage for all rtla tools.
Make -E the short version of --entries for hist mode on all tools.
Note: rtla was merged in this merge window, so rtla was not released yet.
Link: https://lkml.kernel.org/r/5dbf0cbe7364d3a05e708926b41a097c59a02b1e.1645206561.git.bristot@kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Marc Kleine-Budde says:
====================
pull-request: can 2022-02-25
The first 2 patches are by Vincent Mailhol and fix the error handling
of the ndo_open callbacks of the etas_es58x and the gs_usb CAN USB
drivers.
The last patch is by Lad Prabhakar and fixes a small race condition in
the rcar_canfd's rcar_canfd_channel_probe() function.
* tag 'linux-can-fixes-for-5.17-20220225' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: rcar_canfd: rcar_canfd_channel_probe(): register the CAN device when fully ready
can: gs_usb: change active_channels's type from atomic_t to u8
can: etas_es58x: change opened_channel_cnt's type from atomic_t to u8
====================
Link: https://lore.kernel.org/r/20220225165622.3231809-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull configfs fix from Christoph Hellwig:
- fix a race in configfs_{,un}register_subsystem (ChenXiaoSong)
* tag 'configfs-5.17-2022-02-25' of git://git.infradead.org/users/hch/configfs:
configfs: fix a race in configfs_{,un}register_subsystem()
Pull btrfs fixes from David Sterba:
"This is a hopefully last batch of fixes for defrag that got broken in
5.16, all stable material.
The remaining reported problem is excessive IO with autodefrag due to
various conditions in the defrag code not met or missing"
* tag 'for-5.17-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: reduce extent threshold for autodefrag
btrfs: autodefrag: only scan one inode once
btrfs: defrag: don't use merged extent map for their generation check
btrfs: defrag: bring back the old file extent search behavior
btrfs: defrag: remove an ambiguous condition for rejection
btrfs: defrag: don't defrag extents which are already at max capacity
btrfs: defrag: don't try to merge regular extents with preallocated extents
btrfs: defrag: allow defrag_one_cluster() to skip large extent which is not a target
btrfs: prevent copying too big compressed lzo segment
Pull rdma fixes from Jason Gunthorpe:
- Older "does not even boot" regression in qib from July
- Bug fixes for error unwind in rtrs
- Avoid a deadlock syzkaller found in srp
- Fix another UAF syzkaller found in cma
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/cma: Do not change route.addr.src_addr outside state checks
RDMA/ib_srp: Fix a deadlock
RDMA/rtrs-clt: Move free_permit from free_clt to rtrs_clt_close
RDMA/rtrs-clt: Fix possible double free in error case
IB/qib: Fix duplicate sysfs directory name
Pull gpio fixes from Bartosz Golaszewski:
- fix a bug generating spurious interrupts in gpio-rockchip
- fix a race condition in gpiod_to_irq() called by GPIO consumers
* tag 'gpio-fixes-for-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: Return EPROBE_DEFER if gc->to_irq is NULL
gpio: rockchip: Reset int_bothedge when changing trigger
If the state is not idle then resolve_prepare_src() should immediately
fail and no change to global state should happen. However, it
unconditionally overwrites the src_addr trying to build a temporary any
address.
For instance if the state is already RDMA_CM_LISTEN then this will corrupt
the src_addr and would cause the test in cma_cancel_operation():
if (cma_any_addr(cma_src_addr(id_priv)) && !id_priv->cma_dev)
to give the wrong result, which would manifest as this trace from syzkaller:
BUG: KASAN: use-after-free in __list_add_valid+0x93/0xa0 lib/list_debug.c:26
Read of size 8 at addr ffff8881546491e0 by task syz-executor.1/32204
CPU: 1 PID: 32204 Comm: syz-executor.1 Not tainted 5.12.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
__kasan_report mm/kasan/report.c:399 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
__list_add_valid+0x93/0xa0 lib/list_debug.c:26
__list_add include/linux/list.h:67 [inline]
list_add_tail include/linux/list.h:100 [inline]
cma_listen_on_all drivers/infiniband/core/cma.c:2557 [inline]
rdma_listen+0x787/0xe00 drivers/infiniband/core/cma.c:3751
ucma_listen+0x16a/0x210 drivers/infiniband/core/ucma.c:1102
ucma_write+0x259/0x350 drivers/infiniband/core/ucma.c:1732
vfs_write+0x28e/0xa30 fs/read_write.c:603
ksys_write+0x1ee/0x250 fs/read_write.c:658
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae
This is indicating that an rdma_id_private was destroyed without doing
cma_cancel_listens().
Instead of trying to re-use the src_addr memory to indirectly create an
any address derived from the dst build one explicitly on the stack and
bind to that as any other normal flow would do. rdma_bind_addr() will copy
it over the src_addr once it knows the state is valid.
This is similar to commit bc0bdc5afa ("RDMA/cma: Do not change
route.addr.src_addr.ss_family")
Link: https://lore.kernel.org/r/0-v2-e975c8fd9ef2+11e-syz_cma_srcaddr_jgg@nvidia.com
Cc: stable@vger.kernel.org
Fixes: 732d41c545 ("RDMA/cma: Make the locking for automatic state transition more clear")
Reported-by: syzbot+c94a3675a626f6333d74@syzkaller.appspotmail.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Pull spi fixes from Mark Brown:
"A few small driver specific fixes"
* tag 'spi-fix-v5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: rockchip: terminate dma transmission when slave abort
spi: rockchip: Fix error in getting num-cs property
spi: spi-zynq-qspi: Fix a NULL pointer dereference in zynq_qspi_exec_mem_op()
Pull regulator fixes from Mark Brown:
"A series of fixes for the da9121 driver"
* tag 'regulator-fix-v5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: da9121: Remove surplus DA9141 parameters
regulator: da9121: Fix DA914x voltage value
regulator: da9121: Fix DA914x current values
Pull regmap fix from Mark Brown:
"A fix for interrupt controllers which require the explicit
acknowledgement of interrupts using a different register to the one
where interrupts are reported.
Urgent for the few devices this affects"
* tag 'regmap-fix-v5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap-irq: Update interrupt clear register for proper reset
Pull thermal control fix from Rafael Wysocki:
"Fix a memory leak in the int340x thermal driver's ACPI notify handler
(Chuansheng Liu)"
* tag 'thermal-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: int340x: fix memory leak in int3400_notify()
Pull power management fixes from Rafael Wysocki:
"Fix the throttle IRQ handling during cpufreq initialization on
Qualcomm platforms (Bjorn Andersson)"
* tag 'pm-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: qcom-hw: Delay enabling throttle_irq
cpufreq: Reintroduce ready() callback
Pull char/misc driver fixes from Greg KH:
"Here are a few small driver fixes for 5.17-rc6 for reported issues.
The majority of these are IIO fixes for small things, and the other
two are an nvmem and mtd core conflict fix.
All of these have been in linux-next with no reported issues"
* tag 'char-misc-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
mtd: core: Fix a conflict between MTD and NVMEM on wp-gpios property
nvmem: core: Fix a conflict between MTD and NVMEM on wp-gpios property
iio: imu: st_lsm6dsx: wait for settling time in st_lsm6dsx_read_oneshot
iio: Fix error handling for PM
iio: addac: ad74413r: correct comparator gpio getters mask usage
iio: addac: ad74413r: use ngpio size when iterating over mask
iio: addac: ad74413r: Do not reference negative array offsets
iio: adc: men_z188_adc: Fix a resource leak in an error handling path
iio: frequency: admv1013: remove the always true condition
iio: accel: fxls8962af: add padding to regmap for SPI
iio:imu:adis16480: fix buffering for devices with no burst mode
iio: adc: ad7124: fix mask used for setting AIN_BUFP & AIN_BUFM bits
iio: adc: tsc2046: fix memory corruption by preventing array overflow
Pull driver core fix from Greg KH:
"Here is a single driver core fix for 5.17-rc6. It resolves a reported
problem when the DMA map of a device is not properly released.
It has been in linux-next with no reported problems"
* tag 'driver-core-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: Free DMA range map when device is released
Pull staging driver fix from Greg KH:
"Here is a single staging driver fix for 5.17-rc6.
It resolves a reported problem in the fbtft fb_st7789v.c driver that
could cause the display to be flipped in cold weather.
It has been in linux-next with no reported problems"
* tag 'staging-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: fbtft: fb_st7789v: reset display before initialization
Pull tty/serial driver fixes from Greg KH:
"Here are some small n_gsm and sc16is7xx serial driver fixes for
5.17-rc6.
The n_gsm fixes are from Siemens as it seems they are using the line
discipline and fixing up a number of issues they found in their
testing. The sc16is7xx serial driver fix is for a reported problem
with that chip.
All of these have been in linux-next with no reported problems"
* tag 'tty-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
sc16is7xx: Fix for incorrect data being transmitted
tty: n_gsm: fix deadlock in gsmtty_open()
tty: n_gsm: fix wrong modem processing in convergence layer type 2
tty: n_gsm: fix wrong tty control line for flow control
tty: n_gsm: fix NULL pointer access due to DLCI release
tty: n_gsm: fix proper link termination after failed open
tty: n_gsm: fix encoding of command/response bit
tty: n_gsm: fix encoding of control signal octet bit DV
The setup of __IAVF_RESETTING state in watchdog task had no
effect and could lead to slow resets in the driver as
the task for __IAVF_RESETTING state only requeues watchdog.
Until now, __IAVF_RESETTING was interpreted by the reset task
as a running state, which could lead to errors in resource
allocation and disposal.
Make watchdog_task queue the reset task when it's necessary.
Do not update the state to __IAVF_RESETTING so the reset task
knows exactly what the current state of the adapter is.
Fixes: 898ef1cb1c ("iavf: Combine init and watchdog state machines")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When iavf_init_version_check sends VIRTCHNL_OP_GET_VF_RESOURCES
message, the driver will wait for the response after requeueing
the watchdog task in iavf_init_get_resources call stack. The
logic is implemented such that iavf_init_get_resources has
to be called in order to allocate adapter->vf_res. It is polling
for the AQ response in the iavf_get_vf_config function. Expect a
call trace from the kernel when the adminq_task worker handles this
message first, because adapter->vf_res will be NULL in
iavf_virtchnl_completion.
Make the watchdog task not queue the adminq_task if the init
process is not finished yet.
Fixes: 898ef1cb1c ("iavf: Combine init and watchdog state machines")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
iavf_virtchnl_completion is called under crit_lock but when
the code for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS is called,
this lock is released in order to obtain rtnl_lock to avoid
ABBA deadlock with unregister_netdev.
Along with the new way iavf_remove behaves, there exist
many risks related to the lock release and attempts to regrab
it. The driver faces crashes related to races between
unregister_netdev and netdev_update_features. Yet another
risk is that the driver could already obtain the crit_lock
in order to destroy it and iavf_virtchnl_completion could
crash or block forever.
Make iavf_virtchnl_completion never relock crit_lock in its
call paths.
Extract rtnl_lock locking logic to the driver for
unregister_netdev in order to set the netdev_registered flag
inside the lock.
Introduce a new flag that will inform adminq_task to perform
the code from VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS right after
it finishes processing messages. Guard this code with remove
flags so it's never called when the driver is in remove state.
Fixes: 5951a2b981 ("iavf: Fix VLAN feature flags after VFR")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
While the adapter is going through its init states, errors such as
lack of communication with the PF may occur. If such events
occur the driver restores previous states in order to retry
initialization in a proper way. When remove task kicks in,
this situation could lead to races with unregistering the
netdevice as well as resources cleanup. With the commit
introducing the waiting in remove for init to complete,
this problem turns into an endless waiting if init never
recovers from errors.
Introduce __IAVF_IN_REMOVE_TASK bit to indicate that the
remove thread has started.
Make __IAVF_COMM_FAILED adapter state respect the
__IAVF_IN_REMOVE_TASK bit and set the __IAVF_INIT_FAILED
state and return without any action instead of trying to
recover.
Make __IAVF_INIT_FAILED adapter state respect the
__IAVF_IN_REMOVE_TASK bit and return without any further
actions.
Make the loop in the remove handler break when adapter has
__IAVF_INIT_FAILED state set.
Fixes: 898ef1cb1c ("iavf: Combine init and watchdog state machines")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There exist races when port is being configured and remove is
triggered.
unregister_netdev is not and can't be called under crit_lock
mutex since it is calling ndo_stop -> iavf_close which requires
this lock. Depending on the init state, the netdev could still be
unregistered, so unregister_netdev never cleans up, even though shortly
after that the device could become registered.
Make iavf_remove wait until port finishes initialization.
All critical state changes are atomic (under crit_lock).
Crashes that come from iavf_reset_interrupt_capability and
iavf_free_traffic_irqs should now be solved in a graceful
manner.
Fixes: 605ca7c5c6 ("iavf: Fix kernel BUG in free_msi_irqs")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The driver used to crash in multiple spots when put to stress testing
of the init, reset and remove paths.
The user would experience call traces or hangs when creating,
resetting, removing VFs. Depending on the machines, the call traces
are happening in random spots, like reset restoring resources racing
with driver remove.
Make adapter->crit_lock mutex a mandatory lock for guarding the
operations performed on all workqueues and functions dealing with
resource allocation and disposal.
Make __IAVF_REMOVE a final state of the driver respected by
workqueues that shall not requeue, when they fail to obtain the
crit_lock.
Make the IRQ handler not queue new work for adminq_task
when the __IAVF_REMOVE state is set.
Fixes: 5ac49f3c27 ("iavf: use mutexes for locking of critical sections")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Pull USB fixes from Greg KH:
"Here are a number of small USB driver fixes for 5.17-rc6 to resolve
reported problems and add new device ids. They include:
- dwc3:
- device mapping fix
- new device ids
- driver fixes
- xhci driver fixes
- gadget driver fixes
- usb-serial driver device id updates
All of these have been in linux-next with no reported problems"
* tag 'usb-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: gadget: rndis: add spinlock for rndis response list
usb: dwc3: gadget: Let the interrupt handler disable bottom halves.
USB: gadget: validate endpoint index for xilinx udc
USB: serial: option: add Telit LE910R1 compositions
USB: serial: option: add support for DW5829e
Revert "USB: serial: ch341: add new Product ID for CH341A"
usb: dwc2: drd: fix soft connect when gadget is unconfigured
usb: dwc3: pci: Fix Bay Trail phy GPIO mappings
tps6598x: clear int mask on probe failure
xhci: Prevent futile URB re-submissions due to incorrect return value.
xhci: re-initialize the HC during resume if HCE was set
usb: dwc3: pci: Add "snps,dis_u2_susphy_quirk" for Intel Bay Trail
usb: dwc3: pci: add support for the Intel Raptor Lake-S
Pull ata fixes from Damien Le Moal:
"Two fixes for the pata_hpt37x driver, both from Sergey:
- Fix a PCI register access using an incorrect size (8bits instead of
16bits)
- Make sure to always disable the primary channel as it is unused"
* tag 'ata-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: pata_hpt37x: disable primary channel on HPT371
ata: pata_hpt37x: fix PCI clock detection
When building with clang + CONFIG_DYNAMIC_FTRACE=n + W=1, there is a
warning:
kernel/trace/ftrace.c:7194:20: error: unused function 'ftrace_startup_enable' [-Werror,-Wunused-function]
static inline void ftrace_startup_enable(int command) { }
^
1 error generated.
Clang warns on instances of static inline functions in .c files with W=1
after commit 6863f5643d ("kbuild: allow Clang to find unused static
inline functions for W=1 build").
The ftrace_startup_enable() stub has been unused since
commit e1effa0144 ("ftrace: Annotate the ops operation on update"),
where its use outside of the CONFIG_DYNAMIC_FTRACE section was replaced
by ftrace_startup_all(). Remove it to resolve the warning.
Link: https://lkml.kernel.org/r/20220214192847.488166-1-nathan@kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
On a powerpc32 build with CONFIG_CC_OPTIMIZE_FOR_SIZE, the inline
keyword is not honored and trace_trigger_soft_disabled() appears
approx 50 times in vmlinux.
Adding -Winline to the build, the following message appears:
./include/linux/trace_events.h:712:1: error: inlining failed in call to 'trace_trigger_soft_disabled': call is unlikely and code size would grow [-Werror=inline]
That function is rather big for an inlined function:
c003df60 <trace_trigger_soft_disabled>:
c003df60: 94 21 ff f0 stwu r1,-16(r1)
c003df64: 7c 08 02 a6 mflr r0
c003df68: 90 01 00 14 stw r0,20(r1)
c003df6c: bf c1 00 08 stmw r30,8(r1)
c003df70: 83 e3 00 24 lwz r31,36(r3)
c003df74: 73 e9 01 00 andi. r9,r31,256
c003df78: 41 82 00 10 beq c003df88 <trace_trigger_soft_disabled+0x28>
c003df7c: 38 60 00 00 li r3,0
c003df80: 39 61 00 10 addi r11,r1,16
c003df84: 4b fd 60 ac b c0014030 <_rest32gpr_30_x>
c003df88: 73 e9 00 80 andi. r9,r31,128
c003df8c: 7c 7e 1b 78 mr r30,r3
c003df90: 41 a2 00 14 beq c003dfa4 <trace_trigger_soft_disabled+0x44>
c003df94: 38 c0 00 00 li r6,0
c003df98: 38 a0 00 00 li r5,0
c003df9c: 38 80 00 00 li r4,0
c003dfa0: 48 05 c5 f1 bl c009a590 <event_triggers_call>
c003dfa4: 73 e9 00 40 andi. r9,r31,64
c003dfa8: 40 82 00 28 bne c003dfd0 <trace_trigger_soft_disabled+0x70>
c003dfac: 73 ff 02 00 andi. r31,r31,512
c003dfb0: 41 82 ff cc beq c003df7c <trace_trigger_soft_disabled+0x1c>
c003dfb4: 80 01 00 14 lwz r0,20(r1)
c003dfb8: 83 e1 00 0c lwz r31,12(r1)
c003dfbc: 7f c3 f3 78 mr r3,r30
c003dfc0: 83 c1 00 08 lwz r30,8(r1)
c003dfc4: 7c 08 03 a6 mtlr r0
c003dfc8: 38 21 00 10 addi r1,r1,16
c003dfcc: 48 05 6f 6c b c0094f38 <trace_event_ignore_this_pid>
c003dfd0: 38 60 00 01 li r3,1
c003dfd4: 4b ff ff ac b c003df80 <trace_trigger_soft_disabled+0x20>
However it is located in a hot path so inlining it is important.
But forcing inlining of the entire function by using __always_inline
leads to increasing the text size by approx 20 kbytes.
Instead, split the function in two parts, one part with the likely
fast path, flagged __always_inline, and a second part out of line.
With this change, on a powerpc32 with CONFIG_CC_OPTIMIZE_FOR_SIZE
vmlinux text increases by only 1.4 kbytes, which is partly
compensated by a decrease of vmlinux data by 7 kbytes.
On ppc64_defconfig which has CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE, this
change reduces vmlinux text by more than 30 kbytes.
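The split described above can be sketched as follows. This is an illustration under assumptions, not the patch itself: the flag names, the function names and the slow_path_calls counter are hypothetical, and the attributes and __builtin_expect are GCC/Clang extensions standing in for the kernel's __always_inline, noinline and likely().

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for the event-file flags. */
#define FL_TRIGGER_MODE  (1u << 0)
#define FL_SOFT_DISABLED (1u << 1)
#define FL_PID_FILTER    (1u << 2)
#define FL_SLOW_MASK     (FL_TRIGGER_MODE | FL_SOFT_DISABLED | FL_PID_FILTER)

int slow_path_calls;	/* instrumentation for this sketch only */

/* Rare, bulky part: emitted once, never duplicated into callers. */
__attribute__((noinline)) bool soft_disabled_slow(unsigned int flags)
{
	slow_path_calls++;
	/* ... trigger and PID-filter handling would live here ... */
	return (flags & FL_SOFT_DISABLED) != 0;
}

/* Hot part: a single mask test, forced inline at every call site. */
static inline __attribute__((always_inline)) bool soft_disabled(unsigned int flags)
{
	if (__builtin_expect(!(flags & FL_SLOW_MASK), 1))
		return false;	/* likely case: nothing to do */
	return soft_disabled_slow(flags);
}
```

Each call site then pays only for the mask test and a branch, while the larger body exists once in the text section.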
Link: https://lkml.kernel.org/r/69ce0986a52d026d381d612801d978aa4f977460.1644563295.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Currently, the event probes save the type of the event they are attached
to when recording the event. For example:
# echo 'e:switch sched/sched_switch prev_state=$prev_state prev_prio=$prev_prio next_pid=$next_pid next_prio=$next_prio' > dynamic_events
# cat events/eprobes/switch/format
name: switch
ID: 1717
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:unsigned int __probe_type; offset:8; size:4; signed:0;
field:u64 prev_state; offset:12; size:8; signed:0;
field:u64 prev_prio; offset:20; size:8; signed:0;
field:u64 next_pid; offset:28; size:8; signed:0;
field:u64 next_prio; offset:36; size:8; signed:0;
print fmt: "(%u) prev_state=0x%Lx prev_prio=0x%Lx next_pid=0x%Lx next_prio=0x%Lx", REC->__probe_type, REC->prev_state, REC->prev_prio, REC->next_pid, REC->next_prio
The __probe_type adds 4 bytes to every event.
One of the reasons for creating eprobes is to limit what is traced in an
event to be able to limit what is written into the ring buffer. Adding
these redundant 4 bytes to every event works against that.
The event that is recorded can be retrieved from the event probe itself,
which is available when the trace is happening. User space tools
can simply read the dynamic_events file to find the event they are for.
So there is really no reason to write this information into the ring
buffer for every event.
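The per-record saving can be read off the format listing above. As a rough sketch, the two packed structs below mirror the listed offsets with and without the type field. The struct names are made up for illustration, probe_type stands in for __probe_type, and the packed attribute is a GCC/Clang extension used only to reproduce the unpadded offsets.

```c
#include <stddef.h>
#include <stdint.h>

/* Common header from the format listing: 2 + 1 + 1 + 4 = 8 bytes. */
struct common_hdr {
	uint16_t common_type;
	uint8_t  common_flags;
	uint8_t  common_preempt_count;
	int32_t  common_pid;
} __attribute__((packed));

/* Record as listed: type field at offset 8, first argument at offset 12. */
struct switch_rec_old {
	struct common_hdr hdr;
	uint32_t probe_type;
	uint64_t prev_state, prev_prio, next_pid, next_prio;
} __attribute__((packed));

/* Same record with the redundant type field dropped. */
struct switch_rec_new {
	struct common_hdr hdr;
	uint64_t prev_state, prev_prio, next_pid, next_prio;
} __attribute__((packed));
```

For this example event the record shrinks from 44 to 40 bytes, roughly 9% of every record written.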
Link: https://lkml.kernel.org/r/20220218190057.2f5a19a8@gandalf.local.home
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Long story short, recursively enforcing RLIMIT_NPROC when it is not
enforced on the process that creates a new user namespace, causes
currently working code to fail. There is no reason to enforce
RLIMIT_NPROC recursively when we don't enforce it normally so update
the code to detect this case.
I would like to simply use capable(CAP_SYS_RESOURCE) to detect when
RLIMIT_NPROC is not enforced upon the caller. Unfortunately because
RLIMIT_NPROC is charged and checked for enforcement based upon the
real uid, using capable() which is euid based is inconsistent with reality.
Come as close as possible to testing for capable(CAP_SYS_RESOURCE) by
testing for when the real uid would match the conditions when
CAP_SYS_RESOURCE would be present if the real uid was the effective
uid.
Reported-by: Etienne Dechamps <etienne@edechamps.fr>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215596
Link: https://lkml.kernel.org/r/e9589141-cfeb-90cd-2d0e-83a62787239a@edechamps.fr
Link: https://lkml.kernel.org/r/87sfs8jmpz.fsf_-_@email.froward.int.ebiederm.org
Cc: stable@vger.kernel.org
Fixes: 21d1c5e386 ("Reimplement RLIMIT_NPROC on top of ucounts")
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
When sending a call-function IPI-many to vCPUs, yield to the
IPI target vCPU which is marked as preempted.
But when emulating HLT, an idling vCPU will be voluntarily
scheduled out and marked as preempted from the guest kernel
perspective. Yielding to an idle vCPU is pointless, increases
unnecessary vmexits, and may miss the truly preempted vCPU.
So yield to the IPI target vCPU only if the vCPU is busy and preempted.
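The selection rule can be sketched as a plain predicate over per-vCPU state. This is a simplified model with hypothetical types and names; the real code queries the scheduler and the paravirt preemption state rather than a struct.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-vCPU snapshot as seen from the guest kernel. */
struct vcpu_state {
	bool preempted;	/* host reports the vCPU as scheduled out */
	bool idle;	/* guest CPU is running its idle task (HLT) */
};

/* Return the index of the first vCPU worth yielding to, or -1 if none. */
int pick_yield_target(const struct vcpu_state *v, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		/* An idle vCPU also looks preempted, so require "busy" too. */
		if (v[i].preempted && !v[i].idle)
			return (int)i;
	}
	return -1;
}
```

An idle vCPU that merely looks preempted is skipped, so the yield goes to a vCPU that actually has work pending.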
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Message-Id: <1644380201-29423-1-git-send-email-lirongqing@baidu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When Linux runs as an Isolated VM on Hyper-V, it supports AMD SEV-SNP
but it's partially enlightened, i.e. cc_platform_has(
CC_ATTR_GUEST_MEM_ENCRYPT) is true but sev_active() is false.
Commit 4d96f91091 per se is good, but with it now
kvm_setup_vsyscall_timeinfo() -> kvmclock_init_mem() calls
set_memory_decrypted(), and later gets stuck when trying to zero out
the pages pointed to by 'hvclock_mem', if Linux runs as an Isolated VM on
Hyper-V. The cause is that here now the Linux VM should no longer access
the original guest physical address (GPA); instead the VM should do
memremap() and access the original GPA + ms_hyperv.shared_gpa_boundary:
see the example code in drivers/hv/connection.c: vmbus_connect() or
drivers/hv/ring_buffer.c: hv_ringbuffer_init(). If the VM tries to
access the original GPA, it keeps getting injected a fault by Hyper-V
and gets stuck there.
Here the issue happens only when the VM has >=65 vCPUs, because the
global static array hv_clock_boot[] can hold 64 "struct
pvclock_vsyscall_time_info" (the size of the struct is 64 bytes), so
kvmclock_init_mem() only allocates memory in the case of vCPUs > 64.
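The 64-vCPU threshold follows from simple arithmetic: 64 entries of 64 bytes fill 4096 bytes. A small sketch, assuming the static array is sized to one 4 KiB page; the struct here is a 64-byte stand-in rather than the real layout, and all names are hypothetical:

```c
#include <stddef.h>

#define PAGE_SIZE_SKETCH 4096u	/* assumption: 4 KiB pages */

/* Stand-in with the 64-byte size quoted above; not the real layout. */
struct time_info_standin {
	unsigned char bytes[64];
};

#define BOOT_ARRAY_SLOTS (PAGE_SIZE_SKETCH / sizeof(struct time_info_standin))

/* Number of per-vCPU entries that do not fit in the static array. */
size_t extra_entries_needed(size_t ncpus)
{
	return ncpus > BOOT_ARRAY_SLOTS ? ncpus - BOOT_ARRAY_SLOTS : 0;
}
```

Up to 64 vCPUs nothing extra is allocated; the 65th vCPU is the first to need dynamically allocated memory, which is the point where the decrypted-memory path is reached.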
Since the 'hvclock_mem' pages are only useful when the kvm clock is
supported by the underlying hypervisor, fix the issue by returning
early when Linux VM runs on Hyper-V, which doesn't support kvm clock.
Fixes: 4d96f91091 ("x86/sev: Replace occurrences of sev_active() with cc_platform_has()")
Tested-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Message-Id: <20220225084600.17817-1-decui@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The arch_timer and vgic_irq kselftests assume that they can create a
vgic-v3, using the library function vgic_v3_setup() which aborts with a
test failure if it is not possible to do so. Since vgic-v3 can only be
instantiated on systems where the host has GICv3 this leads to false
positives on older systems where that is not the case.
Fix this by changing vgic_v3_setup() to return an error if the vgic can't
be instantiated and have the callers skip if this happens. We could also
exit flagging a skip in vgic_v3_setup() but this would prevent future test
cases conditionally deciding which GIC to use or generally doing more
complex output.
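The calling convention described here can be sketched as below. The names are hypothetical stand-ins for the selftest library, and the setup function is faked with a boolean; only the exit code 4 reflects the kselftest convention for a skipped test.

```c
#include <stdbool.h>

enum { SKETCH_PASS = 0, SKETCH_SKIP = 4 };	/* 4 is the kselftest skip code */

/* Stand-in for the setup helper: returns a fake fd, or a negative error. */
int gic_v3_setup_sketch(bool host_has_gicv3)
{
	return host_has_gicv3 ? 3 : -1;
}

/* The caller, not the library, decides what a setup failure means. */
int run_gic_test(bool host_has_gicv3)
{
	int fd = gic_v3_setup_sketch(host_has_gicv3);

	if (fd < 0)
		return SKETCH_SKIP;	/* no GICv3 on the host: skip, don't fail */
	/* ... the real test body would use fd here ... */
	return SKETCH_PASS;
}
```

Keeping the decision in the caller leaves room for a test that falls back to a different GIC model instead of skipping.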
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Tested-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220223131624.1830351-1-broonie@kernel.org
Check if operation is valid before changing any
settings in hardware. Otherwise it results in
changes being made despite it not being a valid
operation.
Fixes: 78eab33bb6 ("net: sparx5: add vlan support")
Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function pci_find_capability() in t3_prep_adapter() can fail, so its
return value should be checked.
Fixes: 4d22de3e6c ("Add support for the latest 1G/10G Chelsio adapter, T3")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sukadev Bhattiprolu says:
====================
ibmvnic: Fix a race in ibmvnic_probe()
If we get a transport (reset) event right after a successful CRQ_INIT
during ibmvnic_probe() but before we set the adapter state to VNIC_PROBED,
we will throw away the reset assuming that the adapter is still in the
probing state. But since the adapter has completed the CRQ_INIT any
subsequent CRQs that we send will be ignored by the vnicserver until
we release/init the CRQ again. This can leave the adapter unconfigured.
While here fix a couple of other bugs that were observed (Patches 1,2,4).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently don't allow queuing resets when adapter is in VNIC_PROBING
state - instead we throw away the reset and return EBUSY. The reasoning
is probably that during ibmvnic_probe() the ibmvnic_adapter itself is
being initialized so performing a reset during this time can lead us to
accessing fields in the ibmvnic_adapter that are not fully initialized.
A review of the code shows that all the adapter state needed to process a
reset is initialized before registering the CRQ so that should no longer
be a concern.
Further the expectation is that if we do get a reset (transport event)
during probe, the do..while() loop in ibmvnic_probe() will handle this
by reinitializing the CRQ.
While that is true to some extent, it is possible that the reset might
occur _after_ the CRQ is registered and CRQ_INIT message was exchanged
but _before_ the adapter state is set to VNIC_PROBED. As mentioned above,
such a reset will be thrown away. While the client assumes that the
adapter is functional, the vnic server will wait for the client to reinit
the adapter. This disconnect between the two leaves the adapter down
needing manual intervention.
Because ibmvnic_probe() has other work to do after initializing the CRQ
(such as registering the netdev at a minimum) and because the reset event
can occur at any instant after the CRQ is initialized, there will always
be a window between initializing the CRQ and considering the adapter
ready for resets (ie state == PROBED).
So rather than discarding resets during this window, allow queueing them
- but only process them after the adapter is fully initialized.
To do this, introduce a new completion state ->probe_done and have the
reset worker thread wait on this before processing resets.
This change brings up two new situations in or just after ibmvnic_probe().
First, after one or more resets were queued, we encounter an error and
decide to retry the initialization. At that point the queued resets are
no longer relevant since we could be talking to a new vnic server. So we
must purge/flush the queued resets before restarting the initialization.
As a side note, since we are still in the probing stage and we have not
registered the netdev, it will not be a CHANGE_PARAM reset.
Second, this change opens up a potential race between the worker thread
in __ibmvnic_reset(), the tasklet and the ibmvnic_open() due to the
following sequence of events:
1. Register CRQ
2. Get transport event before CRQ_INIT completes.
3. Tasklet schedules reset:
   a) add rwi to list
   b) schedule_work() to start worker thread which runs
      and waits for ->probe_done.
4. ibmvnic_probe() decides to retry, purges rwi_list
5. Re-register crq and this time rest of probe succeeds - register
   netdev and complete(->probe_done).
6. Worker thread resumes in __ibmvnic_reset() from 3b.
7. Worker thread sets ->resetting bit
8. ibmvnic_open() comes in, notices ->resetting bit, sets state
   to IBMVNIC_OPEN and returns early expecting worker thread to
   finish the open.
9. Worker thread finds rwi_list empty and returns without
   opening the interface.
If this happens, the ->ndo_open() call is effectively lost and the
interface remains down. To address this, ensure that ->rwi_list is
not empty before setting the ->resetting bit. See also comments in
__ibmvnic_reset().
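The ordering fix can be sketched in a simplified, single-threaded model. The types and the helper reset_worker_begin() below are hypothetical stand-ins rather than the driver's code, and the locking the real driver needs is deliberately omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's reset work item and adapter. */
struct rwi {
	struct rwi *next;
};

struct adapter {
	struct rwi *rwi_list; /* queued resets, NULL when empty */
	bool resetting;       /* observed by the open path */
};

/*
 * Claim the reset only when there is queued work. If the list was purged
 * (step 4 above), 'resetting' stays clear, so a concurrent open does not
 * defer to a worker that has nothing to do.
 */
static bool reset_worker_begin(struct adapter *a)
{
	if (a->rwi_list == NULL)
		return false;

	a->resetting = true;
	return true;
}
```

In this model the open path only sees 'resetting' set when a reset is really going to be processed, so the open is not silently dropped.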
Fixes: 6a2fb0e99f ("ibmvnic: driver initialization for kdump/kexec")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clear ->failover_pending flag that may have been set in the previous
pass of registering CRQ. If we don't clear, a subsequent ibmvnic_open()
call would be misled into thinking a failover is pending and assuming
that the reset worker thread would open the adapter. If this pass of
registering the CRQ succeeds (i.e. there is no transport event), there
wouldn't be a reset worker thread.
This would leave the adapter unconfigured and require manual intervention
to bring it up during boot.
Fixes: 5a18e1e0c1 ("ibmvnic: Fix failover case for non-redundant configuration")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently initialize the ->init_done completion/return code fields
before issuing a CRQ_INIT command. But if we get a transport event soon
after registering the CRQ the tasklet may already have recorded the
completion and error code. If we initialize here, we might overwrite/
lose that and end up issuing the CRQ_INIT only to time out later.
If that timeout happens during probe, we will leave the adapter in the
DOWN state rather than retrying to register/init the CRQ.
Initialize the completion before registering the CRQ so we don't lose
the notification.
Fixes: 032c5e8284 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Finish initializing the adapter before registering netdev so state
is consistent.
Fixes: c26eba03e4 ("ibmvnic: Update reset infrastructure to support tunable parameters")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we get a transport event, set the error and mark the init as
complete so the attempt to send crq-init or login fails sooner
rather than waiting for the timeout.
Fixes: bbd669a868 ("ibmvnic: Fix completion structure initialization")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define and use a helper to flush the reset queue.
Fixes: 2770a7984d ("ibmvnic: Introduce hard reset recovery")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should initialize ->init_done_rc before calling complete(). Otherwise
the waiting thread may see ->init_done_rc as 0 before we have updated it
and may assume that the CRQ was successful.
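The required ordering (publish the result, then signal) can be sketched in user space with C11 atomics. This is only an analogy under stated assumptions: the atomic flag stands in for the kernel completion, and the struct and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: 'done' stands in for the kernel completion. */
struct init_state {
	int rc;           /* result code, must be written before 'done' */
	atomic_bool done;
};

/* Signaller: store the result first, then publish completion. */
static void signal_init_done(struct init_state *s, int rc)
{
	s->rc = rc;
	atomic_store_explicit(&s->done, true, memory_order_release);
}

/* Waiter: read the result only after observing completion. */
static bool read_init_result(struct init_state *s, int *rc)
{
	if (!atomic_load_explicit(&s->done, memory_order_acquire))
		return false;

	*rc = s->rc;
	return true;
}
```

If the two statements in signal_init_done() were swapped, a waiter could observe 'done' and still read the old value of 'rc', which is the situation described above.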
Fixes: 6b278c0cb3 ("ibmvnic delay complete()")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a tiny memory leak when flushing the reset work queue.
Fixes: 2770a7984d ("ibmvnic: Introduce hard reset recovery")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
1) Fix PMTU for IPv6 if the reported MTU minus the ESP overhead is
smaller than 1280. From Jiri Bohac.
2) Fix xfrm interface ID and inter address family tunneling when
migrating xfrm states. From Yan Yan.
3) Add missing xfrm interface ID initialization on xfrmi_changelink.
From Antony Antony.
4) Enforce validity of xfrm offload input flags so that userspace can't
send undefined flags to the offload driver.
From Leon Romanovsky.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If I'm not mistaken (and I don't think I am), the way in which the
dcbnl_ops work is that drivers call dcb_ieee_setapp() and this populates
the application table with dynamically allocated struct dcb_app_type
entries that are kept in the module-global dcb_app_list.
However, nobody keeps exact track of these entries, and although
dcb_ieee_delapp() is supposed to remove them, nobody does so when the
interface goes away (example: driver unbinds from device). So the
dcb_app_list will contain lingering entries with an ifindex that no
longer matches any device in dcb_app_lookup().
Reclaim the lost memory by listening for the NETDEV_UNREGISTER event and
flushing the app table entries of interfaces that are now gone.
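As a rough user-space illustration of flushing per-interface entries from a global list, consider the sketch below. The struct and function names are hypothetical, and the list is a plain singly linked list rather than the kernel's list and locking primitives.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an application table entry. */
struct app_entry {
	int ifindex;
	struct app_entry *next;
};

/* Add an entry for 'ifindex' at the head; returns 0 or -1 on failure. */
static int add_app_entry(struct app_entry **head, int ifindex)
{
	struct app_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->ifindex = ifindex;
	e->next = *head;
	*head = e;
	return 0;
}

/* Unlink and free every entry for 'ifindex'; returns how many were freed. */
static int flush_app_entries(struct app_entry **head, int ifindex)
{
	struct app_entry **pp = head;
	int removed = 0;

	while (*pp) {
		struct app_entry *e = *pp;

		if (e->ifindex == ifindex) {
			*pp = e->next; /* unlink before freeing */
			free(e);
			removed++;
		} else {
			pp = &e->next;
		}
	}
	return removed;
}
```

A handler for the unregister event would call something like flush_app_entries() with the ifindex of the departing interface, leaving the entries of other interfaces untouched.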
In fact something like this used to be done as part of the initial
commit (blamed below), but it was done in dcbnl_exit() -> dcb_flushapp(),
essentially at module_exit time. That became dead code after commit
7a6b6f515f ("DCB: fix kconfig option") which essentially merged
"tristate config DCB" and "bool config DCBNL" into a single "bool config
DCB", so net/dcb/dcbnl.c could not be built as a module anymore.
Commit 36b9ad8084 ("net/dcb: make dcbnl.c explicitly non-modular")
recognized this and deleted dcbnl_exit() and dcb_flushapp() altogether,
leaving us with the version we have today.
Since flushing application table entries can and should be done as soon
as the netdevice disappears, fundamentally the commit that is to blame
is the one that introduced the design of this API.
Fixes: 9ab933ab2c ("dcbnl: add appliction tlv handlers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a potential leak issue under the following execution sequence:
smc_release                     smc_connect_work
if (sk->sk_state == SMC_INIT)
                                send_clc_confirim
        tcp_abort();
                                ...
                                sk.sk_state = SMC_ACTIVE
smc_close_active
switch(sk->sk_state) {
...
case SMC_ACTIVE:
        smc_close_final()
        // then wait peer closed
Unfortunately, tcp_abort() may discard CLC CONFIRM messages that are
still in the tcp send buffer, in which case our connection token cannot
be delivered to the server side, which means that we cannot get a
passive close message at all. Therefore, it is impossible for the
connection to be disconnected at all.
This patch tries a very simple way to avoid this issue: once the state
has changed to SMC_ACTIVE after tcp_abort(), we can actively abort the
smc connection. Considering that the state was SMC_INIT before
tcp_abort(), abandoning the complete disconnection process should not
cause too much of a problem.
In fact, this problem may exist as long as the CLC CONFIRM message is
not received by the server. Whether a timer should be added after
smc_close_final() needs to be discussed in the future. But even so, this
patch provides a faster release for the connection in the above case, so
it should also be valuable.
Fixes: 39f41f367b ("net/smc: common release code for non-accepted sockets")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In this driver's ->ndo_open() callback, it enables DMA interrupts,
starts the DMA channels, then requests interrupts with request_irq(),
and then finally enables napi.
If RX DMA interrupts are received before napi is enabled, no processing
is done because napi_schedule_prep() will return false. If the network
has a lot of broadcast/multicast traffic, then the RX ring could fill up
completely before napi is enabled. When this happens, no further RX
interrupts will be delivered, and the driver will fail to receive any
packets.
Fix this by only enabling DMA interrupts after all other initialization
is complete.
Fixes: 523f11b5d4 ("net: stmmac: move hardware setup for stmmac_open to new function")
Reported-by: Lars Persson <larper@axis.com>
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes for omaps
Fixes for devkit8000 timer regression. Similar to the earlier beagleboard
fixes, we must not configure the clocksource drivers to use an alternative
timer configuration. It causes unnecessary issues with power management.
Only some old designs based on early beagleboard revisions with a miswired
timer need to use the alternative timer.
* tag 'omap-for-v5.17/fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: Use 32KiHz oscillator on devkit8000
ARM: dts: switch timer config to common devkit8000 devicetree
Link: https://lore.kernel.org/r/pull-1645606483-876944@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Revert back to refreshing vmcs.HOST_CR3 immediately prior to VM-Enter.
The PCID (ASID) part of CR3 can be bumped without KVM being scheduled
out, as the kernel will switch CR3 during __text_poke(), e.g. in response
to a static key toggling. If switch_mm_irqs_off() chooses a new ASID for
the mm associated with KVM, KVM will do VM-Enter => VM-Exit with a stale
vmcs.HOST_CR3.
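The staleness condition can be expressed as a small standalone check. This is a sketch under stated assumptions: the PCID is taken to occupy bits 11:0 of CR3 and the page-table base bits 51:12, and the function name host_cr3_is_stale() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for illustration: PCID in bits 11:0, base in bits 51:12. */
#define CR3_PCID_MASK UINT64_C(0x0000000000000fff)
#define CR3_ADDR_MASK UINT64_C(0x000ffffffffff000)

/*
 * True when both values point at the same page tables (same mm) but carry
 * different PCIDs, i.e. the saved copy no longer matches what the kernel
 * is loading for that mm.
 */
static bool host_cr3_is_stale(uint64_t saved_cr3, uint64_t new_cr3)
{
	return (saved_cr3 & CR3_ADDR_MASK) == (new_cr3 & CR3_ADDR_MASK) &&
	       (saved_cr3 & CR3_PCID_MASK) != (new_cr3 & CR3_PCID_MASK);
}
```

Applied to the two values printed in the warning in this commit message (105393003 saved, 8000000105393004 being loaded), the base addresses match while the low 12 bits differ (3 versus 4), so the check reports a stale value.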
Add a comment to explain why KVM must wait until VM-Enter is imminent to
refresh vmcs.HOST_CR3.
The following splat was captured by stashing vmcs.HOST_CR3 in kvm_vcpu
and adding a WARN in load_new_mm_cr3() to fire if a new ASID is being
loaded for the KVM-associated mm while KVM has a "running" vCPU:
  static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, bool need_flush)
  {
        struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
        ...
        WARN(vcpu && (vcpu->cr3 & GENMASK(11, 0)) != (new_mm_cr3 & GENMASK(11, 0)) &&
             (vcpu->cr3 & PHYSICAL_PAGE_MASK) == (new_mm_cr3 & PHYSICAL_PAGE_MASK),
             "KVM is hosed, loading CR3 = %lx, vmcs.HOST_CR3 = %lx", new_mm_cr3, vcpu->cr3);
  }
------------[ cut here ]------------
KVM is hosed, loading CR3 = 8000000105393004, vmcs.HOST_CR3 = 105393003
WARNING: CPU: 4 PID: 20717 at arch/x86/mm/tlb.c:291 load_new_mm_cr3+0x82/0xe0
Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel
CPU: 4 PID: 20717 Comm: stable Tainted: G W 5.17.0-rc3+ #747
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:load_new_mm_cr3+0x82/0xe0
RSP: 0018:ffffc9000489fa98 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 8000000105393004 RCX: 0000000000000027
RDX: 0000000000000027 RSI: 00000000ffffdfff RDI: ffff888277d1b788
RBP: 0000000000000004 R08: ffff888277d1b780 R09: ffffc9000489f8b8
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
R13: ffff88810678a800 R14: 0000000000000004 R15: 0000000000000c33
FS: 00007fa9f0e72700(0000) GS:ffff888277d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001001b5003 CR4: 0000000000172ea0
Call Trace:
<TASK>
switch_mm_irqs_off+0x1cb/0x460
__text_poke+0x308/0x3e0
text_poke_bp_batch+0x168/0x220
text_poke_finish+0x1b/0x30
arch_jump_label_transform_apply+0x18/0x30
static_key_slow_inc_cpuslocked+0x7c/0x90
static_key_slow_inc+0x16/0x20
kvm_lapic_set_base+0x116/0x190
kvm_set_apic_base+0xa5/0xe0
kvm_set_msr_common+0x2f4/0xf60
vmx_set_msr+0x355/0xe70 [kvm_intel]
kvm_set_msr_ignored_check+0x91/0x230
kvm_emulate_wrmsr+0x36/0x120
vmx_handle_exit+0x609/0x6c0 [kvm_intel]
kvm_arch_vcpu_ioctl_run+0x146f/0x1b80
kvm_vcpu_ioctl+0x279/0x690
__x64_sys_ioctl+0x83/0xb0
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>
---[ end trace 0000000000000000 ]---
This reverts commit 15ad9762d6.
Fixes: 15ad9762d6 ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()")
Reported-by: Wanpeng Li <kernellwp@gmail.com>
Cc: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Lai Jiangshan <jiangshanlai@gmail.com>
Message-Id: <20220224191917.3508476-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Undo a nested VMX fix as a step toward reverting the commit it fixed,
15ad9762d6 ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()"),
as the underlying premise that "host CR3 in the vcpu thread can only be
changed when scheduling" is wrong.
This reverts commit a9f2705ec8.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220224191917.3508476-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The driver uses an atomic_t variable: struct
es58x_device::opened_channel_cnt to keep track of the number of opened
channels in order to only allocate memory for the URBs when this count
changes from zero to one.
While the intent was to prevent race conditions, the choice of an
atomic_t turns out to be a bad idea for several reasons:
- implementation is incorrect and fails to decrement
opened_channel_cnt when the URB allocation fails as reported in
[1].
- even if opened_channel_cnt were to be correctly decremented,
atomic_t is insufficient to cover edge cases: there can be a race
condition in which 1/ a first process fails to allocate URBs
memory 2/ a second process enters es58x_open() before the first
process does its cleanup and decrements opened_channel_cnt. In
which case, the second process would successfully return despite
the URBs memory not being allocated.
- actually, any kind of locking mechanism was useless here because
it is redundant with the network stack big kernel lock
(a.k.a. rtnl_lock) which is being held by all the callers of
net_device_ops:ndo_open() and net_device_ops:ndo_stop(). cf. the
ASSERT_RTNL() calls in __dev_open() [2] and __dev_close_many()
[3].
The atomic_t is thus replaced by a simple u8 type and the logic to
increment and decrement es58x_device:opened_channel_cnt is simplified
accordingly, fixing the bug reported in [1]. We do not check again for
ASSERT_RTNL() as this is already done by the callers.
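The simplified counting logic can be modelled as follows. This is a hedged sketch, not the driver's code: the struct, the allocator callback and the function names are hypothetical, and callers are assumed to hold an outer lock (as rtnl_lock does in the driver) so a plain counter is sufficient.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model; callers are assumed to be serialized. */
struct dev_state {
	uint8_t opened_channel_cnt;
	bool urbs_allocated;
};

/* Hypothetical allocator hook so the failure path can be exercised. */
typedef int (*alloc_fn)(struct dev_state *);

static int alloc_ok(struct dev_state *d)
{
	d->urbs_allocated = true;
	return 0;
}

static int alloc_fail(struct dev_state *d)
{
	(void)d;
	return -12; /* stands in for an out-of-memory error */
}

/* Allocate on the zero-to-one transition; count only after success. */
static int channel_open(struct dev_state *d, alloc_fn alloc_urbs)
{
	if (d->opened_channel_cnt == 0) {
		int ret = alloc_urbs(d);

		if (ret)
			return ret; /* the count stays at zero on failure */
	}
	d->opened_channel_cnt++;
	return 0;
}

/* Release the buffers when the last channel is closed. */
static void channel_close(struct dev_state *d)
{
	d->opened_channel_cnt--;
	if (d->opened_channel_cnt == 0)
		d->urbs_allocated = false;
}
```

Because the counter is only incremented after a successful allocation, there is no decrement to forget on the error path.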
[1] https://lore.kernel.org/linux-can/20220201140351.GA2548@kili/T/#u
[2] https://elixir.bootlin.com/linux/v5.16/source/net/core/dev.c#L1463
[3] https://elixir.bootlin.com/linux/v5.16/source/net/core/dev.c#L1541
Fixes: 8537257874 ("can: etas_es58x: add core support for ETAS ES58X CAN USB interfaces")
Link: https://lore.kernel.org/all/20220212112713.577957-1-mailhol.vincent@wanadoo.fr
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Mat Martineau says:
====================
mptcp: Fixes for 5.17
Patch 1 fixes an issue with the SIOCOUTQ ioctl in MPTCP sockets that
have performed a fallback to TCP.
Patch 2 is a selftest fix to correctly remove temp files.
Patch 3 fixes a shift-out-of-bounds issue found by syzkaller.
====================
Link: https://lore.kernel.org/r/20220225005259.318898-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Syzkaller with UBSAN uncovered a scenario where a large number of
DATA_FIN retransmits caused a shift-out-of-bounds in the DATA_FIN
timeout calculation:
================================================================================
UBSAN: shift-out-of-bounds in net/mptcp/protocol.c:470:29
shift exponent 32 is too large for 32-bit type 'unsigned int'
CPU: 1 PID: 13059 Comm: kworker/1:0 Not tainted 5.17.0-rc2-00630-g5fbf21c90c60 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
ubsan_epilogue+0xb/0x5a lib/ubsan.c:151
__ubsan_handle_shift_out_of_bounds.cold+0xb2/0x20e lib/ubsan.c:330
mptcp_set_datafin_timeout net/mptcp/protocol.c:470 [inline]
__mptcp_retrans.cold+0x72/0x77 net/mptcp/protocol.c:2445
mptcp_worker+0x58a/0xa70 net/mptcp/protocol.c:2528
process_one_work+0x9df/0x16d0 kernel/workqueue.c:2307
worker_thread+0x95/0xe10 kernel/workqueue.c:2454
kthread+0x2f4/0x3b0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
================================================================================
This change limits the maximum timeout by limiting the size of the
shift, which keeps all intermediate values in-bounds.
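A clamped exponential backoff of this kind can be sketched as below. The constants and names are assumptions chosen for illustration (they are not the kernel's values), and the bound on the shift is derived here from the ratio of the maximum to the minimum timeout, which is one plausible way to choose it.

```c
#include <stdint.h>

/* Assumed illustrative bounds in arbitrary ticks, not the kernel's values. */
#define RTO_MIN_TICKS 200u
#define RTO_MAX_TICKS 120000u

/* floor(log2(v)) for v >= 1. */
static unsigned int ilog2_u32(uint32_t v)
{
	unsigned int n = 0;

	while (v >>= 1)
		n++;
	return n;
}

/* Exponential backoff whose shift count is capped before shifting. */
static uint32_t datafin_timeout(uint32_t retransmits)
{
	uint32_t max_shift = ilog2_u32(RTO_MAX_TICKS / RTO_MIN_TICKS);
	uint32_t shift = retransmits < max_shift ? retransmits : max_shift;

	return RTO_MIN_TICKS << shift;
}
```

Capping the shift count, rather than clamping the result afterwards, is what avoids the undefined behaviour: a shift by 32 or more on a 32-bit type never happens, no matter how large the retransmit count grows.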
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/259
Fixes: 6477dd39e6 ("mptcp: Retransmit DATA_FIN")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After commit 05be5e273c ("selftests: mptcp: add disconnect tests")
the mptcp selftests leave behind a couple of tmp files after
each run. run_tests_disconnect() misnames a few variables used to
track them. Address the issue by setting the appropriate global variables.
Fixes: 05be5e273c ("selftests: mptcp: add disconnect tests")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The MPTCP SIOCOUTQ implementation is not very accurate in
case of fallback: it only measures the data in the MPTCP-level
write queue, but it does not take into account the subflow
write queue utilization. In case of fallback the first can be
empty, while the latter is not.
The above produces sporadic self-test failures and can fool
legitimate user-space applications.
Fix the issue by additionally querying the subflow in case of fallback.
Fixes: 644807e3e4 ("mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/260
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix regression with RFCOMM
- Fix regression with LE devices using Privacy (RPA)
- Fix regression with LE devices not waiting proper timeout to
establish connections
- Fix race in smp
* tag 'for-net-2022-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_sync: Fix not using conn_timeout
Bluetooth: hci_sync: Fix hci_update_accept_list_sync
Bluetooth: assign len after null check
Bluetooth: Fix bt_skb_sendmmsg not allocating partial chunks
Bluetooth: fix data races in smp_unregister(), smp_del_chan()
Bluetooth: hci_core: Fix leaking sent_cmd skb
====================
Link: https://lore.kernel.org/r/20220224210838.197787-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull clk fixes from Stephen Boyd:
"A couple driver fixes in the clk subsystem
- Fix a hang due to bad clk parent in the Ingenic jz4725b driver
- Fix SD controllers on Qualcomm MSM8994 SoCs by removing clks that
shouldn't be touched"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: jz4725b: fix mmc0 clock gating
clk: qcom: gcc-msm8994: Remove NoC clocks
Pull drm fixes from Dave Airlie:
"Regular drm fixes pull, i915, amdgpu and tegra mostly, all pretty
small.
core:
- edid: Always set RGB444
tegra:
- tegra186 suspend/resume fixes
- syncpoint wait fix
- build warning fix
- eDP on older devices fix
amdgpu:
- Display FP fix
- PCO powergating fix
- RDNA2 OEM SKU stability fixes
- Display PSR fix
- PCI ASPM fix
- Display link encoder fix for TEST_COMMIT
- Raven2 suspend/resume fix
- Fix a regression in virtual display support
- GPUVM eviction fix
i915:
- Fix QGV handling on ADL-P+
- Fix bw atomic check when switching between SAGV vs. no SAGV
- Disconnect PHYs left connected by BIOS on disabled ports
- Fix SAGV to no SAGV transitions on TGL+
- Print PHY name properly on calibration error (DG2)
imx:
- dcss: Select GEM CMA helpers
radeon:
- Fix some variables' type
vc4:
- Fix codec cleanup
- Fix PM reference counting"
* tag 'drm-fixes-2022-02-25' of git://anongit.freedesktop.org/drm/drm: (24 commits)
drm/amdgpu: check vm ready by amdgpu_vm->evicting flag
drm/amdgpu: bypass tiling flag check in virtual display case (v2)
Revert "drm/amdgpu: add modifiers in amdgpu_vkms_plane_init()"
drm/amdgpu: do not enable asic reset for raven2
drm/amd/display: Fix stream->link_enc unassigned during stream removal
drm/amd: Check if ASPM is enabled from PCIe subsystem
drm/edid: Always set RGB444
drm/tegra: dpaux: Populate AUX bus
drm/radeon: fix variable type
drm/amd/display: For vblank_disable_immediate, check PSR is really used
drm/amd/pm: fix some OEM SKU specific stability issues
drm/amdgpu: disable MMHUB PG for Picasso
drm/amd/display: Protect update_bw_bounding_box FPU code.
drm/i915/dg2: Print PHY name properly on calibration error
drm/i915: Fix bw atomic check when switching between SAGV vs. no SAGV
drm/i915: Correctly populate use_sagv_wm for all pipes
drm/i915: Disconnect PHYs left connected by BIOS on disabled ports
drm/i915: Widen the QGV point mask
drm/imx/dcss: i.MX8MQ DCSS select DRM_GEM_CMA_HELPER
drm/vc4: crtc: Fix runtime_pm reference counting
...
TE-gpio, if defined, is placed in the panel's node, not the parent DSI
node. Change the devm_gpiod_get_optional() to gpiod_get_optional() and
pass proper device node to it. The code already has a proper cleanup
path, so it looks that the devm_* variant has been applied accidentally
during the conversion to gpiod API.
Fixes: ee6c8b5afa ("drm/exynos: Replace legacy gpio interface for gpiod interface")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Fixed a typo.
Signed-off-by: Inki Dae <inki.dae@samsung.com>
TE-gpio is optional and if it is not found then gpiod_get_optional()
returns NULL. In such a case the code will continue and try to convert the NULL
gpiod to an irq, which in turn fails. The failure is then propagated and the driver
is not registered.
Fix this by returning early from exynos_dsi_register_te_irq() if no
TE-gpio is found.
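The early-return pattern for an optional resource can be sketched as follows. The struct fake_gpio and the function register_te_irq() are hypothetical stand-ins for illustration and do not reproduce the gpiod API or the driver's actual function.

```c
#include <stddef.h>

/* Hypothetical stand-in for a GPIO descriptor that maps to an irq number. */
struct fake_gpio {
	int irq; /* negative value models a failed gpio-to-irq conversion */
};

/*
 * Returns 0 on success, and also when the optional line is absent.
 * Returns a negative value only for a genuine conversion failure.
 */
static int register_te_irq(const struct fake_gpio *te_gpio, int *irq_out)
{
	if (te_gpio == NULL)
		return 0; /* optional line not described: nothing to register */

	if (te_gpio->irq < 0)
		return te_gpio->irq;

	*irq_out = te_gpio->irq;
	return 0;
}
```

The key point is that "not present" is treated as success, so probing continues, while a real error on a line that is present is still propagated.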
Fixes: ee6c8b5afa ("drm/exynos: Replace legacy gpio interface for gpiod interface")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
platform_get_resource(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code. This causes an issue
when using hierarchical interrupt domains using the "interrupts" property
in the node, as this bypasses the hierarchical setup and messes up the
irq chaining.
In preparation for removal of static setup of IRQ resource from DT core
code use platform_get_irq().
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
platform_get_resource(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code. This causes an issue
when using hierarchical interrupt domains using the "interrupts" property
in the node, as this bypasses the hierarchical setup and messes up the
irq chaining.
In preparation for removal of static setup of IRQ resource from DT core
code use platform_get_irq().
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
platform_get_resource_byname(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code. This causes an issue
when using hierarchical interrupt domains using the "interrupts" property
in the node, as this bypasses the hierarchical setup and messes up the
irq chaining.
In preparation for removal of static setup of IRQ resource from DT core
code use platform_get_irq_byname().
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
platform_get_resource(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code. This causes an issue
when using hierarchical interrupt domains using the "interrupts" property
in the node, as this bypasses the hierarchical setup and messes up the
irq chaining.
In preparation for removal of static setup of IRQ resource from DT core
code use platform_get_irq().
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
platform_get_resource_byname(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code. This causes an issue
when using hierarchical interrupt domains using the "interrupts" property
in the node, as this bypasses the hierarchical setup and messes up the
irq chaining.
In preparation for removal of static setup of IRQ resource from DT core
code use platform_get_irq_byname().
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
On SC7180 we observe black screens because the gdsc is being
enabled/disabled very rapidly and the GDSC FSM state does not work as
expected. This is due to the fact that the GDSC reset value is being
updated from SW.
The recommended transition delay for the mdss core gdsc is updated for
SC7180/SC7280/SM8250.
Fixes: dd3d066221 ("clk: qcom: Add display clock controller driver for SC7180")
Fixes: 1a00c962f9 ("clk: qcom: Add display clock controller driver for SC7280")
Fixes: 80a18f4a85 ("clk: qcom: Add display clock controller driver for SM8150 and SM8250")
Signed-off-by: Taniya Das <tdas@codeaurora.org>
Link: https://lore.kernel.org/r/20220223185606.3941-2-tdas@codeaurora.org
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[sboyd@kernel.org: lowercase hex]
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
GDSCs have multiple transition delays which are used for the GDSC FSM
states. Older targets/designs required these values to be updated from
gdsc code to certain default values for the FSM state to work as
expected. But on the newer targets/designs the values updated from the
GDSC driver can prevent the FSM state from working as expected.
On SC7180 we observe black screens because the gdsc is being
enabled/disabled very rapidly and the GDSC FSM state does not work as
expected. This is due to the fact that the GDSC reset value is being
updated from SW.
Thus add support to update the transition delay from the clock
controller gdscs as required.
Fixes: 45dd0e5531 ("clk: qcom: Add support for GDSCs")
Signed-off-by: Taniya Das <tdas@codeaurora.org>
Link: https://lore.kernel.org/r/20220223185606.3941-1-tdas@codeaurora.org
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix double free in the error path when opening perf.data from
multiple files in a directory instead of from a single file
- Sync the msr-index.h copy with the kernel sources
- Fix error when printing 'weight' field in 'perf script'
- Skip failing sigtrap test for arm+aarch64 in 'perf test'
- Fix failure to use a cpu list for uncore events in hybrid systems,
e.g. Intel Alder Lake
* tag 'perf-tools-fixes-for-v5.17-2022-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf script: Fix error when printing 'weight' field
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf data: Fix double free in perf_session__delete()
perf evlist: Fix failed to use cpu list for uncore events
perf test: Skip failing sigtrap test for arm+aarch64
Pull kvm fixes from Paolo Bonzini:
"x86 host:
- Expose KVM_CAP_ENABLE_CAP since it is supported
- Disable KVM_HC_CLOCK_PAIRING in TSC catchup mode
- Ensure async page fault token is nonzero
- Fix lockdep false negative
- Fix FPU migration regression from the AMX changes
x86 guest:
- Don't use PV TLB/IPI/yield on uniprocessor guests
PPC:
- reserve capability id (topic branch for ppc/kvm)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: nSVM: disallow userspace setting of MSR_AMD64_TSC_RATIO to non default value when tsc scaling disabled
KVM: x86/mmu: make apf token non-zero to fix bug
KVM: PPC: reserve capability 210 for KVM_CAP_PPC_AIL_MODE_3
x86/kvm: Don't use pv tlb/ipi/sched_yield if on 1 vCPU
x86/kvm: Fix compilation warning in non-x86_64 builds
x86/kvm/fpu: Remove kvm_vcpu_arch.guest_supported_xcr0
x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
kvm: x86: Disable KVM_HC_CLOCK_PAIRING if tsc is in always catchup mode
KVM: Fix lockdep false negative during host resume
KVM: x86: Add KVM_CAP_ENABLE_CAP to x86
i.MX fixes for 5.17, round 2:
- Drop reset signal from i.MX8MM vpumix power domain to fix a system
hang.
- Fix a dtbs_check warning caused by #thermal-sensor-cells in i.MX8ULP
device tree.
- Fix a clock disabling imbalance in gpcv2 driver.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
ARM: tegra: Device tree fixes for v5.17-rc6
This contains fixes for the eDP panel found on the Venice 2 and Nyan
boards.
* tag 'tegra-for-5.17-arm-dt-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
ARM: tegra: Move panels to AUX bus
Link: https://lore.kernel.org/r/20220223162209.293722-1-thierry.reding@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fix the display-port-sound on Gru devices, DDR voltage on the Quartz-A
board, fix emmc signal-integrity and usb OTG mode on rk3399-puma as well
as a number of dtschema fixes to reduce the number of errors.
* tag 'v5.17-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
ARM: dts: rockchip: fix a typo on rk3288 crypto-controller
ARM: dts: rockchip: reorder rk322x hmdi clocks
arm64: dts: rockchip: reorder rk3399 hdmi clocks
arm64: dts: rockchip: align pl330 node name with dtschema
arm64: dts: rockchip: fix rk3399-puma eMMC HS400 signal integrity
arm64: dts: rockchip: fix Quartz64-A ddr regulator voltage
arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output
arm64: dts: rockchip: fix rk3399-puma-haikou USB OTG mode
arm64: dts: rockchip: drop pclk_xpcs from gmac0 on rk3568
arm64: dts: rockchip: fix dma-controller node names on rk356x
Link: https://lore.kernel.org/r/1973741.CViHJPHrxy@phil
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull pci fixes from Bjorn Helgaas:
- Fix a merge error that broke PCI device enumeration on mvebu
platforms, including Turris Omnia (Armada 385) (Pali Rohár)
- Avoid using ATS on all AMD Navi10 and Navi14 GPUs because some
VBIOSes don't account for "harvested" (disabled) parts of the chip
when initializing caches (Alex Deucher)
* tag 'pci-v5.17-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Mark all AMD Navi10 and Navi14 GPU ATS as broken
PCI: mvebu: Fix device enumeration regression
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf and netfilter.
Current release - regressions:
- bpf: fix crash due to out of bounds access into reg2btf_ids
- mvpp2: always set port pcs ops, avoid null-deref
- eth: marvell: fix driver load from initrd
- eth: intel: revert "Fix reset bw limit when DCB enabled with 1 TC"
Current release - new code bugs:
- mptcp: fix race in overlapping signal events
Previous releases - regressions:
- xen-netback: revert hotplug-status changes causing devices to not
be configured
- dsa:
- avoid call to __dev_set_promiscuity() while rtnl_mutex isn't
held
- fix panic when removing unoffloaded port from bridge
- dsa: microchip: fix bridging with more than two member ports
Previous releases - always broken:
- bpf:
- fix crash due to incorrect copy_map_value when both spin lock
and timer are present in a single value
- fix a bpf_timer initialization issue with clang
- do not try bpf_msg_push_data with len 0
- add schedule points in batch ops
- nf_tables:
- unregister flowtable hooks on netns exit
- correct flow offload action array size
- fix a couple of memory leaks
- vsock: don't check owner in vhost_vsock_stop() while releasing
- gso: do not skip outer ip header in case of ipip and net_failover
- smc: use a mutex for locking "struct smc_pnettable"
- openvswitch: fix setting ipv6 fields causing hw csum failure
- mptcp: fix race in incoming ADD_ADDR option processing
- sysfs: add check for netdevice being present to speed_show
- sched: act_ct: fix flow table lookup after ct clear or switching
zones
- eth: intel: fixes for SR-IOV forwarding offloads
- eth: broadcom: fixes for selftests and error recovery
- eth: mellanox: flow steering and SR-IOV forwarding fixes
Misc:
- make __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor
friends not report freed skbs as drops
- force inlining of checksum functions in net/checksum.h"
* tag 'net-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
net: mv643xx_eth: process retval from of_get_mac_address
ping: remove pr_err from ping_lookup
Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC"
openvswitch: Fix setting ipv6 fields causing hw csum failure
ipv6: prevent a possible race condition with lifetimes
net/smc: Use a mutex for locking "struct smc_pnettable"
bnx2x: fix driver load from initrd
Revert "xen-netback: Check for hotplug-status existence before watching"
Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"
net/mlx5e: Fix VF min/max rate parameters interchange mistake
net/mlx5e: Add missing increment of count
net/mlx5e: MPLSoUDP decap, fix check for unsupported matches
net/mlx5e: Fix MPLSoUDP encap to use MPLS action information
net/mlx5e: Add feature check for set fec counters
net/mlx5e: TC, Skip redundant ct clear actions
net/mlx5e: TC, Reject rules with forward and drop actions
net/mlx5e: TC, Reject rules with drop and modify hdr action
net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
net/mlx5: Fix possible deadlock on rule deletion
...
When using hci_le_create_conn_sync it shall wait for the conn_timeout
since the connection complete may take longer than just 2 seconds.
Also fix the masking of HCI_EV_LE_ENHANCED_CONN_COMPLETE and
HCI_EV_LE_CONN_COMPLETE so they are never both set so we can predict
which one the controller will use in case of HCI_OP_LE_CREATE_CONN.
Fixes: 6cd29ec6ae ("Bluetooth: hci_sync: Wait for proper events when connecting LE")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
hci_update_accept_list_sync() chooses the filter policy based on the
error code, but that code gets overwritten by the return value of
hci_le_set_addr_resolution_enable_sync() instead of reflecting the
actual result of calls such as hci_le_add_accept_list_sync(), which was
the intent.
Fixes: ad383c2c65 ("Bluetooth: hci_sync: Enable advertising when LL privacy is enabled")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Previous commit e04480920d ("Bluetooth: defer cleanup of resources
in hci_unregister_dev()") defers all destructive actions to
hci_release_dev() to prevent concurrency problems like NPD and UAF.
However, there are still some exceptions that are ignored.
The smp_unregister() in hci_dev_close_sync() (previously in
hci_dev_do_close) will release resources like the sensitive channel
and the smp_dev objects. Consider the situation where the device is
detaching or powering down while the kernel is still operating on it;
the following data race could take place.
thread-A hci_dev_close_sync | thread-B read_local_oob_ext_data
|
hci_dev_unlock() |
... | hci_dev_lock()
if (hdev->smp_data) |
chan = hdev->smp_data |
| chan = hdev->smp_data (3)
|
hdev->smp_data = NULL (1) | if (!chan || !chan->data) (4)
... |
smp = chan->data | smp = chan->data
if (smp) |
chan->data = NULL (2) |
... |
kfree_sensitive(smp) |
| // dereferencing smp triggers UAF
That is, the objects hdev->smp_data and chan->data both suffer from
data races. In a preemption-enabled kernel, the above schedule (when (3)
is before (1) and (4) is before (2)) leads to UAF bugs. It can be
reproduced in the latest kernel and below is part of the report:
[ 49.097146] ================================================================
[ 49.097611] BUG: KASAN: use-after-free in smp_generate_oob+0x2dd/0x570
[ 49.097611] Read of size 8 at addr ffff888006528360 by task generate_oob/155
[ 49.097611]
[ 49.097611] Call Trace:
[ 49.097611] <TASK>
[ 49.097611] dump_stack_lvl+0x34/0x44
[ 49.097611] print_address_description.constprop.0+0x1f/0x150
[ 49.097611] ? smp_generate_oob+0x2dd/0x570
[ 49.097611] ? smp_generate_oob+0x2dd/0x570
[ 49.097611] kasan_report.cold+0x7f/0x11b
[ 49.097611] ? smp_generate_oob+0x2dd/0x570
[ 49.097611] smp_generate_oob+0x2dd/0x570
[ 49.097611] read_local_oob_ext_data+0x689/0xc30
[ 49.097611] ? hci_event_packet+0xc80/0xc80
[ 49.097611] ? sysvec_apic_timer_interrupt+0x9b/0xc0
[ 49.097611] ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 49.097611] ? mgmt_init_hdev+0x1c/0x240
[ 49.097611] ? mgmt_init_hdev+0x28/0x240
[ 49.097611] hci_sock_sendmsg+0x1880/0x1e70
[ 49.097611] ? create_monitor_event+0x890/0x890
[ 49.097611] ? create_monitor_event+0x890/0x890
[ 49.097611] sock_sendmsg+0xdf/0x110
[ 49.097611] __sys_sendto+0x19e/0x270
[ 49.097611] ? __ia32_sys_getpeername+0xa0/0xa0
[ 49.097611] ? kernel_fpu_begin_mask+0x1c0/0x1c0
[ 49.097611] __x64_sys_sendto+0xd8/0x1b0
[ 49.097611] ? syscall_exit_to_user_mode+0x1d/0x40
[ 49.097611] do_syscall_64+0x3b/0x90
[ 49.097611] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 49.097611] RIP: 0033:0x7f5a59f51f64
...
[ 49.097611] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5a59f51f64
[ 49.097611] RDX: 0000000000000007 RSI: 00007f5a59d6ac70 RDI: 0000000000000006
[ 49.097611] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 49.097611] R10: 0000000000000040 R11: 0000000000000246 R12: 00007ffec26916ee
[ 49.097611] R13: 00007ffec26916ef R14: 00007f5a59d6afc0 R15: 00007f5a59d6b700
To solve these data races, this patch places the smp_unregister()
call inside the region protected by hci_dev_lock(). That is,
smp_unregister() cannot execute concurrently with functions that
operate on the device while holding the device lock (most of them are
mgmt operations in mgmt.c).
This patch is tested with kernel LOCK DEBUGGING enabled. The cost of
holding the device lock longer is expected to be low, as the
smp_unregister() function is fairly short and efficient.
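The locking rule described above can be modelled in user space. The
following is only a sketch under stated assumptions: struct hdev, struct
chan, struct smp_dev, hdev_open(), hdev_close() and read_oob() are
hypothetical stand-ins, and a pthread mutex plays the role of
hci_dev_lock(). It is not the kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins for the kernel objects in the race. */
struct smp_dev { int oob_key; };
struct chan { struct smp_dev *data; };
struct hdev {
	pthread_mutex_t lock;	/* plays the role of hci_dev_lock() */
	struct chan *smp_data;
};

/* Set up the device with one channel and one smp object. */
static int hdev_open(struct hdev *h, int key)
{
	struct chan *c = malloc(sizeof(*c));
	struct smp_dev *s = malloc(sizeof(*s));

	assert(h != NULL);
	if (!c || !s) {
		free(c);
		free(s);
		return -1;
	}
	s->oob_key = key;
	c->data = s;
	h->smp_data = c;
	return pthread_mutex_init(&h->lock, NULL);
}

/* Teardown: clearing and freeing both happen with the lock held. */
static void hdev_close(struct hdev *h)
{
	pthread_mutex_lock(&h->lock);
	struct chan *c = h->smp_data;

	h->smp_data = NULL;
	if (c) {
		free(c->data);
		c->data = NULL;
		free(c);
	}
	pthread_mutex_unlock(&h->lock);
}

/* Reader: checks and dereferences under the same lock, so it sees either
 * a fully valid object or none at all. Returns 0 on success, -1 if gone. */
static int read_oob(struct hdev *h, int *out)
{
	int ret = -1;

	pthread_mutex_lock(&h->lock);
	struct chan *c = h->smp_data;

	if (c && c->data) {
		*out = c->data->oob_key;
		ret = 0;
	}
	pthread_mutex_unlock(&h->lock);
	return ret;
}
```

Because the reader and the teardown path take the same lock, steps (3)
and (4) in the diagram can no longer interleave with (1) and (2).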
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
sent_cmd memory is not freed before freeing hci_dev, causing its
contents to leak.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
- Fix QGV handling on ADL-P+ (Ville Syrjälä)
- Fix bw atomic check when switching between SAGV vs. no SAGV (Ville Syrjälä)
- Disconnect PHYs left connected by BIOS on disabled ports (Imre Deak)
- Fix SAGV to no SAGV transitions on TGL+ (Ville Syrjälä)
- Print PHY name properly on calibration error (DG2) (Matt Roper)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YhdyHwRWkOTWwlqi@tursulin-mobl2
Pull block fixes from Jens Axboe:
- NVMe pull request:
- send H2CData PDUs based on MAXH2CDATA (Varun Prakash)
- fix passthrough to namespaces with unsupported features (Christoph
Hellwig)
- Clear iocb->private at poll completion (Stefano)
* tag 'block-5.17-2022-02-24' of git://git.kernel.dk/linux-block:
nvme-tcp: send H2CData PDUs based on MAXH2CDATA
nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info
nvme: don't return an error from nvme_configure_metadata
block: clear iocb->private in blkdev_bio_end_io_async()
Pull io_uring fixes from Jens Axboe:
- Add a conditional schedule point in io_add_buffers() (Eric)
- Fix for a quiesce speedup merged in this release (Dylan)
- Don't convert to jiffies for event timeout waiting, it's way too
coarse when we accept a timespec as input (me)
* tag 'io_uring-5.17-2022-02-23' of git://git.kernel.dk/linux-block:
io_uring: disallow modification of rsrc_data during quiesce
io_uring: don't convert to jiffies for waiting on timeouts
io_uring: add a schedule point in io_add_buffers()
Pull ARM cpufreq fixes for 5.17-rc6 from Viresh Kumar:
"This fixes issues related to throttle IRQ for Qcom SoCs."
* 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: qcom-hw: Delay enabling throttle_irq
cpufreq: Reintroduce ready() callback
Pull more x86 platform driver fixes from Hans de Goede:
"Two more fixes:
- Fix suspend/resume regression on AMD Cezanne APUs in >= 5.16
- Fix Microsoft Surface 3 battery readings"
* tag 'platform-drivers-x86-v5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
surface: surface3_power: Fix battery readings on batteries without a serial number
platform/x86: amd-pmc: Set QOS during suspend on CZN w/ timer wakeup
Obtaining a MAC address may be deferred in cases when the MAC is stored
in an NVMEM block, for example, and it may not be ready upon the first
retrieval attempt and return EPROBE_DEFER.
It is also possible that a port that does not rely on NVMEM has
already been created when the defer request arrives. Thus, the resources
allocated previously must also be freed when doing a roll-back.
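A minimal sketch of such a roll-back, assuming a hypothetical struct port
with create_port()/destroy_port() helpers (none of these names come from
the driver). EPROBE_DEFER is defined locally because it is a
kernel-internal errno value.

```c
#include <assert.h>
#include <stddef.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal errno, absent from user space */
#endif

/* Hypothetical port: mac_ret simulates the MAC address lookup result. */
struct port {
	int created;
	int mac_ret;	/* 0 on success, negative errno otherwise */
};

static int create_port(struct port *p)
{
	if (p->mac_ret)
		return p->mac_ret;	/* e.g. -EPROBE_DEFER from NVMEM */
	p->created = 1;
	return 0;
}

static void destroy_port(struct port *p)
{
	p->created = 0;
}

/* Create all ports; on any failure undo the ones already created. */
static int probe_ports(struct port *ports, size_t n)
{
	assert(ports != NULL);
	for (size_t i = 0; i < n; i++) {
		int ret = create_port(&ports[i]);

		if (ret) {
			while (i--)
				destroy_port(&ports[i]);
			return ret;	/* caller retries on -EPROBE_DEFER */
		}
	}
	return 0;
}
```

Returning the original error code unchanged lets the caller tell a
deferral from a hard failure.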
Fixes: 76723bca28 ("net: mv643xx_eth: add DT parsing support")
Signed-off-by: Mauri Sandberg <maukka@ext.kapsi.fi>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220223142337.41757-1-maukka@ext.kapsi.fi
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If nested tsc scaling is disabled, MSR_AMD64_TSC_RATIO should
never have a non-default value.
Due to the way nested tsc scaling support was implemented in qemu,
it would set this msr to 0 when nested tsc scaling was disabled.
Ignore that value for now, as it causes no harm.
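The resulting write policy can be sketched as follows. This is an
illustration only: set_tsc_ratio() is a hypothetical helper, and
TSC_RATIO_DEFAULT assumes the default ratio of 1.0 is encoded as
0x0100000000 (integer part in bits 39:32).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding of a 1.0 ratio; labelled as an assumption above. */
#define TSC_RATIO_DEFAULT 0x0100000000ULL

/* Returns 0 if the write is accepted, 1 if it must be rejected. */
static int set_tsc_ratio(bool scaling_enabled, uint64_t data, uint64_t *msr)
{
	assert(msr != NULL);
	if (!scaling_enabled) {
		/* Only the default is valid; 0 is tolerated but ignored. */
		if (data != 0 && data != TSC_RATIO_DEFAULT)
			return 1;
		*msr = TSC_RATIO_DEFAULT;
		return 0;
	}
	*msr = data;
	return 0;
}
```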
Fixes: 5228eb96a4 ("KVM: x86: nSVM: implement nested TSC scaling")
Cc: stable@vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220223115649.319134-1-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In the current async page fault logic, when a page is ready, KVM relies
on kvm_arch_can_dequeue_async_page_present() to determine whether to
deliver a READY event to the guest. This function tests the token value
of struct kvm_vcpu_pv_apf_data, which the guest kernel must reset to
zero once it has finished handling a READY event. A zero value means
that the previous READY event is done, so KVM can deliver another.
But kvm_arch_setup_async_pf() may produce a valid token with a zero
value, which is indistinguishable from that "done" state and may lead to
the loss of this READY event.
This bug may cause a task to block forever in the guest:
INFO: task stress:7532 blocked for more than 1254 seconds.
Not tainted 5.10.0 #16
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:stress state:D stack: 0 pid: 7532 ppid: 1409
flags:0x00000080
Call Trace:
__schedule+0x1e7/0x650
schedule+0x46/0xb0
kvm_async_pf_task_wait_schedule+0xad/0xe0
? exit_to_user_mode_prepare+0x60/0x70
__kvm_handle_async_pf+0x4f/0xb0
? asm_exc_page_fault+0x8/0x30
exc_page_fault+0x6f/0x110
? asm_exc_page_fault+0x8/0x30
asm_exc_page_fault+0x1e/0x30
RIP: 0033:0x402d00
RSP: 002b:00007ffd31912500 EFLAGS: 00010206
RAX: 0000000000071000 RBX: ffffffffffffffff RCX: 00000000021a32b0
RDX: 000000000007d011 RSI: 000000000007d000 RDI: 00000000021262b0
RBP: 00000000021262b0 R08: 0000000000000003 R09: 0000000000000086
R10: 00000000000000eb R11: 00007fefbdf2baa0 R12: 0000000000000000
R13: 0000000000000002 R14: 000000000007d000 R15: 0000000000001000
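One way to guarantee a nonzero token is to skip any sequence number whose
shifted value would be zero. The sketch below assumes the token is built
as (id << 12) | vcpu_id; struct apf_state and apf_next_token() are
hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-vCPU async page fault state. */
struct apf_state {
	uint32_t next_id;	/* sequence number, may wrap */
	uint32_t vcpu_id;	/* occupies the low 12 bits of the token */
};

/* Return a token that is never zero; zero is reserved for "slot free". */
static uint32_t apf_next_token(struct apf_state *s)
{
	assert(s != NULL);
	/* id 0, or an id whose upper bits shift out, would yield 0. */
	if ((uint32_t)(s->next_id << 12) == 0)
		s->next_id = 1;
	return (uint32_t)(s->next_id++ << 12) | s->vcpu_id;
}
```

With vcpu_id 0, the first two tokens are 4096 and 8192, and a wrapped
sequence number restarts at 4096 instead of producing 0.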
Signed-off-by: Liang Zhang <zhangliang5@huawei.com>
Message-Id: <20220222031239.1076682-1-zhangliang5@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Revert of a patch that, instead of fixing an AQ error when trying
to reset the BW limit, introduced several regressions related to
creating and managing TCs. Currently there are errors when creating
a TC on both PF and VF.
Error log:
[17428.783095] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14
[17428.783107] i40e 0000:3b:00.1: Failed configuring TC map 0 for VSI 391
[17428.783254] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14
[17428.783259] i40e 0000:3b:00.1: Unable to configure TC map 0 for VSI 391
This reverts commit 3d2504663c.
Fixes: 3d2504663c ("i40e: Fix reset bw limit when DCB enabled with 1 TC")
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220223175347.1690692-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 2afeec08ab.
The reasoning in the commit was wrong - the code expected to set up the
watch even if 'hotplug-status' didn't exist. In fact, it relied on the
watch being fired the first time - to check if maybe 'hotplug-status' is
already set to 'connected'. Not registering a watch for a non-existing
path (which is the case if the hotplug script hasn't been executed yet)
made the backend not wait for the hotplug script to execute. This, in
turn, made the netfront think the interface is fully operational, while
in fact it was not (the vif interface on the xen-netback side might not
be configured yet).
This was a workaround for 'hotplug-status' erroneously being removed.
But since that is reverted now, the workaround is not necessary either.
More discussion at
https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#u
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Michael Brown <mbrown@fensystems.co.uk>
Link: https://lore.kernel.org/r/20220222001817.2264967-2-marmarek@invisiblethingslab.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 1f2565780e.
The 'hotplug-status' node should not be removed as long as the vif
device remains configured. Otherwise xen-netback would wait for
the network script to be re-run even if it was already called (in case
of the frontend re-connecting). But also, it _should_ be removed when the
vif device is destroyed (for example when unbinding the driver) -
otherwise the hotplug script would not configure the device whenever it
re-appears.
Moving removal of the 'hotplug-status' node was a workaround for nothing
calling the network script after the xen-netback module is reloaded. But
when the vif interface is re-created (on xen-netback unbind/bind for
example), the script should be called, regardless of who does that -
currently this case is not handled by the toolstack, and requires a
manual script call. Keeping hotplug-status=connected to skip the call is
wrong and leads to an unconfigured interface.
More discussion at
https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#u
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20220222001817.2264967-1-marmarek@invisiblethingslab.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a big gap between inode_should_defrag() and the autodefrag
extent size threshold. inode_should_defrag() has a flexible
@small_write value: 16K for compressed extents and 64K for
non-compressed extents.
However, the autodefrag extent size threshold is always fixed to the
default value (256K).
This means the following write sequence will make autodefrag
defrag ranges which didn't trigger autodefrag:
pwrite 0 8k
sync
pwrite 8k 128K
sync
The latter 128K write will also be considered as a defrag target (if
other conditions are met), while only the 8K write really
triggered autodefrag.
Such behavior can cause extra IO for autodefrag.
Close the gap, by copying the @small_write value into inode_defrag, so
that later autodefrag can use the same @small_write value which
triggered autodefrag.
With the existing transid value, this allows autodefrag to scan exactly
the ranges which triggered autodefrag.
Although this behavior change mostly reduces the extent_thresh value
for autodefrag, I believe in the future we should allow users to specify
the autodefrag extent threshold through mount options, but that's
another problem to consider in the future.
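The idea of carrying the triggering threshold along with the defrag
record can be sketched like this. The names struct defrag_rec,
queue_defrag_if_small() and is_defrag_target() are hypothetical; only the
16K, 64K and 256K values come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define THRESH_COMPRESSED	(16u * 1024)	/* @small_write, compressed */
#define THRESH_UNCOMPRESSED	(64u * 1024)	/* @small_write, otherwise */

/* Hypothetical per-inode record queued for autodefrag. */
struct defrag_rec {
	uint64_t transid;
	uint32_t extent_thresh;	/* copied from the triggering @small_write */
};

/* Queue the inode only for small writes, remembering the threshold. */
static bool queue_defrag_if_small(struct defrag_rec *rec, uint64_t transid,
				  uint32_t write_len, bool compressed)
{
	uint32_t thresh = compressed ? THRESH_COMPRESSED : THRESH_UNCOMPRESSED;

	assert(rec != NULL);
	if (write_len >= thresh)
		return false;
	rec->transid = transid;
	rec->extent_thresh = thresh;
	return true;
}

/* Later scan: only extents below the recorded threshold are targets. */
static bool is_defrag_target(const struct defrag_rec *rec, uint32_t extent_len)
{
	assert(rec != NULL);
	return extent_len < rec->extent_thresh;
}
```

For the write sequence above, the 8K write queues the inode with a 64K
threshold, so the later 128K extent is no longer a target, whereas a
fixed 256K threshold would have included it.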
CC: stable@vger.kernel.org # 5.16+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Future CPUs may implement a clearbhb instruction that is sufficient
to mitigate Spectre-BHB. CPUs that implement this instruction, but
not CSV2.3, must be affected by Spectre-BHB.
Add support to use this instruction as the BHB mitigation on CPUs
that support it. The instruction is in the hint space, so it will
be treated as a NOP by older CPUs.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 5.17
- send H2CData PDUs based on MAXH2CDATA (Varun Prakash)
- fix passthrough to namespaces with unsupported features (me)"
* tag 'nvme-5.17-2022-02-24' of git://git.infradead.org/nvme:
nvme-tcp: send H2CData PDUs based on MAXH2CDATA
nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info
nvme: don't return an error from nvme_configure_metadata
KVM allows the guest to discover whether the ARCH_WORKAROUND SMCCC are
implemented, and to preserve that state during migration through its
firmware register interface.
Add the necessary boilerplate for SMCCC_ARCH_WORKAROUND_3.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Speculation attacks against some high-performance processors can
make use of branch history to influence future speculation.
When taking an exception from user-space, a sequence of branches
or a firmware call overwrites or invalidates the branch history.
The sequence of branches is added to the vectors, and should appear
before the first indirect branch. For systems using KPTI the sequence
is added to the kpti trampoline where it has a free register as the exit
from the trampoline is via a 'ret'. For systems not using KPTI, the same
register tricks are used to free up a register in the vectors.
For the firmware call, arch-workaround-3 clobbers 4 registers, so
there is no choice but to save them to the EL1 stack. This only happens
for entry from EL0, so if we take an exception due to the stack access,
it will not become re-entrant.
For KVM, the existing branch-predictor-hardening vectors are used.
When a spectre version of these vectors is in use, the firmware call
is sufficient to mitigate against Spectre-BHB. For the non-spectre
versions, the sequence of branches is added to the indirect vector.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Johan writes:
USB-serial fixes for 5.17-rc6
Here's a revert of a commit which erroneously added a device id used for
the EPP/MEM mode of ch341 devices.
Included are also some new modem device ids.
All have been in linux-next with no reported issues.
* tag 'usb-serial-5.17-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add Telit LE910R1 compositions
USB: serial: option: add support for DW5829e
Revert "USB: serial: ch341: add new Product ID for CH341A"
The battery on the 2nd hand Surface 3 which I recently bought appears to
not have a serial number programmed in. This results in any I2C reads from
the registers containing the serial number failing with an I2C NACK.
This caused mshw0011_bix() to fail, so the battery readings did
not work at all.
Ignore EREMOTEIO (I2C NACK) errors when retrieving the serial number and
continue with an empty serial number to fix this.
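The error handling described above amounts to treating one specific
error as "no serial number" while still propagating everything else. A
sketch, assuming a hypothetical get_serial() wrapper around a bus read
callback (read_nack() and read_eio() are sample callbacks for
illustration only):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value; defined here for other platforms */
#endif

/* Hypothetical bus read: returns 0 on success or a negative errno. */
typedef int (*read_serial_fn)(char *buf, size_t len);

/* Sample callbacks used to exercise both paths. */
static int read_nack(char *buf, size_t len)
{
	(void)buf;
	(void)len;
	return -EREMOTEIO;	/* device did not acknowledge the read */
}

static int read_eio(char *buf, size_t len)
{
	(void)buf;
	(void)len;
	return -EIO;		/* some other, genuine failure */
}

/* Fetch the serial number, mapping an I2C NACK to an empty string. */
static int get_serial(read_serial_fn read_serial, char *serial, size_t len)
{
	int ret;

	assert(read_serial != NULL && serial != NULL);
	ret = read_serial(serial, len);
	if (ret == -EREMOTEIO) {
		memset(serial, 0, len);	/* no serial programmed in */
		return 0;
	}
	return ret;	/* success or any other error is passed through */
}
```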
Fixes: b1f81b496b ("platform/x86: surface3_power: MSHW0011 rev-eng implementation")
BugLink: https://github.com/linux-surface/linux-surface/issues/608
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20220224101848.7219-1-hdegoede@redhat.com
commit 59348401eb ("platform/x86: amd-pmc: Add special handling for
timer based S0i3 wakeup") adds support for using another platform timer
in lieu of the RTC which doesn't work properly on some systems. This path
was validated and worked well before submission. During the 5.16-rc1 merge
window other patches were merged that caused this to stop working properly.
When this feature was used with 5.16-rc1 or later, some OEM laptops with
the matching firmware requirements from that commit would shut down
instead of programming a timer based wakeup.
This was bisected to commit 8d89835b04 ("PM: suspend: Do not pause
cpuidle in the suspend-to-idle path"). This wasn't supposed to cause any
negative impacts and also tested well on both Intel and ARM platforms.
However this changed the semantics of when CPUs are allowed to be in the
deepest state. For the AMD systems in question it appears this causes a
firmware crash for timer based wakeup.
It's hypothesized to be caused by the `amd-pmc` driver sending `OS_HINT`
and all the CPUs going into a deep state while the timer is still being
programmed. It's likely a firmware bug, but to avoid it don't allow setting
CPUs into the deepest state while using the CZN timer wakeup path.
If it's later discovered that this also occurs from "regular" suspends
without a timer or on other silicon, this may be expanded to
run in the suspend path for more scenarios.
Cc: stable@vger.kernel.org # 5.16+
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/linux-acpi/BL1PR12MB51570F5BD05980A0DCA1F3F4E23A9@BL1PR12MB5157.namprd12.prod.outlook.com/T/#mee35f39c41a04b624700ab2621c795367f19c90e
Fixes: 8d89835b04 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path")
Fixes: 23f62d7ab2 ("PM: sleep: Pause cpuidle later and resume it earlier during system transitions")
Fixes: 59348401eb ("platform/x86: amd-pmc: Add special handling for timer based S0i3 wakeup")
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20220223175237.6209-1-mario.limonciello@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
The interrupt service routine registered for the gadget is a primary
handler which masks the interrupt source and a threaded handler which
handles the source of the interrupt. Since the threaded handler is
voluntarily threaded, the IRQ-core does not disable bottom halves before
invoking the handler like it does for the forced-threaded handler.
Due to changes in networking it became visible that a network gadget's
completion handler may schedule a softirq which remains unprocessed.
The gadget's completion handler is usually invoked either in hard-IRQ or
soft-IRQ context. In this context it is enough to just raise the softirq
because the softirq itself will be handled once that context is left.
In the case of the voluntarily threaded handler, there is nothing that
will process pending softirqs, which means they remain queued until
another random interrupt (on this CPU) fires and handles them on its exit
path or another thread locks and unlocks a lock with the bh suffix.
Worst case is that the CPU goes idle and the NOHZ complains about
unhandled softirqs.
Disable bottom halves before acquiring the lock (and disabling
interrupts) and enable them after dropping the lock. This ensures that
any pending softirqs will be handled right away.
Link: https://lkml.kernel.org/r/c2a64979-73d1-2c22-e048-c275c9f81558@samsung.com
Fixes: e5f68b4a3e ("Revert "usb: dwc3: gadget: remove unnecessary _irqsave()"")
Cc: stable <stable@kernel.org>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/Yg/YPejVQH3KkRVd@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Saeed Mahameed says:
====================
mlx5 fixes 2022-02-22
This series provides bug fixes to mlx5 driver.
Please pull and let me know if there is any problem.
* tag 'mlx5-fixes-2022-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Fix VF min/max rate parameters interchange mistake
net/mlx5e: Add missing increment of count
net/mlx5e: MPLSoUDP decap, fix check for unsupported matches
net/mlx5e: Fix MPLSoUDP encap to use MPLS action information
net/mlx5e: Add feature check for set fec counters
net/mlx5e: TC, Skip redundant ct clear actions
net/mlx5e: TC, Reject rules with forward and drop actions
net/mlx5e: TC, Reject rules with drop and modify hdr action
net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
net/mlx5: Fix possible deadlock on rule deletion
net/mlx5: Fix tc max supported prio for nic mode
net/mlx5: Fix wrong limitation of metadata match on ecpf
net/mlx5: Update log_max_qp value to be 17 at most
net/mlx5: DR, Fix the threshold that defines when pool sync is initiated
net/mlx5: DR, Don't allow match on IP w/o matching on full ethertype/ip_version
net/mlx5: DR, Fix slab-out-of-bounds in mlx5_cmd_dr_create_fte
net/mlx5: DR, Cache STE shadow memory
net/mlx5: Update the list of the PCI supported devices
====================
Link: https://lore.kernel.org/r/20220224001123.365265-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull devicetree fixes from Rob Herring:
- Update some maintainers email addresses
- Fix handling of elfcorehdr reservation for crash dump kernel
- Fix unittest expected warnings text
* tag 'devicetree-fixes-for-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: update Roger Quadros email
MAINTAINERS: sifive: drop Yash Shah
of/fdt: move elfcorehdr reservation early for crash dump kernel
of: unittest: update text of expected warnings
Pull selinux fix from Paul Moore:
"A second small SELinux fix which addresses an incorrect
mutex_is_locked() check"
* tag 'selinux-pr-20220223' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix misuse of mutex_is_locked()
The VF min and max rate were passed incorrectly and resulted in wrongly
interchanging them. Fix the order of parameters in
mlx5_esw_qos_set_vport_rate().
Fixes: d7df09f5e7 ("net/mlx5: E-switch, Enable vport QoS on demand")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add mistakenly missing increment of count variable when looping over
output buffer in mlx5e_self_test().
This resolves the issue of garbage values output when querying with self
test via ethtool.
before:
$ ethtool -t eth2
The test result is PASS
The test extra info:
Link Test 0
Speed Test 1768697188
Health Test 758528120
Loopback Test 3288687
after:
$ ethtool -t eth2
The test result is PASS
The test extra info:
Link Test 0
Speed Test 0
Health Test 0
Loopback Test 0
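The pattern behind this fix is a loop that writes results densely into an
output buffer while skipping some entries. The following sketch uses
hypothetical names (struct self_test, run_self_tests() and the sample
callbacks); it is not the driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*self_test_fn)(void);

/* Hypothetical test table entry: a NULL cond means "always run". */
struct self_test {
	self_test_fn run;
	self_test_fn cond;	/* nonzero return means the test applies */
};

/* Sample callbacks for illustration. */
static int test_pass(void) { return 0; }
static int test_fail(void) { return 1; }
static int never(void) { return 0; }

/* Run applicable tests, storing results densely; returns slots used. */
static size_t run_self_tests(const struct self_test *tests, size_t n,
			     uint64_t *buf)
{
	size_t count = 0;

	assert(tests != NULL && buf != NULL);
	for (size_t i = 0; i < n; i++) {
		if (tests[i].cond && !tests[i].cond())
			continue;	/* skipped tests take no slot */
		buf[count] = (uint64_t)tests[i].run();
		count++;	/* without this, every result lands in buf[0] */
	}
	return count;
}
```

If the increment is missing, only the first slot is ever written and the
remaining slots keep whatever was in the buffer, which matches the
garbage values in the "before" output.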
Fixes: 7990b1b5e8 ("net/mlx5e: loopback test is not supported in switchdev mode")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, offloading a rule on a bareudp device requires a tunnel key
in order to match on MPLS fields; without it the MPLS fields
are ignored. This is incorrect because a UDP tunnel doesn't
have a key to match on.
Fix by returning an error in case the flow is matching on a tunnel key.
Fixes: 72046a91d1 ("net/mlx5e: Allow to match on mpls parameters")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently the MPLSoUDP encap builds the MPLS header using encap action
information (tunnel id, ttl and tos) instead of the MPLS action
information (label, ttl, tc and bos) which is wrong.
Fix by storing the MPLS action information during the flow action
parse and later using it to create the encap MPLS header.
Fixes: f828ca6a2f ("net/mlx5e: Add support for hw encapsulation of MPLS over UDP")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
FEC counters support is checked via the PCAM feature_cap_mask,
bit 0: PPCNT_counter_group_Phy_statistical_counter_group.
Add feature check to avoid faulty behavior.
Fixes: 0a1498ebfa ("net/mlx5e: Expose FEC counters via ethtool")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Offload of ct clear action is just resetting the reg_c register.
It's done by allocating modify hdr resources, which are limited.
Doing it multiple times is redundant and wastes modify hdr resources,
and if the resources are depleted the driver will fail to offload the rule.
Ignore redundant ct clear actions after the first one.
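One way to picture the fix, as a sketch only: struct parse_state and
offload_ct_clear() are hypothetical names, and the counter stands in for
the limited modify hdr allocation.

```c
#include <stdbool.h>

/* Hypothetical per-rule parse state; not the driver's structure. */
struct parse_state {
    bool ct_clear_done;   /* a ct clear was already handled for this rule */
    int mod_hdr_allocs;   /* modify hdr resources consumed so far */
};

/* Handle one ct clear action. Returns 0 on success. */
int offload_ct_clear(struct parse_state *st)
{
    if (st->ct_clear_done)
        return 0;         /* redundant: the register is already reset */

    st->mod_hdr_allocs++; /* stands in for the limited allocation */
    st->ct_clear_done = true;
    return 0;
}
```

With this guard, a rule carrying three ct clear actions consumes one
resource instead of three.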
Fixes: 806401c20a ("net/mlx5e: CT, Fix multiple allocations and memleak of mod acts")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Such rules are redundant but allowed and passed to the driver.
The driver does not support offloading such rules so return an error.
Fixes: 03a9d11e6e ("net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This kind of action is not supported by firmware and generates a
syndrome.
kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3)
Fixes: d7e75a325c ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
For RX TLS device-offloaded packets, the HW spec guarantees checksum
validation for the offloaded packets, but does not define whether the
CQE.checksum field matches the original packet (ciphertext) or
the decrypted one (plaintext). This latitude allows architectural
improvements between generations of chips, resulting in different decisions
regarding the value type of CQE.checksum.
Hence, for these packets, the device driver should not make use of this CQE
field. Here we block CHECKSUM_COMPLETE usage for RX TLS device-offloaded
packets, and use CHECKSUM_UNNECESSARY instead.
Value of the packet's tcp_hdr.csum is not modified by the HW, and it always
matches the original ciphertext.
Fixes: 1182f36593 ("net/mlx5e: kTLS, Add kTLS RX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The ioctl EEPROM query wrongly returns success on read failures, fix
that by returning the appropriate error code.
Fixes: bb64143eee ("net/mlx5e: Add ethtool support for dump module EEPROM")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add missing call to up_write_ref_node() which releases the semaphore
in case the FTE doesn't have destinations, such as in the drop rule case.
Fixes: 465e7baab6 ("net/mlx5: Fix deletion of duplicate rules")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Only prio 1 is supported if firmware doesn't support ignore flow
level for nic mode. The offending commit removed the check wrongly.
Add it back.
Fixes: 9a99c8f125 ("net/mlx5e: E-Switch, Offload all chain 0 priorities when modify header and forward action is not supported")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Match metadata support check returns false for ecpf device.
However, this support does exist for ecpf and therefore this
limitation should be removed to allow features such as stacked
devices and internal port offload to be supported.
Fixes: 92ab1eb392 ("net/mlx5: E-Switch, Enable vport metadata matching if firmware supports it")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, log_max_qp value is dependent on what FW reports as its max capability.
In reality, due to a bug, some FWs report a value greater than 17, even though they
don't support log_max_qp > 17.
This FW issue led the driver to exhaust memory on startup.
Thus, log_max_qp value is set to be no more than 17 regardless
of what FW reports, as it was before the cited commit.
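The cap described above amounts to a simple clamp. A minimal sketch,
where pick_log_max_qp() and LOG_MAX_QP_LIMIT are hypothetical names:

```c
#include <stdint.h>

#define LOG_MAX_QP_LIMIT 17   /* upper bound named in the text above */

/* Never trust a firmware-reported capability beyond the known-good limit. */
uint8_t pick_log_max_qp(uint8_t fw_reported)
{
    return fw_reported < LOG_MAX_QP_LIMIT ? fw_reported : LOG_MAX_QP_LIMIT;
}
```

Because the value is a log2, each extra step doubles the number of QPs
the driver sizes its tables for, which is why an over-reported
capability is costly.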
Fixes: f79a609ea6 ("net/mlx5: Update log_max_qp value to FW max capability")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When deciding whether to start syncing and actually free all the "hot"
ICM chunks, we need to consider the type of the ICM chunks that we're
dealing with. For instance, the amount of available ICM for MODIFY_ACTION
is significantly lower than the usual STE ICM, so the threshold should
account for that - otherwise we can deplete MODIFY_ACTION memory just by
creating and deleting the same modify header action in a continuous loop.
This patch replaces the hard-coded threshold with a dynamic value.
Fixes: 1c58651412 ("net/mlx5: DR, ICM memory pools sync optimization")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently SMFS allows adding rule with matching on src/dst IP w/o matching
on full ethertype or ip_version, which is not supported by HW.
This patch fixes this issue and adds the check as it is done in DMFS.
Fixes: 26d688e33f ("net/mlx5: DR, Add Steering entry (STE) utilities")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When adding a rule with 32 destinations, we hit the following out-of-bounds
access issue:
BUG: KASAN: slab-out-of-bounds in mlx5_cmd_dr_create_fte+0x18ee/0x1e70
This patch fixes the issue by both increasing the allocated buffers to
accommodate for the needed actions and by checking the number of actions
to prevent this issue when a rule with too many actions is provided.
Fixes: 1ffd498901 ("net/mlx5: DR, Increase supported num of actions to 32")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
During rule insertion on each ICM memory chunk we also allocate shadow memory
used for management. This includes the hw_ste, dr_ste and miss list per entry.
Since the scale of these allocations is large we noticed a performance hiccup
that happens once malloc and free are stressed.
In extreme usecases when ~1M chunks are freed at once, it might take up to 40
seconds to complete this, up to the point the kernel sees this as self-detected
stall on CPU:
rcu: INFO: rcu_sched self-detected stall on CPU
To resolve this we will increase the reuse of shadow memory.
Doing this we see that the time in the aforementioned usecase dropped from ~40
seconds to ~8-10 seconds.
Fixes: 29cf8febd1 ("net/mlx5: DR, ICM pool memory allocator")
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add the upcoming BlueField-4 and ConnectX-8 device IDs.
Fixes: 2e9d3e83ab ("net/mlx5: Update the list of the PCI supported devices")
Signed-off-by: Meir Lichtinger <meirl@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The workstation application ANSA/META v21.1.4 gets this error in dmesg when
running the CI test suite provided by ANSA/META:
[drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16)
This is caused by:
1. create a 256MB buffer in invisible VRAM
2. CPU maps the buffer and accesses it, which causes a vm_fault and tries
to move it to visible VRAM
3. force visible VRAM space and traverse all VRAM bos to check if
evicting this bo is valuable
4. when checking a VM bo (in invisible VRAM), amdgpu_vm_evictable()
will set amdgpu_vm->evicting, but later, because the bo is not in visible
VRAM, won't really evict it and so won't add it to amdgpu_vm->evicted
5. before next CS to clear the amdgpu_vm->evicting, user VM ops
ioctl will pass amdgpu_vm_ready() (check amdgpu_vm->evicted)
but fail in amdgpu_vm_bo_update_mapping() (check
amdgpu_vm->evicting) and get this error log
This error won't affect functionality as next CS will finish the
waiting VM ops. But we'd better clear the error log by checking
the amdgpu_vm->evicting flag in amdgpu_vm_ready() to stop calling
amdgpu_vm_bo_update_mapping() later.
Another reason is amdgpu_vm->evicted list holds all BOs (both
user buffer and page table), but only page table BOs' eviction
prevent VM ops. amdgpu_vm->evicting flag is set only for page
table BOs, so we should use evicting flag instead of evicted list
in amdgpu_vm_ready().
The side effect of this change is: previously blocked VM op (user
buffer in "evicted" list but no page table in it) gets done
immediately.
v2: update commit comments.
Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Qiang Yu <qiang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
vkms leverages common amdgpu framebuffer creation, and
also as it does not support FB modifier, there is no need
to check tiling flags when initing framebuffer when virtual
display is enabled.
This can fix below calltrace:
amdgpu 0000:00:08.0: GFX9+ requires FB check based on format modifier
WARNING: CPU: 0 PID: 1023 at drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:1150 amdgpu_display_framebuffer_init+0x8e7/0xb40 [amdgpu]
v2: check adev->enable_virtual_display instead as vkms can be
enabled in bare metal as well.
Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 4046afcebf.
No need to support modifier in virtual kms, otherwise, in SRIOV
mode, when launching X server, set crtc will fail due to mismatch
between primary plane modifier and framebuffer modifier.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The GPU reset function of raven2 is not maintained or tested, so it should be
very unstable.
Now the amdgpu_asic_reset function is added to amdgpu_pmops_suspend, which
causes the S3 test of raven2 to fail, so the asic_reset of raven2 is ignored
here.
Fixes: daf8de0874 ("drm/amdgpu: always reset the asic in suspend (v2)")
Signed-off-by: Chen Gong <curry.gong@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
[Why]
Found when running igt@kms_atomic.
Userspace attempts to do a TEST_COMMIT with 0 streams, which calls
dc_remove_stream_from_ctx. This in turn calls link_enc_unassign
which ends up modifying stream->link = NULL directly, causing the
global link_enc to be removed preventing further link activity
and future link validation from passing.
[How]
We take care of link_enc unassignment at the start of
link_enc_cfg_link_encs_assign so this call is no longer necessary.
Fixes global state from being modified while unlocked.
Reviewed-by: Jimmy Kizito <Jimmy.Kizito@amd.com>
Acked-by: Jasdeep Dhillon <jdhillon@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
We are racing the registering of .to_irq when probing the
i2c driver. This results in random failure of touchscreen
devices.
The following sequence shows the race condition:
[gpio driver] gpio driver registers gpio chip
[gpio consumer] gpio is acquired
[gpio consumer] gpiod_to_irq() fails with -ENXIO
[gpio driver] gpio driver registers irqchip
gpiod_to_irq works at this point, but -ENXIO is fatal
We could see the following errors in dmesg logs when gc->to_irq is NULL
[2.101857] i2c_hid i2c-FTS3528:00: HID over i2c has not been provided an Int IRQ
[2.101953] i2c_hid: probe of i2c-FTS3528:00 failed with error -22
To avoid this situation, defer probing until to_irq is registered.
Returning -EPROBE_DEFER would be the first step towards avoiding
the failure of devices due to the race in registration of .to_irq.
Final solution to this issue would be to avoid using gc irq members
until they are fully initialized.
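The idea can be sketched in userspace as below. This is a simplified
model, not the gpiolib code: struct fake_gpio_chip, fake_gpio_to_irq()
and demo_to_irq() are hypothetical, and the real function handles more
cases than a single NULL check. EPROBE_DEFER is defined locally because
it is a kernel-internal errno.

```c
#include <errno.h>
#include <stddef.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517   /* kernel-internal errno, absent from userspace headers */
#endif

/* Hypothetical chip model: to_irq stays NULL until the irqchip is registered. */
struct fake_gpio_chip {
    int (*to_irq)(unsigned int offset);
};

/* Map a GPIO offset to an IRQ number, or ask the caller to retry later. */
int fake_gpio_to_irq(const struct fake_gpio_chip *gc, unsigned int offset)
{
    if (!gc->to_irq)
        return -EPROBE_DEFER;   /* not ready yet, so the probe is retried */
    return gc->to_irq(offset);
}

/* Sample mapping used only for demonstration. */
int demo_to_irq(unsigned int offset)
{
    return 100 + (int)offset;
}
```

A consumer that sees -EPROBE_DEFER is probed again later, whereas
-ENXIO is treated as a permanent failure.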
This issue has been reported many times in the past and people have been
using workarounds like changing the pinctrl_amd to built-in instead
of loading it as a module or by adding a softdep for pinctrl_amd into
the config file.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209413
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Pull parisc unaligned handler fixes from Helge Deller:
"Two patches which fix a few bugs in the unalignment handlers.
The fldd and fstd instructions weren't handled at all on 32-bit
kernels, the stw instruction didn't check for fault errors and the
fldw_l and ldw_m were handled wrongly as integer vs floating point
instructions.
Both patches are tagged for stable series"
* tag 'for-5.17/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc/unaligned: Fix ldw() and stw() unalignment handlers
parisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel
Pull hwmon fixes from Guenter Roeck:
"Fix two old bugs and one new bug in the hwmon subsystem:
- In pmbus core, clear pmbus fault/warning status bits after read to
follow PMBus standard
- In hwmon core, handle failure to register sensor with thermal zone
correctly
- In ntc_thermal driver, use valid thermistor names for Samsung
thermistors"
* tag 'hwmon-for-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (pmbus) Clear pmbus fault/warning bits after read
hwmon: Handle failure to register sensor with thermal zone correctly
hwmon: (ntc_thermistor) Underscore Samsung thermistor
Pull slab fixes from Vlastimil Babka:
- Build fix (workaround) for clang.
- Fix a /proc/kcore based slabinfo script broken by struct slab changes
in 5.17-rc1.
* tag 'slab-for-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
tools/cgroup/slabinfo: update to work with struct slab
slab: remove __alloc_size attribute from __kmalloc_track_caller
While the HVS has the same context memory size in the BCM2711 as in
the previous SoCs, the range allocated to the registers doubled and it
now takes 16k + 16k, compared to 8k + 16k before.
The KMS driver will use the whole context RAM though, eventually
resulting in a pointer dereference error when we access the higher half
of the context memory since it hasn't been mapped.
Fixes: 4564363351 ("ARM: dts: bcm2711: Enable the display pipeline")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
There are enough VBIOS escapes without the proper workaround that some
users still hit this. Microsoft never productized ATS on Windows so OEM
platforms that were Windows-only didn't always validate ATS.
The advantages of ATS are not worth it compared to the potential
instabilities on harvested boards. Disable ATS on all Navi10 and Navi14
boards.
Symptoms include:
amdgpu 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0007 address=0xffffc02000 flags=0x0000]
AMD-Vi: Event logged [IO_PAGE_FAULT device=07:00.0 domain=0x0007 address=0xffffc02000 flags=0x0000]
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=6047, emitted seq=6049
amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume
amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110
amdgpu 0000:07:00.0: amdgpu: GPU reset(1) failed
Related commits:
e8946a53e2 ("PCI: Mark AMD Navi14 GPU ATS as broken")
a2da5d8cc0 ("PCI: Mark AMD Raven iGPU ATS as broken in some platforms")
45beb31d3a ("PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken")
5e89cd303e ("PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken")
d28ca864c4 ("PCI: Mark AMD Stoney Radeon R7 GPU ATS as broken")
9b44b0b09d ("PCI: Mark AMD Stoney GPU ATS as broken")
[bhelgaas: add symptoms and related commits]
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1760
Link: https://lore.kernel.org/r/20220222160801.841643-1-alexander.deucher@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Fix 3 bugs:
a) emulate_stw() doesn't return the error code value, so faulting
instructions are not reported and aborted.
b) Tell emulate_ldw() to handle fldw_l as floating point instruction
c) Tell emulate_ldw() to handle ldw_m as integer instruction
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Usually the kernel provides fixup routines to emulate the fldd and fstd
floating-point instructions if they load or store 8 bytes from/to a not
naturally aligned memory location.
On a 32-bit kernel I noticed that those unaligned handlers didn't work and
instead the application got a SEGV.
While checking the code I found two problems:
First, the OPCODE_FLDD_L and OPCODE_FSTD_L cases were ifdef'ed out by the
CONFIG_PA20 option, and as such those weren't built on a pure 32-bit kernel.
This is now fixed by moving the CONFIG_PA20 #ifdef to prevent the compilation
of OPCODE_LDD_L and OPCODE_FSTD_L only, and handling the fldd and fstd
instructions.
The second problem is two bugs in the 32-bit inline assembly code, where the
wrong registers were used. The calculation of the natural alignment used %2
(vall) instead of %3 (ior), and the first word was stored back to address %1
(valh) instead of %3 (ior).
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Although we have btrfs_requeue_inode_defrag(), for autodefrag we are
still just exhausting all inode_defrag items in the tree.
This means, it doesn't make much difference to requeue an inode_defrag,
other than scan the inode from the beginning till its end.
Change the behaviour to always scan from offset 0 of an inode, and till
the end.
By this we get the following benefits:
- Straight-forward code
- No more re-queue related check
- Fewer members in inode_defrag
We still keep the same btrfs_get_fs_root() and btrfs_iget() check for
each loop, and added extra should_auto_defrag() check per-loop.
Note: the patch needs to be backported and is intentionally written
to minimize the diff size, code will be cleaned up later.
CC: stable@vger.kernel.org # 5.16
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For extent maps, if they are not compressed extents and are adjacent by
logical addresses and file offsets, they can be merged into one larger
extent map.
Such merged extent map will have the higher generation of all the
original ones.
But this brings a problem for autodefrag, as it relies on accurate
extent_map::generation to determine if one extent should be defragged.
For merged extent maps, their higher generation can mark some older
extents to be defragged while the original extent map doesn't meet the
minimal generation threshold.
Thus this will cause extra IO.
To solve the problem, here we introduce a new flag, EXTENT_FLAG_MERGED,
to indicate if the extent map is merged from one or more ems.
And for autodefrag, if we find a merged extent map, and its generation
meets the generation requirement, we just don't use this one, and go
back to defrag_get_extent() to read extent maps from subvolume trees.
This could cause more read IO, but should result in less defrag data write,
so in the long run it should be a win for autodefrag.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For defrag, we don't really want to use btrfs_get_extent() to iterate
all extent maps of an inode.
The reasons are:
- btrfs_get_extent() can merge extent maps
And the result em has the higher generation of the two, causing defrag
to mark unnecessary parts of such merged large extent map.
This in fact can result in extra IO for autodefrag in v5.16+ kernels.
However this patch is not going to completely solve the problem, as
one can still use read() to trigger extent map reading, and get
them merged.
The complete solution for the extent map merging generation problem
will come as a standalone fix.
- btrfs_get_extent() caches the extent map result
Normally it's fine, but for defrag the target range may not get
another read/write for a long long time.
Such cache would only increase the memory usage.
- btrfs_get_extent() doesn't skip older extent map
Unlike the old find_new_extent() which uses btrfs_search_forward() to
skip the older subtree, thus it will pick up unnecessary extent maps.
This patch will fix the regression by introducing defrag_get_extent() to
replace the btrfs_get_extent() call.
This helper will:
- Not cache the file extent we found
It will search the file extent and manually convert it to em.
- Use btrfs_search_forward() to skip entire ranges which were modified in
the past
This should reduce the IO for autodefrag.
Reported-by: Filipe Manana <fdmanana@suse.com>
Fixes: 7b508037d4 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
From the very beginning of btrfs defrag, there is a check to reject
extents which meet both conditions:
- Physically adjacent
We may want to defrag physically adjacent extents to reduce the number
of extents or the size of subvolume tree.
- Larger than 128K
This may be there for compressed extents, but unfortunately 128K is
exactly the max capacity for compressed extents.
And the check is > 128K, thus it never rejects compressed extents.
Furthermore, the compressed extent capacity bug is fixed by previous
patch, there is no reason for that check anymore.
The original check has a very small range to reject (the target extent
size is > 128K, and default extent threshold is 256K), and for
compressed extent it doesn't work at all.
So it's better just to remove the rejection, and allow us to defrag
physically adjacent extents.
CC: stable@vger.kernel.org # 5.16
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
For compressed extents, defrag ioctl will always try to defrag any
compressed extents, wasting not only IO but also CPU time to
compress/decompress:
mkfs.btrfs -f $DEV
mount -o compress $DEV $MNT
xfs_io -f -c "pwrite -S 0xab 0 128K" $MNT/foobar
sync
xfs_io -f -c "pwrite -S 0xcd 128K 128K" $MNT/foobar
sync
echo "=== before ==="
xfs_io -c "fiemap -v" $MNT/foobar
btrfs filesystem defrag $MNT/foobar
sync
echo "=== after ==="
xfs_io -c "fiemap -v" $MNT/foobar
Then it shows the 2 128K extents just get COW for no extra benefit, with
extra IO/CPU spent:
=== before ===
/mnt/btrfs/file1:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..255]: 26624..26879 256 0x8
1: [256..511]: 26632..26887 256 0x9
=== after ===
/mnt/btrfs/file1:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..255]: 26640..26895 256 0x8
1: [256..511]: 26648..26903 256 0x9
This affects not only v5.16 (after the defrag rework), but also v5.15
(before the defrag rework).
[CAUSE]
From the very beginning, btrfs defrag never checks if one extent is
already at its max capacity (128K for compressed extents, 128M
otherwise).
And the default extent size threshold is 256K, which is already beyond
the compressed extent max size.
This means, by default btrfs defrag ioctl will mark every compressed
extent which is not adjacent to a hole/preallocated range for defrag.
[FIX]
Introduce a helper to grab the maximum extent size, and then in
defrag_collect_targets() and defrag_check_next_extent(), reject extents
which are already at their max capacity.
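A rough userspace sketch of such a helper, assuming only the limits
stated in [CAUSE] (128K for compressed extents, 128M otherwise). The
struct and function names are hypothetical and not the ones used in the
patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_128K (128ULL * 1024)
#define SZ_128M (128ULL * 1024 * 1024)

/* Hypothetical, reduced view of an extent map. */
struct fake_extent {
    uint64_t len;
    bool compressed;
};

/* Largest size a single extent of this kind can reach. */
uint64_t max_extent_size(const struct fake_extent *em)
{
    return em->compressed ? SZ_128K : SZ_128M;
}

/* True when defragging cannot make the extent any larger. */
bool extent_at_max_capacity(const struct fake_extent *em)
{
    return em->len >= max_extent_size(em);
}
```

With the default 256K threshold, a 128K compressed extent is always
below the threshold, so only a capacity check like this one keeps it
from being rewritten.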
Reported-by: Filipe Manana <fdmanana@suse.com>
CC: stable@vger.kernel.org # 5.16
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
With older kernels (before v5.16), btrfs will defrag preallocated extents.
While with newer kernels (v5.16 and newer) btrfs will not defrag
preallocated extents, but it will defrag the extent just before the
preallocated extent, even if it's just a single sector.
This can be exposed by the following small script:
mkfs.btrfs -f $dev > /dev/null
mount $dev $mnt
xfs_io -f -c "pwrite 0 4k" -c sync -c "falloc 4k 16K" $mnt/file
xfs_io -c "fiemap -v" $mnt/file
btrfs fi defrag $mnt/file
sync
xfs_io -c "fiemap -v" $mnt/file
The output looks like this on older kernels:
/mnt/btrfs/file:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..7]: 26624..26631 8 0x0
1: [8..39]: 26632..26663 32 0x801
/mnt/btrfs/file:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..39]: 26664..26703 40 0x1
This defrags the single sector along with the preallocated extent, and
replaces them with a regular extent at a new location (caused by data
COW).
This wastes most of the data IO just for the preallocated range.
On the other hand, v5.16 is slightly better:
/mnt/btrfs/file:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..7]: 26624..26631 8 0x0
1: [8..39]: 26632..26663 32 0x801
/mnt/btrfs/file:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..7]: 26664..26671 8 0x0
1: [8..39]: 26632..26663 32 0x801
The preallocated range is not defragged, but the sector before it still
gets defragged, which is unnecessary.
[CAUSE]
One of the functions reused by the old and new behavior is
defrag_check_next_extent(); it determines if we should defrag the
current extent by checking the next one.
It only checks if the next extent is a hole or inlined, but it doesn't
check if it's preallocated.
On the other hand, outside the function, both old and new kernels
reject preallocated extents.
This inconsistency causes the behavior above.
[FIX]
- Also check if next extent is preallocated
If so, don't defrag current extent.
- Add comments for each branch why we reject the extent
This will reduce the IO caused by defrag ioctl and autodefrag.
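The decision described in [FIX] can be sketched as below. The enum and
the function name are hypothetical; the real defrag_check_next_extent()
works on extent maps and has further conditions.

```c
#include <stdbool.h>

/* Hypothetical classification of the extent that follows the current one. */
enum fake_extent_kind {
    EXTENT_REGULAR,
    EXTENT_HOLE,
    EXTENT_INLINE,
    EXTENT_PREALLOC,
};

/* Is it worth defragging the current extent together with the next one? */
bool next_extent_is_mergeable(enum fake_extent_kind next)
{
    switch (next) {
    case EXTENT_HOLE:       /* nothing to merge with */
    case EXTENT_INLINE:     /* inline data is not merged */
    case EXTENT_PREALLOC:   /* the added check: preallocated range */
        return false;
    default:
        return true;
    }
}
```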
CC: stable@vger.kernel.org # 5.16
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
ASoC: Fixes for v5.17
A few more fixes for v5.17, one followup to the bounds checking fixes
handling controls which support negative values internally and a driver
specific one.
As per NVMe/TCP specification (revision 1.0a, section 3.6.2.3)
Maximum Host to Controller Data length (MAXH2CDATA): Specifies the
maximum number of PDU-Data bytes per H2CData PDU in bytes. This value
is a multiple of dwords and should be no less than 4,096.
Current code sets H2CData PDU data_length to r2t_length,
it does not check MAXH2CDATA value. Fix this by setting H2CData PDU
data_length to min(req->h2cdata_left, queue->maxh2cdata).
Also validate MAXH2CDATA value returned by target in ICResp PDU,
if it is not a multiple of dword or if it is less than 4096 return
-EINVAL from nvme_tcp_init_connection().
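Both checks can be sketched in a few lines. The function names are
hypothetical; only the rules (multiple of a dword, at least 4096, and
the min() on the remaining length) come from the text above.

```c
#include <errno.h>
#include <stdint.h>

/* Validate the MAXH2CDATA value advertised by the target in ICResp. */
int validate_maxh2cdata(uint32_t maxh2cdata)
{
    if (maxh2cdata % 4 != 0 || maxh2cdata < 4096)
        return -EINVAL;
    return 0;
}

/* Length of the next H2CData PDU: never more than the target accepts. */
uint32_t next_h2cdata_len(uint32_t h2cdata_left, uint32_t maxh2cdata)
{
    return h2cdata_left < maxh2cdata ? h2cdata_left : maxh2cdata;
}
```

For example, with 10000 bytes left and MAXH2CDATA of 4096, the data
goes out as PDUs of 4096, 4096 and 1808 bytes rather than one 10000-byte
PDU.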
Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Commit e7d65803e2 ("nvme-multipath: revalidate paths during rescan")
introduced the NVME_NS_READY flag, which nvme_path_is_disabled() uses
to check if a path can be used or not. We also need to set this flag
for devices that fail the ZNS feature validation and which are available
through passthrough devices only so that they can be used in multipathing
setups.
Fixes: e7d65803e2 ("nvme-multipath: revalidate paths during rescan")
Reported-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Tested-by: Kanchan Joshi <joshi.k@samsung.com>
When a fabrics controller claims to support an invalid metadata
configuration we already warn and disable metadata support. No need to
also return an error during revalidation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Tested-by: Kanchan Joshi <joshi.k@samsung.com>
In order to fill the drm_display_info structure each time an EDID is
read, the code currently will call drm_add_display_info with the parsed
EDID.
drm_add_display_info will then call drm_reset_display_info to reset all
the fields to 0, and then set them to the proper value depending on the
EDID.
In the color_formats case, we will thus report that we don't support any
color format, and then fill it back with RGB444 plus the additional
formats described in the EDID Feature Support byte.
However, since that byte only contains format-related bits since the 1.4
specification, this doesn't happen if the EDID is following an earlier
specification. In turn, it means that for one of these EDID, we end up
with color_formats set to 0.
The EDID 1.3 specification never really specifies what it means by RGB
exactly, but since both HDMI and DVI will use RGB444, it's fairly safe
to assume it's supposed to be RGB444.
Let's move the addition of RGB444 to color_formats earlier in
drm_add_display_info() so that it's always set for a digital display.
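The reordering can be illustrated with the sketch below. It is a model
under assumptions: struct fake_edid, the FMT_* bit values and
parse_color_formats() are hypothetical and do not mirror the DRM
definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical format bits; not the DRM_COLOR_FORMAT_* values. */
#define FMT_RGB444   (1u << 0)
#define FMT_YCBCR444 (1u << 1)
#define FMT_YCBCR422 (1u << 2)

/* Hypothetical, reduced view of the EDID fields that matter here. */
struct fake_edid {
    bool digital;
    uint8_t revision;   /* 3 for EDID 1.3, 4 for EDID 1.4 */
    bool ycbcr444;
    bool ycbcr422;
};

uint32_t parse_color_formats(const struct fake_edid *edid)
{
    uint32_t formats = 0;          /* the reset step */

    if (!edid->digital)
        return formats;

    formats |= FMT_RGB444;         /* set before any revision check */

    if (edid->revision < 4)
        return formats;            /* older EDIDs carry no format bits */

    if (edid->ycbcr444)
        formats |= FMT_YCBCR444;
    if (edid->ycbcr422)
        formats |= FMT_YCBCR422;
    return formats;
}
```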
Fixes: da05a5a71a ("drm: parse color format support for digital displays")
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reported-by: Matthias Reichl <hias@horus.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220203115416.1137308-1-maxime@cerno.tech
Heyi Guo says:
====================
drivers/net/ftgmac100: fix occasional DHCP failure
This patch set is to fix the issues discussed in the mail thread:
https://lore.kernel.org/netdev/51f5b7a7-330f-6b3c-253d-10e45cdb6805@linux.alibaba.com/
and follows the advice from Andrew Lunn.
The first 2 patches refactor the code to enable adjust_link calling reset
function directly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
DHCP failures were observed with systemd 247.6. The issue could be
reproduced by rebooting Aspeed 2600 and then running ifconfig ethX
down/up.
It is caused by the following sequence in the driver:
1. ftgmac100_open() enables net interface and calls phy_start()
2. When PHY is link up, it calls netif_carrier_on() and then
adjust_link callback
3. ftgmac100_adjust_link() will schedule the reset task
4. ftgmac100_reset_task() will then reset the MAC in another schedule
After step 2, systemd will be notified to send DHCP discover packet,
while the packet might be corrupted by MAC reset operation in step 4.
Call ftgmac100_reset() directly instead of scheduling task to fix the
issue.
Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is to prepare for ftgmac100_adjust_link() to call
ftgmac100_reset() directly. Code is only moved around.
Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is to prepare for ftgmac100_adjust_link() to call the reset function
directly, instead of scheduling a task.
Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following coccicheck warning:
./net/sched/act_api.c:277:7-49: WARNING avoid newline at end of message
in NL_SET_ERR_MSG_MOD
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding myself (Alvin Šipraga) as another maintainer for the Realtek DSA
switch drivers. I intend to help Linus out with reviewing and testing
changes to these drivers, particularly the rtl8365mb driver which I
authored and have hardware access to.
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
These tests are supposed to check if the loop exited via a break or not.
However the tests are wrong because if we did not exit via a break then
"p" is not a valid pointer. In that case, it's the equivalent of
"if (*(u32 *)sr == *last_key) {". That's going to work most of the time,
but there is a potential for those to be equal.
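A minimal userspace sketch of the safer pattern, assuming a hypothetical
"struct publication" entry and a hand-rolled list (not the tipc code):
the search returns the match or NULL, so nothing ever looks at the loop
cursor after the walk has reached the list head again.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Userspace version of the container_of() idea. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical entry type, standing in for the structs being dumped. */
struct publication {
	unsigned int key;
	struct list_head node;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/*
 * Return the entry whose key matches, or NULL if the walk reached the
 * list head again. "Did we break?" is answered by the return value
 * alone, never by the cursor.
 */
static struct publication *find_publication(struct list_head *head,
					    unsigned int key)
{
	struct list_head *pos;

	assert(head != NULL);
	for (pos = head->next; pos != head; pos = pos->next) {
		struct publication *p =
			container_of(pos, struct publication, node);

		if (p->key == key)
			return p;
	}
	return NULL;
}
```

A caller then tests the result against NULL instead of comparing a field
of whatever the cursor happens to point at.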
Fixes: 1593123a6a ("tipc: add name table dump to new netlink api")
Fixes: 1a1a143daf ("tipc: add publication dump to new netlink api")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This test is checking if we exited the list via break or not. However,
if we did not exit via a break then "node" does not point to a valid
udp_tunnel_nic_shared_node struct. It will work because, given the way
the structs are laid out, it's the equivalent of
"if (info->shared->udp_tunnel_nic_info != dev)", which will always be
true, but it's not the right way to test.
Fixes: 74cc6d182d ("udp_tunnel: add the ability to share port tables")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vhost_vsock_stop() calls vhost_dev_check_owner() to check the device
ownership. It expects current->mm to be valid.
vhost_vsock_stop() is also called by vhost_vsock_dev_release() when
the user has not done close(), so when we are in do_exit(). In this
case current->mm is invalid and we're releasing the device, so we
should clean it anyway.
Let's check the owner only when vhost_vsock_stop() is called
by an ioctl.
When invoked from release we cannot fail, so we don't check the return
code of vhost_vsock_stop(). We need to stop vsock even if the caller is
not the owner.
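As a rough userspace model of that split (the struct, the "mm" pointers
and the function names are assumptions for illustration, not the vhost
API), the stop routine takes a flag saying whether ownership should be
verified. The ioctl path would pass true, the release path false:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: "mm" stands in for the address space of a task. */
struct vsock_dev {
	const void *owner_mm;
	bool running;
};

/* Models the ownership test: fails if the caller is not the owner. */
static int check_owner(const struct vsock_dev *dev, const void *current_mm)
{
	return dev->owner_mm == current_mm ? 0 : -EPERM;
}

static int vsock_stop(struct vsock_dev *dev, const void *current_mm,
		      bool check)
{
	assert(dev != NULL);

	if (check) {
		int ret = check_owner(dev, current_mm);

		if (ret)
			return ret;
	}

	/* Release path: stop regardless of who is calling. */
	dev->running = false;
	return 0;
}
```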
Fixes: 433fc58e6b ("VSOCK: Introduce vhost_vsock.ko")
Cc: stable@vger.kernel.org
Reported-by: syzbot+1e3ea63db39f2b4440e0@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+3140b17cb44a7b174008@syzkaller.appspotmail.com
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the eDP panel on Venice 2 and Nyan boards into the corresponding
AUX bus device tree node. This allows us to avoid a nasty circular
dependency that would otherwise be created between the DPAUX and panel
nodes via the DDC/I2C phandle.
Fixes: eb481f9ac9 ("ARM: tegra: add Acer Chromebook 13 device tree")
Fixes: 59fe02cb07 ("ARM: tegra: Add DTS for the nyan-blaze board")
Fixes: 40e231c770 ("ARM: tegra: Enable eDP for Venice2")
Signed-off-by: Thierry Reding <treding@nvidia.com>
The DPAUX hardware block exposes a DP AUX interface that provides
access to an AUX bus and the devices on that bus. Use the DP AUX bus
infrastructure that was recently introduced to probe devices on this
bus from DT.
Signed-off-by: Thierry Reding <treding@nvidia.com>
While kfree_rcu(ptr) _is_ supported, it has some limitations.
Given that 99.99% of kfree_rcu() users [1] use the legacy
two parameters variant, and @catchall objects do have an rcu head,
simply use it.
Choice of kfree_rcu(ptr) variant was probably not intentional.
[1] including calls from net/netfilter/nf_tables_api.c
Fixes: aaa31047a6 ("netfilter: nftables: add catch-all set element support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If a bridged port is not offloaded to the hardware - either because the
underlying driver does not implement the port_bridge_{join,leave} ops,
or because the operation failed - then its dp->bridge pointer will be
NULL when dsa_port_bridge_leave() is called. Avoid dereferencing NULL.
This fixes the following splat when removing a port from a bridge:
Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT_RT SMP
CPU: 3 PID: 1119 Comm: brctl Tainted: G O 5.17.0-rc4-rt4 #1
Call trace:
dsa_port_bridge_leave+0x8c/0x1e4
dsa_slave_changeupper+0x40/0x170
dsa_slave_netdevice_event+0x494/0x4d4
notifier_call_chain+0x80/0xe0
raw_notifier_call_chain+0x1c/0x24
call_netdevice_notifiers_info+0x5c/0xac
__netdev_upper_dev_unlink+0xa4/0x200
netdev_upper_dev_unlink+0x38/0x60
del_nbp+0x1b0/0x300
br_del_if+0x38/0x114
add_del_if+0x60/0xa0
br_ioctl_stub+0x128/0x2dc
br_ioctl_call+0x68/0xb0
dev_ifsioc+0x390/0x554
dev_ioctl+0x128/0x400
sock_do_ioctl+0xb4/0xf4
sock_ioctl+0x12c/0x4e0
__arm64_sys_ioctl+0xa8/0xf0
invoke_syscall+0x4c/0x110
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x28/0x84
el0_svc+0x1c/0x50
el0t_64_sync_handler+0xa8/0xb0
el0t_64_sync+0x17c/0x180
Code: f9402f00 f0002261 f9401302 913cc021 (a9401404)
---[ end trace 0000000000000000 ]---
Fixes: d3eed0e57d ("net: dsa: keep the bridge_dev and bridge_num as part of the same structure")
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220221203539.310690-1-alvin@pqrs.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The HPT371 chip physically has only one channel, the secondary one,
however the primary channel registers do exist! Thus we have to
manually disable the non-existing channel if the BIOS hasn't done this
already. Similarly to the pata_hpt3x2n driver, always disable the
primary channel.
Fixes: 669a5db411 ("[libata] Add a bunch of PATA drivers.")
Cc: stable@vger.kernel.org
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
In SPE traces the 'weight' field can't be printed in 'perf script'
because the 'dummy:u' event doesn't have the WEIGHT attribute set.
Use evsel__do_check_stype(..) to check this field, as it's done with
other fields such as "phys_addr".
Before:
$ perf record -e arm_spe_0// -- sleep 1
$ perf script -F event,ip,weight
Samples for 'dummy:u' event do not have WEIGHT attribute set. Cannot print 'weight' field.
After:
$ perf script -F event,ip,weight
l1d-access: 12 ffffaf629d4cb320
tlb-access: 12 ffffaf629d4cb320
memory: 12 ffffaf629d4cb320
Fixes: b0fde9c6e2 ("perf arm-spe: Add SPE total latency as PERF_SAMPLE_WEIGHT")
Signed-off-by: German Gomez <german.gomez@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220221171707.62960-1-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull cgroup fixes from Tejun Heo:
- Fix for a subtle bug in the recent release_agent permission check
update
- Fix for a long-standing race condition between cpuset and cpu hotplug
- Comment updates
* 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: Fix kernel-doc
cgroup-v1: Correct privileges check in release_agent writes
cgroup: clarify cgroup_css_set_fork()
cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug
mutex_is_locked() tests whether the mutex is locked *by any task*, while
here we want to test if it is held *by the current task*. To avoid
false/missed WARNINGs, use lockdep_assert_held() and
lockdep_assert_not_held() instead, which do the right thing (though
they are a no-op if CONFIG_LOCKDEP=n).
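The difference between the two questions can be shown with a toy,
single-threaded model. Everything here (struct owned_mutex, the integer
task ids, the helper names) is hypothetical and only mirrors the idea of
a lock that records its holder:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a mutex that records its owner. */
struct owned_mutex {
	bool locked;
	int owner;	/* task id of the holder, valid only while locked */
};

static void model_lock(struct owned_mutex *m, int task)
{
	assert(!m->locked);
	m->locked = true;
	m->owner = task;
}

/* "Is it locked by any task at all?" */
static bool is_locked(const struct owned_mutex *m)
{
	return m->locked;
}

/* "Is it held by this particular task?" */
static bool held_by(const struct owned_mutex *m, int task)
{
	return m->locked && m->owner == task;
}
```

If task 1 takes the lock, is_locked() is true for everybody, while
held_by() is true for task 1 only. A check made by task 2 with
is_locked() would therefore pass although task 2 holds nothing.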
Cc: stable@vger.kernel.org
Fixes: 2554a48f44 ("selinux: measure state and policy capabilities")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Fix the following W=1 kernel warnings:
kernel/cgroup/cpuset.c:3718: warning: expecting prototype for
cpuset_memory_pressure_bump(). Prototype was for
__cpuset_memory_pressure_bump() instead.
kernel/cgroup/cpuset.c:3568: warning: expecting prototype for
cpuset_node_allowed(). Prototype was for __cpuset_node_allowed()
instead.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull ITER_PIPE fix from Al Viro:
"Fix for old sloppiness in pipe_buffer reuse"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
lib/iov_iter: initialize "flags" in new pipe_buffer
The idea is to check: a) the owning user_ns of cgroup_ns, b)
capabilities in init_user_ns.
The commit 24f6008564 ("cgroup-v1: Require capabilities to set
release_agent") got this wrong in the write handler of release_agent
since it checked user_ns of the opener (may be different from the owning
user_ns of cgroup_ns).
Secondly, to avoid a possible confused deputy, the capability of the
opener must be checked.
Fixes: 24f6008564 ("cgroup-v1: Require capabilities to set release_agent")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/stable/20220216121142.GB30035@blackbody.suse.cz/
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Masami Ichikawa(CIP) <masami.ichikawa@cybertrust.co.jp>
Signed-off-by: Tejun Heo <tj@kernel.org>
With recent fixes for the permission checking when moving a task into a cgroup
using a file descriptor to a cgroup's cgroup.procs file and calling write(), it
seems a good idea to clarify CLONE_INTO_CGROUP permission checking with a
comment.
Cc: Tejun Heo <tj@kernel.org>
Cc: <cgroups@vger.kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
When configfs_register_subsystem() or configfs_unregister_subsystem()
is executing link_group() or unlink_group(),
it is possible that two processes add to or delete from the list concurrently.
Some unfortunate interleavings of them can cause a kernel panic.
One such case is:
A --> B --> C --> D
A <-- B <-- C <-- D
delete list_head *B | delete list_head *C
--------------------------------|-----------------------------------
configfs_unregister_subsystem | configfs_unregister_subsystem
unlink_group | unlink_group
unlink_obj | unlink_obj
list_del_init | list_del_init
__list_del_entry | __list_del_entry
__list_del | __list_del
// next == C |
next->prev = prev |
| next->prev = prev
prev->next = next |
| // prev == B
| prev->next = next
Fix this by taking a mutex when calling link_group() or unlink_group().
The parent configfs_subsystem is NULL when the config_item is the root,
so create a new mutex, configfs_subsystem_mutex, for this purpose.
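The interleaving in the table above can be replayed deterministically in
userspace. The sketch below is an illustration only: it splits the two
stores of a list deletion into separate steps (the del_* helpers are
hypothetical names) and runs them in the racy order and in the
serialized order that a mutex would enforce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* The two neighbours a deleter captured before it wrote anything. */
struct del_op {
	struct list_head *prev, *next;
};

static struct del_op del_begin(const struct list_head *entry)
{
	struct del_op op = { entry->prev, entry->next };

	return op;
}

static void del_fix_prev(const struct del_op *op)
{
	op->next->prev = op->prev;
}

static void del_fix_next(const struct del_op *op)
{
	op->prev->next = op->next;
}

/* Link n[0..3] (A, B, C, D) into a ring. */
static void link_ring(struct list_head n[4])
{
	assert(n != NULL);
	for (int i = 0; i < 4; i++) {
		n[i].next = &n[(i + 1) % 4];
		n[i].prev = &n[(i + 3) % 4];
	}
}

/* True if A and D point at each other, i.e. B and C are really gone. */
static bool b_and_c_removed(const struct list_head n[4])
{
	return n[0].next == &n[3] && n[3].prev == &n[0];
}

/* Order from the table: both capture first, then the stores alternate. */
static bool delete_racy(struct list_head n[4])
{
	struct del_op b = del_begin(&n[1]);
	struct del_op c = del_begin(&n[2]);

	del_fix_prev(&b);	/* C->prev = A */
	del_fix_prev(&c);	/* D->prev = B */
	del_fix_next(&b);	/* A->next = C */
	del_fix_next(&c);	/* B->next = D */
	return b_and_c_removed(n);
}

/* With a lock, the second delete starts only after the first is done. */
static bool delete_serialized(struct list_head n[4])
{
	struct del_op b = del_begin(&n[1]);

	del_fix_prev(&b);
	del_fix_next(&b);

	struct del_op c = del_begin(&n[2]);	/* now sees prev == A */

	del_fix_prev(&c);
	del_fix_next(&c);
	return b_and_c_removed(n);
}
```

In the racy order A->next still points at C and D->prev still points at
B after both deletions, so forward and backward walks disagree about the
list contents. In the serialized order A and D end up linked to each
other.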
Fixes: 7063fbf226 ("[PATCH] configfs: User-driven configuration filesystem")
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
io_rsrc_ref_quiesce will unlock the uring while it waits for references to
the io_rsrc_data to be killed.
There are other places that might add references to the data via
calls to io_rsrc_node_switch.
There is a race condition where this reference can be added after the
completion has been signalled. At this point the io_rsrc_ref_quiesce call
will wake up and relock the uring, assuming the data is unused and can be
freed - although it is actually being used.
To fix this, check in io_rsrc_ref_quiesce whether a resource has been revived.
Reported-by: syzbot+ca8bf833622a1662745b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220222161751.995746-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Almost all fault/warning bits in pmbus status registers remain set even
after the fault/warning conditions are removed. As per the pmbus
specification, these faults must be cleared by the user.
Modify hwmon behavior to clear the fault/warning bit after fetching data
if the bit was set. This allows fresh data to be read on the next access.
Signed-off-by: Vikash Chandola <vikash.chandola@linux.intel.com>
Link: https://lore.kernel.org/r/20220222131253.2426834-1-vikash.chandola@linux.intel.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
If an attempt is made to register a sensor with a thermal zone and it fails,
the call to devm_thermal_zone_of_sensor_register() may return -ENODEV.
This may result in crashes similar to the following.
Unable to handle kernel NULL pointer dereference at virtual address 00000000000003cd
...
Internal error: Oops: 96000021 [#1] PREEMPT SMP
...
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : mutex_lock+0x18/0x60
lr : thermal_zone_device_update+0x40/0x2e0
sp : ffff800014c4fc60
x29: ffff800014c4fc60 x28: ffff365ee3f6e000 x27: ffffdde218426790
x26: ffff365ee3f6e000 x25: 0000000000000000 x24: ffff365ee3f6e000
x23: ffffdde218426870 x22: ffff365ee3f6e000 x21: 00000000000003cd
x20: ffff365ee8bf3308 x19: ffffffffffffffed x18: 0000000000000000
x17: ffffdde21842689c x16: ffffdde1cb7a0b7c x15: 0000000000000040
x14: ffffdde21a4889a0 x13: 0000000000000228 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
x8 : 0000000001120000 x7 : 0000000000000001 x6 : 0000000000000000
x5 : 0068000878e20f07 x4 : 0000000000000000 x3 : 00000000000003cd
x2 : ffff365ee3f6e000 x1 : 0000000000000000 x0 : 00000000000003cd
Call trace:
mutex_lock+0x18/0x60
hwmon_notify_event+0xfc/0x110
0xffffdde1cb7a0a90
0xffffdde1cb7a0b7c
irq_thread_fn+0x2c/0xa0
irq_thread+0x134/0x240
kthread+0x178/0x190
ret_from_fork+0x10/0x20
Code: d503201f d503201f d2800001 aa0103e4 (c8e47c02)
Jon Hunter reports that the exact call sequence is:
hwmon_notify_event()
--> hwmon_thermal_notify()
--> thermal_zone_device_update()
--> update_temperature()
--> mutex_lock()
The hwmon core needs to handle all errors returned from calls
to devm_thermal_zone_of_sensor_register(). If the call fails
with -ENODEV, report that the sensor was not attached to a
thermal zone but continue to register the hwmon device.
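In the register dump above, x19 holds ffffffffffffffed, which is -19
(-ENODEV) as a 64-bit value, so it looks as if the error value was later
used as a pointer. A userspace sketch of the intended handling follows.
The ERR_PTR helpers are reimplemented locally, and the struct and
function names are assumptions for illustration, not the hwmon core:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Local reimplementations of the error-pointer helpers. */
static void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct thermal_zone {
	int updates;
};

struct hwmon_sensor {
	struct thermal_zone *tzd;	/* NULL when no zone is attached */
};

/* tzd is whatever the (modelled) zone registration returned. */
static int sensor_attach(struct hwmon_sensor *sensor, struct thermal_zone *tzd)
{
	assert(sensor != NULL);
	sensor->tzd = NULL;

	if (IS_ERR(tzd)) {
		/* No zone for this sensor: not fatal, carry on without. */
		if (PTR_ERR(tzd) == -ENODEV)
			return 0;
		return (int)PTR_ERR(tzd);
	}

	sensor->tzd = tzd;
	return 0;
}

/* Notification path: only touch the zone if one was attached. */
static bool sensor_notify(const struct hwmon_sensor *sensor)
{
	if (!sensor->tzd)
		return false;
	sensor->tzd->updates++;
	return true;
}
```

The point of the sketch is that an error pointer is never stored in the
sensor, so the notification path cannot dereference it later.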
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Cc: Dmitry Osipenko <digetx@gmail.com>
Fixes: 1597b374af ("hwmon: Add notification support")
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
By request of Nick Piggin:
> Patch 3 requires a KVM_CAP_PPC number allocated. QEMU maintainers are
> happy with it (link in changelog) just waiting on KVM upstreaming. Do
> you have objections to the series going to ppc/kvm tree first, or
> another option is you could take patch 3 alone first (it's relatively
> independent of the other 2) and ppc/kvm gets it from you?
Add KVM_CAP_PPC_AIL_MODE_3 to advertise the capability to set the AIL
resource mode to 3 with the H_SET_MODE hypercall. This capability
differs between processor types and KVM types (PR, HV, Nested HV), and
affects guest-visible behaviour.
QEMU will implement a cap-ail-mode-3 to control this behaviour[1], and
use the KVM CAP if available to determine KVM support[2].
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
This is fixing up the use without proper initialization in patch 5/5
-o-
Hi,
The following patchset contains Netfilter fixes for net:
1) Missing #ifdef CONFIG_IP6_NF_IPTABLES in recent xt_socket fix.
2) Fix incorrect flow action array size in nf_tables.
3) Unregister flowtable hooks from netns exit path.
4) Fix missing limit object release, from Florian Westphal.
5) Memleak in nf_tables object update path, also from Florian.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stateful objects can be updated from the control plane.
The transaction logic allocates a temporary object for this purpose.
The ->init function was called for this object, so plain kfree() leaks
resources. We must call ->destroy function of the object.
nft_obj_destroy does this, but it also decrements the module refcount,
which the update path doesn't increment.
To avoid special-casing the update object release, do module_get for
the update case too and release it via nft_obj_destroy().
Fixes: d62d0ba97b ("netfilter: nf_tables: Introduce stateful object update operation")
Cc: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Even if PSR is allowed for a present GPU, there might be no eDP link
which supports PSR.
Fixes: 7089784873 ("drm/amdgpu/display: Only set vblank_disable_immediate when PSR is not enabled")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
UART drivers are meant to use the port spinlock within certain
methods, to protect against reentrancy. The sc16is7xx driver does
very little locking, presumably because adding it triggers
"scheduling while atomic" errors. This is due to the use of mutexes
within the regmap abstraction layer, and the mutex implementation's
habit of sleeping the current thread while waiting for access.
Unfortunately this lack of interlocking can lead to corruption of
outbound data, which occurs when the buffer used for I2C transmission
is used simultaneously by two threads - a work queue thread running
sc16is7xx_tx_proc, and an IRQ thread in sc16is7xx_port_irq, both
of which can call sc16is7xx_handle_tx.
An earlier patch added efr_lock, a mutex that controls access to the
EFR register. This mutex is already claimed in the IRQ handler, and
all that is required is to claim the same mutex in sc16is7xx_tx_proc.
See: https://github.com/raspberrypi/linux/issues/4885
Fixes: 6393ff1c44 ("sc16is7xx: Use threaded IRQ")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Link: https://lore.kernel.org/r/20220216160802.1026013-1-phil@raspberrypi.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the current implementation the user may open a virtual tty which then
could fail to establish the underlying DLCI. The function gsmtty_open()
gets stuck in tty_port_block_til_ready() while waiting for a carrier rise.
This happens if the remote side fails to acknowledge the link establishment
request in time or completely. At some point gsm_dlci_close() is called
to abort the link establishment attempt. The function tries to inform the
associated virtual tty by performing a hangup. But the blocking loop within
tty_port_block_til_ready() is not informed about this event.
The patch proposed here fixes this by resetting the initialization state of
the virtual tty to ensure the loop exits and triggering it to make
tty_port_block_til_ready() return.
Fixes: e1eaea46bb ("tty: n_gsm line discipline")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Starke <daniel.starke@siemens.com>
Link: https://lore.kernel.org/r/20220218073123.2121-7-daniel.starke@siemens.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The function gsm_process_modem() exists to handle modem status bits of
incoming frames. This includes incoming MSC (modem status command) frames
and convergence layer type 2 data frames. The function, however, was only
designed to handle MSC frames as it expects the command length. Within
gsm_dlci_data() it is wrongly assumed that this is the same as the data
frame length. This is only true if the data frame contains only 1 byte of
payload.
This patch names the length parameter of gsm_process_modem() in a generic
manner to reflect its association. It also corrects all calls to the
function to handle the variable number of modem status octets correctly in
both cases.
Fixes: 7263287af9 ("tty: n_gsm: Fixed logic to decode break signal from modem status")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Starke <daniel.starke@siemens.com>
Link: https://lore.kernel.org/r/20220218073123.2121-6-daniel.starke@siemens.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tty flow control is handled via gsmtty_throttle() and gsmtty_unthrottle().
Both functions propagate the outgoing hardware flow control state to the
remote side via MSC (modem status command) frames. The local state is taken
from the RTS (ready to send) flag of the tty. However, RTS gets mapped to
DTR (data terminal ready), which is wrong.
This patch corrects this by mapping RTS to RTS.
Fixes: e1eaea46bb ("tty: n_gsm line discipline")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Starke <daniel.starke@siemens.com>
Link: https://lore.kernel.org/r/20220218073123.2121-5-daniel.starke@siemens.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The commit fixed here made the tty hangup asynchronous to avoid a circular
locking warning. I could not reproduce this warning. Furthermore, due to
the asynchronous hangup the function call now gets queued up while the
underlying tty is being freed. Depending on the timing this results in a
NULL pointer access in the global work queue scheduler, to be precise in
process_one_work(). Therefore, that commit made the issue it tried to
fix worse.
This patch fixes this by falling back to the old behavior which uses a
blocking tty hangup call before freeing up the associated tty.
Fixes: 7030082a74 ("tty: n_gsm: avoid recursive locking with async port hangup")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Starke <daniel.starke@siemens.com>
Link: https://lore.kernel.org/r/20220218073123.2121-4-daniel.starke@siemens.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Trying to open a DLCI by sending a SABM frame may fail with a timeout.
The link is closed on the initiator side without informing the responder
about this event. The responder assumes the link is open after sending a
UA frame to answer the SABM frame. The link gets stuck in a half open
state.
This patch fixes this by initiating the proper link termination procedure
after link setup timeout instead of silently closing it down.
Fixes: e1eaea46bb ("tty: n_gsm line discipline")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Starke <daniel.starke@siemens.com>
Link: https://lore.kernel.org/r/20220218073123.2121-3-daniel.starke@siemens.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
n_gsm is based on the 3GPP 07.010 and its newer version is the 3GPP 27.010.
See https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=1516
The changes from 07.010 to 27.010 are non-functional. Therefore, I refer to
the newer 27.010 here. Chapter 5.2.1.2 describes the encoding of the
C/R (command/response) bit. Table 1 shows that the actual encoding of the
C/R bit is inverted if the associated frame is sent by the responder.
The commit referenced in the Fixes tag further broke the internal meaning of this
bit in the outgoing path by always setting the C/R bit regardless of the
frame type.
This patch fixes both by setting the C/R bit always consistently for
command (1) and response (0) frames and inverting it later for the
responder where necessary. The meaning of this bit in the debug output
is being preserved and shows the bit as if it was encoded by the initiator.
This reflects only the frame type rather than the encoded combination of
communication side and frame type.
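A small sketch of that scheme, assuming the usual single-octet address
layout (EA in bit 0, C/R in bit 1, DLCI in the upper six bits). The
function names are hypothetical and this is not the n_gsm code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GSM_EA 0x01u	/* extension bit of a single-octet address */

/* Build the address octet for an outgoing frame. */
static uint8_t gsm_address(uint8_t dlci, bool is_command, bool is_initiator)
{
	/* First pick C/R by frame type only: command 1, response 0. */
	unsigned int cr = is_command ? 1u : 0u;

	assert(dlci < 64);	/* the DLCI field is six bits wide */

	/* Then invert it if this side is the responder. */
	if (!is_initiator)
		cr ^= 1u;

	return (uint8_t)(((unsigned int)dlci << 2) | (cr << 1) | GSM_EA);
}

/* Debug view: undo the responder inversion to recover the frame type. */
static bool gsm_is_command(uint8_t address, bool sent_by_initiator)
{
	bool cr = (address >> 1) & 1u;

	return sent_by_initiator ? cr : !cr;
}
```

For DLCI 1 this gives 0x07 for an initiator command and 0x05 for an
initiator response, and the opposite pair for the responder.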
Fixes: cc0f42122a ("tty: n_gsm: Modify CR,PF bit when config requester")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Starke <daniel.starke@siemens.com>
Link: https://lore.kernel.org/r/20220218073123.2121-2-daniel.starke@siemens.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
n_gsm is based on the 3GPP 07.010 and its newer version is the 3GPP 27.010.
See https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=1516
The changes from 07.010 to 27.010 are non-functional. Therefore, I refer to
the newer 27.010 here. Chapter 5.4.6.3.7 describes the encoding of the
control signal octet used by the MSC (modem status command). The same
encoding is also used in convergence layer type 2 as described in chapter
5.5.2. Table 7 and 24 both require the DV (data valid) bit to be set 1 for
outgoing control signal octets sent by the DTE (data terminal equipment),
i.e. for the initiator side.
Currently, the DV bit is only set if CD (carrier detect) is on, regardless
of the side.
This patch fixes this behavior by setting the DV bit on the initiator side
unconditionally.
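The rule can be sketched as a small encoder. The LINE_* flags are local
stand-ins for the tty modem line flags and the MDM_* values are assumed
bit positions chosen for illustration; only the DV condition is the
point here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the tty modem line flags. */
#define LINE_DTR 0x01u
#define LINE_RTS 0x02u
#define LINE_RI  0x04u
#define LINE_CD  0x08u

/* Assumed control signal bits (before the EA bit is appended). */
#define MDM_FC  0x01u	/* flow control */
#define MDM_RTC 0x02u	/* ready to communicate */
#define MDM_RTR 0x04u	/* ready to receive */
#define MDM_IC  0x20u	/* incoming call */
#define MDM_DV  0x40u	/* data valid */

static uint8_t encode_modem(unsigned int lines, bool throttled, bool initiator)
{
	uint8_t bits = 0;

	assert((lines & ~(LINE_DTR | LINE_RTS | LINE_RI | LINE_CD)) == 0);

	if (throttled)
		bits |= MDM_FC;
	if (lines & LINE_DTR)
		bits |= MDM_RTC;
	if (lines & LINE_RTS)
		bits |= MDM_RTR;
	if (lines & LINE_RI)
		bits |= MDM_IC;
	/* DV: always on the initiator side, otherwise only with CD. */
	if ((lines & LINE_CD) || initiator)
		bits |= MDM_DV;

	return bits;
}
```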
Fixes: e1eaea46bb ("tty: n_gsm line discipline")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Starke <daniel.starke@siemens.com>
Link: https://lore.kernel.org/r/20220218073123.2121-1-daniel.starke@siemens.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull x86 platform driver fixes from Hans de Goede:
"Two small fixes and one hardware-id addition"
* tag 'platform-drivers-x86-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: int3472: Add terminator to gpiod_lookup_table
platform/x86: asus-wmi: Fix regression when probing for fan curve control
platform/x86: thinkpad_acpi: Add dual-fan quirk for T15g (2nd gen)
The wp-gpios property can be used on NVMEM nodes and the same property
can also be used on MTD NAND nodes. If the wp-gpios property is
defined at NAND level node, the GPIO management is done at NAND driver
level. Write protect is disabled when the driver is probed or resumed
and is enabled when the driver is released or suspended.
When no partitions are defined in the NAND DT node, then the NAND DT node
will be passed to NVMEM framework. If wp-gpios property is defined in
this node, the GPIO resource is taken twice and the NAND controller
driver fails to probe.
A new Boolean flag named ignore_wp has been added in nvmem_config.
In case ignore_wp is set, it means that the GPIO is handled by the
provider. Let's set this flag in the MTD layer to avoid the conflict on
the wp-gpios property.
Fixes: 2a127da461 ("nvmem: add support for the write-protect pin")
Cc: stable@vger.kernel.org
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Christophe Kerello <christophe.kerello@foss.st.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20220220151432.16605-3-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The wp-gpios property can be used on NVMEM nodes and the same property
can also be used on MTD NAND nodes. If the wp-gpios property is
defined at NAND level node, the GPIO management is done at NAND driver
level. Write protect is disabled when the driver is probed or resumed
and is enabled when the driver is released or suspended.
When no partitions are defined in the NAND DT node, then the NAND DT node
will be passed to NVMEM framework. If wp-gpios property is defined in
this node, the GPIO resource is taken twice and the NAND controller
driver fails to probe.
It would be possible to set config->wp_gpio at MTD level before calling
the nvmem_register function, but the NVMEM framework would toggle this GPIO
on each write when this GPIO should only be controlled by the NAND level
driver to ensure that the Write Protect has not been enabled.
A way to fix this conflict is to add a new boolean flag in nvmem_config
named ignore_wp. In case ignore_wp is set, the GPIO resource will
be managed by the provider.
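The conflict and the effect of the flag can be modelled in userspace.
The types and functions below are hypothetical stand-ins (a GPIO line
that can be claimed once, and a registration step that claims it unless
told not to), not the NVMEM or GPIO APIs:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical GPIO line that may only be claimed once. */
struct gpio_line {
	bool claimed;
};

static int gpio_claim(struct gpio_line *line)
{
	assert(line != NULL);
	if (line->claimed)
		return -EBUSY;
	line->claimed = true;
	return 0;
}

/* Models only the one config field that matters here. */
struct nvmem_config_model {
	bool ignore_wp;
};

static int nvmem_register_model(const struct nvmem_config_model *cfg,
				struct gpio_line *wp)
{
	/* The provider manages write protect itself: leave the GPIO alone. */
	if (cfg->ignore_wp)
		return 0;

	return gpio_claim(wp);
}
```

Once the NAND driver has claimed the line, registration without the flag
fails with -EBUSY in this model, while registration with ignore_wp set
succeeds and leaves the line with its original owner.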
Fixes: 2a127da461 ("nvmem: add support for the write-protect pin")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Kerello <christophe.kerello@foss.st.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20220220151432.16605-2-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jonathan writes:
1st set of IIO fixes for the 5.17 cycle.
Several drivers:
- Fix a failure to disable runtime PM in probe error paths. All cases
were introduced in the same rework patch.
adi,ad7124
- Fix incorrect register masking.
adi,ad74413r
- Avoid referencing negative array offsets.
- Use ngpio size when iterating over mask not number of channels.
- Fix issue with wrong mask usage getting GPIOs.
adi,admv1013
- Drop check on unsigned less than 0.
adi,adis16480
- Correctly handle devices that don't have burst mode support.
fsl,fxls8962af
- Add missing padding needed between address and data for SPI transfers.
men_z188
- Fix iomap leak in error path.
st,lsm6dsx
- Wait for settling time in oneshot reads to get a stable result.
ti,tsc2046
- Prevent an array overflow.
* tag 'iio-fixes-for-5.17a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: imu: st_lsm6dsx: wait for settling time in st_lsm6dsx_read_oneshot
iio: Fix error handling for PM
iio: addac: ad74413r: correct comparator gpio getters mask usage
iio: addac: ad74413r: use ngpio size when iterating over mask
iio: addac: ad74413r: Do not reference negative array offsets
iio: adc: men_z188_adc: Fix a resource leak in an error handling path
iio: frequency: admv1013: remove the always true condition
iio: accel: fxls8962af: add padding to regmap for SPI
iio:imu:adis16480: fix buffering for devices with no burst mode
iio: adc: ad7124: fix mask used for setting AIN_BUFP & AIN_BUFM bits
iio: adc: tsc2046: fix memory corruption by preventing array overflow
Resending this to properly add it to the patch tracker - thanks for letting
me know, Arnd :)
When ARM is enabled, and BITREVERSE is disabled,
Kbuild gives the following warning:
WARNING: unmet direct dependencies detected for HAVE_ARCH_BITREVERSE
Depends on [n]: BITREVERSE [=n]
Selected by [y]:
- ARM [=y] && (CPU_32v7M [=n] || CPU_32v7 [=y]) && !CPU_32v6 [=n]
This is because ARM selects HAVE_ARCH_BITREVERSE
without selecting BITREVERSE, despite
HAVE_ARCH_BITREVERSE depending on BITREVERSE.
This unmet dependency bug was found by Kismet,
a static analysis tool for Kconfig. Please advise if this
is not the appropriate solution.
Signed-off-by: Julian Braha <julianbraha@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
The kgdb code needs to register an undef hook for the Thumb UDF
instruction that will fault in order to be functional on Thumb2
platforms.
Reported-by: Johannes Stezenbach <js@sig21.net>
Tested-by: Johannes Stezenbach <js@sig21.net>
Fixes: 5cbad0ebf4 ("kgdb: support for ARCH=arm")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
We need to provide a destroy callback to release the extra fields.
Fixes: 3b9e2ea6c1 ("netfilter: nft_limit: move stateful fields out of expression data")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Unregister flowtable hooks before they are released via
nf_tables_flowtable_destroy() otherwise hook core reports UAF.
BUG: KASAN: use-after-free in nf_hook_entries_grow+0x5a7/0x700 net/netfilter/core.c:142 net/netfilter/core.c:142
Read of size 4 at addr ffff8880736f7438 by task syz-executor579/3666
CPU: 0 PID: 3666 Comm: syz-executor579 Not tainted 5.16.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
__dump_stack lib/dump_stack.c:88 [inline] lib/dump_stack.c:106
dump_stack_lvl+0x1dc/0x2d8 lib/dump_stack.c:106 lib/dump_stack.c:106
print_address_description+0x65/0x380 mm/kasan/report.c:247 mm/kasan/report.c:247
__kasan_report mm/kasan/report.c:433 [inline]
__kasan_report mm/kasan/report.c:433 [inline] mm/kasan/report.c:450
kasan_report+0x19a/0x1f0 mm/kasan/report.c:450 mm/kasan/report.c:450
nf_hook_entries_grow+0x5a7/0x700 net/netfilter/core.c:142 net/netfilter/core.c:142
__nf_register_net_hook+0x27e/0x8d0 net/netfilter/core.c:429 net/netfilter/core.c:429
nf_register_net_hook+0xaa/0x180 net/netfilter/core.c:571 net/netfilter/core.c:571
nft_register_flowtable_net_hooks+0x3c5/0x730 net/netfilter/nf_tables_api.c:7232 net/netfilter/nf_tables_api.c:7232
nf_tables_newflowtable+0x2022/0x2cf0 net/netfilter/nf_tables_api.c:7430 net/netfilter/nf_tables_api.c:7430
nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline]
nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline]
nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline] net/netfilter/nfnetlink.c:652
nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] net/netfilter/nfnetlink.c:652
nfnetlink_rcv+0x10e6/0x2550 net/netfilter/nfnetlink.c:652 net/netfilter/nfnetlink.c:652
__nft_release_hook() calls nft_unregister_flowtable_net_hooks() which
only unregisters the hooks, then after RCU grace period, it is
guaranteed that no packets add new entries to the flowtable (no flow
offload rules and flowtable hooks are reachable from packet path), so it
is safe to call nf_flow_table_free() which cleans up the remaining
entries from the flowtable (both software and hardware) and it unbinds
the flow_block.
Fixes: ff4bf2f42a ("netfilter: nf_tables: add nft_unregister_flowtable_hook()")
Reported-by: syzbot+e918523f77e62790d6d9@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
...to help userland apps that need to identify FUSE mounts.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Experimentation shows that PHY detect might fail when the code attempts
MDIO bus read immediately after clock enable. Add delay to stabilize the
clock before bus access.
PHY detect failure started to show after commit 7590fc6f80 ("net:
mdio: Demote probed message to debug print") that removed coincidental
delay between clock enable and bus access.
10ms is meant to match the time it takes to send the probed message over
UART at 115200 bps. This might be a far overshoot.
Fixes: 23a890d493 ("net: mdio: Add the reset function for IPQ MDIO driver")
Signed-off-by: Baruch Siach <baruch.siach@siklu.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an application calls io_uring_enter(2) with a timespec passed in,
convert that timespec to ktime_t rather than jiffies. The latter does
not provide the granularity the application may expect, and may in
fact provide different granularity on different systems, depending
on what the HZ value is configured at.
Turn the timespec into an absolute ktime_t, and use that with
schedule_hrtimeout() instead.
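As a rough illustration of the granularity problem, the userspace sketch
below compares a nanosecond timeout with the same timeout rounded up to
whole ticks. It is a simplified model, not the kernel's conversion code:
struct ts_sketch, ts_to_ns() and ns_rounded_to_ticks() are hypothetical
names, and the rounding rule is an assumption.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for the timespec passed in by the application. */
struct ts_sketch {
	int64_t tv_sec;
	long tv_nsec;
};

/* Nanosecond-resolution value, comparable to what a ktime_t can hold. */
static uint64_t ts_to_ns(struct ts_sketch ts)
{
	return (uint64_t)ts.tv_sec * NSEC_PER_SEC + (uint64_t)ts.tv_nsec;
}

/*
 * Assumed model of a tick-based timeout: round the request up to a whole
 * number of ticks of length 1/hz seconds and return the effective wait in ns.
 */
static uint64_t ns_rounded_to_ticks(uint64_t ns, unsigned int hz)
{
	uint64_t tick_ns = NSEC_PER_SEC / hz;
	uint64_t ticks = (ns + tick_ns - 1) / tick_ns;

	return ticks * tick_ns;
}
```

In this model a 100 usec request becomes a 10 msec wait with hz = 100 and
a 1 msec wait with hz = 1000, which is the HZ dependence described above.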
Link: https://github.com/axboe/liburing/issues/531
Cc: stable@vger.kernel.org
Reported-by: Bob Chen <chenbo.chen@alibaba-inc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We encountered a TCP drop issue in our cloud environment. A packet GROed in
the host is forwarded to a VM virtio_net NIC with net_failover enabled. The VM
acts as an IPVS LB with ipip encapsulation. The full path is:
host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat
-> ipip encap -> net_failover tx -> virtio_net tx
When net_failover transmits an ipip pkt (gso_type = 0x0103, which means
SKB_GSO_TCPV4, SKB_GSO_DODGY and SKB_GSO_IPXIP4), no GSO is performed
because it supports TSO and GSO_IPXIP4. But network_header is left pointing
to the inner IP header.
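For reference, the gso_type value quoted above can be decoded with a small
userspace sketch. The bit positions are written from memory of
include/linux/skbuff.h and should be treated as assumptions;
decode_gso_type() is a hypothetical helper, not a kernel function.

```c
#include <string.h>

/* Assumed bit positions, meant to mirror include/linux/skbuff.h. */
#define SKB_GSO_TCPV4  (1u << 0)
#define SKB_GSO_DODGY  (1u << 1)
#define SKB_GSO_IPXIP4 (1u << 8)

/*
 * Hypothetical helper: write the names of the known flags set in gso_type
 * into buf, separated by '|', and return buf.
 */
static char *decode_gso_type(unsigned int gso_type, char *buf, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} flags[] = {
		{ SKB_GSO_TCPV4,  "TCPV4"  },
		{ SKB_GSO_DODGY,  "DODGY"  },
		{ SKB_GSO_IPXIP4, "IPXIP4" },
	};
	size_t i;

	if (len == 0)
		return buf;
	buf[0] = '\0';
	for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
		if (!(gso_type & flags[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, flags[i].name, len - strlen(buf) - 1);
	}
	return buf;
}
```

Under these assumed bit positions, 0x0103 decodes to the three flags named
in the text.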
Call Trace:
tcp4_gso_segment ------> return NULL
inet_gso_segment ------> inner iph, network_header points to
ipip_gso_segment
inet_gso_segment ------> outer iph
skb_mac_gso_segment
Afterwards, when virtio_net transmits the pkt, only the inner IP header is
modified and the outer one is left unchanged. The pkt will be dropped by the
remote host.
Call Trace:
inet_gso_segment ------> inner iph, outer iph is skipped
skb_mac_gso_segment
__skb_gso_segment
validate_xmit_skb
validate_xmit_skb_list
sch_direct_xmit
__qdisc_run
__dev_queue_xmit ------> virtio_net
dev_hard_start_xmit
__dev_queue_xmit ------> net_failover
ip_finish_output2
ip_output
iptunnel_xmit
ip_tunnel_xmit
ipip_tunnel_xmit ------> ipip
dev_hard_start_xmit
__dev_queue_xmit
ip_finish_output2
ip_output
ip_forward
ip_rcv
__netif_receive_skb_one_core
netif_receive_skb_internal
napi_gro_receive
receive_buf
virtnet_poll
net_rx_action
The root cause of this issue is specific to the rare combination of
SKB_GSO_DODGY and a tunnel device that adds an SKB_GSO_ tunnel option.
SKB_GSO_DODGY is set from external virtio_net. We need to reset network
header when callbacks.gso_segment() returns NULL.
This patch also includes ipv6_gso_segment(), considering SIT, etc.
Fixes: cb32f511a7 ("ipip: add GSO/TSO support")
Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the only thing that is changing is SAGV vs. no SAGV but
the number of active planes and the total data rates end up
unchanged we currently bail out of intel_bw_atomic_check()
early and forget to actually compute the new QGV point
mask and thus won't actually enable/disable SAGV as requested.
This ends up poorly if we end up running with SAGV enabled
when we shouldn't. Usually ends up in underruns.
To fix this let's go through the QGV point mask computation
if either the data rates/number of planes, or the state
of SAGV is changing.
v2: Check more carefully if things are changing to avoid
the extra calculations/debugs from introducing unwanted
overhead
Cc: stable@vger.kernel.org
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> #v1
Fixes: 20f505f225 ("drm/i915: Restrict qgv points which don't have enough bandwidth.")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220218064039.12834-3-ville.syrjala@linux.intel.com
(cherry picked from commit 6b728595ff)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
BIOS may leave a TypeC PHY in a connected state even though the
corresponding port is disabled. This will prevent any hotplug events
from being signalled (after the monitor deasserts and then reasserts its
HPD) until the PHY is disconnected and so the driver will not detect a
connected sink. Rebooting with the PHY in the connected state also
results in a system hang.
Fix the above by disconnecting TypeC PHYs on disabled ports.
Before commit 64851a32c4 the PHY connected state was read out even
for disabled ports and later the PHY got disconnected as a side effect
of a tc_port_lock/unlock() sequence (during connector probing), hence
recovering the port's hotplug functionality.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5014
Fixes: 64851a32c4 ("drm/i915/tc: Add a mode for the TypeC PHY's disconnected state")
Cc: <stable@vger.kernel.org> # v5.16+
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220217152237.670220-1-imre.deak@intel.com
(cherry picked from commit ed0ccf349f)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
With unprivileged eBPF enabled, eIBRS (without retpoline) is vulnerable
to Spectre v2 BHB-based attacks.
When both are enabled, print a warning message and report it in the
'spectre_v2' sysfs vulnerabilities file.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Update the doc with the new fun.
[ bp: Massage commit message. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Thanks to the chaps at VUsec it is now clear that eIBRS is not
sufficient, therefore allow enabling of retpolines along with eIBRS.
Add spectre_v2=eibrs, spectre_v2=eibrs,lfence and
spectre_v2=eibrs,retpoline options to explicitly pick your preferred
means of mitigation.
Since there are new mitigations, there are also user-visible changes in
/sys/devices/system/cpu/vulnerabilities/spectre_v2 to reflect these
new mitigations.
[ bp: Massage commit message, trim error messages,
do more precise eIBRS mode checking. ]
Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Patrick Colp <patrick.colp@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
The RETPOLINE_AMD name is unfortunate since it isn't necessarily
AMD only, in fact Hygon also uses it. Furthermore it will likely be
sufficient for some Intel processors. Therefore rename the thing to
RETPOLINE_LFENCE to better describe what it is.
Add the spectre_v2=retpoline,lfence option as an alias to
spectre_v2=retpoline,amd to preserve existing setups. However, the output
of /sys/devices/system/cpu/vulnerabilities/spectre_v2 will be changed.
[ bp: Fix typos, massage. ]
Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Syzbot reported a slab-out-of-bounds Read in thrustmaster_probe() bug.
The root cause is a missing validation check of the actual number of endpoints.
Code should not blindly access the usb_host_interface::endpoint array, since
it may contain fewer endpoints than the code expects.
Fix it by adding the missing validation check and printing an error if the
number of endpoints does not match the expected number.
Fixes: c49c336378 ("HID: support for initialization of some Thrustmaster wheels")
Reported-and-tested-by: syzbot+35eebd505e97d315d01c@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The imx_pgc_power_down() starts by enabling the domain clocks, and thus
disables them in the error path. Commit 18c98573a4 ("soc: imx: gpcv2:
add domain option to keep domain clocks enabled") made the clock enable
conditional, but forgot to add the same condition to the error path.
This can result in a clock enable/disable imbalance. Fix it.
Fixes: 18c98573a4 ("soc: imx: gpcv2: add domain option to keep domain clocks enabled")
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Tejas reported the following recursive locking issue:
swapper/0/1 is trying to acquire lock:
ffff8881074fd0a0 (&md->mutex){+.+.}-{3:3}, at: msi_get_virq+0x30/0xc0
but task is already holding lock:
ffff8881017cd6a0 (&md->mutex){+.+.}-{3:3}, at: __pci_enable_msi_range+0xf2/0x290
stack backtrace:
__mutex_lock+0x9d/0x920
msi_get_virq+0x30/0xc0
pci_irq_vector+0x26/0x30
vmd_msi_init+0xcc/0x210
msi_domain_alloc+0xbf/0x150
msi_domain_alloc_irqs_descs_locked+0x3e/0xb0
__pci_enable_msi_range+0x155/0x290
pci_alloc_irq_vectors_affinity+0xba/0x100
pcie_port_device_register+0x307/0x550
pcie_portdrv_probe+0x3c/0xd0
pci_device_probe+0x95/0x110
This is caused by the VMD MSI code which does a lookup of the Linux
interrupt number for a VMD-managed MSI[X] vector. The lookup function
tries to acquire the already held mutex.
Avoid that by caching the Linux interrupt number at initialization time
instead of looking it up over and over.
Fixes: 82ff8e6b78 ("PCI/MSI: Use msi_get_virq() in pci_get_vector()")
Reported-by: "Surendrakumar Upadhyay, TejaskumarX" <tejaskumarx.surendrakumar.upadhyay@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: "Surendrakumar Upadhyay, TejaskumarX" <tejaskumarx.surendrakumar.upadhyay@intel.com>
Cc: linux-pci@vger.kernel.org
Link: https://lore.kernel.org/r/87a6euub2a.ffs@tglx
Michael Chan says:
====================
bnxt_en: Bug fixes
This series contains bug fixes for FEC reporting, ethtool self test,
multicast setup, devlink health reporting and live patching, and
a firmware response timeout.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
To install a livepatch, first flash the package to NVM, and then
activate the patch through the "HWRM_FW_LIVEPATCH" fw command.
To uninstall a patch from NVM, flash the removal package and then
activate it through the "HWRM_FW_LIVEPATCH" fw command.
The "HWRM_FW_LIVEPATCH" fw command has to consider the following scenarios:
1. no patch in NVM and no patch active. Do nothing.
2. patch in NVM, but not active. Activate the patch currently in NVM.
3. patch is not in NVM, but active. Deactivate the patch.
4. patch in NVM and the patch active. Do nothing.
Fix the code to handle these scenarios during devlink "fw_activate".
To install and activate a live patch:
devlink dev flash pci/0000:c1:00.0 file thor_patch.pkg
devlink -f dev reload pci/0000:c1:00.0 action fw_activate limit no_reset
To remove and deactivate a live patch:
devlink dev flash pci/0000:c1:00.0 file thor_patch_rem.pkg
devlink -f dev reload pci/0000:c1:00.0 action fw_activate limit no_reset
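The four scenarios listed above reduce to a small decision function. The
sketch below is illustrative only: enum livepatch_action and
livepatch_decide() are hypothetical names and do not reflect how the driver
structures this logic.

```c
#include <stdbool.h>

/* Hypothetical result type for the decision described in the text. */
enum livepatch_action {
	LIVEPATCH_NONE,
	LIVEPATCH_ACTIVATE,
	LIVEPATCH_DEACTIVATE,
};

/* Map (patch present in NVM, patch currently active) to an action. */
static enum livepatch_action livepatch_decide(bool in_nvm, bool active)
{
	if (in_nvm && !active)
		return LIVEPATCH_ACTIVATE;	/* scenario 2 */
	if (!in_nvm && active)
		return LIVEPATCH_DEACTIVATE;	/* scenario 3 */
	return LIVEPATCH_NONE;			/* scenarios 1 and 4 */
}
```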
Fixes: 3c4153394e ("bnxt_en: implement firmware live patching")
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When polling for the firmware message response, we first poll for the
response message header. Once the valid length is detected in the
header, we poll for the valid bit at the end of the message which
signals DMA completion. Normally, this poll time for DMA completion
is extremely short (0 to a few usec). But on some devices under some
rare conditions, it can be up to about 20 msec.
Increase this delay to 50 msec and use udelay() for the first 10 usec
for the common case, and usleep_range() beyond that.
Also, change the error message to include the above delay time when
printing the timeout value.
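The two-phase wait can be modelled in userspace as follows. This is a
simulation under stated assumptions, not the driver code: the 25 usec sleep
step, struct poll_stats and poll_valid_bit() are hypothetical, and time
advances by counting instead of calling udelay() or usleep_range().

```c
#include <stdbool.h>

#define DMA_POLL_TIMEOUT_US 50000u	/* 50 msec budget, from the text */
#define BUSY_WAIT_LIMIT_US     10u	/* busy-wait window, from the text */
#define SLEEP_STEP_US          25u	/* assumed sleep granularity */

struct poll_stats {
	unsigned int busy_polls;	/* steps standing in for udelay(1) */
	unsigned int sleep_polls;	/* steps standing in for usleep_range() */
	bool timed_out;
};

/* ready_at_us: simulated time at which the valid bit becomes set. */
static struct poll_stats poll_valid_bit(unsigned int ready_at_us)
{
	struct poll_stats st = { 0, 0, false };
	unsigned int now_us = 0;

	while (now_us < ready_at_us) {
		if (now_us >= DMA_POLL_TIMEOUT_US) {
			st.timed_out = true;
			break;
		}
		if (now_us < BUSY_WAIT_LIMIT_US) {
			now_us += 1;		/* common case: short busy wait */
			st.busy_polls++;
		} else {
			now_us += SLEEP_STEP_US; /* rare case: sleep instead */
			st.sleep_polls++;
		}
	}
	return st;
}
```

In this model a completion within a few usec never leaves the busy-wait
phase, while a 20 msec completion spends nearly all of its time sleeping.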
Fixes: 3c8c20db76 ("bnxt_en: move HWRM API implementation into separate file")
Reviewed-by: Vladimir Olovyannikov <vladimir.olovyannikov@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During ifdown, we call bnxt_inv_fw_health_reg() which will clear
both the status_reliable and resets_reliable flags if these
registers are mapped. This is correct because a FW reset during
ifdown will clear these register mappings. If we detect that FW
has gone through reset during the next ifup, we will remap these
registers.
But during normal ifup with no FW reset, we need to restore the
resets_reliable flag otherwise we will not show the reset counter
during devlink diagnose.
Fixes: 8cc95ceb70 ("bnxt_en: improve fw diagnose devlink health messages")
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should setup multicast only when net_device flags explicitly
has IFF_MULTICAST set. Otherwise we will incorrectly turn it on
even when not asked. Fix it by only passing the multicast table
to the firmware if IFF_MULTICAST is set.
Fixes: 7d2837dd7a ("bnxt_en: Setup multicast properly after resetting device.")
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the current code, we setup the port to PHY or MAC loopback mode
and then transmit a test broadcast packet for the loopback test. This
scheme fails sometimes if the port is shared with management firmware
that can also send packets. The driver may receive the management
firmware's packet and the test will fail when the contents don't
match the test packet.
Change the test packet to use its own MAC address as the destination
and setup the port to only receive its own MAC address. This should
filter out other packets sent by management firmware.
Fixes: 91725d89b9 ("bnxt_en: Add PHY loopback to ethtool self-test.")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For offline (destructive) self tests, we need to stop the RDMA driver
first. Otherwise, the RDMA driver will run into unrecoverable errors
when destructive firmware tests are being performed.
The irq_re_init parameter used in the half close and half open
sequence when preparing the NIC for offline tests should be set to
true because the RDMA driver will free all IRQs before the offline
tests begin.
Fixes: 55fd0cf320 ("bnxt_en: Add external loopback test to ethtool selftest.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Ben Li <ben.li@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ethtool --show-fec <interface> does not show anything when the Active
FEC setting in the chip is set to None. Fix it to properly return
ETHTOOL_FEC_OFF in that case.
Fixes: 8b2775890a ("bnxt_en: Report FEC settings to ethtool.")
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
memblock.{reserved,memory}.regions may be allocated using kmalloc() in
memblock_double_array(). Use kfree() to release these kmalloced regions
indicated by memblock_{reserved,memory}_in_slab.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Fixes: 3010f87650 ("mm: discard memblock data later")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
immediate verdict expression needs to allocate one slot in the flow offload
action array, however, immediate data expression does not need to do so.
fwd and dup expression need to allocate one slot, this is missing.
Add a new offload_action interface to report if this expression needs to
allocate one slot in the flow offload action array.
Fixes: be2861dc36 ("netfilter: nft_{fwd,dup}_netdev: add offload support")
Reported-and-tested-by: Nick Gregory <Nick.Gregory@Sophos.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If the DSA master doesn't support IFF_UNICAST_FLT, then the following
call path is possible:
dsa_slave_switchdev_event_work
-> dsa_port_host_fdb_add
-> dev_uc_add
-> __dev_set_rx_mode
-> __dev_set_promiscuity
Since the blamed commit, dsa_slave_switchdev_event_work() no longer
holds rtnl_lock(), which triggers the ASSERT_RTNL() from
__dev_set_promiscuity().
Taking rtnl_lock() around dev_uc_add() is impossible, because all the
code paths that call dsa_flush_workqueue() do so from contexts where the
rtnl_mutex is already held - so this would lead to an instant deadlock.
dev_uc_add() in itself doesn't require the rtnl_mutex for protection.
There is this comment in __dev_set_rx_mode() which assumes so:
/* Unicast addresses changes may only happen under the rtnl,
* therefore calling __dev_set_promiscuity here is safe.
*/
but it is from commit 4417da668c ("[NET]: dev: secondary unicast
address support") dated June 2007, and in the meantime, commit
f1f28aa351 ("netdev: Add addr_list_lock to struct net_device."), dated
July 2008, has added &dev->addr_list_lock to protect this instead of the
global rtnl_mutex.
Nonetheless, __dev_set_promiscuity() does assume rtnl_mutex protection,
but it is the uncommon path of what we typically expect dev_uc_add()
to do. So since only the uncommon path requires rtnl_lock(), just check
ahead of time whether dev_uc_add() would result in a call to
__dev_set_promiscuity(), and handle that condition separately.
DSA already configures the master interface to be promiscuous if the
tagger requires this. We can extend this to also cover the case where
the master doesn't handle dev_uc_add() (doesn't support IFF_UNICAST_FLT),
and on the premise that we'd end up making it promiscuous during
operation anyway, either if a DSA slave has a non-inherited MAC address,
or if the bridge notifies local FDB entries for its own MAC address, the
address of a station learned on a foreign port, etc.
Fixes: 0faf890fc5 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work")
Reported-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit b3612ccdf2 ("net: dsa: microchip: implement multi-bridge support")
plugged a packet leak between ports that were members of different bridges.
Unfortunately, this broke another use case, namely that of more than two
ports that are members of the same bridge.
After that commit, when a port is added to a bridge, hardware bridging
between other member ports of that bridge will be cleared, preventing
packet exchange between them.
Fix by ensuring that the Port VLAN Membership bitmap includes any existing
ports in the bridge, not just the port being added.
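A minimal sketch of the intended membership computation, assuming a
per-port table of bridge ids. NUM_PORTS_SKETCH, the bridge_of[] layout and
port_membership() are hypothetical and are not taken from the driver; the
CPU port is left out for brevity.

```c
#include <stdint.h>

#define NUM_PORTS_SKETCH 8

/*
 * bridge_of[p] is the id of the bridge port p belongs to, or -1 if the
 * port is standalone. Returns a bitmap of all other ports that share a
 * bridge with 'port'.
 */
static uint8_t port_membership(const int *bridge_of, int port)
{
	uint8_t mask = 0;
	int p;

	if (bridge_of[port] < 0)
		return mask;
	for (p = 0; p < NUM_PORTS_SKETCH; p++) {
		if (p != port && bridge_of[p] == bridge_of[port])
			mask |= (uint8_t)(1u << p);
	}
	return mask;
}
```

With three ports in the same bridge, each port's bitmap contains both of
the other two, so no existing member is dropped when a port joins.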
Fixes: b3612ccdf2 ("net: dsa: microchip: implement multi-bridge support")
Signed-off-by: Svenning Sørensen <sss@secomea.com>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-02-18
This series contains updates to ice driver only.
Wojciech fixes protocol matching for slow-path switchdev so that all
packets are correctly redirected.
Michal removes accidental unconditional setting of l4 port filtering
flag.
Jake adds locking to protect VF reset and removal to fix various issues
that can be encountered when they race with each other.
Tom Rix propagates an error and initializes a struct to resolve reported
Clang issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mat Martineau says:
====================
mptcp: Fix address advertisement races and stabilize tests
Patches 1, 2, and 7 modify two self tests to give consistent, accurate
results by fixing timing issues and accounting for syncookie behavior.
Patches 3-6 fix two races in overlapping address advertisement send and
receive. Associated self tests are updated, including addition of two
MIBs to enable testing and tracking dropped address events.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 2843ff6f36 ("mptcp: remote addresses fullmesh"), an
MPTCP client can attempt to create multiple MPJ subflows simultaneously.
In such a scenario the server, when syncookies are enabled, could end up
accepting incoming MPJ syns even above the configured subflow limit, as
such a limit can be enforced reliably only after the subflow
creation. In the syncookie case, that is only after the 3rd ack reception.
As a consequence the related self-test case sporadically fails, as it
verifies that the server always accepts the expected number of MPJ syns.
Address the issue by relaxing the MPJ syn number constraint. Note that the
check on the accepted number of MPJ 3rd acks still remains intact.
Fixes: 2843ff6f36 ("mptcp: remote addresses fullmesh")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MPTCP in-kernel path manager has some constraints on processing
incoming address announces, so that in edge scenarios it can
end up dropping (ignoring) some of them.
The above is not very limiting in practice since such scenarios are
very uncommon and MPTCP will recover due to ADD_ADDR retransmissions.
This patch adds a few MIB counters to account for such drop events
to allow easier introspection of the critical scenarios.
Fixes: f7efc7771e ("mptcp: drop argument port from mptcp_pm_announce_addr")
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an MPTCP endpoint received multiple consecutive incoming
ADD_ADDR options, mptcp_pm_add_addr_received() can overwrite
the current remote address value after the PM lock is released
in mptcp_pm_nl_add_addr_received() and before such address
is echoed.
Fix the issue by caching the remote address value a little earlier
and always using the cached value after releasing the PM lock.
Fixes: f7efc7771e ("mptcp: drop argument port from mptcp_pm_announce_addr")
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit a88c9e4969 ("mptcp: do not block subflows
creation on errors"), if a signal address races with a failing
subflow creation, the subflow creation failure control path
can trigger the selection of the next address to be announced
while the current announce is still pending.
The above will cause the unintended suppression of the ADD_ADDR
announce.
Fix the issue by skipping the to-be-suppressed announce before it
marks an endpoint as already used. The relevant announce
will be triggered again when the current one completes.
Fixes: a88c9e4969 ("mptcp: do not block subflows creation on errors")
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of waiting for an arbitrary amount of time for the MPTCP
MP_CAPABLE handshake to complete, explicitly wait for the relevant
socket to enter into the established status.
Additionally let the data transfer application use the slowest
transfer mode available (-r), to cope with very slow hosts, or
high jitter caused by hosting VMs.
Fixes: df62f2ec3d ("selftests/mptcp: add diag interface tests")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/258
Reported-and-tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ida_simple_get() returns an id between min (0) and max (NFP_MAX_MAC_INDEX)
inclusive.
So NFP_MAX_MAC_INDEX (0xff) is a valid id.
In order for the error handling path to work correctly, the 'invalid'
value for 'ida_idx' should not be in the 0..NFP_MAX_MAC_INDEX range,
inclusive.
So set it to -1.
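The sentinel problem can be shown with a small sketch. IDA_IDX_UNSET and
both helper functions are hypothetical names used only for illustration;
only NFP_MAX_MAC_INDEX and its value come from the text above.

```c
#define NFP_MAX_MAC_INDEX 0xff	/* valid ids are 0..0xff inclusive */
#define IDA_IDX_UNSET     (-1)	/* hypothetical name for the 'invalid' value */

/* Old scheme: a valid id doubles as the 'nothing allocated' marker. */
static int needs_release_old(int ida_idx)
{
	return ida_idx != NFP_MAX_MAC_INDEX;
}

/* New scheme: the marker lies outside the range of valid ids. */
static int needs_release_new(int ida_idx)
{
	return ida_idx != IDA_IDX_UNSET;
}
```

With the old marker, an allocation that happens to return 0xff would look
unallocated on the error path and would not be released.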
Fixes: 20cce88650 ("nfp: flower: enable MAC address sharing for offloadable devs")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220218131535.100258-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
intel-pinctrl for v5.17-5
* Revert misplaced ID
The following is an automated git shortlog grouped by driver:
tigerlake:
- Revert "Add Alder Lake-M ACPI ID"
The tegra186 GPIO driver makes the assumption that the pointer
returned by irq_data_get_irq_chip_data() is a pointer to a
tegra_gpio structure. Unfortunately, it is actually a pointer
to the inner gpio_chip structure, as mandated by the gpiolib
infrastructure. Nice try.
The saving grace is that the gpio_chip is the first member of
tegra_gpio, so the bug has gone undetected since... forever.
Fix it by performing a container_of() on the pointer. This results
in no additional code, and makes it possible to understand how
the whole thing works.
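For readers unfamiliar with the idiom, here is a userspace sketch of the
container_of() pattern. The structure layouts, field names and
to_tegra_gpio() are hypothetical simplifications, and container_of_sketch()
is a plain offsetof-based stand-in for the kernel macro.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of(). */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced versions of the two structures. */
struct gpio_chip_sketch {
	int ngpio;
};

struct tegra_gpio_sketch {
	struct gpio_chip_sketch gpio;	/* first member, so its offset is 0 */
	int num_banks;
};

/* Recover the outer structure from a pointer to the embedded chip. */
static struct tegra_gpio_sketch *to_tegra_gpio(struct gpio_chip_sketch *chip)
{
	return container_of_sketch(chip, struct tegra_gpio_sketch, gpio);
}
```

Because the embedded member sits at offset 0, the subtraction is a no-op,
which is why the original cast happened to work and why the fix adds no
code.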
Fixes: 5b2b135a87 ("gpio: Add Tegra186 support")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Link: https://lore.kernel.org/r/20220211093904.1112679-1-maz@kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Because ioremap() can fail, par_io could be NULL.
It is therefore better to check it and return an error, so that
initialization is guaranteed to have succeeded.
Note that callers such as mpc85xx_qe_par_io_init() in
`arch/powerpc/platforms/85xx/common.c` don't check the return value of
par_io_init().
Actually, the return value of par_io_init() needs to be checked to handle
the potential error.
I will submit another patch to fix that.
In any case, par_io_init() itself should be fixed.
Fixes: 7aa1aa6ece ("QE: Move QE from arch/powerpc to drivers/soc")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Signed-off-by: Li Yang <leoyang.li@nxp.com>
The word `is' is repeated in the comment on line 150. Remove one
of them from the comment. Also remove a redundant tab in a new line.
Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Signed-off-by: Li Yang <leoyang.li@nxp.com>
If 'devm_kstrdup()' fails, we should return -ENOMEM.
While at it, move the 'of_node_put()' call in the error handling path and
after the 'machine' has been copied.
Better safe than sorry.
Fixes: a6fc3b6981 ("soc: fsl: add GUTS driver for QorIQ platforms")
Depends-on: fddacc7ff4dd ("soc: fsl: guts: Revert commit 3c0d64e867ed")
Suggested-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Li Yang <leoyang.li@nxp.com>
This reverts commit 3c0d64e867
("soc: fsl: guts: reuse machine name from device tree").
A following patch will fix the missing memory allocation failure check
instead.
Suggested-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Li Yang <leoyang.li@nxp.com>
MAINTAINERS lacks proper coverage for FSL headers. Fix it accordingly.
Fixes: 1b48706f02 ("MAINTAINERS: add entry for Freescale SoC drivers")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Li Yang <leoyang.li@nxp.com>
MAINTAINERS lacks proper coverage for FSL headers. Fix it accordingly.
Fixes: 7aa1aa6ece ("QE: Move QE from arch/powerpc to drivers/soc")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Li Yang <leoyang.li@nxp.com>
When kernel.h is used in the headers it adds a lot into dependency hell,
especially when circular dependencies are involved.
Replace kernel.h inclusion with the list of what is really being used.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Li Yang <leoyang.li@nxp.com>
The compatible string is already in use, fix the chip list in binding to
include it.
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Acked-by: Rob Herring <robh@kernel.org>
The compatible string is already in use, fix the binding to include it.
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Acked-by: Rob Herring <robh@kernel.org>
Clang static analysis reports this issue
ice_common.c:5008:21: warning: The left expression of the compound
assignment is an uninitialized value. The computed value will
also be garbage
ldo->phy_type_low |= ((u64)buf << (i * 16));
~~~~~~~~~~~~~~~~~ ^
When called from ice_cfg_phy_fec(), ldo points to the uninitialized local
variable tlv, so initialize it.
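A minimal userspace sketch of the pattern behind this warning, not the
driver code: the struct and function names below are hypothetical
stand-ins. A compound `|=` reads the old value first, so the destination
has to be zeroed before the words are OR-ed in.

```c
#include <stdint.h>

/* Hypothetical stand-in for the driver's override structure. */
struct override_sketch {
	uint64_t phy_type_low;
};

/* OR four 16-bit words into the 64-bit field, as the flagged line does. */
static void accumulate_words(struct override_sketch *ldo,
			     const uint16_t words[4])
{
	for (int i = 0; i < 4; i++)
		ldo->phy_type_low |= ((uint64_t)words[i] << (i * 16));
}

/* The caller zero-initializes its local before handing it over. */
static uint64_t build_phy_type_low(const uint16_t words[4])
{
	struct override_sketch tlv = { 0 };

	accumulate_words(&tlv, words);
	return tlv.phy_type_low;
}
```

Without the `= { 0 }` initializer the result would depend on whatever was
on the stack, which is what the analyzer flags.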
Fixes: ea78ce4dab ("ice: add link lenient and default override support")
Signed-off-by: Tom Rix <trix@redhat.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Clang static analysis reports this issue
time64.h:69:50: warning: The left operand of '+'
is a garbage value
set_normalized_timespec64(&ts_delta, lhs.tv_sec + rhs.tv_sec,
~~~~~~~~~~ ^
In ice_ptp_adjtime_nonatomic(), the timespec64 variable 'now'
is set by ice_ptp_gettimex64(). This function can fail
with -EBUSY, so 'now' can have a garbage value.
So check the return value.
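A minimal sketch of the same check in plain C, assuming a hypothetical
read_clock() helper that leaves its output untouched on failure. The
names are illustrative only and do not come from the ice driver.

```c
#include <errno.h>
#include <stdint.h>

struct ts_sketch {
	int64_t tv_sec;
	long tv_nsec;
};

/* Hypothetical clock read: fails with -EBUSY and does not write *ts. */
static int read_clock(struct ts_sketch *ts, int busy)
{
	if (busy)
		return -EBUSY;
	ts->tv_sec = 100;
	ts->tv_nsec = 0;
	return 0;
}

/* Only use 'now' after the read has been confirmed to succeed. */
static int adjust_time(int64_t delta_sec, int busy, struct ts_sketch *out)
{
	struct ts_sketch now;
	int err = read_clock(&now, busy);

	if (err)
		return err;

	out->tv_sec = now.tv_sec + delta_sec;
	out->tv_nsec = now.tv_nsec;
	return 0;
}
```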
Fixes: 06c16d89d2 ("ice: register 1588 PTP clock device object for E810 devices")
Signed-off-by: Tom Rix <trix@redhat.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Commit c503e63200 ("ice: Stop processing VF messages during teardown")
introduced a driver state flag, ICE_VF_DEINIT_IN_PROGRESS, which is
intended to prevent some issues with concurrently handling messages from
VFs while tearing down the VFs.
This change was motivated by crashes caused while tearing down and
bringing up VFs in rapid succession.
It turns out that the fix actually introduces issues with the VF driver,
because the PF no longer responds to any messages sent by the VF
during its .remove routine. This results in the VF potentially removing
its DMA memory before the PF has shut down the device queues.
Additionally, the fix doesn't actually resolve concurrency issues within
the ice driver. It is possible for a VF to initiate a reset just prior
to the ice driver removing VFs. This can result in the remove task
concurrently operating while the VF is being reset. This results in
similar memory corruption and panics purportedly fixed by that commit.
Fix this concurrency at its root by protecting both the reset and
removal flows using the existing VF cfg_lock. This ensures that we
cannot remove the VF while any outstanding critical tasks such as a
virtchnl message or a reset are occurring.
This locking change also fixes the root cause originally fixed by commit
c503e63200 ("ice: Stop processing VF messages during teardown"), so we
can simply revert it.
Note that I kept these two changes together because simply reverting the
original commit alone would leave the driver vulnerable to worse race
conditions.
Fixes: c503e63200 ("ice: Stop processing VF messages during teardown")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The filter flag for the non-encapsulated L4 port field is accidentally
always set, even if the user wants to add the encapsulated L4 port field.
Remove this unnecessary flag setting.
Fixes: 9e300987d4 ("ice: VXLAN and Geneve TC support")
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In switchdev mode, slow-path rules need to match all protocols, in order
to correctly redirect unfiltered or missed packets to the uplink. To set
this up for the virtual function to uplink flow, the rule that redirects
packets to the control VSI must have the tunnel type set to
ICE_SW_TUN_AND_NON_TUN. As a result of that new tunnel type being set,
ice_get_compat_fv_bitmap will select ICE_PROF_ALL. At that point all
profiles would be selected for this rule, resulting in the desired
behavior. Without this change slow-path would not work with
tunnel protocols.
Fixes: 8b032a55c1 ("ice: low level support for tunnels")
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Arm SCMI fix for v5.17
A simple fix to remove the space in the MODULE_ALIAS name used in the
SCMI driver, as userspace expects no spaces in these names.
* tag 'scmi-fix-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
firmware: arm_scmi: Remove space in MODULE_ALIAS name
Link: https://lore.kernel.org/r/20220214144245.2376150-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
SoCFPGA dts updates for v5.18, part 2
- Add the "intel,socfpga-agilex-hsotg" compatible for Agilex platform
* tag 'socfpga_dts_update_for_v5.18_part2' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
arm64: dts: agilex: use the compatible "intel,socfpga-agilex-hsotg"
dt-bindings: usb: dwc2: add compatible "intel,socfpga-agilex-hsotg"
Link: https://lore.kernel.org/r/20220211112556.98940-2-dinguyen@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The error path of rtrs_clt_open() calls free_clt(), which in turn calls
free_permits(). This is wrong, since the error path of rtrs_clt_open() does
not need to call free_permits().
Also, moving the free_permits() call to rtrs_clt_close() makes it more
aligned with the call to alloc_permit() in rtrs_clt_open().
Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20220217030929.323849-2-haris.iqbal@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The callback function rtrs_clt_dev_release() for put_device() calls kfree(clt)
to free memory. We shouldn't call kfree(clt) again, and we can't use
clt after the kfree() either.
Replace device_register() with device_initialize() and device_add() so that
dev_set_name() can be used appropriately.
Move mutex_destroy() to the release function so it can be called in
the alloc_clt err path.
Fixes: eab0982466 ("RDMA/rtrs-clt: Refactor the failure cases in alloc_clt")
Link: https://lore.kernel.org/r/20220217030929.323849-1-haris.iqbal@ionos.com
Reported-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The 'perf record' and 'perf stat' commands support the option
'-C/--cpus' to count or collect only on the list of CPUs provided.
Commit 1d3351e631 ("perf tools: Enable on a list of CPUs for
hybrid") added support for it on hybrid platforms. For hybrid support, it
checks that the CPUs in the list are available on the hybrid PMU. But when
testing only uncore events (or events not in cpu_core and cpu_atom), there is a bug:
Before:
# perf stat -C0 -e uncore_clock/clockticks/ sleep 1
failed to use cpu list 0
In this case, the pmu_name of the uncore event is neither cpu_core nor
cpu_atom, so in evlist__fix_hybrid_cpus() perf_pmu__find_hybrid_pmu()
returns NULL and both events_nr and unmatched_count stay 0. The
cpu list check function evlist__fix_hybrid_cpus() then returns -1 and the
error "failed to use cpu list 0" is reported. Bypass the "events_nr=0" case
to fix the issue.
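The counting logic described above can be sketched as follows. This is a
simplified stand-in, not the perf source: check_cpu_list() and its boolean
arrays are hypothetical, and only the events_nr and unmatched_count names
come from the description.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns -1 only when there are hybrid events and none of them can use
 * the requested CPU list. With no hybrid events at all (events_nr == 0)
 * there is nothing to reject, so the check is bypassed.
 */
static int check_cpu_list(const bool *is_hybrid, const bool *cpu_matches,
			  size_t n)
{
	int events_nr = 0, unmatched_count = 0;

	for (size_t i = 0; i < n; i++) {
		if (!is_hybrid[i])
			continue;	/* e.g. an uncore event */
		events_nr++;
		if (!cpu_matches[i])
			unmatched_count++;
	}

	if (events_nr == 0)
		return 0;

	return unmatched_count == events_nr ? -1 : 0;
}
```

Without the early return, the comparison 0 == 0 would be true for an
uncore-only event list and the function would report a failure.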
After:
# perf stat -C0 -e uncore_clock/clockticks/ sleep 1
Performance counter stats for 'CPU(s) 0':
195,476,873 uncore_clock/clockticks/
1.004518677 seconds time elapsed
When testing with at least one core event and uncore events, it has no
issue.
# perf stat -C0 -e cpu_core/cpu-cycles/,uncore_clock/clockticks/ sleep 1
Performance counter stats for 'CPU(s) 0':
5,993,774 cpu_core/cpu-cycles/
301,025,912 uncore_clock/clockticks/
1.003964934 seconds time elapsed
Fixes: 1d3351e631 ("perf tools: Enable on a list of CPUs for hybrid")
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: alexander.shishkin@intel.com
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220218093127.1844241-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
devm_kmalloc() returns a pointer to allocated memory on success, NULL
on failure. However, lp->indirect_lock is allocated by devm_kmalloc()
without a proper check. It is better to check its value to
prevent a potential invalid memory access.
Fixes: f14f5c11f0 ("net: ll_temac: Support indirect_mutex share within TEMAC IP")
Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP sendmsg() can be lockless, which causes all kinds
of data races.
This patch converts sk->sk_tskey to an atomic_t to remove one of these races.
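A userspace sketch of the idea using C11 atomics rather than the kernel's
own atomic API. The struct and function names are hypothetical. The point
is that reading the key and advancing it happen in one indivisible step,
so two lockless senders cannot observe the same value.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-socket timestamp key. */
struct sock_sketch {
	atomic_uint tskey;
};

/* Return the key for this datagram and advance the counter atomically. */
static uint32_t next_tskey(struct sock_sketch *sk)
{
	return atomic_fetch_add_explicit(&sk->tskey, 1,
					 memory_order_relaxed);
}
```

Relaxed ordering is assumed to be enough here because the counter only
has to hand out distinct values and does not publish other data.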
BUG: KCSAN: data-race in __ip_append_data / __ip_append_data
read to 0xffff8881035d4b6c of 4 bytes by task 8877 on cpu 1:
__ip_append_data+0x1c1/0x1de0 net/ipv4/ip_output.c:994
ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg net/socket.c:725 [inline]
____sys_sendmsg+0x39a/0x510 net/socket.c:2413
___sys_sendmsg net/socket.c:2467 [inline]
__sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
__do_sys_sendmmsg net/socket.c:2582 [inline]
__se_sys_sendmmsg net/socket.c:2579 [inline]
__x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
write to 0xffff8881035d4b6c of 4 bytes by task 8880 on cpu 0:
__ip_append_data+0x1d8/0x1de0 net/ipv4/ip_output.c:994
ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg net/socket.c:725 [inline]
____sys_sendmsg+0x39a/0x510 net/socket.c:2413
___sys_sendmsg net/socket.c:2467 [inline]
__sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
__do_sys_sendmmsg net/socket.c:2582 [inline]
__se_sys_sendmmsg net/socket.c:2579 [inline]
__x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x0000054d -> 0x0000054e
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 8880 Comm: syz-executor.5 Not tainted 5.17.0-rc2-syzkaller-00167-gdcb85f85fa6f-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 09c2d251b7 ("net-timestamp: add key to disambiguate concurrent datagrams")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A malicious device can leak heap data to user space by
providing bogus frame lengths. Introduce a sanity check.
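A generic sketch of such a sanity check, assuming a hypothetical framing
with a 2-byte little-endian length header. The real driver's frame layout
is not described here, so every name and size below is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the payload length announced by the device, or -1 if it claims
 * more bytes than were actually received.
 */
static long checked_frame_len(const uint8_t *buf, size_t buf_len)
{
	const size_t hdr_len = 2;	/* hypothetical header size */
	size_t len;

	if (buf_len < hdr_len)
		return -1;

	len = (size_t)buf[0] | ((size_t)buf[1] << 8);
	if (len > buf_len - hdr_len)
		return -1;	/* bogus length: would read past the buffer */

	return (long)len;
}
```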
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reviewed-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flow table lookup is skipped if the packet either went through the ct clear
action (which sets the IP_CT_UNTRACKED flag on the packet), or is
switching zones while there is already a connection associated with
the packet. This results in no SW offload of the connection,
and in the connection not being removed from the flow table on
TCP teardown (fin/rst packet).
To fix the above, remove these unnecessary checks in flow
table lookup.
Fixes: 46475bb20f ("net/sched: act_ct: Software offload of established flows")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the netdevice is brought down or the system shuts down, a panic can be
triggered while accessing the sysfs path because the device has already been
removed.
[ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called
[ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called
...
[ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280
crash> bt
...
PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd"
...
#9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778
[exception RIP: dma_pool_alloc+0x1ab]
RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046
RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000
RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090
RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00
R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0
R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core]
#11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core]
#12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core]
#13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core]
#14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core]
#15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core]
#16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core]
#17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46
#18 [ffff89240e1a3d48] speed_show at ffffffff8f277208
#19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3
#20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf
#21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596
#22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10
#23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5
#24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff
#25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f
#26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92
crash> net_device.state ffff89443b0c0000
state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER)
To prevent this scenario, we also make sure that the netdevice is present.
Signed-off-by: suresh kumar <suresh2514@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a 6pack device is detaching, sixpack_close() cleans up the
necessary resources. Although del_timer_sync() in sixpack_close()
won't return while there is an active timer, mod_timer() in
sp_xmit_on_air() can re-arm the timer afterwards through userspace syscalls
such as ax25_sendmsg(), ax25_connect() and ax25_ioctl().
The unexpectedly re-armed handler, sp_xmit_on_air(), knows nothing about
the ongoing cleanup and may still call pty_write() to use driver layer
resources that have already been released.
One of the possible race conditions is shown below:
(USE) | (FREE)
ax25_sendmsg() |
ax25_queue_xmit() |
... |
sp_xmit() |
sp_encaps() | sixpack_close()
sp_xmit_on_air() | del_timer_sync(&sp->tx_t)
mod_timer(&sp->tx_t,...) | ...
| unregister_netdev()
| ...
(wait a while) | tty_release()
| tty_release_struct()
| release_tty()
sp_xmit_on_air() | tty_kref_put(tty_struct) //FREE
pty_write(tty_struct) //USE | ...
The corresponding fail log is shown below:
===============================================================
BUG: KASAN: use-after-free in __run_timers.part.0+0x170/0x470
Write of size 8 at addr ffff88800a652ab8 by task swapper/2/0
...
Call Trace:
...
queue_work_on+0x3f/0x50
pty_write+0xcd/0xe0
sp_xmit_on_air+0xb2/0x1f0
call_timer_fn+0x28/0x150
__run_timers.part.0+0x3c2/0x470
run_timer_softirq+0x3b/0x80
__do_softirq+0xf1/0x380
...
This patch moves the del_timer_sync() call after unregister_netdev()
to avoid UAF bugs. Because unregister_netdev() is well synchronized,
it flushes out any pending queues, waits for the refcount of the net_device
to drop to zero and removes the net_device from the kernel. No
routines are still running after unregister_netdev() has executed. Therefore,
the timer can no longer be re-armed from userspace.
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fileattr API conversion broke lsattr on ntfs3g.
Previously the ioctl(... FS_IOC_GETFLAGS) returned an EINVAL error, but
after the conversion the error returned by the fuse filesystem was not
propagated back to the ioctl() system call, resulting in success being
returned with bogus values.
Fix by checking for outarg.result in fuse_priv_ioctl(), just as generic
ioctl code does.
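A small sketch of the two-level error check, with hypothetical names: the
request can be delivered successfully (transport error 0) while the
filesystem still reports a failure in the reply's result field. Both have
to be looked at before the returned values are trusted.

```c
#include <errno.h>

/* Hypothetical reply layout: result is 0 or a negative errno. */
struct ioctl_reply_sketch {
	int result;
	unsigned int flags;
};

static int finish_ioctl(int transport_err,
			const struct ioctl_reply_sketch *reply,
			unsigned int *flags)
{
	if (transport_err < 0)
		return transport_err;	/* request never completed */
	if (reply->result < 0)
		return reply->result;	/* filesystem rejected the ioctl */

	*flags = reply->flags;		/* only now is the payload valid */
	return 0;
}
```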
Reported-by: Jean-Pierre André <jean-pierre.andre@wanadoo.fr>
Fixes: 72227eac17 ("fuse: convert to fileattr")
Cc: <stable@vger.kernel.org> # v5.13
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
On non-x86_64 builds, helpers gtod_is_based_on_tsc() and
kvm_guest_supported_xfd() are defined but never used. Because these are
static inline but are in a .c file, some compilers do warn for them with
-Wunused-function, which becomes an error if -Werror is present.
Add #ifdef so they are only defined in x86_64 builds.
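A sketch of the guard pattern in standalone C. The helper and caller
names are hypothetical, and the compiler-provided __x86_64__ macro stands
in for the kernel's own configuration symbol. The helper is only defined
where its single caller is also compiled, so no unused static function is
left behind on other architectures.

```c
#include <stdbool.h>

#if defined(__x86_64__)
/* Only defined where it is used, so -Wunused-function stays quiet. */
static inline bool helper_64bit_only(int mode)
{
	return mode == 1;
}
#endif

static bool uses_fast_path(int mode)
{
#if defined(__x86_64__)
	return helper_64bit_only(mode);
#else
	(void)mode;
	return false;
#endif
}
```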
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20220218034100.115702-1-leobras@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The Devkit8000 board seems to have always used 32k_counter as clocksource.
Restore this behavior.
With the clocksource back to 32k_counter, timer12 is now the clockevent
source (as before) and timer2 is no longer needed here.
This commit fixes the same issue observed with commit 23885389db
("ARM: dts: Fix timer regression for beagleboard revision c"), where sleep
is blocked until keys are hit over the serial console.
Fixes: aba1ad05da ("clocksource/drivers/timer-ti-dm: Add clockevent and clocksource support")
Fixes: e428e250fd ("ARM: dts: Configure system timers for omap3")
Signed-off-by: Anthoine Bourgeois <anthoine.bourgeois@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
This patch allows the lcd43 and lcd70 flavors to benefit from the timer
evolution.
Fixes: e428e250fd ("ARM: dts: Configure system timers for omap3")
Signed-off-by: Anthoine Bourgeois <anthoine.bourgeois@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The mmc0 clock gate bit was mistakenly assigned to the "i2s" clock.
You can see that the same bit is assigned to "mmc0" too.
This leads to mmc0 hanging for a long time after any sound activity,
and it also prevents PM_SLEEP from working properly.
I guess it was introduced by copy-paste from the jz4740 driver,
where it really controls the I2S clock gate.
Fixes: 226dfa4726 ("clk: Add Ingenic jz4725b CGU driver")
Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
Tested-by: Siarhei Volkau <lis8215@gmail.com>
Reviewed-by: Paul Cercueil <paul@crapouillou.net>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220205171849.687805-2-lis8215@gmail.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Just like in commit 05cf3ec00d ("clk: qcom: gcc-msm8996: Drop (again)
gcc_aggre1_pnoc_ahb_clk") adding NoC clocks turned out to be a huge
mistake, as they cause a lot of issues at little benefit (basically
letting Linux know about their children's frequencies), especially when
mishandled or misconfigured.
Adding these ones broke SDCC approx 99 out of 100 times, but that somehow
went unnoticed. To prevent further issues like this one, remove them.
This commit is effectively a revert of 74a33fac3a ("clk: qcom:
gcc-msm8994: Add missing NoC clocks") with ABI preservation.
Fixes: 74a33fac3a ("clk: qcom: gcc-msm8994: Add missing NoC clocks")
Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Link: https://lore.kernel.org/r/20220217232408.78932-1-konrad.dybcio@somainline.org
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Alexei Starovoitov says:
====================
pull-request: bpf 2022-02-17
We've added 8 non-merge commits during the last 7 day(s) which contain
a total of 8 files changed, 119 insertions(+), 15 deletions(-).
The main changes are:
1) Add schedule points in map batch ops, from Eric.
2) Fix bpf_msg_push_data with len 0, from Felix.
3) Fix crash due to incorrect copy_map_value, from Kumar.
4) Fix crash due to out of bounds access into reg2btf_ids, from Kumar.
5) Fix a bpf_timer initialization issue with clang, from Yonghong.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Add schedule points in batch ops
bpf: Fix crash due to out of bounds access into reg2btf_ids.
selftests: bpf: Check bpf_msg_push_data return value
bpf: Fix a bpf_timer initialization issue
bpf: Emit bpf_timer in vmlinux BTF
selftests/bpf: Add test for bpf_timer overwriting crash
bpf: Fix crash due to incorrect copy_map_value
bpf: Do not try bpf_msg_push_data with len 0
====================
Link: https://lore.kernel.org/r/20220217190000.37925-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot reported various soft lockups caused by bpf batch operations.
INFO: task kworker/1:1:27 blocked for more than 140 seconds.
INFO: task hung in rcu_barrier
Nothing prevents batch ops from processing a huge amount of data, so
we need to add schedule points in them.
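A userspace analogy of a schedule point, not the bpf code: a long loop
periodically gives up the CPU so other work can run. The POSIX
sched_yield() call stands in for the kernel's rescheduling helper, and the
function name and yield interval are hypothetical.

```c
#include <sched.h>
#include <stddef.h>

#define YIELD_EVERY 1024	/* hypothetical interval, for illustration */

/* Sum a large batch, yielding the CPU every YIELD_EVERY elements. */
static size_t process_batch(const int *items, size_t count, long *sum)
{
	size_t yields = 0;

	*sum = 0;
	for (size_t i = 0; i < count; i++) {
		*sum += items[i];
		if ((i + 1) % YIELD_EVERY == 0) {
			sched_yield();	/* the schedule point */
			yields++;
		}
	}
	return yields;
}
```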
Note that maybe_wait_bpf_programs(map) calls from
generic_map_delete_batch() can be factorized by moving
the call after the loop.
This will be done later in the -next tree once we get this fix merged,
unless there is a strong opinion in favor of doing this optimization sooner.
Fixes: aa2e93b8e5 ("bpf: Add generic support for update and delete batch ops")
Fixes: cb4d03ab49 ("bpf: Add generic support for lookup batch op")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Brian Vazquez <brianvv@google.com>
Link: https://lore.kernel.org/bpf/20220217181902.808742-1-eric.dumazet@gmail.com
At boot on the BCM2711, if the HDMI controllers are running, the CRTC
driver will disable itself and its associated HDMI controller to work
around a hardware bug that would leave some pixels stuck in a FIFO.
In order to avoid that issue, we need to run some operations in lockstep
between the CRTC and HDMI controller, and we need to make sure the HDMI
controller will be powered properly.
However, since we haven't enabled it through KMS, the runtime_pm state
is off at this point so we need to make sure the device is powered
through pm_runtime_resume_and_get, and once the operations are complete,
we call pm_runtime_put.
However, the HDMI controller will do that itself in its
post_crtc_powerdown, which means we'll end up calling pm_runtime_put twice
for a single pm_runtime_get, throwing the reference counting off. Let's
remove the pm_runtime_put call in the CRTC code in order to have the
proper counting.
Fixes: bca10db67b ("drm/vc4: crtc: Make sure the HDMI controller is powered when disabling")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220203102003.1114673-1-maxime@cerno.tech
With the existing logic where clear_ack is true (HW doesn’t support
auto clear for ICR), the interrupt clear register reset is not handled
properly. As a result, only the first interrupts get processed properly
and further interrupts are blocked because the interrupt
clear register is not reset.
Example of the issue where ack_invert is false and clear_ack is true:
Say the default is ISR=0x00 & ICR=0x00, and two interrupts are
triggered, making ISR = 0x11.
Step 1: ISR is set to 0x11 (store status_buf = ISR). ISR needs to
be cleared with the help of ICR once the interrupt is processed.
Step 2: Write ICR = 0x11 (status_buf); this clears the ISR to 0x00.
Step 3: Issue - in the existing code, ICR is then written with ICR =
~(status_buf), i.e. ICR = 0xEE -> this blocks all the interrupts
from being raised except for interrupts 0 and 4. The expectation here is to
reset ICR, which will unblock all the interrupts.
if (chip->clear_ack) {
if (chip->ack_invert && !ret)
........
else if (!ret)
ret = regmap_write(map, reg,
~data->status_buf[i]);
So writing 0 (or 0xff when ack_invert is true) should have no effect, other
than clearing the ACKs just set.
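The arithmetic of the example can be checked with a small standalone
sketch on 8-bit registers. The function names are hypothetical and this is
not the regmap code: it only computes which value each variant would write
to the clear register after the ack.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old second write in the non-inverted case: ~status_buf. */
static uint8_t old_reset_value_noninverted(uint8_t status)
{
	return (uint8_t)~status;
}

/* Reset value as described above: 0, or 0xff when ack_invert is true. */
static uint8_t reset_value(bool ack_invert)
{
	return ack_invert ? 0xff : 0x00;
}

/* Interrupt lines left blocked by a given clear-register value. */
static uint8_t blocked_lines(uint8_t icr)
{
	return icr;
}
```

With status 0x11 the old write yields 0xEE, leaving only bits 0 and 4
clear, while the reset value 0x00 leaves no line blocked. The
blocked_lines() helper just encodes the assumption from the example that a
set ICR bit keeps the matching interrupt from being raised.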
Fixes: 3a6f0fb7b8 ("regmap: irq: Add support to clear ack registers")
Signed-off-by: Prasad Kumpatla <quic_pkumpatl@quicinc.com>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20220217085007.30218-1-quic_pkumpatl@quicinc.com
Signed-off-by: Mark Brown <broonie@kernel.org>
When the gadget driver hasn't been (yet) configured, and the cable is
connected to a HOST, the SFTDISCON gets cleared unconditionally, so the
HOST tries to enumerate it.
At the host side, this can result in a stuck USB port or worse. When
getting lucky, some dmesg can be observed at the host side:
new high-speed USB device number ...
device descriptor read/64, error -110
Fix it in drd, by checking the enabled flag before calling
dwc2_hsotg_core_connect(). It will be called later, once configured,
by the normal flow:
- udc_bind_to_driver
- usb_gadget_connect
- dwc2_hsotg_pullup
- dwc2_hsotg_core_connect
Fixes: 17f934024e ("usb: dwc2: override PHY input signals with usb role switch support")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com>
Link: https://lore.kernel.org/r/1644999135-13478-1-git-send-email-fabrice.gasnier@foss.st.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the Bay Trail phy GPIO mappings were added, cs and reset were swapped.
This did not cause any issues so far, because so far they were always driven
high/low at the same time.
Note the new mapping has been verified both in /sys/kernel/debug/gpio
output on Android factory images on multiple devices, as well as in
the schematics for some devices.
Fixes: 5741022cbd ("usb: dwc3: pci: Add GPIO lookup table on platforms without ACPI GPIO resources")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20220213130524.18748-3-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kvm_vcpu_arch currently contains the guest supported features in both
guest_supported_xcr0 and guest_fpu.fpstate->user_xfeatures field.
Currently both fields are set to the same value in
kvm_vcpu_after_set_cpuid() and are not changed anywhere else after that.
Since it's not good to keep duplicated data, remove guest_supported_xcr0.
To keep the code more readable, introduce kvm_guest_supported_xcr()
and kvm_guest_supported_xfd() to replace the previous usages of
guest_supported_xcr0.
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20220217053028.96432-3-leobras@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
During host/guest switch (like in kvm_arch_vcpu_ioctl_run()), the kernel
swaps the fpu between host/guest contexts, by using fpu_swap_kvm_fpstate().
When xsave feature is available, the fpu swap is done by:
- xsave(s) instruction, with guest's fpstate->xfeatures as mask, is used
to store the current state of the fpu registers to a buffer.
- xrstor(s) instruction, with (fpu_kernel_cfg.max_features &
XFEATURE_MASK_FPSTATE) as mask, is used to put the buffer into fpu regs.
For xsave(s) the mask is used to limit what parts of the fpu regs will
be copied to the buffer. Likewise on xrstor(s), the mask is used to
limit what parts of the fpu regs will be changed.
The mask for xsave(s), the guest's fpstate->xfeatures, is defined on
kvm_arch_vcpu_create(), which (in summary) sets it to all features
supported by the cpu which are enabled on kernel config.
This means that xsave(s) will save to guest buffer all the fpu regs
contents the cpu has enabled when the guest is paused, even if they
are not used.
This would not be an issue, if xrstor(s) would also do that.
xrstor(s)'s mask for host/guest swap is basically every valid feature
contained in kernel config, except XFEATURE_MASK_PKRU.
According to the kernel source, PKRU is instead switched in switch_to() and
flush_thread().
Then, the following happens when a host supporting PKRU starts a
guest that does not support it:
1 - Host has XFEATURE_MASK_PKRU set. 1st switch to guest,
2 - xsave(s) fpu regs to host fpustate (buffer has XFEATURE_MASK_PKRU)
3 - xrstor(s) guest fpustate to fpu regs (fpu regs have XFEATURE_MASK_PKRU)
4 - guest runs, then switch back to host,
5 - xsave(s) fpu regs to guest fpstate (buffer now has XFEATURE_MASK_PKRU)
6 - xrstor(s) host fpstate to fpu regs.
7 - kvm_vcpu_ioctl_x86_get_xsave() copies guest fpstate to userspace (with
XFEATURE_MASK_PKRU, which should not be supported by guest vcpu)
On 5, even though the guest does not support PKRU, it does have the flag
set on guest fpstate, which is transferred to userspace via vcpu ioctl
KVM_GET_XSAVE.
This becomes a problem when the user decides to migrate the above guest
to another machine that does not support PKRU: the new host restores the
guest's fpu regs to what they were before (xrstor(s)), but since the new
host doesn't support PKRU, a general-protection exception occurs in xrstor(s)
and that crashes the guest.
This can be solved by making the guest's fpstate->user_xfeatures hold
a copy of guest_supported_xcr0. This way, on 7 the only flags copied to
userspace will be the ones compatible with the guest requirements, and thus
there will be no issue during migration.
As a bonus, it will also fail if userspace tries to set fpu features
(with the KVM_SET_XSAVE ioctl) that are not compatible with the guest
configuration. Such features will never be returned by KVM_GET_XSAVE
or KVM_GET_XSAVE2.
Also, since kvm_vcpu_after_set_cpuid() now sets fpstate->user_xfeatures,
there is no need to set it in kvm_check_cpuid(). So, change
fpstate_realloc() so it does not touch fpstate->user_xfeatures if a
non-NULL guest_fpu is passed, which is the case when kvm_check_cpuid()
calls it.
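The masking effect can be illustrated with plain bit operations. This is
a sketch, not KVM code: the macro and function names are hypothetical, the
bit positions follow the XSAVE state-component numbering (PKRU is
component 9), and the -1 return value merely stands for "rejected" since
the real error code is not modeled.

```c
#include <stdint.h>

#define XFEATURE_SKETCH_FP	(1ULL << 0)
#define XFEATURE_SKETCH_SSE	(1ULL << 1)
#define XFEATURE_SKETCH_PKRU	(1ULL << 9)

/* Features reported to userspace: what was saved, limited to the guest's set. */
static uint64_t xfeatures_for_user(uint64_t saved, uint64_t user_xfeatures)
{
	return saved & user_xfeatures;
}

/* Reject a restore request naming a feature outside the guest's set. */
static int check_restore(uint64_t requested, uint64_t user_xfeatures)
{
	return (requested & ~user_xfeatures) ? -1 : 0;
}
```

With a guest limited to FP and SSE, a saved state that also carries the
PKRU bit is reported without it, and a restore request that includes PKRU
is refused.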
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20220217053028.96432-2-leobras@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If a vcpu has tsc_always_catchup set, each request updates pvclock data.
KVM_HC_CLOCK_PAIRING consumers such as ptp_kvm_x86 rely on a tsc read on the
host's side and do the hypercall inside the pvclock_read_retry loop, leading
to an infinite loop in such a situation.
v3:
Removed warn
Changed return code to KVM_EFAULT
v2:
Added warn
Signed-off-by: Anton Romanov <romanton@google.com>
Message-Id: <20220216182653.506850-1-romanton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
I saw the splat below after the host suspended and resumed.
WARNING: CPU: 0 PID: 2943 at kvm/arch/x86/kvm/../../../virt/kvm/kvm_main.c:5531 kvm_resume+0x2c/0x30 [kvm]
CPU: 0 PID: 2943 Comm: step_after_susp Tainted: G W IOE 5.17.0-rc3+ #4
RIP: 0010:kvm_resume+0x2c/0x30 [kvm]
Call Trace:
<TASK>
syscore_resume+0x90/0x340
suspend_devices_and_enter+0xaee/0xe90
pm_suspend.cold+0x36b/0x3c2
state_store+0x82/0xf0
kernfs_fop_write_iter+0x1b6/0x260
new_sync_write+0x258/0x370
vfs_write+0x33f/0x510
ksys_write+0xc9/0x160
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
lockdep_is_held() can return -1 when lockdep is disabled which triggers
this warning. Let's use lockdep_assert_not_held() which can detect
incorrect calls while holding a lock and it also avoids false negatives
when lockdep is disabled.
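Why a -1 return value trips a plain truth test can be shown with a tiny
sketch. The enum and function names are hypothetical stand-ins for the
three states involved (unknown, not held, held) and are not the lockdep
API.

```c
#include <stdbool.h>

/* Hypothetical tri-state result of asking "is this lock held?". */
enum lock_state_sketch {
	STATE_UNKNOWN = -1,	/* checker disabled, cannot tell */
	STATE_NOT_HELD = 0,
	STATE_HELD = 1,
};

/* Naive check: warns on any nonzero value, including "unknown". */
static bool naive_check_warns(int state)
{
	return state != 0;
}

/* Strict check: warns only when the lock is positively known to be held. */
static bool strict_check_warns(int state)
{
	return state == STATE_HELD;
}
```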
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1644920142-81249-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Follow the precedent set by other architectures that support the VCPU
ioctl, KVM_ENABLE_CAP, and advertise the VM extension, KVM_CAP_ENABLE_CAP.
This way, userspace can ensure that KVM_ENABLE_CAP is available on a
vcpu before using it.
Fixes: 5c919412fe ("kvm/x86: Hyper-V synthetic interrupt controller")
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Message-Id: <20220214212950.1776943-1-aaronlewis@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In order to properly emulate the WFI instruction, KVM reads back
ICH_VMCR_EL2 and enables doorbells for GICv4. These preparations are
necessary in order to recognize pending interrupts in
kvm_arch_vcpu_runnable() and return to the guest. Until recently, this
work was done by kvm_arch_vcpu_{blocking,unblocking}(). Since commit
6109c5a6ab ("KVM: arm64: Move vGIC v4 handling for WFI out arch
callback hook"), these callbacks were gutted and superseded by
kvm_vcpu_wfi().
It is important to note that KVM implements PSCI CPU_SUSPEND calls as
a WFI within the guest. However, the implementation calls directly into
kvm_vcpu_halt(), which skips the needed work done in kvm_vcpu_wfi()
to detect pending interrupts. Fix the issue by calling the WFI helper.
Fixes: 6109c5a6ab ("KVM: arm64: Move vGIC v4 handling for WFI out arch callback hook")
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220217101242.3013716-1-oupton@google.com
Commit 817b8b9c53 ("HID: elo: fix memory leak in elo_probe") introduced
a memory leak on the error path, but more importantly the whole USB reference
counting is not needed at all in the first place, as the driver itself
doesn't change the reference counting in any way, and the associated
usb_device is guaranteed to be kept around by USB core as long as the
driver binding exists.
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: fbf42729d0 ("HID: elo: update the reference count of the usb device structure")
Fixes: 817b8b9c53 ("HID: elo: fix memory leak in elo_probe")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Sparse warns about the following cast in the function
falcon_copy_firmware_image() ...
drivers/gpu/drm/tegra/falcon.c:66:27: warning: cast to restricted __le32
Fix this by casting the firmware data array to __le32 instead of u32.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
When commit e6ac2450d6 ("bpf: Support bpf program calling kernel function") added
kfunc support, it defined reg2btf_ids as a cheap way to translate the verifier
reg type to the appropriate btf_vmlinux BTF ID, however
commit c25b2ae136 ("bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL")
moved the __BPF_REG_TYPE_MAX from the last member of bpf_reg_type enum to after
the base register types, and defined other variants using type flag
composition. However, now, the direct usage of reg->type to index into
reg2btf_ids may no longer fall into __BPF_REG_TYPE_MAX range, and hence lead to
out of bounds access and kernel crash on dereference of bad pointer.
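The indexing problem can be sketched in plain C, assuming a simplified
register-type enum. All names below (reg_type, BASE_TYPE_MASK, type_to_id,
lookup_id) are hypothetical stand-ins and not the verifier's definitions;
the point is only that flag bits have to be stripped before the value is
used as an array index.

```c
/* Simplified model: base types first, flags composed on top of them. */
enum reg_type {
	REG_NOT_INIT = 0,
	REG_PTR_TO_SOCKET,
	REG_PTR_TO_TCP_SOCK,
	REG_TYPE_MAX,		/* end of the base types */
};

#define BASE_TYPE_BITS	8
#define BASE_TYPE_MASK	((1u << BASE_TYPE_BITS) - 1)
#define PTR_MAYBE_NULL	(1u << BASE_TYPE_BITS)	/* flag bit above the base */

static const int type_to_id[REG_TYPE_MAX] = { 0, 101, 102 };

/* Returns the table entry, or -1 if the base type is out of range. */
int lookup_id(unsigned int type)
{
	unsigned int base = type & BASE_TYPE_MASK;	/* strip flags first */

	if (base >= REG_TYPE_MAX)
		return -1;
	return type_to_id[base];
}
```

Indexing type_to_id[] with the unmasked value (e.g. REG_PTR_TO_SOCKET |
PTR_MAYBE_NULL) would read far past the end of the table.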
Fixes: c25b2ae136 ("bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220216201943.624869-1-memxor@gmail.com
After enabling CONFIG_SCHED_CORE (landed during 5.14 cycle),
2-core 2-thread-per-core interAptiv (CPS-driven) started emitting
the following:
[ 0.025698] CPU1 revision is: 0001a120 (MIPS interAptiv (multi))
[ 0.048183] ------------[ cut here ]------------
[ 0.048187] WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:6025 sched_core_cpu_starting+0x198/0x240
[ 0.048220] Modules linked in:
[ 0.048233] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.17.0-rc3+ #35 b7b319f24073fd9a3c2aa7ad15fb7993eec0b26f
[ 0.048247] Stack : 817f0000 00000004 327804c8 810eb050 00000000 00000004 00000000 c314fdd1
[ 0.048278] 830cbd64 819c0000 81800000 817f0000 83070bf4 00000001 830cbd08 00000000
[ 0.048307] 00000000 00000000 815fcbc4 00000000 00000000 00000000 00000000 00000000
[ 0.048334] 00000000 00000000 00000000 00000000 817f0000 00000000 00000000 817f6f34
[ 0.048361] 817f0000 818a3c00 817f0000 00000004 00000000 00000000 4dc33260 0018c933
[ 0.048389] ...
[ 0.048396] Call Trace:
[ 0.048399] [<8105a7bc>] show_stack+0x3c/0x140
[ 0.048424] [<8131c2a0>] dump_stack_lvl+0x60/0x80
[ 0.048440] [<8108b5c0>] __warn+0xc0/0xf4
[ 0.048454] [<8108b658>] warn_slowpath_fmt+0x64/0x10c
[ 0.048467] [<810bd418>] sched_core_cpu_starting+0x198/0x240
[ 0.048483] [<810c6514>] sched_cpu_starting+0x14/0x80
[ 0.048497] [<8108c0f8>] cpuhp_invoke_callback_range+0x78/0x140
[ 0.048510] [<8108d914>] notify_cpu_starting+0x94/0x140
[ 0.048523] [<8106593c>] start_secondary+0xbc/0x280
[ 0.048539]
[ 0.048543] ---[ end trace 0000000000000000 ]---
[ 0.048636] Synchronize counters for CPU 1: done.
...for each but CPU 0/boot.
Basic debug printks right before the mentioned line say:
[ 0.048170] CPU: 1, smt_mask:
So smt_mask, which is sibling mask obviously, is empty when entering
the function.
This is critical, as sched_core_cpu_starting() calculates
core-scheduling parameters only once per CPU start, and it's crucial
to have all the parameters filled in at that moment (at least it
uses cpu_smt_mask() which in fact is `&cpu_sibling_map[cpu]` on
MIPS).
A bit of debugging showed that set_cpu_sibling_map(), which performs
the actual map calculation, was being invoked after
notify_cpu_starting(), and it is exactly the latter function that
starts the CPU HP callback round (sched_core_cpu_starting() is
basically a CPU HP callback).
While the flow is the same on ARM64 (maps after the notifier, although
before calling set_cpu_online()), x86 started calculating sibling
maps earlier than starting the CPU HP callbacks in Linux 4.14 (see
[0] for the reference). Neither I nor my brief tests could find
any potential caveats in calculating the maps right after performing
delay calibration, and the WARN splat is now gone.
The very same debug prints now yield exactly what I expected from
them:
[ 0.048433] CPU: 1, smt_mask: 0-1
[0] https://git.kernel.org/pub/scm/linux/kernel/git/mips/linux.git/commit/?id=76ce7cfe35ef
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
It's reported that current memory detection code occasionally detects
larger memory under some bootloaders.
Current memory detection code tests whether address space wraps around
on KSEG0, which is unreliable because it's cached.
Rewrite memory size detection to perform the same test on KSEG1 instead.
While at it, this patch also does the following two things:
1. use a fixed pattern instead of a random function pointer as the magic
value.
2. add an additional memory write and a second comparison as part of the
test to prevent possible smaller memory detection result due to
leftover values in memory.
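A minimal userspace sketch of the two-step wrap-around test, assuming a
simulated RAM whose addresses alias modulo its real size. The names
(sim_ram, wraps_at, detect_words) and the pattern value are made up for
illustration; the real code operates on uncached KSEG1 addresses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TEST_PATTERN 0xaa5555aau	/* arbitrary fixed pattern (assumption) */

/* Simulated RAM whose address lines wrap at real_words. */
struct sim_ram {
	uint32_t *cells;
	size_t real_words;
};

static uint32_t ram_read(const struct sim_ram *r, size_t idx)
{
	return r->cells[idx % r->real_words];
}

static void ram_write(struct sim_ram *r, size_t idx, uint32_t v)
{
	r->cells[idx % r->real_words] = v;
}

/* True if offset `words` aliases offset 0, i.e. memory wraps there. */
bool wraps_at(struct sim_ram *r, size_t words)
{
	ram_write(r, 0, TEST_PATTERN);
	if (ram_read(r, 0) != ram_read(r, words))
		return false;
	/* Second write rules out a leftover value that merely matched. */
	ram_write(r, 0, ~TEST_PATTERN);
	return ram_read(r, 0) == ram_read(r, words);
}

/* Doubles the candidate size until a wrap-around is seen. */
size_t detect_words(struct sim_ram *r, size_t min_words, size_t max_words)
{
	size_t size;

	for (size = min_words; size < max_words; size <<= 1)
		if (wraps_at(r, size))
			break;
	return size;
}
```

Without the second write and comparison, a stale cell that happens to hold
the pattern would be mistaken for a wrap-around and a too-small size would
be reported.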
Fixes: 139c949f7f ("MIPS: ralink: mt7621: add memory detection support")
Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Chuanhong Guo <gch981213@gmail.com>
Tested-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Tested-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Currently, the following error messages are seen during boot:
asoc-simple-card sound: control 2:0:0:SPDIF Switch:0 is already present
cs4265 1-004f: ASoC: failed to add widget SPDIF dapm kcontrol SPDIF Switch: -16
Quoting Mark Brown:
"The driver is just plain buggy, it defines both a regular SPIDF Switch
control and a SND_SOC_DAPM_SWITCH() called SPDIF both of which will
create an identically named control, it can never have loaded without
error. One or both of those has to be renamed or they need to be
merged into one thing."
Fix the duplicated control name by combining the two SPDIF controls here
and move the register bits onto the DAPM widget and have DAPM control them.
Fixes: f853d6b3ba ("ASoC: cs4265: Add a S/PDIF enable switch")
Signed-off-by: Fabio Estevam <festevam@denx.de>
Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20220215120514.1760628-1-festevam@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The new TegraDRM UAPI uses syncpoint waiting with timeout set to
zero to indicate reading the syncpoint value. To support that we
need to return the syncpoint value always when waiting.
Fixes: 44e9613813 ("drm/tegra: Implement syncpoint wait UAPI")
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Different add-ons to the wheel base report different models. Add the
models reported when no wheel is mounted to the base and when the open
wheel attachment is used.
Signed-off-by: Michael Hübner <michaelh.95@t-online.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
As of logitech lightspeed receiver fw version 04.02.B0009,
HIDPP_PARAM_DEVICE_INFO is being reported as 0x11.
With patch "HID: logitech-dj: add support for the new lightspeed receiver
iteration", the mouse starts to error out with:
logitech-djreceiver: unusable device of type UNKNOWN (0x011) connected on
slot 1
and becomes unusable.
This has been noticed on a Logitech G Pro X Superlight fw MPM 25.01.B0018.
Signed-off-by: Lucas Zampieri <lzampier@redhat.com>
Acked-by: Nestor Lopez Casado <nlopezcasad@logitech.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
With v2 hardware, an IRQ can be configured to trigger on both edges via
a bit in the int_bothedge register. Currently, the driver sets this bit
when changing the trigger type to IRQ_TYPE_EDGE_BOTH, but fails to reset
this bit if the trigger type is later changed to something else. This
causes spurious IRQs, and when using gpio-keys with wakeup-event-action
set to EV_ACT_(DE)ASSERTED, those IRQs translate into spurious wakeups.
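The intended register update can be sketched as follows. This is a
standalone model with hypothetical names (trigger, update_bothedge), not the
driver's code; it only shows the both-edge bit being cleared whenever any
other trigger type is selected.

```c
#include <stdint.h>

enum trigger {
	TRIG_EDGE_RISING,
	TRIG_EDGE_FALLING,
	TRIG_EDGE_BOTH,
	TRIG_LEVEL_HIGH,
};

/* Returns the new value of a both-edge register for the given pin. */
uint32_t update_bothedge(uint32_t reg, unsigned int pin, enum trigger type)
{
	uint32_t mask = UINT32_C(1) << pin;

	if (type == TRIG_EDGE_BOTH)
		reg |= mask;
	else
		reg &= ~mask;	/* clear the bit for every other trigger type */
	return reg;
}
```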
Fixes: 3bcbd1a85b ("gpio/rockchip: support next version gpio controller")
Reported-by: Guillaume Savaton <guillaume@baierouge.fr>
Tested-by: Guillaume Savaton <guillaume@baierouge.fr>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
There are two problems with the current code that have been highlighted
by the AQL feature that is now enabled by default.
First problem is in ieee80211_rx_h_mesh_fwding(),
ieee80211_select_queue_80211() is used on received packets to choose
the sending AC queue of the forwarding packet although this function
should only be called on TX packet (it uses ieee80211_tx_info).
This ends with forwarded mesh packets being sent on an unrelated, random AC
queue. To fix that, the AC queue can be inferred directly from skb->priority,
which has been extracted from QOS info (see ieee80211_parse_qos()).
Second problem is the value of queue_mapping set on forwarded mesh
frames via skb_set_queue_mapping() is not the AC of the packet but a
hardware queue index. This may or may not work depending on AC to HW
queue mapping which is driver specific.
Both of these issues lead to improper AC selection while forwarding
mesh packets and, more importantly, the resulting improper airtime
accounting (which is done on a per STA, per AC basis) caused traffic
stalls with the introduction of AQL.
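For illustration, deriving the AC from a packet priority can be sketched as
a plain table lookup. The table below follows the usual 802.1D user
priority to access category mapping; the enum and function names are
hypothetical, and the low three bits of the priority are assumed to carry
the user priority.

```c
/* Access categories, numbered with VO as 0 and BK as 3. */
enum ac { AC_VO = 0, AC_VI = 1, AC_BE = 2, AC_BK = 3 };

/* 802.1D user priority (0..7) to access category. */
static const enum ac up_to_ac[8] = {
	AC_BE, AC_BK, AC_BK, AC_BE, AC_VI, AC_VI, AC_VO, AC_VO,
};

/* Derive the AC from a packet priority; only the low 3 bits are used. */
enum ac ac_from_priority(unsigned int priority)
{
	return up_to_ac[priority & 7];
}
```

The value returned here is an AC, not a hardware queue index, which is the
distinction the second problem above is about.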
Fixes: cf44012810 ("mac80211: fix unnecessary frame drops in mesh fwding")
Fixes: d3c1597b8d ("mac80211: fix forwarded mesh frame queue mapping")
Co-developed-by: Remi Pommarel <repk@triplefau.lt>
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Signed-off-by: Nicolas Escande <nico.escande@gmail.com>
Link: https://lore.kernel.org/r/20220214173214.368862-1-nico.escande@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
mac80211 advertises the NL80211_EXT_FEATURE_CONTROL_PORT_OVER_NL80211
capability to the upper layer by default. That means EAPoL packets should be
passed through the nl80211 path only, and the EAPoL skb should not be sent to
the netdevice directly. Meanwhile, wpa_supplicant does not register a socket
to listen for EAPoL skbs on the netdevice.
However, there is no control_port_protocol handler in mac80211 for 802.3 RX
packets, so if SUPPORTS_RX_DECAP_OFFLOAD is enabled, mac80211 passes the
EAPoL rekey frame up to the netdevice, where wpa_supplicant never sees it.
As a result, the STA always fails to rekey if the EAPoL frame goes through
the 802.3 path.
To avoid this problem, handle this frame the same way as for the 802.11
type before putting it into the network stack.
This also addresses a potential security issue in 802.3 RX mode that was
previously fixed in commit a8c4d76a8d ("mac80211: do not accept/forward
invalid EAPOL frames").
Cc: stable@vger.kernel.org # 5.12+
Fixes: 80a915ec44 ("mac80211: add rx decapsulation offload support")
Signed-off-by: Deren Wu <deren.wu@mediatek.com>
Link: https://lore.kernel.org/r/6889c9fced5859ebb088564035f84fd0fa792a49.1644680751.git.deren.wu@mediatek.com
[fix typos, update comment and add note about security issue]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Speculation attacks against some high-performance processors can
make use of branch history to influence future speculation as part of
a spectre-v2 attack. This is not mitigated by CSV2, meaning CPUs that
previously reported 'Not affected' are now moderately mitigated by CSV2.
Update the value in /sys/devices/system/cpu/vulnerabilities/spectre_v2
to also show the state of the BHB mitigation.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
The Spectre-BHB workaround adds a firmware call to the vectors. This
is needed on some CPUs, but not others. To avoid the unaffected CPU in
a big/little pair from making the firmware call, create per cpu vectors.
The per-cpu vectors only apply when returning from EL0.
Systems using KPTI can use the canonical 'full-fat' vectors directly at
EL1, the trampoline exit code will switch to this_cpu_vector on exit to
EL0. Systems not using KPTI should always use this_cpu_vector.
this_cpu_vector will point at a vector in tramp_vecs or
__bp_harden_el1_vectors, depending on whether KPTI is in use.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
The trampoline code needs to use the address of symbols in the wider
kernel, e.g. vectors. PC-relative addressing wouldn't work as the
trampoline code doesn't run at the address the linker expected.
tramp_ventry uses a literal pool, unless CONFIG_RANDOMIZE_BASE is
set, in which case it uses the data page as a literal pool because
the data page can be unmapped when running in user-space, which is
required for CPUs vulnerable to meltdown.
Pull this logic out as a macro, instead of adding a third copy
of it.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Some CPUs affected by Spectre-BHB need a sequence of branches, or a
firmware call to be run before any indirect branch. This needs to go
in the vectors. No CPU needs both.
While this can be patched in, it would run on all CPUs as there is a
single set of vectors. If only one part of a big/little combination is
affected, the unaffected CPUs have to run the mitigation too.
Create extra vectors that include the sequence. Subsequent patches will
allow affected CPUs to select this set of vectors. Later patches will
modify the loop count to match what the CPU requires.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
In the rework of btrfs_defrag_file(), we always call
defrag_one_cluster() and increase the offset by cluster size, which is
only 256K.
But there are cases where we have a large extent (e.g. 128M) which
doesn't need to be defragged at all.
Before the refactor, we can directly skip the range, but now we have to
scan that extent map again and again until the cluster moves after the
non-target extent.
Fix the problem by allowing defrag_one_cluster() to increase
btrfs_defrag_ctrl::last_scanned to the end of an extent, if and only if
the last extent of the cluster is not a target.
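The effect on the number of scan iterations can be modeled with a small
standalone loop. This is only a sketch under simplifying assumptions: one
non-target extent starting at offset 0, and a hypothetical helper
count_clusters() standing in for the real scan.

```c
#include <stdint.h>

#define CLUSTER_SIZE UINT64_C(262144)	/* 256K */

/*
 * Counts how many clusters are visited over [0, file_size) when a single
 * non-target extent covers [0, skip_extent_end). With `skip` set, a cluster
 * that ends inside that extent jumps straight to the extent's end.
 */
unsigned int count_clusters(uint64_t file_size, uint64_t skip_extent_end,
			    int skip)
{
	uint64_t cur = 0;
	unsigned int loops = 0;

	while (cur < file_size) {
		uint64_t next = cur + CLUSTER_SIZE;

		loops++;
		if (skip && next < skip_extent_end)
			next = skip_extent_end;
		cur = next;
	}
	return loops;
}
```

For a 64M non-target extent followed by 16K of data, this model visits 257
clusters without the jump and 2 with it.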
The test script looks like this:
mkfs.btrfs -f $dev > /dev/null
mount $dev $mnt
# As btrfs ioctl uses 32M as extent_threshold
xfs_io -f -c "pwrite 0 64M" $mnt/file1
sync
# Some fragmented range to defrag
xfs_io -s -c "pwrite 65548k 4k" \
-c "pwrite 65544k 4k" \
-c "pwrite 65540k 4k" \
-c "pwrite 65536k 4k" \
$mnt/file1
sync
echo "=== before ==="
xfs_io -c "fiemap -v" $mnt/file1
echo "=== after ==="
btrfs fi defrag $mnt/file1
sync
xfs_io -c "fiemap -v" $mnt/file1
umount $mnt
With extra ftrace put into defrag_one_cluster(), before the patch it
would result in tons of loops:
(As defrag_one_cluster() is inlined, the function name is its caller)
btrfs-126062 [005] ..... 4682.816026: btrfs_defrag_file: r/i=5/257 start=0 len=262144
btrfs-126062 [005] ..... 4682.816027: btrfs_defrag_file: r/i=5/257 start=262144 len=262144
btrfs-126062 [005] ..... 4682.816028: btrfs_defrag_file: r/i=5/257 start=524288 len=262144
btrfs-126062 [005] ..... 4682.816028: btrfs_defrag_file: r/i=5/257 start=786432 len=262144
btrfs-126062 [005] ..... 4682.816028: btrfs_defrag_file: r/i=5/257 start=1048576 len=262144
...
btrfs-126062 [005] ..... 4682.816043: btrfs_defrag_file: r/i=5/257 start=67108864 len=262144
But with this patch there will be just one loop, then directly to the
end of the extent:
btrfs-130471 [014] ..... 5434.029558: defrag_one_cluster: r/i=5/257 start=0 len=262144
btrfs-130471 [014] ..... 5434.029559: defrag_one_cluster: r/i=5/257 start=67108864 len=16384
CC: stable@vger.kernel.org # 5.16
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
bpf_msg_push_data may return a non-zero value to indicate an error. The
return value should be checked to prevent undetected errors.
To indicate an error, the BPF programs now perform a different action
than their intended one to make the userspace test program notice the
error, i.e., the programs supposed to pass/redirect drop, the program
supposed to drop passes.
Fixes: 84fbfe026a ("bpf: test_sockmap add options to use msg_push_data")
Signed-off-by: Felix Maurer <fmaurer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/89f767bb44005d6b4dd1f42038c438f76b3ebfad.1644601294.git.fmaurer@redhat.com
kpti is an optional feature; for systems not using kpti, a set of
vectors for the spectre-bhb mitigations is needed.
Add another set of vectors, __bp_harden_el1_vectors, that will be
used if a mitigation is needed and kpti is not in use.
The EL1 ventries are repeated verbatim as there is no additional
work needed for entry from EL1.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Adding a second set of vectors to .entry.tramp.text will make it
larger than a single 4K page.
Allow the trampoline text to occupy up to three pages by adding two
more fixmap slots. Previous changes to tramp_valias allowed it to reach
beyond a single page.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Spectre-BHB needs to add sequences to the vectors. Having one global
set of vectors is a problem for big/little systems where the sequence
is costly on cpus that are not vulnerable.
Making the vectors per-cpu in the style of KVM's bh_harden_hyp_vecs
requires the vectors to be generated by macros.
Make the kpti re-mapping of the kernel optional, so the macros can be
used without kpti.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
The macros for building the kpti trampoline are all behind
CONFIG_UNMAP_KERNEL_AT_EL0, and in a region that outputs to the
.entry.tramp.text section.
Move the macros out so they can be used to generate other kinds of
trampoline. Only the symbols need to be guarded by
CONFIG_UNMAP_KERNEL_AT_EL0 and appear in the .entry.tramp.text section.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
The tramp_ventry macro uses tramp_vectors as the address of the vectors
when calculating which ventry in the 'full fat' vectors to branch to.
While there is one set of tramp_vectors, this will be true.
Adding multiple sets of vectors will break this assumption.
Move the generation of the vectors to a macro, and pass the start
of the vectors as an argument to tramp_ventry.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Systems using kpti enter and exit the kernel through a trampoline mapping
that is always mapped, even when the kernel is not. tramp_valias is a macro
to find the address of a symbol in the trampoline mapping.
Adding extra sets of vectors will expand the size of the entry.tramp.text
section to beyond 4K. tramp_valias will be unable to generate addresses
for symbols beyond 4K as it uses the 12 bit immediate of the add
instruction.
As there are now two registers available when tramp_alias is called,
use the extra register to avoid the 4K limit of the 12 bit immediate.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
The trampoline code has a data page that holds the address of the vectors,
which is unmapped when running in user-space. This ensures that with
CONFIG_RANDOMIZE_BASE, the randomised address of the kernel can't be
discovered until after the kernel has been mapped.
If the trampoline text page is extended to include multiple sets of
vectors, it will be larger than a single page, making it tricky to
find the data page without knowing the size of the trampoline text
pages, which will vary with PAGE_SIZE.
Move the data page to appear before the text page. This allows the
data page to be found without knowing the size of the trampoline text
pages. 'tramp_vectors' is used to refer to the beginning of the
.entry.tramp.text section, do that explicitly.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Kpti stashes x30 in far_el1 while it uses x30 for all its work.
Making the vectors a per-cpu data structure will require a second
register.
Allow tramp_exit two registers before it unmaps the kernel, by
leaving x30 on the stack, and stashing x29 in far_el1.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Subsequent patches will add additional sets of vectors that use
the same tricks as the kpti vectors to reach the full-fat vectors.
The full-fat vectors contain some cleanup for kpti that is patched
in by alternatives when kpti is in use. Once there are additional
vectors, the cleanup will be needed in more cases.
But on big/little systems, the cleanup would be harmful if no
trampoline vector were in use. Instead of forcing CPUs that don't
need a trampoline vector to use one, make the trampoline cleanup
optional.
Entry at the top of the vectors will skip the cleanup. The trampoline
vectors can then skip the first instruction, triggering the cleanup
to run.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
CPUs vulnerable to Spectre-BHB either need to make an SMC-CC firmware
call from the vectors, or run a sequence of branches. This gets added
to the hyp vectors. If there is no support for arch-workaround-1 in
firmware, the indirect vector will be used.
kvm_init_vector_slots() only initialises the two indirect slots if
the platform is vulnerable to Spectre-v3a. pKVM's hyp_map_vectors()
only initialises __hyp_bp_vect_base if the platform is vulnerable to
Spectre-v3a.
As there are about to be more users of the indirect vectors, ensure
their entries in hyp_spectre_vector_selector[] are always initialised,
and __hyp_bp_vect_base defaults to the regular VA mapping.
The Spectre-v3a check is moved to a helper
kvm_system_needs_idmapped_vectors(), and merged with the code
that creates the hyp mappings.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
The spectre-v4 sequence includes an SMC from the assembly entry code.
spectre_v4_patch_fw_mitigation_conduit is the patching callback that
generates an HVC or SMC depending on the SMCCC conduit type.
As this isn't specific to spectre-v4, rename it
smccc_patch_fw_mitigation_conduit so it can be re-used.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Subsequent patches add even more code to the ventry slots.
Ensure kernels that overflow a ventry slot don't get built.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
It appears that a last-minute change moved the ACPI ID of Alder Lake-M
to INTC1055, which is already in the driver.
This ID on the other hand will be used elsewhere.
This reverts commit 258435a1c8.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
The -ENODEV return value from xhci_check_args() is incorrectly changed
to -EINVAL in a couple of places before being propagated further.
xhci_check_args() returns 4 types of value, -ENODEV, -EINVAL, 1 and 0.
xhci_urb_enqueue and xhci_check_streams_endpoint return -EINVAL if
the return value of xhci_check_args <= 0.
This causes problems for, e.g., r8152_submit_rx, which calls usb_submit_urb
in drivers/net/usb/r8152.c.
r8152_submit_rx will never get -ENODEV after submitting an urb when the xHC
is halted, because xhci_urb_enqueue returns -EINVAL at the very beginning.
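One way to keep the original error code intact can be sketched as below.
This is a standalone illustration: propagate_check() is a hypothetical
helper, and its argument stands for the four possible results listed above
(-ENODEV, -EINVAL, 0 and 1).

```c
#include <errno.h>

/*
 * Map the result of an argument check onto a caller's return value.
 * Only a result of 1 means "go ahead"; negative results are passed
 * through unchanged and 0 is reported as -EINVAL.
 */
int propagate_check(int check_ret)
{
	if (check_ret <= 0)
		return check_ret ? check_ret : -EINVAL;	/* keep -ENODEV intact */
	return 0;
}
```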
[commit message and header edit -Mathias]
Fixes: 203a86613f ("xhci: Avoid NULL pointer deref when host dies.")
Cc: stable@vger.kernel.org
Signed-off-by: Hongyu Xie <xiehongyu1@kylinos.cn>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20220215123320.1253947-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit e0082698b6 ("usb: dwc3: ulpi: conditionally resume ULPI PHY")
fixed an issue where ULPI transfers would time out if any requests were
sent to the phy sometime after init, giving it enough time to auto-suspend.
Commit e5f4ca3fce ("usb: dwc3: ulpi: Fix USB2.0 HS/FS/LS PHY suspend
regression") changed the behavior so that, instead of clearing the
DWC3_GUSB2PHYCFG_SUSPHY bit, an extra sleep is added when it is set.
But on Bay Trail devices, when phy_set_mode() gets called during init,
this leads to errors like these:
[ 28.451522] tusb1210 dwc3.ulpi: error -110 writing val 0x01 to reg 0x0a
[ 28.464089] tusb1210 dwc3.ulpi: error -110 writing val 0x01 to reg 0x0a
Add "snps,dis_u2_susphy_quirk" to the settings for Bay Trail devices to
fix this. This restores the old behavior for Bay Trail devices, since
previously the DWC3_GUSB2PHYCFG_SUSPHY bit would get cleared on the first
ulpi_read/_write() and then was never set again.
Fixes: e5f4ca3fce ("usb: dwc3: ulpi: Fix USB2.0 HS/FS/LS PHY suspend regression")
Cc: stable@kernel.org
Cc: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20220213130524.18748-2-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As previously discussed (https://lkml.org/lkml/2022/1/20/51),
cpuset_attach() is affected by a similar cpu hotplug race,
as in the following scenario:
     cpuset_attach()                            cpu hotplug
    ---------------------------                 ----------------------
    down_write(cpuset_rwsem)
    guarantee_online_cpus() // (load cpus_attach)
                                                sched_cpu_deactivate
                                                  set_cpu_active()
                                                  // will change cpu_active_mask
    set_cpus_allowed_ptr(cpus_attach)
      __set_cpus_allowed_ptr_locked()
       // (if the intersection of cpus_attach and
          cpu_active_mask is empty, will return -EINVAL)
    up_write(cpuset_rwsem)
To avoid races such as described above, protect cpuset_attach() call
with cpu_hotplug_lock.
Fixes: be367d0992 ("cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time")
Cc: stable@vger.kernel.org # v2.6.32+
Reported-by: Zhao Gongyi <zhaogongyi@huawei.com>
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Acked-by: Waiman Long <longman@redhat.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The AMD IOMMU logs I/O page faults and such to a ring buffer in
system memory, and this ring buffer can overflow. The AMD IOMMU
spec has the following to say about the interrupt status bit that
signals this overflow condition:
EventOverflow: Event log overflow. RW1C. Reset 0b. 1 = IOMMU
event log overflow has occurred. This bit is set when a new
event is to be written to the event log and there is no usable
entry in the event log, causing the new event information to
be discarded. An interrupt is generated when EventOverflow = 1b
and MMIO Offset 0018h[EventIntEn] = 1b. No new event log
entries are written while this bit is set. Software Note: To
resume logging, clear EventOverflow (W1C), and write a 1 to
MMIO Offset 0018h[EventLogEn].
The AMD IOMMU driver doesn't currently implement this recovery
sequence, meaning that if a ring buffer overflow occurs, logging
of EVT/PPR/GA events will cease entirely.
This patch implements the spec-mandated reset sequence, with the
minor tweak that the hardware seems to want to have a 0 written to
MMIO Offset 0018h[EventLogEn] first, before writing a 1 into this
field, or the IOMMU won't actually resume logging events.
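The ordering of the recovery sequence can be sketched against a simulated
register pair. Everything here is a model for illustration: the bit
positions, the sim_iommu structure and the helper names are assumptions,
and the simulated hardware simply encodes the observation that logging only
resumes on a 0 -> 1 write of the enable bit once the overflow bit is clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define STATUS_EVT_OVERFLOW	(UINT32_C(1) << 0)
#define CTRL_EVT_LOG_EN		(UINT32_C(1) << 2)

struct sim_iommu {
	uint32_t status;	/* overflow bit is write-1-to-clear */
	uint32_t control;
	bool logging;		/* whether the simulated hardware is logging */
	bool saw_disable;	/* set once the enable bit was written as 0 */
};

void write_status(struct sim_iommu *m, uint32_t v)
{
	m->status &= ~v;	/* W1C: writing 1 clears the bit */
}

void write_control(struct sim_iommu *m, uint32_t v)
{
	bool en = v & CTRL_EVT_LOG_EN;

	if (!en) {
		m->saw_disable = true;
		m->logging = false;
	} else if (m->saw_disable && !(m->status & STATUS_EVT_OVERFLOW)) {
		m->logging = true;	/* 0 -> 1 with overflow cleared */
	}
	m->control = v;
}

/* Recovery: clear overflow, then toggle the enable bit 0 -> 1. */
void restart_event_log(struct sim_iommu *m)
{
	write_status(m, STATUS_EVT_OVERFLOW);
	write_control(m, m->control & ~CTRL_EVT_LOG_EN);
	write_control(m, m->control | CTRL_EVT_LOG_EN);
}
```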
Signed-off-by: Lennert Buytenhek <buytenh@arista.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/YVrSXEdW2rzEfOvk@wantstofly.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The problem I'm addressing was discovered by the LTP test covering
cve-2018-1000204.
A short description of what happens follows:
1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO
interface with: dxfer_len == 524288, dxfer_direction == SG_DXFER_FROM_DEV
and a corresponding dxferp. The peculiar thing about this is that TUR
is not reading from the device.
2) In sg_start_req() the invocation of blk_rq_map_user() effectively
bounces the user-space buffer, as if the device was to transfer into
it. Since commit a45b599ad8 ("scsi: sg: allocate with __GFP_ZERO in
sg_build_indirect()") we make sure this first bounce buffer is
allocated with GFP_ZERO.
3) For the rest of the story we keep ignoring that we have a TUR, so the
device won't touch the buffer we prepare as if we had a
DMA_FROM_DEVICE type of situation. My setup uses a virtio-scsi device
and the buffer allocated by SG is mapped by the function
virtqueue_add_split() which uses DMA_FROM_DEVICE for the "in" sgs (here
scatter-gather and not scsi generics). This mapping involves bouncing
via the swiotlb (we need swiotlb to do virtio in protected guest like
s390 Secure Execution, or AMD SEV).
4) When the SCSI TUR is done, we first copy back the content of the second
(that is swiotlb) bounce buffer (which most likely contains some
previous IO data), to the first bounce buffer, which contains all
zeros. Then we copy back the content of the first bounce buffer to
the user-space buffer.
5) The test case detects that the buffer, which it zero-initialized,
ain't all zeros and fails.
One can argue that this is an swiotlb problem, because without swiotlb
we leak all zeros, and the swiotlb should be transparent in a sense that
it does not affect the outcome (if all other participants are well
behaved).
Copying the content of the original buffer into the swiotlb buffer is
the only way I can think of to make swiotlb transparent in such
scenarios. So let's do just that if in doubt, but allow the driver
to tell us that the whole mapped buffer is going to be overwritten,
in which case we can preserve the old behavior and avoid the performance
impact of the extra bounce.
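The transparency argument above can be sketched in user space. This is a minimal model, not the swiotlb API: struct bounce_slot, bounce_map(), bounce_unmap() and the device_overwrites_all parameter are hypothetical names standing in for the bounce slot, the map/unmap steps, and the driver's "whole buffer will be overwritten" hint.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bounce slot; not the kernel data structure. */
struct bounce_slot {
	unsigned char data[16];
};

/*
 * Map step: unless the caller promises that the device overwrites the
 * whole buffer, copy the original contents into the slot first, so bytes
 * the device never writes come back unchanged on unmap.
 */
static void bounce_map(struct bounce_slot *slot, const unsigned char *orig,
		       size_t len, int device_overwrites_all)
{
	if (!device_overwrites_all)
		memcpy(slot->data, orig, len);
}

/* Unmap step for a from-device transfer: copy the slot back to the caller. */
static void bounce_unmap(const struct bounce_slot *slot, unsigned char *orig,
			 size_t len)
{
	memcpy(orig, slot->data, len);
}
```

With a device that writes nothing (the TUR case), a zeroed buffer stays zeroed only when the map step copies the original in; skipping that copy hands the stale slot contents back to the caller.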
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The SCMI binding clearly states the value of #thermal-sensor-cells must
be 1. However arch/arm64/boot/dts/freescale/imx8ulp.dtsi sets it to 0, which
results in the following warning with dtbs_check:
| arch/arm64/boot/dts/freescale/imx8ulp-evk.dt.yaml: scmi:
| protocol@15:#thermal-sensor-cells:0:0: 1 was expected
| From schema: Documentation/devicetree/bindings/firmware/arm,scmi.yaml
Fix it by setting it to 1 as required.
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Acked-by: Peng Fan <peng.fan@nxp.com>
Fixes: a38771d7a4 ("arm64: dts: imx8ulp: add scmi firmware node")
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The vpumix power domain has a reset assigned to it, however
when used, it causes a system hang. Testing has shown that
it does not appear to be needed anywhere.
Fixes: d39d4bb153 ("arm64: dts: imx8mm: add GPC node")
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
nf_defrag_ipv6_disable() requires CONFIG_IP6_NF_IPTABLES.
Fixes: 75063c9294 ("netfilter: xt_socket: fix a typo in socket_mt_destroy()")
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Yonghong Song says:
====================
The patch [1] exposed a bpf_timer initialization bug in function
check_and_init_map_value(). With bug fix here, the patch [1]
can be applied with all selftests passed. Please see individual
patches for fix details.
[1] https://lore.kernel.org/bpf/20220209070324.1093182-2-memxor@gmail.com/
Changelog:
v3 -> v4:
. move header file in patch #1 to avoid bpf-next merge conflict
v2 -> v3:
. switch patch #1 and patch #2 for better bisecting
v1 -> v2:
. add Fixes tag for patch #1
. rebase against bpf tree
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently the following code in check_and_init_map_value()
*(struct bpf_timer *)(dst + map->timer_off) =
(struct bpf_timer){};
can help generate the bpf_timer definition in vmlinux BTF.
But the code above may not zero the whole structure
due to anonymous members, so it will be replaced
by memset in the subsequent patch, and the
bpf_timer definition will disappear from vmlinux BTF.
Let us emit the type explicitly so bpf program can continue
to use it from vmlinux.h.
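As a rough user-space illustration of why memset is the safer initializer: an initializer list does not cover unnamed members, while memset clears every byte of the object. struct demo_timer, demo_init_timer() and demo_all_zero() are hypothetical; the layout below is not the real struct bpf_timer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout with one unnamed member; illustration only. */
struct demo_timer {
	unsigned int id;
	unsigned int : 32;	/* unnamed: not reachable by an initializer */
	unsigned long long expires;
};

/* Clear every byte of the object, unnamed members and padding included. */
static void demo_init_timer(void *dst)
{
	memset(dst, 0, sizeof(struct demo_timer));
}

/* Return 1 if all n bytes at p are zero, 0 otherwise. */
static int demo_all_zero(const void *p, size_t n)
{
	const unsigned char *bytes = p;
	size_t i;

	for (i = 0; i < n; i++)
		if (bytes[i])
			return 0;
	return 1;
}
```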
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220211194948.3141529-1-yhs@fb.com
Kumar Kartikeya says:
====================
A fix for an oversight in copy_map_value that leads to kernel crash.
Also, a question for BPF developers:
It seems in arraymap.c, we always do check_and_free_timer_in_array after we do
copy_map_value in map_update_elem callback, but the same is not done for
hashtab.c. Is there a specific reason for this difference in behavior, or did I
miss that it happens for hashtab.c as well?
Changelog:
---------
v1 -> v2:
v1: https://lore.kernel.org/bpf/20220209051113.870717-1-memxor@gmail.com
* Fix build error for selftests patch due to missing SYS_PREFIX in bpf tree
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
If bpf_msg_push_data() is called with len 0 (as it happens during
selftests/bpf/test_sockmap), we do not need to do anything and can
return early.
Calling bpf_msg_push_data() with len 0 previously led to a wrong ENOMEM
error: we later called get_order(copy + len); if len was 0, copy + len
was also often 0 and get_order() returned some undefined value (at the
moment 52). alloc_pages() caught that and failed, but then bpf_msg_push_data()
returned ENOMEM. This was wrong because we are most probably not out of
memory and actually do not need any additional memory.
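The value 52 can be reproduced with a small user-space model. This is a sketch modeled on the decrement-shift-fls shape of a generic get_order(), assuming 4 KiB pages and a 64-bit size; demo_fls64() and demo_get_order() are hypothetical names, not the kernel functions.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* 1-based position of the highest set bit; 0 when x == 0. */
static int demo_fls64(uint64_t x)
{
	int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/* Order of the smallest power-of-two page count that holds size bytes. */
static int demo_get_order(uint64_t size)
{
	size--;			/* wraps to UINT64_MAX when size == 0 */
	size >>= DEMO_PAGE_SHIFT;
	return demo_fls64(size);
}
```

For size 0 the decrement wraps, the shift leaves 52 set bits, and the result is 52; sizes of 4096 and 4097 give orders 0 and 1. Returning early for len 0 avoids ever reaching this computation.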
Fixes: 6fff607e2f ("bpf: sk_msg program helper bpf_msg_push_data")
Signed-off-by: Felix Maurer <fmaurer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/df69012695c7094ccb1943ca02b4920db3537466.1644421921.git.fmaurer@redhat.com
In commit 02b9984d64, we pushed a sync_filesystem() call from the VFS
into xfs_fs_remount. The only time that we ever need to push dirty file
data or metadata to disk for a remount is if we're remounting the
filesystem read only, so this really could be moved to xfs_remount_ro.
Once we've moved the call site, actually check the return value from
sync_filesystem.
Fixes: 02b9984d64 ("fs: push sync_filesystem() down to the file system's remount_fs()")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
The DWC2 USB controller on the Agilex platform does not support clock
gating, so use the chip specific "intel,socfpga-agilex-hsotg"
compatible.
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Add the compatible "intel,socfpga-agilex-hsotg" to the DWC2
implementation, because the Agilex DWC2 implementation does not support
clock gating.
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
struct xfrm_user_offload has a flags variable that receives user input,
but the kernel didn't check whether valid bits were provided. As a result,
unsanitized input was forwarded directly to the drivers.
For example, the XFRM_OFFLOAD_IPV6 define was exposed and used by
strongswan, but was never implemented in the kernel.
As a solution, check and sanitize the input flags so that only
XFRM_OFFLOAD_INBOUND is forwarded to the drivers.
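A minimal sketch of the check-then-mask idea follows. The DEMO_* flag values and demo_sanitize_offload_flags() are assumptions made for illustration; they are not copied from the UAPI header or from the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative flag values; assumptions, not the UAPI definitions. */
#define DEMO_OFFLOAD_IPV6	0x1u
#define DEMO_OFFLOAD_INBOUND	0x2u

/*
 * Reject bits that were never defined, then forward only the bit that
 * drivers actually act on. Returns 0 and stores the sanitized flags in
 * *out, or -EINVAL if unknown bits are set.
 */
static int demo_sanitize_offload_flags(uint8_t user_flags, uint8_t *out)
{
	if (user_flags & ~(DEMO_OFFLOAD_IPV6 | DEMO_OFFLOAD_INBOUND))
		return -EINVAL;

	*out = user_flags & DEMO_OFFLOAD_INBOUND;
	return 0;
}
```

Accepting the exposed-but-unimplemented bit while dropping it before it reaches the driver keeps existing userspace working without passing meaningless input down.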
Fixes: d77e38e612 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In the event that the SoC is under thermal pressure while booting it's
possible for the dcvs notification to happen in between the cpufreq
framework calling init and it actually updating the policy's
related_cpus cpumask.
Prior to the introduction of the thermal pressure update helper an empty
cpumask would simply result in the thermal pressure of no cpus being
updated, but the new code will attempt to dereference an invalid per_cpu
variable.
Avoid this problem by using the newly reintroduced "ready" callback, to
postpone enabling the IRQ until the related_cpus cpumask is filled in.
Fixes: 0258cb19c7 ("cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function")
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
This effectively reverts '4bf8e582119e ("cpufreq: Remove ready()
callback")', in order to reintroduce the ready callback.
This is needed in order to be able to leave the thermal pressure
interrupts in the Qualcomm CPUfreq driver disabled during
initialization, so that it doesn't fire while related_cpus are still 0.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[ Viresh: Added the Chinese translation as well and updated commit msg ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
In zynq_qspi_exec_mem_op(), kzalloc() is directly used in memset(),
which could lead to a NULL pointer dereference on failure of
kzalloc().
Fix this bug by adding a check of tmpbuf.
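The pattern of the fix can be sketched in user space, with calloc() standing in for kzalloc(). The allocator hook, demo_failing_alloc() and demo_fill_dummy() are hypothetical and exist only so the failure path can be exercised; the 0xff fill value is illustrative.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Allocator hook with calloc()'s signature; hypothetical. */
typedef void *(*demo_alloc_fn)(size_t nmemb, size_t size);

/* Always fails, to model an allocation failure. */
static void *demo_failing_alloc(size_t nmemb, size_t size)
{
	(void)nmemb;
	(void)size;
	return NULL;
}

/* Allocate a scratch buffer, fill it, and release it again. */
static int demo_fill_dummy(demo_alloc_fn alloc, size_t nbytes)
{
	unsigned char *tmpbuf = alloc(1, nbytes);

	if (!tmpbuf)		/* the kind of check the fix adds */
		return -ENOMEM;

	memset(tmpbuf, 0xff, nbytes);
	/* ... tmpbuf would be handed to the transfer here ... */
	free(tmpbuf);
	return 0;
}
```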
This bug was found by a static analyzer. The analysis employs
differential checking to identify inconsistent security operations
(e.g., checks or kfrees) between two code paths and confirms that the
inconsistent operations are not recovered in the current function or
the callers, so they constitute bugs.
Note that, as a bug found by static analysis, it can be a false
positive or hard to trigger. Multiple researchers have cross-reviewed
the bug.
Builds with CONFIG_SPI_ZYNQ_QSPI=m show no new warnings,
and our static analyzer no longer warns about this code.
Fixes: 67dca5e580 ("spi: spi-mem: Add support for Zynq QSPI controller")
Signed-off-by: Zhou Qingyang <zhou1615@umn.edu>
Link: https://lore.kernel.org/r/20211130172253.203700-1-zhou1615@umn.edu
Signed-off-by: Mark Brown <broonie@kernel.org>
The fan curve control patches introduced a regression for at least the
TUF FX506 and possibly other TUF series laptops that do not have support
for fan curve control.
As part of the probing process, asus_wmi_evaluate_method_buf is called
to get the factory default fan curve. The WMI management function
returns 0 on certain laptops to indicate lack of fan curve control
instead of ASUS_WMI_UNSUPPORTED_METHOD. This 0 is transformed to
-ENODATA which results in failure when probing.
Fixes: 0f0ac158d2 ("platform/x86: asus-wmi: Add support for custom fan curves")
Reported-and-tested-by: Abhijeet V <abhijeetviswa@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20220205112840.33095-1-hdegoede@redhat.com
The introduction of '9a61f813fcc8 ("clk: qcom: regmap-mux: fix parent
clock lookup")' broke UFS support on SM8350.
The cause for this is that the symbol clocks have a specified rate in
the "freq-table-hz" table in the UFS node, which causes the UFS code to
request a rate change, for which the "bi_tcxo" happens to provide the
closest rate. Prior to the change in regmap-mux it was determined
(incorrectly) that no change was needed and everything worked.
The rates of 75 and 300MHz match the documentation for the symbol
clocks, but we don't represent the parent clocks today. So let's mimic
the configuration found in other platforms, by omitting the rate for the
symbol clocks as well to avoid the rate change.
While at it also fill in the dummy symbol clocks that were dropped from
the GCC driver as it was upstreamed.
Fixes: 59c7cf8147 ("arm64: dts: qcom: sm8350: Add UFS nodes")
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Vinod Koul <vkoul@kernel.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20211222162058.3418902-1-bjorn.andersson@linaro.org
The text of various warning messages triggered by unittest has
changed. Update the text of expected warnings to match.
The expected vs actual warnings are most easily seen by filtering
the boot console messages with the of_unittest_expect program at
https://github.com/frowand/dt_tools.git. The filter prefixes
problem lines with '***', and prefixes lines that match expected
errors with 'ok '. All other lines are prefixed with ' '.
Unrelated lines have been deleted in the following examples.
The mismatch appears as:
-> ### dt-test ### start of unittest - you will see error messages
OF: /testcase-data/phandle-tests/consumer-a: #phandle-cells = 3 found 1
** of_unittest_expect WARNING - not found ---> OF: /testcase-data/phandle-tests/consumer-a: #phandle-cells = 3 found -1
OF: /testcase-data/phandle-tests/consumer-a: #phandle-cells = 3 found 1
** of_unittest_expect WARNING - not found ---> OF: /testcase-data/phandle-tests/consumer-a: #phandle-cells = 3 found -1
OF: /testcase-data/phandle-tests/consumer-b: #phandle-cells = 2 found 1
** of_unittest_expect WARNING - not found ---> OF: /testcase-data/phandle-tests/consumer-b: #phandle-cells = 2 found -1
platform testcase-data:testcase-device2: error -ENXIO: IRQ index 0 not found
** of_unittest_expect WARNING - not found ---> platform testcase-data:testcase-device2: IRQ index 0 not found
-> ### dt-test ### end of unittest - 254 passed, 0 failed
** EXPECT statistics:
**
** EXPECT found : 42
** EXPECT not found : 4
With this commit applied, the mismatch is resolved:
-> ### dt-test ### start of unittest - you will see error messages
ok OF: /testcase-data/phandle-tests/consumer-a: #phandle-cells = 3 found 1
ok OF: /testcase-data/phandle-tests/consumer-a: #phandle-cells = 3 found 1
ok OF: /testcase-data/phandle-tests/consumer-b: #phandle-cells = 2 found 1
ok platform testcase-data:testcase-device2: error -ENXIO: IRQ index 0 not found
-> ### dt-test ### end of unittest - 254 passed, 0 failed
** EXPECT statistics:
**
** EXPECT found : 46
** EXPECT not found : 0
Fixes: 2043727c28 ("driver core: platform: Make use of the helper function dev_err_probe()")
Fixes: 94a4950a4a ("of: base: Fix phandle argument length mismatch error message")
Signed-off-by: Frank Rowand <frank.rowand@sony.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220127192643.2534941-1-frowand.list@gmail.com
pm_runtime_enable() decrements the power disable depth.
If the probe fails, we should use pm_runtime_disable() to balance
pm_runtime_enable(). In the PM Runtime docs:
Drivers in ->remove() callback should undo the runtime PM changes done
in ->probe(). Usually this means calling pm_runtime_disable(),
pm_runtime_dont_use_autosuspend() etc.
We should do this in error handling.
Fix this problem for the following drivers: bmc150, bmg160, kmx61,
kxcj-1013, mma9551, mma9553.
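The error-path shape being asked for can be sketched as follows. This is a user-space model only: struct demo_dev and the demo_* helpers are hypothetical stand-ins for the device and for pm_runtime_enable()/pm_runtime_disable(), and register_ret stands in for the result of a later probe step.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model tracking whether runtime PM is enabled. */
struct demo_dev {
	bool runtime_pm_enabled;
};

static void demo_pm_enable(struct demo_dev *d)
{
	d->runtime_pm_enabled = true;
}

static void demo_pm_disable(struct demo_dev *d)
{
	d->runtime_pm_enabled = false;
}

/* Probe that undoes its runtime PM change when a later step fails. */
static int demo_probe(struct demo_dev *d, int register_ret)
{
	int ret;

	demo_pm_enable(d);

	ret = register_ret;	/* stands in for a registration call */
	if (ret)
		goto err_pm_disable;

	return 0;

err_pm_disable:
	demo_pm_disable(d);	/* balance the enable before bailing out */
	return ret;
}
```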
Fixes: 7d0ead5c3f ("iio: Reconcile operation order between iio_register/unregister and pm functions")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20220106112309.16879-1-linmq006@gmail.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The Quartz64 Model A uses a voltage divider to ensure ddr voltage is
within specification from the default regulator configuration.
Adjusting this voltage is detrimental, and currently causes the ddr
voltage to be about 0.8v.
Remove the min and max voltage setpoints for the ddr regulator.
Fixes: b33a22a1e7 ("arm64: dts: rockchip: add basic dts for Pine64 Quartz64-A")
Signed-off-by: Peter Geis <pgwipeout@gmail.com>
Link: https://lore.kernel.org/r/20220128003809.3291407-2-pgwipeout@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
The field offset for port configuration status on SPR has been changed to
bit 14 from ICX where it resides at bit 12. By chance link status detection
continued to work on SPR. This is due to bit 12 being a configuration bit
which is in sync with the status bit. Fix this by checking for an SPR device
and checking the correct status bit.
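As a small illustration of the per-generation bit selection, using the bit positions stated above: the macro names, demo_port_configured() and the 32-bit register width are assumptions, not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the description above; names are hypothetical. */
#define DEMO_ICX_PORT_CFG_STS_BIT 12
#define DEMO_SPR_PORT_CFG_STS_BIT 14

/* Test the port configuration status bit that matches the device. */
static bool demo_port_configured(uint32_t reg, bool is_spr)
{
	unsigned int bit = is_spr ? DEMO_SPR_PORT_CFG_STS_BIT
				  : DEMO_ICX_PORT_CFG_STS_BIT;

	return (reg >> bit) & 1u;
}
```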
Fixes: 26bfe3d0b2 ("ntb: intel: Add Icelake (gen4) support for Intel NTB")
Tested-by: Jerry Dai <jerry.dai@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Commit e762232f94 ("arm64: tegra: Add ISO SMMU controller for Tegra194")
added the ISO SMMU for display devices on Tegra194. The SMMU is enabled by
default but not hooked up to the display controllers yet because we do not
have a way to pass frame-buffer memory from the bootloader to the kernel.
However, even though the SMMU is not hooked up to the display controllers,
SMMU faults are being seen if a display is connected. Therefore, keep the
ISO SMMU disabled by default for now.
Fixes: e762232f94 ("arm64: tegra: Add ISO SMMU controller for Tegra194")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Tegra186+ hangs if host1x hardware is disabled at kernel boot time
because we touch hardware before runtime PM is resumed. Move sync point
assignment initialization to the RPM-resume callback. Older SoCs were
unaffected because they skip that sync point initialization.
Tested-by: Jon Hunter <jonathanh@nvidia.com> # T186
Reported-by: Jon Hunter <jonathanh@nvidia.com> # T186
Fixes: 6b6776e2ab ("gpu: host1x: Add initial runtime PM and OPP support")
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
This reverts commit b515d26372.
Commit b515d26372 ("xfrm: xfrm_state_mtu
should return at least 1280 for ipv6") in v5.14 breaks the TCP MSS
calculation in ipsec transport mode, resulting in complete stalls of TCP
connections. This happens when the (P)MTU is 1280 or slightly larger.
The desired formula for the MSS is:
MSS = (MTU - ESP_overhead) - IP header - TCP header
However, the above commit clamps the (MTU - ESP_overhead) to a
minimum of 1280, turning the formula into
MSS = max(MTU - ESP overhead, 1280) - IP header - TCP header
With the (P)MTU near 1280, the calculated MSS is too large and the
resulting TCP packets never make it to the destination because they
are over the actual PMTU.
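The two formulas can be compared with a short worked example. This sketch assumes a 40-byte IPv6 header and a 20-byte TCP header without options; the ESP overhead is a parameter, and the 60 bytes used in the note below is purely illustrative. All names are hypothetical.

```c
#define MSS_IPV6_HDR      40	/* fixed IPv6 header */
#define MSS_TCP_HDR       20	/* TCP header without options */
#define MSS_IPV6_MIN_MTU  1280

/* Desired formula: MSS = (MTU - ESP_overhead) - IP header - TCP header */
static int demo_mss(int mtu, int esp_overhead)
{
	return (mtu - esp_overhead) - MSS_IPV6_HDR - MSS_TCP_HDR;
}

/* Formula with (MTU - ESP_overhead) clamped to a minimum of 1280. */
static int demo_mss_clamped(int mtu, int esp_overhead)
{
	int payload_mtu = mtu - esp_overhead;

	if (payload_mtu < MSS_IPV6_MIN_MTU)
		payload_mtu = MSS_IPV6_MIN_MTU;
	return payload_mtu - MSS_IPV6_HDR - MSS_TCP_HDR;
}

/* On-the-wire size of a full-sized segment, under the same model. */
static int demo_wire_size(int mss, int esp_overhead)
{
	return mss + MSS_TCP_HDR + MSS_IPV6_HDR + esp_overhead;
}
```

With an MTU of 1280 and 60 bytes of overhead, the desired formula gives an MSS of 1160 and a 1280-byte packet, while the clamped formula gives 1220 and a 1340-byte packet that exceeds the PMTU.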
The above commit also causes suboptimal double fragmentation in
xfrm tunnel mode, as described in
https://lore.kernel.org/netdev/20210429202529.codhwpc7w6kbudug@dwarf.suse.cz/
The original problem the above commit was trying to fix is now fixed
by commit 6596a02295 ("xfrm: fix MTU
regression").
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Although it is painstakingly honest to describe all 3 PCI windows in
"dma-ranges", it misses the subtle distinction that the window for
the GICv2m range is normally programmed for Device memory attributes
rather than Normal Cacheable like the DRAM windows. Since MMU-401 only
offers stage 2 translation, this means that when the PCI SMMU is
enabled, accesses through that IPA range unexpectedly lose coherency if
mapped as cacheable at the SMMU, due to the attribute combining rules.
Since an extra 256KB is neither here nor there when we still have 10GB
worth of usable address space, rather than attempting to describe and
cope with this detail let's just remove the offending range. If the SMMU
is not used then it makes no difference anyway.
Link: https://lore.kernel.org/r/856c3f7192c6c3ce545ba67462f2ce9c86ed6b0c.1643046936.git.robin.murphy@arm.com
Fixes: 4ac4d146cb ("arm64: dts: juno: Describe PCI dma-ranges")
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
xfrm_migrate cannot handle address family change of an xfrm_state.
The symptoms are that the xfrm_state will be migrated to a wrong address,
and sending as well as receiving packets will be broken.
This commit fixes it by breaking the original xfrm_state_clone
method into two steps so as to update the props.family before
running xfrm_init_state. As the result, xfrm_state's inner mode,
outer mode, type and IP header length in xfrm_state_migrate can
be updated with the new address family.
Tested with additions to Android's kernel unit test suite:
https://android-review.googlesource.com/c/kernel/tests/+/1885354
Signed-off-by: Yan Yan <evitayan@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This patch enables distinguishing SAs and SPs based on if_id during
the xfrm_migrate flow. This ensures support for xfrm interfaces
throughout the SA/SP lifecycle.
When there are multiple existing SPs with the same direction,
the same xfrm_selector and different endpoint addresses,
xfrm_migrate might fail with ENODATA.
Specifically, the code path for performing xfrm_migrate is:
Stage 1: find policy to migrate with
xfrm_migrate_policy_find(sel, dir, type, net)
Stage 2: find and update state(s) with
xfrm_migrate_state_find(mp, net)
Stage 3: update endpoint address(es) of template(s) with
xfrm_policy_migrate(pol, m, num_migrate)
Currently "Stage 1" always returns the first xfrm_policy that
matches, and "Stage 3" looks for the xfrm_tmpl that matches the
old endpoint address. Thus if there are multiple xfrm_policy
with same selector, direction, type and net, "Stage 1" might
return a wrong xfrm_policy and "Stage 3" will fail with ENODATA
because it cannot find a xfrm_tmpl with the matching endpoint
address.
The fix is to allow userspace to pass an if_id and add if_id
to the matching rule in Stage 1 and Stage 2 since if_id is a
unique ID for xfrm_policy and xfrm_state. For compatibility,
if_id will only be checked if the attribute is set.
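The matching rule can be sketched as a lookup that treats if_id as an optional key. struct demo_policy and demo_policy_find() are hypothetical, the selector is reduced to a single integer, and an if_id of 0 is assumed to mean "attribute not set".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified policy record. */
struct demo_policy {
	uint32_t selector;
	uint32_t if_id;
};

/*
 * Return the first policy matching the selector. When if_id is non-zero
 * it must match as well; if_id == 0 keeps the old selector-only lookup.
 */
static const struct demo_policy *
demo_policy_find(const struct demo_policy *pols, size_t n,
		 uint32_t selector, uint32_t if_id)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (pols[i].selector != selector)
			continue;
		if (if_id && pols[i].if_id != if_id)
			continue;
		return &pols[i];
	}
	return NULL;
}
```

With two policies sharing a selector, the selector-only lookup always returns the first one, whereas passing the if_id picks the intended entry.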
Tested with additions to Android's kernel unit test suite:
https://android-review.googlesource.com/c/kernel/tests/+/1668886
Signed-off-by: Yan Yan <evitayan@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Commit 749439bfac ("ipv6: fix udpv6
sendmsg crash caused by too small MTU") breaks PMTU for xfrm.
A Packet Too Big ICMPv6 message received in response to an ESP
packet will prevent all further communication through the tunnel
if the reported MTU minus the ESP overhead is smaller than 1280.
E.g. in a case of a tunnel-mode ESP with sha256/aes the overhead
is 92 bytes. Receiving a PTB with MTU of 1371 or less will result
in all further packets in the tunnel dropped. A ping through the
tunnel fails with "ping: sendmsg: Invalid argument".
Apparently the MTU on the xfrm route is smaller than 1280 and
fails the check inside ip6_setup_cork() added by 749439bf.
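The 1371 threshold follows from simple arithmetic, sketched here with the 92-byte overhead quoted above. The helper names are hypothetical and the model only subtracts the overhead from the reported MTU.

```c
#define PTB_IPV6_MIN_MTU 1280

/* MTU left for the inner packet after subtracting the ESP overhead. */
static int demo_xfrm_route_mtu(int ptb_mtu, int esp_overhead)
{
	return ptb_mtu - esp_overhead;
}

/* Non-zero when the resulting MTU falls below the IPv6 minimum. */
static int demo_below_min_mtu(int ptb_mtu, int esp_overhead)
{
	return demo_xfrm_route_mtu(ptb_mtu, esp_overhead) < PTB_IPV6_MIN_MTU;
}
```

A reported MTU of 1371 leaves 1279 bytes, one short of 1280, while 1372 leaves exactly 1280 and passes.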
We found this by debugging USGv6/ipv6ready failures. Failing
tests are: "Phase-2 Interoperability Test Scenario IPsec" /
5.3.11 and 5.4.11 (Tunnel Mode: Fragmentation).
Commit b515d26372 ("xfrm:
xfrm_state_mtu should return at least 1280 for ipv6") attempted
to fix this but caused another regression in TCP MSS calculations
and had to be reverted.
The patch below fixes the situation by dropping the MTU
check and instead checking for the underflows described in the
749439bf commit message.
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Fixes: 749439bfac ("ipv6: fix udpv6 sendmsg crash caused by too small MTU")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Make use of the struct_size() helper instead of an open-coded version,
in order to avoid any potential type mistakes or integer overflows that,
in the worst scenario, could lead to heap overflows.
Also, address the following sparse warnings:
drivers/ntb/msi.c:46:23: warning: using sizeof on a flexible structure
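The idea behind an overflow-checked size computation can be sketched in user space. This is not the kernel macro: struct demo_msi and demo_struct_size() are hypothetical, and the sketch relies on the GCC/Clang overflow builtins and saturates to SIZE_MAX on overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure ending in a flexible array member. */
struct demo_msi {
	unsigned int count;
	uint32_t vectors[];
};

/*
 * Size of a struct demo_msi with count trailing elements. Saturates to
 * SIZE_MAX on overflow, so a following allocation fails instead of
 * silently under-allocating.
 */
static size_t demo_struct_size(size_t count)
{
	size_t bytes;

	if (__builtin_mul_overflow(count, sizeof(uint32_t), &bytes) ||
	    __builtin_add_overflow(bytes, sizeof(struct demo_msi), &bytes))
		return SIZE_MAX;
	return bytes;
}
```

An open-coded `sizeof(struct) + count * sizeof(elem)` would wrap for a huge count and yield a small, wrong size.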
Link: https://github.com/KSPP/linux/issues/174
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
On one side, indio_dev->num_channels includes all physical channels plus
the timestamp channel. On the other side, we have an array allocated only
for physical channels. So, fix memory corruption by using ARRAY_SIZE()
instead of the num_channels variable.
Note the first case is a cleanup rather than a fix as the software
timestamp channel bit in active_scanmask is never set by the IIO core.
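A minimal sketch of the bound mismatch: struct demo_adc, its field names and the channel count of 4 are hypothetical. The loop is bounded by the array itself rather than by a count that also includes the timestamp channel.

```c
#include <stddef.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define DEMO_PHYS_CHANNELS 4	/* hypothetical number of physical channels */

struct demo_adc {
	unsigned int num_channels;		/* physical + timestamp */
	unsigned int state[DEMO_PHYS_CHANNELS];	/* physical channels only */
};

/* Reset per-channel state; returns how many entries were touched. */
static size_t demo_reset_state(struct demo_adc *adc)
{
	size_t i;

	/* Iterating up to num_channels would write one entry too far. */
	for (i = 0; i < DEMO_ARRAY_SIZE(adc->state); i++)
		adc->state[i] = 0;
	return i;
}
```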
Fixes: 9374e8f5a3 ("iio: adc: add ADC driver for the TI TSC2046 controller")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20220107081401.2816357-1-o.rempel@pengutronix.de
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
pclk_xpcs is not supported by the mainline driver and breaks dtbs_check.
The following warnings occur, and many more:
rk3568-evb1-v10.dt.yaml: ethernet@fe2a0000: clocks:
[[15, 386], [15, 389], [15, 389], [15, 184], [15, 180], [15, 181],
[15, 389], [15, 185], [15, 172]] is too long
From schema: Documentation/devicetree/bindings/net/snps,dwmac.yaml
rk3568-evb1-v10.dt.yaml: ethernet@fe2a0000: clock-names:
['stmmaceth', 'mac_clk_rx', 'mac_clk_tx', 'clk_mac_refout', 'aclk_mac',
'pclk_mac', 'clk_mac_speed', 'ptp_ref', 'pclk_xpcs'] is too long
From schema: Documentation/devicetree/bindings/net/snps,dwmac.yaml
After removing it, the clock and other warnings are gone.
pclk_xpcs on gmac is used to support QSGMII, but this requires a driver
supporting it.
Once xpcs support is introduced, the clock can be added to the
documentation and both controllers.
Fixes: b8d41e5053 ("arm64: dts: rockchip: add gmac0 node to rk3568")
Co-developed-by: Peter Geis <pgwipeout@gmail.com>
Signed-off-by: Peter Geis <pgwipeout@gmail.com>
Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Acked-by: Michael Riesch <michael.riesch@wolfvision.net>
Link: https://lore.kernel.org/r/20220123133510.135651-1-linux@fw-web.de
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
DMA controllers defined in rk356x.dtsi do not match the pattern in bindings.
arch/arm64/boot/dts/rockchip/rk3568-evb1-v10.dt.yaml:
dmac@fe530000: $nodename:0: 'dmac@fe530000' does not match '^dma-controller(@.*)?$'
From schema: Documentation/devicetree/bindings/dma/arm,pl330.yaml
arch/arm64/boot/dts/rockchip/rk3568-evb1-v10.dt.yaml:
dmac@fe550000: $nodename:0: 'dmac@fe550000' does not match '^dma-controller(@.*)?$'
From schema: Documentation/devicetree/bindings/dma/arm,pl330.yaml
This patch fixes it by renaming the nodes to match the pattern.
Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Link: https://lore.kernel.org/r/20220123133615.135789-1-linux@fw-web.de
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
- Firmware status: Show if Indirect Branch Restricted Speculation (IBRS) is
used to protect against Spectre variant 2 attacks when calling firmware (x86 only).
@@ -583,12 +598,13 @@ kernel command line.
Specific mitigations can also be selected manually:
-retpoline
-replace indirect branches
-retpoline,generic
-google's original retpoline
-retpoline,amd
-AMD-specific minimal thunk
+retpoline auto pick between generic,lfence
+retpoline,generic Retpolines
+retpoline,lfence LFENCE; indirect branch
+retpoline,amd alias for retpoline,lfence
+eibrs enhanced IBRS
+eibrs,retpoline enhanced IBRS + Retpolines
+eibrs,lfence enhanced IBRS + LFENCE
Not specifying this option is equivalent to
spectre_v2=auto.
@@ -599,7 +615,7 @@ kernel command line.
spectre_v2=off. Spectre variant 1 mitigations
cannot be disabled.
-For spectre_v2_user see :doc:`/admin-guide/kernel-parameters`.
+For spectre_v2_user see Documentation/admin-guide/kernel-parameters.txt
Mitigation selection guide
--------------------------
@@ -681,7 +697,7 @@ AMD white papers:
.._spec_ref6:
-[6] `Software techniques for managing speculation on AMD processors <https://developer.amd.com/wp-content/resources/90343-B_SoftwareTechniquesforManagingSpeculation_WP_7-18Update_FNL.pdf>`_.
+[6] `Software techniques for managing speculation on AMD processors <https://developer.amd.com/wp-content/resources/Managing-Speculation-on-AMD-Processors.pdf>`_.