Pull LED fix from Pavel Machek:
"One-liner fixing a build problem"
* 'for-rc8-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds:
leds: rt8515: add V4L2_FLASH_LED_CLASS dependency
Pull Kbuild fixes from Masahiro Yamada:
- Fix CONFIG_TRIM_UNUSED_KSYMS build for ppc64
- Use pkg-config for scripts/sign-file.c CFLAGS
* tag 'kbuild-fixes-v5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
scripts: set proper OpenSSL include dir also for sign-file
sparc: remove wrong comment from arch/sparc/include/asm/Kbuild
kbuild: fix CONFIG_TRIM_UNUSED_KSYMS build for ppc64
Pull x86 fixes from Borislav Petkov:
"I kinda knew while typing 'I hope this is the last batch of x86/urgent
updates' last week, Murphy was reading too and uttered 'Hold my
beer!'.
So here's more fixes... Thanks Murphy.
Anyway, three more x86/urgent fixes for 5.11 final. We should be
finally ready (famous last words). :-)
- An SGX use after free fix
- A fix for the fix to disable CET instrumentation generation for
kernel code. We forgot 32-bit, which we seem to do very often
nowadays
- A Xen PV fix to irqdomain init ordering"
* tag 'x86_urgent_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/pci: Create PCI/MSI irqdomain after x86_init.pci.arch_init()
x86/build: Disable CET instrumentation in the kernel for 32-bit too
x86/sgx: Maintain encl->refcount for each encl->mm_list entry
The leds-rt8515 driver can optionall use the v4l2 flash led class,
but it causes a link error when that class is in a loadable module
and the rt8515 driver itself is built-in:
ld.lld: error: undefined symbol: v4l2_flash_init
>>> referenced by leds-rt8515.c
>>> leds/flash/leds-rt8515.o:(rt8515_probe) in archive
drivers/built-in.a
Adding 'depends on V4L2_FLASH_LED_CLASS' in Kconfig would avoid that,
but it would make it impossible to use the driver without the
v4l2 support.
Add the same dependency that the other users of this class have
instead, which just prevents the broken configuration.
Fixes: e1c6edcbea ("leds: rt8515: Add Richtek RT8515 LED driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
These are NOT exported to userspace.
The headers listed in arch/sparc/include/uapi/asm/Kbuild are exported.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull clk fix from Stephen Boyd:
"One small fix for the Allwinner clk driver so that display clks figure
out the correct rate to use.
This fixes displays running 4k@60Hz and some other resolutions that
haven't been exercised and fully understood until now"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: sunxi-ng: mp: fix parent rate change flag check
Pull SCSI fix from James Bottomley:
"One fix for scsi_debug that fixes a memory leak on module removal"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: scsi_debug: Fix a memory leak
Pull cgroup fixes from Tejun Heo:
"Two cgroup fixes:
- fix a NULL deref when trying to poll PSI in the root cgroup
- fix confusing controller parsing corner case when mounting cgroup
v1 hierarchies
And doc / maintainer file updates"
* 'for-5.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: update PSI file description in docs
cgroup: fix psi monitor for root cgroup
MAINTAINERS: Update my email address
MAINTAINERS: Remove stale URLs for cpuset
cgroup-v1: add disabled controller check in cgroup1_parse_param()
Merge fixes from Andrew Morton:
"6 patches.
Subsystems affected by this patch series: mm/pagemap, scripts,
MAINTAINERS, and h8300"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
h8300: fix PREEMPTION build, TI_PRE_COUNT undefined
MAINTAINERS: add Andrey Konovalov to KASAN reviewers
MAINTAINERS: update Andrey Konovalov's email address
MAINTAINERS: update KASAN file list
scripts/recordmcount.pl: support big endian for ARCH sh
m68k: make __pfn_to_phys() and __phys_to_pfn() available for !MMU
Pull i2c fix from Wolfram Sang:
"One more I2C driver bugfix"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: stm32f7: fix configuration of the digital filter
Pull btrfs fix from David Sterba:
"A regression fix caused by a refactoring in 5.11.
A corrupted superblock wouldn't be detected by checksum verification
due to wrongly placed initialization of the checksum length, thus
making memcmp always work"
* tag 'for-5.11-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: initialize fs_info::csum_size earlier in open_ctree
The kernel test robot reported the following issue:
CC [M] drivers/soc/litex/litex_soc_ctrl.o
sh4-linux-objcopy: Unable to change endianness of input file(s)
sh4-linux-ld: cannot find drivers/soc/litex/.tmp_gl_litex_soc_ctrl.o: No such file or directory
sh4-linux-objcopy: 'drivers/soc/litex/.tmp_mx_litex_soc_ctrl.o': No such file
The problem is that the format of input file is elf32-shbig-linux, but
sh4-linux-objcopy wants to output a file which format is elf32-sh-linux:
$ sh4-linux-objdump -d drivers/soc/litex/litex_soc_ctrl.o | grep format
drivers/soc/litex/litex_soc_ctrl.o: file format elf32-shbig-linux
Link: https://lkml.kernel.org/r/20210210150435.2171567-1-rong.a.chen@intel.com
Link: https://lore.kernel.org/linux-mm/202101261118.GbbYSlHu-lkp@intel.com
Signed-off-by: Rong Chen <rong.a.chen@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Cc: Rich Felker <dalias@libc.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent changes that obsoleted DISCONTIGMEM on m68k switched the MMU
variant to use generic definitions of __pfn_to_phys() and __phys_to_pfn(),
but missed the !MMU variant which caused a build failure:
drivers/media/common/videobuf2/videobuf2-dma-contig.c: In function 'vb2_dc_get_userptr':
drivers/media/common/videobuf2/videobuf2-dma-contig.c:509:5: error: implicit declaration of function '__pfn_to_phys' [-Werror=implicit-function-declaration]
509 | __pfn_to_phys(nums[0]), size, buf->dma_dir, 0);
| ^~~~~~~~~~~~~
cc1: some warnings being treated as errors
Enable __pfn_to_phys() and __phys_to_pfn() on !MMU builds.
Link: https://lkml.kernel.org/r/20210211232202.GS299309@linux.ibm.com
Fixes: 4bfc848e09 ("m68k/mm: enable use of generic memory_model.h for !DISCONTIGMEM")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull cifs fixes from Steve French:
"Four small smb3 fixes to the new mount API (including a particularly
important one for DFS links).
These were found in testing this week of additional DFS scenarios, and
a user testing of an apache container problem"
* tag '5.11-rc7-smb3-github' of git://github.com/smfrench/smb3-kernel:
cifs: Set CIFS_MOUNT_USE_PREFIX_PATH flag on setting cifs_sb->prepath.
cifs: In the new mount api we get the full devname as source=
cifs: do not disable noperm if multiuser mount option is not provided
cifs: fix dfs-links
Pull io_uring fix from Jens Axboe:
"Revert of a patch from this release that caused a regression"
* tag 'io_uring-5.11-2021-02-12' of git://git.kernel.dk/linux-block:
Revert "io_uring: don't take fs for recvmsg/sendmsg"
Pull drm fixes from Dave Airlie:
"Regular fixes for final, there is a ttm regression fix, dp-mst fix,
one amdgpu revert, two i915 fixes, and some misc fixes for sun4i,
xlnx, and vc4.
All pretty quiet and don't think we have any known outstanding
regressions.
ttm:
- page pool regression fix.
dp_mst:
- don't report un-attached ports as connected
amdgpu:
- blank screen fix
i915:
- ensure Type-C FIA is powered when initializing
- fix overlay frontbuffer tracking
sun4i:
- tcon1 sync polarity fix
- always set HDMI clock rate
- fix H6 HDMI PHY config
- fix H6 max frequency
vc4:
- fix buffer overflow
xlnx:
- fix memory leak"
* tag 'drm-fixes-2021-02-12' of git://anongit.freedesktop.org/drm/drm:
drm/ttm: make sure pool pages are cleared
drm/sun4i: dw-hdmi: Fix max. frequency for H6
drm/sun4i: Fix H6 HDMI PHY configuration
drm/sun4i: dw-hdmi: always set clock rate
drm/sun4i: tcon: set sync polarity for tcon1 channel
drm/i915: Fix overlay frontbuffer tracking
Revert "drm/amd/display: Update NV1x SR latency values"
drm/i915/tgl+: Make sure TypeC FIA is powered up when initializing it
drm/dp_mst: Don't report ports connected if nothing is attached to them
drm/xlnx: fix kmemleak by sending vblank_event in atomic_disable
drm/vc4: hvs: Fix buffer overflow with the dlist handling
Pull tracing fix from Steven Rostedt:
"Fix buffer overflow in trace event filter.
It was reported that if an trace event was larger than a page and was
filtered, that it caused memory corruption. The reason is that
filtered events first go into a buffer to test the filter before being
written into the ring buffer. Unfortunately, this write did not check
the size"
* tag 'trace-v5.11-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Check length before giving out the filter buffer
Pull xen fix from Juergen Gross:
"A single fix for an issue introduced this development cycle: when
running as a Xen guest on Arm systems the kernel will hang during
boot"
* tag 'for-linus-5.11-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
arm/xen: Don't probe xenbus as part of an early initcall
Pull RISC-V fix from Palmer Dabbelt:
"A single fix this week: the removal of the GPIO reset method for the
Ethernet phy on the HiFive Unleashed.
This returns to relying on the bootloader's phy reset sequence, which
we'll have to continue doing until we can sort out how to get the
Linux phy driver to perform the special reset dance required for this
phy"
* tag 'riscv-for-linus-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
Revert "dts: phy: add GPIO number and active state used for phy reset"
Pull arm64 fix from Catalin Marinas:
"Fix PTRACE_PEEKMTETAGS access to an mmapped region before the first
write"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mte: Allow PTRACE_PEEKMTETAGS access to the zero page
The ptrace(PTRACE_PEEKMTETAGS) implementation checks whether the user
page has valid tags (mapped with PROT_MTE) by testing the PG_mte_tagged
page flag. If this bit is cleared, ptrace(PTRACE_PEEKMTETAGS) returns
-EIO.
A newly created (PROT_MTE) mapping points to the zero page which had its
tags zeroed during cpu_enable_mte(). If there were no prior writes to
this mapping, ptrace(PTRACE_PEEKMTETAGS) fails with -EIO since the zero
page does not have the PG_mte_tagged flag set.
Set PG_mte_tagged on the zero page when its tags are cleared during
boot. In addition, to avoid ptrace(PTRACE_PEEKMTETAGS) succeeding on
!PROT_MTE mappings pointing to the zero page, change the
__access_remote_tags() check to (vm_flags & VM_MTE) instead of
PG_mte_tagged.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 34bfeea4a9 ("arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE")
Cc: <stable@vger.kernel.org> # 5.10.x
Cc: Will Deacon <will@kernel.org>
Reported-by: Luis Machado <luis.machado@linaro.org>
Tested-by: Luis Machado <luis.machado@linaro.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lore.kernel.org/r/20210210180316.23654-1-catalin.marinas@arm.com
User reported that btrfs-progs misc-tests/028-superblock-recover fails:
[TEST/misc] 028-superblock-recover
unexpected success: mounted fs with corrupted superblock
test failed for case 028-superblock-recover
The test case expects that a broken image with bad superblock will be
rejected to be mounted. However, the test image just passed csum check
of superblock and was successfully mounted.
Commit 55fc29bed8 ("btrfs: use cached value of fs_info::csum_size
everywhere") replaces all calls to btrfs_super_csum_size by
fs_info::csum_size. The calls include the place where fs_info->csum_size
is not initialized. So btrfs_check_super_csum() passes because memcmp()
with len 0 always returns 0.
Fix it by caching csum size in btrfs_fs_info::csum_size once we know the
csum type in superblock is valid in open_ctree().
Link: https://github.com/kdave/btrfs-progs/issues/250
Fixes: 55fc29bed8 ("btrfs: use cached value of fs_info::csum_size everywhere")
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The digital filter related computation are present in the driver
however the programming of the filter within the IP is missing.
The maximum value for the DNF is wrong and should be 15 instead of 16.
Fixes: aeb068c572 ("i2c: i2c-stm32f7: add driver")
Signed-off-by: Alain Volmat <alain.volmat@foss.st.com>
Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@foss.st.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Pull powerpc fix from Michael Ellerman:
"One fix for a regression seen in io_uring, introduced by our support
for KUAP (Kernel User Access Prevention) with the Hash MMU.
Thanks to Aneesh Kumar K.V, and Zorro Lang"
* tag 'powerpc-5.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/kuap: Allow kernel thread to access userspace after kthread_use_mm
When filters are used by trace events, a page is allocated on each CPU and
used to copy the trace event fields to this page before writing to the ring
buffer. The reason to use the filter and not write directly into the ring
buffer is because a filter may discard the event and there's more overhead
on discarding from the ring buffer than the extra copy.
The problem here is that there is no check against the size being allocated
when using this page. If an event asks for more than a page size while being
filtered, it will get only a page, leading to the caller writing more that
what was allocated.
Check the length of the request, and if it is more than PAGE_SIZE minus the
header default back to allocating from the ring buffer directly. The ring
buffer may reject the event if its too big anyway, but it wont overflow.
Link: https://lore.kernel.org/ath10k/1612839593-2308-1-git-send-email-wgong@codeaurora.org/
Cc: stable@vger.kernel.org
Fixes: 0fc1b09ff1 ("tracing: Use temp buffer when filtering events")
Reported-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Pull gpio fixes from Bartosz Golaszewski:
"This is hopefully the last batch of fixes for this release cycle. We
have a minor fix for a Kconfig regression as well as fixes for older
bugs in gpio-ep93xx:
- don't build gpio-mxs unconditionally with COMPILE_TEST enabled
- fix two problems with interrupt handling in gpio-ep93xx"
* tag 'gpio-fixes-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: ep93xx: Fix single irqchip with multi gpiochips
gpio: ep93xx: fix BUG_ON port F usage
gpio: mxs: GPIO_MXS should not default to y unconditionally
Stephen Rothwell reported a build error on ppc64 when
CONFIG_TRIM_UNUSED_KSYMS is enabled.
Jessica Yu pointed out the cause of the error with the reference to the
ppc64 ELF ABI:
"Symbol names with a dot (.) prefix are reserved for holding entry
point addresses. The value of a symbol named ".FN", if it exists,
is the entry point of the function "FN".
As it turned out, CONFIG_TRIM_UNUSED_KSYMS has never worked for ppc64,
but this issue has been unnoticed until recently because this option
depends on !UNUSED_SYMBOLS hence is disabled by all{mod,yes}config.
(Then, it was uncovered by another patch removing UNUSED_SYMBOLS.)
Removing the dot prefix in scripts/gen_autoksyms.sh fixes the issue.
Please note it must be done before 'sort -u' because modules have
both ._mcount and _mcount undefined when CONFIG_FUNCTION_TRACER=y.
Link: https://lore.kernel.org/lkml/20210209210843.3af66662@canb.auug.org.au/
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Jessica Yu <jeyu@kernel.org>
While debugging another issue today, Steve and I noticed that if a
subdir for a file share is already mounted on the client, any new
mount of any other subdir (or the file share root) of the same share
results in sharing the cifs superblock, which e.g. can result in
incorrect device name.
While setting prefix path for the root of a cifs_sb,
CIFS_MOUNT_USE_PREFIX_PATH flag should also be set.
Without it, prepath is not even considered in some places,
and output of "mount" and various /proc/<>/*mount* related
options can be missing part of the device name.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
so we no longer need to handle or parse the UNC= and prefixpath=
options that mount.cifs are generating.
This also fixes a bug in the mount command option where the devname
would be truncated into just //server/share because we were looking
at the truncated UNC value and not the full path.
I.e. in the mount command output the devive //server/share/path
would show up as just //server/share
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
After Commit 3499ba8198 ("xen: Fix event channel callback via
INTX/GSI"), xenbus_probe() will be called too early on Arm. This will
recent to a guest hang during boot.
If the hang wasn't there, we would have ended up to call
xenbus_probe() twice (the second time is in xenbus_probe_initcall()).
We don't need to initialize xenbus_probe() early for Arm guest.
Therefore, the call in xen_guest_init() is now removed.
After this change, there is no more external caller for xenbus_probe().
So the function is turned to a static one. Interestingly there were two
prototypes for it.
Cc: stable@vger.kernel.org
Fixes: 3499ba8198 ("xen: Fix event channel callback via INTX/GSI")
Reported-by: Ian Jackson <iwj@xenproject.org>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Link: https://lore.kernel.org/r/20210210170654.5377-1-julien@xen.org
Signed-off-by: Juergen Gross <jgross@suse.com>
VSC8541 phys need a special reset sequence, which the driver doesn't
currentlny support. As a result enabling the reset via GPIO essentially
guarnteees that the device won't work correctly. We've been relying on
bootloaders to reset the device for years, with this revert we'll go
back to doing so until we can sort out how to get the reset sequence
into the kernel.
This reverts commit a0fa9d7270.
Fixes: a0fa9d7270 ("dts: phy: add GPIO number and active state used for phy reset")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Invoking x86_init.irqs.create_pci_msi_domain() before
x86_init.pci.arch_init() breaks XEN PV.
The XEN_PV specific pci.arch_init() function overrides the default
create_pci_msi_domain() which is obviously too late.
As a consequence the XEN PV PCI/MSI allocation goes through the native
path which runs out of vectors and causes malfunction.
Invoke it after x86_init.pci.arch_init().
Fixes: 6b15ffa07d ("x86/irq: Initialize PCI/MSI domain at PCI init time")
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87pn18djte.fsf@nanos.tec.linutronix.de
Pull power management fixes from Rafael Wysocki:
"Address a performance regression related to scale-invariance on x86
that may prevent turbo CPU frequencies from being used in certain
workloads on systems using acpi-cpufreq as the CPU performance scaling
driver and schedutil as the scaling governor"
* tag 'pm-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: ACPI: Update arch scale-invariance max perf ratio if CPPC is not there
cpufreq: ACPI: Extend frequency tables to cover boost frequencies
Pull ACPI fix from Rafael Wysocki:
"Revert a problematic ACPICA commit that changed the code to attempt to
update memory regions which may be read-only on some systems (Ard
Biesheuvel)"
* tag 'acpi-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPICA: Interpreter: fix memory leak by using existing buffer"
This reverts commit 10cad2c40d.
Petr reports that with this commit in place, io_uring fails the chroot
test (CVE-202-29373). We do need to retain ->fs for send/recvmsg, so
revert this commit.
Reported-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull networking fixes from David Miller:
"Another pile of networing fixes:
1) ath9k build error fix from Arnd Bergmann
2) dma memory leak fix in mediatec driver from Lorenzo Bianconi.
3) bpf int3 kprobe fix from Alexei Starovoitov.
4) bpf stackmap integer overflow fix from Bui Quang Minh.
5) Add usb device ids for Cinterion MV31 to qmi_qwwan driver, from
Christoph Schemmel.
6) Don't update deleted entry in xt_recent netfilter module, from
Jazsef Kadlecsik.
7) Use after free in nftables, fix from Pablo Neira Ayuso.
8) Header checksum fix in flowtable from Sven Auhagen.
9) Validate user controlled length in qrtr code, from Sabyrzhan
Tasbolatov.
10) Fix race in xen/netback, from Juergen Gross,
11) New device ID in cxgb4, from Raju Rangoju.
12) Fix ring locking in rxrpc release call, from David Howells.
13) Don't return LAPB error codes from x25_open(), from Xie He.
14) Missing error returns in gsi_channel_setup() from Alex Elder.
15) Get skb_copy_and_csum_datagram working properly with odd segment
sizes, from Willem de Bruijn.
16) Missing RFS/RSS table init in enetc driver, from Vladimir Oltean.
17) Do teardown on probe failure in DSA, from Vladimir Oltean.
18) Fix compilation failures of txtimestamp selftest, from Vadim
Fedorenko.
19) Limit rx per-napi gro queue size to fix latency regression, from
Eric Dumazet.
20) dpaa_eth xdp fixes from Camelia Groza.
21) Missing txq mode update when switching CBS off, in stmmac driver,
from Mohammad Athari Bin Ismail.
22) Failover pending logic fix in ibmvnic driver, from Sukadev
Bhattiprolu.
23) Null deref fix in vmw_vsock, from Norbert Slusarek.
24) Missing verdict update in xdp paths of ena driver, from Shay
Agroskin.
25) seq_file iteration fix in sctp from Neil Brown.
26) bpf 32-bit src register truncation fix on div/mod, from Daniel
Borkmann.
27) Fix jmp32 pruning in bpf verifier, from Daniel Borkmann.
28) Fix locking in vsock_shutdown(), from Stefano Garzarella.
29) Various missing index bound checks in hns3 driver, from Yufeng Mo.
30) Flush ports on .phylink_mac_link_down() in dsa felix driver, from
Vladimir Oltean.
31) Don't mix up stp and mrp port states in bridge layer, from Horatiu
Vultur.
32) Fix locking during netif_tx_disable(), from Edwin Peer"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
bpf: Fix 32 bit src register truncation on div/mod
bpf: Fix verifier jmp32 pruning decision logic
bpf: Fix verifier jsgt branch analysis on max bound
vsock: fix locking in vsock_shutdown()
net: hns3: add a check for index in hclge_get_rss_key()
net: hns3: add a check for tqp_index in hclge_get_ring_chain_from_mbx()
net: hns3: add a check for queue_id in hclge_reset_vf_queue()
net: dsa: felix: implement port flushing on .phylink_mac_link_down
switchdev: mrp: Remove SWITCHDEV_ATTR_ID_MRP_PORT_STAT
bridge: mrp: Fix the usage of br_mrp_port_switchdev_set_state
net: watchdog: hold device global xmit lock during tx disable
netfilter: nftables: relax check for stateful expressions in set definition
netfilter: conntrack: skip identical origin tuple in same zone only
vsock/virtio: update credit only if socket is not closed
net: fix iteration for sctp transport seq_files
net: ena: Update XDP verdict upon failure
net/vmw_vsock: improve locking in vsock_connect_timeout()
net/vmw_vsock: fix NULL pointer dereference
ibmvnic: Clear failover_pending if unable to schedule
net: stmmac: set TxQ mode back to DCB after disabling CBS
...
Merge misc fixes from Andrew Morton:
"14 patches.
Subsystems affected by this patch series: mm (kasan, mremap, tmpfs,
selftests, memcg, and slub), MAINTAINERS, squashfs, nilfs2, and
firmware"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
nilfs2: make splice write available again
mm, slub: better heuristic for number of cpus when calculating slab order
Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"
MAINTAINERS: update Andrey Ryabinin's email address
selftests/vm: rename file run_vmtests to run_vmtests.sh
tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha
tmpfs: disallow CONFIG_TMPFS_INODE64 on s390
mm/mremap: fix BUILD_BUG_ON() error in get_extent
firmware_loader: align .builtin_fw to 8
kasan: fix stack traces dependency for HW_TAGS
squashfs: add more sanity checks in xattr id lookup
squashfs: add more sanity checks in inode lookup
squashfs: add more sanity checks in id lookup
squashfs: avoid out of bounds writes in decompressors
When creating a new kmem cache, SLUB determines how large the slab pages
will based on number of inputs, including the number of CPUs in the
system. Larger slab pages mean that more objects can be allocated/free
from per-cpu slabs before accessing shared structures, but also
potentially more memory can be wasted due to low slab usage and
fragmentation. The rough idea of using number of CPUs is that larger
systems will be more likely to benefit from reduced contention, and also
should have enough memory to spare.
Number of CPUs used to be determined as nr_cpu_ids, which is number of
possible cpus, but on some systems many will never be onlined, thus
commit 045ab8c948 ("mm/slub: let number of online CPUs determine the
slub page order") changed it to nr_online_cpus(). However, for kmem
caches created early before CPUs are onlined, this may lead to
permamently low slab page sizes.
Vincent reports a regression [1] of hackbench on arm64 systems:
"I'm facing significant performances regression on a large arm64
server system (224 CPUs). Regressions is also present on small arm64
system (8 CPUs) but in a far smaller order of magnitude
On 224 CPUs system : 9 iterations of hackbench -l 16000 -g 16
v5.11-rc4 : 9.135sec (+/- 0.45%)
v5.11-rc4 + revert this patch: 3.173sec (+/- 0.48%)
v5.10: 3.136sec (+/- 0.40%)"
Mel reports a regression [2] of hackbench on x86_64, with lockstat suggesting
page allocator contention:
"i.e. the patch incurs a 7% to 32% performance penalty. This bisected
cleanly yesterday when I was looking for the regression and then
found the thread.
Numerous caches change size. For example, kmalloc-512 goes from
order-0 (vanilla) to order-2 with the revert.
So mostly this is down to the number of times SLUB calls into the
page allocator which only caches order-0 pages on a per-cpu basis"
Clearly num_online_cpus() doesn't work too early in bootup. We could
change the order dynamically in a memory hotplug callback, but runtime
order changing for existing kmem caches has been already shown as
dangerous, and removed in 32a6f409b6 ("mm, slub: remove runtime
allocation order changes").
It could be resurrected in a safe manner with some effort, but to fix
the regression we need something simpler.
We could use num_present_cpus() that should be the number of physically
present CPUs even before they are onlined. That would work for PowerPC
[3], which triggered the original commit, but that still doesn't work on
arm64 [4] as explained in [5].
So this patch tries to determine the best available value without
specific arch knowledge.
- num_present_cpus() if the number is larger than 1, as that means the
arch is likely setting it properly
- nr_cpu_ids otherwise
This should fix the reported regressions while also keeping the effect
of 045ab8c948 for PowerPC systems. It's possible there are
configurations where num_present_cpus() is 1 during boot while
nr_cpu_ids is at the same time bloated, so these (if they exist) would
keep the large orders based on nr_cpu_ids as was before 045ab8c948.
[1] https://lore.kernel.org/linux-mm/CAKfTPtA_JgMf_+zdFbcb_V9rM7JBWNPjAz9irgwFj7Rou=xzZg@mail.gmail.com/
[2] https://lore.kernel.org/linux-mm/20210128134512.GF3592@techsingularity.net/
[3] https://lore.kernel.org/linux-mm/20210123051607.GC2587010@in.ibm.com/
[4] https://lore.kernel.org/linux-mm/CAKfTPtAjyVmS5VYvU6DBxg4-JEo5bdmWbngf-03YsY18cmWv_g@mail.gmail.com/
[5] https://lore.kernel.org/linux-mm/20210126230305.GD30941@willie-the-truck/
Link: https://lkml.kernel.org/r/20210208134108.22286-1-vbabka@suse.cz
Fixes: 045ab8c948 ("mm/slub: let number of online CPUs determine the slub page order")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jann Horn <jannh@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes the following warnings which results in interrupts disabled on
port B/F:
gpio gpiochip1: (B): detected irqchip that is shared with multiple gpiochips: please fix the driver.
gpio gpiochip5: (F): detected irqchip that is shared with multiple gpiochips: please fix the driver.
- added separate irqchip for each interrupt capable gpiochip
- provided unique names for each irqchip
Fixes: d2b0919615 ("gpio: ep93xx: Pass irqchip when adding gpiochip")
Cc: <stable@vger.kernel.org>
Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me>
Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Two index spaces and ep93xx_gpio_port are confusing.
Instead add a separate struct to store necessary data and remove
ep93xx_gpio_port.
- add struct to store IRQ related data for each IRQ capable chip
- replace offset array with defined offsets
- add IRQ registers offset for each IRQ capable chip into
ep93xx_gpio_banks
------------[ cut here ]------------
kernel BUG at drivers/gpio/gpio-ep93xx.c:64!
---[ end trace 3f6544e133e9f5ae ]---
Fixes: fd935fc421 ("gpio: ep93xx: Do not pingpong irq numbers")
Cc: <stable@vger.kernel.org>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Merely enabling CONFIG_COMPILE_TEST should not enable additional code.
To fix this, restrict the automatic enabling of GPIO_MXS to ARCH_MXS,
and ask the user in case of compile-testing.
Fixes: 6876ca311b ("gpio: mxs: add COMPILE_TEST support for GPIO_MXS")
Cc: <stable@vger.kernel.org>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
It turns out that reasoning for lowering max. supported frequency is
wrong. Scrambling works just fine. Several now fixed bugs prevented
proper functioning, even with rates lower than 340 MHz. Issues were just
more pronounced with higher frequencies.
Fix that by allowing max. supported frequency in HW and fix the comment.
Fixes: cd9063757a ("drm/sun4i: DW HDMI: Lower max. supported rate for H6")
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Tested-by: Andre Heider <a.heider@gmail.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20210209175900.7092-6-jernej.skrabec@siol.net
As it turns out, vendor HDMI PHY driver for H6 has a pretty big table
of predefined values for various pixel clocks. However, most of them are
not useful/tested because they come from reference driver code. Vendor
PHY driver is concerned with only few of those, namely 27 MHz, 74.25
MHz, 148.5 MHz, 297 MHz and 594 MHz. These are all frequencies for
standard CEA modes.
Fix sun50i_h6_cur_ctr and sun50i_h6_phy_config with the values only for
aforementioned frequencies.
Table sun50i_h6_mpll_cfg doesn't need to be changed because values are
actually frequency dependent and not so much SoC dependent. See i.MX6
documentation for explanation of those values for similar PHY.
Fixes: c71c9b2fee ("drm/sun4i: Add support for Synopsys HDMI PHY")
Tested-by: Andre Heider <a.heider@gmail.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20210209175900.7092-5-jernej.skrabec@siol.net
Channel 1 has polarity bits for vsync and hsync signals but driver never
sets them. It turns out that with pre-HDMI2 controllers seemingly there
is no issue if polarity is not set. However, with HDMI2 controllers
(H6) there often comes to de-synchronization due to phase shift. This
causes flickering screen. It's safe to assume that similar issues might
happen also with pre-HDMI2 controllers.
Solve issue with setting vsync and hsync polarity. Note that display
stacks with tcon top have polarity bits actually in tcon0 polarity
register.
Fixes: 9026e0d122 ("drm: Add Allwinner A10 Display Engine support")
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Tested-by: Andre Heider <a.heider@gmail.com>
Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20210209175900.7092-3-jernej.skrabec@siol.net
Daniel Borkmann says:
====================
pull-request: bpf 2021-02-10
The following pull-request contains BPF updates for your *net* tree.
We've added 5 non-merge commits during the last 8 day(s) which contain
a total of 3 files changed, 22 insertions(+), 21 deletions(-).
The main changes are:
1) Fix missed execution of kprobes BPF progs when kprobe is firing via
int3, from Alexei Starovoitov.
2) Fix potential integer overflow in map max_entries for stackmap on
32 bit archs, from Bui Quang Minh.
3) Fix a verifier pruning and a insn rewrite issue related to 32 bit ops,
from Daniel Borkmann.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
c# Please enter a commit message to explain why this merge is necessary,
This reverts commit 536d3bf261, as it can
cause writers to memory.high to get stuck in the kernel forever,
performing page reclaim and consuming excessive amounts of CPU cycles.
Before the patch, a write to memory.high would first put the new limit
in place for the workload, and then reclaim the requested delta. After
the patch, the kernel tries to reclaim the delta before putting the new
limit into place, in order to not overwhelm the workload with a sudden,
large excess over the limit. However, if reclaim is actively racing
with new allocations from the uncurbed workload, it can keep the write()
working inside the kernel indefinitely.
This is causing problems in Facebook production. A privileged
system-level daemon that adjusts memory.high for various workloads
running on a host can get unexpectedly stuck in the kernel and
essentially turn into a sort of involuntary kswapd for one of the
workloads. We've observed that daemon busy-spin in a write() for
minutes at a time, neglecting its other duties on the system, and
expending privileged system resources on behalf of a workload.
To remedy this, we have first considered changing the reclaim logic to
break out after a couple of loops - whether the workload has converged
to the new limit or not - and bound the write() call this way. However,
the root cause that inspired the sequence change in the first place has
been fixed through other means, and so a revert back to the proven
limit-setting sequence, also used by memory.max, is preferable.
The sequence was changed to avoid extreme latencies in the workload when
the limit was lowered: the sudden, large excess created by the limit
lowering would erroneously trigger the penalty sleeping code that is
meant to throttle excessive growth from below. Allocating threads could
end up sleeping long after the write() had already reclaimed the delta
for which they were being punished.
However, erroneous throttling also caused problems in other scenarios at
around the same time. This resulted in commit b3ff92916a ("mm, memcg:
reclaim more aggressively before high allocator throttling"), included
in the same release as the offending commit. When allocating threads
now encounter large excess caused by a racing write() to memory.high,
instead of entering punitive sleeps, they will simply be tasked with
helping reclaim down the excess, and will be held no longer than it
takes to accomplish that. This is in line with regular limit
enforcement - i.e. if the workload allocates up against or over an
otherwise unchanged limit from below.
With the patch breaking userspace, and the root cause addressed by other
means already, revert it again.
Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org
Fixes: 536d3bf261 ("mm: memcontrol: avoid workload stalls when lowering memory.high")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Tejun Heo <tj@kernel.org>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: <stable@vger.kernel.org> [5.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently there is an assumption in tmpfs that 64-bit architectures also
have a 64-bit ino_t. This is not true on s390 which has a 32-bit ino_t.
With CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers
and display "inode64" in the mount options, but passing the "inode64"
mount option will fail. This leads to the following behavior:
# mkdir mnt
# mount -t tmpfs nodev mnt
# mount -o remount,rw mnt
mount: /home/ubuntu/mnt: mount point not mounted or bad option.
As mount sees "inode64" in the mount options and thus passes it in the
options for the remount.
So prevent CONFIG_TMPFS_INODE64 from being selected on s390.
Link: https://lkml.kernel.org/r/20210205230620.518245-1-seth.forshee@canonical.com
Fixes: ea3271f719 ("tmpfs: support 64-bit inums per-sb")
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: <stable@vger.kernel.org> [5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arm64 references the start address of .builtin_fw (__start_builtin_fw)
with a pair of R_AARCH64_ADR_PREL_PG_HI21/R_AARCH64_LDST64_ABS_LO12_NC
relocations. The compiler is allowed to emit the
R_AARCH64_LDST64_ABS_LO12_NC relocation because struct builtin_fw in
include/linux/firmware.h is 8-byte aligned.
The R_AARCH64_LDST64_ABS_LO12_NC relocation requires the address to be a
multiple of 8, which may not be the case if .builtin_fw is empty.
Unconditionally align .builtin_fw to fix the linker error. 32-bit
architectures could use ALIGN(4) but that would add unnecessary
complexity, so just use ALIGN(8).
Link: https://lkml.kernel.org/r/20201208054646.2913063-1-maskray@google.com
Link: https://github.com/ClangBuiltLinux/linux/issues/1204
Fixes: 5658c76 ("firmware: allow firmware files to be built into kernel image")
Signed-off-by: Fangrui Song <maskray@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, whether the alloc/free stack traces collection is enabled by
default for hardware tag-based KASAN depends on CONFIG_DEBUG_KERNEL.
The intention for this dependency was to only enable collection on slow
debug kernels due to a significant perf and memory impact.
As it turns out, CONFIG_DEBUG_KERNEL is not considered a debug option
and is enabled on many productions kernels including Android and Ubuntu.
As the result, this dependency is pointless and only complicates the
code and documentation.
Having stack traces collection disabled by default would make the
hardware mode work differently to to the software ones, which is
confusing.
This change removes the dependency and enables stack traces collection
by default.
Looking into the future, this default might makes sense for production
kernels, assuming we implement a fast stack trace collection approach.
Link: https://lkml.kernel.org/r/6678d77ceffb71f1cff2cf61560e2ffe7bb6bfe9.1612808820.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sysbot has reported a warning where a kmalloc() attempt exceeds the
maximum limit. This has been identified as corruption of the xattr_ids
count when reading the xattr id lookup table.
This patch adds a number of additional sanity checks to detect this
corruption and others.
1. It checks for a corrupted xattr index read from the inode. This could
be because the metadata block is uncompressed, or because the
"compression" bit has been corrupted (turning a compressed block
into an uncompressed block). This would cause an out of bounds read.
2. It checks against corruption of the xattr_ids count. This can either
lead to the above kmalloc failure, or a smaller than expected
table to be read.
3. It checks the contents of the index table for corruption.
[phillip@squashfs.org.uk: fix checkpatch issue]
Link: https://lkml.kernel.org/r/270245655.754655.1612770082682@webmail.123-reg.co.uk
Link: https://lkml.kernel.org/r/20210204130249.4495-5-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: syzbot+2ccea6339d368360800d@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sysbot has reported an "slab-out-of-bounds read" error which has been
identified as being caused by a corrupted "ino_num" value read from the
inode. This could be because the metadata block is uncompressed, or
because the "compression" bit has been corrupted (turning a compressed
block into an uncompressed block).
This patch adds additional sanity checks to detect this, and the
following corruption.
1. It checks against corruption of the inodes count. This can either
lead to a larger table to be read, or a smaller than expected
table to be read.
In the case of a too large inodes count, this would often have been
trapped by the existing sanity checks, but this patch introduces
a more exact check, which can identify too small values.
2. It checks the contents of the index table for corruption.
[phillip@squashfs.org.uk: fix checkpatch issue]
Link: https://lkml.kernel.org/r/527909353.754618.1612769948607@webmail.123-reg.co.uk
Link: https://lkml.kernel.org/r/20210204130249.4495-4-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: syzbot+04419e3ff19d2970ea28@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "Squashfs: fix BIO migration regression and add sanity checks".
Patch [1/4] fixes a regression introduced by the "migrate from
ll_rw_block usage to BIO" patch, which has produced a number of
Sysbot/Syzkaller reports.
Patches [2/4], [3/4], and [4/4] fix a number of filesystem corruption
issues which have produced Sysbot reports in the id, inode and xattr
lookup code.
Each patch has been tested against the Sysbot reproducers using the
given kernel configuration. They have the appropriate "Reported-by:"
lines added.
Additionally, all of the reproducer filesystems are indirectly fixed by
patch [4/4] due to the fact they all have xattr corruption which is now
detected there.
Additional testing with other configurations and architectures (32bit,
big endian), and normal filesystems has also been done to trap any
inadvertent regressions caused by the additional sanity checks.
This patch (of 4):
This is a regression introduced by the patch "migrate from ll_rw_block
usage to BIO".
Sysbot/Syskaller has reported a number of "out of bounds writes" and
"unable to handle kernel paging request in squashfs_decompress" errors
which have been identified as a regression introduced by the above
patch.
Specifically, the patch removed the following sanity check
if (length < 0 || length > output->length ||
(index + length) > msblk->bytes_used)
This check did two things:
1. It ensured any reads were not beyond the end of the filesystem
2. It ensured that the "length" field read from the filesystem
was within the expected maximum length. Without this any
corrupted values can over-run allocated buffers.
Link: https://lkml.kernel.org/r/20210204130249.4495-1-phillip@squashfs.org.uk
Link: https://lkml.kernel.org/r/20210204130249.4495-2-phillip@squashfs.org.uk
Fixes: 93e72b3c61 ("squashfs: migrate from ll_rw_block usage to BIO")
Reported-by: syzbot+6fba78f99b9afd4b5634@syzkaller.appspotmail.com
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Philippe Liard <pliard@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull i3c fix from Alexandre Belloni:
"A single build warning fix"
* tag 'i3c/fixes-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c/master/mipi-i3c-hci: Fix position of __maybe_unused in i3c_hci_of_match
While reviewing a different fix, John and I noticed an oddity in one of the
BPF program dumps that stood out, for example:
# bpftool p d x i 13
0: (b7) r0 = 808464450
1: (b4) w4 = 808464432
2: (bc) w0 = w0
3: (15) if r0 == 0x0 goto pc+1
4: (9c) w4 %= w0
[...]
In line 2 we noticed that the mov32 would 32 bit truncate the original src
register for the div/mod operation. While for the two operations the dst
register is typically marked unknown e.g. from adjust_scalar_min_max_vals()
the src register is not, and thus verifier keeps tracking original bounds,
simplified:
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
0: (b7) r0 = -1
1: R0_w=invP-1 R1=ctx(id=0,off=0,imm=0) R10=fp0
1: (b7) r1 = -1
2: R0_w=invP-1 R1_w=invP-1 R10=fp0
2: (3c) w0 /= w1
3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP-1 R10=fp0
3: (77) r1 >>= 32
4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP4294967295 R10=fp0
4: (bf) r0 = r1
5: R0_w=invP4294967295 R1_w=invP4294967295 R10=fp0
5: (95) exit
processed 6 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
Runtime result of r0 at exit is 0 instead of expected -1. Remove the
verifier mov32 src rewrite in div/mod and replace it with a jmp32 test
instead. After the fix, we result in the following code generation when
having dividend r1 and divisor r6:
div, 64 bit: div, 32 bit:
0: (b7) r6 = 8 0: (b7) r6 = 8
1: (b7) r1 = 8 1: (b7) r1 = 8
2: (55) if r6 != 0x0 goto pc+2 2: (56) if w6 != 0x0 goto pc+2
3: (ac) w1 ^= w1 3: (ac) w1 ^= w1
4: (05) goto pc+1 4: (05) goto pc+1
5: (3f) r1 /= r6 5: (3c) w1 /= w6
6: (b7) r0 = 0 6: (b7) r0 = 0
7: (95) exit 7: (95) exit
mod, 64 bit: mod, 32 bit:
0: (b7) r6 = 8 0: (b7) r6 = 8
1: (b7) r1 = 8 1: (b7) r1 = 8
2: (15) if r6 == 0x0 goto pc+1 2: (16) if w6 == 0x0 goto pc+1
3: (9f) r1 %= r6 3: (9c) w1 %= w6
4: (b7) r0 = 0 4: (b7) r0 = 0
5: (95) exit 5: (95) exit
x86 in particular can throw a 'divide error' exception for div
instruction not only for divisor being zero, but also for the case
when the quotient is too large for the designated register. For the
edx:eax and rdx:rax dividend pair it is not an issue in x86 BPF JIT
since we always zero edx (rdx). Hence really the only protection
needed is against divisor being zero.
Fixes: 68fda450a7 ("bpf: fix 32-bit divide by zero")
Co-developed-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Anatoly has been fuzzing with kBdysch harness and reported a hang in
one of the outcomes:
func#0 @0
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
0: (b7) r0 = 808464450
1: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R10=fp0
1: (b4) w4 = 808464432
2: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP808464432 R10=fp0
2: (9c) w4 %= w0
3: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
3: (66) if w4 s> 0x30303030 goto pc+0
R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
4: (7f) r0 >>= r0
5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
5: (9c) w4 %= w0
6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
6: (66) if w0 s> 0x3030 goto pc+0
R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
7: (d6) if w0 s<= 0x303030 goto pc+1
9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
9: (95) exit
propagating r0
from 6 to 7: safe
4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0
4: (7f) r0 >>= r0
5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0
5: (9c) w4 %= w0
6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
6: (66) if w0 s> 0x3030 goto pc+0
R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
propagating r0
7: safe
propagating r0
from 6 to 7: safe
processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
The underlying program was xlated as follows:
# bpftool p d x i 10
0: (b7) r0 = 808464450
1: (b4) w4 = 808464432
2: (bc) w0 = w0
3: (15) if r0 == 0x0 goto pc+1
4: (9c) w4 %= w0
5: (66) if w4 s> 0x30303030 goto pc+0
6: (7f) r0 >>= r0
7: (bc) w0 = w0
8: (15) if r0 == 0x0 goto pc+1
9: (9c) w4 %= w0
10: (66) if w0 s> 0x3030 goto pc+0
11: (d6) if w0 s<= 0x303030 goto pc+1
12: (05) goto pc-1
13: (95) exit
The verifier rewrote original instructions it recognized as dead code with
'goto pc-1', but reality differs from verifier simulation in that we are
actually able to trigger a hang due to hitting the 'goto pc-1' instructions.
Taking a closer look at the verifier analysis, the reason is that it misjudges
its pruning decision at the first 'from 6 to 7: safe' occasion. What happens
is that while both old/cur registers are marked as precise, they get misjudged
for the jmp32 case as range_within() yields true, meaning that the prior
verification path with a wider register bound could be verified successfully
and therefore the current path with a narrower register bound is deemed safe
as well whereas in reality it's not. R0 old/cur path's bounds compare as
follows:
old: smin_value=0x8000000000000000,smax_value=0x7fffffffffffffff,umin_value=0x0,umax_value=0xffffffffffffffff,var_off=(0x0; 0xffffffffffffffff)
cur: smin_value=0x8000000000000000,smax_value=0x7fffffff7fffffff,umin_value=0x0,umax_value=0xffffffff7fffffff,var_off=(0x0; 0xffffffff7fffffff)
old: s32_min_value=0x80000000,s32_max_value=0x00003030,u32_min_value=0x00000000,u32_max_value=0xffffffff
cur: s32_min_value=0x00003031,s32_max_value=0x7fffffff,u32_min_value=0x00003031,u32_max_value=0x7fffffff
The 64 bit bounds generally look okay and while the information that got
propagated from 32 to 64 bit looks correct as well, it's not precise enough
for judging a conditional jmp32. Given the latter only operates on subregisters
we also need to take these into account as well for a range_within() probe
in order to be able to prune paths. Extending the range_within() constraint
to both bounds will be able to tell us that the old signed 32 bit bounds are
not wider than the cur signed 32 bit bounds.
With the fix in place, the program will now verify the 'goto' branch case as
it should have been:
[...]
6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
6: (66) if w0 s> 0x3030 goto pc+0
R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
7: (d6) if w0 s<= 0x303030 goto pc+1
9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
9: (95) exit
7: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=12337,u32_min_value=12337,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
7: (d6) if w0 s<= 0x303030 goto pc+1
R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
8: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
8: (30) r0 = *(u8 *)skb[808464432]
BPF_LD_[ABS|IND] uses reserved fields
processed 11 insns (limit 1000000) max_states_per_insn 1 total_states 1 peak_states 1 mark_read 1
The bug is quite subtle in the sense that when verifier would determine that
a given branch is dead code, it would (here: wrongly) remove these instructions
from the program and hard-wire the taken branch for privileged programs instead
of the 'goto pc-1' rewrites which will cause hard to debug problems.
Fixes: 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Fix incorrect is_branch{32,64}_taken() analysis for the jsgt case. The return
code for both will tell the caller whether a given conditional jump is taken
or not, e.g. 1 means branch will be taken [for the involved registers] and the
goto target will be executed, 0 means branch will not be taken and instead we
fall-through to the next insn, and last but not least a -1 denotes that it is
not known at verification time whether a branch will be taken or not. Now while
the jsgt has the branch-taken case correct with reg->s32_min_value > sval, the
branch-not-taken case is off-by-one when testing for reg->s32_max_value < sval
since the branch will also be taken for reg->s32_max_value == sval. The jgt
branch analysis, for example, gets this right.
Fixes: 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Fixes: 4f7b3e8258 ("bpf: improve verifier branch analysis")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) nf_conntrack_tuple_taken() needs to recheck zone for
NAT clash resolution, from Florian Westphal.
2) Restore support for stateful expressions when set definition
specifies no stateful expressions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In vsock_shutdown() we touched some socket fields without holding the
socket lock, such as 'state' and 'sk_flags'.
Also, after the introduction of multi-transport, we are accessing
'vsk->transport' in vsock_send_shutdown() without holding the lock
and this call can be made while the connection is in progress, so
the transport can change in the meantime.
To avoid issues, we hold the socket lock when we enter in
vsock_shutdown() and release it when we leave.
Among the transports that implement the 'shutdown' callback, only
hyperv_transport acquired the lock. Since the caller now holds it,
we no longer take it.
Fixes: d021c34405 ("VSOCK: Introduce VM Sockets")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan says:
====================
net: hns3: fixes for -net
The parameters sent from vf may be unreliable. If these
parameters are used directly, memory overwriting may occur.
So this series adds some checks for this case.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The index is received from vf, if use it directly,
an out-of-bound issue may be caused, so add a check for
this index before using it in hclge_get_rss_key().
Fixes: a638b1d8cc ("net: hns3: fix get VF RSS issue")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tqp_index is received from vf, if use it directly,
an out-of-bound issue may be caused, so add a check for
this tqp_index before using it in hclge_get_ring_chain_from_mbx().
Fixes: 84e095d64e ("net: hns3: Change PF to add ring-vect binding & resetQ to mailbox")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The queue_id is received from vf, if use it directly,
an out-of-bound issue may be caused, so add a check for
this queue_id before using it in hclge_reset_vf_queue().
Fixes: 1a426f8b40 ("net: hns3: fix the VF queue reset flow error")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are several issues which may be seen when the link goes down while
forwarding traffic, all of which can be attributed to the fact that the
port flushing procedure from the reference manual was not closely
followed.
With flow control enabled on both the ingress port and the egress port,
it may happen when a link goes down that Ethernet packets are in flight.
In flow control mode, frames are held back and not dropped. When there
is enough traffic in flight (example: iperf3 TCP), then the ingress port
might enter congestion and never exit that state. This is a problem,
because it is the egress port's link that went down, and that has caused
the inability of the ingress port to send packets to any other port.
This is solved by flushing the egress port's queues when it goes down.
There is also a problem when performing stream splitting for
IEEE 802.1CB traffic (not yet upstream, but a sort of multicast,
basically). There, if one port from the destination ports mask goes
down, splitting the stream towards the other destinations will no longer
be performed. This can be traced down to this line:
ocelot_port_writel(ocelot_port, 0, DEV_MAC_ENA_CFG);
which should have been instead, as per the reference manual:
ocelot_port_rmwl(ocelot_port, 0, DEV_MAC_ENA_CFG_RX_ENA,
DEV_MAC_ENA_CFG);
Basically only DEV_MAC_ENA_CFG_RX_ENA should be disabled, but not
DEV_MAC_ENA_CFG_TX_ENA - I don't have further insight into why that is
the case, but apparently multicasting to several ports will cause issues
if at least one of them doesn't have DEV_MAC_ENA_CFG_TX_ENA set.
I am not sure what the state of the Ocelot VSC7514 driver is, but
probably not as bad as Felix/Seville, since VSC7514 uses phylib and has
the following in ocelot_adjust_link:
if (!phydev->link)
return;
therefore the port is not really put down when the link is lost, unlike
the DSA drivers which use .phylink_mac_link_down for that.
Nonetheless, I put ocelot_port_flush() in the common ocelot.c because it
needs to access some registers from drivers/net/ethernet/mscc/ocelot_rew.h
which are not exported in include/soc/mscc/ and a bugfix patch should
probably not move headers around.
Fixes: bdeced75b1 ("net: dsa: felix: Add PCS operations for PHYLINK")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes a regression following dfs links that was introduced in the
patch series for the new mount api.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Horatiu Vultur says:
====================
bridge: mrp: Fix br_mrp_port_switchdev_set_state
Based on the discussion here[1], there was a problem with the function
br_mrp_port_switchdev_set_state. The problem was that it was called
both with BR_STATE* and BR_MRP_PORT_STATE* types. This patch series
fixes this issue and removes SWITCHDEV_ATTR_ID_MRP_PORT_STAT because
is not used anymore.
[1] https://www.spinics.net/lists/netdev/msg714816.html
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that MRP started to use also SWITCHDEV_ATTR_ID_PORT_STP_STATE to
notify HW, then SWITCHDEV_ATTR_ID_MRP_PORT_STAT is not used anywhere
else, therefore we can remove it.
Fixes: c284b54590 ("switchdev: mrp: Extend switchdev API to offload MRP")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function br_mrp_port_switchdev_set_state was called both with MRP
port state and STP port state, which is an issue because they don't
match exactly.
Therefore, update the function to be used only with STP port state and
use the id SWITCHDEV_ATTR_ID_PORT_STP_STATE.
The choice of using STP over MRP is that the drivers already implement
SWITCHDEV_ATTR_ID_PORT_STP_STATE and already in SW we update the port
STP state.
Fixes: 9a9f26e8f7 ("bridge: mrp: Connect MRP API with the switchdev API")
Fixes: fadd409136 ("bridge: switchdev: mrp: Implement MRP API for switchdev")
Fixes: 2f1a11ae11 ("bridge: mrp: Add MRP interface.")
Reported-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prevent netif_tx_disable() running concurrently with dev_watchdog() by
taking the device global xmit lock. Otherwise, the recommended:
netif_carrier_off(dev);
netif_tx_disable(dev);
driver shutdown sequence can happen after the watchdog has already
checked carrier, resulting in possible false alarms. This is because
netif_tx_lock() only sets the frozen bit without maintaining the locks
on the individual queues.
Fixes: c3f26a269c ("netdev: Fix lockdep warnings in multiqueue configurations.")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Restore the original behaviour where users are allowed to add an element
with any stateful expression if the set definition specifies no stateful
expressions. Make sure upper maximum number of stateful expressions of
NFT_SET_EXPR_MAX is not reached.
Fixes: 8cfd9b0f85 ("netfilter: nftables: generalize set expressions support")
Fixes: 48b0ae046e ("netfilter: nftables: netlink support for several set element expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The origin skip check needs to re-test the zone. Else, we might skip
a colliding tuple in the reply direction.
This only occurs when using 'directional zones' where origin tuples
reside in different zones but the reply tuples share the same zone.
This causes the new conntrack entry to be dropped at confirmation time
because NAT clash resolution was elided.
Fixes: 4e35c1cb94 ("netfilter: nf_nat: skip nat clash resolution for same-origin entries")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If the socket is closed or is being released, some resources used by
virtio_transport_space_update() such as 'vsk->trans' may be released.
To avoid a use after free bug we should only update the available credit
when we are sure the socket is still open and we have the lock held.
Fixes: 06a8fc7836 ("VSOCK: Introduce virtio_vsock_common.ko")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20210208144454.84438-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull tracing fix from Steven Rostedt:
"Fix output of top level event tracing 'enable' file.
When writing a tool for enabling events in the tracing system, an
anomaly was discovered. The top level event 'enable' file would never
show '1' when all events were enabled.
The system and event 'enable' files worked as expected.
The reason was because the top level event 'enable' file included the
'ftrace' tracer events, which are not controlled by the 'enable' file
and would cause the output to be wrong. This appears to have been a
bug since it was created"
* tag 'trace-v5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Do not count ftrace events in top level enable output
The sctp transport seq_file iterators take a reference to the transport
in the ->start and ->next functions and releases the reference in the
->show function. The preferred handling for such resources is to
release them in the subsequent ->next or ->stop function call.
Since Commit 1f4aace60b ("fs/seq_file.c: simplify seq_file iteration
code and interface") there is no guarantee that ->show will be called
after ->next, so this function can now leak references.
So move the sctp_transport_put() call to ->next and ->stop.
Fixes: 1f4aace60b ("fs/seq_file.c: simplify seq_file iteration code and interface")
Reported-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This has been shown in tests:
[ +0.000008] WARNING: CPU: 3 PID: 7620 at kernel/rcu/srcutree.c:374 cleanup_srcu_struct+0xed/0x100
This is essentially a use-after free, although SRCU notices it as
an SRCU cleanup in an invalid context.
== Background ==
SGX has a data structure (struct sgx_encl_mm) which keeps per-mm SGX
metadata. This is separate from struct sgx_encl because, in theory,
an enclave can be mapped from more than one mm. sgx_encl_mm includes
a pointer back to the sgx_encl.
This means that sgx_encl must have a longer lifetime than all of the
sgx_encl_mm's that point to it. That's usually the case: sgx_encl_mm
is freed only after the mmu_notifier is unregistered in sgx_release().
However, there's a race. If the process is exiting,
sgx_mmu_notifier_release() can be called in parallel with sgx_release()
instead of being called *by* it. The mmu_notifier path keeps encl_mm
alive past when sgx_encl can be freed. This inverts the lifetime rules
and means that sgx_mmu_notifier_release() can access a freed sgx_encl.
== Fix ==
Increase encl->refcount when encl_mm->encl is established. Release
this reference when encl_mm is freed. This ensures that encl outlives
encl_mm.
[ bp: Massage commit message. ]
Fixes: 1728ab54b4 ("x86/sgx: Add a page reclaimer")
Reported-by: Haitao Huang <haitao.huang@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20210207221401.29933-1-jarkko@kernel.org
This reverts commit 32cf1a12ca.
The 'exisitng buffer' in this case is the firmware provided table, and
we should not modify that in place. This fixes a crash on arm64 with
initrd table overrides, in which case the DSDT is not mapped with
read/write permissions.
Reported-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If the maximum performance level taken for computing the
arch_max_freq_ratio value used in the x86 scale-invariance code is
higher than the one corresponding to the cpuinfo.max_freq value
coming from the acpi_cpufreq driver, the scale-invariant utilization
falls below 100% even if the CPU runs at cpuinfo.max_freq or slightly
faster, which causes the schedutil governor to select a frequency
below cpuinfo.max_freq. That frequency corresponds to a frequency
table entry below the maximum performance level necessary to get to
the "boost" range of CPU frequencies which prevents "boost"
frequencies from being used in some workloads.
While this issue is related to scale-invariance, it may be amplified
by commit db865272d9 ("cpufreq: Avoid configuring old governors as
default with intel_pstate") from the 5.10 development cycle which
made it extremely easy to default to schedutil even if the preferred
driver is acpi_cpufreq as long as intel_pstate is built too, because
the mere presence of the latter effectively removes the ondemand
governor from the defaults. Distro kernels are likely to include
both intel_pstate and acpi_cpufreq on x86, so their users who cannot
use intel_pstate or choose to use acpi_cpufreq may easily be
affectecd by this issue.
If CPPC is available, it can be used to address this issue by
extending the frequency tables created by acpi_cpufreq to cover the
entire available frequency range (including "boost" frequencies) for
each CPU, but if CPPC is not there, acpi_cpufreq has no idea what
the maximum "boost" frequency is and the frequency tables created by
it cannot be extended in a meaningful way, so in that case make it
ask the arch scale-invariance code to to use the "nominal" performance
level for CPU utilization scaling in order to avoid the issue at hand.
Fixes: db865272d9 ("cpufreq: Avoid configuring old governors as default with intel_pstate")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
A severe performance regression on AMD EPYC processors when using
the schedutil scaling governor was discovered by Phoronix.com and
attributed to the following commits:
41ea667227 ("x86, sched: Calculate frequency invariance for AMD
systems")
976df7e573 ("x86, sched: Use midpoint of max_boost and max_P for
frequency invariance on AMD EPYC")
The source of the problem is that the maximum performance level taken
for computing the arch_max_freq_ratio value used in the x86 scale-
invariance code is higher than the one corresponding to the
cpuinfo.max_freq value coming from the acpi_cpufreq driver.
This effectively causes the scale-invariant utilization to fall below
100% even if the CPU runs at cpuinfo.max_freq or slightly faster, so
the schedutil governor selects a frequency below cpuinfo.max_freq
then. That frequency corresponds to a frequency table entry below
the maximum performance level necessary to get to the "boost" range
of CPU frequencies.
However, if the cpuinfo.max_freq value coming from acpi_cpufreq was
higher, the schedutil governor would select higher frequencies which
in turn would allow acpi_cpufreq to set more adequate performance
levels and to get to the "boost" range of CPU frequencies more often.
This issue affects any systems where acpi_cpufreq is used and the
"boost" (or "turbo") frequencies are enabled, not just AMD EPYC.
Moreover, commit db865272d9 ("cpufreq: Avoid configuring old
governors as default with intel_pstate") from the 5.10 development
cycle made it extremely easy to default to schedutil even if the
preferred driver is acpi_cpufreq as long as intel_pstate is built
too, because the mere presence of the latter effectively removes the
ondemand governor from the defaults. Distro kernels are likely to
include both intel_pstate and acpi_cpufreq on x86, so their users
who cannot use intel_pstate or choose to use acpi_cpufreq may
easily be affectecd by this issue.
To address this issue, extend the frequency table constructed by
acpi_cpufreq for each CPU to cover the entire range of available
frequencies (including the "boost" ones) if CPPC is available and
indicates that "boost" (or "turbo") frequencies are enabled. That
causes cpuinfo.max_freq to become the maximum "boost" frequency of
the given CPU (instead of the maximum frequency returned by the ACPI
_PSS object that corresponds to the "nominal" performance level).
Fixes: 41ea667227 ("x86, sched: Calculate frequency invariance for AMD systems")
Fixes: 976df7e573 ("x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC")
Fixes: db865272d9 ("cpufreq: Avoid configuring old governors as default with intel_pstate")
Link: https://www.phoronix.com/scan.php?page=article&item=linux511-amd-schedutil&num=1
Link: https://lore.kernel.org/linux-pm/20210203135321.12253-2-ggherdovich@suse.cz/
Reported-by: Michael Larabel <Michael@phoronix.com>
Diagnosed-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Tested-by: Michael Larabel <Michael@phoronix.com>
This reverts commit 842067940a.
For some solutions e.g. sound/soc/intel/catpt, DW DMA is part of a
compound device (in that very example, domains: ADSP, SSP0, SSP1, DMA0
and DMA1 are part of a single entity) rather than being a standalone
one. Driver for said device may enlist DMA to transfer data during
suspend or resume sequences.
Manipulating RPM explicitly in dw's DMA request and release channel
functions causes suspend() to also invoke resume() for the exact same
device. Similar situation occurs for resume() sequence. Effectively
renders device dysfunctional after first suspend() attempt. Revert the
change to address the problem.
Fixes: 842067940a ("dmaengine: dw: Enable runtime PM")
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20210203191924.15706-1-cezary.rojewski@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Pull libnvdimm fixes from Dan Williams:
"A fix for a crash scenario that has been present since the initial
merge, a minor regression in sysfs attribute visibility, and a fix for
some flexible array warnings.
The bulk of this pull is an update to the libnvdimm unit test
infrastructure to test non-ACPI platforms. Given there is zero
regression risk for test updates, and the tests enable validation of
bits headed towards the next merge window, I saw no reason to hold the
new tests back. Santosh originally submitted this before the v5.11
window opened.
Summary:
- Fix a crash when sysfs accesses race 'dimm' driver probe/remove.
- Fix a regression in 'resource' attribute visibility necessary for
mapping badblocks and other physical address interrogations.
- Fix some flexible array warnings
- Expand the unit test infrastructure for non-ACPI platforms"
* tag 'libnvdimm-fixes-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm/dimm: Avoid race between probe and available_slots_show()
ndtest: Add papr health related flags
ndtest: Add nvdimm control functions
ndtest: Add regions and mappings to the test buses
ndtest: Add dimm attributes
ndtest: Add dimms to the two buses
ndtest: Add compatability string to treat it as PAPR family
testing/nvdimm: Add test module for non-nfit platforms
libnvdimm/namespace: Fix visibility of namespace resource attribute
libnvdimm/pmem: Remove unused header
ACPI: NFIT: Fix flexible_array.cocci warnings
Pull dma-mapping fix from Christoph Hellwig:
"Fix a 32 vs 64-bit padding issue in the new benchmark code (Barry
Song)"
* tag 'dma-mapping-5.11-2' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: benchmark: use u8 for reserved field in uAPI structure
Pull irq fixes from Borislav Petkov:
- Prevent device managed IRQ allocation helpers from returning IRQ 0
- A fix for MSI activation of PCI endpoints with multiple MSIs
* tag 'irq_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Prevent [devm_]irq_alloc_desc from returning irq 0
genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set
Pull syscall entry fixes from Borislav Petkov:
- For syscall user dispatch, separate prctl operation from syscall
redirection range specification before the API has been made official
in 5.11.
- Ensure tasks using the generic syscall code do trap after returning
from a syscall when single-stepping is requested.
* tag 'core_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry: Use different define for selector variable in SUD
entry: Ensure trap after single-step on system call return
Pull scheduler fix from Borislav Petkov:
"Revert an attempt to not spread IRQ threads on isolated CPUs which has
a bunch of problems"
* tag 'sched_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "lib: Restrict cpumask_local_spread to houskeeping CPUs"
Pull timer fixes from Borislav Petkov:
"Two more timers-related fixes for v5.11:
- Use a freezable workqueue for RTC sync because the sync can happen
at any time and trigger suspend assertion checks in the i2c
subsystem.
- Correct a previous RTC validation change to check only bit 6 in
register D because some Intel machines use bits 0-5"
* tag 'timers_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ntp: Use freezable workqueue for RTC synchronization
rtc: mc146818: Dont test for bit 0-5 in Register D
Pull x86 fixes from Borislav Petkov:
"I hope this is the last batch of x86/urgent updates for this round:
- Remove superfluous EFI PGD range checks which lead to those
assertions failing with certain kernel configs and LLVM.
- Disable setting breakpoints on facilities involved in #DB exception
handling to avoid infinite loops.
- Add extra serialization to non-serializing MSRs (IA32_TSC_DEADLINE
and x2 APIC MSRs) to adhere to SDM's recommendation and avoid any
theoretical issues.
- Re-add the EPB MSR reading on turbostat so that it works on older
kernels which don't have the corresponding EPB sysfs file.
- Add Alder Lake to the list of CPUs which support split lock.
- Fix %dr6 register handling in order to be able to set watchpoints
with gdb again.
- Disable CET instrumentation in the kernel so that gcc doesn't add
ENDBR64 to kernel code and thus confuse tracing"
* tag 'x86_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Remove EFI PGD build time checks
x86/debug: Prevent data breakpoints on cpu_dr7
x86/debug: Prevent data breakpoints on __per_cpu_offset
x86/apic: Add extra serialization for non-serializing MSRs
tools/power/turbostat: Fallback to an MSR read for EPB
x86/split_lock: Enable the split lock feature on another Alder Lake CPU
x86/debug: Fix DR6 handling
x86/build: Disable CET instrumentation in the kernel
Pull Kbuild fixes from Masahiro Yamada:
- Use the 'python3' command to invoke python scripts because some
distributions do not provide the 'python' command any more.
- Clean-up and update documents
- Use pkg-config to search libcrypto
- Fix duplicated debug flags
- Ignore some more stubs in scripts/kallsyms.c
* tag 'kbuild-fixes-v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kallsyms: fix nonconverging kallsyms table with lld
kbuild: fix duplicated flags in DEBUG_CFLAGS
scripts/clang-tools: switch explicitly to Python 3
kbuild: remove PYTHON variable
Documentation/llvm: Add a section about supported architectures
Revert "checkpatch: add check for keyword 'boolean' in Kconfig definitions"
scripts: use pkg-config to locate libcrypto
kconfig: mconf: fix HOSTCC call
doc: gcc-plugins: update gcc-plugins.rst
kbuild: simplify GCC_PLUGINS enablement in dummy-tools/gcc
Documentation/Kbuild: Remove references to gcc-plugin.sh
scripts: switch explicitly to Python 3
Pull cifs fixes from Steve French:
"Three small smb3 fixes for stable"
* tag '5.11-rc6-smb3' of git://git.samba.org/sfrench/cifs-2.6:
cifs: report error instead of invalid when revalidating a dentry fails
smb3: fix crediting for compounding when only one request in flight
smb3: Fix out-of-bounds bug in SMB2_negotiate()
Pull RISC-V fixes from Palmer Dabbelt:
"A handful of fixes for this week:
- A fix to avoid evalating the VA twice in virt_addr_valid, which
fixes some WARNs under DEBUG_VIRTUAL.
- Two fixes related to STRICT_KERNEL_RWX: one that fixes some
permissions when strict is disabled, and one to fix some alignment
issues when strict is enabled.
- A fix to disallow the selection of MAXPHYSMEM_2GB on RV32, which
isn't valid any more but may still show up in some oldconfigs.
We still have the HiFive Unleashed ethernet phy reset regression, so
there will likely be something coming next week"
* tag 'riscv-for-linus-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Define MAXPHYSMEM_1GB only for RV32
riscv: Align on L1_CACHE_BYTES when STRICT_KERNEL_RWX
RISC-V: Fix .init section permission update
riscv: virt_addr_valid must check the address belongs to linear mapping
Pull powerpc fixes from Michael Ellerman:
- A fix for a change we made to __kernel_sigtramp_rt64() which confused
glibc's backtrace logic, and also changed the semantics of that
symbol, which was arguably an ABI break.
- A fix for a stack overwrite in our VSX instruction emulation.
- A couple of fixes for the Makefile logic in the new C VDSO.
Thanks to Masahiro Yamada, Naveen N. Rao, Raoni Fassina Firmino, and
Ravi Bangoria.
* tag 'powerpc-5.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64/signal: Fix regression in __kernel_sigtramp_rt64() semantics
powerpc/vdso64: remove meaningless vgettimeofday.o build rule
powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o
powerpc/sstep: Fix array out of bound warning
Pull ARM fixes from Russell King:
- Fix latent bug with DC21285 (Footbridge PCI bridge) configuration
accessors that affects GCC >= 4.9.2
- Fix misplaced tegra_uart_config in decompressor
- Ensure signal page contents are initialised
- Fix kexec oops
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: kexec: fix oops after TLB are invalidated
ARM: ensure the signal page contains defined contents
ARM: 9043/1: tegra: Fix misplaced tegra_uart_config in decompressor
ARM: footbridge: fix dc21285 PCI configuration accessors
The verdict returned from ena_xdp_execute() is used to determine the
fate of the RX buffer's page. In case of XDP Redirect/TX error the
verdict should be set to XDP_ABORTED, otherwise the page won't be freed.
Fixes: a318c70ad1 ("net: ena: introduce XDP redirect implementation")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In vsock_stream_connect(), a thread will enter schedule_timeout().
While being scheduled out, another thread can enter vsock_stream_connect()
as well and set vsk->transport to NULL. In case a signal was sent, the
first thread can leave schedule_timeout() and vsock_transport_cancel_pkt()
will be called right after. Inside vsock_transport_cancel_pkt(), a null
dereference will happen on transport->cancel_pkt.
Fixes: c0cfa2d8a7 ("vsock: add multi-transports support")
Signed-off-by: Norbert Slusarek <nslusarek@gmx.net>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/trinity-c2d6cede-bfb1-44e2-85af-1fbc7f541715-1612535117028@3c-app-gmx-bap12
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull USB fixes from Greg KH:
"Here are some small, last-minute, USB driver fixes for 5.11-rc7
They all resolve issues reported, or are a few new device ids for some
drivers. They include:
- new device ids for some usb-serial drivers
- xhci fixes for a variety of reported problems
- dwc3 driver bugfixes
- dwc2 driver bugfixes
- usblp driver bugfix
- thunderbolt bugfix
- few other tiny fixes
All have been in linux-next with no reported issues"
* tag 'usb-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: dwc2: Fix endpoint direction check in ep_from_windex
usb: dwc3: fix clock issue during resume in OTG mode
xhci: fix bounce buffer usage for non-sg list case
usb: host: xhci: mvebu: make USB 3.0 PHY optional for Armada 3720
usb: xhci-mtk: break loop when find the endpoint to drop
usb: xhci-mtk: skip dropping bandwidth of unchecked endpoints
usb: renesas_usbhs: Clear pipe running flag in usbhs_pkt_pop()
USB: gadget: legacy: fix an error code in eth_bind()
thunderbolt: Fix possible NULL pointer dereference in tb_acpi_add_link()
USB: serial: option: Adding support for Cinterion MV31
usb: xhci-mtk: fix unreleased bandwidth data
usb: gadget: aspeed: add missing of_node_put
USB: usblp: don't call usb_set_interface if there's a single alt
USB: serial: cp210x: add pid/vid for WSDA-200-USB
USB: serial: cp210x: add new VID/PID for supporting Teraoka AD2000
Pull SCSI fix from James Bottomley:
"One fix in drivers (lpfc) that stops an oops on resource exhaustion"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: Fix EEH encountering oops with NVMe traffic
Pull block fixes from Jens Axboe:
"A few small regression fixes:
- NVMe pull request from Christoph:
- more quirks for buggy devices (Thorsten Leemhuis, Claus Stovgaard)
- update the email address for Keith (Keith Busch)
- fix an out of bounds access in nvmet-tcp (Sagi Grimberg)
- Regression fix for BFQ shallow depth calculations introduced in
this merge window (Lin)"
* tag 'block-5.11-2021-02-05' of git://git.kernel.dk/linux-block:
nvmet-tcp: fix out-of-bounds access when receiving multiple h2cdata PDUs
bfq-iosched: Revert "bfq: Fix computation of shallow depth"
update the email address for Keith Bush
nvme-pci: ignore the subsysem NQN on Phison E16
nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs
Pull io_uring fixes from Jens Axboe:
"Two small fixes that should go into 5.11:
- task_work resource drop fix (Pavel)
- identity COW fix (Xiaoguang)"
* tag 'io_uring-5.11-2021-02-05' of git://git.kernel.dk/linux-block:
io_uring: drop mm/files between task_work_submit
io_uring: don't modify identity's files uncess identity is cowed
Normally we clear the failover_pending flag when processing the reset.
But if we are unable to schedule a failover reset we must clear the
flag ourselves. We could fail to schedule the reset if we are in PROBING
state (eg: when booting via kexec) or because we could not allocate memory.
Thanks to Cris Forno for helping isolate the problem and for testing.
Fixes: 1d85049374 ("powerpc/vnic: Extend "failover pending" window")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Tested-by: Cristobal Forno <cforno12@linux.ibm.com>
Link: https://lore.kernel.org/r/20210203050802.680772-1-sukadev@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kalle Valo says:
====================
wireless-drivers fixes for v5.11
Third, and most likely the last, set of fixes for v5.11. Two very
small fixes.
ath9k
* fix build regression related to LEDS_CLASS
mt76
* fix a memory leak
* tag 'wireless-drivers-2021-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers:
mt76: dma: fix a possible memory leak in mt76_add_fragment()
ath9k: fix build error with LEDS_CLASS=m
====================
Link: https://lore.kernel.org/r/20210205163434.14D94C433ED@smtp.codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With CONFIG_X86_5LEVEL, CONFIG_UBSAN and CONFIG_UBSAN_UNSIGNED_OVERFLOW
enabled, clang fails the build with
x86_64-linux-ld: arch/x86/platform/efi/efi_64.o: in function `efi_sync_low_kernel_mappings':
efi_64.c:(.text+0x22c): undefined reference to `__compiletime_assert_354'
which happens due to -fsanitize=unsigned-integer-overflow being enabled:
-fsanitize=unsigned-integer-overflow: Unsigned integer overflow, where
the result of an unsigned integer computation cannot be represented
in its type. Unlike signed integer overflow, this is not undefined
behavior, but it is often unintentional. This sanitizer does not check
for lossy implicit conversions performed before such a computation
(see -fsanitize=implicit-conversion).
and that fires when the (intentional) EFI_VA_START/END defines overflow
an unsigned long, leading to the assertion expressions not getting
optimized away (on GCC they do)...
However, those checks are superfluous: the runtime services mapping
code already makes sure the ranges don't overshoot EFI_VA_END as the
EFI mapping range is hardcoded. On each runtime services call, it is
switched to the EFI-specific PGD and even if mappings manage to escape
that last PGD, this won't remain unnoticed for long.
So rip them out.
See https://github.com/ClangBuiltLinux/linux/issues/256 for more info.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: http://lkml.kernel.org/r/20210107223424.4135538-1-arnd@kernel.org
This fix the bad fault reported by KUAP when io_wqe_worker access userspace.
Bug: Read fault blocked by KUAP!
WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0
NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
..........
Call Trace:
[c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable)
[c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
[c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
--- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
..........
NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
interrupt: 300
[c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
[c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
[c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120
[c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
[c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
[c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
[c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
[c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800
[c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
[c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
[c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c
The kernel consider thread AMR value for kernel thread to be
AMR_KUAP_BLOCKED. Hence access to userspace is denied. This
of course not correct and we should allow userspace access after
kthread_use_mm(). To be precise, kthread_use_mm() should inherit the
AMR value of the operating address space. But, the AMR value is
thread-specific and we inherit the address space and not thread
access restrictions. Because of this ignore AMR value when accessing
userspace via kernel thread.
current_thread_amr/iamr() are updated, because we use them in the
below stack.
....
[ 530.710838] CPU: 13 PID: 5587 Comm: io_wqe_worker-0 Tainted: G D 5.11.0-rc6+ #3
....
NIP [c0000000000aa0c8] pkey_access_permitted+0x28/0x90
LR [c0000000004b9278] gup_pte_range+0x188/0x420
--- interrupt: 700
[c00000001c4ef3f0] [0000000000000000] 0x0 (unreliable)
[c00000001c4ef490] [c0000000004bd39c] gup_pgd_range+0x3ac/0xa20
[c00000001c4ef5a0] [c0000000004bdd44] internal_get_user_pages_fast+0x334/0x410
[c00000001c4ef620] [c000000000852028] iov_iter_get_pages+0xf8/0x5c0
[c00000001c4ef6a0] [c0000000007da44c] bio_iov_iter_get_pages+0xec/0x700
[c00000001c4ef770] [c0000000006a325c] iomap_dio_bio_actor+0x2ac/0x4f0
[c00000001c4ef810] [c00000000069cd94] iomap_apply+0x2b4/0x740
[c00000001c4ef920] [c0000000006a38b8] __iomap_dio_rw+0x238/0x5c0
[c00000001c4ef9d0] [c0000000006a3c60] iomap_dio_rw+0x20/0x80
[c00000001c4ef9f0] [c008000001927a30] xfs_file_dio_aio_write+0x1f8/0x650 [xfs]
[c00000001c4efa60] [c0080000019284dc] xfs_file_write_iter+0xc4/0x130 [xfs]
[c00000001c4efa90] [c000000000669984] io_write+0x104/0x4b0
[c00000001c4efbb0] [c00000000066cea4] io_issue_sqe+0x3d4/0xf50
[c00000001c4efc60] [c000000000670200] io_wq_submit_work+0xb0/0x2f0
[c00000001c4efcb0] [c000000000674268] io_worker_handle_work+0x248/0x4a0
[c00000001c4efd30] [c0000000006746e8] io_wqe_worker+0x228/0x2a0
[c00000001c4efda0] [c00000000019d994] kthread+0x1b4/0x1c0
Fixes: 48a8ab4eeb ("powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.")
Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210206025634.521979-1-aneesh.kumar@linux.ibm.com
Camelia Groza says:
====================
dpaa_eth: A050385 erratum workaround fixes under XDP
This series addresses issue with the current workaround for the A050385
erratum in XDP scenarios.
The first patch makes sure the xdp_frame structure stored at the start of
new buffers isn't overwritten.
The second patch decreases the required data alignment value, thus
preventing unnecessary realignments.
The third patch moves the data in place to align it, instead of allocating
a new buffer for each frame that breaks the alignment rules, thus bringing
an up to 40% performance increase. With this change, the impact of the
erratum workaround is reduced in many cases to a single digit decrease, and
to lower double digits in single flow scenarios.
Changes in v2:
- guarantee enough tailroom is available for the shared_info in 1/3
====================
Link: https://lore.kernel.org/r/cover.1612456902.git.camelia.groza@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The XDP frame's headroom might be large enough to accommodate the
xdpf backpointer as well as shifting the data to an aligned address.
Try this first before resorting to allocating a new buffer and copying
the data.
Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The 256 byte data alignment is required for preventing DMA transaction
splits when crossing 4K page boundaries. Since XDP deals only with page
sized buffers or less, this restriction isn't needed. Instead, the data
only needs to be aligned to 64 bytes to prevent DMA transaction splits.
These lessened restrictions can increase performance by widening the pool
of permitted data alignments and preventing unnecessary realignments.
Fixes: ae680bcbd0 ("dpaa_eth: implement the A050385 erratum workaround for XDP")
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the erratum workaround is triggered, the newly created xdp_frame
structure is stored at the start of the newly allocated buffer. Avoid
the structure from being overwritten by explicitly reserving enough
space in the buffer for storing it.
Account for the fact that the structure's size might increase in time by
aligning the headroom to DPAA_FD_DATA_ALIGNMENT bytes, thus guaranteeing
the data's alignment.
Fixes: ae680bcbd0 ("dpaa_eth: implement the A050385 erratum workaround for XDP")
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit c80794323e ("net: Fix packet reordering caused by GRO and
listified RX cooperation") had the unfortunate effect of adding
latencies in common workloads.
Before the patch, GRO packets were immediately passed to
upper stacks.
After the patch, we can accumulate quite a lot of GRO
packets (depdending on NAPI budget).
My fix is counting in napi->rx_count number of segments
instead of number of logical packets.
Fixes: c80794323e ("net: Fix packet reordering caused by GRO and listified RX cooperation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Bisected-by: John Sperbeck <jsperbeck@google.com>
Tested-by: Jian Yang <jianyang@google.com>
Cc: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Link: https://lore.kernel.org/r/20210204213146.4192368-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Kerrisk suggested that, from an API perspective, it is a bad
idea to share the PR_SYS_DISPATCH_ defines between the prctl operation
and the selector variable.
Therefore, define two new constants to be used by SUD's selector variable
and update the corresponding documentation and test cases.
While this changes the API syscall user dispatch has never been part of a
Linux release, it will show up for the first time in 5.11.
Suggested-by: Michael Kerrisk (man-pages) <mtk.manpages@gmail.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210205184321.2062251-1-krisman@collabora.com
Commit 2991552447 ("entry: Drop usage of TIF flags in the generic syscall
code") introduced a bug on architectures using the generic syscall entry
code, in which processes stopped by PTRACE_SYSCALL do not trap on syscall
return after receiving a TIF_SINGLESTEP.
The reason is that the meaning of TIF_SINGLESTEP flag is overloaded to
cause the trap after a system call is executed, but since the above commit,
the syscall call handler only checks for the SYSCALL_WORK flags on the exit
work.
Split the meaning of TIF_SINGLESTEP such that it only means single-step
mode, and create a new type of SYSCALL_WORK to request a trap immediately
after a syscall in single-step mode. In the current implementation, the
SYSCALL_WORK flag shadows the TIF_SINGLESTEP flag for simplicity.
Update x86 to flip this bit when a tracer enables single stepping.
Fixes: 2991552447 ("entry: Drop usage of TIF flags in the generic syscall code")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kyle Huey <me@kylehuey.com>
Link: https://lore.kernel.org/r/87h7mtc9pr.fsf_-_@collabora.com
This reverts commit 1abdfe706a.
This change is broken and not solving any problem it claims to solve.
Robin reported that cpumask_local_spread() now returns any cpu out of
cpu_possible_mask in case that NOHZ_FULL is disabled (runtime or compile
time). It can also return any offline or not-present CPU in the
housekeeping mask. Before that it was returning a CPU out of
online_cpu_mask.
While the function is racy against CPU hotplug if the caller does not
protect against it, the actual use cases are not caring much about it as
they use it mostly as hint for:
- the user space affinity hint which is unused by the kernel
- memory node selection which is just suboptimal
- network queue affinity which might fail but is handled gracefully
But the occasional fail vs. hotplug is very different from returning
anything from possible_cpu_mask which can have a large amount of offline
CPUs obviously.
The changelog of the commit claims:
"The current implementation of cpumask_local_spread() does not respect
the isolated CPUs, i.e., even if a CPU has been isolated for Real-Time
task, it will return it to the caller for pinning of its IRQ
threads. Having these unwanted IRQ threads on an isolated CPU adds up
to a latency overhead."
The only correct part of this changelog is:
"The current implementation of cpumask_local_spread() does not respect
the isolated CPUs."
Everything else is just disjunct from reality.
Reported-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nitesh Narayan Lal <nitesh@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: abelits@marvell.com
Cc: davem@davemloft.net
Link: https://lore.kernel.org/r/87y2g26tnt.fsf@nanos.tec.linutronix.de
Merge misc fixes from Andrew Morton:
"18 patches.
Subsystems affected by this patch series: mm (hugetlb, compaction,
vmalloc, shmem, memblock, pagecache, kasan, and hugetlb), mailmap,
gcov, ubsan, and MAINTAINERS"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
MAINTAINERS/.mailmap: use my @kernel.org address
mm: hugetlb: fix missing put_page in gather_surplus_pages()
ubsan: implement __ubsan_handle_alignment_assumption
kasan: make addr_has_metadata() return true for valid addresses
kasan: add explicit preconditions to kasan_report()
mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
mailmap: add entries for Manivannan Sadhasivam
mailmap: fix name/email for Viresh Kumar
memblock: do not start bottom-up allocations with kernel_end
mm: thp: fix MADV_REMOVE deadlock on shmem THP
init/gcov: allow CONFIG_CONSTRUCTORS on UML to fix module gcov
mm/vmalloc: separate put pages and flush VM flags
mm, compaction: move high_pfn to the for loop scope
mm: migrate: do not migrate HugeTLB page whose refcount is one
mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
mm: hugetlb: fix a race between isolating and freeing page
mm: hugetlb: fix a race between freeing and dissolving the page
mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
The file /sys/kernel/tracing/events/enable is used to enable all events by
echoing in "1", or disabling all events when echoing in "0". To know if all
events are enabled, disabled, or some are enabled but not all of them,
cating the file should show either "1" (all enabled), "0" (all disabled), or
"X" (some enabled but not all of them). This works the same as the "enable"
files in the individule system directories (like tracing/events/sched/enable).
But when all events are enabled, the top level "enable" file shows "X". The
reason is that its checking the "ftrace" events, which are special events
that only exist for their format files. These include the format for the
function tracer events, that are enabled when the function tracer is
enabled, but not by the "enable" file. The check includes these events,
which will always be disabled, and even though all true events are enabled,
the top level "enable" file will show "X" instead of "1".
To fix this, have the check test the event's flags to see if it has the
"IGNORE_ENABLE" flag set, and if so, not test it.
Cc: stable@vger.kernel.org
Fixes: 553552ce17 ("tracing: Combine event filter_active and enable into single flags field")
Reported-by: "Yordan Karadzhov (VMware)" <y.karadz@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Since commit a85a6c86c2 ("driver core: platform: Clarify that IRQ 0
is invalid"), having a linux-irq with number 0 will trigger a WARN()
when calling platform_get_irq*() to retrieve that linux-irq.
Since [devm_]irq_alloc_desc allocs a single irq and since irq 0 is not used
on some systems, it can return 0, triggering that WARN(). This happens
e.g. on Intel Bay Trail and Cherry Trail devices using the LPE audio engine
for HDMI audio:
0 is an invalid IRQ number
WARNING: CPU: 3 PID: 472 at drivers/base/platform.c:238 platform_get_irq_optional+0x108/0x180
Modules linked in: snd_hdmi_lpe_audio(+) ...
Call Trace:
platform_get_irq+0x17/0x30
hdmi_lpe_audio_probe+0x4a/0x6c0 [snd_hdmi_lpe_audio]
---[ end trace ceece38854223a0b ]---
Change the 'from' parameter passed to __[devm_]irq_alloc_descs() by the
[devm_]irq_alloc_desc macros from 0 to 1, so that these macros will no
longer return 0.
Fixes: a85a6c86c2 ("driver core: platform: Clarify that IRQ 0 is invalid")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201221185647.226146-1-hdegoede@redhat.com
Assuming
- //HOST/a is mounted on /mnt
- //HOST/b is mounted on /mnt/b
On a slow connection, running 'df' and killing it while it's
processing /mnt/b can make cifs_get_inode_info() returns -ERESTARTSYS.
This triggers the following chain of events:
=> the dentry revalidation fail
=> dentry is put and released
=> superblock associated with the dentry is put
=> /mnt/b is unmounted
This patch makes cifs_d_revalidate() return the error instead of 0
(invalid) when cifs_revalidate_dentry() fails, except for ENOENT (file
deleted) and ESTALE (file recreated).
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Suggested-by: Shyam Prasad N <nspmangalore@gmail.com>
Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com>
CC: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
local_db_save() is called at the start of exc_debug_kernel(), reads DR7 and
disables breakpoints to prevent recursion.
When running in a guest (X86_FEATURE_HYPERVISOR), local_db_save() reads the
per-cpu variable cpu_dr7 to check whether a breakpoint is active or not
before it accesses DR7.
A data breakpoint on cpu_dr7 therefore results in infinite #DB recursion.
Disallow data breakpoints on cpu_dr7 to prevent that.
Fixes: 84b6a3491567a("x86/entry: Optimize local_db_save() for virt")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210204152708.21308-2-jiangshanlai@gmail.com
When FSGSBASE is enabled, paranoid_entry() fetches the per-CPU GSBASE value
via __per_cpu_offset or pcpu_unit_offsets.
When a data breakpoint is set on __per_cpu_offset[cpu] (read-write
operation), the specific CPU will be stuck in an infinite #DB loop.
RCU will try to send an NMI to the specific CPU, but it is not working
either since NMI also relies on paranoid_entry(). Which means it's
undebuggable.
Fixes: eaad981291ee3("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210204152708.21308-1-jiangshanlai@gmail.com
Patch series "kasan: Fix metadata detection for KASAN_HW_TAGS", v5.
With the introduction of KASAN_HW_TAGS, kasan_report() currently assumes
that every location in memory has valid metadata associated. This is
due to the fact that addr_has_metadata() returns always true.
As a consequence of this, an invalid address (e.g. NULL pointer
address) passed to kasan_report() when KASAN_HW_TAGS is enabled, leads
to a kernel panic.
Example below, based on arm64:
BUG: KASAN: invalid-access in 0x0
Read at addr 0000000000000000 by task swapper/0/1
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Mem abort info:
ESR = 0x96000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
...
Call trace:
mte_get_mem_tag+0x24/0x40
kasan_report+0x1a4/0x410
alsa_sound_last_init+0x8c/0xa4
do_one_initcall+0x50/0x1b0
kernel_init_freeable+0x1d4/0x23c
kernel_init+0x14/0x118
ret_from_fork+0x10/0x34
Code: d65f03c0 9000f021 f9428021 b6cfff61 (d9600000)
---[ end trace 377c8bb45bdd3a1a ]---
hrtimer: interrupt took 48694256 ns
note: swapper/0[1] exited with preempt_count 1
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
SMP: stopping secondary CPUs
Kernel Offset: 0x35abaf140000 from 0xffff800010000000
PHYS_OFFSET: 0x40000000
CPU features: 0x0a7e0152,61c0a030
Memory Limit: none
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
This series fixes the behavior of addr_has_metadata() that now returns
true only when the address is valid.
This patch (of 2):
With the introduction of KASAN_HW_TAGS, kasan_report() accesses the
metadata only when addr_has_metadata() succeeds.
Add a comment to make sure that the preconditions to the function are
explicitly clarified.
Link: https://lkml.kernel.org/r/20210126134409.47894-1-vincenzo.frascino@arm.com
Link: https://lkml.kernel.org/r/20210126134409.47894-2-vincenzo.frascino@arm.com
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 3fea5a499d ("mm: memcontrol: convert page cache to a new
mem_cgroup_charge() API") introduced a bug in __add_to_page_cache_locked()
causing the following splat:
page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
pages's memcg:ffff8889a4116000
------------[ cut here ]------------
kernel BUG at mm/memcontrol.c:2924!
invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 35 PID: 12345 Comm: cat Tainted: G S W I 5.11.0-rc4-debug+ #1
Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
RIP: commit_charge+0xf4/0x130
Call Trace:
mem_cgroup_charge+0x175/0x770
__add_to_page_cache_locked+0x712/0xad0
add_to_page_cache_lru+0xc5/0x1f0
cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
__fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
__nfs_readpages_from_fscache+0x16d/0x630 [nfs]
nfs_readpages+0x24e/0x540 [nfs]
read_pages+0x5b1/0xc40
page_cache_ra_unbounded+0x460/0x750
generic_file_buffered_read_get_pages+0x290/0x1710
generic_file_buffered_read+0x2a9/0xc30
nfs_file_read+0x13f/0x230 [nfs]
new_sync_read+0x3af/0x610
vfs_read+0x339/0x4b0
ksys_read+0xf1/0x1c0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Before that commit, there was a try_charge() and commit_charge() in
__add_to_page_cache_locked(). These two separated charge functions were
replaced by a single mem_cgroup_charge(). However, it forgot to add a
matching mem_cgroup_uncharge() when the xarray insertion failed with the
page released back to the pool.
Fix this by adding a mem_cgroup_uncharge() call when insertion error
happens.
Link: https://lkml.kernel.org/r/20210125042441.20030-1-longman@redhat.com
Fixes: 3fea5a499d ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <smuchun@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With kaslr the kernel image is placed at a random place, so starting the
bottom-up allocation with the kernel_end can result in an allocation
failure and a warning like this one:
hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
------------[ cut here ]------------
memblock: bottom-up allocation failed, memory hotremove may be affected
WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
RIP: 0010:memblock_find_in_range_node+0x178/0x25a
Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c
RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff
RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046
RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb
R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000
R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000
FS: 0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0
Call Trace:
memblock_alloc_range_nid+0x8d/0x11e
cma_declare_contiguous_nid+0x2c4/0x38c
hugetlb_cma_reserve+0xdc/0x128
flush_tlb_one_kernel+0xc/0x20
native_set_fixmap+0x82/0xd0
flat_get_apic_id+0x5/0x10
register_lapic_address+0x8e/0x97
setup_arch+0x8a5/0xc3f
start_kernel+0x66/0x547
load_ucode_bsp+0x4c/0xcd
secondary_startup_64_no_verify+0xb0/0xbb
random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
---[ end trace f151227d0b39be70 ]---
At the same time, the kernel image is protected with memblock_reserve(),
so we can just start searching at PAGE_SIZE. In this case the bottom-up
allocation has the same chances to success as a top-down allocation, so
there is no reason to fallback in the case of a failure. All together it
simplifies the logic.
Link: https://lkml.kernel.org/r/20201217201214.3414100-2-guro@fb.com
Fixes: 8fabc62323 ("powerpc: Ensure that swiotlb buffer is allocated from low memory")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Wonhyuk Yang <vvghjk1234@gmail.com>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergey reported deadlock between kswapd correctly doing its usual
lock_page(page) followed by down_read(page->mapping->i_mmap_rwsem), and
madvise(MADV_REMOVE) on an madvise(MADV_HUGEPAGE) area doing
down_write(page->mapping->i_mmap_rwsem) followed by lock_page(page).
This happened when shmem_fallocate(punch hole)'s unmap_mapping_range()
reaches zap_pmd_range()'s call to __split_huge_pmd(). The same deadlock
could occur when partially truncating a mapped huge tmpfs file, or using
fallocate(FALLOC_FL_PUNCH_HOLE) on it.
__split_huge_pmd()'s page lock was added in 5.8, to make sure that any
concurrent use of reuse_swap_page() (holding page lock) could not catch
the anon THP's mapcounts and swapcounts while they were being split.
Fortunately, reuse_swap_page() is never applied to a shmem or file THP
(not even by khugepaged, which checks PageSwapCache before calling), and
anonymous THPs are never created in shmem or file areas: so that
__split_huge_pmd()'s page lock can only be necessary for anonymous THPs,
on which there is no risk of deadlock with i_mmap_rwsem.
Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2101161409470.2022@eggly.anvils
Fixes: c444eb564f ("mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On ARCH=um, loading a module doesn't result in its constructors getting
called, which breaks module gcov since the debugfs files are never
registered. On the other hand, in-kernel constructors have already been
called by the dynamic linker, so we can't call them again.
Get out of this conundrum by allowing CONFIG_CONSTRUCTORS to be
selected, but avoiding the in-kernel constructor calls.
Also remove the "if !UML" from GCOV selecting CONSTRUCTORS now, since we
really do want CONSTRUCTORS, just not kernel binary ones.
Link: https://lkml.kernel.org/r/20210120172041.c246a2cac2fb.I1358f584b76f1898373adfed77f4462c8705b736@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In fast_isolate_freepages, high_pfn will be used if a prefered one (ie
PFN >= low_fn) not found.
But the high_pfn is not reset before searching an free area, so when it
was used as freepage, it may from another free area searched before. As
a result move_freelist_head(freelist, freepage) will have unexpected
behavior (eg corrupt the MOVABLE freelist)
Unable to handle kernel paging request at virtual address dead000000000200
Mem abort info:
ESR = 0x96000044
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000044
CM = 0, WnR = 1
[dead000000000200] address between user and kernel address ranges
-000|list_cut_before(inline)
-000|move_freelist_head(inline)
-000|fast_isolate_freepages(inline)
-000|isolate_freepages(inline)
-000|compaction_alloc(?, ?)
-001|unmap_and_move(inline)
-001|migrate_pages([NSD:0xFFFFFF80088CBBD0] from = 0xFFFFFF80088CBD88, [NSD:0xFFFFFF80088CBBC8] get_new_p
-002|__read_once_size(inline)
-002|static_key_count(inline)
-002|static_key_false(inline)
-002|trace_mm_compaction_migratepages(inline)
-002|compact_zone(?, [NSD:0xFFFFFF80088CBCB0] capc = 0x0)
-003|kcompactd_do_work(inline)
-003|kcompactd([X19] p = 0xFFFFFF93227FBC40)
-004|kthread([X20] _create = 0xFFFFFFE1AFB26380)
-005|ret_from_fork(asm)
The issue was reported on an smart phone product with 6GB ram and 3GB
zram as swap device.
This patch fixes the issue by reset high_pfn before searching each free
area, which ensure freepage and freelist match when call
move_freelist_head in fast_isolate_freepages().
Link: http://lkml.kernel.org/r/20190118175136.31341-12-mgorman@techsingularity.net
Link: https://lkml.kernel.org/r/20210112094720.1238444-1-wu-yan@tcl.com
Fixes: 5a811889de ("mm, compaction: use free lists to quickly locate a migration target")
Signed-off-by: Rokudo Yan <wu-yan@tcl.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a race between isolate_huge_page() and __free_huge_page().
CPU0: CPU1:
if (PageHuge(page))
put_page(page)
__free_huge_page(page)
spin_lock(&hugetlb_lock)
update_and_free_page(page)
set_compound_page_dtor(page,
NULL_COMPOUND_DTOR)
spin_unlock(&hugetlb_lock)
isolate_huge_page(page)
// trigger BUG_ON
VM_BUG_ON_PAGE(!PageHead(page), page)
spin_lock(&hugetlb_lock)
page_huge_active(page)
// trigger BUG_ON
VM_BUG_ON_PAGE(!PageHuge(page), page)
spin_unlock(&hugetlb_lock)
When we isolate a HugeTLB page on CPU0. Meanwhile, we free it to the
buddy allocator on CPU1. Then, we can trigger a BUG_ON on CPU0, because
it is already freed to the buddy allocator.
Link: https://lkml.kernel.org/r/20210115124942.46403-5-songmuchun@bytedance.com
Fixes: c8721bbbdd ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a race condition between __free_huge_page()
and dissolve_free_huge_page().
CPU0: CPU1:
// page_count(page) == 1
put_page(page)
__free_huge_page(page)
dissolve_free_huge_page(page)
spin_lock(&hugetlb_lock)
// PageHuge(page) && !page_count(page)
update_and_free_page(page)
// page is freed to the buddy
spin_unlock(&hugetlb_lock)
spin_lock(&hugetlb_lock)
clear_page_huge_active(page)
enqueue_huge_page(page)
// It is wrong, the page is already freed
spin_unlock(&hugetlb_lock)
The race window is between put_page() and dissolve_free_huge_page().
We should make sure that the page is already on the free list when it is
dissolved.
As a result __free_huge_page would corrupt page(s) already in the buddy
allocator.
Link: https://lkml.kernel.org/r/20210115124942.46403-4-songmuchun@bytedance.com
Fixes: c8721bbbdd ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull nfsd fix from Chuck Lever:
"Fix non-page-aligned NFS READs"
* tag 'nfsd-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
SUNRPC: Fix NFS READs that start at non-page-aligned offsets
Pull KVM fixes from Paolo Bonzini:
"x86 has lots of small bugfixes, mostly one liners. It's quite late in
5.11-rc but none of them are related to this merge window; it's just
bugs coming in at the wrong time.
Of note among the others is "KVM: x86: Allow guests to see
MSR_IA32_TSX_CTRL even if tsx=off" that fixes a live migration failure
seen on distros that hadn't switched to tsx=off right away.
ARM:
- Avoid clobbering extra registers on initialisation"
[ Sean Christopherson notes that commit 943dea8af2 ("KVM: x86: Update
emulator context mode if SYSENTER xfers to 64-bit mode") should have
had authorship credited to Jonny Barker, not to him. - Linus ]
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Set so called 'reserved CR3 bits in LM mask' at vCPU reset
KVM: x86/mmu: Fix TDP MMU zap collapsible SPTEs
KVM: x86: cleanup CR3 reserved bits checks
KVM: SVM: Treat SVM as unsupported when running as an SEV guest
KVM: x86: Update emulator context mode if SYSENTER xfers to 64-bit mode
KVM: x86: Supplement __cr4_reserved_bits() with X86_FEATURE_PCID check
KVM/x86: assign hva with the right value to vm_munmap the pages
KVM: x86: Allow guests to see MSR_IA32_TSX_CTRL even if tsx=off
Fix unsynchronized access to sev members through svm_register_enc_region
KVM: Documentation: Fix documentation for nested.
KVM: x86: fix CPUID entries returned by KVM_GET_CPUID2 ioctl
KVM: arm64: Don't clobber x4 in __do_hyp_init
Pull IOMMU fix from Joerg Roedel:
"Fix a possible NULL-ptr dereference in dev_iommu_priv_get() which is
too easy to accidentially trigger from IOMMU drivers.
In the current case the AMD IOMMU driver triggered it on some machines
in the IO-page-fault path, so fix it once and for all"
* tag 'iommu-fixes-v5.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Check dev->iommu in dev_iommu_priv_get() before dereferencing it
Pull vdpa fix from Michael Tsirkin:
"A bugfix in the mlx driver I got at the last minute"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa/mlx5: Restore the hardware used index after change map
Pull drm fixes from Dave Airlie:
"Fixes for rc7, bit bigger than I'd like at this stage, but most of the
i915 stuff and some amdgpu is destined for staging and I'd rather not
hold it up, the i915 changes also pulled in a few precusor code
movement patches to make things cleaner, but nothing seems that
horrible, and I've checked over all of it.
Otherwise there is a nouveau dma-api warning regression, and a ttm
page allocation warning fix, and some fixes for a bridge chip,
ttm:
- fix huge page warning regression
i915:
- Skip vswing programming for TBT
- Power up combo PHY lanes for HDMI
- Fix double YUV range correction on HDR planes
- Fix the MST PBN divider calculation
- Fix LTTPR vswing/pre-emp setting in non-transparent mode
- Move the breadcrumb to the signaler if completed upon cancel
- Close race between enable_breadcrumbs and cancel_breadcrumbs
- Drop lru bumping on display unpinning
amdgpu:
- Fix retry in gem create
- Vangogh fixes
- Fix for display from shared buffers
- Various display fixes
amdkfd:
- Fix regression in buffer free
nouveau:
- fix DMA API warning regression
drm/bridge/lontium-lt9611uxc:
- EDID fixes
- Don't handle hotplug events in IRQ handler"
* tag 'drm-fixes-2021-02-05-1' of git://anongit.freedesktop.org/drm/drm: (29 commits)
drm/nouveau: fix dma syncing warning with debugging on.
drm/amd/display: Decrement refcount of dc_sink before reassignment
drm/amd/display: Free atomic state after drm_atomic_commit
drm/amd/display: Fix dc_sink kref count in emulated_link_detect
drm/amd/display: Release DSC before acquiring
drm/amd/display: Revert "Fix EDID parsing after resume from suspend"
drm/amd/display: Add more Clock Sources to DCN2.1
drm/amd/display: reuse current context instead of recreating one
drm/amd/display: Fix DPCD translation for LTTPR AUX_RD_INTERVAL
drm/amdgpu: enable freesync for A+A configs
drm/amd/pm: fill in the data member of v2 gpu metrics table for vangogh
drm/amdgpu/gfx10: update CGTS_TCC_DISABLE and CGTS_USER_TCC_DISABLE register offsets for VGH
drm/amdkfd: fix null pointer panic while free buffer in kfd
drm/amdgpu: fix the issue that retry constantly once the buffer is oversize
drm/i915/dp: Fix LTTPR vswing/pre-emp setting in non-transparent mode
drm/i915/dp: Move intel_dp_set_signal_levels() to intel_dp_link_training.c
drm/i915: Fix the MST PBN divider calculation
drm/dp/mst: Export drm_dp_get_vc_payload_bw()
drm/i915/gem: Drop lru bumping on display unpinning
drm/i915/gt: Close race between enable_breadcrumbs and cancel_breadcrumbs
...
The bug fixed by commit e3fab2f3de ("ntp: Fix RTC synchronization on
32-bit platforms") revealed an underlying issue: RTC synchronization may
happen anytime, even while the system is partially suspended.
On systems where the RTC is connected to an I2C bus, the I2C bus controller
may already or still be suspended, triggering a WARNING during suspend or
resume from s2ram:
WARNING: CPU: 0 PID: 124 at drivers/i2c/i2c-core.h:54 __i2c_transfer+0x634/0x680
i2c i2c-6: Transfer while suspended
[...]
Workqueue: events_power_efficient sync_hw_clock
[...]
(__i2c_transfer)
(i2c_transfer)
(regmap_i2c_read)
...
(da9063_rtc_set_time)
(rtc_set_time)
(sync_hw_clock)
(process_one_work)
Fix this race condition by using the freezable instead of the normal
power-efficient workqueue.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20210125143039.1051912-1-geert+renesas@glider.be
When a change of memory map occurs, the hardware resources are destroyed
and then re-created again with the new memory map. In such case, we need
to restore the hardware available and used indices. The driver failed to
restore the used index which is added here.
Also, since the driver also fails to reset the available and used
indices upon device reset, fix this here to avoid regression caused by
the fact that used index may not be zero upon device reset.
Fixes: 1a86b377aa ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210204073618.36336-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Currently we try to guess if a compound request is going to
succeed waiting for credits or not based on the number of
requests in flight. This approach doesn't work correctly
all the time because there may be only one request in
flight which is going to bring multiple credits satisfying
the compound request.
Change the behavior to fail a request only if there are no requests
in flight at all and proceed waiting for credits otherwise.
Cc: <stable@vger.kernel.org> # 5.1+
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The original code put five u32 before a u64 expansion[10] array. Five is
odd, this will cause trouble in the extension of the structure by adding
new features. This patch moves to use u8 for reserved field to avoid
future alignment risk.
Meanwhile, it also clears the memory of struct map_benchmark in tools,
otherwise, if users use old version to run on newer kernel, the random
expansion value will cause side effect on newer kernel.
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Giancarlo Ferrari reports the following oops while trying to use kexec:
Unable to handle kernel paging request at virtual address 80112f38
pgd = fd7ef03e
[80112f38] *pgd=0001141e(bad)
Internal error: Oops: 80d [#1] PREEMPT SMP ARM
...
This is caused by machine_kexec() trying to set the kernel text to be
read/write, so it can poke values into the relocation code before
copying it - and an interrupt occuring which changes the page tables.
The subsequent writes then hit read-only sections that trigger a
data abort resulting in the above oops.
Fix this by copying the relocation code, and then writing the variables
into the destination, thereby avoiding the need to make the kernel text
read/write.
Reported-by: Giancarlo Ferrari <giancarlo.ferrari89@gmail.com>
Tested-by: Giancarlo Ferrari <giancarlo.ferrari89@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Ensure that the signal page contains our poison instruction to increase
the protection against ROP attacks and also contains well defined
contents.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
dwc2_hsotg_process_req_status uses ep_from_windex() to retrieve
the endpoint for the index provided in the wIndex request param.
In a test-case with a rndis gadget running and sending a malformed
packet to it like:
dev.ctrl_transfer(
0x82, # bmRequestType
0x00, # bRequest
0x0000, # wValue
0x0001, # wIndex
0x00 # wLength
)
it is possible to cause a crash:
[ 217.533022] dwc2 ff300000.usb: dwc2_hsotg_process_req_status: USB_REQ_GET_STATUS
[ 217.559003] Unable to handle kernel read from unreadable memory at virtual address 0000000000000088
...
[ 218.313189] Call trace:
[ 218.330217] ep_from_windex+0x3c/0x54
[ 218.348565] usb_gadget_giveback_request+0x10/0x20
[ 218.368056] dwc2_hsotg_complete_request+0x144/0x184
This happens because ep_from_windex wants to compare the endpoint
direction even if index_to_ep() didn't return an endpoint due to
the direction not matching.
The fix is easy insofar that the actual direction check is already
happening when calling index_to_ep() which will return NULL if there
is no endpoint for the targeted direction, so the offending check
can go away completely.
Fixes: c6f5c050e2 ("usb: dwc2: gadget: add bi-directional endpoint support")
Cc: stable@vger.kernel.org
Reported-by: Gerhard Klostermeier <gerhard.klostermeier@syss.de>
Signed-off-by: Heiko Stuebner <heiko.stuebner@theobroma-systems.com>
Link: https://lore.kernel.org/r/20210127103919.58215-1-heiko@sntech.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit fe8abf332b ("usb: dwc3: support clocks and resets for DWC3
core") introduced clock support and a new function named
dwc3_core_init_for_resume() which enables the clock before calling
dwc3_core_init() during resume as clocks get disabled during suspend.
Unfortunately in this commit the DWC3_GCTL_PRTCAP_OTG case was forgotten
and therefore during resume, a platform could call dwc3_core_init()
without re-enabling the clocks first, preventing to resume properly.
So update the resume path to call dwc3_core_init_for_resume() as it
should.
Fixes: fe8abf332b ("usb: dwc3: support clocks and resets for DWC3 core")
Cc: stable@vger.kernel.org
Signed-off-by: Gary Bisson <gary.bisson@boundarydevices.com>
Link: https://lore.kernel.org/r/20210125161934.527820-1-gary.bisson@boundarydevices.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ARM randconfig builds with lld sometimes show a build failure
from kallsyms:
Inconsistent kallsyms data
Try make KALLSYMS_EXTRA_PASS=1 as a workaround
The problem is the veneers/thunks getting added by the linker extend
the symbol table, which in turn leads to more veneers being needed,
so it may take a few extra iterations to converge.
This bug has been fixed multiple times before, but comes back every time
a new symbol name is used. lld uses a different set of identifiers from
ld.bfd, so the additional ones need to be added as well.
I looked through the sources and found that arm64 and mips define similar
prefixes, so I'm adding those as well, aside from the ones I observed. I'm
not sure about powerpc64, which seems to already be handled through a
section match, but if it comes back, the "__long_branch_" and "__plt_"
prefixes would have to get added as well.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Sedat Dilek noticed duplicated flags in DEBUG_CFLAGS when building
deb-pkg with CONFIG_DEBUG_INFO. For example, 'make CC=clang bindeb-pkg'
reproduces the issue.
Kbuild recurses to the top Makefile for some targets such as package
builds.
With commit 121c5d08d5 ("kbuild: Only add -fno-var-tracking-assignments
for old GCC versions") applied, DEBUG_CFLAGS is now reset only when
CONFIG_CC_IS_GCC=y.
Fix it to reset DEBUG_CFLAGS all the time.
Fixes: 121c5d08d5 ("kbuild: Only add -fno-var-tracking-assignments for old GCC versions")
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Mark Wielaard <mark@klomp.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Fix combination of --reap and --update in xt_recent that triggers
UAF, from Jozsef Kadlecsik.
2) Fix current year in nft_meta selftest, from Fabian Frederick.
3) Fix possible UAF in the netns destroy path of nftables.
4) Fix incorrect checksum calculation when mangling ports in flowtable,
from Sven Auhagen.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
netfilter: flowtable: fix tcp and udp header checksum update
netfilter: nftables: fix possible UAF over chains from packet path in netns
selftests: netfilter: fix current year
netfilter: xt_recent: Fix attempt to update deleted entry
====================
Link: https://lore.kernel.org/r/20210205001727.2125-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael tried to enable Advanced Error Reporting through the ENETC's
Root Complex Event Collector, and the system started spitting out single
bit correctable ECC errors coming from the ENETC interfaces:
pcieport 0000:00:1f.0: AER: Multiple Corrected error received: 0000:00:00.0
fsl_enetc 0000:00:00.0: PCIe Bus Error: severity=Corrected, type=Transaction Layer, (Receiver ID)
fsl_enetc 0000:00:00.0: device [1957:e100] error status/mask=00004000/00000000
fsl_enetc 0000:00:00.0: [14] CorrIntErr
fsl_enetc 0000:00:00.1: PCIe Bus Error: severity=Corrected, type=Transaction Layer, (Receiver ID)
fsl_enetc 0000:00:00.1: device [1957:e100] error status/mask=00004000/00000000
fsl_enetc 0000:00:00.1: [14] CorrIntErr
Further investigating the port correctable memory error detect register
(PCMEDR) shows that these AER errors have an associated SOURCE_ID of 6
(RFS/RSS):
$ devmem 0x1f8010e10 32
0xC0000006
$ devmem 0x1f8050e10 32
0xC0000006
Discussion with the hardware design engineers reveals that on LS1028A,
the hardware does not do initialization of that RFS/RSS memory, and that
software should clear/initialize the entire table before starting to
operate. That comes as a bit of a surprise, since the driver does not do
initialization of the RFS memory. Also, the initialization of the
Receive Side Scaling is done only partially.
Even though the entire ENETC IP has a single shared flow steering
memory, the flow steering service should returns matches only for TCAM
entries that are within the range of the Station Interface that is doing
the search. Therefore, it should be sufficient for a Station Interface
to initialize all of its own entries in order to avoid any ECC errors,
and only the Station Interfaces in use should need initialization.
There are Physical Station Interfaces associated with PCIe PFs and
Virtual Station Interfaces associated with PCIe VFs. We let the PF
driver initialize the entire port's memory, which includes the RFS
entries which are going to be used by the VF.
Reported-by: Michael Walle <michael@walle.cc>
Fixes: d4fd0404c1 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Link: https://lore.kernel.org/r/20210204134511.2640309-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In gsi_channel_setup(), we check to see if the configuration data
contains any information about channels that are not supported by
the hardware. If one is found, we abort the setup process, but
the error code (ret) is not set in this case. Fix this bug.
Fixes: 650d160382 ("soc: qcom: ipa: the generic software interface")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20210204010655.15619-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
At the end of rxrpc_release_call(), rxrpc_cleanup_ring() is called to clear
the Rx/Tx skbuff ring, but this doesn't lock the ring whilst it's accessing
it. Unfortunately, rxrpc_resend() might be trying to retransmit a packet
concurrently with this - and whilst it does lock the ring, this isn't
protection against rxrpc_cleanup_call().
Fix this by removing the call to rxrpc_cleanup_ring() from
rxrpc_release_call(). rxrpc_cleanup_ring() will be called again anyway
from rxrpc_cleanup_call(). The earlier call is just an optimisation to
recycle skbuffs more quickly.
Alternative solutions include rxrpc_release_call() could try to cancel the
work item or wait for it to complete or rxrpc_cleanup_ring() could lock
when accessing the ring (which would require a bh lock).
This can produce a report like the following:
BUG: KASAN: use-after-free in rxrpc_send_data_packet+0x19b4/0x1e70 net/rxrpc/output.c:372
Read of size 4 at addr ffff888011606e04 by task kworker/0:0/5
...
Workqueue: krxrpcd rxrpc_process_call
Call Trace:
...
kasan_report.cold+0x79/0xd5 mm/kasan/report.c:413
rxrpc_send_data_packet+0x19b4/0x1e70 net/rxrpc/output.c:372
rxrpc_resend net/rxrpc/call_event.c:266 [inline]
rxrpc_process_call+0x1634/0x1f60 net/rxrpc/call_event.c:412
process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275
...
Allocated by task 2318:
...
sock_alloc_send_pskb+0x793/0x920 net/core/sock.c:2348
rxrpc_send_data+0xb51/0x2bf0 net/rxrpc/sendmsg.c:358
rxrpc_do_sendmsg+0xc03/0x1350 net/rxrpc/sendmsg.c:744
rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:560
...
Freed by task 2318:
...
kfree_skb+0x140/0x3f0 net/core/skbuff.c:704
rxrpc_free_skb+0x11d/0x150 net/rxrpc/skbuff.c:78
rxrpc_cleanup_ring net/rxrpc/call_object.c:485 [inline]
rxrpc_release_call+0x5dd/0x860 net/rxrpc/call_object.c:552
rxrpc_release_calls_on_socket+0x21c/0x300 net/rxrpc/call_object.c:579
rxrpc_release_sock net/rxrpc/af_rxrpc.c:885 [inline]
rxrpc_release+0x263/0x5a0 net/rxrpc/af_rxrpc.c:916
__sock_release+0xcd/0x280 net/socket.c:597
...
The buggy address belongs to the object at ffff888011606dc0
which belongs to the cache skbuff_head_cache of size 232
Fixes: 248f219cb8 ("rxrpc: Rewrite the data and ack handling code")
Reported-by: syzbot+174de899852504e4a74a@syzkaller.appspotmail.com
Reported-by: syzbot+3d1c772efafd3c38d007@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
Link: https://lore.kernel.org/r/161234207610.653119.5287360098400436976.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
drm/i915 fixes for v5.11-rc7:
- Skip vswing programming for TBT
- Power up combo PHY lanes for HDMI
- Fix double YUV range correction on HDR planes
- Fix the MST PBN divider calculation
- Fix LTTPR vswing/pre-emp setting in non-transparent mode
- Move the breadcrumb to the signaler if completed upon cancel
- Close race between enable_breadcrumbs and cancel_breadcrumbs
- Drop lru bumping on display unpinning
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87bld0f36b.fsf@intel.com
Since SQPOLL task can be shared and so task_work entries can be a mix of
them, we need to drop mm and files before trying to issue next request.
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Kiszka reported that the x2apic_wrmsr_fence() function uses a plain
MFENCE while the Intel SDM (10.12.3 MSR Access in x2APIC Mode) calls for
MFENCE; LFENCE.
Short summary: we have special MSRs that have weaker ordering than all
the rest. Add fencing consistent with current SDM recommendations.
This is not known to cause any issues in practice, only in theory.
Longer story below:
The reason the kernel uses a different semantic is that the SDM changed
(roughly in late 2017). The SDM changed because folks at Intel were
auditing all of the recommended fences in the SDM and realized that the
x2apic fences were insufficient.
Why was the pain MFENCE judged insufficient?
WRMSR itself is normally a serializing instruction. No fences are needed
because the instruction itself serializes everything.
But, there are explicit exceptions for this serializing behavior written
into the WRMSR instruction documentation for two classes of MSRs:
IA32_TSC_DEADLINE and the X2APIC MSRs.
Back to x2apic: WRMSR is *not* serializing in this specific case.
But why is MFENCE insufficient? MFENCE makes writes visible, but
only affects load/store instructions. WRMSR is unfortunately not a
load/store instruction and is unaffected by MFENCE. This means that a
non-serializing WRMSR could be reordered by the CPU to execute before
the writes made visible by the MFENCE have even occurred in the first
place.
This means that an x2apic IPI could theoretically be triggered before
there is any (visible) data to process.
Does this affect anything in practice? I honestly don't know. It seems
quite possible that by the time an interrupt gets to consume the (not
yet) MFENCE'd data, it has become visible, mostly by accident.
To be safe, add the SDM-recommended fences for all x2apic WRMSRs.
This also leaves open the question of the _other_ weakly-ordered WRMSR:
MSR_IA32_TSC_DEADLINE. While it has the same ordering architecture as
the x2APIC MSRs, it seems substantially less likely to be a problem in
practice. While writes to the in-memory Local Vector Table (LVT) might
theoretically be reordered with respect to a weakly-ordered WRMSR like
TSC_DEADLINE, the SDM has this to say:
In x2APIC mode, the WRMSR instruction is used to write to the LVT
entry. The processor ensures the ordering of this write and any
subsequent WRMSR to the deadline; no fencing is required.
But, that might still leave xAPIC exposed. The safest thing to do for
now is to add the extra, recommended LFENCE.
[ bp: Massage commit message, fix typos, drop accidentally added
newline to tools/arch/x86/include/asm/barrier.h. ]
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200305174708.F77040DD@viggo.jf.intel.com
Pull ACPI fix from Rafael Wysocki:
"Address recent regression causing battery devices to be never bound to
a driver on some systems (Hans de Goede)"
* tag 'acpi-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: scan: Fix battery devices sometimes never binding
Pull overlayfs fixes from Miklos Szeredi:
- Fix capability conversion and minor overlayfs bugs that are related
to the unprivileged overlay mounts introduced in this cycle.
- Fix two recent (v5.10) and one old (v4.10) bug.
- Clean up security xattr copy-up (related to a SELinux regression).
* tag 'ovl-fixes-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: implement volatile-specific fsync error behaviour
ovl: skip getxattr of security labels
ovl: fix dentry leak in ovl_get_redirect
ovl: avoid deadlock on directory ioctl
cap: fix conversions on getxattr
ovl: perform vfs_getxattr() with mounter creds
ovl: add warning on user_ns mismatch
Set cr3_lm_rsvd_bits, which is effectively an invalid GPA mask, at vCPU
reset. The reserved bits check needs to be done even if userspace never
configures the guest's CPUID model.
Cc: stable@vger.kernel.org
Fixes: 0107973a80 ("KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull NVMe fixes from Christoph.
* 'nvme-5.11' of git://git.infradead.org/nvme:
nvmet-tcp: fix out-of-bounds access when receiving multiple h2cdata PDUs
update the email address for Keith Bush
nvme-pci: ignore the subsysem NQN on Phison E16
nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs
There is a bug in the TDP MMU function to zap SPTEs which could be
replaced with a larger mapping which prevents the function from doing
anything. Fix this by correctly zapping the last level SPTEs.
Cc: stable@vger.kernel.org
Fixes: 1488199856 ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU")
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210202185734.1680553-11-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When updating the tcp or udp header checksum on port nat the function
inet_proto_csum_replace2 with the last parameter pseudohdr as true.
This leads to an error in the case that GRO is used and packets are
split up in GSO. The tcp or udp checksum of all packets is incorrect.
The error is probably masked due to the fact the most network driver
implement tcp/udp checksum offloading. It also only happens when GRO is
applied and not on single packets.
The error is most visible when using a pppoe connection which is not
triggering the tcp/udp checksum offload.
Fixes: ac2a66665e ("netfilter: add generic flow table infrastructure")
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Although hooks are released via call_rcu(), chain and rule objects are
immediately released while packets are still walking over these bits.
This patch adds the .pre_exit callback which is invoked before
synchronize_rcu() in the netns framework to stay safe.
Remove a comment which is not valid anymore since the core does not use
synchronize_net() anymore since 8c873e2199 ("netfilter: core: free
hooks with call_rcu").
Suggested-by: Florian Westphal <fw@strlen.de>
Fixes: df05ef874b ("netfilter: nf_tables: release objects on netns destruction")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
use date %Y instead of %G to read current year
Problem appeared when running lkp-tests on 01/01/2021
Fixes: 48d072c4e8 ("selftests: netfilter: add time counter check")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When both --reap and --update flag are specified, there's a code
path at which the entry to be updated is reaped beforehand,
which then leads to kernel crash. Reap only entries which won't be
updated.
Fixes kernel bugzilla #207773.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=207773
Reported-by: Reindl Harald <h.reindl@thelounge.net>
Fixes: 0079c5aee3 ("netfilter: xt_recent: add an entry reaper")
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since I wrote the below patch if you run a debug kernel you can a
dma debug warning like:
nouveau 0000:1f:00.0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000016e012000] [size=4096 bytes]
The old nouveau code wasn't consolidate the pages like the ttm code,
but the dma-debug expects the sync code to give it the same base/range
pairs as the allocator.
Fix the nouveau sync code to consolidate pages before calling the
sync code.
Fixes: bd549d35b4 ("nouveau: use ttm populate mapping functions. (v2)")
Reported-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/417588/
Pull UML fixes from Richard Weinberger:
- Make sure to set a default console, otherwise ttynull is selected
- Revert initial ARCH_HAS_SET_MEMORY support, this needs more work
- Fix a regression due to ubd refactoring
- Various small fixes
* tag 'for-linus-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: time: fix initialization in time-travel mode
um: fix os_idle_sleep() to not hang
Revert "um: support some of ARCH_HAS_SET_MEMORY"
Revert "um: allocate a guard page to helper threads"
um: virtio: free vu_dev only with the contained struct device
um: kmsg_dumper: always dump when not tty console
um: stdio_console: Make preferred console
um: return error from ioremap()
um: ubd: fix command line handling of ubd
Pull arm64 fixes from Catalin Marinas:
"Fix the arm64 linear map range detection for tagged addresses and
replace the bitwise operations with subtract (virt_addr_valid(),
__is_lm_address(), __lm_to_phys())"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Use simpler arithmetics for the linear map macros
arm64: Do not pass tagged addresses to __is_lm_address()
Pull tracing fixes from Steven Rostedt:
- Initialize tracing-graph-pause at task creation, not start of
function tracing, to avoid corrupting the pause counter.
- Set "pause-on-trace" for latency tracers as that option breaks their
output (regression).
- Fix the wrong error return for setting kretprobes on future modules
(before they are loaded).
- Fix re-registering the same kretprobe.
- Add missing value check for added RCU variable reload.
* tag 'trace-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracepoint: Fix race between tracing and removing tracepoint
kretprobe: Avoid re-registration of the same kretprobe earlier
tracing/kprobe: Fix to support kretprobe events on unloaded modules
tracing: Use pause-on-trace with the latency tracers
fgraph: Initialize tracing_graph_pause at task creation
Pull ARM SoC fixes from Arnd Bergmann:
"The code fixes in this round are all for the Texas Instruments OMAP
platform, addressing several regressions related to the ti-sysc
interconnect changes that was merged in linux-5.11 and one recently
introduced RCU usage warning.
Tero Kristo updates his maintainer file entries as he is changing to a
new employer.
The other changes are for devicetree files across eight different
platforms:
TI OMAP:
- multiple gpio related one-line fixes
Allwinner/sunxi:
- ARM: dts: sun7i: a20: bananapro: Fix ethernet phy-mode
- soc: sunxi: mbus: Remove DE2 display engine compatibles
NXP lpc32xx:
- ARM: dts: lpc32xx: Revert set default clock rate of HCLK PLL
STMicroelectronics stm32
- multiple minor fixes for DHCOM/DHCOR boards
NXP Layerscape:
- Fix DCFG address range on LS1046A SoC
Amlogic meson:
- fix reboot issue on odroid C4
- revert an ethernet change that caused a regression
- meson-g12: Set FL-adj property value
Rockchip:
- multiple minor fixes on 64-bit rockchip machines
Qualcomm:
- Regression fixes for Lenovo Yoga touchpad and for interconnect
configuration
- Boot fixes for 'LPASS' clock configuration on two machines"
* tag 'arm-soc-fixes-v5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (31 commits)
ARM: dts: lpc32xx: Revert set default clock rate of HCLK PLL
ARM: dts: sun7i: a20: bananapro: Fix ethernet phy-mode
arm64: dts: ls1046a: fix dcfg address range
soc: sunxi: mbus: Remove DE2 display engine compatibles
arm64: dts: meson: switch TFLASH_VDD_EN pin to open drain on Odroid-C4
Revert "arm64: dts: amlogic: add missing ethernet reset ID"
arm64: dts: rockchip: Disable display for NanoPi R2S
ARM: dts: omap4-droid4: Fix lost keypad slide interrupts for droid4
arm64: dts: rockchip: remove interrupt-names property from rk3399 vdec node
drivers: bus: simple-pm-bus: Fix compatibility with simple-bus for auxdata
ARM: OMAP2+: Fix booting for am335x after moving to simple-pm-bus
ARM: OMAP2+: Fix suspcious RCU usage splats for omap_enter_idle_coupled
ARM: dts: stm32: Fix GPIO hog flags on DHCOM DRC02
ARM: dts: stm32: Fix GPIO hog flags on DHCOM PicoITX
ARM: dts: stm32: Fix GPIO hog names on DHCOM
ARM: dts: stm32: Disable optional TSC2004 on DRC02 board
ARM: dts: stm32: Disable WP on DHCOM uSD slot
ARM: dts: stm32: Connect card-detect signal on DHCOM
ARM: dts: stm32: Fix polarity of the DH DRC02 uSD card detect
arm64: dts: qcom: sdm845: Reserve LPASS clocks in gcc
...
Pull gpio fixes from Bartosz Golaszewski:
"Some more fixes from the GPIO subsystem for this release. This time
it's only core fixes:
- fix a memory leak in error path in gpiolib
- clear debounce period in output mode in the character device code
- remove shadowed variable"
* tag 'gpio-fixes-for-v5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: gpiolib: remove shadowed variable
gpiolib: free device name on error path to fix kmemleak
gpiolib: cdev: clear debounce period if line set to output
Pull x86 platform driver fixes from Hans de Goede:
"Two last minute small but important fixes.
The hp-wmi change fixes an issue which is actively being hit by users:
https://bugzilla.redhat.com/show_bug.cgi?id=1918255https://gitlab.gnome.org/GNOME/gnome-shell/-/issues/3564
And the dell-wmi-sysman patch fixes a bug in the new dell-wmi-sysman
driver which causes some systems to hang at boot when the driver
loads"
* tag 'platform-drivers-x86-v5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: dell-wmi-sysman: fix a NULL pointer dereference
platform/x86: hp-wmi: Disable tablet-mode reporting by default
When the host sends multiple h2cdata PDUs, we keep track on
the receive progress and calculate the scatterlist index and
offsets.
The issue is that sg_offset should only be kept for the first
iov entry we map in the iovec as this is the difference between
our cursor and the sg entry offset itself.
In addition, the sg index was calculated wrong because we should
not round up when dividing the command byte offset with PAG_SIZE.
Fixes: 872d26a391 ("nvmet-tcp: add NVMe over TCP target driver")
Reported-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com>
Tested-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The commit 0d00449c7a ("x86: Replace ist_enter() with nmi_enter()")
converted do_int3 handler to be "NMI-like".
That made old if (in_nmi()) check abort execution of bpf programs
attached to kprobe when kprobe is firing via int3
(For example when kprobe is placed in the middle of the function).
Remove the check to restore user visible behavior.
Fixes: 0d00449c7a ("x86: Replace ist_enter() with nmi_enter()")
Reported-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/bpf/20210203070636.70926-1-alexei.starovoitov@gmail.com
xhci driver may in some special cases need to copy small amounts
of payload data to a bounce buffer in order to meet the boundary
and alignment restrictions set by the xHCI specification.
In the majority of these cases the data is in a sg list, and
driver incorrectly assumed data is always in urb->sg when using
the bounce buffer.
If data instead is contiguous, and in urb->transfer_buffer, we may still
need to bounce buffer a small part if data starts very close (less than
packet size) to a 64k boundary.
Check if sg list is used before copying data to/from it.
Fixes: f9c589e142 ("xhci: TD-fragment, align the unsplittable case with a bounce buffer")
Cc: stable@vger.kernel.org
Reported-by: Andreas Hartmann <andihartmann@01019freenet.de>
Tested-by: Andreas Hartmann <andihartmann@01019freenet.de>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20210203113702.436762-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When running xrandr to change resolution of DP, the kmemleak as below
can be observed:
unreferenced object 0xffff00080a351000 (size 256):
comm "Xorg", pid 248, jiffies 4294899614 (age 19.960s)
hex dump (first 32 bytes):
98 a0 bc 01 08 00 ff ff 01 00 00 00 00 00 00 00 ................
ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000e0bd0f69>] kmemleak_alloc+0x30/0x40
[<00000000cde2f318>] kmem_cache_alloc+0x3d4/0x588
[<0000000088ea9bd7>] drm_atomic_helper_setup_commit+0x84/0x5f8
[<000000002290a264>] drm_atomic_helper_commit+0x58/0x388
[<00000000f6ea78c3>] drm_atomic_commit+0x4c/0x60
[<00000000c8e0725e>] drm_atomic_connector_commit_dpms+0xe8/0x110
[<0000000020ade187>] drm_mode_obj_set_property_ioctl+0x1b0/0x450
[<00000000918206d6>] drm_connector_property_set_ioctl+0x3c/0x68
[<000000008d51e7a5>] drm_ioctl_kernel+0xc4/0x118
[<000000002a819b75>] drm_ioctl+0x214/0x448
[<000000008ca4e588>] __arm64_sys_ioctl+0xa8/0xf0
[<0000000034e15a35>] el0_svc_common.constprop.0+0x74/0x190
[<000000001b93d916>] do_el0_svc+0x24/0x90
[<00000000ce9230e0>] el0_svc+0x14/0x20
[<00000000e3607d82>] el0_sync_handler+0xb0/0xb8
[<000000003e79c15f>] el0_sync+0x174/0x180
This is because there is a scenario that a drm_crtc_commit commit is
allocated but not freed. The drm subsystem require/release references
to a CRTC commit by calling drm_crtc_commit_get/put, and when
drm_crtc_commit_put find that commit.ref.refcount is zero, it will
call __drm_crtc_commit_free to free this CRTC commit. Among these
drm_crtc_commit_get/put pairs, there is a drm_crtc_commit_get in
drm_atomic_helper_setup_commit as below:
...
new_crtc_state->event->base.completion = &commit->flip_done;
new_crtc_state->event->base.completion_release = release_crtc_commit;
drm_crtc_commit_get(commit);
...
This reference to the CRTC commit should be released at the function
release_crtc_commit by calling e->completion_release(e->completion) in
drm_send_event_locked. So we need to call drm_send_event_locked at
two places: handling vblank event in the irq handler and the crtc disable
helper. But in zynqmp_disp_crtc_atomic_disable, it only marks the flip
is done and not call drm_crtc_commit_put. This result that the refcount
of this commit is always non-zero and this commit will never be freed.
Since the function drm_crtc_send_vblank_event has operations both sending
a flip_done signal and releasing reference to the CRTC commit, let's use
it instead.
Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210202064121.173362-1-quanyang.wang@windriver.com
Older ATF does not provide SMC call for USB 3.0 phy power on functionality
and therefore initialization of xhci-hcd is failing when older version of
ATF is used. In this case phy_power_on() function returns -EOPNOTSUPP.
[ 3.108467] mvebu-a3700-comphy d0018300.phy: unsupported SMC call, try updating your firmware
[ 3.117250] phy phy-d0018300.phy.0: phy poweron failed --> -95
[ 3.123465] xhci-hcd: probe of d0058000.usb failed with error -95
This patch introduces a new plat_setup callback for xhci platform drivers
which is called prior calling usb_add_hcd() function. This function at its
beginning skips PHY init if hcd->skip_phy_initialization is set.
Current init_quirk callback for xhci platform drivers is called from
xhci_plat_setup() function which is called after chip reset completes.
It happens in the middle of the usb_add_hcd() function and therefore this
callback cannot be used for setting if PHY init should be skipped or not.
For Armada 3720 this patch introduce a new xhci_mvebu_a3700_plat_setup()
function configured as a xhci platform plat_setup callback. This new
function calls phy_power_on() and in case it returns -EOPNOTSUPP then
XHCI_SKIP_PHY_INIT quirk is set to instruct xhci-plat to skip PHY
initialization.
This patch fixes above failure by ignoring 'not supported' error in
xhci-hcd driver. In this case it is expected that phy is already power on.
It fixes initialization of xhci-hcd on Espressobin boards where is older
Marvell's Arm Trusted Firmware without SMC call for USB 3.0 phy power.
This is regression introduced in commit bd3d25b073 ("arm64: dts: marvell:
armada-37xx: link USB hosts with their PHYs") where USB 3.0 phy was defined
and therefore xhci-hcd on Espressobin with older ATF started failing.
Fixes: bd3d25b073 ("arm64: dts: marvell: armada-37xx: link USB hosts with their PHYs")
Cc: <stable@vger.kernel.org> # 5.1+: ea17a0f153: phy: marvell: comphy: Convert internal SMCC firmware return codes to errno
Cc: <stable@vger.kernel.org> # 5.1+: f768e71891: usb: host: xhci-plat: add priv quirk for skip PHY initialization
Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> # On R-Car
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> # xhci-plat
Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Pali Rohár <pali@kernel.org>
Link: https://lore.kernel.org/r/20210201150803.7305-1-pali@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
One fix for a phy-mode ethernet issue, and one to fix the display output on
SoCs with the Display Engine 2
* tag 'sunxi-fixes-for-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
ARM: dts: sun7i: a20: bananapro: Fix ethernet phy-mode
soc: sunxi: mbus: Remove DE2 display engine compatibles
Link: https://lore.kernel.org/r/f8298059-f9ca-43b4-9e29-35bc0e0c9b15.lettre@localhost
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This reverts commit c17e9377aa.
The lpc32xx clock driver is not able to actually change the PLL rate as
this would require reparenting ARM_CLK, DDRAM_CLK, PERIPH_CLK to SYSCLK,
then stop the PLL, update the register, restart the PLL and wait for the
PLL to lock and finally reparent ARM_CLK, DDRAM_CLK, PERIPH_CLK to HCLK
PLL.
Currently, the HCLK driver simply updates the registers but this has no
real effect and all the clock rate calculation end up being wrong. This is
especially annoying for the peripheral (e.g. UARTs, I2C, SPI).
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Tested-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Link: https://lore.kernel.org/r/20210203090320.GA3760268@piout.net'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
BPi Pro needs TX and RX delay for Gbit to work reliable and avoid high
packet loss rates. The realtek phy driver overrides the settings of the
pull ups for the delays, so fix this for BananaPro.
Fix the phy-mode description to correctly reflect this so that the
implementation doesn't reconfigure the delays incorrectly. This
happened with commit bbc4d71d63 ("net: phy: realtek: fix rtl8211e
rx/tx delay config").
Fixes: 10662a33dc ("ARM: dts: sun7i: Add dts file for Bananapro board")
Signed-off-by: Hermann Lauer <Hermann.Lauer@uni-heidelberg.de>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20210128111842.GA11919@lemon.iwr.uni-heidelberg.de
If not in long mode, the low bits of CR3 are reserved but not enforced to
be zero, so remove those checks. If in long mode, however, the MBZ bits
extend down to the highest physical address bit of the guest, excluding
the encryption bit.
Make the checks consistent with the above, and match them between
nested_vmcb_checks and KVM_SET_SREGS.
Cc: stable@vger.kernel.org
Fixes: 761e416934 ("KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests")
Fixes: a780a3ea62 ("KVM: X86: Fix reserved bits check for MOV to CR3")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Don't let KVM load when running as an SEV guest, regardless of what
CPUID says. Memory is encrypted with a key that is not accessible to
the host (L0), thus it's impossible for L0 to emulate SVM, e.g. it'll
see garbage when reading the VMCB.
Technically, KVM could decrypt all memory that needs to be accessible to
the L0 and use shadow paging so that L0 does not need to shadow NPT, but
exposing such information to L0 largely defeats the purpose of running as
an SEV guest. This can always be revisited if someone comes up with a
use case for running VMs inside SEV guests.
Note, VMLOAD, VMRUN, etc... will also #GP on GPAs with C-bit set, i.e. KVM
is doomed even if the SEV guest is debuggable and the hypervisor is willing
to decrypt the VMCB. This may or may not be fixed on CPUs that have the
SVME_ADDR_CHK fix.
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210202212017.2486595-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This reverts commit 6d4d273588.
bfq.limit_depth passes word_depths[] as shallow_depth down to sbitmap core
sbitmap_get_shallow, which uses just the number to limit the scan depth of
each bitmap word, formula:
scan_percentage_for_each_word = shallow_depth / (1 << sbimap->shift) * 100%
That means the comments's percentiles 50%, 75%, 18%, 37% of bfq are correct.
But after commit patch 'bfq: Fix computation of shallow depth', we use
sbitmap.depth instead, as a example in following case:
sbitmap.depth = 256, map_nr = 4, shift = 6; sbitmap_word.depth = 64.
The resulsts of computed bfqd->word_depths[] are {128, 192, 48, 96}, and
three of the numbers exceed core dirver's 'sbitmap_word.depth=64' limit
nothing.
Signed-off-by: Lin Feng <linf@wangsu.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
MAXPHYSMEM_1GB option was added for RV32 because RV32 only supports 1GB
of maximum physical memory. This lead to few compilation errors reported
by kernel test robot which created the following configuration combination
which are not useful but can be configured.
1. MAXPHYSMEM_1GB & RV64
2, MAXPHYSMEM_2GB & RV32
Fix this by restricting MAXPHYSMEM_1GB for RV32 and MAXPHYSMEM_2GB only for
RV64.
Fixes: e557793799 ("RISC-V: Fix maximum allowed phsyical memory for RV32")
Cc: stable@vger.kernel.org
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Allows the sections to be aligned on smaller boundaries and
therefore results in a smaller kernel image size.
Signed-off-by: Sebastien Van Cauwenberghe <svancau@gmail.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
virt_addr_valid macro checks that a virtual address is valid, ie that
the address belongs to the linear mapping and that the corresponding
physical page exists.
Add the missing check that ensures the virtual address belongs to the
linear mapping, otherwise __virt_to_phys, when compiled with
CONFIG_DEBUG_VIRTUAL enabled, raises a WARN that is interpreted as a
kernel bug by syzbot.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
For the same reason as commit 51839e29cb ("scripts: switch explicitly
to Python 3"), switch some more scripts, which I tested and confirmed
working on Python 3.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nathan Chancellor <nathan@kernel.org>
[why]
An old dc_sink state is causing a memory leak because it is missing a
dc_sink_release before a new dc_sink is assigned back to
aconnector->dc_sink.
[how]
Decrement the dc_sink refcount before reassigning it to a new dc_sink.
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Acked-by: Anson Jacob <Anson.Jacob@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
drm_atomic_commit was changed so that the caller must free their
drm_atomic_state reference on successes.
[how]
Add drm_atomic_commit_put after drm_atomic_commit call in
dm_force_atomic_commit.
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Roman Li <Roman.Li@amd.com>
Acked-by: Anson Jacob <Anson.Jacob@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
Need to unassign DSC from pipes that are not using it
so other pipes can acquire it. That is needed for
asic's that have unmatching number of DSC engines from
the number of pipes.
[how]
Before acquiring dsc to stream resources, first remove it.
Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
Reviewed-by: Eryk Brol <Eryk.Brol@amd.com>
Acked-by: Anson Jacob <Anson.Jacob@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Currently we discard the current context and recreate it. The current
context is what is applied to the HW so we should be re-using this
rather than creating a new context.
Recreating the context can lead to mismatch between new context and the
current context
For example: gsl groups get changed when we create a new context this
can cause issues in a multi display config (with flip immediate) because
we don't align the existing gsl groups in the new and current context.
If we reuse the current context the gsl group assignment stays the same.
[How]
Instead of discarding the current context, we instead just copy the
current state and add/remove planes and streams.
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Anson Jacob <Anson.Jacob@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some newer APUs can scanout directly from GTT, that saves us from
allocating another bounce buffer in VRAM and enables freesync in such
configurations.
Without this patch creating a framebuffer from the imported BO will
fail and userspace will fall back to a copy.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For Vangogh:
The offset of the CGTS_TCC_DISABLE is 0x5006 by calculation.
The offset of the CGTS_USER_TCC_DISABLE is 0x5007 by calculation.
Signed-off-by: chen gong <curry.gong@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull clang-format update from Miguel Ojeda:
"Update with the latest for_each macro list"
* tag 'clang-format-for-linux-v5.11-rc7' of git://github.com/ojeda/linux:
clang-format: Update with the latest for_each macro list
Pull dma-mapping fix from Christoph Hellwig:
"Fix a kernel crash in the new dma-mapping benchmark test (Barry Song)"
* tag 'dma-mapping-5.11-1' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: benchmark: fix kernel crash when dma_map_single fails
Pull vdpa fix from Michael Tsirkin:
"A single mlx bugfix"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa/mlx5: Fix memory key MTT population
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.11-rc7, including fixes from bpf and mac80211
trees.
Current release - regressions:
- ip_tunnel: fix mtu calculation
- mlx5: fix function calculation for page trees
Previous releases - regressions:
- vsock: fix the race conditions in multi-transport support
- neighbour: prevent a dead entry from updating gc_list
- dsa: mv88e6xxx: override existent unicast portvec in port_fdb_add
Previous releases - always broken:
- bpf, cgroup: two copy_{from,to}_user() warn_on_once splats for BPF
cgroup getsockopt infra when user space is trying to race against
optlen, from Loris Reiff.
- bpf: add missing fput() in BPF inode storage map update helper
- udp: ipv4: manipulate network header of NATed UDP GRO fraglist
- mac80211: fix station rate table updates on assoc
- r8169: work around RTL8125 UDP HW bug
- igc: report speed and duplex as unknown when device is runtime
suspended
- rxrpc: fix deadlock around release of dst cached on udp tunnel"
* tag 'net-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits)
net: hsr: align sup_multicast_addr in struct hsr_priv to u16 boundary
net: ipa: fix two format specifier errors
net: ipa: use the right accessor in ipa_endpoint_status_skip()
net: ipa: be explicit about endianness
net: ipa: add a missing __iomem attribute
net: ipa: pass correct dma_handle to dma_free_coherent()
r8169: fix WoL on shutdown if CONFIG_DEBUG_SHIRQ is set
net/rds: restrict iovecs length for RDS_CMSG_RDMA_ARGS
net: mvpp2: TCAM entry enable should be written after SRAM data
net: lapb: Copy the skb before sending a packet
net/mlx5e: Release skb in case of failure in tc update skb
net/mlx5e: Update max_opened_tc also when channels are closed
net/mlx5: Fix leak upon failure of rule creation
net/mlx5: Fix function calculation for page trees
docs: networking: swap words in icmp_errors_use_inbound_ifaddr doc
udp: ipv4: manipulate network header of NATed UDP GRO fraglist
net: ip_tunnel: fix mtu calculation
vsock: fix the race conditions in multi-transport support
net: sched: replaced invalid qdisc tree flush helper in qdisc_replace
ibmvnic: device remove has higher precedence over reset
...
Set the emulator context to PROT64 if SYSENTER transitions from 32-bit
userspace (compat mode) to a 64-bit kernel, otherwise the RIP update at
the end of x86_emulate_insn() will incorrectly truncate the new RIP.
Note, this bug is mostly limited to running an Intel virtual CPU model on
an AMD physical CPU, as other combinations of virtual and physical CPUs
do not trigger full emulation. On Intel CPUs, SYSENTER in compatibility
mode is legal, and unconditionally transitions to 64-bit mode. On AMD
CPUs, SYSENTER is illegal in compatibility mode and #UDs. If the vCPU is
AMD, KVM injects a #UD on SYSENTER in compat mode. If the pCPU is Intel,
SYSENTER will execute natively and not trigger #UD->VM-Exit (ignoring
guest TLB shenanigans).
Fixes: fede8076aa ("KVM: x86: handle wrap around 32-bit address space")
Cc: stable@vger.kernel.org
Signed-off-by: Jonny Barker <jonny@jonnybarker.com>
[sean: wrote changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210202165546.2390296-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Saeed Mahameed says:
====================
mlx5 fixes 2021-02-01
Please note the first patch in this series
("Fix function calculation for page trees") is fixing a regression
due to previous fix in net which you didn't include in your previous
rc pr. So I hope this series will make it into your next rc pr,
so mlx5 won't be broken in the next rc.
* tag 'mlx5-fixes-2021-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Release skb in case of failure in tc update skb
net/mlx5e: Update max_opened_tc also when channels are closed
net/mlx5: Fix leak upon failure of rule creation
net/mlx5: Fix function calculation for page trees
====================
Link: https://lore.kernel.org/r/20210202070703.617251-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder says:
====================
net: ipa: a few bug fixes
This series fixes four minor bugs. The first two are things that
sparse points out. All four are very simple and each patch should
explain itself pretty well.
====================
Link: https://lore.kernel.org/r/20210201232609.3524451-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix two format specifiers that used %lu for a size_t in "ipa_mem.c".
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When extracting the destination endpoint ID from the status in
ipa_endpoint_status_skip(), u32_get_bits() is used. This happens to
work, but it's wrong: the structure field is only 8 bits wide
instead of 32.
Fix this by using u8_get_bits() to get the destination endpoint ID.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sparse warns that the assignment of the metadata mask for a QMAP
endpoint in ipa_endpoint_init_hdr_metadata_mask() is a bad
assignment. We know we want the mask value to be big endian, even
though the value we write is in host byte order. Use a __force
tag to indicate we really mean it.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The virt local variable in gsi_channel_state() does not have an
__iomem attribute but should. Fix this.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Amy Parker <enbyamy@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
So far phy_disconnect() is called before free_irq(). If CONFIG_DEBUG_SHIRQ
is set and interrupt is shared, then free_irq() creates an "artificial"
interrupt by calling the interrupt handler. The "link change" flag is set
in the interrupt status register, causing phylib to eventually call
phy_suspend(). Because the net_device is detached from the PHY already,
the PHY driver can't recognize that WoL is configured and powers down the
PHY.
Fixes: f1e911d5d0 ("r8169: add basic phylib support")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/fe732c2c-a473-9088-3974-df83cfbd6efd@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot found WARNING in rds_rdma_extra_size [1] when RDS_CMSG_RDMA_ARGS
control message is passed with user-controlled
0x40001 bytes of args->nr_local, causing order >= MAX_ORDER condition.
The exact value 0x40001 can be checked with UIO_MAXIOV which is 0x400.
So for kcalloc() 0x400 iovecs with sizeof(struct rds_iovec) = 0x10
is the closest limit, with 0x10 leftover.
Same condition is currently done in rds_cmsg_rdma_args().
[1] WARNING: mm/page_alloc.c:5011
[..]
Call Trace:
alloc_pages_current+0x18c/0x2a0 mm/mempolicy.c:2267
alloc_pages include/linux/gfp.h:547 [inline]
kmalloc_order+0x2e/0xb0 mm/slab_common.c:837
kmalloc_order_trace+0x14/0x120 mm/slab_common.c:853
kmalloc_array include/linux/slab.h:592 [inline]
kcalloc include/linux/slab.h:621 [inline]
rds_rdma_extra_size+0xb2/0x3b0 net/rds/rdma.c:568
rds_rm_size net/rds/send.c:928 [inline]
Reported-by: syzbot+1bd2b07f93745fa38425@syzkaller.appspotmail.com
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Link: https://lore.kernel.org/r/20210201203233.1324704-1-snovitoll@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maintainer changes for TI for v5.12 merge window:
- Tero switches to kernel.org address
* tag 'ti-k3-maintainer-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nmenon/linux:
MAINTAINERS: Update my email address and maintainer level status
Link: https://lore.kernel.org/r/20210130131411.afna4wj72r7xscqn@skinny
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
When sending a packet, we will prepend it with an LAPB header.
This modifies the shared parts of a cloned skb, so we should copy the
skb rather than just clone it, before we prepend the header.
In "Documentation/networking/driver.rst" (the 2nd point), it states
that drivers shouldn't modify the shared parts of a cloned skb when
transmitting.
The "dev_queue_xmit_nit" function in "net/core/dev.c", which is called
when an skb is being sent, clones the skb and sents the clone to
AF_PACKET sockets. Because the LAPB drivers first remove a 1-byte
pseudo-header before handing over the skb to us, if we don't copy the
skb before prepending the LAPB header, the first byte of the packets
received on AF_PACKET sockets can be corrupted.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Acked-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/20210201055706.415842-1-xie.he.0141@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Johannes Berg says:
====================
Two fixes:
- station rate tables were not updated correctly
after association, leading to bad configuration
- rtl8723bs (staging) was initializing data incorrectly
after the previous fix and needed to move the init
later
* tag 'mac80211-for-net-2021-02-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211:
staging: rtl8723bs: Move wiphy setup to after reading the regulatory settings from the chip
mac80211: fix station rate table updates on assoc
====================
Link: https://lore.kernel.org/r/20210202143505.37610-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 0a038c1c29 ("drm/vc4: Move LBM creation out of
vc4_plane_mode_set()") changed the LBM allocation logic from first
allocating the LBM memory for the plane to running mode_set,
adding a gap in the LBM, and then running the dlist allocation filling
that gap.
The gap was introduced by incrementing the dlist array index, but was
never checking whether or not we were over the array length, leading
eventually to memory corruptions if we ever crossed this limit.
vc4_dlist_write had that logic though, and was reallocating a larger
dlist array when reaching the end of the buffer. Let's share the logic
between both functions.
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Eric Anholt <eric@anholt.net>
Fixes: 0a038c1c29 ("drm/vc4: Move LBM creation out of vc4_plane_mode_set()")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210129160647.128373-1-maxime@cerno.tech
The DP PHY vswing/pre-emphasis level programming the driver does is
related to the DPTX -> first LTTPR link segment only. Accordingly it
should be only programmed when link training the first LTTPR and kept
as-is when training subsequent LTTPRs and the DPRX. For these latter
PHYs the vs/pe levels will be set in response to writing the
DP_TRAINING_LANEx_SET_PHY_REPEATERy DPCD registers (by an upstream LTTPR
TX PHY snooping this write access of its downstream LTTPR/DPRX RX PHY).
The above is also described in DP Standard v2.0 under 3.6.6.1.
While at it simplify and add the LTTPR that is link trained to the debug
message in intel_dp_set_signal_levels().
Fixes: b30edfd8d0 ("drm/i915: Switch to LTTPR non-transparent mode link training")
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201229172201.4155327-2-imre.deak@intel.com
(cherry picked from commit 67fba3f1c7)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Atm the driver will calculate a wrong MST timeslots/MTP (aka time unit)
value for MST streams if the link parameters (link rate or lane count)
are limited in a way independent of the sink capabilities (reported by
DPCD).
One example of such a limitation is when a MUX between the sink and
source connects only a limited number of lanes to the display and
connects the rest of the lanes to other peripherals (USB).
Another issue is that atm MST core calculates the divider based on the
backwards compatible DPCD (at address 0x0000) vs. the extended
capability info (at address 0x2200). This can result in leaving some
part of the MST BW unused (For instance in case of the WD19TB dock).
Fix the above two issues by calculating the PBN divider value based on
the rate and lane count link parameters that the driver uses for all
other computation.
Bugzilla: https://gitlab.freedesktop.org/drm/intel/-/issues/2977
Cc: Lyude Paul <lyude@redhat.com>
Cc: Ville Syrjala <ville.syrjala@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjala <ville.syrjala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210125173636.1733812-2-imre.deak@intel.com
(cherry picked from commit b59c27cab2)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
When executing a tracepoint, the tracepoint's func is dereferenced twice -
in __DO_TRACE() (where the returned pointer is checked) and later on in
__traceiter_##_name where the returned pointer is dereferenced without
checking which leads to races against tracepoint_removal_sync() and
crashes.
This adds a check before referencing the pointer in tracepoint_ptr_deref.
Link: https://lkml.kernel.org/r/20210202072326.120557-1-aik@ozlabs.ru
Cc: stable@vger.kernel.org
Fixes: d25e37d89d ("tracepoint: Optimize using static_call()")
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
If we enable_breadcrumbs for a request while that request is being
removed from HW; we may see that the request is active as we take the
ce->signal_lock and proceed to attach the request to ce->signals.
However, during unsubmission after marking the request as inactive, we
see that the request has not yet been added to ce->signals and so skip
the removal. Pull the check during cancel_breadcrumbs under the same
spinlock as enabling so that we the two tests are consistent in
enable/cancel.
Otherwise, we may insert a request onto ce->signals that we expect should
not be there:
intel_context_remove_breadcrumbs:488 GEM_BUG_ON(!__i915_request_is_complete(rq))
While updating, we can note that we are always called with
irqs-disabled, due to the engine->active.lock being held at the single
caller, and so remove the irqsave/restore making it symmetric to
enable_breadcrumbs.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2931
Fixes: c18636f763 ("drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: <stable@vger.kernel.org> # v5.10+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210119162057.31097-1-chris@chris-wilson.co.uk
(cherry picked from commit e7004ea4f5)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Commit 0138ba5783 ("powerpc/64/signal: Balance return predictor
stack in signal trampoline") changed __kernel_sigtramp_rt64() VDSO and
trampoline code, and introduced a regression in the way glibc's
backtrace()[1] detects the signal-handler stack frame. Apart from the
practical implications, __kernel_sigtramp_rt64() was a VDSO function
with the semantics that it is a function you can call from userspace
to end a signal handling. Now this semantics are no longer valid.
I believe the aforementioned change affects all releases since 5.9.
This patch tries to fix both the semantics and practical aspect of
__kernel_sigtramp_rt64() returning it to the previous code, whilst
keeping the intended behaviour of 0138ba5783 by adding a new symbol
to serve as the jump target from the kernel to the trampoline. Now the
trampoline has two parts, a new entry point and the old return point.
[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-January/223194.html
Fixes: 0138ba5783 ("powerpc/64/signal: Balance return predictor stack in signal trampoline")
Cc: stable@vger.kernel.org # v5.9+
Signed-off-by: Raoni Fassina Firmino <raoni@linux.ibm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Minor tweaks to change log formatting, add stable tag]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210201200505.iz46ubcizipnkcxe@work-tp
Redirect my older email addresses that are in the git logs.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested both with Corsairs firmware 11.3 and 13.0 for the Corsairs MP600
and both have the issue as reported by the kernel.
nvme nvme0: missing or invalid SUBNQN field.
Signed-off-by: Claus Stovgaard <claus.stovgaard@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
In case of failure in tc update skb the packet is dropped
without freeing the skb.
Fixed by freeing the skb in case failure in tc update skb.
Fixes: d6d2778286 ("net/mlx5: E-Switch, Restore chain id on miss")
Fixes: c756909722 ("net/mlx5e: Add tc chains offload support for nic flows")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
max_opened_tc is used for stats, so that potentially non-zero stats
won't disappear when num_tc decreases. However, mlx5e_setup_tc_mqprio
fails to update it in the flow where channels are closed.
This commit fixes it. The new value of priv->channels.params.num_tc is
always checked on exit. In case of errors it will just be the old value,
and in case of success it will be the updated value.
Fixes: 05909babce ("net/mlx5e: Avoid reset netdev stats on configuration changes")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When creation of a new rule that requires allocation of an FTE fails,
need to call to tree_put_node on the FTE in order to release its'
resource.
Fixes: cefc23554f ("net/mlx5: Fix FTE cleanup")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Alaa Hleihel <alaa@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The function calculation always results in a value of 0. This works
generally, but when the release all pages feature is enabled it will
result in crashes.
Fixes: 0aa128475d ("net/mlx5: Maintain separate page trees for ECPF and PF functions")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
While addressing some warnings generated by -Warray-bounds, I found this
bug that was introduced back in 2017:
CC [M] fs/cifs/smb2pdu.o
fs/cifs/smb2pdu.c: In function ‘SMB2_negotiate’:
fs/cifs/smb2pdu.c:822:16: warning: array subscript 1 is above array bounds
of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds]
822 | req->Dialects[1] = cpu_to_le16(SMB30_PROT_ID);
| ~~~~~~~~~~~~~^~~
fs/cifs/smb2pdu.c:823:16: warning: array subscript 2 is above array bounds
of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds]
823 | req->Dialects[2] = cpu_to_le16(SMB302_PROT_ID);
| ~~~~~~~~~~~~~^~~
fs/cifs/smb2pdu.c:824:16: warning: array subscript 3 is above array bounds
of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds]
824 | req->Dialects[3] = cpu_to_le16(SMB311_PROT_ID);
| ~~~~~~~~~~~~~^~~
fs/cifs/smb2pdu.c:816:16: warning: array subscript 1 is above array bounds
of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds]
816 | req->Dialects[1] = cpu_to_le16(SMB302_PROT_ID);
| ~~~~~~~~~~~~~^~~
At the time, the size of array _Dialects_ was changed from 1 to 3 in struct
validate_negotiate_info_req, and then in 2019 it was changed from 3 to 4,
but those changes were never made in struct smb2_negotiate_req, which has
led to a 3 and a half years old out-of-bounds bug in function
SMB2_negotiate() (fs/cifs/smb2pdu.c).
Fix this by increasing the size of array _Dialects_ in struct
smb2_negotiate_req to 4.
Fixes: 9764c02fcb ("SMB3: Add support for multidialect negotiate (SMB2.1 and later)")
Fixes: d5c7076b77 ("smb3: add smb3.1.1 to default dialect list")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-02-01
This series contains updates to igc and i40e drivers.
Kai-Heng Feng fixes igc to report unknown speed and duplex during suspend
as an attempted read will cause errors.
Kevin Lo sets the default value to -IGC_ERR_NVM instead of success for
writing shadow RAM as this could miss a timeout. Also propagates the return
value for Flow Control configuration to properly pass on errors for igc.
Aleksandr reverts commit 2ad1274fa3 ("i40e: don't report link up for a VF
who hasn't enabled queues") as this can cause link flapping.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues"
igc: check return value of ret_val in igc_config_fc_after_link_up
igc: set the default return value to -IGC_ERR_NVM in igc_write_nvm_srwr
igc: Report speed and duplex as unknown when device is runtime suspended
====================
Link: https://lore.kernel.org/r/20210201214618.852831-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
UDP/IP header of UDP GROed frag_skbs are not updated even after NAT
forwarding. Only the header of head_skb from ip_finish_output_gso ->
skb_gso_segment is updated but following frag_skbs are not updated.
A call path skb_mac_gso_segment -> inet_gso_segment ->
udp4_ufo_fragment -> __udp_gso_segment -> __udp_gso_segment_list
does not try to update UDP/IP header of the segment list but copy
only the MAC header.
Update port, addr and check of each skb of the segment list in
__udp_gso_segment_list. It covers both SNAT and DNAT.
Fixes: 9fd1ff5d2a (udp: Support UDP fraglist GRO/GSO.)
Signed-off-by: Dongseok Yi <dseok.yi@samsung.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Link: https://lore.kernel.org/r/1611962007-80092-1-git-send-email-dseok.yi@samsung.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are multiple similar bugs implicitly introduced by the
commit c0cfa2d8a7 ("vsock: add multi-transports support") and
commit 6a2c096210 ("vsock: prevent transport modules unloading").
The bug pattern:
[1] vsock_sock.transport pointer is copied to a local variable,
[2] lock_sock() is called,
[3] the local variable is used.
VSOCK multi-transport support introduced the race condition:
vsock_sock.transport value may change between [1] and [2].
Let's copy vsock_sock.transport pointer to local variables after
the lock_sock() call.
Fixes: c0cfa2d8a7 ("vsock: add multi-transports support")
Signed-off-by: Alexander Popov <alex.popov@linux.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Link: https://lore.kernel.org/r/20210201084719.2257066-1-alex.popov@linux.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Richard reports that the following test:
(while true; do
cat /sys/bus/nd/devices/nmem*/available_slots 2>&1 > /dev/null
done) &
while true; do
for i in $(seq 0 4); do
echo nmem$i > /sys/bus/nd/drivers/nvdimm/bind
done
for i in $(seq 0 4); do
echo nmem$i > /sys/bus/nd/drivers/nvdimm/unbind
done
done
...fails with a crash signature like:
divide error: 0000 [#1] SMP KASAN PTI
RIP: 0010:nd_label_nfree+0x134/0x1a0 [libnvdimm]
[..]
Call Trace:
available_slots_show+0x4e/0x120 [libnvdimm]
dev_attr_show+0x42/0x80
? memset+0x20/0x40
sysfs_kf_seq_show+0x218/0x410
The root cause is that available_slots_show() consults driver-data, but
fails to synchronize against device-unbind setting up a TOCTOU race to
access uninitialized memory.
Validate driver-data under the device-lock.
Fixes: 4d88a97aa9 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure")
Cc: <stable@vger.kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Coly Li <colyli@suse.com>
Reported-by: Richard Palethorpe <rpalethorpe@suse.com>
Acked-by: Richard Palethorpe <rpalethorpe@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This reverts commit 2ad1274fa3
VF queues were not brought up when PF was brought up after being
downed if the VF driver disabled VFs queues during PF down.
This could happen in some older or external VF driver implementations.
The problem was that PF driver used vf->queues_enabled as a condition
to decide what link-state it would send out which caused the issue.
Remove the check for vf->queues_enabled in the VF link notify.
Now VF will always be notified of the current link status.
Also remove the queues_enabled member from i40e_vf structure as it is
not used anymore. Otherwise VNF implementation was broken and caused
a link flap.
The original commit was a workaround to avoid breaking existing VFs though
it's really a fault of the VF code not the PF. The commit should be safe to
revert as all of the VFs we know of have been fixed. Also, since we now
know there is a related bug in the workaround, removing it is preferred.
Fixes: 2ad1274fa3 ("i40e: don't report link up for a VF who hasn't enabled")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Pull media fixes from Mauro Carvalho Chehab:
"The rockship rkisp1 driver will be promoted from staging in 5.11.
While not too late, do a few uAPI changes which are needed to better
support its functionalities"
* tag 'media/v5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: rockchip: rkisp1: extend uapi array sizes
media: rockchip: rkisp1: carry ip version information
media: rockchip: rkisp1: reduce number of histogram grid elements in uapi
media: rkisp1: stats: mask the hist_bins values
media: rkisp1: stats: remove a wrong cast to u8
media: rkisp1: uapi: change hist_bins array type from __u16 to __u32
Commit 81f153faac ("staging: rtl8723bs: fix wireless regulatory API
misuse") moved the wiphy_apply_custom_regulatory() call to earlier in the
driver's init-sequence, so that it gets called before wiphy_register().
But at this point in time the eFuses which code the regulatory-settings
for the chip have not been read by the driver yet, causing
_rtw_reg_apply_flags() to set the IEEE80211_CHAN_DISABLED flag on *all*
channels.
On the device where I initially tested the fix, a Jumper EZpad 7 tablet,
this does not cause any problems because shortly after init the
rtw_reg_notifier() gets called fixing things up. I guess this happens
into response to receiving a (broadcast) packet with regulatory info
from the access-point ?
But on another device with a RTL8723BS wifi chip, an Acer Switch 10E
(SW3-016), the rtw_reg_notifier() never gets called. I assume that some
fuse has been set on this device to ignore regulatory info received from
access-points.
This means that on the Acer the driver is stuck in a state with all
channels disabled, leading to non working Wifi.
We cannot move the wiphy_apply_custom_regulatory() call back, because
that call must be made before the wiphy_register() call.
Instead move the entire rtw_wdev_alloc() call to after the Efuses have
been read, fixing all channels being disabled in the initial channel-map.
Fixes: 81f153faac ("staging: rtl8723bs: fix wireless regulatory API misuse")
Cc: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20210201152956.370186-2-hdegoede@redhat.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch sets the default return value to -IGC_ERR_NVM in
igc_write_nvm_srwr. Without this change it wouldn't lead to a shadow RAM
write EEWR timeout.
Fixes: ab40561268 ("igc: Add NVM support")
Signed-off-by: Kevin Lo <kevlo@kevlo.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
With the new 2 step scanning process, which defers instantiating some
ACPI-devices based on their _DEP to the second step, the following may
happen:
1. During the first acpi_walk_namespace(acpi_bus_check_add) call
acpi_scan_check_dep() gets called on the Battery ACPI dev handle and
adds one or more deps for this handle to the acpi_dep_list
2. During the first acpi_bus_attach() call one or more of the suppliers of
these deps get their driver attached and
acpi_walk_dep_device_list(supplier_handle) gets called.
At this point acpi_bus_get_device(dep->consumer) get called,
but since the battery has DEPs it has not been instantiated during the
first acpi_walk_namespace(acpi_bus_check_add), so the
acpi_bus_get_device(dep->consumer) call fails.
Before this commit, acpi_walk_dep_device_list() would now continue
*without* removing the acpi_dep_data entry for this supplier,consumer
pair from the acpi_dep_list.
3. During the second acpi_walk_namespace(acpi_bus_check_add) call
an acpi_device gets instantiated for the battery and
acpi_scan_dep_init() gets called to initialize its dep_unmet val.
Before this commit, the dep_unmet count would include DEPs for
suppliers for which acpi_walk_dep_device_list(supplier_handle)
has already been called, so it will never become 0 and the
ACPI battery driver will never get attached / bind.
Fix the ACPI battery driver never binding in this scenario by making
acpi_walk_dep_device_list() always remove matching acpi_dep_data
entries independent of the acpi_bus_get_device(dep->consumer) call
succeeding or not.
Fixes: 71da201f38 ("ACPI: scan: Defer enumeration of devices with _DEP lists")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 7a873e4555 ("KVM: selftests: Verify supported CR4 bits can be set
before KVM_SET_CPUID2") reveals that KVM allows to set X86_CR4_PCIDE even
when PCID support is missing:
==== Test Assertion Failure ====
x86_64/set_sregs_test.c:41: rc
pid=6956 tid=6956 - Invalid argument
1 0x000000000040177d: test_cr4_feature_bit at set_sregs_test.c:41
2 0x00000000004014fc: main at set_sregs_test.c:119
3 0x00007f2d9346d041: ?? ??:0
4 0x000000000040164d: _start at ??:?
KVM allowed unsupported CR4 bit (0x20000)
Add X86_FEATURE_PCID feature check to __cr4_reserved_bits() to make
kvm_is_valid_cr4() fail.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210201142843.108190-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Userspace that does not know about KVM_GET_MSR_FEATURE_INDEX_LIST
will generally use the default value for MSR_IA32_ARCH_CAPABILITIES.
When this happens and the host has tsx=on, it is possible to end up with
virtual machines that have HLE and RTM disabled, but TSX_CTRL available.
If the fleet is then switched to tsx=off, kvm_get_arch_capabilities()
will clear the ARCH_CAP_TSX_CTRL_MSR bit and it will not be possible to
use the tsx=off hosts as migration destinations, even though the guests
do not have TSX enabled.
To allow this migration, allow guests to write to their TSX_CTRL MSR,
while keeping the host MSR unchanged for the entire life of the guests.
This ensures that TSX remains disabled and also saves MSR reads and
writes, and it's okay to do because with tsx=off we know that guests will
not have the HLE and RTM features in their CPUID. (If userspace sets
bogus CPUID data, we do not expect HLE and RTM to work in guests anyway).
Cc: stable@vger.kernel.org
Fixes: cbbaa2727a ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Anj Duvnjak reports that the Kodi.tv NFS client is not able to read
video files from a v5.10.11 Linux NFS server.
The new sendpage-based TCP sendto logic was not attentive to non-
zero page_base values. nfsd_splice_read() sets that field when a
READ payload starts in the middle of a page.
The Linux NFS client rarely emits an NFS READ that is not page-
aligned. All of my testing so far has been with Linux clients, so I
missed this one.
Reported-by: A. Duvnjak <avian@extremenerds.net>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=211471
Fixes: 4a85a6a332 ("SUNRPC: Handle TCP socket sends with kernel_sendpage() again")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: A. Duvnjak <avian@extremenerds.net>
Mika writes:
thunderbolt: Fix for v5.11-rc7
A single fix for a possible NULL pointer dereference when adding device
links from ACPI description.
* tag 'thunderbolt-for-v5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
thunderbolt: Fix possible NULL pointer dereference in tb_acpi_add_link()
After refactoring, we had two variables for the same thing. Remove the
second declaration, one is enough here. Found by cppcheck.
drivers/gpio/gpiolib.c:2551:17: warning: Local variable 'ret' shadows outer variable [shadowVariable]
Fixes: d377f56f34 ("gpio: gpiolib: Normalize return code variable name")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
The implementation of sdhci_pltfm_suspend() is only available when
CONFIG_PM_SLEEP is set, which triggers a linking error:
"undefined symbol: sdhci_pltfm_suspend" when building sdhci-brcmstb.c.
Fix this by implementing the missing stubs when CONFIG_PM_SLEEP is unset.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Fixes: 5b191dcba7 ("mmc: sdhci-brcmstb: Fix mmc timeout errors on S5 suspend")
Cc: stable@vger.kernel.org
Tested-By: Nicolas Schichan <nschichan@freeebox.fr>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Python retired in 2020, and some distributions do not provide the
'python' command any more.
As in commit 51839e29cb ("scripts: switch explicitly to Python 3"),
we need to use more specific 'python3' to invoke scripts even if they
are written in a way compatible with both Python 2 and 3.
This commit removes the variable 'PYTHON', and switches the existing
users to 'PYTHON3'.
BTW, PEP 394 (https://www.python.org/dev/peps/pep-0394/) is a helpful
material.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
An upcoming Dell platform is causing a NULL pointer dereference
in dell-wmi-sysman initialization. Validate that the input from
BIOS matches correct ACPI types and abort module initialization
if it fails.
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Tested-by: Perry Yuan <perry_yuan@dell.com>
Link: https://lore.kernel.org/r/20210129172654.2326751-1-mario.limonciello@dell.com
[hdegoede@redhat.com: Drop redundant release_attributes_data() call]
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Pull EFI fix from Borislav Petkov:
"A single fix from Lukas: handle boolean device properties imported
from Apple firmware correctly"
* tag 'efi-urgent-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/apple-properties: Reinstate support for boolean properties
Pull x86 fix from Borislav Petkov:
"A single fix for objtool to generate proper unwind info for newer
toolchains which do not generate section symbols anymore. And a
cleanup ontop.
This was originally going to go during the next merge window but
people can already trigger a build error with binutils-2.36 which
doesn't emit section symbols - something which objtool relies on - so
let's expedite it"
* tag 'x86_entry_for_v5.11_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry: Remove put_ret_addr_in_rdi THUNK macro argument
x86/entry: Emit a symbol for register restoring thunk
Pull timer fix from Thomas Gleixner:
"A fix for handling advertised, but non-existent 146818 RTCs correctly.
With the recent UIP handling changes the time readout of non-existent
RTCs hangs forever as the read returns always 0xFF which means the UIP
bit is set.
Sanity check the RTC before registering by checking the RTC_VALID
register for correctness"
* tag 'timers-urgent-2021-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rtc: mc146818: Detect and handle broken RTCs
Pull single stepping fix from Thomas Gleixner:
"A single fix for the single step reporting regression caused by
getting the condition wrong when moving SYSCALL_EMU away from TIF
flags"
[ There's apparently another problem too, fix pending ]
* tag 'core-urgent-2021-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry: Unbreak single step reporting behaviour
Pull powerpc fix from Michael Ellerman:
"One fix for a bug in our soft interrupt masking, which could lead to
interrupt replaying recursing, causing spurious interrupts.
Thanks to Nicholas Piggin"
* tag 'powerpc-5.11-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: prevent recursive replay_soft_interrupts causing superfluous interrupt
Pull i2c fix from Wolfram Sang:
"Just one I2C driver update this time"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mediatek: Move suspend and resume handling to NOIRQ phase
Pull LED fixes from Pavel Machek:
"This pull is due to 'leds: trigger: fix potential deadlock with
libata' -- people find the warn annoying.
It also contains new driver and two trivial fixes"
* 'for-rc-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds:
leds: rt8515: Add Richtek RT8515 LED driver
dt-bindings: leds: Add DT binding for Richtek RT8515
leds: trigger: fix potential deadlock with libata
leds: leds-ariel: convert comma to semicolon
leds: leds-lm3533: convert comma to semicolon
Pull NFS client fixes from Trond Myklebust:
- SUNRPC: Handle 0 length opaque XDR object data properly
- Fix a layout segment leak in pnfs_layout_process()
- pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn
- pNFS/NFSv4: Improve rejection of out-of-order layouts
- pNFS/NFSv4: Try to return invalid layout in pnfs_layout_process()
* tag 'nfs-for-5.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Handle 0 length opaque XDR object data properly
SUNRPC: Move simple_get_bytes and simple_get_netobj into private header
pNFS/NFSv4: Improve rejection of out-of-order layouts
pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn
pNFS/NFSv4: Try to return invalid layout in pnfs_layout_process()
pNFS/NFSv4: Fix a layout segment leak in pnfs_layout_process()
This adds a driver for the Richtek RT8515 dual channel
torch/flash white LED driver.
This LED driver is found in some mobile phones from
Samsung such as the GT-S7710 and GT-I8190.
A V4L interface is added.
We do not have a proper datasheet for the RT8515 but
it turns out that RT9387A has a public datasheet and
is essentially the same chip. We designed the driver
in accordance with this datasheet. The day someone
needs to drive a RT9387A this driver can probably
easily be augmented to handle that chip too.
Sakari Ailus, Pavel Machek and Andy Shevchenko helped
significantly in getting this driver right.
Cc: Sakari Ailus <sakari.ailus@iki.fi>
Cc: newbytee@protonmail.com
Cc: Stephan Gerhold <stephan@gerhold.net>
Cc: linux-media@vger.kernel.org
Cc: phone-devel@vger.kernel.org
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
We have the following potential deadlock condition:
========================================================
WARNING: possible irq lock inversion dependency detected
5.10.0-rc2+ #25 Not tainted
--------------------------------------------------------
swapper/3/0 just changed the state of lock:
ffff8880063bd618 (&host->lock){-...}-{2:2}, at: ata_bmdma_interrupt+0x27/0x200
but this lock took another, HARDIRQ-READ-unsafe lock in the past:
(&trig->leddev_list_lock){.+.?}-{2:2}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&trig->leddev_list_lock);
local_irq_disable();
lock(&host->lock);
lock(&trig->leddev_list_lock);
<Interrupt>
lock(&host->lock);
*** DEADLOCK ***
no locks held by swapper/3/0.
the shortest dependencies between 2nd lock and 1st lock:
-> (&trig->leddev_list_lock){.+.?}-{2:2} ops: 46 {
HARDIRQ-ON-R at:
lock_acquire+0x15f/0x420
_raw_read_lock+0x42/0x90
led_trigger_event+0x2b/0x70
rfkill_global_led_trigger_worker+0x94/0xb0
process_one_work+0x240/0x560
worker_thread+0x58/0x3d0
kthread+0x151/0x170
ret_from_fork+0x1f/0x30
IN-SOFTIRQ-R at:
lock_acquire+0x15f/0x420
_raw_read_lock+0x42/0x90
led_trigger_event+0x2b/0x70
kbd_bh+0x9e/0xc0
tasklet_action_common.constprop.0+0xe9/0x100
tasklet_action+0x22/0x30
__do_softirq+0xcc/0x46d
run_ksoftirqd+0x3f/0x70
smpboot_thread_fn+0x116/0x1f0
kthread+0x151/0x170
ret_from_fork+0x1f/0x30
SOFTIRQ-ON-R at:
lock_acquire+0x15f/0x420
_raw_read_lock+0x42/0x90
led_trigger_event+0x2b/0x70
rfkill_global_led_trigger_worker+0x94/0xb0
process_one_work+0x240/0x560
worker_thread+0x58/0x3d0
kthread+0x151/0x170
ret_from_fork+0x1f/0x30
INITIAL READ USE at:
lock_acquire+0x15f/0x420
_raw_read_lock+0x42/0x90
led_trigger_event+0x2b/0x70
rfkill_global_led_trigger_worker+0x94/0xb0
process_one_work+0x240/0x560
worker_thread+0x58/0x3d0
kthread+0x151/0x170
ret_from_fork+0x1f/0x30
}
... key at: [<ffffffff83da4c00>] __key.0+0x0/0x10
... acquired at:
_raw_read_lock+0x42/0x90
led_trigger_blink_oneshot+0x3b/0x90
ledtrig_disk_activity+0x3c/0xa0
ata_qc_complete+0x26/0x450
ata_do_link_abort+0xa3/0xe0
ata_port_freeze+0x2e/0x40
ata_hsm_qc_complete+0x94/0xa0
ata_sff_hsm_move+0x177/0x7a0
ata_sff_pio_task+0xc7/0x1b0
process_one_work+0x240/0x560
worker_thread+0x58/0x3d0
kthread+0x151/0x170
ret_from_fork+0x1f/0x30
-> (&host->lock){-...}-{2:2} ops: 69 {
IN-HARDIRQ-W at:
lock_acquire+0x15f/0x420
_raw_spin_lock_irqsave+0x52/0xa0
ata_bmdma_interrupt+0x27/0x200
__handle_irq_event_percpu+0xd5/0x2b0
handle_irq_event+0x57/0xb0
handle_edge_irq+0x8c/0x230
asm_call_irq_on_stack+0xf/0x20
common_interrupt+0x100/0x1c0
asm_common_interrupt+0x1e/0x40
native_safe_halt+0xe/0x10
arch_cpu_idle+0x15/0x20
default_idle_call+0x59/0x1c0
do_idle+0x22c/0x2c0
cpu_startup_entry+0x20/0x30
start_secondary+0x11d/0x150
secondary_startup_64_no_verify+0xa6/0xab
INITIAL USE at:
lock_acquire+0x15f/0x420
_raw_spin_lock_irqsave+0x52/0xa0
ata_dev_init+0x54/0xe0
ata_link_init+0x8b/0xd0
ata_port_alloc+0x1f1/0x210
ata_host_alloc+0xf1/0x130
ata_host_alloc_pinfo+0x14/0xb0
ata_pci_sff_prepare_host+0x41/0xa0
ata_pci_bmdma_prepare_host+0x14/0x30
piix_init_one+0x21f/0x600
local_pci_probe+0x48/0x80
pci_device_probe+0x105/0x1c0
really_probe+0x221/0x490
driver_probe_device+0xe9/0x160
device_driver_attach+0xb2/0xc0
__driver_attach+0x91/0x150
bus_for_each_dev+0x81/0xc0
driver_attach+0x1e/0x20
bus_add_driver+0x138/0x1f0
driver_register+0x91/0xf0
__pci_register_driver+0x73/0x80
piix_init+0x1e/0x2e
do_one_initcall+0x5f/0x2d0
kernel_init_freeable+0x26f/0x2cf
kernel_init+0xe/0x113
ret_from_fork+0x1f/0x30
}
... key at: [<ffffffff83d9fdc0>] __key.6+0x0/0x10
... acquired at:
__lock_acquire+0x9da/0x2370
lock_acquire+0x15f/0x420
_raw_spin_lock_irqsave+0x52/0xa0
ata_bmdma_interrupt+0x27/0x200
__handle_irq_event_percpu+0xd5/0x2b0
handle_irq_event+0x57/0xb0
handle_edge_irq+0x8c/0x230
asm_call_irq_on_stack+0xf/0x20
common_interrupt+0x100/0x1c0
asm_common_interrupt+0x1e/0x40
native_safe_halt+0xe/0x10
arch_cpu_idle+0x15/0x20
default_idle_call+0x59/0x1c0
do_idle+0x22c/0x2c0
cpu_startup_entry+0x20/0x30
start_secondary+0x11d/0x150
secondary_startup_64_no_verify+0xa6/0xab
This lockdep splat is reported after:
commit e918188611 ("locking: More accurate annotations for read_lock()")
To clarify:
- read-locks are recursive only in interrupt context (when
in_interrupt() returns true)
- after acquiring host->lock in CPU1, another cpu (i.e. CPU2) may call
write_lock(&trig->leddev_list_lock) that would be blocked by CPU0
that holds trig->leddev_list_lock in read-mode
- when CPU1 (ata_ac_complete()) tries to read-lock
trig->leddev_list_lock, it would be blocked by the write-lock waiter
on CPU2 (because we are not in interrupt context, so the read-lock is
not recursive)
- at this point if an interrupt happens on CPU0 and
ata_bmdma_interrupt() is executed it will try to acquire host->lock,
that is held by CPU1, that is currently blocked by CPU2, so:
* CPU0 blocked by CPU1
* CPU1 blocked by CPU2
* CPU2 blocked by CPU0
*** DEADLOCK ***
The deadlock scenario is better represented by the following schema
(thanks to Boqun Feng <boqun.feng@gmail.com> for the schema and the
detailed explanation of the deadlock condition):
CPU 0: CPU 1: CPU 2:
----- ----- -----
led_trigger_event():
read_lock(&trig->leddev_list_lock);
<workqueue>
ata_hsm_qc_complete():
spin_lock_irqsave(&host->lock);
write_lock(&trig->leddev_list_lock);
ata_port_freeze():
ata_do_link_abort():
ata_qc_complete():
ledtrig_disk_activity():
led_trigger_blink_oneshot():
read_lock(&trig->leddev_list_lock);
// ^ not in in_interrupt() context, so could get blocked by CPU 2
<interrupt>
ata_bmdma_interrupt():
spin_lock_irqsave(&host->lock);
Fix by using read_lock_irqsave/irqrestore() in led_trigger_event(), so
that no interrupt can happen in between, preventing the deadlock
condition.
Apply the same change to led_trigger_blink_setup() as well, since the
same deadlock scenario can also happen in power_supply_update_bat_leds()
-> led_trigger_blink() -> led_trigger_blink_setup() (workqueue context),
and potentially prevent other similar usages.
Link: https://lore.kernel.org/lkml/20201101092614.GB3989@xps-13-7390/
Fixes: eb25cb9956 ("leds: convert IDE trigger to common disk trigger")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Replace a comma between expression statements by a semicolon.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Reviewed-by: Alexander Dahl <ada@thorsis.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Replace a comma between expression statements by a semicolon.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Pull cifs fixes from Steve French:
"Four cifs patches found in additional testing of the conversion to the
new mount API: three small option processing ones, and one fixing domain
based DFS referrals"
* tag '5.11-rc5-smb3' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix dfs domain referrals
cifs: returning mount parm processing errors correctly
cifs: fix mounts to subdirectories of target
cifs: ignore auto and noauto options if given
Pull SCSI fixes from James Bottomley:
"Two minor fixes in drivers. Both changing strings (one in a comment,
one in a module help text) with no code impact"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: Fix description for parameter ql2xenforce_iocb_limit
scsi: target: iscsi: Fix typo in comment
Pull OpenRISC fix from Stafford Horne:
"Fix config dependencies for Litex SOC driver causing issues on um"
* tag 'for-linus' of git://github.com/openrisc/linux:
soc: litex: Properly depend on HAS_IOMEM
Pull devicetree fixes from Rob Herring:
- Cleanups on properties with standard unit suffixes
- Fix overwriting dma_range_map if there's no 'dma-ranges' property
- Fix a bug when creating a /chosen node from ARM ATAGs
- Add missing properties for TI j721e USB binding
- Several doc reference updates due to DT schema conversions
* tag 'devicetree-fixes-for-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: Cleanup standard unit properties
of/device: Update dma_range_map only when dev has valid dma-ranges
ARM: zImage: atags_to_fdt: Fix node names on added root nodes
dt-bindings: usb: j721e: add ranges and dma-coherent props
dt-bindings:iio:adc: update adc.yaml reference
dt-bindings: memory: mediatek: update mediatek,smi-larb.yaml references
dt-bindings: display: mediatek: update mediatek,dpi.yaml reference
ASoC: audio-graph-card: update audio-graph-card.yaml reference
Pull s390 fixes from Vasily Gorbik:
- Fix max number of VCPUs reported via ultravisor information sysfs
interface.
- Fix memory leaks during vfio-ap resources clean up on KVM pointer
invalidation notification.
- Fix potential specification exception by avoiding unnecessary
interrupts disable after queue reset in vfio-ap.
* tag 's390-5.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: uv: Fix sysfs max number of VCPUs reporting
s390/vfio-ap: No need to disable IRQ after queue reset
s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated
Pull RISC-V fix from Palmer Dabbelt:
"A fix to avoid initializing max_mapnr to be too large, which may
manifest on NUMA systems"
* tag 'riscv-for-linus-5.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fixup pfn_valid error with wrong max_mapnr
Following race condition was detected:
<CPU A, t0> - neigh_flush_dev() is under execution and calls
neigh_mark_dead(n) marking the neighbour entry 'n' as dead.
<CPU B, t1> - Executing: __netif_receive_skb() ->
__netif_receive_skb_core() -> arp_rcv() -> arp_process().arp_process()
calls __neigh_lookup() which takes a reference on neighbour entry 'n'.
<CPU A, t2> - Moves further along neigh_flush_dev() and calls
neigh_cleanup_and_release(n), but since reference count increased in t2,
'n' couldn't be destroyed.
<CPU B, t3> - Moves further along, arp_process() and calls
neigh_update()-> __neigh_update() -> neigh_update_gc_list(), which adds
the neighbour entry back in gc_list(neigh_mark_dead(), removed it
earlier in t0 from gc_list)
<CPU B, t4> - arp_process() finally calls neigh_release(n), destroying
the neighbour entry.
This leads to 'n' still being part of gc_list, but the actual
neighbour structure has been freed.
The situation can be prevented from happening if we disallow a dead
entry to have any possibility of updating gc_list. This is what the
patch intends to achieve.
Fixes: 9c29a2f55e ("neighbor: Fix locking order for gc_list changes")
Signed-off-by: Chinmay Agarwal <chinagar@codeaurora.org>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20210127165453.GA20514@chinagar-linux.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dcfg was overlapping with clockgen address space which resulted
in failure in memory allocation for dcfg. According regs description
dcfg size should not be bigger than 4KB.
Signed-off-by: Zyta Szpak <zr@semihalf.com>
Fixes: 8126d88162 ("arm64: dts: add QorIQ LS1046A SoC support")
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Johan writes:
USB-serial fixes for 5.11-rc6
Here are some new device-ids for 5.11-rc6.
All but the option one have been in linux-next, and with no reported
issues.
* tag 'usb-serial-5.11-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: Adding support for Cinterion MV31
USB: serial: cp210x: add pid/vid for WSDA-200-USB
USB: serial: cp210x: add new VID/PID for supporting Teraoka AD2000
vgettimeofday.o is unnecessarily rebuilt. Adding it to 'targets' is not
enough to fix the issue. Kbuild is correctly rebuilding it because the
command line is changed.
PowerPC builds each vdso directory twice; first in vdso_prepare to
generate vdso{32,64}-offsets.h, second as part of the ordinary build
process to embed vdso{32,64}.so.dbg into the kernel.
The problem shows up when CONFIG_PPC_WERROR=y due to the following line
in arch/powerpc/Kbuild:
subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
In the preparation stage, Kbuild directly visits the vdso directories,
hence it does not inherit subdir-ccflags-y. In the second descend,
Kbuild adds -Werror, which results in the command line flipping
with/without -Werror.
It implies a potential danger; if a more critical flag that would impact
the resulted vdso, the offsets recorded in the headers might be different
from real offsets in the embedded vdso images.
Removing the unneeded second descend solves the problem.
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/linuxppc-dev/87tuslxhry.fsf@mpe.ellerman.id.au/
Link: https://lore.kernel.org/r/20201223171142.707053-1-masahiroy@kernel.org
Compiling kernel with -Warray-bounds throws below warning:
In function 'emulate_vsx_store':
warning: array subscript is above array bounds [-Warray-bounds]
buf.d[2] = byterev_8(reg->d[1]);
~~~~~^~~
buf.d[3] = byterev_8(reg->d[0]);
~~~~~^~~
Fix it by using temporary array variable 'union vsx_reg buf32[]' in
that code block. Also, with element_size = 32, 'union vsx_reg *reg'
is an array of size 2. So, use 'reg' as an array instead of pointer
in the same code block.
Fixes: af99da7433 ("powerpc/sstep: Support VSX vector paired storage access instructions")
Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210129071745.111466-1-ravi.bangoria@linux.ibm.com
AF_RXRPC sockets use UDP ports in encap mode. This causes socket and dst
from an incoming packet to get stolen and attached to the UDP socket from
whence it is leaked when that socket is closed.
When a network namespace is removed, the wait for dst records to be cleaned
up happens before the cleanup of the rxrpc and UDP socket, meaning that the
wait never finishes.
Fix this by moving the rxrpc (and, by dependence, the afs) private
per-network namespace registrations to the device group rather than subsys
group. This allows cached rxrpc local endpoints to be cleared and their
UDP sockets closed before we try waiting for the dst records.
The symptom is that lines looking like the following:
unregister_netdevice: waiting for lo to become free
get emitted at regular intervals after running something like the
referenced syzbot test.
Thanks to Vadim for tracking this down and work out the fix.
Reported-by: syzbot+df400f2f24a1677cd7e0@syzkaller.appspotmail.com
Reported-by: Vadim Fedorenko <vfedorenko@novek.ru>
Fixes: 5271953cad ("rxrpc: Use the UDP encap_rcv hook")
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vadim Fedorenko <vfedorenko@novek.ru>
Link: https://lore.kernel.org/r/161196443016.3868642.5577440140646403533.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It was reported that on RTL8125 network breaks under heavy UDP load,
e.g. torrent traffic ([0], from comment 27). Realtek confirmed a hw bug
and provided me with a test version of the r8125 driver including a
workaround. Tests confirmed that the workaround fixes the issue.
I modified the original version of the workaround to meet mainline
code style.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=209839
v2:
- rebased to net
v3:
- make rtl_skb_is_udp() more robust and use skb_header_pointer()
to access the ip(v6) header
v4:
- remove dependency on ptp_classify.h
- replace magic number with offsetof(struct udphdr, len)
Fixes: f1bce4ad2f ("r8169: add support for RTL8125")
Tested-by: xplo <xplo.bn@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/6e453d49-1801-e6de-d5f7-d7e6c7526c8f@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The main arcnet interrupt handler calls arcnet_close() then
arcnet_open(), if the RESET status flag is encountered.
This is invalid:
1) In general, interrupt handlers should never call ->ndo_stop() and
->ndo_open() functions. They are usually full of blocking calls and
other methods that are expected to be called only from drivers
init and exit code paths.
2) arcnet_close() contains a del_timer_sync(). If the irq handler
interrupts the to-be-deleted timer, del_timer_sync() will just loop
forever.
3) arcnet_close() also calls tasklet_kill(), which has a warning if
called from irq context.
4) For device reset, the sequence "arcnet_close(); arcnet_open();" is
not complete. Some children arcnet drivers have special init/exit
code sequences, which then embed a call to arcnet_open() and
arcnet_close() accordingly. Check drivers/net/arcnet/com20020.c.
Run the device RESET sequence from a scheduled workqueue instead.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20210128194802.727770-1-a.darwish@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When MSI_FLAG_ACTIVATE_EARLY is set (which is the case for PCI),
__msi_domain_alloc_irqs() performs the activation of the interrupt (which
in the case of PCI results in the endpoint being programmed) as soon as the
interrupt is allocated.
But it appears that this is only done for the first vector, introducing an
inconsistent behaviour for PCI Multi-MSI.
Fix it by iterating over the number of vectors allocated to each MSI
descriptor. This is easily achieved by introducing a new
"for_each_msi_vector" iterator, together with a tiny bit of refactoring.
Fixes: f3b0946d62 ("genirq/msi: Make sure PCI MSIs are activated early")
Reported-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210123122759.1781359-1-maz@kernel.org
Our system encountered a re-init error when re-registering same kretprobe,
where the kretprobe_instance in rp->free_instances is illegally accessed
after re-init.
Implementation to avoid re-registration has been introduced for kprobe
before, but lags for register_kretprobe(). We must check if kprobe has
been re-registered before re-initializing kretprobe, otherwise it will
destroy the data struct of kretprobe registered, which can lead to memory
leak, system crash, also some unexpected behaviors.
We use check_kprobe_rereg() to check if kprobe has been re-registered
before running register_kretprobe()'s body, for giving a warning message
and terminate registration process.
Link: https://lkml.kernel.org/r/20210128124427.2031088-1-bobo.shaobowang@huawei.com
Cc: stable@vger.kernel.org
Fixes: 1f0ab40976 ("kprobes: Prevent re-registration of the same kprobe")
[ The above commit should have been done for kretprobes too ]
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Pull arm64 fixes from Catalin Marinas:
- Fix the virt_addr_valid() returning true for < PAGE_OFFSET addresses.
- Do not blindly trust the DMA masks from ACPI/IORT.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
ACPI/IORT: Do not blindly trust DMA masks from firmware
arm64: Fix kernel address detection of __is_lm_address()
Pull btrfs fixes from David Sterba:
"A few more fixes for a late rc:
- fix lockdep complaint on 32bit arches and also remove an unsafe
memory use due to device vs filesystem lifetime
- two fixes for free space tree:
* race during log replay and cache rebuild, now more likely to
happen due to changes in this dev cycle
* possible free space tree corruption with online conversion
during initial tree population"
* tag 'for-5.11-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix log replay failure due to race with space cache rebuild
btrfs: fix lockdep warning due to seqcount_mutex on 32bit arch
btrfs: fix possible free space tree corruption with online conversion
Pull block fixes from Jens Axboe:
"All over the place fixes for this release:
- blk-cgroup iteration teardown resched fix (Baolin)
- NVMe pull request from Christoph:
- add another Write Zeroes quirk (Chaitanya Kulkarni)
- handle a no path available corner case (Daniel Wagner)
- use the proper RCU aware list_add helper (Chao Leng)
- bcache regression fix (Coly)
- bdev->bd_size_lock IRQ fix. This will be fixed in drivers for 5.12,
but for now, we'll make it IRQ safe (Damien)
- null_blk zoned init fix (Damien)
- add_partition() error handling fix (Dinghao)
- s390 dasd kobject fix (Jan)
- nbd fix for freezing queue while adding connections (Josef)
- tag queueing regression fix (Ming)
- revert of a patch that inadvertently meant that we regressed write
performance on raid (Maxim)"
* tag 'block-5.11-2021-01-29' of git://git.kernel.dk/linux-block:
null_blk: cleanup zoned mode initialization
nvme-core: use list_add_tail_rcu instead of list_add_tail for nvme_init_ns_head
nvme-multipath: Early exit if no path is available
nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a SPCC device
bcache: only check feature sets when sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES
block: fix bd_size_lock use
blk-cgroup: Use cond_resched() when destroy blkgs
Revert "block: simplify set_init_blocksize" to regain lost performance
nbd: freeze the queue while we're adding connections
s390/dasd: Fix inconsistent kobject removal
block: Fix an error handling in add_partition
blk-mq: test QUEUE_FLAG_HCTX_ACTIVE for sbitmap_shared in hctx_may_queue
Pull io_uring fixes from Jens Axboe:
"We got the cancelation story sorted now, so for all intents and
purposes, this should be it for 5.11 outside of any potential little
fixes that may come in. This contains:
- task_work task state fixes (Hao, Pavel)
- Cancelation fixes (me, Pavel)
- Fix for an inflight req patch in this release (Pavel)
- Fix for a lock deadlock issue (Pavel)"
* tag 'io_uring-5.11-2021-01-29' of git://git.kernel.dk/linux-block:
io_uring: reinforce cancel on flush during exit
io_uring: fix sqo ownership false positive warning
io_uring: fix list corruption for splice file_get
io_uring: fix flush cqring overflow list while TASK_INTERRUPTIBLE
io_uring: fix wqe->lock/completion_lock deadlock
io_uring: fix cancellation taking mutex while TASK_UNINTERRUPTIBLE
io_uring: fix __io_uring_files_cancel() with TASK_UNINTERRUPTIBLE
io_uring: only call io_cqring_ev_posted() if events were posted
io_uring: if we see flush on exit, cancel related tasks
The LiteX SOC controller driver makes use of IOMEM functions like
devm_platform_ioremap_resource(), which are only available if
CONFIG_HAS_IOMEM is defined.
This causes the driver to be enable under make ARCH=um allyesconfig,
even though it won't build.
By adding a dependency on HAS_IOMEM, the driver will not be enabled on
architectures which don't support it.
Fixes: 22447a99c9 ("drivers/soc/litex: add LiteX SoC Controller driver")
Signed-off-by: David Gow <davidgow@google.com>
[shorne@gmail.com: Fix typo in commit message pointed out in review]
Signed-off-by: Stafford Horne <shorne@gmail.com>
Pull iommu fixes from Joerg Roedel:
- AMD IOMMU fix to make sure features are detected before they are
queried.
- Intel IOMMU address alignment check fix for an IOLTB flushing
command.
- Performance fix for Intel IOMMU to make sure the code does not do
full IOTLB flushes all the time. Those flushes are very expensive
on emulated IOMMUs.
* tag 'iommu-fixes-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Do not use flush-queue when caching-mode is on
iommu/vt-d: Correctly check addr alignment in qi_flush_dev_iotlb_pasid()
iommu/amd: Use IVHD EFR for early initialization of IOMMU features
Pull power management fixes from Rafael Wysocki:
"These fix a deadlock in the 'kexec jump' code and address a possible
hibernation image creation issue.
Specifics:
- Fix a deadlock caused by attempting to acquire the same mutex twice
in a row in the "kexec jump" code (Baoquan He)
- Modify the hibernation image saving code to flush the unwritten
data to the swap storage later so as to avoid failing to write the
image signature which is possible in some cases (Laurent Badel)"
* tag 'pm-5.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: hibernate: flush swap writer after marking
kernel: kexec: remove the lock operation of system_transition_mutex
Pull ACPI fixes from Rafael Wysocki:
"These fix the handling of notifications in the ACPI thermal driver and
address a device enumeration issue leading to the presence of multiple
'MODALIAS=' entries in one uevent file in sysfs in some cases.
Specifics:
- Modify the ACPI thermal driver to avoid evaluating _TMP directly in
its Notify () handler callback and running too many thermal checks
for one thermal zone at the same time so as to address a work item
accumulation issue observed on some systems that fail to shut down
as a result of it (Rafael Wysocki)
- Modify the ACPI uevent file creation code to avoid putting multiple
'MODALIAS=' entries in one uevent file in sysfs which breaks
systemd-udevd (Kai-Heng Feng)"
* tag 'acpi-5.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: thermal: Do not call acpi_thermal_check() directly
ACPI: sysfs: Prefer "compatible" modalias
Pull drm fixes from Dave Airlie:
"Weekly fixes for graphics, nothing too major, nouveau has a few
regression fixes for various fallout from header changes previously,
vc4 has two fixes, two amdgpu, and a smattering of i915 fixes.
All seems on course for a quieter rc7, fingers crossed.
nouveau:
- fix svm init conditions
- fix nv50 modesetting regression
- fix cursor plane modifiers
- fix > 64x64 cursor regression
vc4:
- Fix LBM size calculation
- Fix high resolutions for hvs5
i915:
- Fix ICL MG PHY vswing
- Fix subplatform handling
- Fix selftest memleak
- Clear CACHE_MODE prior to clearing residuals
- Always flush the active worker before returning from the wait
- Always try to reserve GGTT address 0x0
amdgpu:
- Fix a fan control regression on some boards
- Fix clang warning"
* tag 'drm-fixes-2021-01-29' of git://anongit.freedesktop.org/drm/drm:
drm/nouveau/kms/gk104-gp1xx: Fix > 64x64 cursors
drm/nouveau/kms/nv50-: Report max cursor size to userspace
drivers/nouveau/kms/nv50-: Reject format modifiers for cursor planes
drm/nouveau/svm: fail NOUVEAU_SVM_INIT ioctl on unsupported devices
drm/nouveau/dispnv50: Restore pushing of all data.
amdgpu: fix clang build warning
Revert "drm/amdgpu/swsmu: drop set_fan_speed_percent (v2)"
drm/i915/gt: Always try to reserve GGTT address 0x0
drm/i915: Always flush the active worker before returning from the wait
drm/i915/selftest: Fix potential memory leak
drm/i915: Check for all subplatform bits
drm/i915: Fix ICL MG PHY vswing handling
drm/i915/gt: Clear CACHE_MODE prior to clearing residuals
drm/vc4: Correct POS1_SCL for hvs5
drm/vc4: Correct lbm size and calculation
drm/nouveau/nvif: fix method count when pushing an array
It turns out that the vfs_iocb_iter_{read,write}() functions are
entirely broken, and don't actually use the passed-in file pointer for
IO - only for the preparatory work (permission checking and for the
write_iter function lookup).
That worked fine for overlayfs, which always builds the new iocb with
the same file pointer that it passes in, but in the general case it ends
up doing nonsensical things (and could cause an iterator call that
doesn't even match the passed-in file pointer).
This subtly broke the tty conversion to write_iter in commit
9bb48c82ac ("tty: implement write_iter"), because the console
redirection didn't actually end up redirecting anything, since the
passed-in file pointer was basically ignored, and the actual write was
done with the original non-redirected console tty after all.
The main visible effect of this is that the console messages were no
longer logged to /var/log/boot.log during graphical boot.
Fix the issue by simply not using the vfs write "helper" function at
all, and just redirecting the write entirely internally to the tty
layer. Do the target writability permission checks when actually
registering the target tty with TIOCCONS instead of at write time.
Fixes: 9bb48c82ac ("tty: implement write_iter")
Reported-and-tested-by: Hans de Goede <hdegoede@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The most common question around building the Linux kernel with clang is
"does it work?" and the answer has always been "it depends on your
architecture, configuration, and LLVM version" with no hard answers for
users wanting to experiment. LLVM support has significantly improved
over the past couple of years, resulting in more architectures and
configurations supported, and continuous integration has made it easier
to see what works and what does not.
Add a section that goes over what architectures are supported in the
current kernel version, how they should be built (with just clang or the
LLVM utilities as well), and the level of support they receive. This
will make it easier for people to try out building their kernel with
LLVM and reporting issues that come about from it.
Suggested-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulnier@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Fix kprobe_on_func_entry() returns error code instead of false so that
register_kretprobe() can return an appropriate error code.
append_trace_kprobe() expects the kprobe registration returns -ENOENT
when the target symbol is not found, and it checks whether the target
module is unloaded or not. If the target module doesn't exist, it
defers to probe the target symbol until the module is loaded.
However, since register_kretprobe() returns -EINVAL instead of -ENOENT
in that case, it always fail on putting the kretprobe event on unloaded
modules. e.g.
Kprobe event:
/sys/kernel/debug/tracing # echo p xfs:xfs_end_io >> kprobe_events
[ 16.515574] trace_kprobe: This probe might be able to register after target module is loaded. Continue.
Kretprobe event: (p -> r)
/sys/kernel/debug/tracing # echo r xfs:xfs_end_io >> kprobe_events
sh: write error: Invalid argument
/sys/kernel/debug/tracing # cat error_log
[ 41.122514] trace_kprobe: error: Failed to register probe event
Command: r xfs:xfs_end_io
^
To fix this bug, change kprobe_on_func_entry() to detect symbol lookup
failure and return -ENOENT in that case. Otherwise it returns -EINVAL
or 0 (succeeded, given address is on the entry).
Link: https://lkml.kernel.org/r/161176187132.1067016.8118042342894378981.stgit@devnote2
Cc: stable@vger.kernel.org
Fixes: 59158ec4ae ("tracing/kprobes: Check the probe on unloaded module correctly")
Reported-by: Jianlin Lv <Jianlin.Lv@arm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Eaerlier, tracing was disabled when reading the trace file. This behavior
was changed with:
commit 06e0a548ba ("tracing: Do not disable tracing when reading the
trace file").
This doesn't seem to work with the latency tracers.
The above mentioned commit dit not only change the behavior but also added
an option to emulate the old behavior. The idea with this patch is to
enable this pause-on-trace option when the latency tracers are used.
Link: https://lkml.kernel.org/r/20210119164344.37500-2-Viktor.Rosendahl@bmw.de
Cc: stable@vger.kernel.org
Fixes: 06e0a548ba ("tracing: Do not disable tracing when reading the trace file")
Signed-off-by: Viktor Rosendahl <Viktor.Rosendahl@bmw.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
On some archs, the idle task can call into cpu_suspend(). The cpu_suspend()
will disable or pause function graph tracing, as there's some paths in
bringing down the CPU that can have issues with its return address being
modified. The task_struct structure has a "tracing_graph_pause" atomic
counter, that when set to something other than zero, the function graph
tracer will not modify the return address.
The problem is that the tracing_graph_pause counter is initialized when the
function graph tracer is enabled. This can corrupt the counter for the idle
task if it is suspended in these architectures.
CPU 1 CPU 2
----- -----
do_idle()
cpu_suspend()
pause_graph_tracing()
task_struct->tracing_graph_pause++ (0 -> 1)
start_graph_tracing()
for_each_online_cpu(cpu) {
ftrace_graph_init_idle_task(cpu)
task-struct->tracing_graph_pause = 0 (1 -> 0)
unpause_graph_tracing()
task_struct->tracing_graph_pause-- (0 -> -1)
The above should have gone from 1 to zero, and enabled function graph
tracing again. But instead, it is set to -1, which keeps it disabled.
There's no reason that the field tracing_graph_pause on the task_struct can
not be initialized at boot up.
Cc: stable@vger.kernel.org
Fixes: 380c4b1411 ("tracing/function-graph-tracer: append the tracing_graph_flag")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211339
Reported-by: pierre.gondois@arm.com
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
In testing, in a configuration with Redfish and native NVMe multipath when
an EEH is injected, a kernel oops is being encountered:
(unreliable)
lpfc_nvme_ls_req+0x328/0x720 [lpfc]
__nvme_fc_send_ls_req.constprop.13+0x1d8/0x3d0 [nvme_fc]
nvme_fc_create_association+0x224/0xd10 [nvme_fc]
nvme_fc_reset_ctrl_work+0x110/0x154 [nvme_fc]
process_one_work+0x304/0x5d
the NBMe transport is issuing a Disconnect LS request, which the driver
receives and tries to post but the work queue used by the driver is already
being torn down by the eeh.
Fix by validating the validity of the work queue before proceeding with the
LS transmit.
Link: https://lore.kernel.org/r/20210127221601.84878-1-jsmart2021@gmail.com
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
With retpolines disabled, some configurations of GCC, and specifically
the GCC versions 9 and 10 in Ubuntu will add Intel CET instrumentation
to the kernel by default. That breaks certain tracing scenarios by
adding a superfluous ENDBR64 instruction before the fentry call, for
functions which can be called indirectly.
CET instrumentation isn't currently necessary in the kernel, as CET is
only supported in user space. Disable it unconditionally and move it
into the x86's Makefile as CET/CFI... enablement should be a per-arch
decision anyway.
[ bp: Massage and extend commit message. ]
Fixes: 29be86d7f9 ("kbuild: add -fcf-protection=none when using retpoline flags")
Reported-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Cc: <stable@vger.kernel.org>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Link: https://lkml.kernel.org/r/20210128215219.6kct3h2eiustncws@treble
To avoid potential compilation problems, replaced the badly written
MB_TO_SECTS() macro (missing parenthesis around the argument use) with
the inline function mb_to_sects(). And while at it, simplify the
calculation of the total number of zones of the device using the
round_up() macro.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The tegra_uart_config of the DEBUG_LL code is now placed right at the
start of the .text section after commit which enabled debug output in the
decompressor. Tegra devices are not booting anymore if DEBUG_LL is enabled
since tegra_uart_config data is executes as a code. Fix the misplaced
tegra_uart_config storage by embedding it into the code.
Cc: stable@vger.kernel.org
Fixes: 2596a72d33 ("ARM: 9009/1: uncompress: Enable debug in head.S")
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Building with gcc 4.9.2 reveals a latent bug in the PCI accessors
for Footbridge platforms, which causes a fatal alignment fault
while accessing IO memory. Fix this by making the assembly volatile.
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Daniel Borkmann says:
====================
pull-request: bpf 2021-01-29
1) Fix two copy_{from,to}_user() warn_on_once splats for BPF cgroup getsockopt
infra when user space is trying to race against optlen, from Loris Reiff.
2) Fix a missing fput() in BPF inode storage map update helper, from Pan Bian.
3) Fix a build error on unresolved symbols on disabled networking / keys LSM
hooks, from Mikko Ylinen.
4) Fix preload BPF prog build when the output directory from make points to a
relative path, from Quentin Monnet.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, preload: Fix build when $(O) points to a relative path
bpf: Drop disabled LSM hooks from the sleepable set
bpf, inode_storage: Put file handler if no storage was found
bpf, cgroup: Fix problematic bounds check
bpf, cgroup: Fix optlen WARN_ON_ONCE toctou
====================
Link: https://lore.kernel.org/r/20210129001556.6648-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The new mount API requires additional changes to how DFS
is handled. Additional testing of DFS uncovered problems
with domain based DFS referrals (a follow on patch addresses
DFS links) which this patch addresses.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull ecryptfs fix from Tyler Hicks:
"Fix a regression that resulted in two rounds of UID translations when
setting v3 namespaced file capabilities in some configurations"
* tag 'ecryptfs-5.11-rc6-setxattr-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
ecryptfs: fix uid translation for setxattr on security.capability
While we do handle the additional cursor sizes introduced in NVE4, it looks
like we accidentally broke this when converting over to use Nvidia's
display headers. Since we now use NVVAL in dispnv50/head907d.c in order to
format the value for the cursor layout and NVD9 only had one byte reserved
vs. the 2 bytes reserved in later generations, we end up accidentally
stripping the second bit in the cursor layout format parameter - causing us
to set the wrong cursor size.
This fixes that by adding our own curs_set hook for 917d which uses the
NV917D headers.
Cc: Martin Peres <martin.peres@free.fr>
Cc: Jeremy Cline <jcline@redhat.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: <stable@vger.kernel.org> # v5.9+
Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: ed0b86a90b ("drm/nouveau/kms/nv50-: use NVIDIA's headers for core head_curs_set()")
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Fixes a crash when trying to create a channel on e.g. Turing GPUs when
NOUVEAU_SVM_INIT was called before.
Fixes: eeaf06ac1a ("drm/nouveau/svm: initial support for shared virtual memory")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Commit f844eb485e introduced a regression for
NV50, which lead to visual artifacts, tearing and eventual crashes.
In the changes of f844eb485e only the first line
was correctly translated to the new NVIDIA header macros:
- PUSH_NVSQ(push, NV827C, 0x0110, 0,
- 0x0114, 0);
+ PUSH_MTHD(push, NV827C, SET_PROCESSING,
+ NVDEF(NV827C, SET_PROCESSING, USE_GAIN_OFS, DISABLE));
The lower part ("0x0114, 0") was probably omitted by accident.
This patch restores the push of the missing data and fixes the regression.
Signed-off-by: Bastian Beranek <bastian.beischer@rwth-aachen.de>
Fixes: f844eb485e ("drm/nouveau/kms/nv50-: use NVIDIA's headers for wndw image_set()")
Link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/14
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
What 84965ff8a8 ("io_uring: if we see flush on exit, cancel related tasks")
really wants is to cancel all relevant REQ_F_INFLIGHT requests reliably.
That can be achieved by io_uring_cancel_files(), but we'll miss it
calling io_uring_cancel_task_requests(files=NULL) from io_uring_flush(),
because it will go through __io_uring_cancel_task_requests().
Just always call io_uring_cancel_files() during cancel, it's good enough
for now.
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull networking fixes from Jakub Kicinski:
"Networking fixes including fixes from can, xfrm, wireless,
wireless-drivers and netfilter trees. Nothing scary, Intel
WiFi-related fixes seemed most notable to the users.
Current release - regressions:
- dsa: microchip: ksz8795: fix KSZ8794 port map again to program the
CPU port correctly
Current release - new code bugs:
- iwlwifi: pcie: reschedule in long-running memory reads
Previous releases - regressions:
- iwlwifi: dbg: don't try to overwrite read-only FW data
- iwlwifi: provide gso_type to GSO packets
- octeontx2: make sure the buffer is 128 byte aligned
- tcp: make TCP_USER_TIMEOUT accurate for zero window probes
- xfrm: fix wraparound in xfrm_policy_addr_delta()
- xfrm: fix oops in xfrm_replay_advance_bmp due to a race between
CPUs in presence of packet reorder
- tcp: fix TLP timer not set when CA_STATE changes from DISORDER to
OPEN
- wext: fix NULL-ptr-dereference with cfg80211's lack of commit()
Previous releases - always broken:
- igc: fix link speed advertising
- stmmac: configure EHL PSE0 GbE and PSE1 GbE to 32 bits DMA
addressing
- team: protect features update by RCU to avoid deadlock
- xfrm: fix disable_xfrm sysctl when used on xfrm interfaces
themselves
- fec: fix temporary RMII clock reset on link up
- can: dev: prevent potential information leak in can_fill_info()
Misc:
- mrp: fix bad packing of MRP test packet structures
- uapi: fix big endian definition of ipv6_rpl_sr_hdr
- add David Ahern to IPv4/IPv6 maintainers"
* tag 'net-5.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (86 commits)
rxrpc: Fix memory leak in rxrpc_lookup_local
mlxsw: spectrum_span: Do not overwrite policer configuration
selftests: forwarding: Specify interface when invoking mausezahn
stmmac: intel: Configure EHL PSE0 GbE and PSE1 GbE to 32 bits DMA addressing
net: usb: cdc_ether: added support for Thales Cinterion PLSx3 modem family.
ibmvnic: Ensure that CRQ entry read are correctly ordered
MAINTAINERS: add missing header for bonding
net: decnet: fix netdev refcount leaking on error path
net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP
can: dev: prevent potential information leak in can_fill_info()
net: fec: Fix temporary RMII clock reset on link up
net: lapb: Add locking to the lapb module
team: protect features update by RCU to avoid deadlock
MAINTAINERS: add David Ahern to IPv4/IPv6 maintainers
net/mlx5: CT: Fix incorrect removal of tuple_nat_node from nat rhashtable
net/mlx5e: Revert parameters on errors when changing MTU and LRO state without reset
net/mlx5e: Revert parameters on errors when changing trust state without reset
net/mlx5e: Correctly handle changing the number of queues when the interface is down
net/mlx5e: Fix CT rule + encap slow path offload and deletion
net/mlx5e: Disable hw-tc-offload when MLX5_CLS_ACT config is disabled
...
During additional testing of the updated cifs.ko with the
new mount API support, we found a few additional cases where
we were logging errors, but not returning them to the user.
For example:
a) invalid security mechanisms
b) invalid cache options
c) unsupported rdma
d) invalid smb dialect requested
Fixes: 24e0a1eff9 ("cifs: switch to new mount api")
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Ido Schimmel says:
====================
mlxsw: Various fixes
Patch #1 fixes wrong invocation of mausezahn in a couple of selftests.
The tests started failing after Fedora updated their libnet package from
version 1.1.6 to 1.2.1. With the fix the tests pass regardless of libnet
version.
Patch #2 fixes an issue in the mirroring to CPU code that results in
policer configuration being overwritten.
====================
Link: https://lore.kernel.org/r/20210128144820.3280295-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The purpose of the delayed work in the SPAN module is to potentially
update the destination port and various encapsulation parameters of SPAN
agents that point to a VLAN device or a GRE tap. The destination port
can change following the insertion of a new route, for example.
SPAN agents that point to a physical port or the CPU port are static and
never change throughout the lifetime of the SPAN agent. Therefore, skip
over them in the delayed work.
This fixes an issue where the delayed work overwrites the policer
that was set on a SPAN agent pointing to the CPU. Modifying the delayed
work to inherit the original policer configuration is error-prone, as
the same will be needed for any new parameter.
Fixes: 4039504e6a ("mlxsw: spectrum_span: Allow setting policer on a SPAN agent")
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Specify the interface through which packets should be transmitted so
that the test will pass regardless of the libnet version against which
mausezahn is linked.
Fixes: cab14d1087 ("selftests: Add version of router_multipath.sh using nexthop objects")
Fixes: 3d578d8795 ("selftests: forwarding: Test IPv4 weighted nexthops")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull locking fixes from Thomas Gleixner:
"A set of PI futex fixes:
- Address a longstanding issue where the user space part of the PI
futex is not writeable. The kernel returns with inconsistent state
which can in the worst case result in a UAF of a tasks kernel
stack.
The solution is to establish consistent kernel state which makes
future operations on the futex fail because user space and kernel
space state are inconsistent. Not a problem as PI futexes
fundamentaly require a functional RW mapping and if user space
pulls the rug under it, then it can keep the pieces it asked for.
- Address an issue where the return value is incorrect in case that
the futex was acquired after a timeout/signal made the waiter drop
out of the rtmutex wait.
In one of the corner cases the kernel returned an error code
despite having successfully acquired the futex"
* tag 'locking-urgent-2021-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Handle faults correctly for PI futexes
futex: Simplify fixup_pi_state_owner()
futex: Use pi_state_update_owner() in put_pi_state()
rtmutex: Remove unused argument from rt_mutex_proxy_unlock()
futex: Provide and use pi_state_update_owner()
futex: Replace pointless printk in fixup_owner()
futex: Ensure the correct return value from futex_lock_pi()
Pull NVMe fixes from Christoph:
"nvme fixes for 5.11:
- add another Write Zeroes quirk (Chaitanya Kulkarni)
- handle a no path available corner case (Daniel Wagner)
- use the proper RCU aware list_add helper (Chao Leng)"
* tag 'nvme-5.11-2021-01-28' of git://git.infradead.org/nvme:
nvme-core: use list_add_tail_rcu instead of list_add_tail for nvme_init_ns_head
nvme-multipath: Early exit if no path is available
nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a SPCC device
WARNING: CPU: 0 PID: 21359 at fs/io_uring.c:9042
io_uring_cancel_task_requests+0xe55/0x10c0 fs/io_uring.c:9042
Call Trace:
io_uring_flush+0x47b/0x6e0 fs/io_uring.c:9227
filp_close+0xb4/0x170 fs/open.c:1295
close_files fs/file.c:403 [inline]
put_files_struct fs/file.c:418 [inline]
put_files_struct+0x1cc/0x350 fs/file.c:415
exit_files+0x7e/0xa0 fs/file.c:435
do_exit+0xc22/0x2ae0 kernel/exit.c:820
do_group_exit+0x125/0x310 kernel/exit.c:922
get_signal+0x427/0x20f0 kernel/signal.c:2773
arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811
handle_signal_work kernel/entry/common.c:147 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201
__syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Now io_uring_cancel_task_requests() can be called not through file
notes but directly, remove a WARN_ONCE() there that give us false
positives. That check is not very important and we catch it in other
places.
Fixes: 84965ff8a8 ("io_uring: if we see flush on exit, cancel related tasks")
Cc: stable@vger.kernel.org # 5.9+
Reported-by: syzbot+3e3d9bd0c6ce9efbc3ef@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
clang warns about the -mhard-float command line arguments
on architectures that do not support this:
clang: error: argument unused during compilation: '-mhard-float' [-Werror,-Wunused-command-line-argument]
Move this into the gcc-specific arguments.
Fixes: e77165bf7b ("drm/amd/display: Add DCN3 blocks to Makefile")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The "list" of nvme_ns_head is used as rcu list, now in nvme_init_ns_head
list_add_tail is used to add ns->siblings to the rcu list. It is not safe.
Should use list_add_tail_rcu instead of list_add_tail.
Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
nvme_round_robin_path() should test if the return ns pointer is valid.
nvme_next_ns() will return a NULL pointer if there is no path left.
Fixes: 75c10e7327 ("nvme-multipath: round-robin I/O policy")
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull xen fixes from Juergen Gross:
- A fix for a regression introduced in 5.11 resulting in Xen dom0
having problems to correctly initialize Xenstore.
- A fix for avoiding WARN splats when booting as Xen dom0 with
CONFIG_AMD_MEM_ENCRYPT enabled due to a missing trap handler for the
#VC exception (even if the handler should never be called).
- A fix for the Xen bklfront driver adapting to the correct but
unexpected behavior of new qemu.
* tag 'for-linus-5.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: avoid warning in Xen pv guest with CONFIG_AMD_MEM_ENCRYPT enabled
xen: Fix XenStore initialisation for XS_LOCAL
xen-blkfront: allow discard-* nodes to be optional
Pull ia64 fixes from Arnd Bergmann:
"asm-generic/ia64 fixes, and mark as orphaned
Commit 2b49ddcef2 ("ia64: convert to legacy_timer_tick") from my
timer series I merged through the asm-generic tree caused a regression
on all ia64 machines, as bisected by Adrian Glaubitz.
Tony Luck is no longer really working on ia64, so instead of merging
the fix through his tree, we ended up deciding that I'd merge the fix
myself along a patch to mark the architecture as Orphaned and a
compile time warning fix I made while working on the regression"
[ HPE no longer accepts orders for new Itanium hardware, and Intel
stopped accepting orders a year ago. While intel is still officially
shipping chips until July 29, 2021, it's unlikely that any such orders
actually exist.
It's dead, Jim.
- Linus ]
* tag 'asm-generic-fixes-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
ia64: Mark architecture as orphaned
ia64: fix xchg() warning
ia64: fix timer cleanup regression
Pull ARM SoC fixes from Arnd Bergmann:
"These are the current arm-soc bug fixes for linux-5.11. I already
merged a larger set that just came in during the past three days but
has not had much exposure in linux-next, but this is the subset I
merged last week.
Most of these are for the NXP i.MX platform (descriptions from their
pull request):
- Fix pcf2127 reset for imx7d-flex-concentrator board.
- Fix i.MX6 suspend with Thumb-2 kernel.
- Fix ethernet-phy address issue on imx6qdl-sr-som board.
- Fix GPIO3 `gpio-ranges` on i.MX8MP.
- Select SOC_BUS for IMX_SCU driver to fix build issue.
- Fix backlight pwm on imx6qdl-kontron-samx6i which is lost from
#pwm-cells conversion.
- Fix duplicated bus node name for i.MX8MN SoC.
- Fix reset register offset on LS1028A SoC.
- Rename MMC node aliases for imx6q-tbs2910 to keep the MMC device
index consistent with previous kernel version.
- Selecting ARM_GIC_V3 on non-CP15 processors to fix one build
failure with i.MX8M SoC driver.
- Fix typos with status property on imx6qdl-kontron-samx6i board.
- Fix duplicated regulator-name on imx6qdl-gw52xx board.
Aside from i.MX, the bugfixes are all over the place:
- Coccinelle found a refcount imbalance on integrator
- defconfig fix for TI K3
- A boot regression fix for ST ux500
- A code preemption fix for the optee driver
- USB DMA regression on Broadcom Stingray
- A bogus boot time warning fix for at91 code"
* tag 'arm-soc-fixes-v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
MAINTAINERS: Include bcm2835 subsequents into search
arm64: dts: broadcom: Fix USB DMA address translation for Stingray
drivers: soc: atmel: add null entry at the end of at91_soc_allowed_list[]
drivers: soc: atmel: Avoid calling at91_soc_init on non AT91 SoCs
tee: optee: replace might_sleep with cond_resched
firmware: imx: select SOC_BUS to fix firmware build
arm64: dts: imx8mp: Correct the gpio ranges of gpio3
ARM: dts: imx6qdl-sr-som: fix some cubox-i platforms
ARM: imx: build suspend-imx6.S with arm instruction set
ARM: dts: imx7d-flex-concentrator: fix pcf2127 reset
ARM: dts: ux500: Reserve memory carveouts
arm64: defconfig: Drop unused K3 SoC specific options
bus: arm-integrator-lm: Add of_node_put() before return statement
ARM: dts: imx6qdl-gw52xx: fix duplicate regulator naming
ARM: dts: imx6qdl-kontron-samx6i: fix i2c_lcd/cam default status
ARM: imx: fix imx8m dependencies
ARM: dts: tbs2910: rename MMC node aliases
arm64: dts: ls1028a: fix the offset of the reset register
arm64: dts: imx8mn: Fix duplicate node name
ARM: dts: imx6qdl-kontron-samx6i: fix pwms for lcd-backlight
Pull rdma fixes from Jason Gunthorpe:
"Several recent regressions and some bug fixes:
- Typo corrupting the max_recv_sge for cxgb4
- Regression from re-using kernel enums as a HW AbI in vmw_pvrdma
- Sleeping inside a spinlock in hns
- Revert the attempt to fix devlink deadlocks as the fix is more buggy
- Typo in sysfs_emit_at conversions
- Revert the removal of VLAN support in rxe"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
Revert "RDMA/rxe: Remove VLAN code leftovers from RXE"
RDMA/usnic: Fix misuse of sysfs_emit_at
Revert "RDMA/mlx5: Fix devlink deadlock on net namespace deletion"
RDMA/hns: Use mutex instead of spinlock for ida allocation
RDMA/vmw_pvrdma: Fix network_hdr_type reported in WC
RDMA/cxgb4: Fix the reported max_recv_sge value
The "prefixpath" mount option needs to be ignored
which was missed in the recent conversion to the
new mount API (prefixpath would be set by the mount
helper if mounting a subdirectory of the root of a
share e.g. //server/share/subdir)
Fixes: 24e0a1eff9 ("cifs: switch to new mount api")
Suggested-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Pull media fixes from Mauro Carvalho Chehab:
- a V4L2 core regression at videobuf2 when checking for single-plane
dmabuf
- a change at uAPI header v4l2-subdev.h, fixing a breakage as BIT()
macro is not available in userspace
- fix some regressions at RC core due to the usage of microseconds
everywhere on it
- a fix for a race condition at RC core
- a rename on a newly-introduced kAPI symbol (v4l2_get_link_rate),
currently used only by a single driver
- Regression fixes for rcar-vin, cedrus, ite-cir, hantro, css, venus,
and cec drivers.
* tag 'media/v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: hantro: Fix reset_raw_fmt initialization
media: cec: add stm32 driver
media: cedrus: Fix H264 decoding
media: v4l2-subdev.h: BIT() is not available in userspace
media: Revert "media: videobuf2: Fix length check for single plane dmabuf queueing"
media: rc: ite-cir: fix min_timeout calculation
media: venus: core: Fix platform driver shutdown
media: rc: fix timeout handling after switch to microsecond durations
media: v4l: common: Fix naming of v4l2_get_link_rate
media: rcar-vin: fix return, use ret instead of zero
media: ccs: Get static data version minor correctly
media: ccs-pll: Fix link frequency for C-PHY
media: rc: ensure that uevent can be read directly after rc device register
Pull clk fixes from Stephen Boyd:
"A handful of clk driver fixes:
- Build fix for CONFIG_PM=n in the mmp2 driver
- Kconfig warning for unmet dependencies in the i.MX driver
- Make the camera AHB clk always be enabled on qcom sc7180
- Use rate round down semantics for qcom sm8250 SD clks"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: mmp2: fix build without CONFIG_PM
clk: qcom: gcc-sm250: Use floor ops for sdcc clks
clk: imx: fix Kconfig warning for i.MX SCU clk
clk: qcom: gcc-sc7180: Mark the camera abh clock always ON
Pull sound fixes from Takashi Iwai:
"Although the incoming fixes haven't settled down yet, all changes here
are small and mostly device-specific fixes, so nothing look worrisome.
- Yet another USB-audio regression fixes
- HD-audio ID fix and device-specific quirks
- SOF Intel / SoundWire fixes including topology
- ASoC Qualcomm and Mediatek fixes"
* tag 'sound-5.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
ALSA: hda/via: Apply the workaround generically for Clevo machines
ASoC: Intel: sof_sdw: set proper flags for Dell TGL-H SKU 0A5E
ASoC: qcom: lpass: Fix out-of-bounds DAI ID lookup
ASoC: mediatek: mt8192-mt6359: add format constraints for RT5682
ASoC: ak4458: correct reset polarity
ASoC: SOF: SND_INTEL_DSP_CONFIG dependency
ASoC: SOF: Intel: soundwire: fix select/depend unmet dependencies
ALSA: hda: intel-dsp-config: add PCI id for TGL-H
ALSA: usb-audio: workaround for iface reset issue
ALSA: pcm: One more dependency for hw constraints
ALSA: hda/realtek: Enable headset of ASUS B1400CEPE with ALC256
ASoC: Intel: Skylake: Zero snd_ctl_elem_value
ASoC: Intel: Skylake: skl-topology: Fix OOPs ib skl_tplg_complete
ASoC: qcom: Fix number of HDMI RDMA channels on sc7180
ASoC: mediatek: mt8183-da7219: ignore TDM DAI link by default
ASoC: mediatek: mt8183-mt6358: ignore TDM DAI link by default
ASoC: topology: Properly unregister DAI on removal
ASoC: topology: Fix memory corruption in soc_tplg_denum_create_values()
ASoC: qcom: lpass-ipq806x: fix bitwidth regmap field
ASoC: AMD Renoir - refine DMI entries for some Lenovo products
...
This reverts commit dde3c6b72a.
syzbot report a double-free bug. The following case can cause this bug.
- mm/slab_common.c: create_cache(): if the __kmem_cache_create() fails,
it does:
out_free_cache:
kmem_cache_free(kmem_cache, s);
- but __kmem_cache_create() - at least for slub() - will have done
sysfs_slab_add(s)
-> sysfs_create_group() .. fails ..
-> kobject_del(&s->kobj); .. which frees s ...
We can't remove the kmem_cache_free() in create_cache(), because other
error cases of __kmem_cache_create() do not free this.
So, revert the commit dde3c6b72a ("mm/slub: fix a memory leak in
sysfs_slab_add()") to fix this.
Reported-by: syzbot+d0bd96b4696c1ef67991@syzkaller.appspotmail.com
Fixes: dde3c6b72a ("mm/slub: fix a memory leak in sysfs_slab_add()")
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In 24e0a1eff9, the noauto and auto options were missed when migrating
to the new mount API. As a result, users with noauto in their fstab
mount options are now unable to mount cifs filesystems, as they'll
receive an "Unknown parameter" error.
This restores the old behaviour of ignoring noauto and auto if they're
given.
Fixes: 24e0a1eff9 ("cifs: switch to new mount api")
Signed-off-by: Adam Harvey <adam@adamharvey.name>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
For super block version < BCACHE_SB_VERSION_CDEV_WITH_FEATURES, it
doesn't make sense to check the feature sets. This patch checks
super block version in bch_has_feature_* routines, if the version
doesn't have feature sets yet, returns 0 (false) to the caller.
Fixes: 5342fd4255 ("bcache: set bcache device into read-only mode for BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET")
Fixes: ffa4703275 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Cc: stable@vger.kernel.org # 5.9+
Reported-and-tested-by: Bockholdt Arne <a.bockholdt@precitec-optronik.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Some block device drivers, e.g. the skd driver, call set_capacity() with
IRQ disabled. This results in lockdep ito complain about inconsistent
lock states ("inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage")
because set_capacity takes a block device bd_size_lock using the
functions spin_lock() and spin_unlock(). Ensure a consistent locking
state by replacing these calls with spin_lock_irqsave() and
spin_lock_irqrestore(). The same applies to bdev_set_nr_sectors().
With this fix, all lockdep complaints are resolved.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
On !PREEMPT kernel, we can get below softlockup when doing stress
testing with creating and destroying block cgroup repeatly. The
reason is it may take a long time to acquire the queue's lock in
the loop of blkcg_destroy_blkgs(), or the system can accumulate a
huge number of blkgs in pathological cases. We can add a need_resched()
check on each loop and release locks and do cond_resched() if true
to avoid this issue, since the blkcg_destroy_blkgs() is not called
from atomic contexts.
[ 4757.010308] watchdog: BUG: soft lockup - CPU#11 stuck for 94s!
[ 4757.010698] Call trace:
[ 4757.010700] blkcg_destroy_blkgs+0x68/0x150
[ 4757.010701] cgwb_release_workfn+0x104/0x158
[ 4757.010702] process_one_work+0x1bc/0x3f0
[ 4757.010704] worker_thread+0x164/0x468
[ 4757.010705] kthread+0x108/0x138
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When an Intel IOMMU is virtualized, and a physical device is
passed-through to the VM, changes of the virtual IOMMU need to be
propagated to the physical IOMMU. The hypervisor therefore needs to
monitor PTE mappings in the IOMMU page-tables. Intel specifications
provide "caching-mode" capability that a virtual IOMMU uses to report
that the IOMMU is virtualized and a TLB flush is needed after mapping to
allow the hypervisor to propagate virtual IOMMU mappings to the physical
IOMMU. To the best of my knowledge no real physical IOMMU reports
"caching-mode" as turned on.
Synchronizing the virtual and the physical IOMMU tables is expensive if
the hypervisor is unaware which PTEs have changed, as the hypervisor is
required to walk all the virtualized tables and look for changes.
Consequently, domain flushes are much more expensive than page-specific
flushes on virtualized IOMMUs with passthrough devices. The kernel
therefore exploited the "caching-mode" indication to avoid domain
flushing and use page-specific flushing in virtualized environments. See
commit 78d5f0f500 ("intel-iommu: Avoid global flushes with caching
mode.")
This behavior changed after commit 13cf017446 ("iommu/vt-d: Make use
of iova deferred flushing"). Now, when batched TLB flushing is used (the
default), full TLB domain flushes are performed frequently, requiring
the hypervisor to perform expensive synchronization between the virtual
TLB and the physical one.
Getting batched TLB flushes to use page-specific invalidations again in
such circumstances is not easy, since the TLB invalidation scheme
assumes that "full" domain TLB flushes are performed for scalability.
Disable batched TLB flushes when caching-mode is on, as the performance
benefit from using batched TLB invalidations is likely to be much
smaller than the overhead of the virtual-to-physical IOMMU page-tables
synchronization.
Fixes: 13cf017446 ("iommu/vt-d: Make use of iova deferred flushing")
Signed-off-by: Nadav Amit <namit@vmware.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: stable@vger.kernel.org
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210127175317.1600473-1-namit@vmware.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Wrong irq number on px30 and cleanups of stuff on rk3399 regarding
wrongly used dt properties and parts that shouldn't be enabled.
* tag 'v5.11-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: Disable display for NanoPi R2S
arm64: dts: rockchip: remove interrupt-names property from rk3399 vdec node
arm64: dts: rockchip: Fix PCIe DT properties on rk3399
arm64: dts: rockchip: Use only supported PCIe link speed on Pinebook Pro
arm64: dts: rockchip: fix vopl iommu irq on px30
Link: https://lore.kernel.org/r/5429065.DvuYhMxLoT@phil
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes for omaps for v5.11-rc cycle
The recent changes to switch SoCs to boot with ti-sysc interconnect
target module driver and genpd caused few regressions:
- The omap_prm reset driver needs to clear any reset bits deasserted by
the bootloader or kexec boot for the three reset bit cases. Otherwise
we can have an oops with accelerators starting to boot with potentially
unconfigured MMU for example
- Custom kernel configs are not automatically selecting simple-pm-bus
driver that we now need to probe interconnects so we need to select it
always
- We are not passing legacy platform data in auxdata with simple-pm-bus
like we do for simple-bus. We need to pass auxdata to simple-pm-bus so
it can pass it to of_platform_populate()
Then recent RCU changes started causing splats for cpuidle44xx that now
need RCU_NONIDLE added to the calls in several places
And then we have few device specific fixes:
- We need to remove legacy spi-cs-hig for gta04 display to work, and
set the gpio to active low
- Omap1 specific ohci-omap needs to call gpio_free()
- Droid4 needs to use padconf interrupt for the slider as the edge
gpio interrupts may be lost for deeper idle states
* tag 'omap-for-v5.11/fixes-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: omap4-droid4: Fix lost keypad slide interrupts for droid4
drivers: bus: simple-pm-bus: Fix compatibility with simple-bus for auxdata
ARM: OMAP2+: Fix booting for am335x after moving to simple-pm-bus
ARM: OMAP2+: Fix suspcious RCU usage splats for omap_enter_idle_coupled
ARM: dts; gta04: SPI panel chip select is active low
soc: ti: omap-prm: Fix boot time errors for rst_map_012 bits 0 and 1
ARM: OMAP1: OSK: fix ohci-omap breakage
DTS: ARM: gta04: remove legacy spi-cs-high to make display work again
Link: https://lore.kernel.org/r/pull-1611818709-243493@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The move of TIF_SYSCALL_EMU to SYSCALL_WORK_SYSCALL_EMU broke single step
reporting. The original code reported the single step when TIF_SINGLESTEP
was set and TIF_SYSCALL_EMU was not set. The SYSCALL_WORK conversion got
the logic wrong and now the reporting only happens when both bits are set.
Restore the original behaviour.
[ tglx: Massaged changelog and dropped the pointless double negation ]
Fixes: 64eb35f701 ("ptrace: Migrate TIF_SYSCALL_EMU to use SYSCALL_WORK flag")
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Link: https://lore.kernel.org/r/877do3gaq9.fsf@m5Zedd9JOGzJrf0
When we walk up the device hierarchy in tb_acpi_add_link() make sure we
break the loop if the device has no parent. Otherwise we may crash the
kernel by dereferencing a NULL pointer.
Fixes: b2be2b05cf ("thunderbolt: Create device links from ACPI description")
Cc: stable@vger.kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Acked-by: Yehezkel Bernat <YehezkelShB@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
An incorrect address mask is being used in the qi_flush_dev_iotlb_pasid()
to check the address alignment. This leads to a lot of spurious kernel
warnings:
[ 485.837093] DMAR: Invalidate non-aligned address 7f76f47f9000, order 0
[ 485.837098] DMAR: Invalidate non-aligned address 7f76f47f9000, order 0
[ 492.494145] qi_flush_dev_iotlb_pasid: 5734 callbacks suppressed
[ 492.494147] DMAR: Invalidate non-aligned address 7f7728800000, order 11
[ 492.508965] DMAR: Invalidate non-aligned address 7f7728800000, order 11
Fix it by checking the alignment in right way.
Fixes: 288d08e780 ("iommu/vt-d: Handle non-page aligned address")
Reported-and-tested-by: Guo Kaijie <Kaijie.Guo@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20210119043500.1539596-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
IOMMU Extended Feature Register (EFR) is used to communicate
the supported features for each IOMMU to the IOMMU driver.
This is normally read from the PCI MMIO register offset 0x30,
and used by the iommu_feature() helper function.
However, there are certain scenarios where the information is needed
prior to PCI initialization, and the iommu_feature() function is used
prematurely w/o warning. This has caused incorrect initialization of IOMMU.
This is the case for the commit 6d39bdee23 ("iommu/amd: Enforce 4k
mapping for certain IOMMU data structures")
Since, the EFR is also available in the IVHD header, and is available to
the driver prior to PCI initialization. Therefore, default to using
the IVHD EFR instead.
Fixes: 6d39bdee23 ("iommu/amd: Enforce 4k mapping for certain IOMMU data structures")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/20210120135002.2682-1-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
drm hotplug handling code (drm_client_dev_hotplug()) can wait on mutex,
thus delaying further lt9611uxc IRQ events processing. It was observed
occasionally during bootups, when drm_client_modeset_probe() was waiting
for EDID ready event, which was delayed because IRQ handler was stuck
trying to deliver hotplug event.
Move hotplug notifications from IRQ handler to separate work to be able
to process IRQ events without delays.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Fixes: 0cbbd5b1a0 ("drm: bridge: add support for lontium LT9611UXC bridge")
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210121233303.1221784-4-dmitry.baryshkov@linaro.org
Later variants of the rkisp1 block use more entries in some arrays:
RKISP1_CIF_ISP_AE_MEAN_MAX 25 -> 81
RKISP1_CIF_ISP_HIST_BIN_N_MAX 16 -> 32
RKISP1_CIF_ISP_GAMMA_OUT_MAX_SAMPLES 17 -> 34
RKISP1_CIF_ISP_HISTOGRAM_WEIGHT_GRIDS_SIZE 25 -> 81
and we can still extend the uapi during the 5.11-rc cycle, so do that
now to be on the safe side.
V10 and V11 only need the smaller sizes, while V12 and V13 needed
the larger sizes.
When adding the bigger sizes make sure, values filled from hardware
values and transmitted to userspace don't leak kernel data by zeroing
them beforehand.
Signed-off-by: Heiko Stuebner <heiko.stuebner@theobroma-systems.com>
Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Reviewed-by: Ezequiel Garcia <ezequiel@collabora.com>
Acked-by: Helen Koike <helen.koike@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
The IP block evolved from its rk3288/rk3399 base and the vendor
designates them with a numerical version. rk3399 for example
is designated V10 probably meaning V1.0.
There doesn't seem to be an actual version register we could read that
information from, so allow the match_data to carry that information
for future differentiation.
Also carry that information in the hw_revision field of the media-
controller API, so that userspace also has access to that.
The added versions are:
- V10: at least rk3288 + rk3399
- V11: seemingly unused as of now, but probably appeared in some soc
- V12: at least rk3326 + px30
- V13: at least rk1808
[fix checkpatch warning don't use multiple blank lines]
Signed-off-by: Heiko Stuebner <heiko.stuebner@theobroma-systems.com>
Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Reviewed-by: Ezequiel Garcia <ezequiel@collabora.com>
Acked-by: Helen Koike <helen.koike@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Each entry in the array is a 20 bits value composed of 16 bits unsigned
integer and 4 bits fractional part. So the type should change to __u32.
In addition add a documentation of how the measurements are done.
Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Acked-by: Helen Koike <helen.koike@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
The DE2 display engine hardware takes physical addresses that do not
need PHYS_BASE subtracted. As a result, they should not be present
on the mbus driver match list. Remove them.
This was tested on the A83T, along with the patch allowing the DMA
range map to be non-NULL and restores a working display.
Fixes: b4bdc4fbf8 ("soc: sunxi: Deal with the MBUS DMA offsets in a central place")
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20210115175831.1184260-2-paul.kocialkowski@bootlin.com
Some i2c device driver indirectly uses I2C driver when it is now
being suspended. The i2c devices driver is suspended during the
NOIRQ phase and this cannot be changed due to other dependencies.
Therefore, we also need to move the suspend handling for the I2C
controller driver to the NOIRQ phase as well.
Signed-off-by: Qii Wang <qii.wang@mediatek.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Overlayfs's volatile option allows the user to bypass all forced sync calls
to the upperdir filesystem. This comes at the cost of safety. We can never
ensure that the user's data is intact, but we can make a best effort to
expose whether or not the data is likely to be in a bad state.
The best way to handle this in the time being is that if an overlayfs's
upperdir experiences an error after a volatile mount occurs, that error
will be returned on fsync, fdatasync, sync, and syncfs. This is
contradictory to the traditional behaviour of VFS which fails the call
once, and only raises an error if a subsequent fsync error has occurred,
and been raised by the filesystem.
One awkward aspect of the patch is that we have to manually set the
superblock's errseq_t after the sync_fs callback as opposed to just
returning an error from syncfs. This is because the call chain looks
something like this:
sys_syncfs ->
sync_filesystem ->
__sync_filesystem ->
/* The return value is ignored here
sb->s_op->sync_fs(sb)
_sync_blockdev
/* Where the VFS fetches the error to raise to userspace */
errseq_check_and_advance
Because of this we call errseq_set every time the sync_fs callback occurs.
Due to the nature of this seen / unseen dichotomy, if the upperdir is an
inconsistent state at the initial mount time, overlayfs will refuse to
mount, as overlayfs cannot get a snapshot of the upperdir's errseq that
will increment on error until the user calls syncfs.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Fixes: c86243b090 ("ovl: provide a mount option "volatile"")
Cc: stable@vger.kernel.org
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
When inode has no listxattr op of its own (e.g. squashfs) vfs_listxattr
calls the LSM inode_listsecurity hooks to list the xattrs that LSMs will
intercept in inode_getxattr hooks.
When selinux LSM is installed but not initialized, it will list the
security.selinux xattr in inode_listsecurity, but will not intercept it
in inode_getxattr. This results in -ENODATA for a getxattr call for an
xattr returned by listxattr.
This situation was manifested as overlayfs failure to copy up lower
files from squashfs when selinux is built-in but not initialized,
because ovl_copy_xattr() iterates the lower inode xattrs by
vfs_listxattr() and vfs_getxattr().
ovl_copy_xattr() skips copy up of security labels that are indentified by
inode_copy_up_xattr LSM hooks, but it does that after vfs_getxattr().
Since we are not going to copy them, skip vfs_getxattr() of the security
labels.
Reported-by: Michael Labriola <michael.d.labriola@gmail.com>
Tested-by: Michael Labriola <michael.d.labriola@gmail.com>
Link: https://lore.kernel.org/linux-unionfs/2nv9d47zt7.fsf@aldarion.sourceruckus.org/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The function ovl_dir_real_file() currently uses the inode lock to serialize
writes to the od->upperfile field.
However, this function will get called by ovl_ioctl_set_flags(), which
utilizes the inode lock too. In this case ovl_dir_real_file() will try to
claim a lock that is owned by a function in its call stack, which won't get
released before ovl_dir_real_file() returns.
Fix by replacing the open coded compare and exchange by an explicit atomic
op.
Fixes: 61536bed21 ("ovl: support [S|G]ETFLAGS and FS[S|G]ETXATTR ioctls for directories")
Cc: stable@vger.kernel.org # v5.10
Reported-by: Icenowy Zheng <icenowy@aosc.io>
Tested-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
If a capability is stored on disk in v2 format cap_inode_getsecurity() will
currently return in v2 format unconditionally.
This is wrong: v2 cap should be equivalent to a v3 cap with zero rootid,
and so the same conversions performed on it.
If the rootid cannot be mapped, v3 is returned unconverted. Fix this so
that both v2 and v3 return -EOVERFLOW if the rootid (or the owner of the fs
user namespace in case of v2) cannot be mapped into the current user
namespace.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
The vfs_getxattr() in ovl_xattr_set() is used to check whether an xattr
exist on a lower layer file that is to be removed. If the xattr does not
exist, then no need to copy up the file.
This call of vfs_getxattr() wasn't wrapped in credential override, and this
is probably okay. But for consitency wrap this instance as well.
Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Currently there's no way to create an overlay filesystem outside of the
current user namespace. Make sure that if this assumption changes it
doesn't go unnoticed.
Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The current test module cannot be used for testing platforms (make check)
that do not have support for NFIT. In order to get the ndctl tests working,
we need a module which can emulate NVDIMM devices without relying on
ACPI/NFIT.
The aim of this proposed module is to implement a similar functionality to
the existing module but without the ACPI dependencies.
This RFC series is split into reviewable and compilable chunks.
This patch adds a new driver and registers two nvdimm bus needed for ndctl
make check.
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
Link: https://lore.kernel.org/r/20201222042240.2983755-2-santosh@fossix.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Recent commit 255cbecfe0 modified struct kvm_vcpu_arch to make
'cpuid_entries' a pointer to an array of kvm_cpuid_entry2 entries
rather than embedding the array in the struct. KVM_SET_CPUID and
KVM_SET_CPUID2 were updated accordingly, but KVM_GET_CPUID2 was missed.
As a result, KVM_GET_CPUID2 currently returns random fields from struct
kvm_vcpu_arch to userspace rather than the expected CPUID values. Fix
this by treating 'cpuid_entries' as a pointer when copying its
contents to userspace buffer.
Fixes: 255cbecfe0 ("KVM: x86: allocate vcpu->arch.cpuid_entries dynamically")
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Michael Roth <michael.roth@amd.com.com>
Message-Id: <20210128024451.1816770-1-michael.roth@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When CONFIG_ATH9K is built-in but LED support is in a loadable
module, both ath9k drivers fails to link:
x86_64-linux-ld: drivers/net/wireless/ath/ath9k/gpio.o: in function `ath_deinit_leds':
gpio.c:(.text+0x36): undefined reference to `led_classdev_unregister'
x86_64-linux-ld: drivers/net/wireless/ath/ath9k/gpio.o: in function `ath_init_leds':
gpio.c:(.text+0x179): undefined reference to `led_classdev_register_ext'
The problem is that the 'imply' keyword does not enforce any dependency
but is only a weak hint to Kconfig to enable another symbol from a
defconfig file.
Change imply to a 'depends on LEDS_CLASS' that prevents the incorrect
configuration but still allows building the driver without LED support.
The 'select MAC80211_LEDS' is now ensures that the LED support is
actually used if it is present, and the added Kconfig dependency
on MAC80211_LEDS ensures that it cannot be enabled manually when it
has no effect.
Fixes: 197f466e93 ("ath9k_htc: Do not select MAC80211_LEDS by default")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210125113654.2408057-1-arnd@kernel.org
This reverts commit 327953e9af.
You cannot use 'boolean' since commit b92d804a51 ("kconfig: drop
'boolean' keyword").
This check is no longer needed.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Joe Perches <joe@perches.com>
Otherwise build fails if the headers are not in the default location. While at
it also ask pkg-config for the libs, with fallback to the existing value.
Signed-off-by: Rolf Eike Beer <eb@emlix.com>
Cc: stable@vger.kernel.org # 5.6.x
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Saeed Mahameed says:
====================
mlx5 fixes 2021-01-26
This series introduces some fixes to mlx5 driver.
* tag 'mlx5-fixes-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: CT: Fix incorrect removal of tuple_nat_node from nat rhashtable
net/mlx5e: Revert parameters on errors when changing MTU and LRO state without reset
net/mlx5e: Revert parameters on errors when changing trust state without reset
net/mlx5e: Correctly handle changing the number of queues when the interface is down
net/mlx5e: Fix CT rule + encap slow path offload and deletion
net/mlx5e: Disable hw-tc-offload when MLX5_CLS_ACT config is disabled
net/mlx5: Maintain separate page trees for ECPF and PF functions
net/mlx5e: Fix IPSEC stats
net/mlx5e: Reduce tc unsupported key print level
net/mlx5e: free page before return
net/mlx5e: E-switch, Fix rate calculation for overflow
net/mlx5: Fix memory leak on flow table creation error flow
====================
Link: https://lore.kernel.org/r/20210126234345.202096-1-saeedm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ensure that received Command-Response Queue (CRQ) entries are
properly read in order by the driver. dma_rmb barrier has
been added before accessing the CRQ descriptor to ensure
the entire descriptor is read before processing.
Fixes: 032c5e8284 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/20210128013442.88319-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Honor stateful expressions defined in the set from the dynset
extension. The set definition provides a stateful expression
that must be used by the dynset expression in case it is specified.
2) Missing timeout extension in the set element in the dynset
extension leads to inconsistent ruleset listing, not allowing
the user to restore timeout and expiration on ruleset reload.
3) Do not dump the stateful expression from the dynset extension
if it coming from the set definition.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
netfilter: nft_dynset: dump expressions when set definition contains no expressions
netfilter: nft_dynset: add timeout extension to template
netfilter: nft_dynset: honor stateful expressions in set definition
====================
Link: https://lore.kernel.org/r/20210127132512.5472-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marc Kleine-Budde says:
====================
pull-request: can 2021-01-27
The patch is by Dan Carpenter and fixes a potential information leak in
can_fill_info().
* tag 'linux-can-fixes-for-5.11-20210127' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: dev: prevent potential information leak in can_fill_info()
====================
Link: https://lore.kernel.org/r/20210127094028.2778793-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Anthony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-01-26
This series contains updates to the ice, i40e, and igc driver.
Henry corrects setting an unspecified protocol to IPPROTO_NONE instead of
0 for IPv6 flexbytes filters for ice.
Nick fixes the IPv6 extension header being processed incorrectly and
updates the netdev->dev_addr if it exists in hardware as it may have been
modified outside the ice driver.
Brett ensures a user cannot request more channels than available LAN MSI-X
and fixes the minimum allocation logic as it was incorrectly trying to use
more MSI-X than allocated for ice.
Stefan Assmann minimizes the delay between getting and using the VSI
pointer to prevent a possible crash for i40e.
Corinna Vinschen fixes link speed advertising for igc.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
igc: fix link speed advertising
i40e: acquire VSI pointer only after VF is initialized
ice: Fix MSI-X vector fallback logic
ice: Don't allow more channels than LAN MSI-X available
ice: update dev_addr in ice_set_mac_address even if HW filter exists
ice: Implement flow for IPv6 next header (extension header)
ice: fix FDir IPv6 flexbyte
====================
Link: https://lore.kernel.org/r/20210126221035.658124-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On building the route there is an assumption that the destination
could be local. In this case loopback_dev is used to get the address.
If the address is still cannot be retrieved dn_route_output_slow
returns EADDRNOTAVAIL with loopback_dev reference taken.
Cannot find hash for the fixes tag because this code was introduced
long time ago. I don't think that this bug has ever fired but the
patch is done just to have a consistent code base.
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Link: https://lore.kernel.org/r/1611619334-20955-1-git-send-email-vfedorenko@novek.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It's not true that switchdev_port_obj_notify() only inspects the
->handled field of "struct switchdev_notifier_port_obj_info" if
call_switchdev_blocking_notifiers() returns 0 - there's a WARN_ON()
triggering for a non-zero return combined with ->handled not being
true. But the real problem here is that -EOPNOTSUPP is not being
properly handled.
The wrapper functions switchdev_handle_port_obj_add() et al change a
return value of -EOPNOTSUPP to 0, and the treatment of ->handled in
switchdev_port_obj_notify() seems to be designed to change that back
to -EOPNOTSUPP in case nobody actually acted on the notifier (i.e.,
everybody returned -EOPNOTSUPP).
Currently, as soon as some device down the stack passes the check_cb()
check, ->handled gets set to true, which means that
switchdev_port_obj_notify() cannot actually ever return -EOPNOTSUPP.
This, for example, means that the detection of hardware offload
support in the MRP code is broken: switchdev_port_obj_add() used by
br_mrp_switchdev_send_ring_test() always returns 0, so since the MRP
code thinks the generation of MRP test frames has been offloaded, no
such frames are actually put on the wire. Similarly,
br_mrp_switchdev_set_ring_role() also always returns 0, causing
mrp->ring_role_offloaded to be set to 1.
To fix this, continue to set ->handled true if any callback returns
success or any error distinct from -EOPNOTSUPP. But if all the
callbacks return -EOPNOTSUPP, make sure that ->handled stays false, so
the logic in switchdev_port_obj_notify() can propagate that
information.
Fixes: 9a9f26e8f7 ("bridge: mrp: Connect MRP API with the switchdev API")
Fixes: f30f0601eb ("switchdev: Add helpers to aid traversal through lower devices")
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Link: https://lore.kernel.org/r/20210125124116.102928-1-rasmus.villemoes@prevas.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull parisc fixes from Helge Deller:
"Two small fixes:
- Fix linking error with 64-bit kernel when modules are disabled,
reported by kernel test robot
- Remove leftover reference to power_tasklet, by Davidlohr Bueso"
* 'parisc-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Enable -mlong-calls gcc option by default when !CONFIG_MODULES
parisc: Remove leftover reference to the power_tasklet
For the proper reboot Odroid-C4 board requires to switch TFLASH_VDD_EN
pin to the high impedance mode, otherwise the board is stuck in the
middle of loading early stages of the bootloader from SD card.
This can be achieved by using the OPEN_DRAIN flag instead of the
ACTIVE_HIGH, what will leave the pin in input mode to achieve high state
(pin has the pull-up) and solve the issue.
Suggested-by: Neil Armstrong <narmstrong@baylibre.com>
Fixes: 326e57518b ("arm64: dts: meson-sm1: add support for Hardkernel ODROID-C4")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
Link: https://lore.kernel.org/r/20210122055218.27241-1-m.szyprowski@samsung.com
The cited commit introduced a serious regression with SATA write speed,
as found by bisecting. This patch reverts this commit, which restores
write speed back to the values observed before this commit.
The performance tests were done on a Helios4 NAS (2nd batch) with 4 HDDs
(WD8003FFBX) using dd (bs=1M count=2000). "Direct" is a test with a
single HDD, the rest are different RAID levels built over the first
partitions of 4 HDDs. Test results are in MB/s, R is read, W is write.
| Direct | RAID0 | RAID10 f2 | RAID10 n2 | RAID6
----------------+--------+-------+-----------+-----------+--------
9011495c94 | R:256 | R:313 | R:276 | R:313 | R:323
(before faulty) | W:254 | W:253 | W:195 | W:204 | W:117
----------------+--------+-------+-----------+-----------+--------
5ff9f19231 | R:257 | R:398 | R:312 | R:344 | R:391
(faulty commit) | W:154 | W:122 | W:67.7 | W:66.6 | W:67.2
----------------+--------+-------+-----------+-----------+--------
5.10.10 | R:256 | R:401 | R:312 | R:356 | R:375
unpatched | W:149 | W:123 | W:64 | W:64.1 | W:61.5
----------------+--------+-------+-----------+-----------+--------
5.10.10 | R:255 | R:396 | R:312 | R:340 | R:393
patched | W:247 | W:274 | W:220 | W:225 | W:121
Applying this patch doesn't hurt read performance, while improves the
write speed by 1.5x - 3.5x (more impact on RAID tests). The write speed
is restored back to the state before the faulty commit, and even a bit
higher in RAID tests (which aren't HDD-bound on this device) - that is
likely related to other optimizations done between the faulty commit and
5.10.10 which also improved the read speed.
Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Fixes: 5ff9f19231 ("block: simplify set_init_blocksize")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When set_config changes a line from input to output debounce is
implicitly disabled, as debounce makes no sense for outputs, but the
debounce period is not being cleared and is still reported in the
line info.
So clear the debounce period when the debouncer is stopped in
edge_detector_stop().
Fixes: 65cff70464 ("gpiolib: cdev: support setting debounce")
Cc: stable@vger.kernel.org
Signed-off-by: Kent Gibson <warthog618@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Address issue observed on real world system with suboptimal IORT table
where DMA masks of PCI devices would get set to 0 as result.
iort_dma_setup() would query the root complex'/named component IORT
entry for a DMA mask, and use that over the one the device has been
configured with earlier.
Ideally we want to use the minimum mask of what the IORT contains for
the root complex and what the device was configured with.
Fixes: 5ac65e8c89 ("ACPI/IORT: Support address size limit for root complexes")
Signed-off-by: Moritz Fischer <mdf@kernel.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/20210122012419.95010-1-mdf@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The number reported by the query is N-1 and I think people reading the
sysfs file would expect N instead. For users creating VMs there's no
actual difference because KVM's limit is currently below the UV's
limit.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Fixes: a0f60f8431 ("s390/protvirt: Add sysfs firmware interface for Ultravisor information")
Cc: stable@vger.kernel.org
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The queues assigned to a matrix mediated device are currently reset when:
* The VFIO_DEVICE_RESET ioctl is invoked
* The mdev fd is closed by userspace (QEMU)
* The mdev is removed from sysfs.
Immediately after the reset of a queue, a call is made to disable
interrupts for the queue. This is entirely unnecessary because the reset of
a queue disables interrupts, so this will be removed.
Furthermore, vfio_ap_irq_disable() does an unconditional PQAP/AQIC which
can result in a specification exception (when the corresponding facility
is not available), so this is actually a bugfix.
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
[pasic@linux.ibm.com: minor rework before merging]
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Fixes: ec89b55e3b ("s390: ap: implement PAPQ AQIC interception in kernel")
Cc: <stable@vger.kernel.org>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The vfio_ap device driver registers a group notifier with VFIO when the
file descriptor for a VFIO mediated device for a KVM guest is opened to
receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
event). When the KVM pointer is set, the vfio_ap driver takes the
following actions:
1. Stashes the KVM pointer in the vfio_ap_mdev struct that holds the state
of the mediated device.
2. Calls the kvm_get_kvm() function to increment its reference counter.
3. Sets the function pointer to the function that handles interception of
the instruction that enables/disables interrupt processing.
4. Sets the masks in the KVM guest's CRYCB to pass AP resources through to
the guest.
In order to avoid memory leaks, when the notifier is called to receive
notification that the KVM pointer has been set to NULL, the vfio_ap device
driver should reverse the actions taken when the KVM pointer was set.
Fixes: 258287c994 ("s390: vfio-ap: implement mediated device open callback")
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201223012013.5418-1-akrowiak@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The "bec" struct isn't necessarily always initialized. For example, the
mcp251xfd_get_berr_counter() function doesn't initialize anything if the
interface is down.
Fixes: 52c793f240 ("can: netlink support for bus-error reporting and counters")
Link: https://lore.kernel.org/r/YAkaRdRJncsJO8Ve@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The recent fix for handling the UIP bit unearthed another issue in the RTC
code. If the RTC is advertised but the readout is straight 0xFF because
it's not available, the old code just proceeded with crappy values, but the
new code hangs because it waits for the UIP bit to become low.
Add a sanity check in the RTC CMOS probe function which reads the RTC_VALID
register (Register D) which should have bit 0-6 cleared. If that's not the
case then fail to register the CMOS.
Add the same check to mc146818_get_time(), warn once when the condition
is true and invalidate the rtc_time data.
Reported-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mickaël Salaün <mic@linux.microsoft.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/87tur3fx7w.fsf@nanos.tec.linutronix.de
In commit 3499ba8198 ("xen: Fix event channel callback via INTX/GSI")
I reworked the triggering of xenbus_probe().
I tried to simplify things by taking out the workqueue based startup
triggered from wake_waiting(); the somewhat poorly named xenbus IRQ
handler.
I missed the fact that in the XS_LOCAL case (Dom0 starting its own
xenstored or xenstore-stubdom, which happens after the kernel is booted
completely), that IRQ-based trigger is still actually needed.
So... put it back, except more cleanly. By just spawning a xenbus_probe
thread which waits on xb_waitq and runs the probe the first time it
gets woken, just as the workqueue-based hack did.
This is actually a nicer approach for *all* the back ends with different
interrupt methods, and we can switch them all over to that without the
complex conditions for when to trigger it. But not in -rc6. This is
the minimal fix for the regression, although it's a step in the right
direction instead of doing a partial revert and actually putting the
workqueue back. It's also simpler than the workqueue.
Fixes: 3499ba8198 ("xen: Fix event channel callback via INTX/GSI")
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/4c9af052a6e0f6485d1de43f2c38b1461996db99.camel@infradead.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Laurent Badel says:
====================
net: fec: Fix temporary RMII clock reset on link up
v2: fixed a compilation warning
The FEC drivers performs a "hardware reset" of the MAC module when the
link is reported to be up. This causes a short glitch in the RMII clock
due to the hardware reset clearing the receive control register which
controls the MII mode. It seems that some link partners do not tolerate
this glitch, and invalidate the link, which leads to a never-ending loop
of negotiation-link up-link down events.
This was observed with the iMX28 Soc and LAN8720/LAN8742 PHYs, with two
Intel adapters I218-LM and X722-DA2 as link partners, though a number of
other link partners do not seem to mind the clock glitch. Changing the
hardware reset to a software reset (clearing bit 1 of the ECR register)
cured the issue.
Attempts to optimize fec_restart() in order to minimize the duration of
the glitch were unsuccessful. Furthermore manually producing the glitch by
setting MII mode and then back to RMII in two consecutive instructions,
resulting in a clock glitch <10us in duration, was enough to cause the
partner to invalidate the link. This strongly suggests that the root cause
of the link being dropped is indeed the change in clock frequency.
In an effort to minimize changes to driver, the patch proposes to use
soft reset only for tested SoCs (iMX28) and only if the link is up. This
preserves hardware reset in other situations, which might be required for
proper setup of the MAC.
====================
Link: https://lore.kernel.org/r/20210125100745.5090-1-laurentbadel@eaton.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
fec_restart() does a hard reset of the MAC module when the link status
changes to up. This temporarily resets the R_CNTRL register which controls
the MII mode of the ENET_OUT clock. In the case of RMII, the clock
frequency momentarily drops from 50MHz to 25MHz until the register is
reconfigured. Some link partners do not tolerate this glitch and
invalidate the link causing failure to establish a stable link when using
PHY polling mode. Since as per IEEE802.3 the criteria for link validity
are PHY-specific, what the partner should tolerate cannot be assumed, so
avoid resetting the MII clock by using software reset instead of hardware
reset when the link is up. This is generally relevant only if the SoC
provides the clock to an external PHY and the PHY is configured for RMII.
Signed-off-by: Laurent Badel <laurentbadel@eaton.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the lapb module, the timers may run concurrently with other code in
this module, and there is currently no locking to prevent the code from
racing on "struct lapb_cb". This patch adds locking to prevent racing.
1. Add "spinlock_t lock" to "struct lapb_cb"; Add "spin_lock_bh" and
"spin_unlock_bh" to APIs, timer functions and notifier functions.
2. Add "bool t1timer_stop, t2timer_stop" to "struct lapb_cb" to make us
able to ask running timers to abort; Modify "lapb_stop_t1timer" and
"lapb_stop_t2timer" to make them able to abort running timers;
Modify "lapb_t2timer_expiry" and "lapb_t1timer_expiry" to make them
abort after they are stopped by "lapb_stop_t1timer", "lapb_stop_t2timer",
and "lapb_start_t1timer", "lapb_start_t2timer".
3. Let lapb_unregister wait for other API functions and running timers
to stop.
4. The lapb_device_event function calls lapb_disconnect_request. In
order to avoid trying to hold the lock twice, add a new function named
"__lapb_disconnect_request" which assumes the lock is held, and make
it called by lapb_disconnect_request and lapb_device_event.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Link: https://lore.kernel.org/r/20210126040939.69995-1-xie.he.0141@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Function __team_compute_features() is protected by team->lock
mutex when it is called from team_compute_features() used when
features of an underlying device is changed. This causes
a deadlock when NETDEV_FEAT_CHANGE notifier for underlying device
is fired due to change propagated from team driver (e.g. MTU
change). It's because callbacks like team_change_mtu() or
team_vlan_rx_{add,del}_vid() protect their port list traversal
by team->lock mutex.
Example (r8169 case where this driver disables TSO for certain MTU
values):
...
[ 6391.348202] __mutex_lock.isra.6+0x2d0/0x4a0
[ 6391.358602] team_device_event+0x9d/0x160 [team]
[ 6391.363756] notifier_call_chain+0x47/0x70
[ 6391.368329] netdev_update_features+0x56/0x60
[ 6391.373207] rtl8169_change_mtu+0x14/0x50 [r8169]
[ 6391.378457] dev_set_mtu_ext+0xe1/0x1d0
[ 6391.387022] dev_set_mtu+0x52/0x90
[ 6391.390820] team_change_mtu+0x64/0xf0 [team]
[ 6391.395683] dev_set_mtu_ext+0xe1/0x1d0
[ 6391.399963] do_setlink+0x231/0xf50
...
In fact team_compute_features() called from team_device_event()
does not need to be protected by team->lock mutex and rcu_read_lock()
is sufficient there for port list traversal.
Fixes: 3d249d4ca7 ("net: introduce ethernet teaming device")
Cc: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20210125074416.4056484-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If a non nat tuple entry is inserted just to the regular tuples
rhashtable (ct_tuples_ht) and not to natted tuples rhashtable
(ct_nat_tuples_ht). Commit bc562be967 ("net/mlx5e: CT: Save ct entries
tuples in hashtables") mixed up the return labels and names sot that on
cleanup or failure we still try to remove for the natted tuples rhashtable.
Fix that by correctly checking if a natted tuples insertion
before removing it. While here make it more readable.
Fixes: bc562be967 ("net/mlx5e: CT: Save ct entries tuples in hashtables")
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Sometimes, channel params are changed without recreating the channels.
It happens in two basic cases: when the channels are closed, and when
the parameter being changed doesn't affect how channels are configured.
Such changes invoke a hardware command that might fail. The whole
operation should be reverted in such cases, but the code that restores
the parameters' values in the driver was missing. This commit adds this
handling.
Fixes: 2e20a15120 ("net/mlx5e: Fail safe mtu and lro setting")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Trust state may be changed without recreating the channels. It happens
when the channels are closed, and when channel parameters (min inline
mode) stay the same after changing the trust state. Changing the trust
state is a hardware command that may fail. The current code didn't
restore the channel parameters to their old values if an error happened
and the channels were closed. This commit adds handling for this case.
Fixes: 6e0504c698 ("net/mlx5e: Change inline mode correctly when changing trust state")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This commit addresses two issues related to changing the number of
queues when the channels are closed:
1. Missing call to mlx5e_num_channels_changed to update
real_num_tx_queues when the number of TCs is changed.
2. When mlx5e_num_channels_changed returns an error, the channel
parameters must be reverted.
Two Fixes: tags correspond to the first commits where these two issues
were introduced.
Fixes: 3909a12e79 ("net/mlx5e: Fix configuration of XPS cpumasks and netdev queues in corner cases")
Fixes: fa3748775b ("net/mlx5e: Handle errors from netif_set_real_num_{tx,rx}_queues")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, if a neighbour isn't valid when offloading tunnel encap rules,
we offload the original match and replace the original action with
"goto slow path" action. For this we use a temporary flow attribute based
on the original flow attribute and then change the action. Flow flags,
which among those is the CT flag, are still shared for the slow path rule
offload, so we end up parsing this flow as a CT + goto slow path rule.
Besides being unnecessary, CT action offload saves extra information in
the passed flow attribute, such as created ct_flow and mod_hdr, which
is lost onces the temporary flow attribute is freed.
When a neigh is updated and is valid, we offload the original CT rule
with original CT action, which again creates a ct_flow and mod_hdr
and saves it in the flow's original attribute. Then we delete the slow
path rule with a temporary flow attribute based on original updated
flow attribute, and we free the relevant ct_flow and mod_hdr.
Then when tc deletes this flow, we try to free the ct_flow and mod_hdr
on the flow's attribute again.
To fix the issue, skip all furture proccesing (CT/Sample/Split rules)
in offload/unoffload of slow path rules.
Call trace:
[ 758.850525] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000218
[ 758.952987] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 758.964170] Modules linked in: act_csum(E) act_pedit(E) act_tunnel_key(E) act_ct(E) nf_flow_table(E) xt_nat(E) ip6table_filter(E) ip6table_nat(E) xt_comment(E) ip6_tables(E) xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) xt_addrtype(E) iptable_filter(E) iptable_nat(E) bpfilter(E) br_netfilter(E) bridge(E) stp(E) llc(E) xfrm_user(E) overlay(E) act_mirred(E) act_skbedit(E) rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) esp6_offload(E) esp6(E) esp4_offload(E) esp4(E) xfrm_algo(E) mlx5_ib(OE) ib_uverbs(OE) geneve(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink_cttimeout(E) nfnetlink(E) mlx5_core(OE) act_gact(E) cls_flower(E) sch_ingress(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat(E) mlxfw(OE) psample(E) nf_conntrack(E) nf_defrag_ipv4(E) vfio_mdev(E) mdev(E) ib_core(OE) mlx_compat(OE) crct10dif_ce(E) uio_pdrv_genirq(E) uio(E) i2c_mlx(E) mlxbf_pmc(E) sbsa_gwdt(E) mlxbf_gige(E) gpio_mlxbf2(E) mlxbf_pka(E) mlx_trio(E) mlx_bootctl(E) bluefield_edac(E) knem(O)
[ 758.964225] ip_tables(E) mlxbf_tmfifo(E) ipv6(E) crc_ccitt(E) nf_defrag_ipv6(E)
[ 759.154186] CPU: 5 PID: 122 Comm: kworker/u16:1 Tainted: G OE 5.4.60-mlnx.52.gde81e85 #1
[ 759.172870] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS BlueField:3.5.0-2-gc1b5d64 Jan 4 2021
[ 759.195466] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core]
[ 759.207344] pstate: a0000005 (NzCv daif -PAN -UAO)
[ 759.217003] pc : mlx5_del_flow_rules+0x5c/0x160 [mlx5_core]
[ 759.228229] lr : mlx5_del_flow_rules+0x34/0x160 [mlx5_core]
[ 759.405858] Call trace:
[ 759.410804] mlx5_del_flow_rules+0x5c/0x160 [mlx5_core]
[ 759.421337] __mlx5_eswitch_del_rule.isra.43+0x5c/0x1c8 [mlx5_core]
[ 759.433963] mlx5_eswitch_del_offloaded_rule_ct+0x34/0x40 [mlx5_core]
[ 759.446942] mlx5_tc_rule_delete_ct+0x68/0x74 [mlx5_core]
[ 759.457821] mlx5_tc_ct_delete_flow+0x160/0x21c [mlx5_core]
[ 759.469051] mlx5e_tc_unoffload_fdb_rules+0x158/0x168 [mlx5_core]
[ 759.481325] mlx5e_tc_encap_flows_del+0x140/0x26c [mlx5_core]
[ 759.492901] mlx5e_rep_update_flows+0x11c/0x1ec [mlx5_core]
[ 759.504127] mlx5e_rep_neigh_update+0x160/0x200 [mlx5_core]
[ 759.515314] process_one_work+0x178/0x400
[ 759.523350] worker_thread+0x58/0x3e8
[ 759.530685] kthread+0x100/0x12c
[ 759.537152] ret_from_fork+0x10/0x18
[ 759.544320] Code: 97ffef55 51000673 3100067f 54ffff41 (b9421ab3)
[ 759.556548] ---[ end trace fab818bb1085832d ]---
Fixes: 4c3844d9e9 ("net/mlx5e: CT: Introduce connection tracking")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commit introduce new CONFIG_MLX5_CLS_ACT kconfig variable
to control compilation of TC hardware offloads implementation.
When this configuration is disabled the driver is still wrongly
reports in ethtool that hw-tc-offload is supported.
Fixed by reporting hw-tc-offload is supported only when
CONFIG_MLX5_CLS_ACT is enabled.
Fixes: d956873f90 ("net/mlx5e: Introduce kconfig var for TC support")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Pages for the host PF and ECPF were stored in the same tree, so the ECPF
pages were being freed along with the host PF's when the host driver
unloaded.
Combine the function ID and ECPF flag to use as an index into the
x-array containing the trees to get a different tree for the host PF and
ECPF.
Fixes: c6168161f6 ("net/mlx5: Add support for release all pages event")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When IPSEC offload isn't active, the number of stats is not zero, but
the strings are not filled, leading to exposing stats with empty names.
Fix this by using the same condition for NUM_STATS and FILL_STRS.
Fixes: 0aab3e1b04 ("net/mlx5e: IPSec, Expose IPsec HW stat only for supporting HW")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
"Unsupported key used:" appears in kernel log when flows with
unsupported key are used, arp fields for example.
OpenVSwitch was changed to match on arp fields by default that
caused this warning to appear in kernel log for every arp rule, which
can be a lot.
Fix by lowering print level from warning to debug.
Fixes: e3a2b7ed01 ("net/mlx5e: Support offload cls_flower with drop action")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Instead of directly return, goto the error handling label to free
allocated page.
Fixes: 5f29458b77 ("net/mlx5e: Support dump callback in TX reporter")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
rate_bytes_ps is a 64-bit field. It passed as 32-bit field to
apply_police_params(). Due to this when police rate is higher
than 4Gbps, 32-bit calculation ignores the carry. This results
in incorrect rate configurationn the device.
Fix it by performing 64-bit calculation.
Fixes: fcb64c0f56 ("net/mlx5: E-Switch, add ingress rate support")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When we create the ft object we also init rhltable in ft->fgs_hash.
So in error flow before kfree of ft we need to destroy that rhltable.
Fixes: 693c6883bb ("net/mlx5: Add hash table for flow groups in flow table")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Johannes Berg says:
====================
A couple of fixes:
* fix 160 MHz channel switch in mac80211
* fix a staging driver to not deadlock due to some
recent cfg80211 changes
* fix NULL-ptr deref if cfg80211 returns -EINPROGRESS
to wext (syzbot)
* pause TX in mac80211 in type change to prevent crashes
(syzbot)
* tag 'mac80211-for-net-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211:
staging: rtl8723bs: fix wireless regulatory API misuse
mac80211: pause TX while changing interface type
wext: fix NULL-ptr-dereference with cfg80211's lack of commit()
mac80211: 160MHz with extended NSS BW in CSA
====================
Link: https://lore.kernel.org/r/20210126130529.75225-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kalle Valo says:
====================
wireless-drivers fixes for v5.11
Second set of fixes for v5.11. Like in last time we again have more
fixes than usual Actually a bit too much for my liking in this state
of the cycle, but due to unrelated challenges I was only able to
submit them now.
We have few important crash fixes, iwlwifi modifying read-only data
being the most reported issue, and also smaller fixes to iwlwifi.
mt76
* fix a clang warning about enum usage
* fix rx buffer refcounting crash
mt7601u
* fix rx buffer refcounting crash
* fix crash when unbplugging the device
iwlwifi
* fix a crash where we were modifying read-only firmware data
* lots of smaller fixes all over the driver
* tag 'wireless-drivers-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers: (24 commits)
mt7601u: fix kernel crash unplugging the device
iwlwifi: queue: bail out on invalid freeing
iwlwifi: mvm: guard against device removal in reprobe
iwlwifi: Fix IWL_SUBDEVICE_NO_160 macro to use the correct bit.
iwlwifi: mvm: clear IN_D3 after wowlan status cmd
iwlwifi: pcie: add rules to match Qu with Hr2
iwlwifi: mvm: invalidate IDs of internal stations at mvm start
iwlwifi: mvm: fix the return type for DSM functions 1 and 2
iwlwifi: pcie: reschedule in long-running memory reads
iwlwifi: pcie: use jiffies for memory read spin time limit
iwlwifi: pcie: fix context info memory leak
iwlwifi: pcie: add a NULL check in iwl_pcie_txq_unmap
iwlwifi: pcie: set LTR on more devices
iwlwifi: queue: don't crash if txq->entries is NULL
iwlwifi: fix the NMI flow for old devices
iwlwifi: pnvm: don't try to load after failures
iwlwifi: pnvm: don't skip everything when not reloading
iwlwifi: pcie: avoid potential PNVM leaks
iwlwifi: mvm: take mutex for calling iwl_mvm_get_sync_time()
iwlwifi: mvm: skip power command when unbinding vif during CSA
...
====================
Link: https://lore.kernel.org/r/20210126092202.6A367C433CA@smtp.codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Building the kernel with CONFIG_BPF_PRELOAD, and by providing a relative
path for the output directory, may fail with the following error:
$ make O=build bindeb-pkg
...
/.../linux/tools/scripts/Makefile.include:5: *** O=build does not exist. Stop.
make[7]: *** [/.../linux/kernel/bpf/preload/Makefile:9: kernel/bpf/preload/libbpf.a] Error 2
make[6]: *** [/.../linux/scripts/Makefile.build:500: kernel/bpf/preload] Error 2
make[5]: *** [/.../linux/scripts/Makefile.build:500: kernel/bpf] Error 2
make[4]: *** [/.../linux/Makefile:1799: kernel] Error 2
make[4]: *** Waiting for unfinished jobs....
In the case above, for the "bindeb-pkg" target, the error is produced by
the "dummy" check in Makefile.include, called from libbpf's Makefile.
This check changes directory to $(PWD) before checking for the existence
of $(O). But at this step we have $(PWD) pointing to "/.../linux/build",
and $(O) pointing to "build". So the Makefile.include tries in fact to
assert the existence of a directory named "/.../linux/build/build",
which does not exist.
Note that the error does not occur for all make targets and
architectures combinations. This was observed on x86 for "bindeb-pkg",
or for a regular build for UML [0].
Here are some details. The root Makefile recursively calls itself once,
after changing directory to $(O). The content for the variable $(PWD) is
preserved across recursive calls to make, so it is unchanged at this
step. For "bindeb-pkg", $(PWD) is eventually updated because the target
writes a new Makefile (as debian/rules) and calls it indirectly through
dpkg-buildpackage. This script does not preserve $(PWD), which is reset
to the current working directory when the target in debian/rules is
called.
Although not investigated, it seems likely that something similar causes
UML to change its value for $(PWD).
Non-trivial fixes could be to remove the use of $(PWD) from the "dummy"
check, or to make sure that $(PWD) and $(O) are preserved or updated to
always play well and form a valid $(PWD)/$(O) path across the different
targets and architectures. Instead, we take a simpler approach and just
update $(O) when calling libbpf's Makefile, so it points to an absolute
path which should always resolve for the "dummy" check run (through
includes) by that Makefile.
David Gow previously posted a slightly different version of this patch
as a RFC [0], two months ago or so.
[0] https://lore.kernel.org/bpf/20201119085022.3606135-1-davidgow@google.com/t/#u
Fixes: d71fa5c976 ("bpf: Add kernel module with user mode driver that populates bpffs.")
Reported-by: David Gow <davidgow@google.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/bpf/20210126161320.24561-1-quentin@isovalent.com
Link speed advertising in igc has two problems:
- When setting the advertisement via ethtool, the link speed is converted
to the legacy 32 bit representation for the intel PHY code.
This inadvertently drops ETHTOOL_LINK_MODE_2500baseT_Full_BIT (being
beyond bit 31). As a result, any call to `ethtool -s ...' drops the
2500Mbit/s link speed from the PHY settings. Only reloading the driver
alleviates that problem.
Fix this by converting the ETHTOOL_LINK_MODE_2500baseT_Full_BIT to the
Intel PHY ADVERTISE_2500_FULL bit explicitly.
- Rather than checking the actual PHY setting, the .get_link_ksettings
function always fills link_modes.advertising with all link speeds
the device is capable of.
Fix this by checking the PHY autoneg_advertised settings and report
only the actually advertised speeds up to ethtool.
Fixes: 8c5ad0dae9 ("igc: Add ethtool support")
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In time-travel mode, since my previous patch, the start time was
initialized too late, so that the system would read it before we
set it, thus always starting system time at 0 (1970-01-01). This
happens because timekeeping_init() reads the time and is called
before time_init().
Unfortunately, I didn't see this before because I was testing it
only with the RTC patch applied (and enabled), and then the time
is read again by the RTC a little - after time_init() this time.
Fix this by just doing the initialization whenever necessary.
Fixes: 2701c1bd91 ("um: time: Fix read_persistent_clock64() in time-travel")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Changing os_idle_sleep() to use pause() (I accidentally described
it as an empty select() in the commit log because I had changed it
from that to pause() in a later revision) exposed a race condition
in the idle code. The following can happen:
timer_settime(0, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=624017}}, NULL) = 0
...
<SIGALRM is delivered but we're already on the way to idle>
pause()
and we now hang forever. This was previously possible as well, but
it could never cause UML to hang for more than a second since we
could only sleep for that much, so at most you'd notice a "hiccup"
in the UML. Obviously, any sort of external interrupt also "saves"
it and interrupts pause().
Fix this by properly handling the race, rather than papering over
it again:
- first, block SIGALRM, and obtain the old signal set
- check the timer
- suspend, waiting for any signal out of the old set, if, and only
if, the timer will fire in the future
- restore the old signal mask
This ensures race-free operation: as it's blocked, the signal won't
be delivered while we're looking at the timer even if it were to be
triggered right _after_ we've returned from timer_gettime() with a
non-zero value (telling us the timer will trigger). Thus, despite
getting to sigsuspend() because timer_gettime() told us we're still
waiting, we'll not hang because sigsuspend() will return immediately
due to the pending signal.
Fixes: 49da38a3ef ("um: Simplify os_idle_sleep() and sleep longer")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This reverts commit 963285b0b4 ("um: support some of
ARCH_HAS_SET_MEMORY"), as it turns out that it's not only not
working (due to um never using the protection bits in the
page tables) but also corrupts the page tables if used on a
non-vmalloc page, since um never allocates proper page tables
for the 'physmem' in the first place.
Fixing all this will take more effort, so for now revert it.
Reported-by: Benjamin Berg <benjamin@sipsolutions.net>
Fixes: 963285b0b4 ("um: support some of ARCH_HAS_SET_MEMORY")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This reverts commit ef4459a6da ("um: allocate a guard page to
helper threads"), it's broken in multiple ways:
1) the free no longer matches the alloc; and
2) more importantly, the set_memory_ro() causes allocation of
page tables for the normal memory that doesn't have any,
and that later causes corruption and crashes (usually but
not always in vfree()).
We could fix the first bug and use vmalloc() to work around the
second, but set_memory_ro() actually doesn't do anything either
so I'll just revert that as well.
Reported-by: Benjamin Berg <benjamin@sipsolutions.net>
Fixes: ef4459a6da ("um: allocate a guard page to helper threads")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Since struct device is refcounted, we shouldn't free the vu_dev
immediately when it's removed from the platform device, but only
when the references actually all go away. Move the freeing to
the release to accomplish that.
Fixes: 5d38f32499 ("um: drivers: Add virtio vhost-user driver")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
With the addition of the ttynull console driver, the chance that a
console driver was already registerd did increase. Refine the logic when
to dump the kernel message buffer: always dump the buffer, when the UML
stdio console driver is not active and the preferred console.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
The addition of the "ttynull" console driver did break the ordering of the
UML stdio console driver.
The UML stdio console driver is added in late_initcall (7), whereby the
ttynull driver is added in device_initcall (6), which always does make the
ttynull driver the default console.
Fix it by explicitly adding the UML stdio console as the preferred console,
in case no 'console=' command line option was specified.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
This commit fixes a regression to handle command line parameters of ubd.
With a simple line "./linux ubd0="./disk-ext4.img", it fails at
ubd_setup_common(). The commit adds additional checks to the variables
in order to properly parse the paremeters which previously worked.
Fixes: ef3ba87cb7 ("um: ubd: Set device serial attribute from cmdline")
Cc: Christopher Obbard <chris.obbard@collabora.com>
Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
Acked-by: Christopher Obbard <chris.obbard@collabora.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Qualcomm ARM64 defconfig fixes for v5.11
Devicetree patches for SDM845 introduced in v5.11 requires the
platform's interconnect driver to be buildin, or the kernel will fail to
provide a valid console when we hit userspace.
* tag 'qcom-arm64-defconfig-fixes-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: defconfig: Make INTERCONNECT_QCOM_SDM845 builtin
Link: https://lore.kernel.org/r/20210125232412.642834-1-bjorn.andersson@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Qualcomm ARM64 fixes for 5.11
This fixes a regression in Lenovo Yoga C630, where the touchpad in some
units stopped working, by re-enabling the "tsc2" device.
It also marks the LPASS related clocks as protected to allow DB845c and
the Lenovo Yoga C630 to boot even if CONFIG_SDM_LPASSCC_845 is enabled.
* tag 'qcom-arm64-fixes-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: dts: qcom: sdm845: Reserve LPASS clocks in gcc
arm64: dts: qcom: c630: keep both touchpad devices enabled
Link: https://lore.kernel.org/r/20210125232039.642565-1-bjorn.andersson@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
arm64: dts: amlogic: fixes for v5.11-rc
- meson-g12: Set FL-adj property value
* tag 'amlogic-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic:
arm64: dts: amlogic: meson-g12: Set FL-adj property value
Link: https://lore.kernel.org/r/7hk0s0x3im.fsf@baylibre.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Remove the magical "repo-abbrev" comment added when this file was
introduced in e0ab1ec9fc ([PATCH] add .mailmap for proper
git-shortlog output, 2007-02-14).
It's been an undocumented feature of git-shortlog(1), originally added
to git for Linus's use. Since then he's no longer using it[1], and
I've removed the feature in git.git's 4e168333a87 (shortlog: remove
unused(?) "repo-abbrev" feature, 2021-01-12). It's on the "master"
branch, but not yet in a release version.
Let's also remove it from linux.git, both as a heads-up to any
potential users of it in linux.git whose use would be broken sooner
than later by git itself, and because it'll eventually be entirely
redundant.
1. https://lore.kernel.org/git/CAHk-=wixHyBKZVUcxq+NCWMbkrX0xnppb7UCopRWw1+oExYpYw@mail.gmail.com/
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When building a kernel without module support, the CONFIG_MLONGCALL option
needs to be enabled in order to reach symbols which are outside of a 22-bit
branch.
This patch changes the autodetection in the Kconfig script to always enable
CONFIG_MLONGCALL when modules are disabled and uses a far call to
preempt_schedule_irq() in intr_do_preempt() to reach the symbol in all cases.
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: kernel test robot <lkp@intel.com>
Cc: stable@vger.kernel.org # v5.6+
Pull kvm fixes from Paolo Bonzini:
- x86 bugfixes
- Documentation fixes
- Avoid performance regression due to SEV-ES patches
- ARM:
- Don't allow tagged pointers to point to memslots
- Filter out ARMv8.1+ PMU events on v8.0 hardware
- Hide PMU registers from userspace when no PMU is configured
- More PMU cleanups
- Don't try to handle broken PSCI firmware
- More sys_reg() to reg_to_encoding() conversions
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX
KVM: x86: Revert "KVM: x86: Mark GPRs dirty when written"
KVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest
KVM: nVMX: Sync unsync'd vmcs02 state to vmcs12 on migration
kvm: tracing: Fix unmatched kvm_entry and kvm_exit events
KVM: Documentation: Update description of KVM_{GET,CLEAR}_DIRTY_LOG
KVM: x86: get smi pending status correctly
KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[]
KVM: x86/pmu: Fix UBSAN shift-out-of-bounds warning in intel_pmu_refresh()
KVM: x86: Add more protection against undefined behavior in rsvd_bits()
KVM: Documentation: Fix spec for KVM_CAP_ENABLE_CAP_VM
KVM: Forbid the use of tagged userspace addresses for memslots
KVM: arm64: Filter out v8.1+ events on v8.0 HW
KVM: arm64: Compute TPIDR_EL2 ignoring MTE tag
KVM: arm64: Use the reg_to_encoding() macro instead of sys_reg()
KVM: arm64: Allow PSCI SYSTEM_OFF/RESET to return
KVM: arm64: Simplify handling of absent PMU system registers
KVM: arm64: Hide PMU registers from userspace when not available
Pull spi fixes from Mark Brown:
"One new device ID here, plus an error handling fix - nothing
remarkable in either"
* tag 'spi-fix-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spidev: Add cisco device compatible
spi: altera: Fix memory leak on error path
Pull regulator fixes from Mark Brown:
"The main thing here is a change to make sure that we don't try to
double resolve the supply of a regulator if we have two probes going
on simultaneously, plus an incremental fix on top of that to resolve a
lockdep issue it introduced.
There's also a patch from Dmitry Osipenko adding stubs for some
functions to avoid build issues in consumers in some configurations"
* tag 'regulator-fix-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: Fix lockdep warning resolving supplies
regulator: consumer: Add missing stubs to regulator/consumer.h
regulator: core: avoid regulator_resolve_supply() race condition
This was removed long ago, back in:
6e16d9409e ([PARISC] Convert soft power switch driver to kthread)
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Helge Deller <deller@gmx.de>
This change simplifies the VF initialization check and also minimizes
the delay between acquiring the VSI pointer and using it. As known by
the commit being fixed, there is a risk of the VSI pointer getting
changed. Therefore minimize the delay between getting and using the
pointer.
Fixes: 9889707b06 ("i40e: Fix crash caused by stress setting of VF MAC addresses")
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The current MSI-X enablement logic tries to enable best-case MSI-X
vectors and if that fails we only support a bare-minimum set. This
includes a single MSI-X for 1 Tx and 1 Rx queue and a single MSI-X
for the OICR interrupt. Unfortunately, the driver fails to load when we
don't get as many MSI-X as requested for a couple reasons.
First, the code to allocate MSI-X in the driver tries to allocate
num_online_cpus() MSI-X for LAN traffic without caring about the number
of MSI-X actually enabled/requested from the kernel for LAN traffic.
So, when calling ice_get_res() for the PF VSI, it returns failure
because the number of available vectors is less than requested. Fix
this by not allowing the PF VSI to allocation more than
pf->num_lan_msix MSI-X vectors and pf->num_lan_msix Rx/Tx queues.
Limiting the number of queues is done because we don't want more than
1 Tx/Rx queue per interrupt due to performance conerns.
Second, the driver assigns pf->num_lan_msix = 2, to account for LAN
traffic and the OICR. However, pf->num_lan_msix is only meant for LAN
MSI-X. This is causing a failure when the PF VSI tries to
allocate/reserve the minimum pf->num_lan_msix because the OICR MSI-X has
already been reserved, so there may not be enough MSI-X vectors left.
Fix this by setting pf->num_lan_msix = 1 for the failure case. Then the
ICE_MIN_MSIX accounts for the LAN MSI-X and the OICR MSI-X needed for
the failure case.
Update the related defines used in ice_ena_msix_range() to align with
the above behavior and remove the unused RDMA defines because RDMA is
currently not supported. Also, remove the now incorrect comment.
Fixes: 152b978a1f ("ice: Rework ice_ena_msix_range")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently users could create more channels than LAN MSI-X available.
This is happening because there is no check against pf->num_lan_msix
when checking the max allowed channels and will cause performance issues
if multiple Tx and Rx queues are tied to a single MSI-X. Fix this by not
allowing more channels than LAN MSI-X available in pf->num_lan_msix.
Fixes: 87324e747f ("ice: Implement ethtool ops for channels")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix the driver to copy the MAC address configured in ndo_set_mac_address
into dev_addr, even if the MAC filter already exists in HW. In some
situations (e.g. bonding) the netdev's dev_addr could have been modified
outside of the driver, with no change to the HW filter, so the driver
cannot assume that they match.
Fixes: 757976ab16 ("ice: Fix check for removing/adding mac filters")
Signed-off-by: Nick Nunley <nicholas.d.nunley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This patch is based on a similar change to i40e by Slawomir Laba:
"i40e: Implement flow for IPv6 next header (extension header)".
When a packet contains an IPv6 header with next header which is
an extension header and not a protocol one, the kernel function
skb_transport_header called with such sk_buff will return a
pointer to the extension header and not to the TCP one.
The above explained call caused a problem with packet processing
for skb with encapsulation for tunnel with ICE_TX_CTX_EIPT_IPV6.
The extension header was not skipped at all.
The ipv6_skip_exthdr function does check if next header of the IPV6
header is an extension header and doesn't modify the l4_proto pointer
if it points to a protocol header value so its safe to omit the
comparison of exthdr and l4.hdr pointers. The ipv6_skip_exthdr can
return value -1. This means that the skipping process failed
and there is something wrong with the packet so it will be dropped.
Fixes: a4e82a81f5 ("ice: Add support for tunnel offloads")
Signed-off-by: Nick Nunley <nicholas.d.nunley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The packet classifier would occasionally misrecognize an IPv6 training
packet when the next protocol field was 0. The correct value for
unspecified protocol is IPPROTO_NONE.
Fixes: 165d80d6ad ("ice: Support IPv6 Flow Director filters")
Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Commit c0f975af17 ("kconfig: Support building mconf with vendor
sysroot ncurses") introduces a bug when HOSTCC contains parameters:
the whole command line is treated as the program name (with spaces
in it). Therefore, we have to remove the quotes.
Fixes: c0f975af17 ("kconfig: Support building mconf with vendor sysroot ncurses")
Signed-off-by: Enrico Weigelt, metux IT consult <info@metux.net>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
During H264 API overhaul subtle bug was introduced Cedrus driver.
Progressive references have both, top and bottom reference flags set.
Cedrus reference list expects only bottom reference flag and only when
interlaced frames are decoded. However, due to a bug in Cedrus check,
exclusivity is not tested and that flag is set also for progressive
references. That causes "jumpy" background with many videos.
Fix that by checking that only bottom reference flag is set in control
and nothing else.
Tested-by: Andre Heider <a.heider@gmail.com>
Fixes: cfc8c3ed53 ("media: cedrus: h264: Properly configure reference field")
Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: <stable@vger.kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Some networking and keys LSM hooks are conditionally enabled
and when building the new sleepable BPF LSM hooks with those
LSM hooks disabled, the following build error occurs:
BTFIDS vmlinux
FAILED unresolved symbol bpf_lsm_socket_socketpair
To fix the error, conditionally add the relevant networking/keys
LSM hooks to the sleepable set.
Fixes: 423f16108c ("bpf: Augment the set of sleepable LSM hooks")
Signed-off-by: Mikko Ylinen <mikko.ylinen@linux.intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210125063936.89365-1-mikko.ylinen@linux.intel.com
It has been reported on IRC and in KernelCI boot tests, this change breaks
internal PHY support on the Amlogic G12A/SM1 Based boards.
We suspect the added signal to reset more than the Ethernet MAC but also
the MDIO/(RG)MII mux used to redirect the MAC signals to the internal PHY.
This reverts commit f3362f0c18 while we find
and acceptable solution to cleanly reset the Ethernet MAC.
Reported-by: Corentin Labbe <clabbe@baylibre.com>
Acked-by: Jérôme Brunet <jbrunet@baylibre.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
Link: https://lore.kernel.org/r/20210126080951.2383740-1-narmstrong@baylibre.com
If the tctx inflight number haven't changed because of cancellation,
__io_uring_task_cancel() will continue leaving the task in
TASK_UNINTERRUPTIBLE state, that's not expected by
__io_uring_files_cancel(). Ensure we always call finish_wait() before
retrying.
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
xhci-mtk needs XHCI_MTK_HOST quirk functions in add_endpoint() and
drop_endpoint() to handle its own sw bandwidth management.
It stores bandwidth data into an internal table every time
add_endpoint() is called, and drops those in drop_endpoint().
But when bandwidth allocation fails at one endpoint, all earlier
allocation from the same interface could still remain at the table.
This patch moves bandwidth management codes to check_bandwidth() and
reset_bandwidth() path. To do so, this patch also adds those functions
to xhci_driver_overrides and lets mtk-xhci to release all failed
endpoints in reset_bandwidth() path.
Fixes: 08e469de87 ("usb: xhci-mtk: supports bandwidth scheduling with multi-TT")
Signed-off-by: Ikjoon Jang <ikjn@chromium.org>
Link: https://lore.kernel.org/r/20210113180444.v6.1.Id0d31b5f3ddf5e734d2ab11161ac5821921b1e1e@changeid
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fixup_pi_state_owner() tries to ensure that the state of the rtmutex,
pi_state and the user space value related to the PI futex are consistent
before returning to user space. In case that the user space value update
faults and the fault cannot be resolved by faulting the page in via
fault_in_user_writeable() the function returns with -EFAULT and leaves
the rtmutex and pi_state owner state inconsistent.
A subsequent futex_unlock_pi() operates on the inconsistent pi_state and
releases the rtmutex despite not owning it which can corrupt the RB tree of
the rtmutex and cause a subsequent kernel stack use after free.
It was suggested to loop forever in fixup_pi_state_owner() if the fault
cannot be resolved, but that results in runaway tasks which is especially
undesired when the problem happens due to a programming error and not due
to malice.
As the user space value cannot be fixed up, the proper solution is to make
the rtmutex and the pi_state consistent so both have the same owner. This
leaves the user space value out of sync. Any subsequent operation on the
futex will fail because the 10th rule of PI futexes (pi_state owner and
user space value are consistent) has been violated.
As a consequence this removes the inept attempts of 'fixing' the situation
in case that the current task owns the rtmutex when returning with an
unresolvable fault by unlocking the rtmutex which left pi_state::owner and
rtmutex::owner out of sync in a different and only slightly less dangerous
way.
Fixes: 1b7558e457 ("futexes: fix fault handling in futex_lock_pi")
Reported-by: gzobqq@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Updating pi_state::owner is done at several places with the same
code. Provide a function for it and use that at the obvious places.
This is also a preparation for a bug fix to avoid yet another copy of the
same code or alternatively introducing a completely unpenetratable mess of
gotos.
Originally-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
If that unexpected case of inconsistent arguments ever happens then the
futex state is left completely inconsistent and the printk is not really
helpful. Replace it with a warning and make the state consistent.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
In case that futex_lock_pi() was aborted by a signal or a timeout and the
task returned without acquiring the rtmutex, but is the designated owner of
the futex due to a concurrent futex_unlock_pi() fixup_owner() is invoked to
establish consistent state. In that case it invokes fixup_pi_state_owner()
which in turn tries to acquire the rtmutex again. If that succeeds then it
does not propagate this success to fixup_owner() and futex_lock_pi()
returns -EINTR or -ETIMEOUT despite having the futex locked.
Return success from fixup_pi_state_owner() in all cases where the current
task owns the rtmutex and therefore the futex and propagate it correctly
through fixup_owner(). Fixup the other callsite which does not expect a
positive return value.
Fixes: c1e2f0eaf0 ("futex: Avoid violating the 10th rule of futex")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
The first thing the active retirement worker does is decrement the
i915_active count.
The first thing we do during i915_active_wait is try to increment the
i915_active count, but only if already active [non-zero].
The wait may see that the retirement is already started and so marked the
i915_active as idle, and skip waiting for the retirement handler.
However, the caller of i915_active_wait may immediately free the
i915_active upon returning (e.g. i915_vma_destroy) so we must not return
before the concurrent access from the worker is completed. We must
always flush the worker.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2473
Fixes: 274cbf20fd ("drm/i915: Push the i915_active.retire into a worker")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210121232807.16618-1-chris@chris-wilson.co.uk
(cherry picked from commit 977a372e972cb42799746c284035a33c64ebace9)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
ASoC: Fixes for v5.11
More fixes for v5.11, almost all driver specific issues including new
device IDs - there's one error handling fix for the topology stuff too.
This code ends up calling wiphy_apply_custom_regulatory(), for which
we document that it should be called before wiphy_register(). This
driver doesn't do that, but calls it from ndo_open() with the RTNL
held, which caused deadlocks.
Since the driver just registers static regdomain data and then the
notifier applies the channel changes if any, there's no reason for
it to call this in ndo_open(), move it earlier to fix the deadlock.
Reported-and-tested-by: Hans de Goede <hdegoede@redhat.com>
Fixes: 51d62f2f2c ("cfg80211: Save the regulatory domain with a lock")
Link: https://lore.kernel.org/r/20210126115409.d5fd6f8fe042.Ib5823a6feb2e2aa01ca1a565d2505367f38ad246@changeid
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The recent commit to fix a memory leak introduced an inadvertant NULL
pointer dereference. The `wacom_wac->pen_fifo` variable was never
intialized, resuling in a crash whenever functions tried to use it.
Since the FIFO is only used by AES pens (to buffer events from pen
proximity until the hardware reports the pen serial number) this would
have been easily overlooked without testing an AES device.
This patch converts `wacom_wac->pen_fifo` over to a pointer (since the
call to `devres_alloc` allocates memory for us) and ensures that we assign
it to point to the allocated and initalized `pen_fifo` before the function
returns.
Link: https://github.com/linuxwacom/input-wacom/issues/230
Fixes: 37309f47e2 ("HID: wacom: Fix memory leakage caused by kfifo_alloc")
CC: stable@vger.kernel.org # v4.19+
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Tested-by: Ping Cheng <ping.cheng@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This is inline with the specification described in blkif.h:
* discard-granularity: should be set to the physical block size if
node is not present.
* discard-alignment, discard-secure: should be set to 0 if node not
present.
This was detected as QEMU would only create the discard-granularity
node but not discard-alignment, and thus the setup done in
blkfront_setup_discard would fail.
Fix blkfront_setup_discard to not fail on missing nodes, and also fix
blkif_set_queue_limits to set the discard granularity to the physical
block size if none is specified in xenbus.
Fixes: ed30bf317c ('xen-blkfront: Handle discard requests.')
Reported-by: Arthur Borsboom <arthurborsboom@gmail.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Tested-By: Arthur Borsboom <arthurborsboom@gmail.com>
Link: https://lore.kernel.org/r/20210119105727.95173-1-roger.pau@citrix.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Prior to commit 7c03e2cda4 ("vfs: move cap_convert_nscap() call into
vfs_setxattr()") the translation of nscap->rootid did not take stacked
filesystems (overlayfs and ecryptfs) into account.
That patch fixed the overlay case, but made the ecryptfs case worse.
Restore old the behavior for ecryptfs that existed before the overlayfs
fix. This does not fix ecryptfs's handling of complex user namespace
setups, but it does make sure existing setups don't regress.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Tyler Hicks <code@tyhicks.com>
Fixes: 7c03e2cda4 ("vfs: move cap_convert_nscap() call into vfs_setxattr()")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Tyler Hicks <code@tyhicks.com>
VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS,
which may need to be loaded outside guest mode. Therefore we cannot
WARN in that case.
However, that part of nested_get_vmcs12_pages is _not_ needed at
vmentry time. Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling,
so that both vmentry and migration (and in the latter case, independent
of is_guest_mode) do the parts that are needed.
Cc: <stable@vger.kernel.org> # 5.10.x: f2c7ef3ba: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES
Cc: <stable@vger.kernel.org> # 5.10.x
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Revert the dirty/available tracking of GPRs now that KVM copies the GPRs
to the GHCB on any post-VMGEXIT VMRUN, even if a GPR is not dirty. Per
commit de3cd117ed ("KVM: x86: Omit caching logic for always-available
GPRs"), tracking for GPRs noticeably impacts KVM's code footprint.
This reverts commit 1c04d8c986.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210122235049.3107620-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Drop the per-GPR dirty checks when synchronizing GPRs to the GHCB, the
GRPs' dirty bits are set from time zero and never cleared, i.e. will
always be seen as dirty. The obvious alternative would be to clear
the dirty bits when appropriate, but removing the dirty checks is
desirable as it allows reverting GPR dirty+available tracking, which
adds overhead to all flavors of x86 VMs.
Note, unconditionally writing the GPRs in the GHCB is tacitly allowed
by the GHCB spec, which allows the hypervisor (or guest) to provide
unnecessary info; it's the guest's responsibility to consume only what
it needs (the hypervisor is untrusted after all).
The guest and hypervisor can supply additional state if desired but
must not rely on that additional state being provided.
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Fixes: 291bd20d5d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210122235049.3107620-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Even when we are outside the nested guest, some vmcs02 fields
may not be in sync vs vmcs12. This is intentional, even across
nested VM-exit, because the sync can be delayed until the nested
hypervisor performs a VMCLEAR or a VMREAD/VMWRITE that affects those
rarely accessed fields.
However, during KVM_GET_NESTED_STATE, the vmcs12 has to be up to date to
be able to restore it. To fix that, call copy_vmcs02_to_vmcs12_rare()
before the vmcs12 contents are copied to userspace.
Fixes: 7952d769c2 ("KVM: nVMX: Sync rarely accessed guest fields only when needed")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210114205449.8715-2-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On VMX, if we exit and then re-enter immediately without leaving
the vmx_vcpu_run() function, the kvm_entry event is not logged.
That means we will see one (or more) kvm_exit, without its (their)
corresponding kvm_entry, as shown here:
CPU-1979 [002] 89.871187: kvm_entry: vcpu 1
CPU-1979 [002] 89.871218: kvm_exit: reason MSR_WRITE
CPU-1979 [002] 89.871259: kvm_exit: reason MSR_WRITE
It also seems possible for a kvm_entry event to be logged, but then
we leave vmx_vcpu_run() right away (if vmx->emulation_required is
true). In this case, we will have a spurious kvm_entry event in the
trace.
Fix these situations by moving trace_kvm_entry() inside vmx_vcpu_run()
(where trace_kvm_exit() already is).
A trace obtained with this patch applied looks like this:
CPU-14295 [000] 8388.395387: kvm_entry: vcpu 0
CPU-14295 [000] 8388.395392: kvm_exit: reason MSR_WRITE
CPU-14295 [000] 8388.395393: kvm_entry: vcpu 0
CPU-14295 [000] 8388.395503: kvm_exit: reason EXTERNAL_INTERRUPT
Of course, not calling trace_kvm_entry() in common x86 code any
longer means that we need to adjust the SVM side of things too.
Signed-off-by: Lorenzo Brescia <lorenzo.brescia@edu.unito.it>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Message-Id: <160873470698.11652.13483635328769030605.stgit@Wayrath>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The injection process of smi has two steps:
Qemu KVM
Step1:
cpu->interrupt_request &= \
~CPU_INTERRUPT_SMI;
kvm_vcpu_ioctl(cpu, KVM_SMI)
call kvm_vcpu_ioctl_smi() and
kvm_make_request(KVM_REQ_SMI, vcpu);
Step2:
kvm_vcpu_ioctl(cpu, KVM_RUN, 0)
call process_smi() if
kvm_check_request(KVM_REQ_SMI, vcpu) is
true, mark vcpu->arch.smi_pending = true;
The vcpu->arch.smi_pending will be set true in step2, unfortunately if
vcpu paused between step1 and step2, the kvm_run->immediate_exit will be
set and vcpu has to exit to Qemu immediately during step2 before mark
vcpu->arch.smi_pending true.
During VM migration, Qemu will get the smi pending status from KVM using
KVM_GET_VCPU_EVENTS ioctl at the downtime, then the smi pending status
will be lost.
Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Signed-off-by: Shengen Zhuang <zhuangshengen@huawei.com>
Message-Id: <20210118084720.1585-1-jianjay.zhou@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since we know vPMU will not work properly when (1) the guest bit_width(s)
of the [gp|fixed] counters are greater than the host ones, or (2) guest
requested architectural events exceeds the range supported by the host, so
we can setup a smaller left shift value and refresh the guest cpuid entry,
thus fixing the following UBSAN shift-out-of-bounds warning:
shift exponent 197 is too large for 64-bit type 'long long unsigned int'
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:120
ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
__ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
intel_pmu_refresh.cold+0x75/0x99 arch/x86/kvm/vmx/pmu_intel.c:348
kvm_vcpu_after_set_cpuid+0x65a/0xf80 arch/x86/kvm/cpuid.c:177
kvm_vcpu_ioctl_set_cpuid2+0x160/0x440 arch/x86/kvm/cpuid.c:308
kvm_arch_vcpu_ioctl+0x11b6/0x2d70 arch/x86/kvm/x86.c:4709
kvm_vcpu_ioctl+0x7b9/0xdb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3386
vfs_ioctl fs/ioctl.c:48 [inline]
__do_sys_ioctl fs/ioctl.c:753 [inline]
__se_sys_ioctl fs/ioctl.c:739 [inline]
__x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported-by: syzbot+ae488dc136a4cc6ba32b@syzkaller.appspotmail.com
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20210118025800.34620-1-like.xu@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add compile-time asserts in rsvd_bits() to guard against KVM passing in
garbage hardcoded values, and cap the upper bound at '63' for dynamic
values to prevent generating a mask that would overflow a u64.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210113204515.3473079-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The documentation classifies KVM_ENABLE_CAP with KVM_CAP_ENABLE_CAP_VM
as a vcpu ioctl, which is incorrect. Fix it by specifying it as a VM
ioctl.
Fixes: e5d83c74a5 ("kvm: make KVM_CAP_ENABLE_CAP_VM architecture agnostic")
Signed-off-by: Quentin Perret <qperret@google.com>
Message-Id: <20210108165349.747359-1-qperret@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM/arm64 fixes for 5.11, take #2
- Don't allow tagged pointers to point to memslots
- Filter out ARMv8.1+ PMU events on v8.0 hardware
- Hide PMU registers from userspace when no PMU is configured
- More PMU cleanups
- Don't try to handle broken PSCI firmware
- More sys_reg() to reg_to_encoding() conversions
Pull crypto fix from Herbert Xu:
"Fix a regression in the cesa driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: marvel/cesa - Fix tdma descriptor on 64-bit
Following RFC 6554 [1], the current order of fields is wrong for big
endian definition. Indeed, here is how the header looks like:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Next Header | Hdr Ext Len | Routing Type | Segments Left |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CmprI | CmprE | Pad | Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
This patch reorders fields so that big endian definition is now correct.
[1] https://tools.ietf.org/html/rfc6554#section-3
Fixes: cfa933d938 ("include: uapi: linux: add rpl sr header definition")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
NanoPi R2S is headless, so rightly does not enable any of the display
interface hardware, which currently provokes an obnoxious error in the
boot log from the fake DRM device failing to find anything to bind to.
It probably isn't *too* hard to obviate the fake device shenanigans
entirely with a bit of driver reshuffling, but for now let's just
disable it here to shut up the spurious error.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/c4553dfad1ad6792c4f22454c135ff55de77e2d6.1611186099.git.robin.murphy@arm.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
This document was written a long time ago. Update it.
[1] Drop the version information
The range of the supported GCC versions are always changing. The
current minimal GCC version is 4.9, and commit 1e860048c5
("gcc-plugins: simplify GCC plugin-dev capability test") removed the
old code accordingly.
We do not need to mention specific version ranges like "all gcc versions
from 4.5 to 6.0" since we forget to update the documentation when we
raise the minimal compiler version.
[2] Drop the C compiler statements
Since commit 77342a02ff ("gcc-plugins: drop support for GCC <= 4.7")
the GCC plugin infrastructure only supports g++.
[3] Drop supported architectures
As of v5.11-rc4, the infrastructure supports more architectures;
arm, arm64, mips, powerpc, riscv, s390, um, and x86. (just grep
"select HAVE_GCC_PLUGINS") Again, we miss to update this document when a
new architecture is supported. Let's just say "only some architectures".
[4] Update the apt-get example
We are now discussing to bump the minimal version to GCC 5. The GCC 4.9
support will be removed sooner or later. Change the package example to
gcc-10-plugin-dev while we are here.
[5] Update the build target
Since commit ce2fd53a10 ("kbuild: descend into scripts/gcc-plugins/
via scripts/Makefile"), "make gcc-plugins" is not supported.
"make scripts" builds all the enabled plugins, including some other
tools.
[6] Update the steps for adding a new plugin
At first, all CONFIG options for GCC plugins were located in arch/Kconfig.
After commit 45332b1bdf ("gcc-plugins: split out Kconfig entries to
scripts/gcc-plugins/Kconfig"), scripts/gcc-plugins/Kconfig became the
central place to collect plugin CONFIG options. In my understanding,
this requirement no longer exists because commit 9f671e5815 ("security:
Create "kernel hardening" config area") moved some of plugin CONFIG
options to another file. Find an appropriate place to add the new CONFIG.
The sub-directory support was never used by anyone, and removed by
commit c17d6179ad ("gcc-plugins: remove unused GCC_PLUGIN_SUBDIR").
Remove the useless $(src)/ prefix.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Recently userspace has started making more use of SW_TABLET_MODE
(when an input-dev reports this).
Specifically recent GNOME3 versions will:
1. When SW_TABLET_MODE is reported and is reporting 0:
1.1 Disable accelerometer-based screen auto-rotation
1.2 Disable automatically showing the on-screen keyboard when a
text-input field is focussed
2. When SW_TABLET_MODE is reported and is reporting 1:
2.1 Ignore input-events from the builtin keyboard and touchpad
(this is for 360° hinges style 2-in-1s where the keyboard and
touchpads are accessible on the back of the tablet when folded
into tablet-mode)
This means that claiming to support SW_TABLET_MODE when it does not
actually work / reports correct values has bad side-effects.
The check in the hp-wmi code which is used to decide if the input-dev
should claim SW_TABLET_MODE support, only checks if the
HPWMI_HARDWARE_QUERY is supported. It does *not* check if the hardware
actually is capable of reporting SW_TABLET_MODE.
This leads to the hp-wmi input-dev claiming SW_TABLET_MODE support,
while in reality it will always report 0 as SW_TABLET_MODE value.
This has been seen on a "HP ENVY x360 Convertible 15-cp0xxx" and
this likely is the case on a whole lot of other HP models.
This problem causes both auto-rotation and on-screen keyboard
support to not work on affected x360 models.
There is no easy fix for this, but since userspace expects
SW_TABLET_MODE reporting to be reliable when advertised it is
better to not claim/report SW_TABLET_MODE support at all, then
to claim to support it while it does not work.
To avoid the mentioned problems, add a new enable_tablet_mode_sw
module-parameter which defaults to false.
Note I've made this an int using the standard -1=auto, 0=off, 1=on
triplett, with the hope that in the future we can come up with a
better way to detect SW_TABLET_MODE support. ATM the default
auto option just does the same as off.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1918255
Cc: Stefan Brüns <stefan.bruens@rwth-aachen.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Mark Gross <mgross@linux.intel.com>
Link: https://lore.kernel.org/r/20210120124941.73409-1-hdegoede@redhat.com
Remove duplicated helper functions to parse opaque XDR objects
and place inside new file net/sunrpc/auth_gss/auth_gss_internal.h.
In the new file carry the license and copyright from the source file
net/sunrpc/auth_gss/auth_gss.c. Finally, update the comment inside
include/linux/sunrpc/xdr.h since lockd is not the only user of
struct xdr_netobj.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
After commit 36e2c7421f ("fs: don't allow splice read/write
without explicit ops") sendfile() could no longer send data
from a real file to a pipe, breaking for example certain cgit
setups (e.g. when running behind fcgiwrap), because in this
case cgit will try to do exactly this: sendfile() to a pipe.
Fix this by using iter_file_splice_write for the splice_write
method of pipes, as suggested by Christoph.
Cc: stable@vger.kernel.org
Fixes: 36e2c7421f ("fs: don't allow splice read/write without explicit ops")
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
layer to use write_iter. Fix the redirected_tty_write declaration
also in n_tty and change the comparisons to use write_iter instead of
write.
[ Also moved the declaration of redirected_tty_write() to the proper
location in a header file. The reason for the bug was the bogus extern
declaration in n_tty.c silently not matching the changed definition in
tty_io.c, and because it wasn't in a shared header file, there was no
cross-checking of the declaration.
Sami noticed because Clang's Control Flow Integrity checking ended up
incidentally noticing the inconsistent declaration. - Linus ]
Fixes: 9bb48c82ac ("tty: implement write_iter")
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull printk fix from Petr Mladek:
"The fix of a potential buffer overflow in 5.11-rc5 introduced another
one. The trailing '\0' might be written up to the message "len" past
the buffer. Fortunately, it is not that easy to hit.
Most readers use 1kB buffers for a single message. Typical messages
fit into the temporary buffer with enough reserve.
Also readers do not rely on the '\0'. It is related to the previous
fix. Some readers required the space for the trailing '\0'. We decided
to write it there to avoid such regressions in the future.
The most realistic victims are dumpers using kmsg_dump_get_buffer().
They are filling the entire buffer with as many messages as possible.
They are typically used when handling panic()"
* tag 'printk-for-5.11-urgent-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: fix string termination for record_print_text()
When setting up a device, we can krealloc the config->socks array to add
new sockets to the configuration. However if we happen to get a IO
request in at this point even though we aren't setup we could hit a UAF,
as we deref config->socks without any locking, assuming that the
configuration was setup already and that ->socks is safe to access it as
we have a reference on the configuration.
But there's nothing really preventing IO from occurring at this point of
the device setup, we don't want to incur the overhead of a lock to
access ->socks when it will never change while the device is running.
To fix this UAF scenario simply freeze the queue if we are adding
sockets. This will protect us from this particular case without adding
any additional overhead for the normal running case.
Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
After a sudden power failure we may end up with a space cache on disk that
is not valid and needs to be rebuilt from scratch.
If that happens, during log replay when we attempt to pin an extent buffer
from a log tree, at btrfs_pin_extent_for_log_replay(), we do not wait for
the space cache to be rebuilt through the call to:
btrfs_cache_block_group(cache, 1);
That is because that only waits for the task (work queue job) that loads
the space cache to change the cache state from BTRFS_CACHE_FAST to any
other value. That is ok when the space cache on disk exists and is valid,
but when the cache is not valid and needs to be rebuilt, it ends up
returning as soon as the cache state changes to BTRFS_CACHE_STARTED (done
at caching_thread()).
So this means that we can end up trying to unpin a range which is not yet
marked as free in the block group. This results in the call to
btrfs_remove_free_space() to return -EINVAL to
btrfs_pin_extent_for_log_replay(), which in turn makes the log replay fail
as well as mounting the filesystem. More specifically the -EINVAL comes
from free_space_cache.c:remove_from_bitmap(), because the requested range
is not marked as free space (ones in the bitmap), we have the following
condition triggered:
static noinline int remove_from_bitmap(struct btrfs_free_space_ctl *ctl,
(...)
if (ret < 0 || search_start != *offset)
return -EINVAL;
(...)
It's the "search_start != *offset" that results in the condition being
evaluated to true.
When this happens we got the following in dmesg/syslog:
[72383.415114] BTRFS: device fsid 32b95b69-0ea9-496a-9f02-3f5a56dc9322 devid 1 transid 1432 /dev/sdb scanned by mount (3816007)
[72383.417837] BTRFS info (device sdb): disk space caching is enabled
[72383.418536] BTRFS info (device sdb): has skinny extents
[72383.423846] BTRFS info (device sdb): start tree-log replay
[72383.426416] BTRFS warning (device sdb): block group 30408704 has wrong amount of free space
[72383.427686] BTRFS warning (device sdb): failed to load free space cache for block group 30408704, rebuilding it now
[72383.454291] BTRFS: error (device sdb) in btrfs_recover_log_trees:6203: errno=-22 unknown (Failed to pin buffers while recovering log root tree.)
[72383.456725] BTRFS: error (device sdb) in btrfs_replay_log:2253: errno=-22 unknown (Failed to recover log tree)
[72383.460241] BTRFS error (device sdb): open_ctree failed
We also mark the range for the extent buffer in the excluded extents io
tree. That is fine when the space cache is valid on disk and we can load
it, in which case it causes no problems.
However, for the case where we need to rebuild the space cache, because it
is either invalid or it is missing, having the extent buffer range marked
in the excluded extents io tree leads to a -EINVAL failure from the call
to btrfs_remove_free_space(), resulting in the log replay and mount to
fail. This is because by having the range marked in the excluded extents
io tree, the caching thread ends up never adding the range of the extent
buffer as free space in the block group since the calls to
add_new_free_space(), called from load_extent_tree_free(), filter out any
ranges that are marked as excluded extents.
So fix this by making sure that during log replay we wait for the caching
task to finish completely when we need to rebuild a space cache, and also
drop the need to mark the extent buffer range in the excluded extents io
tree, as well as clearing ranges from that tree at
btrfs_finish_extent_commit().
This started to happen with some frequency on large filesystems having
block groups with a lot of fragmentation since the recent commit
e747853cae ("btrfs: load free space cache asynchronously"), but in
fact the issue has been there for years, it was just much less likely
to happen.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This effectively reverts commit d5c8238849 ("btrfs: convert
data_seqcount to seqcount_mutex_t").
While running fstests on 32 bits test box, many tests failed because of
warnings in dmesg. One of those warnings (btrfs/003):
[66.441317] WARNING: CPU: 6 PID: 9251 at include/linux/seqlock.h:279 btrfs_remove_chunk+0x58b/0x7b0 [btrfs]
[66.441446] CPU: 6 PID: 9251 Comm: btrfs Tainted: G O 5.11.0-rc4-custom+ #5
[66.441449] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
[66.441451] EIP: btrfs_remove_chunk+0x58b/0x7b0 [btrfs]
[66.441472] EAX: 00000000 EBX: 00000001 ECX: c576070c EDX: c6b15803
[66.441475] ESI: 10000000 EDI: 00000000 EBP: c56fbcfc ESP: c56fbc70
[66.441477] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010246
[66.441481] CR0: 80050033 CR2: 05c8da20 CR3: 04b20000 CR4: 00350ed0
[66.441485] Call Trace:
[66.441510] btrfs_relocate_chunk+0xb1/0x100 [btrfs]
[66.441529] ? btrfs_lookup_block_group+0x17/0x20 [btrfs]
[66.441562] btrfs_balance+0x8ed/0x13b0 [btrfs]
[66.441586] ? btrfs_ioctl_balance+0x333/0x3c0 [btrfs]
[66.441619] ? __this_cpu_preempt_check+0xf/0x11
[66.441643] btrfs_ioctl_balance+0x333/0x3c0 [btrfs]
[66.441664] ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[66.441683] btrfs_ioctl+0x414/0x2ae0 [btrfs]
[66.441700] ? __lock_acquire+0x35f/0x2650
[66.441717] ? lockdep_hardirqs_on+0x87/0x120
[66.441720] ? lockdep_hardirqs_on_prepare+0xd0/0x1e0
[66.441724] ? call_rcu+0x2d3/0x530
[66.441731] ? __might_fault+0x41/0x90
[66.441736] ? kvm_sched_clock_read+0x15/0x50
[66.441740] ? sched_clock+0x8/0x10
[66.441745] ? sched_clock_cpu+0x13/0x180
[66.441750] ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[66.441750] ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[66.441768] __ia32_sys_ioctl+0x165/0x8a0
[66.441773] ? __this_cpu_preempt_check+0xf/0x11
[66.441785] ? __might_fault+0x89/0x90
[66.441791] __do_fast_syscall_32+0x54/0x80
[66.441796] do_fast_syscall_32+0x32/0x70
[66.441801] do_SYSENTER_32+0x15/0x20
[66.441805] entry_SYSENTER_32+0x9f/0xf2
[66.441808] EIP: 0xab7b5549
[66.441814] EAX: ffffffda EBX: 00000003 ECX: c4009420 EDX: bfa91f5c
[66.441816] ESI: 00000003 EDI: 00000001 EBP: 00000000 ESP: bfa91e98
[66.441818] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000292
[66.441833] irq event stamp: 42579
[66.441835] hardirqs last enabled at (42585): [<c60eb065>] console_unlock+0x495/0x590
[66.441838] hardirqs last disabled at (42590): [<c60eafd5>] console_unlock+0x405/0x590
[66.441840] softirqs last enabled at (41698): [<c601b76c>] call_on_stack+0x1c/0x60
[66.441843] softirqs last disabled at (41681): [<c601b76c>] call_on_stack+0x1c/0x60
========================================================================
btrfs_remove_chunk+0x58b/0x7b0:
__seqprop_mutex_assert at linux/./include/linux/seqlock.h:279
(inlined by) btrfs_device_set_bytes_used at linux/fs/btrfs/volumes.h:212
(inlined by) btrfs_remove_chunk at linux/fs/btrfs/volumes.c:2994
========================================================================
The warning is produced by lockdep_assert_held() in
__seqprop_mutex_assert() if CONFIG_LOCKDEP is enabled.
And "olumes.c:2994 is btrfs_device_set_bytes_used() with mutex lock
fs_info->chunk_mutex held already.
After adding some debug prints, the cause was found that many
__alloc_device() are called with NULL @fs_info (during scanning ioctl).
Inside the function, btrfs_device_data_ordered_init() is expanded to
seqcount_mutex_init(). In this scenario, its second
parameter info->chunk_mutex is &NULL->chunk_mutex which equals
to offsetof(struct btrfs_fs_info, chunk_mutex) unexpectedly. Thus,
seqcount_mutex_init() is called in wrong way. And later
btrfs_device_get/set helpers trigger lockdep warnings.
The device and filesystem object lifetimes are different and we'd have
to synchronize initialization of the btrfs_device::data_seqcount with
the fs_info, possibly using some additional synchronization. It would
still not prevent concurrent access to the seqcount lock when it's used
for read and initialization.
Commit d5c8238849 ("btrfs: convert data_seqcount to seqcount_mutex_t")
does not mention a particular problem being fixed so revert should not
cause any harm and we'll get the lockdep warning fixed.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=210139
Reported-by: Erhard F <erhard_f@mailbox.org>
Fixes: d5c8238849 ("btrfs: convert data_seqcount to seqcount_mutex_t")
CC: stable@vger.kernel.org # 5.10
CC: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While running btrfs/011 in a loop I would often ASSERT() while trying to
add a new free space entry that already existed, or get an EEXIST while
adding a new block to the extent tree, which is another indication of
double allocation.
This occurs because when we do the free space tree population, we create
the new root and then populate the tree and commit the transaction.
The problem is when you create a new root, the root node and commit root
node are the same. During this initial transaction commit we will run
all of the delayed refs that were paused during the free space tree
generation, and thus begin to cache block groups. While caching block
groups the caching thread will be reading from the main root for the
free space tree, so as we make allocations we'll be changing the free
space tree, which can cause us to add the same range twice which results
in either the ASSERT(ret != -EEXIST); in __btrfs_add_free_space, or in a
variety of different errors when running delayed refs because of a
double allocation.
Fix this by marking the fs_info as unsafe to load the free space tree,
and fall back on the old slow method. We could be smarter than this,
for example caching the block group while we're populating the free
space tree, but since this is a serious problem I've opted for the
simplest solution.
CC: stable@vger.kernel.org # 4.9+
Fixes: a5ed918285 ("Btrfs: implement the free space B-tree")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Function kernel_kexec() is called with lock system_transition_mutex
held in reboot system call. While inside kernel_kexec(), it will
acquire system_transition_mutex agin. This will lead to dead lock.
The dead lock should be easily triggered, it hasn't caused any
failure report just because the feature 'kexec jump' is almost not
used by anyone as far as I know. An inquiry can be made about who
is using 'kexec jump' and where it's used. Before that, let's simply
remove the lock operation inside CONFIG_KEXEC_JUMP ifdeffery scope.
Fixes: 55f2503c3b ("PM / reboot: Eliminate race between reboot and suspend")
Signed-off-by: Baoquan He <bhe@redhat.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Pingfan Liu <kernelfans@gmail.com>
Cc: 4.19+ <stable@vger.kernel.org> # 4.19+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Our intention was to only remove path kobjects whenever a device is
being set offline. However, one corner case was missing.
If a device is disabled and enabled (using the IOCTLs BIODASDDISABLE and
BIODASDENABLE respectively), the enabling process will call
dasd_eckd_reload_device() which itself calls dasd_eckd_read_conf() in
order to update path information. During that update,
dasd_eckd_clear_conf_data() clears all old data and also removes all
kobjects. This will leave us with an inconsistent state of path kobjects
and a subsequent path verification leads to a failing kobject creation.
Fix this by removing kobjects only in the context of offlining a device
as initially intended.
Fixes: 19508b2047 ("s390/dasd: Display FC Endpoint Security information via sysfs")
Reported-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Calling acpi_thermal_check() from acpi_thermal_notify() directly
is problematic if _TMP triggers Notify () on the thermal zone for
which it has been evaluated (which happens on some systems), because
it causes a new acpi_thermal_notify() invocation to be queued up
every time and if that takes place too often, an indefinite number of
pending work items may accumulate in kacpi_notify_wq over time.
Besides, it is not really useful to queue up a new invocation of
acpi_thermal_check() if one of them is pending already.
For these reasons, rework acpi_thermal_notify() to queue up a thermal
check instead of calling acpi_thermal_check() directly and only allow
one thermal check to be pending at a time. Moreover, only allow one
acpi_thermal_check_fn() instance at a time to run
thermal_zone_device_update() for one thermal zone and make it return
early if it sees other instances running for the same thermal zone.
While at it, fold acpi_thermal_check() into acpi_thermal_check_fn(),
as it is only called from there after the other changes made here.
[This issue appears to have been exposed by commit 6d25be5782
("sched/core, workqueues: Distangle worker accounting from rq
lock"), but it is unclear why it was not visible earlier.]
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=208877
Reported-by: Stephen Berman <stephen.berman@gmx.net>
Diagnosed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Stephen Berman <stephen.berman@gmx.net>
Cc: All applicable <stable@vger.kernel.org>
Commit 8765c5ba19 ("ACPI / scan: Rework modalias creation when
"compatible" is present") may create two "MODALIAS=" in one uevent
file if specific conditions are met.
This breaks systemd-udevd, which assumes each "key" in one uevent file
to be unique. The internal implementation of systemd-udevd overwrites
the first MODALIAS with the second one, so its kmod rule doesn't load
the driver for the first MODALIAS.
So if both the ACPI modalias and the OF modalias are present, use the
latter to ensure that there will be only one MODALIAS.
Link: https://github.com/systemd/systemd/pull/18163
Suggested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Fixes: 8765c5ba19 ("ACPI / scan: Rework modalias creation when "compatible" is present")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: 4.1+ <stable@vger.kernel.org> # 4.1+
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We've had several reports of broken dependencies. The 'right' fix is
to revisit the module dependencies as suggested by Arnd Bergmann. This
is WIP at https://github.com/thesofproject/linux/pull/2683. Since this
is taking longer than expected, I am only sharing quick fixes for now.
Pierre-Louis Bossart (2):
ASoC: SOF: Intel: soundwire: fix select/depend unmet dependencies
ASoC: SOF: SND_INTEL_DSP_CONFIG dependency
sound/soc/sof/intel/Kconfig | 3 ++-
sound/soc/sof/sof-acpi-dev.c | 11 ++++++-----
sound/soc/sof/sof-pci-dev.c | 10 ++++++----
3 files changed, 14 insertions(+), 10 deletions(-)
--
2.25.1
In D3 resume flow, avoid the following race where sending
packets before updating the sequence number (sequence
number received from the wowlan status command response):
Thread 1:
__iwl_mvm_resume clears IWL_MVM_STATUS_IN_D3 and is cut
by thread 2 before reaching iwl_mvm_query_wakeup_reasons.
Thread 2:
iwl_mvm_mac_itxq_xmit calls iwl_mvm_tx_skb since
IWL_MVM_STATUS_IN_D3 is not set using a wrong sequence number.
Thread 1:
__iwl_mvm_resume continues and calls iwl_mvm_query_wakeup_reasons
updating the sequence number received from the firmware.
The next packet that will be sent now will cause sysassert 0x1096.
Fix the bug by moving 'clear IWL_MVM_STATUS_IN_D3' to after
sending the wowlan status command and updating the sequence
number.
Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210122144849.fe927ec939c6.I103d3321fb55da7e6c6c51582cfadf94eb8b6c58@changeid
Information pid/vid of WSDA-200-USB, Lord corporation company:
vid: 199b
pid: ba30
Signed-off-by: Pho Tran <pho.tran@silabs.com>
[ johan: amend comment with product name ]
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
If we spin for a long time in memory reads that (for some reason in
hardware) take a long time, then we'll eventually get messages such
as
watchdog: BUG: soft lockup - CPU#2 stuck for 24s! [kworker/2:2:272]
This is because the reading really does take a very long time, and
we don't schedule, so we're hogging the CPU with this task, at least
if CONFIG_PREEMPT is not set, e.g. with CONFIG_PREEMPT_VOLUNTARY=y.
Previously I misinterpreted the situation and thought that this was
only going to happen if we had interrupts disabled, and then fixed
this (which is good anyway, however), but that didn't always help;
looking at it again now I realized that the spin unlock will only
reschedule if CONFIG_PREEMPT is used.
In order to avoid this issue, change the code to cond_resched() if
we've been spinning for too long here.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fixes: 04516706bb ("iwlwifi: pcie: limit memory read spin time")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210115130253.217a9d6a6a12.If964cb582ab0aaa94e81c4ff3b279eaafda0fd3f@changeid
I noticed that the flow that triggers an NMI on the firmware
for old devices (tested on 7265) doesn't work.
Apparently, the firmware / device is still in low power when
we write the register that triggers the NMI. We call the
"grab_nic_access" function to make sure the device is awake
but that wasn't enough. I played with this and noticed that
if we wait 1 ms after the device reports it is awake before
we write to the NMI register, the device always sees our
write and the firmware gets properly asserted.
Triggering an NMI to the firmware can be done with the
debugfs hook:
echo 1 > /sys/kernel/debug/iwlwifi/0000\:00\:03.0/iwlmvm/fw_nmi
What happened before is that the firmware would just stall
without running its NMI routine. Because of that the driver
wouldn't get the "firmware crashed" interrupt. After a while
the driver would notice that the firmware is not responding
to some command and it would read the error data from the
firmware, but this data is populated in the NMI service
routine in the firmware which was not called. So in the logs
it looked like:
iwlwifi 0000:00:03.0: Error sending REPLY_ERROR: time out after 2000ms.
iwlwifi 0000:00:03.0: Current CMD queue read_ptr 33 write_ptr 34
iwlwifi 0000:00:03.0: Loaded firmware version: 29.09bd31e1.0 7265D-29.ucode
iwlwifi 0000:00:03.0: 0x00000000 | ADVANCED_SYSASSERT
iwlwifi 0000:00:03.0: 0x00000000 | trm_hw_status0
iwlwifi 0000:00:03.0: 0x00000000 | trm_hw_status1
iwlwifi 0000:00:03.0: 0x00000000 | branchlink2
iwlwifi 0000:00:03.0: 0x00000000 | interruptlink1
iwlwifi 0000:00:03.0: 0x00000000 | interruptlink2
iwlwifi 0000:00:03.0: 0x00000000 | data1
iwlwifi 0000:00:03.0: 0x00000000 | data2
iwlwifi 0000:00:03.0: 0x00000000 | data3
iwlwifi 0000:00:03.0: 0x00000000 | beacon time
iwlwifi 0000:00:03.0: 0x00000000 | tsf low
...
With this fix, immediately after we trigger the NMI to the
firmware, we get the expected:
iwlwifi 0000:00:03.0: Microcode SW error detected. Restarting 0x2000000.
iwlwifi 0000:00:03.0: Start IWL Error Log Dump:
iwlwifi 0000:00:03.0: Status: 0x00000040, count: 6
iwlwifi 0000:00:03.0: Loaded firmware version: 29.09bd31e1.0 7265D-29.ucode
iwlwifi 0000:00:03.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN
iwlwifi 0000:00:03.0: 0x000002F1 | trm_hw_status0
iwlwifi 0000:00:03.0: 0x00000000 | trm_hw_status1
iwlwifi 0000:00:03.0: 0x00043D6C | branchlink2
iwlwifi 0000:00:03.0: 0x0004AFD6 | interruptlink1
iwlwifi 0000:00:03.0: 0x000008C4 | interruptlink2
iwlwifi 0000:00:03.0: 0x00000000 | data1
iwlwifi 0000:00:03.0: 0x00000080 | data2
iwlwifi 0000:00:03.0: 0x07030000 | data3
iwlwifi 0000:00:03.0: 0x003FD4C3 | beacon time
iwlwifi 0000:00:03.0: 0x00C22AC3 | tsf low
iwlwifi 0000:00:03.0: 0x00000000 | tsf hi
iwlwifi 0000:00:03.0: 0x00000000 | time gp1
iwlwifi 0000:00:03.0: 0x00C22AC3 | time gp2
iwlwifi 0000:00:03.0: 0x00000001 | uCode revision type
iwlwifi 0000:00:03.0: 0x0000001D | uCode version major
Notice the first line: "Microcode SW error detected:" which
is printed in the driver's ISR, which means that the driver
actually got an interrupt from the firmware saying that it
crashed. And then we have the properly populated error data.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210115130252.70e67cc75d88.I6615cad4361862e7f3c9f2d3cafb6a8c61e16781@changeid
MT8192 determines the I2S clock rates according to the sampling rates.
There is only 1 set of I2S in between MT8192 and RT5682. If playing and
capturing via RT5682 in different sampling rates, the I2S data will be
corrupted.
Adds format constraints to the corresponding DAI links to make sure the
sampling rates are symmetric.
Fixes: 18b13ff23f ("ASoC: mediatek: mt8192: add machine driver with mt6359, rt1015 and rt5682")
Signed-off-by: Tzung-Bi Shih <tzungbi@google.com>
Link: https://lore.kernel.org/r/20210125061453.1056535-1-tzungbi@google.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The sof-pci-dev driver fails to link when built into the kernel
and CONFIG_SND_INTEL_DSP_CONFIG is set to =m:
arm-linux-gnueabi-ld: sound/soc/sof/sof-pci-dev.o: in function `sof_pci_probe':
sof-pci-dev.c:(.text+0x1c): undefined reference to `snd_intel_dsp_driver_probe'
As a temporary fix, use IS_REACHABLE to prevent the problem from
happening. A more complete solution is to move this code to
Intel-specific parts, restructure the drivers and Kconfig as discussed
with Arnd Bergmann and Takashi Iwai.
Fixes: 82d9d54a6c ("ALSA: hda: add Intel DSP configuration / probe code")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20210122005725.94163-3-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The LKP bot reports the following issue:
WARNING: unmet direct dependencies detected for SOUNDWIRE_INTEL
Depends on [m]: SOUNDWIRE [=m] && ACPI [=y] && SND_SOC [=y]
Selected by [y]:
- SND_SOC_SOF_INTEL_SOUNDWIRE [=y] && SOUND [=y] && !UML &&
SND [=y] && SND_SOC [=y] && SND_SOC_SOF_TOPLEVEL [=y] &&
SND_SOC_SOF_INTEL_TOPLEVEL [=y] && SND_SOC_SOF_INTEL_PCI [=y]
This comes from having tristates being configured independently, when
in practice the CONFIG_SOUNDWIRE needs to be aligned with the SOF
choices: when the SOF code is compiled as built-in, the
CONFIG_SOUNDWIRE also needs to be 'y'.
The easiest fix is to replace the 'depends' with a 'select' and have a
single user selection to activate SoundWire on Intel platforms. This
still allows regmap to be compiled independently as a module.
This is just a temporary fix, the select/depend usage will be
revisited and the SOF Kconfig re-organized, as suggested by Arnd
Bergman.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: a115ab9b8b ('ASoC: SOF: Intel: add build support for SoundWire')
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20210122005725.94163-2-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
LBM base address is measured in units of pixels per cycle.
That is 4 for 2711 (hvs5) and 2 for 2708.
We are wasting 75% of lbm by indexing without the scaling.
But we were also using too high a size for the lbm resulting
in partial corruption (right hand side) of vertically
scaled images, usually at 4K or lower resolutions with more layers.
The physical RAM of LBM on 2711 is 8 * 1920 * 16 * 12-bit
(pixels are stored 12-bits per component regardless of format).
The LBM address indexes work in units of pixels per clock,
so for 4 pixels per clock that means we have 32 * 1920 = 60K
Fixes: c54619b0bf ("drm/vc4: Add support for the BCM2711 HVS5")
Signed-off-by: Dom Cobley <popcornmix@gmail.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Tested-By: Lucas Nussbaum <lucas@debian.org>
Tested-By: Ryutaroh Matsumoto <ryutaroh@ict.e.titech.ac.jp>
Link: https://patchwork.freedesktop.org/patch/msgid/20210121105759.1262699-1-maxime@cerno.tech
Commit f0e386ee0c ("printk: fix buffer overflow potential for
print_text()") added string termination in record_print_text().
However it used the wrong base pointer for adding the terminator.
This led to a 0-byte being written somewhere beyond the buffer.
Use the correct base pointer when adding the terminator.
Fixes: f0e386ee0c ("printk: fix buffer overflow potential for print_text()")
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210124202728.4718-1-john.ogness@linutronix.de
Palm ejection stops working on some Elan and Synaptics touchpad after
commit 40d5bb8737 ("HID: multitouch: enable multi-input as a quirk for
some devices").
The commit changes the mt_class from MT_CLS_WIN_8 to
MT_CLS_WIN_8_FORCE_MULTI_INPUT, so MT_QUIRK_CONFIDENCE isn't applied
anymore.
So also apply the quirk since MT_CLS_WIN_8_FORCE_MULTI_INPUT is
essentially MT_CLS_WIN_8.
Fixes: 40d5bb8737 ("HID: multitouch: enable multi-input as a quirk for some devices")
Cc: stable@vger.kernel.org
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
We may lose edge interrupts for gpio banks other than the first gpio
bank as they are not always powered. Instead, we must use the padconf
interrupt as that is always powered.
Note that we still also use the gpio for reading the pin state as that
can't be done with the padconf device.
Signed-off-by: Tony Lindgren <tony@atomide.com>
Once we have called device_initialize(), we should use put_device() to
give up the reference on error, just like what we have done on failure
of device_add().
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In case of blk_mq_is_sbitmap_shared(), we should test QUEUE_FLAG_HCTX_ACTIVE against
q->queue_flags instead of BLK_MQ_S_TAG_ACTIVE.
So fix it.
Cc: John Garry <john.garry@huawei.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Fixes: f1b49fdc1c ("blk-mq: Record active_queues_shared_sbitmap per tag_set for when using shared sbitmap")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If a layoutget ends up being reordered w.r.t. a layoutreturn, e.g. due
to a layoutget-on-open not knowing a priori which file to lock, then we
must assume the layout is no longer being considered valid state by the
server.
Incrementally improve our ability to reject such states by using the
cached old stateid in conjunction with the plh_barrier to try to
identify them.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When we're scheduling a layoutreturn, we need to ignore any further
incoming layouts with sequence ids that are going to be affected by the
layout return.
Fixes: 44ea8dfce0 ("NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the server returns a new stateid that does not match the one in our
cache, then try to return the one we hold instead of just invalidating
it on the client side. This ensures that both client and server will
agree that the stateid is invalid.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the server returns a new stateid that does not match the one in our
cache, then pnfs_layout_process() will leak the layout segments returned
by pnfs_mark_layout_stateid_invalid().
Fixes: 9888d837f3 ("pNFS: Force a retry of LAYOUTGET if the stateid doesn't match our cache")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
This normally doesn't cause any extra harm, but it does mean that we'll
increment the eventfd notification count, if one has been registered
with the ring. This can confuse applications, when they see more
notifications on the eventfd side than are available in the ring.
Do the nice thing and only increment this count, if we actually posted
(or even overflowed) events.
Reported-and-tested-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ensure we match tasks that belong to a dead or dying task as well, as we
need to reap those in addition to those belonging to the exiting task.
Cc: stable@vger.kernel.org # 5.9+
Reported-by: Josef Grieb <josef.grieb@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull io_uring fixes from Jens Axboe:
"Still need a final cancelation fix that isn't quite done done,
expected in the next day or two. That said, this contains:
- Wakeup fix for IOPOLL requests
- SQPOLL split close op handling fix
- Ensure that any use of io_uring fd itself is marked as inflight
- Short non-regular file read fix (Pavel)
- Fix up bad false positive warning (Pavel)
- SQPOLL fixes (Pavel)
- In-flight removal fix (Pavel)"
* tag 'io_uring-5.11-2021-01-24' of git://git.kernel.dk/linux-block:
io_uring: account io_uring internal files as REQ_F_INFLIGHT
io_uring: fix sleeping under spin in __io_clean_op
io_uring: fix short read retries for non-reg files
io_uring: fix SQPOLL IORING_OP_CLOSE cancelation state
io_uring: fix skipping disabling sqo on exec
io_uring: fix uring_flush in exit_files() warning
io_uring: fix false positive sqo warning on flush
io_uring: iopoll requests should also wake task ->in_idle state
Pull block fixes from Jens Axboe:
- NVMe pull request from Christoph:
- fix a status code in nvmet (Chaitanya Kulkarni)
- avoid double completions in nvme-rdma/nvme-tcp (Chao Leng)
- fix the CMB support to cope with NVMe 1.4 controllers (Klaus Jensen)
- fix PRINFO handling in the passthrough ioctl (Revanth Rajashekar)
- fix a double DMA unmap in nvme-pci
- lightnvm error path leak fix (Pan)
- MD pull request from Song:
- Flush request fix (Xiao)
* tag 'block-5.11-2021-01-24' of git://git.kernel.dk/linux-block:
lightnvm: fix memory leak when submit fails
nvme-pci: fix error unwind in nvme_map_data
nvme-pci: refactor nvme_unmap_data
md: Set prev_flush_start and flush_bio in an atomic way
nvmet: set right status on error in id-ns handler
nvme-pci: allow use of cmb on v1.4 controllers
nvme-tcp: avoid request double completion for concurrent nvme_tcp_timeout
nvme-rdma: avoid request double completion for concurrent nvme_rdma_timeout
nvme: check the PRINFO bit before deciding the host buffer length
Merge misc fixes from Andrew Morton:
"18 patches.
Subsystems affected by this patch series: mm (pagealloc, memcg, kasan,
memory-failure, and highmem), ubsan, proc, and MAINTAINERS"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
MAINTAINERS: add a couple more files to the Clang/LLVM section
proc_sysctl: fix oops caused by incorrect command parameters
powerpc/mm/highmem: use __set_pte_at() for kmap_local()
mips/mm/highmem: use set_pte() for kmap_local()
mm/highmem: prepare for overriding set_pte_at()
sparc/mm/highmem: flush cache and TLB
mm: fix page reference leak in soft_offline_page()
ubsan: disable unsigned-overflow check for i386
kasan, mm: fix resetting page_alloc tags for HW_TAGS
kasan, mm: fix conflicts with init_on_alloc/free
kasan: fix HW_TAGS boot parameters
kasan: fix incorrect arguments passing in kasan_add_zero_shadow
kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow
mm: fix numa stats for thp migration
mm: memcg: fix memcg file_dirty numa stat
mm: memcg/slab: optimize objcg stock draining
mm: fix initialization of struct page for holes in memory layout
x86/setup: don't remove E820_TYPE_RAM for pfn 0
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc driver fixes for 5.11-rc5:
- habanalabs driver fixes
- phy driver fixes
- hwtracing driver fixes
- rtsx cardreader driver fix
All of these have been in linux-next with no reported issues"
* tag 'char-misc-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
misc: rtsx: init value of aspm_enabled
habanalabs: disable FW events on device removal
habanalabs: fix backward compatibility of idle check
habanalabs: zero pci counters packet before submit to FW
intel_th: pci: Add Alder Lake-P support
stm class: Fix module init return on allocation failure
habanalabs: prevent soft lockup during unmap
habanalabs: fix reset process in case of failures
habanalabs: fix dma_addr passed to dma_mmap_coherent
phy: mediatek: allow compile-testing the dsi phy
phy: cpcap-usb: Fix warning for missing regulator_disable
PHY: Ingenic: fix unconditional build of phy-ingenic-usb
Pull driver core fixes from Greg KH:
"Here are some small driver core fixes for 5.11-rc5 that resolve some
reported problems:
- revert of a -rc1 patch that was causing problems with some machines
- device link device name collision problem fix (busses only have to
name devices unique to their bus, not unique to all busses)
- kernfs splice bugfixes to resolve firmware loading problems for
Qualcomm systems.
- other tiny driver core fixes for minor issues reported.
All of these have been in linux-next with no reported problems"
* tag 'driver-core-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: Fix device link device name collision
driver core: Extend device_is_dependent()
kernfs: wire up ->splice_read and ->splice_write
kernfs: implement ->write_iter
kernfs: implement ->read_iter
Revert "driver core: Reorder devices on successful probe"
Driver core: platform: Add extra error check in devm_platform_get_irqs_affinity()
drivers core: Free dma_range_map when driver probe failed
Pull staging/IIO driver fixes from Greg KH:
"Here are some IIO driver fixes for 5.11-rc5 to resolve some reported
problems.
Nothing major, just a few small fixes, all of these have been in
linux-next for a while and full details are in the shortlog"
* tag 'staging-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: sx9310: Fix semtech,avg-pos-strength setting when > 16
iio: common: st_sensors: fix possible infinite loop in st_sensors_irq_thread
iio: ad5504: Fix setting power-down state
counter:ti-eqep: remove floor
drivers: iio: temperature: Add delay after the addressed reset command in mlx90632.c
iio: adc: ti_am335x_adc: remove omitted iio_kfifo_free()
dt-bindings: iio: accel: bma255: Fix bmc150/bmi055 compatible
iio: sx9310: Off by one in sx9310_read_thresh()
Pull tty/serial fixes from Greg KH:
"Here are three small tty/serial fixes for 5.11-rc5 to resolve reported
problems:
- two patches to fix up writing to ttys with splice
- mvebu-uart driver fix for reported problem
All of these have been in linux-next with no reported problems"
* tag 'tty-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: fix up hung_up_tty_write() conversion
tty: implement write_iter
serial: mvebu-uart: fix tx lost characters at power off
Pull USB fixes from Greg KH:
"Here are some small USB driver fixes for 5.11-rc5. They resolve:
- xhci issues for some reported problems
- ehci driver issue for one specific device
- USB gadget fixes for some reported problems
- cdns3 driver fixes for issues reported
- MAINTAINERS file update
- thunderbolt minor fix
All of these have been in linux-next with no reported issues"
* tag 'usb-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: bdc: Make bdc pci driver depend on BROKEN
xhci: tegra: Delay for disabling LFPS detector
xhci: make sure TRB is fully written before giving it to the controller
usb: udc: core: Use lock when write to soft_connect
USB: gadget: dummy-hcd: Fix errors in port-reset handling
usb: gadget: aspeed: fix stop dma register setting.
USB: ehci: fix an interrupt calltrace error
ehci: fix EHCI host controller initialization sequence
MAINTAINERS: update Peter Chen's email address
thunderbolt: Drop duplicated 0x prefix from format string
MAINTAINERS: Update address for Cadence USB3 driver
usb: cdns3: imx: improve driver .remove API
usb: cdns3: imx: fix can't create core device the second time issue
usb: cdns3: imx: fix writing read-only memory issue
The process_sysctl_arg() does not check whether val is empty before
invoking strlen(val). If the command line parameter () is incorrectly
configured and val is empty, oops is triggered.
For example:
"hung_task_panic=1" is incorrectly written as "hung_task_panic", oops is
triggered. The call stack is as follows:
Kernel command line: .... hung_task_panic
......
Call trace:
__pi_strlen+0x10/0x98
parse_args+0x278/0x344
do_sysctl_args+0x8c/0xfc
kernel_init+0x5c/0xf4
ret_from_fork+0x10/0x30
To fix it, check whether "val" is empty when "phram" is a sysctl field.
Error codes are returned in the failure branch, and error logs are
generated by parse_args().
Link: https://lkml.kernel.org/r/20210118133029.28580-1-nixiaoming@huawei.com
Fixes: 3db978d480 ("kernel/sysctl: support setting sysctl parameters from kernel command line")
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: <stable@vger.kernel.org> [5.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Building ubsan kernels even for compile-testing introduced these
warnings in my randconfig environment:
crypto/blake2b_generic.c:98:13: error: stack frame size of 9636 bytes in function 'blake2b_compress' [-Werror,-Wframe-larger-than=]
static void blake2b_compress(struct blake2b_state *S,
crypto/sha512_generic.c:151:13: error: stack frame size of 1292 bytes in function 'sha512_generic_block_fn' [-Werror,-Wframe-larger-than=]
static void sha512_generic_block_fn(struct sha512_state *sst, u8 const *src,
lib/crypto/curve25519-fiat32.c:312:22: error: stack frame size of 2180 bytes in function 'fe_mul_impl' [-Werror,-Wframe-larger-than=]
static noinline void fe_mul_impl(u32 out[10], const u32 in1[10], const u32 in2[10])
lib/crypto/curve25519-fiat32.c:444:22: error: stack frame size of 1588 bytes in function 'fe_sqr_impl' [-Werror,-Wframe-larger-than=]
static noinline void fe_sqr_impl(u32 out[10], const u32 in1[10])
Further testing showed that this is caused by
-fsanitize=unsigned-integer-overflow, but is isolated to the 32-bit x86
architecture.
The one in blake2b immediately overflows the 8KB stack area
architectures, so better ensure this never happens by disabling the
option for 32-bit x86.
Link: https://lkml.kernel.org/r/20210112202922.2454435-1-arnd@kernel.org
Link: https://lore.kernel.org/lkml/20201230154749.746641-1-arnd@kernel.org/
Fixes: d0a3ac549f ("ubsan: enable for all*config builds")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Marco Elver <elver@google.com>
Cc: George Popescu <georgepope@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
During testing kasan_populate_early_shadow and kasan_remove_zero_shadow,
if the shadow start and end address in kasan_remove_zero_shadow() is not
aligned to PMD_SIZE, the remain unaligned PTE won't be removed.
In the test case for kasan_remove_zero_shadow():
shadow_start: 0xffffffb802000000, shadow end: 0xffffffbfbe000000
3-level page table:
PUD_SIZE: 0x40000000 PMD_SIZE: 0x200000 PAGE_SIZE: 4K
0xffffffbf80000000 ~ 0xffffffbfbdf80000 will not be removed because in
kasan_remove_pud_table(), kasan_pmd_table(*pud) is true but the next
address is 0xffffffbfbdf80000 which is not aligned to PUD_SIZE.
In the correct condition, this should fallback to the next level
kasan_remove_pmd_table() but the condition flow always continue to skip
the unaligned part.
Fix by correcting the condition when next and addr are neither aligned.
Link: https://lkml.kernel.org/r/20210103135621.83129-1-lecopzer@gmail.com
Fixes: 0207df4fa1 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: YJ Chiang <yj.chiang@mediatek.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull irq fixes from Borislav Petkov:
- Fix a kernel panic in mips-cpu due to invalid irq domain hierarchy.
- Fix to not lose IPIs on bcm2836.
- Fix for a bogus marking of ITS devices as shared due to unitialized
stack variable.
- Clear a phantom interrupt on qcom-pdc to unblock suspend.
- Small cleanups, warning and build fixes.
* tag 'irq_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Export irq_check_status_bit()
irqchip/mips-cpu: Set IPI domain parent chip
irqchip/pruss: Simplify the TI_PRUSS_INTC Kconfig
irqchip/loongson-liointc: Fix build warnings
driver core: platform: Add extra error check in devm_platform_get_irqs_affinity()
irqchip/bcm2836: Fix IPI acknowledgement after conversion to handle_percpu_devid_irq
irqchip/irq-sl28cpld: Convert comma to semicolon
genirq/msi: Initialize msi_alloc_info before calling msi_domain_prepare_irqs()
Pull objtool fixes from Borislav Petkov:
- Adjust objtool to handle a recent binutils change to not generate
unused symbols anymore.
- Revert the fail-the-build-on-fatal-errors objtool strategy for now
due to the ever-increasing matrix of supported toolchains/plugins and
them causing too many such fatal errors currently.
- Do not add empty symbols to objdump's rbtree to accommodate clang
removing section symbols.
* tag 'objtool_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Don't fail on missing symbol table
objtool: Don't fail the kernel build on fatal errors
objtool: Don't add empty symbols to the rbtree
Pull scheduler fixes from Borislav Petkov:
- Correct the marking of kthreads which are supposed to run on a
specific, single CPU vs such which are affine to only one CPU, mark
per-cpu workqueue threads as such and make sure that marking
"survives" CPU hotplug. Fix CPU hotplug issues with such kthreads.
- A fix to not push away tasks on CPUs coming online.
- Have workqueue CPU hotplug code use cpu_possible_mask when breaking
affinity on CPU offlining so that pending workers can finish on newly
arrived onlined CPUs too.
- Dump tasks which haven't vacated a CPU which is currently being
unplugged.
- Register a special scale invariance callback which gets called on
resume from RAM to read out APERF/MPERF after resume and thus make
the schedutil scaling governor more precise.
* tag 'sched_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Relax the set_cpus_allowed_ptr() semantics
sched: Fix CPU hotplug / tighten is_per_cpu_kthread()
sched: Prepare to use balance_push in ttwu()
workqueue: Restrict affinity change to rescuer
workqueue: Tag bound workers with KTHREAD_IS_PER_CPU
kthread: Extract KTHREAD_IS_PER_CPU
sched: Don't run cpu-online with balance_push() enabled
workqueue: Use cpu_possible_mask instead of cpu_active_mask to break affinity
sched/core: Print out straggler tasks in sched_cpu_dying()
x86: PM: Register syscore_ops for scale invariance
Pull timer fixes from Borislav Petkov:
- Fix an integer overflow in the NTP RTC synchronization which led to
the latter happening every 2 seconds instead of the intended every 11
minutes.
- Get rid of now unused get_seconds().
* tag 'timers_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ntp: Fix RTC synchronization on 32-bit platforms
timekeeping: Remove unused get_seconds()
Pull x86 fixes from Borislav Petkov:
- Add a new Intel model number for Alder Lake
- Differentiate which aspects of the FPU state get saved/restored when
the FPU is used in-kernel and fix a boot crash on K7 due to early
MXCSR access before CR4.OSFXSR is even set.
- A couple of noinstr annotation fixes
- Correct die ID setting on AMD for users of topology information which
need the correct die ID
- A SEV-ES fix to handle string port IO to/from kernel memory properly
* tag 'x86_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add another Alder Lake CPU to the Intel family
x86/mmx: Use KFPU_387 for MMX string operations
x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
x86/topology: Make __max_die_per_package available unconditionally
x86: __always_inline __{rd,wr}msr()
x86/mce: Remove explicit/superfluous tracing
locking/lockdep: Avoid noinstr warning for DEBUG_LOCKDEP
locking/lockdep: Cure noinstr fail
x86/sev: Fix nonistr violation
x86/entry: Fix noinstr fail
x86/cpu/amd: Set __max_die_per_package on AMD
x86/sev-es: Handle string port IO to kernel memory properly
Pull powerpc fixes from Michael Ellerman:
- Fix a bad interaction between the scv handling and the fallback L1D
flush, which could lead to user register corruption. Only affects
people using scv (~no one) on machines with old firmware that are
missing the L1D flush.
- Two small selftest fixes.
Thanks to Eirik Fuller, Libor Pechacek, Nicholas Piggin, Sandipan Das,
and Tulio Magno Quites Machado Filho.
* tag 'powerpc-5.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: fix scv entry fallback flush vs interrupt
selftests/powerpc: Only test lwm/stmw on big endian
selftests/powerpc: Fix exit status of pkey tests
Pull misc fixes from Christian Brauner:
- Jann reported sparse complaints because of a missing __user
annotation in a helper we added way back when we added
pidfd_send_signal() to avoid compat syscall handling. Fix it.
- Yanfei replaces a reference in a comment to the _do_fork() helper I
removed a while ago with a reference to the new kernel_clone()
replacement
- Alexander Guril added a simple coding style fix
* tag 'for-linus-2021-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
kthread: remove comments about old _do_fork() helper
Kernel: fork.c: Fix coding style: Do not use {} around single-line statements
signal: Add missing __user annotation to copy_siginfo_from_user_any
Pull cifs fixes from Steve French:
"An important signal handling patch for stable, and two small cleanup
patches"
* tag '5.11-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6:
cifs: do not fail __smb_send_rqst if non-fatal signals are pending
fs/cifs: Simplify bool comparison.
fs/cifs: Assign boolean values to a bool variable
There could be struct pages that are not backed by actual physical
memory. This can happen when the actual memory bank is not a multiple
of SECTION_SIZE or when an architecture does not register memory holes
reserved by the firmware as memblock.memory.
Such pages are currently initialized using init_unavailable_mem()
function that iterates through PFNs in holes in memblock.memory and if
there is a struct page corresponding to a PFN, the fields if this page
are set to default values and the page is marked as Reserved.
init_unavailable_mem() does not take into account zone and node the page
belongs to and sets both zone and node links in struct page to zero.
On a system that has firmware reserved holes in a zone above ZONE_DMA,
for instance in a configuration below:
# grep -A1 E820 /proc/iomem
7a17b000-7a216fff : Unknown E820 type
7a217000-7bffffff : System RAM
unset zone link in struct page will trigger
VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
because there are pages in both ZONE_DMA32 and ZONE_DMA (unset zone link
in struct page) in the same pageblock.
Update init_unavailable_mem() to use zone constraints defined by an
architecture to properly setup the zone link and use node ID of the
adjacent range in memblock.memory to set the node link.
Link: https://lkml.kernel.org/r/20210111194017.22696-3-rppt@kernel.org
Fixes: 73a6e474cb ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm: fix initialization of struct page for holes in memory layout", v3.
Commit 73a6e474cb ("mm: memmap_init: iterate over memblock regions
rather that check each PFN") exposed several issues with the memory map
initialization and these patches fix those issues.
Initially there were crashes during compaction that Qian Cai reported
back in April [1]. It seemed back then that the problem was fixed, but
a few weeks ago Andrea Arcangeli hit the same bug [2] and there was an
additional discussion at [3].
[1] https://lore.kernel.org/lkml/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw
[2] https://lore.kernel.org/lkml/20201121194506.13464-1-aarcange@redhat.com
[3] https://lore.kernel.org/mm-commits/20201206005401.qKuAVgOXr%akpm@linux-foundation.org
This patch (of 2):
The first 4Kb of memory is a BIOS owned area and to avoid its allocation
for the kernel it was not listed in e820 tables as memory. As the result,
pfn 0 was never recognised by the generic memory management and it is not
a part of neither node 0 nor ZONE_DMA.
If set_pfnblock_flags_mask() would be ever called for the pageblock
corresponding to the first 2Mbytes of memory, having pfn 0 outside of
ZONE_DMA would trigger
VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
Along with reserving the first 4Kb in e820 tables, several first pages are
reserved with memblock in several places during setup_arch(). These
reservations are enough to ensure the kernel does not touch the BIOS area
and it is not necessary to remove E820_TYPE_RAM for pfn 0.
Remove the update of e820 table that changes the type of pfn 0 and move
the comment describing why it was done to trim_low_memory_range() that
reserves the beginning of the memory.
Link: https://lkml.kernel.org/r/20210111194017.22696-2-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to actively cancel anything that introduces a potential circular
loop, where io_uring holds a reference to itself. If the file in question
is an io_uring file, then add the request to the inflight list.
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When an asynchronous interrupt calls irq_exit, it checks for softirqs
that may have been created, and runs them. Running softirqs enables
local irqs, which can replay pending interrupts causing recursion in
replay_soft_interrupts. This abridged trace shows how this can occur:
! NIP replay_soft_interrupts
LR interrupt_exit_kernel_prepare
Call Trace:
interrupt_exit_kernel_prepare (unreliable)
interrupt_return
--- interrupt: ea0 at __rb_reserve_next
NIP __rb_reserve_next
LR __rb_reserve_next
Call Trace:
ring_buffer_lock_reserve
trace_function
function_trace_call
ftrace_call
__do_softirq
irq_exit
timer_interrupt
! replay_soft_interrupts
interrupt_exit_kernel_prepare
interrupt_return
--- interrupt: ea0 at arch_local_irq_restore
This can not be prevented easily, because softirqs must not block hard
irqs, so it has to be dealt with.
The recursion is bounded by design in the softirq code because softirq
replay disables softirqs and loops around again to check for new
softirqs created while it ran, so that's not a problem.
However it does mess up interrupt replay state, causing superfluous
interrupts when the second replay_soft_interrupts clears a pending
interrupt, leaving it still set in the first call in the 'happened'
local variable.
Fix this by not caching a copy of irqs_happened across interrupt
handler calls.
Fixes: 3282a3da25 ("powerpc/64: Implement soft interrupt replay in C")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210123061244.2076145-1-npiggin@gmail.com
Upon receiving a cumulative ACK that changes the congestion state from
Disorder to Open, the TLP timer is not set. If the sender is app-limited,
it can only wait for the RTO timer to expire and retransmit.
The reason for this is that the TLP timer is set before the congestion
state changes in tcp_ack(), so we delay the time point of calling
tcp_set_xmit_timer() until after tcp_fastretrans_alert() returns and
remove the FLAG_SET_XMIT_TIMER from ack_flag when the RACK reorder timer
is set.
This commit has two additional benefits:
1) Make sure to reset RTO according to RFC6298 when receiving ACK, to
avoid spurious RTO caused by RTO timer early expires.
2) Reduce the xmit timer reschedule once per ACK when the RACK reorder
timer is set.
Fixes: df92c8394e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed")
Link: https://lore.kernel.org/netdev/1611311242-6675-1-git-send-email-yangpc@wangsu.com
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1611464834-23030-1-git-send-email-yangpc@wangsu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The TCP_USER_TIMEOUT is checked by the 0-window probe timer. As the
timer has backoff with a max interval of about two minutes, the
actual timeout for TCP_USER_TIMEOUT can be off by up to two minutes.
In this patch the TCP_USER_TIMEOUT is made more accurate by taking it
into account when computing the timer value for the 0-window probes.
This patch is similar to and builds on top of the one that made
TCP_USER_TIMEOUT accurate for RTOs in commit b701a99e43 ("tcp: Add
tcp_clamp_rto_to_user_timeout() helper to improve accuracy").
Fixes: 9721e709fa ("tcp: simplify window probe aborting on USER_TIMEOUT")
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210122191306.GA99540@localhost.localdomain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rasmus Villemoes says:
====================
fix and move definitions of MRP data structures
We unnecessarily included packet structures of MRP in a uAPI header.
Turns out that some of them were in fact broken due to lack of packing,
so let's take this chance to remove them completely. Leave it to user
space to create its own, correct definitions. Said structures are not
used in the user space <> kernel communication.
====================
Link: https://lore.kernel.org/r/20210121204037.61390-1-rasmus.villemoes@prevas.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
None of these are actually used in the kernel/userspace interface -
there's a userspace component of implementing MRP, and userspace will
need to construct certain frames to put on the wire, but there's no
reason the kernel should provide the relevant definitions in a UAPI
header.
In fact, some of those definitions were broken until previous commit,
so only keep the few that are actually referenced in the kernel code,
and move them to the br_private_mrp.h header.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wireshark says that the MRP test packets cannot be decoded - and the
reason for that is that there's a two-byte hole filled with garbage
between the "transitions" and "timestamp" members.
So Wireshark decodes the two garbage bytes and the top two bytes of
the timestamp written by the kernel as the timestamp value (which thus
fluctuates wildly), and interprets the lower two bytes of the
timestamp as a new (type, length) pair, which is of course broken.
Even though this makes the timestamp field in the struct unaligned, it
actually makes it end up on a 32 bit boundary in the frame as mandated
by the standard, since it is preceded by a two byte TLV header.
The struct definitions live under include/uapi/, but they are not
really part of any kernel<->userspace API/ABI, so fixing the
definitions by adding the packed attribute should not cause any
compatibility issues.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull mtd fixes from Miquel Raynal.
* 'mtd/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: omap: Use BCH private fields in the specific OOB layout
mtd: spinand: Fix MTD_OPS_AUTO_OOB requests
mtd: rawnand: intel: check the mtd name only after setting the variable
mtd: rawnand: nandsim: Fix the logic when selecting Hamming soft ECC engine
mtd: rawnand: gpmi: fix dst bit offset when extracting raw payload
Pull i2c fixes from Wolfram Sang:
"Another bunch of driver fixes"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: sprd: depend on COMMON_CLK to fix compile tests
Revert "i2c: imx: Remove unused .id_table support"
i2c: octeon: check correct size of maximum RECV_LEN packet
i2c: tegra: Create i2c_writesl_vi() to use with VI I2C for filling TX FIFO
i2c: bpmp-tegra: Ignore unknown I2C_M flags
i2c: tegra: Wait for config load atomically while in ISR
Pull SCSI fixes from James Bottomley:
"Twelve minor fixes, all in drivers or doc.
Most of the fixes are pretty obvious (although we had two goes to get
the UFS sysfs doc right) and the biggest change is in the ufs driver
which they've extensively tested"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ibmvfc: Set default timeout to avoid crash during migration
scsi: target: tcmu: Fix use-after-free of se_cmd->priv
scsi: fnic: Fix memleak in vnic_dev_init_devcmd2
scsi: libfc: Avoid invoking response handler twice if ep is already completed
scsi: scsi_transport_srp: Don't block target in failfast state
scsi: docs: ABI: sysfs-driver-ufs: Rectify table formatting
scsi: ufs: Fix tm request when non-fatal error happens
scsi: ufs: Fix livelock of ufshcd_clear_ua_wluns()
scsi: ibmvfc: Fix missing cast of ibmvfc_event pointer to u64 handle
scsi: ufs: ufshcd-pltfrm depends on HAS_IOMEM
scsi: megaraid_sas: Fix MEGASAS_IOC_FIRMWARE regression
scsi: docs: ABI: sysfs-driver-ufs: Add DeepSleep power mode
Pull kunit fixes from Shuah :
"Five fixes to the kunit tool and documentation from Daniel Latypov and
David Gow"
* tag 'linux-kselftest-kunit-fixes-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: tool: move kunitconfig parsing into __init__, make it optional
kunit: tool: fix minor typing issue with None status
kunit: tool: surface and address more typing issues
Documentation: kunit: include example of a parameterized test
kunit: tool: Fix spelling of "diagnostic" in kunit_parser
The recently introduced sample rate validation code seems causing a
problem on some devices; namely, after performing this, the bus gets
screwed and it influences even on other USB devices.
As a quick workaround, perform it only for the necessary devices;
currently MOTU devices are known to need the valid altset checks, so
filter out other devices.
Fixes: 93db51d06b ("ALSA: usb-audio: Check valid altsetting at parsing rates for UAC2/3")
Reported-by: Jamie Heilman <jamie@audible.transient.net>
BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1178203
Link: https://lore.kernel.org/r/20210123155842.22652-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The fix for a long-standing USB-audio bug required one more dependency
variable to be added to the hw constraints. Unfortunately I didn't
realize at debugging that the new addition may result in the overflow
of the dependency array of each snd_pcm_hw_rule (up to three plus a
sentinel), because USB-audio driver adds one more dependency only for
a certain device and bus, hence it works as is for many devices. But
in a bad case, a simple open always results in -EINVAL (with kernel
WARNING if CONFIG_SND_DEBUG is set) no matter what is passed.
Since the dependencies are real and unavoidable (USB-audio restricts
the hw_params per looping over the format/rate/channels combos), the
only good solution seems to raise the bar for one more dependency for
snd_pcm_hw_rule -- so does this patch: now the hw constraint
dependencies can be up to four.
Fixes: 506c203cc3 ("ALSA: usb-audio: Fix hw constraints dependencies")
Reported-by: Jamie Heilman <jamie@audible.transient.net>
Link: https://lore.kernel.org/r/20210123155730.22576-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
With commit 1e860048c5 ("gcc-plugins: simplify GCC plugin-dev
capability test") applied, this hunk can be way simplified because
now scripts/gcc-plugins/Kconfig only checks plugin-version.h
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
RHBZ 1848178
The original intent of returning an error in this function
in the patch:
"CIFS: Mask off signals when sending SMB packets"
was to avoid interrupting packet send in the middle of
sending the data (and thus breaking an SMB connection),
but we also don't want to fail the request for non-fatal
signals even before we have had a chance to try to
send it (the reported problem could be reproduced e.g.
by exiting a child process when the parent process was in
the midst of calling futimens to update a file's timestamps).
In addition, since the signal may remain pending when we enter the
sending loop, we may end up not sending the whole packet before
TCP buffers become full. In this case the code returns -EINTR
but what we need here is to return -ERESTARTSYS instead to
allow system calls to be restarted.
Fixes: b30c74c73c ("CIFS: Mask off signals when sending SMB packets")
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The octeontx2 hardware needs the buffer to be 128 byte aligned.
But in the current implementation of napi_alloc_frag(), it can't
guarantee the return address is 128 byte aligned even the request size
is a multiple of 128 bytes, so we have to request an extra 128 bytes and
use the PTR_ALIGN() to make sure that the buffer is aligned correctly.
Fixes: 7a36e4918e ("octeontx2-pf: Use the napi_alloc_frag() to alloc the pool buffers")
Reported-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Tested-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://lore.kernel.org/r/20210121070906.25380-1-haokexin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Change the bcm2835 maintainer info in order to handle subsequent SoCs.
After this get_maintainers.pl provides the proper maintainers for
irqchip-bcm2836.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
of_match_node() calls __of_match_node() which loops though the entries of
matches array. It stops when condition:
(matches->name[0] || matches->type[0] || matches->compatible[0]) is
false. Thus, add a null entry at the end of at91_soc_allowed_list[]
array.
Fixes: caab13b496 ("drivers: soc: atmel: Avoid calling at91_soc_init on non AT91 SoCs")
Cc: stable@vger.kernel.org #4.12+
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull device mapper fixes from Mike Snitzer:
- Fix DM integrity crash if "recalculate" used without "internal_hash"
- Fix DM integrity "recalculate" support to prevent recalculating
checksums if we use internal_hash or journal_hash with a key (e.g.
HMAC). Use of crypto as a means to prevent malicious corruption
requires further changes and was never a design goal for
dm-integrity's primary usecase of detecting accidental corruption.
- Fix a benign dm-crypt copy-and-paste bug introduced as part of a fix
that was merged for 5.11-rc4.
- Fix DM core's dm_get_device() to avoid filesystem lookup to get block
device (if possible).
* tag 'for-5.11/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: avoid filesystem lookup in dm_get_dev_t()
dm crypt: fix copy and paste bug in crypt_alloc_req_aead
dm integrity: conditionally disable "recalculate" feature
dm integrity: fix a crash if "recalculate" used without "internal_hash"
A toctou issue in `__cgroup_bpf_run_filter_getsockopt` can trigger a
WARN_ON_ONCE in a check of `copy_from_user`.
`*optlen` is checked to be non-negative in the individual getsockopt
functions beforehand. Changing `*optlen` in a race to a negative value
will result in a `copy_from_user(ctx.optval, optval, ctx.optlen)` with
`ctx.optlen` being a negative integer.
Fixes: 0d01da6afc ("bpf: implement getsockopt and setsockopt hooks")
Signed-off-by: Loris Reiff <loris.reiff@liblor.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20210122164232.61770-1-loris.reiff@liblor.ch
Pull more perf tools fixes from Arnaldo Carvalho de Melo:
- Fix id index used in Intel PT for heterogeneous systems
- Fix overrun issue in 'perf script' for dynamically-allocated PMU type
number
- Fix 'perf stat' metrics containing the 'duration_time' synthetic
event
- Fix system PMU 'perf stat' metrics
* tag 'perf-tools-fixes-v5.11-2-2021-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf script: Fix overrun issue for dynamically-allocated PMU type number
perf metricgroup: Fix system PMU metrics
perf metricgroup: Fix for metrics containing duration_time
perf evlist: Fix id index for heterogeneous systems
Pull arm64 fixes from Catalin Marinas:
- Correctly mask out bits 63:60 in a kernel tag check fault address
(specified as unknown by the architecture). Previously they were just
zeroed but for kernel pointers they need to be all ones.
- Fix a panic (unexpected kernel BRK exception) caused by kprobes being
reentered due to an interrupt.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kprobes: Fix Uexpected kernel BRK exception at EL1
kasan, arm64: fix pointer tags in KASAN reports
Pull ceph fixes from Ilya Dryomov:
"A patch to zero out sensitive cryptographic data and two minor
cleanups prompted by the fact that a bunch of code was moved in this
cycle"
* tag 'ceph-for-5.11-rc5' of git://github.com/ceph/ceph-client:
libceph: fix "Boolean result is used in bitwise operation" warning
libceph, ceph: disambiguate ceph_connection_operations handlers
libceph: zero out session key and connection secret
Pull typo fix from Mike Rapoport:
"Fix typo in comment of memblock_phys_alloc_try_nid()"
* tag 'fixes-2021-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
mm/memblock: Fix typo in comment of memblock_phys_alloc_try_nid()
Pull x86 platform driver fixes from Hans de Goede:
"A small collection of bug-fixes and model-specific quirks"
* tag 'platform-drivers-x86-v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: thinkpad_acpi: Add P53/73 firmware to fan_quirk_table for dual fan control
platform/x86: hp-wmi: Don't log a warning on HPWMI_RET_UNKNOWN_COMMAND errors
platform/x86: intel-vbtn: Drop HP Stream x360 Convertible PC 11 from allow-list
platform/x86: ideapad-laptop: Disable touchpad_switch for ELAN0634
platform/x86: amd-pmc: Fix CONFIG_DEBUG_FS check
platform/x86: thinkpad_acpi: correct palmsensor error checking
platform/x86: intel-vbtn: Support for tablet mode on Dell Inspiron 7352
platform/x86: touchscreen_dmi: Add swap-x-y quirk for Goodix touchscreen on Estar Beauty HD tablet
platform/x86: i2c-multi-instantiate: Don't create platform device for INT3515 ACPI nodes
platform/surface: SURFACE_PLATFORMS should depend on ACPI
platform/surface: surface_gpe: Fix non-PM_SLEEP build warnings
tools/power/x86/intel-speed-select: Set higher of cpuinfo_max_freq or base_frequency
tools/power/x86/intel-speed-select: Set scaling_max_freq to base_frequency
Sockets and other non-regular files may actually expect short reads to
happen, don't retry reads for them. Because non-reg files don't set
FMODE_BUF_RASYNC and so it won't do second/retry do_read, we can filter
out those cases after first do_read() attempt with ret>0.
Cc: stable@vger.kernel.org # 5.9+
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
IORING_OP_CLOSE is special in terms of cancelation, since it has an
intermediate state where we've removed the file descriptor but hasn't
closed the file yet. For that reason, it's currently marked with
IO_WQ_WORK_NO_CANCEL to prevent cancelation. This ensures that the op
is always run even if canceled, to prevent leaving us with a live file
but an fd that is gone. However, with SQPOLL, since a cancel request
doesn't carry any resources on behalf of the request being canceled, if
we cancel before any of the close op has been run, we can end up with
io-wq not having the ->files assigned. This can result in the following
oops reported by Joseph:
BUG: kernel NULL pointer dereference, address: 00000000000000d8
PGD 800000010b76f067 P4D 800000010b76f067 PUD 10b462067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 1 PID: 1788 Comm: io_uring-sq Not tainted 5.11.0-rc4 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:__lock_acquire+0x19d/0x18c0
Code: 00 00 8b 1d fd 56 dd 08 85 db 0f 85 43 05 00 00 48 c7 c6 98 7b 95 82 48 c7 c7 57 96 93 82 e8 9a bc f5 ff 0f 0b e9 2b 05 00 00 <48> 81 3f c0 ca 67 8a b8 00 00 00 00 41 0f 45 c0 89 04 24 e9 81 fe
RSP: 0018:ffffc90001933828 EFLAGS: 00010002
RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000d8
RBP: 0000000000000246 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff888106e8a140 R15: 00000000000000d8
FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000d8 CR3: 0000000106efa004 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
lock_acquire+0x31a/0x440
? close_fd_get_file+0x39/0x160
? __lock_acquire+0x647/0x18c0
_raw_spin_lock+0x2c/0x40
? close_fd_get_file+0x39/0x160
close_fd_get_file+0x39/0x160
io_issue_sqe+0x1334/0x14e0
? lock_acquire+0x31a/0x440
? __io_free_req+0xcf/0x2e0
? __io_free_req+0x175/0x2e0
? find_held_lock+0x28/0xb0
? io_wq_submit_work+0x7f/0x240
io_wq_submit_work+0x7f/0x240
io_wq_cancel_cb+0x161/0x580
? io_wqe_wake_worker+0x114/0x360
? io_uring_get_socket+0x40/0x40
io_async_find_and_cancel+0x3b/0x140
io_issue_sqe+0xbe1/0x14e0
? __lock_acquire+0x647/0x18c0
? __io_queue_sqe+0x10b/0x5f0
__io_queue_sqe+0x10b/0x5f0
? io_req_prep+0xdb/0x1150
? mark_held_locks+0x6d/0xb0
? mark_held_locks+0x6d/0xb0
? io_queue_sqe+0x235/0x4b0
io_queue_sqe+0x235/0x4b0
io_submit_sqes+0xd7e/0x12a0
? _raw_spin_unlock_irq+0x24/0x30
? io_sq_thread+0x3ae/0x940
io_sq_thread+0x207/0x940
? do_wait_intr_irq+0xc0/0xc0
? __ia32_sys_io_uring_enter+0x650/0x650
kthread+0x134/0x180
? kthread_create_worker_on_cpu+0x90/0x90
ret_from_fork+0x1f/0x30
Fix this by moving the IO_WQ_WORK_NO_CANCEL until _after_ we've modified
the fdtable. Canceling before this point is totally fine, and running
it in the io-wq context _after_ that point is also fine.
For 5.12, we'll handle this internally and get rid of the no-cancel
flag, as IORING_OP_CLOSE is the only user of it.
Cc: stable@vger.kernel.org
Fixes: b5dba59e0c ("io_uring: add support for IORING_OP_CLOSE")
Reported-by: "Abaci <abaci@linux.alibaba.com>"
Reviewed-and-tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fix rcu_sched trace from OP-TEE invoke
Replaces might_sleep() with a conditional call to cond_resched()
in order to avoid the rcu_sched trace in some corner cases.
* tag 'optee-rcu-sched-trace-for-v5.11' of git://git.linaro.org/people/jens.wiklander/linux-tee:
tee: optee: replace might_sleep with cond_resched
Link: https://lore.kernel.org/r/20210122074234.GA1074747@jade
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Now that we have KTHREAD_IS_PER_CPU to denote the critical per-cpu
tasks to retain during CPU offline, we can relax the warning in
set_cpus_allowed_ptr(). Any spurious kthread that wants to get on at
the last minute will get pushed off before it can run.
While during CPU online there is no harm, and actual benefit, to
allowing kthreads back on early, it simplifies hotplug code and fixes
a number of outstanding races.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Lai jiangshan <jiangshanlai@gmail.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103507.240724591@infradead.org
Prior to commit 1cf12e08bc ("sched/hotplug: Consolidate task
migration on CPU unplug") we'd leave any task on the dying CPU and
break affinity and force them off at the very end.
This scheme had to change in order to enable migrate_disable(). One
cannot wait for migrate_disable() to complete while stuck in
stop_machine(). Furthermore, since we need at the very least: idle,
hotplug and stop threads at any point before stop_machine, we can't
break affinity and/or push those away.
Under the assumption that all per-cpu kthreads are sanely handled by
CPU hotplug, the new code no long breaks affinity or migrates any of
them (which then includes the critical ones above).
However, there's an important difference between per-cpu kthreads and
kthreads that happen to have a single CPU affinity which is lost. The
latter class very much relies on the forced affinity breaking and
migration semantics previously provided.
Use the new kthread_is_per_cpu() infrastructure to tighten
is_per_cpu_kthread() and fix the hot-unplug problems stemming from the
change.
Fixes: 1cf12e08bc ("sched/hotplug: Consolidate task migration on CPU unplug")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103507.102416009@infradead.org
In preparation of using the balance_push state in ttwu() we need it to
provide a reliable and consistent state.
The immediate problem is that rq->balance_callback gets cleared every
schedule() and then re-set in the balance_push_callback() itself. This
is not a reliable signal, so add a variable that stays set during the
entire time.
Also move setting it before the synchronize_rcu() in
sched_cpu_deactivate(), such that we get guaranteed visibility to
ttwu(), which is a preempt-disable region.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.966069627@infradead.org
create_worker() will already set the right affinity using
kthread_bind_mask(), this means only the rescuer will need to change
it's affinity.
Howveer, while in cpu-hot-unplug a regular task is not allowed to run
on online&&!active as it would be pushed away quite agressively. We
need KTHREAD_IS_PER_CPU to survive in that environment.
Therefore set the affinity after getting that magic flag.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.826629830@infradead.org
There is a need to distinguish geniune per-cpu kthreads from kthreads
that happen to have a single CPU affinity.
Geniune per-cpu kthreads are kthreads that are CPU affine for
correctness, these will obviously have PF_KTHREAD set, but must also
have PF_NO_SETAFFINITY set, lest userspace modify their affinity and
ruins things.
However, these two things are not sufficient, PF_NO_SETAFFINITY is
also set on other tasks that have their affinities controlled through
other means, like for instance workqueues.
Therefore another bit is needed; it turns out kthread_create_per_cpu()
already has such a bit: KTHREAD_IS_PER_CPU, which is used to make
kthread_park()/kthread_unpark() work correctly.
Expose this flag and remove the implicit setting of it from
kthread_create_on_cpu(); the io_uring usage of it seems dubious at
best.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.557620262@infradead.org
The scheduler won't break affinity for us any more, and we should
"emulate" the same behavior when the scheduler breaks affinity for
us. The behavior is "changing the cpumask to cpu_possible_mask".
And there might be some other CPUs online later while the worker is
still running with the pending work items. The worker should be allowed
to use the later online CPUs as before and process the work items ASAP.
If we use cpu_active_mask here, we can't achieve this goal but
using cpu_possible_mask can.
Fixes: 06249738a4 ("workqueue: Manually break affinity on hotplug")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210111152638.2417-4-jiangshanlai@gmail.com
Since commit
1cf12e08bc ("sched/hotplug: Consolidate task migration on CPU unplug")
tasks are expected to move themselves out of a out-going CPU. For most
tasks this will be done automagically via BALANCE_PUSH, but percpu kthreads
will have to cooperate and move themselves away one way or another.
Currently, some percpu kthreads (workqueues being a notable exemple) do not
cooperate nicely and can end up on an out-going CPU at the time
sched_cpu_dying() is invoked.
Print the dying rq's tasks to shed some light on the stragglers.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210113183141.11974-1-valentin.schneider@arm.com
With commit eaa7995c52 (regulator: core: avoid
regulator_resolve_supply() race condition) we started holding the rdev
lock while resolving supplies, an operation that requires holding the
regulator_list_mutex. This results in lockdep warnings since in other
places we take the list mutex then the mutex on an individual rdev.
Since the goal is to make sure that we don't call set_supply() twice
rather than a concern about the cost of resolution pull the rdev lock
and check for duplicate resolution down to immediately before we do the
set_supply() and drop it again once the allocation is done.
Fixes: eaa7995c52 (regulator: core: avoid regulator_resolve_supply() race condition)
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210122132042.10306-1-broonie@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
The I2C_SPRD uses Common Clock Framework thus it cannot be built on
platforms without it (e.g. compile test on MIPS with LANTIQ):
/usr/bin/mips-linux-gnu-ld: drivers/i2c/busses/i2c-sprd.o: in function `sprd_i2c_probe':
i2c-sprd.c:(.text.sprd_i2c_probe+0x254): undefined reference to `clk_set_parent'
Fixes: 4a2d5f663d ("i2c: Enable compile testing for more drivers")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Baolin Wang <baolin.wang7@gmail.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
The patch that added src_dma/dst_dma to struct mv_cesa_tdma_desc
is broken on 64-bit systems as the size of the descriptor has been
changed. This patch fixes it by using u32 instead of dma_addr_t.
Fixes: e62291c1d9 ("crypto: marvell/cesa - Fix sparse warnings")
Cc: <stable@vger.kernel.org>
Reported-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pull OpenRISC fixes from Stafford Horne:
- Compiler warning fixup for new Litex SoC driver
- Sparse warning fixup for iounmap
* tag 'for-linus' of git://github.com/openrisc/linux:
openrisc: io: Add missing __iomem annotation to iounmap()
soc: litex: Fix compile warning when device tree is not configured
Pull drm fixes from Dave Airlie:
"Regular fixes pull, nothing too major in here, just some core fixes,
one vc4, bunch of i915 and a bunch of amdgpu.
core:
- atomic: Release state on error
- syncobj: Fix use-after-free
- ttm: Don't use GFP_TRANSHUGE_LIGTH
- vram-helper: Fix memory leak in vmap
vc4:
- Unify driver naming for PCM
i915:
- HDCP fixes
- PMU wakeref fix
- Fix HWSP validity race
- Fix DP protocol converter accidental 4:4:4->4:2:0 conversion for RGB
amdgpu:
- Green Sardine fixes
- Vangogh fixes
- Renoir fixes
- Misc display fixes"
* tag 'drm-fixes-2021-01-22' of git://anongit.freedesktop.org/drm/drm: (21 commits)
drm/amdgpu: update mmhub mgcg&ls for mmhub_v2_3
drm/amdgpu: modify GCR_GENERAL_CNTL for Vangogh
drm/amdgpu/pm: no need GPU status set since mmnbif_gpu_BIF_DOORBELL_FENCE_CNTL added in FSDL
drm/amd/display: Fixed corruptions on HPDRX link loss restore
drm/amd/display: Use hardware sequencer functions for PG control
drm/amd/display: Change function decide_dp_link_settings to avoid infinite looping
drm/amd/display: Allow PSTATE chnage when no displays are enabled
drm/amd/display: Update dram_clock_change_latency for DCN2.1
drm/amdgpu: remove gpu info firmware of green sardine
drm/amd/display: DCN2X Find Secondary Pipe properly in MPO + ODM Case
drm/syncobj: Fix use-after-free
drm/vram-helper: Reuse existing page mappings in vmap
drm/atomic: put state on error path
drm/i915: Only enable DFP 4:4:4->4:2:0 conversion when outputting YCbCr 4:4:4
drm/i915: Check for rq->hwsp validity after acquiring RCU lock
drm/i915/pmu: Don't grab wakeref when enabling events
drm/i915/gt: Prevent use of engine->wa_ctx after error
drm/vc4: Unify PCM card's driver_name
drm/ttm: stop using GFP_TRANSHUGE_LIGHT
drm/i915/hdcp: Get conn while content_type changed
...
Thanks to a recent binutils change which doesn't generate unused
symbols, it's now possible for thunk_64.o be completely empty without
CONFIG_PREEMPTION: no text, no data, no symbols.
We could edit the Makefile to only build that file when
CONFIG_PREEMPTION is enabled, but that will likely create confusion
if/when the thunks end up getting used by some other code again.
Just ignore it and move on.
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1254
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
This is basically a revert of commit 644592d328 ("objtool: Fail the
kernel build on fatal errors").
That change turned out to be more trouble than it's worth. Failing the
build is an extreme measure which sometimes gets too much attention and
blocks CI build testing.
These fatal-type warnings aren't yet as rare as we'd hope, due to the
ever-increasing matrix of supported toolchains/plugins and their
fast-changing nature as of late.
Also, there are more people (and bots) looking for objtool warnings than
ever before, so even non-fatal warnings aren't likely to be ignored for
long.
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Some distributions are about to switch to Python 3 support only.
This means that /usr/bin/python, which is Python 2, is not available
anymore. Hence, switch scripts to use Python 3 explicitly.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Metrics containing duration_time cause a segfault:
$ perf stat -v -M L1D_Cache_Fill_BW sleep 1
Using CPUID GenuineIntel-6-3D-4
metric expr 64 * l1d.replacement / 1000000000 / duration_time for L1D_Cache_Fill_BW
found event duration_time
found event l1d.replacement
adding {l1d.replacement}:W,duration_time
l1d.replacement -> cpu/umask=0x1,(null)=0x1e8483,event=0x51/
Segmentation fault
$
In commit c2337d6719 ("perf metricgroup: Fix metrics using aliases
covering multiple PMUs"), the logic in find_evsel_group() when iter'ing
events was changed to not only select events in same group, but also for
aliased PMUs.
Checking whether events were for aliased PMUs was done by comparing the
event PMU name. This was not safe for duration_time event, which has no
associated PMU (and no PMU name), so fix by checking if the event PMU name
is set also.
Committer testing:
Reproduced the bug, then, on a:
$ grep -m1 ^'model name' /proc/cpuinfo
model name : Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
$
We now get:
$ perf stat -M L1D_Cache_Fill_BW sleep 1
Performance counter stats for 'sleep 1':
4,141 l1d.replacement:u
1,001,285,107 ns duration_time:u
1.001285107 seconds time elapsed
0.000000000 seconds user
0.001119000 seconds sys
$
Detais from -v:
Using CPUID GenuineIntel-6-8E-A
metric expr 64 * l1d.replacement / 1000000000 / duration_time for L1D_Cache_Fill_BW
found event duration_time
found event l1d.replacement
adding {l1d.replacement}:W,duration_time
l1d.replacement -> cpu/(null)=0x1e8483,umask=0x1,event=0x51/
Control descriptor is not initialized
Warning:
kernel.perf_event_paranoid=2, trying to fall back to excluding kernel and hypervisor samples
Warning:
kernel.perf_event_paranoid=2, trying to fall back to excluding kernel and hypervisor samples
l1d.replacement:u: 4592 612201 612201
duration_time:u: 1001478621 1001478621 1001478621
Fixes: c2337d6719 ("perf metricgroup: Fix metrics using aliases covering multiple PMUs")
Reported-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linuxarm@openeuler.org
Link: https://lore.kernel.org/r/1611159518-226883-1-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf_evlist__set_sid_idx() updates perf_sample_id with the evlist map
index, CPU number and TID. It is passed indexes to the evsel's cpu and
thread maps, but references the evlist's maps instead. That results in
using incorrect CPU numbers on heterogeneous systems. Fix it by using
evsel maps.
The id index (PERF_RECORD_ID_INDEX) is used by AUX area tracing when in
sampling mode. Having an incorrect CPU number causes the trace data to
be attributed to the wrong CPU, and can result in decoder errors because
the trace data is then associated with the wrong process.
Committer notes:
Keep the class prefix convention in the function name, switching from
perf_evlist__set_sid_idx() to perf_evsel__set_sid_idx().
Fixes: 3c659eedad ("perf tools: Add id index")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20210121125446.11287-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This reverts commit
644bda6f34 ("dm table: fall back to getting device using name_to_dev_t()")
dm_get_dev_t() is just used to convert an arbitrary 'path' string
into a dev_t. It doesn't presume that the device is present; that
check will be done later, as the only caller is dm_get_device(),
which does a dm_get_table_device() later on, which will properly
open the device.
So if the path string already _is_ in major:minor representation
we can convert it directly, avoiding a recursion into the filesystem
to lookup the block device.
This avoids a hang in multipath_message() when the filesystem is
inaccessible.
Fixes: 644bda6f34 ("dm table: fall back to getting device using name_to_dev_t()")
Cc: stable@vger.kernel.org
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
In commit d68b29584c ("dm crypt: use GFP_ATOMIC when allocating
crypto requests from softirq") code was incorrectly copy and pasted
from crypt_alloc_req_skcipher()'s crypto request allocation code to
crypt_alloc_req_aead(). It is OK from runtime perspective as both
simple encryption request pointer and AEAD request pointer are part of
a union, but may confuse code reviewers.
Fixes: d68b29584c ("dm crypt: use GFP_ATOMIC when allocating crypto requests from softirq")
Cc: stable@vger.kernel.org # v5.9+
Reported-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Otherwise a malicious user could (ab)use the "recalculate" feature
that makes dm-integrity calculate the checksums in the background
while the device is already usable. When the system restarts before all
checksums have been calculated, the calculation continues where it was
interrupted even if the recalculate feature is not requested the next
time the dm device is set up.
Disable recalculating if we use internal_hash or journal_hash with a
key (e.g. HMAC) and we don't have the "legacy_recalculate" flag.
This may break activation of a volume, created by an older kernel,
that is not yet fully recalculated -- if this happens, the user should
add the "legacy_recalculate" flag to constructor parameters.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Daniel Glockner <dg@emlix.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pull fs and udf fixes from Jan Kara:
"A lazytime handling fix from Eric Biggers and a fix of UDF session
handling for large devices"
* tag 'fs_for_v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: fix the problem that the disc content is not displayed
fs: fix lazytime expiration handling in __writeback_single_inode()
Oded writes:
This tag contains the following bug fixes for 5.11-rc5/6:
- Clear the fence field in the PCI counters packet before sending
the packet to the F/W. Not clearing it might cause the driver
and F/W to get out-of-sync
- Fix backward compatibility in the uapi of IDLE check that is
part of the INFO IOCTL.
- Tell the F/W to not access the Host (device outbound) while
the driver removes the device. If that happens, the server
might crash.
* tag 'misc-habanalabs-fixes-2021-01-21' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux:
habanalabs: disable FW events on device removal
habanalabs: fix backward compatibility of idle check
habanalabs: zero pci counters packet before submit to FW
Pull printk fixes from Petr Mladek:
- Fix line counting and buffer size calculation. Both regressions
caused that a reader buffer might not get filled as much as possible.
- Restore non-documented behavior of printk() reader API and make it
official.
It did not fill the last byte of the provided buffer before 5.10. Two
architectures, powerpc and um, used it to add the trailing '\0'.
There might theoretically be more callers depending on this behavior
in userspace.
* tag 'printk-for-5.11-printk-rework-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: fix buffer overflow potential for print_text()
printk: fix kmsg_dump_get_buffer length calulations
printk: ringbuffer: fix line counting
Pull ACPI fix from Rafael Wysocki:
"Modify a helper function in the ACPI core to match the behavior
expected by its users so as to prevent NULL pointer dereferences and
occasional memory corruption from occurring (Hans de Goede)"
* tag 'acpi-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: scan: Make acpi_bus_get_device() clear return pointer on error
Pull sound fixes from Takashi Iwai:
"Here is a collection of sound fixes targeted for 5.11-rc5. Most
notably, USB-audio still got a few intensive changes for covering the
regressions while the rest are all small fixes.
- A trivial fix for sequencer OSS emulation error path
- HD-audio runtime PM regression fix, a few quirks and new IDs
- USB-audio regression fixes for Pioneer device, Logitech webcams,
etc
- ASoC SOF Intel coverage
- MAINTAINERS file update
- A fix in the jack handling in ASoC HDMI codec"
* tag 'sound-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Fix hw constraints dependencies
ALSA: hda: Balance runtime/system PM if direct-complete is disabled
ALSA: usb-audio: Avoid implicit feedback on Pioneer devices
ALSA: usb-audio: Set sample rate for all sharing EPs on UAC1
ALSA: usb-audio: Fix UAC1 rate setup for secondary endpoints
MAINTAINERS: update qcom ASoC drivers list
MAINTAINERS: update maintainers of qcom audio
ALSA: hda: Add Cometlake-R PCI ID
ALSA: seq: oss: Fix missing error check in snd_seq_oss_synth_make_info()
ALSA: hda/via: Add minimum mute flag
ALSA: hda/realtek - Limit int mic boost on Acer Aspire E5-575T
ALSA: usb-audio: Always apply the hw constraints for implicit fb sync
ASoC: SOF: Intel: fix page fault at probe if i915 init fails
ALSA: hda: Add AlderLake-P PCI ID and HDMI codec vid
ASoC: SOF: Intel: hda: Avoid checking jack on system suspend
ASoC: SOF: Intel: hda: Modify existing helper to disable WAKEEN
ASoC: SOF: Intel: hda: Resume codec to do jack detection
MAINTAINERS: update references to stm32 audio bindings
ASoC: hdmi-codec: Fix return value in hdmi_codec_set_jack()
Pull gpio fixes from Bartosz Golaszewski:
- rework the character device code to avoid a frame size warning
- fix printk format issues in gpio-tools
- warn on redefinition of the to_irq callback in core gpiolib code
- fix PWM period calculation in gpio-mvebu
- make gpio-sifive Kconfig entry consistent with other drivers
- fix a build issue in gpio-tegra
* tag 'gpio-fixes-for-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: tegra: Add missing dependencies
gpio: sifive: select IRQ_DOMAIN_HIERARCHY rather than depend on it
gpio: mvebu: fix pwm .get_state period calculation
gpiolib: add a warning on gpiochip->to_irq defined
tools: gpio: fix %llu warning in gpio-watch.c
tools: gpio: fix %llu warning in gpio-event-mon.c
gpiolib: cdev: fix frame size warning in gpio_ioctl()
Pull pin control fixes from Linus Walleij:
"These are all driver fixes, the Qualcomm stuff is the most widely used
and important:
- The main matter is a complicated fixup for the Qualcomm deep sleep
states.
This manifests in how interrupts get handled or in some cases not
handled in cooperation with the PDC (Power Domain Controller). It's
one of these really hardcore bug fixes that signifies high maturity
of the platform.
- Fix a register layout problem in the JZ4760 driver
- Fix a register offset in the Aspeed G6 driver
- Fix a compiler warning in the Nomadik driver
- Fix a fallback code path in the mediatek driver"
* tag 'pinctrl-v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: qcom: Don't clear pending interrupts when enabling
pinctrl: qcom: Properly clear "intr_ack_high" interrupts when unmasking
pinctrl: qcom: No need to read-modify-write the interrupt status
pinctrl: qcom: Allow SoCs to specify a GPIO function that's not 0
pinctrl: mediatek: Fix fallback call path
pinctrl: nomadik: Remove unused variable in nmk_gpio_dbg_show_one
pinctrl: aspeed: g6: Fix PWMG0 pinctrl setting
pinctrl: ingenic: Rename registers from JZ4760_GPIO_* to JZ4770_GPIO_*
pinctrl: ingenic: Fix JZ4760 support
Steffen Klassert says:
====================
pull request (net): ipsec 2021-01-21
1) Fix a rare panic on SMP systems when packet reordering
happens between anti replay check and update.
From Shmulik Ladkani.
2) Fix disable_xfrm sysctl when used on xfrm interfaces.
From Eyal Birger.
3) Fix a race in PF_KEY when the availability of crypto
algorithms is set. From Cong Wang.
4) Fix a return value override in the xfrm policy selftests.
From Po-Hsu Lin.
5) Fix an integer wraparound in xfrm_policy_addr_delta.
From Visa Hankala.
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: Fix wraparound in xfrm_policy_addr_delta()
selftests: xfrm: fix test return value override issue in xfrm_policy.sh
af_key: relax availability checks for skb size calculation
xfrm: fix disable_xfrm sysctl when used on xfrm interfaces
xfrm: Fix oops in xfrm_replay_advance_bmp
====================
Link: https://lore.kernel.org/r/20210121121558.621339-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When device is removed, we need to make sure the F/W won't send us
any more events because during the remove process we disable the
interrupts.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Need to take the lower 32 bits of the driver's 64-bit idle mask and put
it in the legacy 32-bit variable that the userspace reads to know the
idle mask.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Driver does not zero some pci counters packets before sending
to FW. This causes an out of sync PI/CI between driver and FW.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
If the device passed as the target (second argument) to
device_is_dependent() is not completely registered (that is, it has
been initialized, but not added yet), but the parent pointer of it
is set, it may be missing from the list of the parent's children
and device_for_each_child() called by device_is_dependent() cannot
be relied on to catch that dependency.
For this reason, modify device_is_dependent() to check the ancestors
of the target device by following its parent pointer in addition to
the device_for_each_child() walk.
Fixes: 9ed9895370 ("driver core: Functional dependencies tracking support")
Reported-by: Stephan Gerhold <stephan@gerhold.net>
Tested-by: Stephan Gerhold <stephan@gerhold.net>
Reviewed-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/17705994.d592GUb2YH@kreacher
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This line dates back to 2013, but cppcheck complained because commit
2f713615dd ("libceph: move msgr1 protocol implementation to its own
file") moved it. Add parenthesis to silence the warning.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Starting from vangogh, the ATCL2 and DAGB0 registers relative
to mgcg/ls has changed.
For MGCG:
Replace mmMM_ATC_L2_MISC_CG with mmMM_ATC_L2_CGTT_CLK_CTRL.
For MGLS:
Replace mmMM_ATC_L2_MISC_CG with mmMM_ATC_L2_CGTT_CLK_CTRL.
Add DAGB0_(WR/RD)_CGTT_CLK_CTRL registers.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
GCR_GENERAL_CNTL is defined differently in gc_10_1_0_offset.h and
gc_10_3_0_offset.h. Update GCR_GENERAL_CNTL for Vangogh.
Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In the renoir there is no need GpuChangeState message set to exit gfxoff in the s0i3 resume since
mmnbif_gpu_BIF_DOORBELL_FENCE_CNTL has been added in the s0i3 FSDL.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
[why]
Heavy corruption or blank screen reported on wake,
with 6k display connected and FEC enabled
[how]
When Disable/Enable stream for display pipes on HPDRX,
DC should take into account ODM split pipes.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Vladimir Stempen <vladimir.stempen@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Anson Jacob <anson.jacob@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Why:
Function decide_dp_link_settings() loops infinitely when required bandwidth
can't be supported.
How:
Check the required bandwidth against verified_link_cap before trying to
find a link setting for it.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Bing Guo <bing.guo@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Anson Jacob <anson.jacob@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
When no displays are currently enabled, display driver should not
disallow PSTATE switching.
[How]
Allow PSTATE switching if either the active configuration supports it,
or there are no active displays.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Anson Jacob <anson.jacob@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
dram clock change latencies get updated using ddr4 latency table, but
that update does not happen before validation. This value
should not be the default and should be number received from
df for better mode support.
This may cause a PState hang on high refresh panels with short vblanks
such as on 1080p 360hz or 300hz panels.
[HOW]
Update latency from 23.84 to 11.72.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Jake Wang <haonan.wang2@amd.com>
Reviewed-by: Sung Lee <Sung.Lee@amd.com>
Acked-by: Anson Jacob <anson.jacob@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
Previously as MPO + ODM Combine was not supported, finding secondary pipes
for each case was mutually exclusive. Now that both are supported at the same
time, both cases should be taken into account when finding a secondary pipe.
[HOW]
If a secondary pipe cannot be found based on previous bottom pipe,
search for a second pipe using next_odm_pipe instead.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Sung Lee <sung.lee@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Anson Jacob <anson.jacob@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.10.x
The use of a tagged address could be pretty confusing for the
whole memslot infrastructure as well as the MMU notifiers.
Forbid it altogether, as it never quite worked the first place.
Cc: stable@vger.kernel.org
Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
The allocated page is not released if error occurs in
nvm_submit_io_sync_raw(). __free_page() is moved ealier to avoid
possible memory leak issue.
Fixes: aff3fb18f9 ("lightnvm: move bad block and chunk state logic to core")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull NVMe fixes from Christoph:
"nvme fixes for 5.11:
- fix a status code in nvmet (Chaitanya Kulkarni)
- avoid double completions in nvme-rdma/nvme-tcp (Chao Leng)
- fix the CMB support to cope with NVMe 1.4 controllers (Klaus Jensen)
- fix PRINFO handling in the passthrough ioctl (Revanth Rajashekar)
- fix a double DMA unmap in nvme-pci"
* tag 'nvme-5.11-2020-01-21' of git://git.infradead.org/nvme:
nvme-pci: fix error unwind in nvme_map_data
nvme-pci: refactor nvme_unmap_data
nvmet: set right status on error in id-ns handler
nvme-pci: allow use of cmb on v1.4 controllers
nvme-tcp: avoid request double completion for concurrent nvme_tcp_timeout
nvme-rdma: avoid request double completion for concurrent nvme_rdma_timeout
nvme: check the PRINFO bit before deciding the host buffer length
Suspending/resuming with an HDMI dongle attached leads to crashes from
an audio regmap.
Unable to handle kernel paging request at virtual address ffffffc018068000
Mem abort info:
ESR = 0x96000047
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000047
CM = 0, WnR = 1
swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000081b12000
[ffffffc018068000] pgd=0000000275d14003, pud=0000000275d14003, pmd=000000026365d003, pte=0000000000000000
Internal error: Oops: 96000047 [#1] PREEMPT SMP
Call trace:
regmap_mmio_write32le+0x2c/0x40
regmap_mmio_write+0x48/0x6c
_regmap_bus_reg_write+0x34/0x44
_regmap_write+0x100/0x150
regcache_default_sync+0xc0/0x138
regcache_sync+0x188/0x26c
lpass_platform_pcmops_resume+0x48/0x54 [snd_soc_lpass_platform]
snd_soc_component_resume+0x28/0x40
soc_resume_deferred+0x6c/0x178
process_one_work+0x208/0x3c8
worker_thread+0x23c/0x3e8
kthread+0x144/0x178
ret_from_fork+0x10/0x18
Code: d503201f d50332bf f94002a8 8b344108 (b9000113)
I can reliably reproduce this problem by running 'tail' on the registers
file in debugfs for the hdmi regmap.
# tail /sys/kernel/debug/regmap/62d87000.lpass-lpass_hdmi/registers
[ 84.658733] Unable to handle kernel paging request at virtual address ffffffd0128e800c
This crash happens because we're trying to read registers from the
regmap beyond the length of the mapping created by ioremap().
The number of hdmi_rdma_channels determines the size of the regmap via
this code in sound/soc/qcom/lpass-cpu.c:
lpass_hdmi_regmap_config.max_register = LPAIF_HDMI_RDMAPER_REG(variant, variant->hdmi_rdma_channels);
According to debugfs the size of the regmap is 0x68010 but according to
the DTS file posted in [1] the size is only 0x68000 (see the first reg
property of the lpass_cpu node). Let's change the number of channels to
be 3 instead of 4 so the math works out to have a max register of
0x67010, nicely fitting inside of the region size of 0x68000.
Note: I tried to bump up the size of the register region to the next
page to include the 0x68010 register but then the tail command caused
SErrors with an async abort, implying that the register region doesn't
exist or it isn't clocked because the bus is telling us that the
register read failed. I reduce the number of channels and played audio
through the HDMI channel and it kept working so I think this is correct.
Fixes: 2ad63dc8df ("ASoC: qcom: sc7180: Add support for audio over DP")
Link: https://lore.kernel.org/r/1601448168-18396-2-git-send-email-srivasam@codeaurora.org [1]
Cc: V Sujith Kumar Reddy <vsujithk@codeaurora.org>
Cc: Srinivasa Rao <srivasam@codeaurora.org>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: Cheng-Yi Chiang <cychiang@chromium.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20210115203329.846824-1-swboyd@chromium.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Currently, requesting kernel FPU access doesn't distinguish which parts of
the extended ("FPU") state are needed. This is nice for simplicity, but
there are a few cases in which it's suboptimal:
- The vast majority of in-kernel FPU users want XMM/YMM/ZMM state but do
not use legacy 387 state. These users want MXCSR initialized but don't
care about the FPU control word. Skipping FNINIT would save time.
(Empirically, FNINIT is several times slower than LDMXCSR.)
- Code that wants MMX doesn't want or need MXCSR initialized.
_mmx_memcpy(), for example, can run before CR4.OSFXSR gets set, and
initializing MXCSR will fail because LDMXCSR generates an #UD when the
aforementioned CR4 bit is not set.
- Any future in-kernel users of XFD (eXtended Feature Disable)-capable
dynamic states will need special handling.
Add a more specific API that allows callers to specify exactly what they
want.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Krzysztof Piotr Olędzki <ole@ans.pl>
Link: https://lkml.kernel.org/r/aff1cac8b8fc7ee900cf73e8f2369966621b053f.1611205691.git.luto@kernel.org
KASAN in HW_TAGS mode will store MTE tags in the top byte of the
pointer. When computing the offset for TPIDR_EL2 we don't want anything
in the top byte, so remove the tag to ensure the computation is correct
no matter what the tag.
Fixes: 94ab5b61ee ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS")
Signed-off-by: Steven Price <steven.price@arm.com>
[maz: added comment]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210108161254.53674-1-steven.price@arm.com
We want the single "splice/sendfile to a tty" regression fix into
tty-linus so it can get into 5.11-final, while the larger patch series
fixing "splice/sendfile from a tty" should wait for 5.12-rc1 so that we
get more testing.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
memblock_phys_alloc_try_nid function's comments has typo NUMA as MUMA.
Correct this typo.
Signed-off-by: Levi Yun <ppbuk5246@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
KSZ8794CNX datasheet section 8.0 RESET CIRCUIT describes recommended
circuit for interfacing with CPU/FPGA reset consisting of 10k pullup
resistor and 10uF capacitor to ground. This circuit takes ~100 ms to
rise enough to release the reset.
For maximum supply voltage VDDIO=3.3V VIH=2.0V R=10kR C=10uF that is
VDDIO - VIH
t = R * C * -ln( ------------- ) = 10000*0.00001*-(-0.93)=0.093 s
VDDIO
so we need ~95 ms for the reset to really de-assert, and then the
original 100us for the switch itself to come out of reset. Simply
msleep() for 100 ms which fits the constraint with a bit of extra
space.
Fixes: 5b79798090 ("net: dsa: microchip: Implement recommended reset timing")
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Michael Grzeschik <m.grzeschik@pengutronix.de>
Reviewed-by: Paul Barker <pbarker@konsulko.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210120030502.617185-1-marex@denx.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The KSZ8795 switch has 4 external ports {0,1,2,3} and 1 CPU port {4}, so
does the KSZ8765. The KSZ8794 seems to be repackaged KSZ8795 with different
ID and port 3 not routed out, however the port 3 registers are present in
the silicon, so the KSZ8794 switch has 3 external ports {0,1,2} and 1 CPU
port {4}. Currently the driver always uses the last port as CPU port, on
KSZ8795/KSZ8765 that is port 4 and that is OK, but on KSZ8794 that is port
3 and that is not OK, as it must also be port 4.
This patch adjusts the driver such that it always registers a switch with
5 ports total (4 external ports, 1 CPU port), always sets the CPU port to
switch port 4, and then configures the external port mask according to
the switch model -- 3 ports for KSZ8794 and 4 for KSZ8795/KSZ8765.
Fixes: 68a1b676db ("net: dsa: microchip: ksz8795: remove superfluous port_cnt assignment")
Fixes: 4ce2a984ab ("net: dsa: microchip: ksz8795: use phy_port_cnt where possible")
Fixes: 241ed719bc ("net: dsa: microchip: ksz8795: use port_cnt instead of TOTOAL_PORT_NUM")
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Michael Grzeschik <m.grzeschik@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210120001045.488506-1-marex@denx.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This makes the tty layer use the .write_iter() function instead of the
traditional .write() functionality.
That allows writev(), but more importantly also makes it possible to
enable .splice_write() for ttys, reinstating the "splice to tty"
functionality that was lost in commit 36e2c7421f ("fs: don't allow
splice read/write without explicit ops").
Fixes: 36e2c7421f ("fs: don't allow splice read/write without explicit ops")
Reported-by: Oliver Giles <ohw.giles@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LPASS driver is partially broken on DragonBoard DB410c on 5.10 and
its totally broken on other Supported Qualcomm SoCs.
This was due to DAI ids being over written by the SoC specific header files
in the dt-bindings.
Idea of having SoC specific headers is not doable when we are dealing with
a common driver. So this patchset attempts to fix this properly by creating
a common dt-bindings header for lpass which can be updated with new entries
if required. This patchset also add an simple of_xlate function to resolve
the dai names and different SoCs might not have 1:1 mapping for the
dai_driver array with dai ids.
Changes since v1:
- removed array indexes as suggested by Stephan G.
- rebased to sound/for-next branch
- collected Srinivasa tested-by tag for sc7180 platform.
Thanks,
srini
Srinivas Kandagatla (2):
ASoC: dt-bindings: lpass: Fix and common up lpass dai ids
ASoC: qcom: Fix broken support to MI2S TERTIARY and QUATERNARY
include/dt-bindings/sound/apq8016-lpass.h | 7 +++----
include/dt-bindings/sound/qcom,lpass.h | 15 +++++++++++++++
include/dt-bindings/sound/sc7180-lpass.h | 6 ++----
sound/soc/qcom/lpass-cpu.c | 22 ++++++++++++++++++++++
sound/soc/qcom/lpass-platform.c | 12 ++++++++++++
sound/soc/qcom/lpass-sc7180.c | 9 +++------
sound/soc/qcom/lpass.h | 2 +-
7 files changed, 58 insertions(+), 15 deletions(-)
create mode 100644 include/dt-bindings/sound/qcom,lpass.h
--
2.21.0
hdmi-codec is an optional property. The 2 patches fix DAI link binding
error when the property doesn't exist in DTS.
Tzung-Bi Shih (2):
ASoC: mediatek: mt8183-mt6358: ignore TDM DAI link by default
ASoC: mediatek: mt8183-da7219: ignore TDM DAI link by default
sound/soc/mediatek/mt8183/mt8183-da7219-max98357.c | 5 ++++-
sound/soc/mediatek/mt8183/mt8183-mt6358-ts3a227-max98357.c | 5 ++++-
2 files changed, 8 insertions(+), 2 deletions(-)
--
2.30.0.284.gd98b1dd5eaa7-goog
This series adds unit tests for ASoC topology.
First fix problems found when developing and running test cases and
then add tests implementation.
Tests themselves are quite simple and just call
snd_soc_tplg_component_load() with various parameters and check the
result. Tests themselves are described in more detail in commits
adding them.
Goal is to expand the amount of test cases in following patches.
Prerequisity for this patchset are 2 patches which have already been
sent:
https://lore.kernel.org/alsa-devel/20210114163602.911205-1-amadeuszx.slawinski@linux.intel.com/T/#t
Description on how typical test case itself works:
In order to load topology we need to have 3 things:
card, codec component & platform component.
In typical test case we register card and platform component and bind
to dummy codec. There are of course execeptions, when we want to
test behaviour of topology API when component or card is missing.
Note that this is bit different from typical scenario (in SOF and skylake
drivers) where card is registered by machine driver and component by
platform driver, as we register both when setting up test.
If you check the test case most of them have similar architecture of:
1.
/* run test */
ret = snd_soc_register_card(&kunit_comp->card);
if (ret != 0 && ret != -EPROBE_DEFER)
KUNIT_FAIL(test, "Failed to register card");
2.
ret = snd_soc_component_initialize(&kunit_comp->comp, &test_component, test_dev);
KUNIT_EXPECT_EQ(test, 0, ret);
3.
ret = snd_soc_add_component(&kunit_comp->comp, NULL, 0);
KUNIT_EXPECT_EQ(test, 0, ret);
Ad. 1.
First we register card, which in most tests returns -EPROBE_DEFER
(from snd_soc_bind_card()), as platform component is not yet created.
I test for both 0 and -EPROBE_DEFER, as it makes it easier to reshuffle
this code around if needed and there is one test case which does it in
different order.
Ad. 2.
Then we initialize platform component with structure pointing at proper
probe function, which calls snd_soc_tplg_component_load() with test
parameters and checks expected result.
Ad. 3.
And then in follow up we call snd_soc_add_component() which creates
platform component for us and calls snd_soc_try_rebind_card() which
if everything is bound properly calls previously set probe function.
Amadeusz Sławiński (5):
ASoC: topology: Properly unregister DAI on removal
Revert "ASoC: soc-devres: add devm_snd_soc_register_dai()"
ASoC: topology: KUnit: Add KUnit tests passing various arguments to
snd_soc_tplg_component_load
ASoC: topology: KUnit: Add KUnit tests passing empty topology with
variants to snd_soc_tplg_component_load
ASoC: topology: KUnit: Add KUnit tests passing topology with PCM to
snd_soc_tplg_component_load
include/sound/soc.h | 4 -
sound/soc/Kconfig | 17 +
sound/soc/Makefile | 5 +
sound/soc/soc-devres.c | 37 --
sound/soc/soc-topology-test.c | 843 ++++++++++++++++++++++++++++++++++
sound/soc/soc-topology.c | 9 +-
6 files changed, 870 insertions(+), 45 deletions(-)
create mode 100644 sound/soc/soc-topology-test.c
--
2.25.1
The OMAP driver may leverage software BCH logic to locate errors while
using its own hardware to detect the presence of errors. This is
achieved with a "mixed" mode which initializes manually the software
BCH internal logic while providing its own OOB layout.
The issue here comes from the fact that the BCH driver has been
updated to only use generic NAND objects, and no longer depend on raw
NAND structures as it is usable from SPI-NAND as well. However, at the
end of the BCH context initialization, the driver checks the validity
of the OOB layout. At this stage, the raw NAND fields have not been
populated yet while being used by the layout helpers, leading to an
invalid layout.
The chosen solution here is to include the BCH structure definition
and to refer to the BCH fields directly (de-referenced as a const
pointer here) to know as early as possible the number of steps and ECC
bytes which have been chosen.
Note: I don't know which commit exactly triggered the error, but the
entire migration to a generic BCH driver got merged in one go, so this
should not be a problem for stable backports.
Reported-by: Adam Ford <aford173@gmail.com>
Fixes: 80fe603160 ("mtd: nand: ecc-bch: Stop using raw NAND structures")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Tested-by: Adam Ford <aford173@gmail.com> #logicpd-torpedo-37xx-devkit-28.dts
Link: https://lore.kernel.org/linux-mtd/20210119155510.5655-1-miquel.raynal@bootlin.com
Pull btrfs fixes from David Sterba:
"A few more one line fixes for various bugs, stable material.
- fix send when emitting clone operation from the same file and root
- fix double free on error when cleaning backrefs
- lockdep fix during relocation
- handle potential error during reloc when starting transaction
- skip running delayed refs during commit (leftover from code removal
in this dev cycle)"
* tag 'for-5.11-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: don't clear ret in btrfs_start_dirty_block_groups
btrfs: fix lockdep splat in btrfs_recover_relocation
btrfs: do not double free backref nodes on error
btrfs: don't get an EINTR during drop_snapshot for reloc
btrfs: send: fix invalid clone operations when cloning from the same file and root
btrfs: no need to run delayed refs after commit_fs_roots during commit
Since the recent refactoring, it's been reported that some USB-audio
devices (typically webcams) are no longer detected properly by
PulseAudio. The debug session revealed that it's failing at probing
by PA to try the sample rate 44.1kHz while the device has discrete
sample rates other than 44.1kHz. But the puzzle was that arecord
works as is, and some other devices with the discrete rates work,
either.
After all, this turned out to be the lack of the dependencies in a few
hw constraint rules: snd_pcm_hw_rule_add() has the (variable)
arguments specifying the dependent parameters, and some functions
didn't set the target parameter itself as the dependencies. This
resulted in an invalid parameter that could be generated only in a
certain call pattern. This bug itself has been present in the code,
but it didn't trigger errors just because the rules were casually
avoiding such a corner case. After the recent refactoring and
cleanup, however, the hw constraints work "as expected", and the
problem surfaced now.
For fixing the problem above, this patch adds the missing dependent
parameters to each snd_pcm_hw_rule() call.
Fixes: bc4e94aa8e ("ALSA: usb-audio: Handle discrete rates properly in hw constraints")
BugLink: http://bugzilla.opensuse.org/show_bug.cgi?id=1181014
Link: https://lore.kernel.org/r/20210120204554.30177-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.11-rc5, including fixes from bpf, wireless, and
can trees.
Current release - regressions:
- nfc: nci: fix the wrong NCI_CORE_INIT parameters
Current release - new code bugs:
- bpf: allow empty module BTFs
Previous releases - regressions:
- bpf: fix signed_{sub,add32}_overflows type handling
- tcp: do not mess with cloned skbs in tcp_add_backlog()
- bpf: prevent double bpf_prog_put call from bpf_tracing_prog_attach
- bpf: don't leak memory in bpf getsockopt when optlen == 0
- tcp: fix potential use-after-free due to double kfree()
- mac80211: fix encryption issues with WEP
- devlink: use right genl user_ptr when handling port param get/set
- ipv6: set multicast flag on the multicast route
- tcp: fix TCP_USER_TIMEOUT with zero window
Previous releases - always broken:
- bpf: local storage helpers should check nullness of owner ptr passed
- mac80211: fix incorrect strlen of .write in debugfs
- cls_flower: call nla_ok() before nla_next()
- skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() too"
* tag 'net-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
net: systemport: free dev before on error path
net: usb: cdc_ncm: don't spew notifications
net: mscc: ocelot: Fix multicast to the CPU port
tcp: Fix potential use-after-free due to double kfree()
bpf: Fix signed_{sub,add32}_overflows type handling
can: peak_usb: fix use after free bugs
can: vxcan: vxcan_xmit: fix use after free bug
can: dev: can_restart: fix use after free bug
tcp: fix TCP socket rehash stats mis-accounting
net: dsa: b53: fix an off by one in checking "vlan->vid"
tcp: do not mess with cloned skbs in tcp_add_backlog()
selftests: net: fib_tests: remove duplicate log test
net: nfc: nci: fix the wrong NCI_CORE_INIT parameters
sh_eth: Fix power down vs. is_opened flag ordering
net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled
netfilter: rpfilter: mask ecn bits before fib lookup
udp: mask TOS bits in udp_v4_early_demux()
xsk: Clear pool even for inactive queues
bpf: Fix helper bpf_map_peek_elem_proto pointing to wrong callback
sh_eth: Make PHY access aware of Runtime PM to fix reboot crash
...
Pull xen fix from Juergen Gross:
"A fix for build failure showing up in some configurations"
* tag 'for-linus-5.11-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: fix 'nopvspin' build error
On the following call path, `sig->pkey_algo` is not assigned
in asymmetric_key_verify_signature(), which causes runtime
crash in public_key_verify_signature().
keyctl_pkey_verify
asymmetric_key_verify_signature
verify_signature
public_key_verify_signature
This patch simply check this situation and fixes the crash
caused by NULL pointer.
Fixes: 2155256396 ("X.509: support OSCCA SM2-with-SM3 certificate verification")
Reported-by: Tobias Markus <tobias@markus-regensburg.de>
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-and-tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
Tested-by: João Fonseca <jpedrofonseca@ua.pt>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After the recent actions to convert readpages aops to readahead, the
NULL checks of readpages aops in cachefiles_read_or_alloc_page() may
hit falsely. More badly, it's an ASSERT() call, and this panics.
Drop the superfluous NULL checks for fixing this regression.
[DH: Note that cachefiles never actually used readpages, so this check was
never actually necessary]
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=208883
BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1175245
Fixes: 9ae326a690 ("CacheFiles: A cache that backs onto a mounted filesystem")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Set the acpi_device pointer which acpi_bus_get_device() returns-by-
reference to NULL on errors.
We've recently had 2 cases where callers of acpi_bus_get_device()
did not properly error check the return value, so set the returned-
by-reference acpi_device pointer to NULL, because at least some
callers of acpi_bus_get_device() expect that to be done on errors.
[ rjw: This issue was exposed by commit 71da201f38 ("ACPI: scan:
Defer enumeration of devices with _DEP lists") which caused it to
be much more likely to occur on some systems, but the real defect
had been introduced by an earlier commit. ]
Fixes: 40e7fcb192 ("ACPI: Add _DEP support to fix battery issue on Asus T100TA")
Fixes: bcfcd409d4 ("usb: split code locating ACPI companion into port and device")
Reported-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Tested-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Diagnosed-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Cc: All applicable <stable@vger.kernel.org>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Properly unwind step by step using refactored helpers from nvme_unmap_data
to avoid a potential double dma_unmap on a mapping failure.
Fixes: 7fe07d14f7 ("nvme-pci: merge nvme_free_iod into nvme_unmap_data")
Reported-by: Marc Orr <marcorr@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Marc Orr <marcorr@google.com>
Split out three helpers from nvme_unmap_data that will allow finer grained
unwinding from nvme_map_data.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Marc Orr <marcorr@google.com>
This reverts commit b2d2440430.
It's true that creating rxe on top of 802.1q interfaces doesn't work.
Thus, commit fd49ddaf7e ("RDMA/rxe: prevent rxe creation on top of vlan
interface") was absolutely correct.
But b2d2440430 was incorrect assuming that with this change, RDMA and
VLAN don't work togehter at all. It just has to be set up
differently. Rather than creating rxe on top of the VLAN interface, rxe
must be created on top of the physical interface. RDMA then works just
fine through VLAN interfaces on top of that physical interface, via the
"upper device" logic.
This is hard to see in the rxe logic because it never talks about vlan,
but instead rxe carefully selects upper vlan netdevices when working with
packets which in turn imply certain vlan tagging. This is all done
correctly and interacts with the gid table with VLAN support the same as
real HW does.
b2d2440430 broke this setup deliberately and should thus be
reverted. Also, b2d2440430 removed rxe_dma_device(), so adapt the revert
to discard that hunk.
Fixes: b2d2440430 ("RDMA/rxe: Remove VLAN code leftovers from RXE")
Link: https://lore.kernel.org/r/20210120161913.7347-1-mwilck@suse.com
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Marc Kleine-Budde says:
====================
linux-can-fixes-for-5.11-20210120
All three patches are by Vincent Mailhol and fix a potential use after free bug
in the CAN device infrastructure, the vxcan driver, and the peak_usk driver. In
the TX-path the skb is used to read from after it was passed to the networking
stack with netif_rx_ni().
* tag 'linux-can-fixes-for-5.11-20210120' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: peak_usb: fix use after free bugs
can: vxcan: vxcan_xmit: fix use after free bug
can: dev: can_restart: fix use after free bug
====================
Link: https://lore.kernel.org/r/20210120125202.2187358-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
RTL8156 sends notifications about every 32ms.
Only display/log notifications when something changes.
This issue has been reported by others:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832472https://lkml.org/lkml/2020/8/27/1083
...
[785962.779840] usb 1-1: new high-speed USB device number 5 using xhci_hcd
[785962.929944] usb 1-1: New USB device found, idVendor=0bda, idProduct=8156, bcdDevice=30.00
[785962.929949] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=6
[785962.929952] usb 1-1: Product: USB 10/100/1G/2.5G LAN
[785962.929954] usb 1-1: Manufacturer: Realtek
[785962.929956] usb 1-1: SerialNumber: 000000001
[785962.991755] usbcore: registered new interface driver cdc_ether
[785963.017068] cdc_ncm 1-1:2.0: MAC-Address: 00:24:27:88:08:15
[785963.017072] cdc_ncm 1-1:2.0: setting rx_max = 16384
[785963.017169] cdc_ncm 1-1:2.0: setting tx_max = 16384
[785963.017682] cdc_ncm 1-1:2.0 usb0: register 'cdc_ncm' at usb-0000:00:14.0-1, CDC NCM, 00:24:27:88:08:15
[785963.019211] usbcore: registered new interface driver cdc_ncm
[785963.023856] usbcore: registered new interface driver cdc_wdm
[785963.025461] usbcore: registered new interface driver cdc_mbim
[785963.038824] cdc_ncm 1-1:2.0 enx002427880815: renamed from usb0
[785963.089586] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected
[785963.121673] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected
[785963.153682] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected
...
This is about 2KB per second and will overwrite all contents of a 1MB
dmesg buffer in under 10 minutes rendering them useless for debugging
many kernel problems.
This is also an extra 180 MB/day in /var/logs (or 1GB per week) rendering
the majority of those logs useless too.
When the link is up (expected state), spew amount is >2x higher:
...
[786139.600992] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected
[786139.632997] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink
[786139.665097] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected
[786139.697100] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink
[786139.729094] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected
[786139.761108] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink
...
Chrome OS cannot support RTL8156 until this is fixed.
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/20210120011208.3768105-1-grundler@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Multicast entries in the MAC table use the high bits of the MAC
address to encode the ports that should get the packets. But this port
mask does not work for the CPU port, to receive these packets on the
CPU port the MAC_CPU_COPY flag must be set.
Because of this IPv6 was effectively not working because neighbor
solicitations were never received. This was not apparent before commit
9403c158 (net: mscc: ocelot: support IPv4, IPv6 and plain Ethernet mdb
entries) as the IPv6 entries were broken so all incoming IPv6
multicast was then treated as unknown and flooded on all ports.
To fix this problem rework the ocelot_mact_learn() to set the
MAC_CPU_COPY flag when a multicast entry that target the CPU port is
added. For this we have to read back the ports endcoded in the pseudo
MAC address by the caller. It is not a very nice design but that avoid
changing the callers and should make backporting easier.
Signed-off-by: Alban Bedel <alban.bedel@aerq.com>
Fixes: 9403c158b8 ("net: mscc: ocelot: support IPv4, IPv6 and plain Ethernet mdb entries")
Link: https://lore.kernel.org/r/20210119140638.203374-1-alban.bedel@aerq.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Receiving ACK with a valid SYN cookie, cookie_v4_check() allocates struct
request_sock and then can allocate inet_rsk(req)->ireq_opt. After that,
tcp_v4_syn_recv_sock() allocates struct sock and copies ireq_opt to
inet_sk(sk)->inet_opt. Normally, tcp_v4_syn_recv_sock() inserts the full
socket into ehash and sets NULL to ireq_opt. Otherwise,
tcp_v4_syn_recv_sock() has to reset inet_opt by NULL and free the full
socket.
The commit 01770a1661 ("tcp: fix race condition when creating child
sockets from syncookies") added a new path, in which more than one cores
create full sockets for the same SYN cookie. Currently, the core which
loses the race frees the full socket without resetting inet_opt, resulting
in that both sock_put() and reqsk_put() call kfree() for the same memory:
sock_put
sk_free
__sk_free
sk_destruct
__sk_destruct
sk->sk_destruct/inet_sock_destruct
kfree(rcu_dereference_protected(inet->inet_opt, 1));
reqsk_put
reqsk_free
__reqsk_free
req->rsk_ops->destructor/tcp_v4_reqsk_destructor
kfree(rcu_dereference_protected(inet_rsk(req)->ireq_opt, 1));
Calling kmalloc() between the double kfree() can lead to use-after-free, so
this patch fixes it by setting NULL to inet_opt before sock_put().
As a side note, this kind of issue does not happen for IPv6. This is
because tcp_v6_syn_recv_sock() clones both ipv6_opt and pktopts which
correspond to ireq_opt in IPv4.
Fixes: 01770a1661 ("tcp: fix race condition when creating child sockets from syncookies")
CC: Ricardo Dias <rdias@singlestore.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210118055920.82516-1-kuniyu@amazon.co.jp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf 2021-01-20
1) Fix wrong bpf_map_peek_elem_proto helper callback, from Mircea Cirjaliu.
2) Fix signed_{sub,add32}_overflows type truncation, from Daniel Borkmann.
3) Fix AF_XDP to also clear pools for inactive queues, from Maxim Mikityanskiy.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix signed_{sub,add32}_overflows type handling
xsk: Clear pool even for inactive queues
bpf: Fix helper bpf_map_peek_elem_proto pointing to wrong callback
====================
Link: https://lore.kernel.org/r/20210120163439.8160-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
lpass hdmi support patch totally removed support for MI2S TERTIARY
and QUATERNARY.
One of the major issue was spotted with the design of having
separate SoC specific header files for the common lpass driver.
This design is prone to break as an when new SoC header is added
as the common DAI ids of other SoCs will be overwritten by the
new ones.
Having a common header qcom,lpass.h should fix the issue and any new
DAI ids should be added to the common header.
With this change lpass also needs a new of_xlate function to resolve
dai name.
Fixes: 7cb37b7bd0 ("ASoC: qcom: Add support for lpass hdmi driver")
Reported-by: Jun Nie <jun.nie@linaro.org>
Reported-by: Stephan Gerhold <stephan@gerhold.net>
Tested-by: Srinivasa Rao <srivasam@codeaurora.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Tested-by: Stephan Gerhold <stephan@gerhold.net>
Link: https://lore.kernel.org/r/20210119171527.32145-3-srinivas.kandagatla@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Existing header file design of having separate SoC specific header files
for the common lpass driver has mutiple issues.
This design is prone to break as an when new SoC header is added
as the common DAI ids of other SoCs will be overwritten by the
new ones.
One of them surfaced by recent patch that adds support to sc7180, this
one totally broke LPASS drivers on other Qualcomm SoCs.
Before this gets worst, fix this by having a common header qcom,lpass.h.
This should fix the issue and any new DAI ids should be added to the
common header. This will be more sustainable then the existing design!
Fixes: 12fbfc4cab ("ASoC: Add sc7180-lpass binding header hdmi define")
Reported-by: Jun Nie <jun.nie@linaro.org>
Reported-by: Stephan Gerhold <stephan@gerhold.net>
Tested-by: Srinivasa Rao <srivasam@codeaurora.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20210119171527.32145-2-srinivas.kandagatla@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Fix incorrect signed_{sub,add32}_overflows() input types (and a related buggy
comment). It looks like this might have slipped in via copy/paste issue, also
given prior to 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
the signature of signed_sub_overflows() had s64 a and s64 b as its input args
whereas now they are truncated to s32. Thus restore proper types. Also, the case
of signed_add32_overflows() is not consistent to signed_sub32_overflows(). Both
have s32 as inputs, therefore align the former.
Fixes: 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: De4dCr0w <sa516203@mail.ustc.edu.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
One customer reports a crash problem which causes by flush request. It
triggers a warning before crash.
/* new request after previous flush is completed */
if (ktime_after(req_start, mddev->prev_flush_start)) {
WARN_ON(mddev->flush_bio);
mddev->flush_bio = bio;
bio = NULL;
}
The WARN_ON is triggered. We use spin lock to protect prev_flush_start and
flush_bio in md_flush_request. But there is no lock protection in
md_submit_flush_data. It can set flush_bio to NULL first because of
compiler reordering write instructions.
For example, flush bio1 sets flush bio to NULL first in
md_submit_flush_data. An interrupt or vmware causing an extended stall
happen between updating flush_bio and prev_flush_start. Because flush_bio
is NULL, flush bio2 can get the lock and submit to underlayer disks. Then
flush bio1 updates prev_flush_start after the interrupt or extended stall.
Then flush bio3 enters in md_flush_request. The start time req_start is
behind prev_flush_start. The flush_bio is not NULL(flush bio2 hasn't
finished). So it can trigger the WARN_ON now. Then it calls INIT_WORK
again. INIT_WORK() will re-initialize the list pointers in the
work_struct, which then can result in a corrupted work list and the
work_struct queued a second time. With the work list corrupted, it can
lead in invalid work items being used and cause a crash in
process_one_work.
We need to make sure only one flush bio can be handled at one same time.
So add spin lock in md_submit_flush_data to protect prev_flush_start and
flush_bio in an atomic way.
Reviewed-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
map_direct_mr() assumed that the number of scatter/gather entries
returned by dma_map_sg_attrs() was equal to the number of segments in
the sgl list. This led to wrong population of the mkey object. Fix this
by properly referring to the returned value.
The hardware expects each MTT entry to contain the DMA address of a
contiguous block of memory of size (1 << mr->log_size) bytes.
dma_map_sg_attrs() can coalesce several sg entries into a single
scatter/gather entry of contiguous DMA range so we need to scan the list
and refer to the size of each s/g entry.
In addition, get rid of fill_sg() which effect is overwritten by
populate_mtts().
Fixes: 94abbccdf2 ("vdpa/mlx5: Add shared memory registration code")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210107071845.GA224876@mtl-vdi-166.wap.labs.mlnx
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
The L1D flush fallback functions are not recoverable vs interrupts,
yet the scv entry flush runs with MSR[EE]=1. This can result in a
timer (soft-NMI) or MCE or SRESET interrupt hitting here and overwriting
the EXRFI save area, which ends up corrupting userspace registers for
scv return.
Fix this by disabling RI and EE for the scv entry fallback flush.
Fixes: f79643787e ("powerpc/64s: flush L1D on kernel entry")
Cc: stable@vger.kernel.org # 5.9+ which also have flush L1D patch backport
Reported-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210111062408.287092-1-npiggin@gmail.com
The previous commit 32efcc06d2 ("tcp: export count for rehash attempts")
would mis-account rehashing SNMP and socket stats:
a. During handshake of an active open, only counts the first
SYN timeout
b. After handshake of passive and active open, stop updating
after (roughly) TCP_RETRIES1 recurring RTOs
c. After the socket aborts, over count timeout_rehash by 1
This patch fixes this by checking the rehash result from sk_rethink_txhash.
Fixes: 32efcc06d2 ("tcp: export count for rehash attempts")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20210119192619.1848270-1-ycheng@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The > comparison should be >= to prevent accessing one element beyond
the end of the dev->vlans[] array in the caller function, b53_vlan_add().
The "dev->vlans" array is allocated in the b53_switch_init() function
and it has "dev->num_vlans" elements.
Fixes: a2482d2ce3 ("net: dsa: b53: Plug in VLAN support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/YAbxI97Dl/pmBy5V@mwanda
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit reported that some skbs were sent with
the following invalid GSO properties :
- gso_size > 0
- gso_type == 0
This was triggerring a WARN_ON_ONCE() in rtl8169_tso_csum_v2.
Juerg Haefliger was able to reproduce a similar issue using
a lan78xx NIC and a workload mixing TCP incoming traffic
and forwarded packets.
The problem is that tcp_add_backlog() is writing
over gso_segs and gso_size even if the incoming packet will not
be coalesced to the backlog tail packet.
While skb_try_coalesce() would bail out if tail packet is cloned,
this overwriting would lead to corruptions of other packets
cooked by lan78xx, sharing a common super-packet.
The strategy used by lan78xx is to use a big skb, and split
it into all received packets using skb_clone() to avoid copies.
The drawback of this strategy is that all the small skb share a common
struct skb_shared_info.
This patch rewrites TCP gso_size/gso_segs handling to only
happen on the tail skb, since skb_try_coalesce() made sure
it was not cloned.
Fixes: 4f693b55c3 ("tcp: implement coalescing on backlog queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Bisected-by: Juerg Haefliger <juergh@canonical.com>
Tested-by: Juerg Haefliger <juergh@canonical.com>
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209423
Link: https://lore.kernel.org/r/20210119164900.766957-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Newer binutils (>= 2.36) refuse to assemble lmw/stmw when building in
little endian mode. That breaks compilation of our alignment handler
test:
/tmp/cco4l14N.s: Assembler messages:
/tmp/cco4l14N.s:1440: Error: `lmw' invalid when little-endian
/tmp/cco4l14N.s:1814: Error: `stmw' invalid when little-endian
make[2]: *** [../../lib.mk:139: /output/kselftest/powerpc/alignment/alignment_handler] Error 1
These tests do pass on little endian machines, as the kernel will
still emulate those instructions even when running little
endian (which is arguably a kernel bug).
But we don't really need to test that case, so ifdef those
instructions out to get the alignment test building again.
Reported-by: Libor Pechacek <lpechacek@suse.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Libor Pechacek <lpechacek@suse.com>
Link: https://lore.kernel.org/r/20210119041800.3093047-1-mpe@ellerman.id.au
In commit e28bf1f03b ("RDMA: Convert various random sprintf sysfs _show
uses to sysfs_emit") I mistakenly used len = sysfs_emit_at to overwrite
the last trailing space of potentially multiple entry output.
Instead use a more common style by removing the trailing space from the
output formats and adding a prefixing space to the contination formats and
converting the final terminating output newline from the defective
len = sysfs_emit_at(buf, len, "\n");
to the now appropriate and typical
len += sysfs_emit_at(buf, len, "\n");
Fixes: e28bf1f03b ("RDMA: Convert various random sprintf sysfs _show uses to sysfs_emit")
Link: https://lore.kernel.org/r/5eb794b9c9bca0494d94b2b209f1627fa4e7b555.camel@perches.com
Reported-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This reverts commit fbdd0049d9.
Due to commit in fixes tag, netdevice events were received only in one net
namespace of mlx5_core_dev. Due to this when netdevice events arrive in
net namespace other than net namespace of mlx5_core_dev, they are missed.
This results in empty GID table due to RDMA device being detached from its
net device.
Hence, revert back to receive netdevice events in all net namespaces to
restore back RDMA functionality in non init_net net namespace. The
deadlock will have to be addressed in another patch.
Fixes: fbdd0049d9 ("RDMA/mlx5: Fix devlink deadlock on net namespace deletion")
Link: https://lore.kernel.org/r/20210117092633.10690-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
GFP_KERNEL may cause ida_alloc_range() to sleep, but the spinlock covering
this function is not allowed to sleep, so the spinlock needs to be changed
to mutex.
As there is a certain chance of memory allocation failure, GFP_ATOMIC is
not suitable for QP allocation scenarios.
Fixes: 71586dd200 ("RDMA/hns: Create QP with selected QPN for bank load balance")
Link: https://lore.kernel.org/r/1611048513-28663-1-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The PVRDMA device HW interface defines network_hdr_type according to an
old definition of the internal kernel rdma_network_type enum that has
since changed, resulting in the wrong rdma_network_type being reported.
Fix this by explicitly defining the enum used by the PVRDMA device and
adding a function to convert the pvrdma_network_type to rdma_network_type
enum.
Cc: stable@vger.kernel.org # 5.10+
Fixes: 1c15b4f2a4 ("RDMA/core: Modify enum ib_gid_type and enum rdma_network_type")
Link: https://lore.kernel.org/r/1611026189-17943-1-git-send-email-bryantan@vmware.com
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Guillaume Nault says:
====================
ipv4: Ensure ECN bits don't influence source address validation
Functions that end up calling fib_table_lookup() should clear the ECN
bits from the TOS, otherwise ECT(0) and ECT(1) packets can be treated
differently.
Most functions already clear the ECN bits, but there are a few cases
where this is not done. This series only fixes the ones related to
source address validation.
====================
Link: https://lore.kernel.org/r/cover.1610790904.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
RT_TOS() only masks one of the two ECN bits. Therefore rpfilter_mt()
treats Not-ECT or ECT(1) packets in a different way than those with
ECT(0) or CE.
Reproducer:
Create two netns, connected with a veth:
$ ip netns add ns0
$ ip netns add ns1
$ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
$ ip -netns ns0 link set dev veth01 up
$ ip -netns ns1 link set dev veth10 up
$ ip -netns ns0 address add 192.0.2.10/32 dev veth01
$ ip -netns ns1 address add 192.0.2.11/32 dev veth10
Add a route to ns1 in ns0:
$ ip -netns ns0 route add 192.0.2.11/32 dev veth01
In ns1, only packets with TOS 4 can be routed to ns0:
$ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10
Ping from ns0 to ns1 works regardless of the ECN bits, as long as TOS
is 4:
$ ip netns exec ns0 ping -Q 4 192.0.2.11 # TOS 4, Not-ECT
... 0% packet loss ...
$ ip netns exec ns0 ping -Q 5 192.0.2.11 # TOS 4, ECT(1)
... 0% packet loss ...
$ ip netns exec ns0 ping -Q 6 192.0.2.11 # TOS 4, ECT(0)
... 0% packet loss ...
$ ip netns exec ns0 ping -Q 7 192.0.2.11 # TOS 4, CE
... 0% packet loss ...
Now use iptable's rpfilter module in ns1:
$ ip netns exec ns1 iptables-legacy -t raw -A PREROUTING -m rpfilter --invert -j DROP
Not-ECT and ECT(1) packets still pass:
$ ip netns exec ns0 ping -Q 4 192.0.2.11 # TOS 4, Not-ECT
... 0% packet loss ...
$ ip netns exec ns0 ping -Q 5 192.0.2.11 # TOS 4, ECT(1)
... 0% packet loss ...
But ECT(0) and ECN packets are dropped:
$ ip netns exec ns0 ping -Q 6 192.0.2.11 # TOS 4, ECT(0)
... 100% packet loss ...
$ ip netns exec ns0 ping -Q 7 192.0.2.11 # TOS 4, CE
... 100% packet loss ...
After this patch, rpfilter doesn't drop ECT(0) and CE packets anymore.
Fixes: 8f97339d3f ("netfilter: add ipv4 reverse path filter match")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
udp_v4_early_demux() is the only function that calls
ip_mc_validate_source() with a TOS that hasn't been masked with
IPTOS_RT_MASK.
This results in different behaviours for incoming multicast UDPv4
packets, depending on if ip_mc_validate_source() is called from the
early-demux path (udp_v4_early_demux) or from the regular input path
(ip_route_input_noref).
ECN would normally not be used with UDP multicast packets, so the
practical consequences should be limited on that side. However,
IPTOS_RT_MASK is used to also masks the TOS' high order bits, to align
with the non-early-demux path behaviour.
Reproducer:
Setup two netns, connected with veth:
$ ip netns add ns0
$ ip netns add ns1
$ ip -netns ns0 link set dev lo up
$ ip -netns ns1 link set dev lo up
$ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
$ ip -netns ns0 link set dev veth01 up
$ ip -netns ns1 link set dev veth10 up
$ ip -netns ns0 address add 192.0.2.10 peer 192.0.2.11/32 dev veth01
$ ip -netns ns1 address add 192.0.2.11 peer 192.0.2.10/32 dev veth10
In ns0, add route to multicast address 224.0.2.0/24 using source
address 198.51.100.10:
$ ip -netns ns0 address add 198.51.100.10/32 dev lo
$ ip -netns ns0 route add 224.0.2.0/24 dev veth01 src 198.51.100.10
In ns1, define route to 198.51.100.10, only for packets with TOS 4:
$ ip -netns ns1 route add 198.51.100.10/32 tos 4 dev veth10
Also activate rp_filter in ns1, so that incoming packets not matching
the above route get dropped:
$ ip netns exec ns1 sysctl -wq net.ipv4.conf.veth10.rp_filter=1
Now try to receive packets on 224.0.2.11:
$ ip netns exec ns1 socat UDP-RECVFROM:1111,ip-add-membership=224.0.2.11:veth10,ignoreeof -
In ns0, send packet to 224.0.2.11 with TOS 4 and ECT(0) (that is,
tos 6 for socat):
$ echo test0 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6
The "test0" message is properly received by socat in ns1, because
early-demux has no cached dst to use, so source address validation
is done by ip_route_input_mc(), which receives a TOS that has the
ECN bits masked.
Now send another packet to 224.0.2.11, still with TOS 4 and ECT(0):
$ echo test1 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6
The "test1" message isn't received by socat in ns1, because, now,
early-demux has a cached dst to use and calls ip_mc_validate_source()
immediately, without masking the ECN bits.
Fixes: bc044e8db7 ("udp: perform source validation for mcast early demux")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The number of queues can change by other means, rather than ethtool. For
example, attaching an mqprio qdisc with num_tc > 1 leads to creating
multiple sets of TX queues, which may be then destroyed when mqprio is
deleted. If an AF_XDP socket is created while mqprio is active,
dev->_tx[queue_id].pool will be filled, but then real_num_tx_queues may
decrease with deletion of mqprio, which will mean that the pool won't be
NULLed, and a further increase of the number of TX queues may expose a
dangling pointer.
To avoid any potential misbehavior, this commit clears pool for RX and
TX queues, regardless of real_num_*_queues, still taking into
consideration num_*_queues to avoid overflows.
Fixes: 1c1efc2af1 ("xsk: Create and free buffer pool independently from umem")
Fixes: a41b4f3c58 ("xsk: simplify xdp_clear_umem_at_qid implementation")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20210118160333.333439-1-maximmi@mellanox.com
Pull task_work fix from Jens Axboe:
"The TIF_NOTIFY_SIGNAL change inadvertently removed the unconditional
task_work run we had in get_signal().
This caused a regression for some setups, since we're relying on eg
____fput() being run to close and release, for example, a pipe and
wake the other end.
For 5.11, I prefer the simple solution of just reinstating the
unconditional run, even if it conceptually doesn't make much sense -
if you need that kind of guarantee, you should be using TWA_SIGNAL
instead of TWA_NOTIFY. But it's the trivial fix for 5.11, and would
ensure that other potential gotchas/assumptions for task_work don't
regress for 5.11.
We're looking into further simplifying the task_work notifications for
5.12 which would resolve that too"
* tag 'task_work-2021-01-19' of git://git.kernel.dk/linux-block:
task_work: unconditionally run task_work from get_signal()
Pull nfsd fixes from Chuck Lever:
- Avoid exposing parent of root directory in NFSv3 READDIRPLUS results
- Fix a tracepoint change that went in the initial 5.11 merge
* tag 'nfsd-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
SUNRPC: Move the svc_xdr_recvfrom tracepoint again
nfsd4: readdirplus shouldn't return parent of export
Pull hyperv fix from Wei Liu:
"One patch from Dexuan to fix clockevent initialization"
* tag 'hyperv-fixes-signed-20210119' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Initialize clockevents after LAPIC is initialized
Geert Uytterhoeven says:
====================
sh_eth: Fix reboot crash
This patch fixes a regression v5.11-rc1, where rebooting while a sh_eth
device is not opened will cause a crash.
Changes compared to v1:
- Export mdiobb_{read,write}(),
- Call mdiobb_{read,write}() now they are exported,
- Use mii_bus.parent to avoid bb_info.dev copy,
- Drop RFC state.
Alternatively, mdio-bitbang could provide Runtime PM-aware wrappers
itself, and use them either manually (through a new parameter to
alloc_mdio_bitbang(), or a new alloc_mdio_bitbang_*() function), or
automatically (e.g. if pm_runtime_enabled() returns true). Note that
the latter requires a "struct device *" parameter to operate on.
Currently there are only two drivers that call alloc_mdio_bitbang() and
use Runtime PM: the Renesas sh_eth and ravb drivers. This series fixes
the former, while the latter is not affected (it keeps the device
powered all the time between driver probe and driver unbind, and
changing that seems to be non-trivial).
====================
Link: https://lore.kernel.org/r/20210118150656.796584-1-geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wolfram reports that his R-Car H2-based Lager board can no longer be
rebooted in v5.11-rc1, as it crashes with an imprecise external abort.
The issue can be reproduced on other boards (e.g. Koelsch with R-Car
M2-W) too, if CONFIG_IP_PNP is disabled, and the Ethernet interface is
down at reboot time:
Unhandled fault: imprecise external abort (0x1406) at 0x00000000
pgd = (ptrval)
[00000000] *pgd=422b6835, *pte=00000000, *ppte=00000000
Internal error: : 1406 [#1] ARM
Modules linked in:
CPU: 0 PID: 1105 Comm: init Tainted: G W 5.10.0-rc1-00402-ge2f016cf7751 #1048
Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
PC is at sh_mdio_ctrl+0x44/0x60
LR is at sh_mmd_ctrl+0x20/0x24
...
Backtrace:
[<c0451f30>] (sh_mdio_ctrl) from [<c0451fd4>] (sh_mmd_ctrl+0x20/0x24)
r7:0000001f r6:00000020 r5:00000002 r4:c22a1dc4
[<c0451fb4>] (sh_mmd_ctrl) from [<c044fc18>] (mdiobb_cmd+0x38/0xa8)
[<c044fbe0>] (mdiobb_cmd) from [<c044feb8>] (mdiobb_read+0x58/0xdc)
r9:c229f844 r8:c0c329dc r7:c221e000 r6:00000001 r5:c22a1dc4 r4:00000001
[<c044fe60>] (mdiobb_read) from [<c044c854>] (__mdiobus_read+0x74/0xe0)
r7:0000001f r6:00000001 r5:c221e000 r4:c221e000
[<c044c7e0>] (__mdiobus_read) from [<c044c9d8>] (mdiobus_read+0x40/0x54)
r7:0000001f r6:00000001 r5:c221e000 r4:c221e458
[<c044c998>] (mdiobus_read) from [<c044d678>] (phy_read+0x1c/0x20)
r7:ffffe000 r6:c221e470 r5:00000200 r4:c229f800
[<c044d65c>] (phy_read) from [<c044d94c>] (kszphy_config_intr+0x44/0x80)
[<c044d908>] (kszphy_config_intr) from [<c044694c>] (phy_disable_interrupts+0x44/0x50)
r5:c229f800 r4:c229f800
[<c0446908>] (phy_disable_interrupts) from [<c0449370>] (phy_shutdown+0x18/0x1c)
r5:c229f800 r4:c229f804
[<c0449358>] (phy_shutdown) from [<c040066c>] (device_shutdown+0x168/0x1f8)
[<c0400504>] (device_shutdown) from [<c013de44>] (kernel_restart_prepare+0x3c/0x48)
r9:c22d2000 r8:c0100264 r7:c0b0d034 r6:00000000 r5:4321fedc r4:00000000
[<c013de08>] (kernel_restart_prepare) from [<c013dee0>] (kernel_restart+0x1c/0x60)
[<c013dec4>] (kernel_restart) from [<c013e1d8>] (__do_sys_reboot+0x168/0x208)
r5:4321fedc r4:01234567
[<c013e070>] (__do_sys_reboot) from [<c013e2e8>] (sys_reboot+0x18/0x1c)
r7:00000058 r6:00000000 r5:00000000 r4:00000000
[<c013e2d0>] (sys_reboot) from [<c0100060>] (ret_fast_syscall+0x0/0x54)
As of commit e2f016cf77 ("net: phy: add a shutdown procedure"),
system reboot calls phy_disable_interrupts() during shutdown. As this
happens unconditionally, the PHY registers may be accessed while the
device is suspended, causing undefined behavior, which may crash the
system.
Fix this by wrapping the PHY bitbang accessors in the sh_eth driver by
wrappers that take care of Runtime PM, to resume the device when needed.
Reported-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update PSI file description in cgroup-v2 docs to reflect the current
implementation.
tj: Changed cpu.pressure from read-only to read-write as suggested by
Johannes.
Signed-off-by: Odin Ugedal <odin@uged.al>
Acked-by: Dan Schatzberg <dschatzberg@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fix NULL pointer dereference when adding new psi monitor to the root
cgroup. PSI files for root cgroup was introduced in df5ba5be74 by using
system wide psi struct when reading, but file write/monitor was not
properly fixed. Since the PSI config for the root cgroup isn't
initialized, the current implementation tries to lock a NULL ptr,
resulting in a crash.
Can be triggered by running this as root:
$ tee /sys/fs/cgroup/cpu.pressure <<< "some 10000 1000000"
Signed-off-by: Odin Ugedal <odin@uged.al>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Dan Schatzberg <dschatzberg@fb.com>
Fixes: df5ba5be74 ("kernel/sched/psi.c: expose pressure metrics on root cgroup")
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: stable@vger.kernel.org # 5.2+
Signed-off-by: Tejun Heo <tj@kernel.org>
On x86 scale invariace tends to be disabled during resume from
suspend-to-RAM, because the MPERF or APERF MSR values are not as
expected then due to updates taking place after the platform
firmware has been invoked to complete the suspend transition.
That, of course, is not desirable, especially if the schedutil
scaling governor is in use, because the lack of scale invariance
causes it to be less reliable.
To counter that effect, modify init_freq_invariance() to register
a syscore_ops object for scale invariance with the ->resume callback
pointing to init_counter_refs() which will run on the CPU starting
the resume transition (the other CPUs will be taken care of the
"online" operations taking place later).
Fixes: e2b0d619b4 ("x86, sched: check for counters overflow in frequency invariant accounting")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Link: https://lkml.kernel.org/r/1803209.Mvru99baaF@kreacher
After hibernation, HDA controller can't be runtime-suspended after
commit 215a22ed31 ("ALSA: hda: Refactor codjc PM to use
direct-complete optimization"), which enables direct-complete for HDA
codec.
The HDA codec driver didn't expect direct-complete will be disabled
after it returns a positive value from prepare() callback. However,
there are some places that PM core can disable direct-complete. For
instance, system hibernation or when codec has subordinates like LEDs.
So if the codec is prepared for direct-complete but PM core still calls
codec's suspend or freeze callback, partially revert the commit and take
the original approach, which uses pm_runtime_force_*() helpers to
ensure PM refcount are balanced. Meanwhile, still keep prepare() and
complete() callbacks to enable direct-complete and request a resume for
jack detection, respectively.
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Fixes: 215a22ed31 ("ALSA: hda: Refactor codec PM to use direct-complete optimization")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Link: https://lore.kernel.org/r/20210119152145.346558-1-kai.heng.feng@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Commit efcdca286eef ("gpio: tegra: Convert to gpio_irq_chip") moved the
Tegra GPIO driver to the generic GPIO IRQ chip infrastructure and made
the IRQ domain hierarchical, so the driver needs to pull in the support
infrastructure via the GPIOLIB_IRQCHIP and IRQ_DOMAIN_HIERARCHY Kconfig
options.
Fixes: efcdca286eef ("gpio: tegra: Convert to gpio_irq_chip")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
The period is the sum of on and off values. That is, calculate period as
($on + $off) / clkrate
instead of
$off / clkrate - $on / clkrate
that makes no sense.
Reported-by: Russell King <linux@armlinux.org.uk>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Fixes: 757642f9a5 ("gpio: mvebu: Add limited PWM support")
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
gpiochip->to_irq method is redefined in gpiochip_add_irqchip.
A lot of gpiod driver's still define ->to_irq method, let's give
a gentle warning that they can no longer rely on it, so they can remove
it on ocassion.
Fixes: e0d8972898 ("gpio: Implement tighter IRQ chip integration")
Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Before the commit 896fbe20b4 ("printk: use the lockless
ringbuffer"), msg_print_text() would only write up to size-1 bytes
into the provided buffer. Some callers expect this behavior and
append a terminator to returned string. In particular:
arch/powerpc/xmon/xmon.c:dump_log_buf()
arch/um/kernel/kmsg_dump.c:kmsg_dumper_stdout()
msg_print_text() has been replaced by record_print_text(), which
currently fills the full size of the buffer. This causes a
buffer overflow for the above callers.
Change record_print_text() so that it will only use size-1 bytes
for text data. Also, for paranoia sakes, add a terminator after
the text data.
And finally, document this behavior so that it is clear that only
size-1 bytes are used and a terminator is added.
Fixes: 896fbe20b4 ("printk: use the lockless ringbuffer")
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210114170412.4819-1-john.ogness@linutronix.de
The TCP session does not terminate with TCP_USER_TIMEOUT when data
remain untransmitted due to zero window.
The number of unanswered zero-window probes (tcp_probes_out) is
reset to zero with incoming acks irrespective of the window size,
as described in tcp_probe_timer():
RFC 1122 4.2.2.17 requires the sender to stay open indefinitely
as long as the receiver continues to respond probes. We support
this by default and reset icsk_probes_out with incoming ACKs.
This counter, however, is the wrong one to be used in calculating the
duration that the window remains closed and data remain untransmitted.
Thanks to Jonathan Maxwell <jmaxwell37@gmail.com> for diagnosing the
actual issue.
In this patch a new timestamp is introduced for the socket in order to
track the elapsed time for the zero-window probes that have not been
answered with any non-zero window ack.
Fixes: 9721e709fa ("tcp: simplify window probe aborting on USER_TIMEOUT")
Reported-by: William McCall <william.mccall@gmail.com>
Co-developed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210115223058.GA39267@localhost.localdomain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The multicast route ff00::/8 is created with type RTN_UNICAST:
$ ip -6 -d route
unicast ::1 dev lo proto kernel scope global metric 256 pref medium
unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium
unicast ff00::/8 dev eth0 proto kernel scope global metric 256 pref medium
Set the type to RTN_MULTICAST which is more appropriate.
Fixes: e8478e80e5 ("net/ipv6: Save route type in rt6_info")
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ff00::/8 multicast route is created without specifying the fc_protocol
field, so the default RTPROT_BOOT value is used:
$ ip -6 -d route
unicast ::1 dev lo proto kernel scope global metric 256 pref medium
unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium
unicast ff00::/8 dev eth0 proto boot scope global metric 256 pref medium
As the documentation says, this value identifies routes installed during
boot, but the route is created when interface is set up.
Change the value to RTPROT_KERNEL which is a better value.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Johannes Berg says:
====================
Various fixes:
* kernel-doc parsing fixes
* incorrect debugfs string checks
* locking fix in regulatory
* some encryption-related fixes
* tag 'mac80211-for-net-2021-01-18.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211:
mac80211: check if atf has been disabled in __ieee80211_schedule_txq
mac80211: do not drop tx nulldata packets on encrypted links
mac80211: fix encryption key selection for 802.3 xmit
mac80211: fix fast-rx encryption check
mac80211: fix incorrect strlen of .write in debugfs
cfg80211: fix a kerneldoc markup
cfg80211: Save the regulatory domain with a lock
cfg80211/mac80211: fix kernel-doc for SAR APIs
====================
Link: https://lore.kernel.org/r/20210118204750.7243-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since main() does not return a value explicitly, the
return values from FAIL_IF() conditions are ignored
and the tests can still pass irrespective of failures.
This makes sure that we always explicitly return the
correct test exit status.
Fixes: 1addb64447 ("selftests/powerpc: Add test for execute-disabled pkeys")
Fixes: c27f2fd170 ("selftests/powerpc: Add test for pkey siginfo verification")
Reported-by: Eirik Fuller <efuller@redhat.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210118093145.10134-1-sandipan@linux.ibm.com
mv88e6xxx_port_vlan_join checks whether the VTU already contains an
entry for the given vid (via mv88e6xxx_vtu_getnext), and if so, merely
changes the relevant .member[] element and loads the updated entry
into the VTU.
However, at least for the mv88e6250, the on-stack struct
mv88e6xxx_vtu_entry vlan never has its .state[] array explicitly
initialized, neither in mv88e6xxx_port_vlan_join() nor inside the
getnext implementation. So the new entry has random garbage for the
STU bits, breaking VLAN filtering.
When the VTU entry is initially created, those bits are all zero, and
we should make sure to keep them that way when the entry is updated.
Fixes: 92307069a9 (net: dsa: mv88e6xxx: Avoid VTU corruption on 6097)
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
Tested-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The recently added thermal policy support makes a
hp_wmi_perform_query(0x4c, ...) call on older devices which do not
support thermal policies this causes the following warning to be
logged (seen on a HP Stream x360 Convertible PC 11):
[ 26.805305] hp_wmi: query 0x4c returned error 0x3
Error 0x3 is HPWMI_RET_UNKNOWN_COMMAND error. This commit silences
the warning for unknown-command errors, silencing the new warning.
Cc: Elia Devito <eliadevito@gmail.com>
Fixes: 81c93798ef ("platform/x86: hp-wmi: add support for thermal policy")
Link: https://lore.kernel.org/r/20210114232744.154886-1-hdegoede@redhat.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
The blamed commit was too aggressive, and it made ocelot_netdevice_event
react only to network interface events emitted for the ocelot switch
ports.
In fact, only the PRECHANGEUPPER should have had that check.
When we ignore all events that are not for us, we miss the fact that the
upper of the LAG changes, and the bonding interface gets enslaved to a
bridge. This is an operation we could offload under certain conditions.
Fixes: 7afb3e575e ("net: mscc: ocelot: don't handle netdev events for other netdevs")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210118135210.2666246-1-olteanv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull spi fixes from Mark Brown:
"A few more bug fixes for SPI, both driver specific ones. The caching
in the Cadence driver is to avoid a deadlock trying to retrieve the
cached value later at runtime"
* tag 'spi-fix-v5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: cadence: cache reference clock rate during probe
spi: fsl: Fix driver breakage when SPI_CS_HIGH is not set in spi->mode
Pull ia64 build fix from Mike Rapoport:
"Fix an ia64 build failure caused by memory model changes"
* tag 'fixes-2021-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
ia64: fix build failure caused by memory model changes
Pull crypto fixes from Herbert Xu:
"A Kconfig dependency issue with omap-sham and a divide by zero in xor
on some platforms"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: omap-sham - Fix link error without crypto-engine
crypto: xor - Fix divide error in do_xor_speed()
THe HP Stream x360 Convertible PC 11 DSDT has the following VGBS function:
Method (VGBS, 0, Serialized)
{
If ((^^PCI0.LPCB.EC0.ROLS == Zero))
{
VBDS = Zero
}
Else
{
VBDS = Zero
}
Return (VBDS) /* \_SB_.VGBI.VBDS */
}
Which is obviously wrong, because it always returns 0 independent of the
2-in-1 being in laptop or tablet mode. This causes the intel-vbtn driver
to initially report SW_TABLET_MODE = 1 to userspace, which is known to
cause problems when the 2-in-1 is actually in laptop mode.
During earlier testing this turned out to not be a problem because the
2-in-1 would do a Notify(..., 0xCC) or Notify(..., 0xCD) soon after
the intel-vbtn driver loaded, correcting the SW_TABLET_MODE state.
Further testing however has shown that this Notify() soon after the
intel-vbtn driver loads, does not always happen. When the Notify
does not happen, then intel-vbtn reports SW_TABLET_MODE = 1 resulting in
a non-working touchpad.
IOW the tablet-mode reporting is not reliable on this device, so it
should be dropped from the allow-list, fixing the touchpad sometimes
not working.
Fixes: 8169bd3e6e ("platform/x86: intel-vbtn: Switch to an allow-list for SW_TABLET_MODE reporting")
Link: https://lore.kernel.org/r/20210114143432.31750-1-hdegoede@redhat.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
The function nvmet_execute_identify_ns() doesn't set the status if call
to nvmet_find_namespace() fails. In that case we set the status of the
request to the value return by the nvmet_copy_sgl().
Set the status to NVME_SC_INVALID_NS and adjust the code such that
request will have the right status on nvmet_find_namespace() failure.
Without this patch :-
NVME Identify Namespace 3:
nsze : 0
ncap : 0
nuse : 0
nsfeat : 0
nlbaf : 0
flbas : 0
mc : 0
dpc : 0
dps : 0
nmic : 0
rescap : 0
fpi : 0
dlfeat : 0
nawun : 0
nawupf : 0
nacwu : 0
nabsn : 0
nabo : 0
nabspf : 0
noiob : 0
nvmcap : 0
mssrl : 0
mcl : 0
msrc : 0
nsattr : 0
nvmsetid: 0
anagrpid: 0
endgid : 0
nguid : 00000000000000000000000000000000
eui64 : 0000000000000000
lbaf 0 : ms:0 lbads:0 rp:0 (in use)
With this patch-series :-
feb3b88b501e (HEAD -> nvme-5.11) nvmet: remove extra variable in identify ns
6302aa67210a nvmet: remove extra variable in id-desclist
ed57951da453 nvmet: remove extra variable in smart log nsid
be384b8c24dc nvmet: set right status on error in id-ns handler
NVMe status: INVALID_NS: The namespace or the format of that namespace is invalid(0xb)
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Since NVMe v1.4 the Controller Memory Buffer must be explicitly enabled
by the host.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
[hch: avoid a local variable and add a comment]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Each name space has a request queue, if complete request long time,
multi request queues may have time out requests at the same time,
nvme_tcp_timeout will execute concurrently. Multi requests in different
request queues may be queued in the same tcp queue, multi
nvme_tcp_timeout may call nvme_tcp_stop_queue at the same time.
The first nvme_tcp_stop_queue will clear NVME_TCP_Q_LIVE and continue
stopping the tcp queue(cancel io_work), but the others check
NVME_TCP_Q_LIVE is already cleared, and then directly complete the
requests, complete request before the io work is completely canceled may
lead to a use-after-free condition.
Add a multex lock to serialize nvme_tcp_stop_queue.
Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
A crash happens when inject completing request long time(nearly 30s).
Each name space has a request queue, when inject completing request long
time, multi request queues may have time out requests at the same time,
nvme_rdma_timeout will execute concurrently. Multi requests in different
request queues may be queued in the same rdma queue, multi
nvme_rdma_timeout may call nvme_rdma_stop_queue at the same time.
The first nvme_rdma_timeout will clear NVME_RDMA_Q_LIVE and continue
stopping the rdma queue(drain qp), but the others check NVME_RDMA_Q_LIVE
is already cleared, and then directly complete the requests, complete
request before the qp is fully drained may lead to a use-after-free
condition.
Add a multex lock to serialize nvme_rdma_stop_queue.
Signed-off-by: Chao Leng <lengchao@huawei.com>
Tested-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
According to NVMe spec v1.4, section 8.3.1, the PRINFO bit and
the metadata size play a vital role in deteriming the host buffer size.
If PRIFNO bit is set and MS==8, the host doesn't add the metadata buffer,
instead the controller adds it.
Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
In Linux, if a driver does disable_irq() and later does enable_irq()
on its interrupt, I believe it's expecting these properties:
* If an interrupt was pending when the driver disabled then it will
still be pending after the driver re-enables.
* If an edge-triggered interrupt comes in while an interrupt is
disabled it should assert when the interrupt is re-enabled.
If you think that the above sounds a lot like the disable_irq() and
enable_irq() are supposed to be masking/unmasking the interrupt
instead of disabling/enabling it then you've made an astute
observation. Specifically when talking about interrupts, "mask"
usually means to stop posting interrupts but keep tracking them and
"disable" means to fully shut off interrupt detection. It's
unfortunate that this is so confusing, but presumably this is all the
way it is for historical reasons.
Perhaps more confusing than the above is that, even though clients of
IRQs themselves don't have a way to request mask/unmask
vs. disable/enable calls, IRQ chips themselves can implement both.
...and yet more confusing is that if an IRQ chip implements
disable/enable then they will be called when a client driver calls
disable_irq() / enable_irq().
It does feel like some of the above could be cleared up. However,
without any other core interrupt changes it should be clear that when
an IRQ chip gets a request to "disable" an IRQ that it has to treat it
like a mask of that IRQ.
In any case, after that long interlude you can see that the "unmask
and clear" can break things. Maulik tried to fix it so that we no
longer did "unmask and clear" in commit 71266d9d39 ("pinctrl: qcom:
Move clearing pending IRQ to .irq_request_resources callback"), but it
only handled the PDC case and it had problems (it caused
sc7180-trogdor devices to fail to suspend). Let's fix.
>From my understanding the source of the phantom interrupt in the
were these two things:
1. One that could have been introduced in msm_gpio_irq_set_type()
(only for the non-PDC case).
2. Edges could have been detected when a GPIO was muxed away.
Fixing case #1 is easy. We can just add a clear in
msm_gpio_irq_set_type().
Fixing case #2 is harder. Let's use a concrete example. In
sc7180-trogdor.dtsi we configure the uart3 to have two pinctrl states,
sleep and default, and mux between the two during runtime PM and
system suspend (see geni_se_resources_{on,off}() for more
details). The difference between the sleep and default state is that
the RX pin is muxed to a GPIO during sleep and muxed to the UART
otherwise.
As per Qualcomm, when we mux the pin over to the UART function the PDC
(or the non-PDC interrupt detection logic) is still watching it /
latching edges. These edges don't cause interrupts because the
current code masks the interrupt unless we're entering suspend.
However, as soon as we enter suspend we unmask the interrupt and it's
counted as a wakeup.
Let's deal with the problem like this:
* When we mux away, we'll mask our interrupt. This isn't necessary in
the above case since the client already masked us, but it's a good
idea in general.
* When we mux back will clear any interrupts and unmask.
Fixes: 4b7618fdc7 ("pinctrl: qcom: Add irq_enable callback for msm gpio")
Fixes: 71266d9d39 ("pinctrl: qcom: Move clearing pending IRQ to .irq_request_resources callback")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Maulik Shah <mkshah@codeaurora.org>
Tested-by: Maulik Shah <mkshah@codeaurora.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20210114191601.v7.4.I7cf3019783720feb57b958c95c2b684940264cd1@changeid
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
When the Qualcomm pinctrl driver wants to Ack an interrupt, it does a
read-modify-write on the interrupt status register. On some SoCs it
makes sure that the status bit is 1 to "Ack" and on others it makes
sure that the bit is 0 to "Ack". Presumably the first type of
interrupt controller is a "write 1 to clear" type register and the
second just let you directly set the interrupt status register.
As far as I can tell from scanning structure definitions, the
interrupt status bit is always in a register by itself. Thus with
both types of interrupt controllers it is safe to "Ack" interrupts
without doing a read-modify-write. We can do a simple write.
It should be noted that if the interrupt status bit _was_ ever in a
register with other things (like maybe status bits for other GPIOs):
a) For "write 1 clear" type controllers then read-modify-write would
be totally wrong because we'd accidentally end up clearing
interrupts we weren't looking at.
b) For "direct set" type controllers then read-modify-write would also
be wrong because someone setting one of the other bits in the
register might accidentally clear (or set) our interrupt.
I say this simply to show that the current read-modify-write doesn't
provide any sort of "future proofing" of the code. In fact (for
"write 1 clear" controllers) the new code is slightly more "future
proof" since it would allow more than one interrupt status bits to
share a register.
NOTE: this code fixes no bugs--it simply avoids an extra register
read.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Maulik Shah <mkshah@codeaurora.org>
Tested-by: Maulik Shah <mkshah@codeaurora.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20210114191601.v7.2.I3635de080604e1feda770591c5563bd6e63dd39d@changeid
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
If we fail to update a block group item in the loop we'll break, however
we'll do btrfs_run_delayed_refs and lose our error value in ret, and
thus not clean up properly. Fix this by only running the delayed refs
if there was no failure.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Zygo reported the following KASAN splat:
BUG: KASAN: use-after-free in btrfs_backref_cleanup_node+0x18a/0x420
Read of size 8 at addr ffff888112402950 by task btrfs/28836
CPU: 0 PID: 28836 Comm: btrfs Tainted: G W 5.10.0-e35f27394290-for-next+ #23
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Call Trace:
dump_stack+0xbc/0xf9
? btrfs_backref_cleanup_node+0x18a/0x420
print_address_description.constprop.8+0x21/0x210
? record_print_text.cold.34+0x11/0x11
? btrfs_backref_cleanup_node+0x18a/0x420
? btrfs_backref_cleanup_node+0x18a/0x420
kasan_report.cold.10+0x20/0x37
? btrfs_backref_cleanup_node+0x18a/0x420
__asan_load8+0x69/0x90
btrfs_backref_cleanup_node+0x18a/0x420
btrfs_backref_release_cache+0x83/0x1b0
relocate_block_group+0x394/0x780
? merge_reloc_roots+0x4a0/0x4a0
btrfs_relocate_block_group+0x26e/0x4c0
btrfs_relocate_chunk+0x52/0x120
btrfs_balance+0xe2e/0x1900
? check_flags.part.50+0x6c/0x1e0
? btrfs_relocate_chunk+0x120/0x120
? kmem_cache_alloc_trace+0xa06/0xcb0
? _copy_from_user+0x83/0xc0
btrfs_ioctl_balance+0x3a7/0x460
btrfs_ioctl+0x24c8/0x4360
? __kasan_check_read+0x11/0x20
? check_chain_key+0x1f4/0x2f0
? __asan_loadN+0xf/0x20
? btrfs_ioctl_get_supported_features+0x30/0x30
? kvm_sched_clock_read+0x18/0x30
? check_chain_key+0x1f4/0x2f0
? lock_downgrade+0x3f0/0x3f0
? handle_mm_fault+0xad6/0x2150
? do_vfs_ioctl+0xfc/0x9d0
? ioctl_file_clone+0xe0/0xe0
? check_flags.part.50+0x6c/0x1e0
? check_flags.part.50+0x6c/0x1e0
? check_flags+0x26/0x30
? lock_is_held_type+0xc3/0xf0
? syscall_enter_from_user_mode+0x1b/0x60
? do_syscall_64+0x13/0x80
? rcu_read_lock_sched_held+0xa1/0xd0
? __kasan_check_read+0x11/0x20
? __fget_light+0xae/0x110
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f4c4bdfe427
Allocated by task 28836:
kasan_save_stack+0x21/0x50
__kasan_kmalloc.constprop.18+0xbe/0xd0
kasan_kmalloc+0x9/0x10
kmem_cache_alloc_trace+0x410/0xcb0
btrfs_backref_alloc_node+0x46/0xf0
btrfs_backref_add_tree_node+0x60d/0x11d0
build_backref_tree+0xc5/0x700
relocate_tree_blocks+0x2be/0xb90
relocate_block_group+0x2eb/0x780
btrfs_relocate_block_group+0x26e/0x4c0
btrfs_relocate_chunk+0x52/0x120
btrfs_balance+0xe2e/0x1900
btrfs_ioctl_balance+0x3a7/0x460
btrfs_ioctl+0x24c8/0x4360
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 28836:
kasan_save_stack+0x21/0x50
kasan_set_track+0x20/0x30
kasan_set_free_info+0x1f/0x30
__kasan_slab_free+0xf3/0x140
kasan_slab_free+0xe/0x10
kfree+0xde/0x200
btrfs_backref_error_cleanup+0x452/0x530
build_backref_tree+0x1a5/0x700
relocate_tree_blocks+0x2be/0xb90
relocate_block_group+0x2eb/0x780
btrfs_relocate_block_group+0x26e/0x4c0
btrfs_relocate_chunk+0x52/0x120
btrfs_balance+0xe2e/0x1900
btrfs_ioctl_balance+0x3a7/0x460
btrfs_ioctl+0x24c8/0x4360
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This occurred because we freed our backref node in
btrfs_backref_error_cleanup(), but then tried to free it again in
btrfs_backref_release_cache(). This is because
btrfs_backref_release_cache() will cycle through all of the
cache->leaves nodes and free them up. However
btrfs_backref_error_cleanup() freed the backref node with
btrfs_backref_free_node(), which simply kfree()d the backref node
without unlinking it from the cache. Change this to a
btrfs_backref_drop_node(), which does the appropriate cleanup and
removes the node from the cache->leaves list, so when we go to free the
remaining cache we don't trip over items we've already dropped.
Fixes: 75bfb9aff4 ("Btrfs: cleanup error handling in build_backref_tree")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This was partially fixed by f3e3d9cc35 ("btrfs: avoid possible signal
interruption of btrfs_drop_snapshot() on relocation tree"), however it
missed a spot when we restart a trans handle because we need to end the
transaction. The fix is the same, simply use btrfs_join_transaction()
instead of btrfs_start_transaction() when deleting reloc roots.
Fixes: f3e3d9cc35 ("btrfs: avoid possible signal interruption of btrfs_drop_snapshot() on relocation tree")
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
A test with the command below gives this error:
/arch/arm64/boot/dts/rockchip/rk3399-evb.dt.yaml: video-codec@ff660000:
'interrupt-names' does not match any of the regexes: 'pinctrl-[0-9]+'
The rkvdec driver gets it irq with help of the platform_get_irq()
function, so remove the interrupt-names property from the rk3399
vdec node.
make ARCH=arm64 dtbs_check
DT_SCHEMA_FILES=Documentation/devicetree/bindings/
media/rockchip,vdec.yaml
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/20210117181653.24886-1-jbx6244@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
When the capacity of the disc is too large (assuming the 4.7G
specification), the disc (UDF file system) will be burned
multiple times in the windows (Multisession Usage). When the
remaining capacity of the CD is less than 300M (estimated
value, for reference only), open the CD in the Linux system,
the content of the CD is displayed as blank (the kernel will
say "No VRS found"). Windows can display the contents of the
CD normally.
Through analysis, in the "fs/udf/super.c": udf_check_vsd
function, the actual value of VSD_MAX_SECTOR_OFFSET may
be much larger than 0x800000. According to the current code
logic, it is found that the type of sbi->s_session is "__s32",
when the remaining capacity of the disc is less than 300M
(take a set of test values: sector=3154903040,
sbi->s_session=1540464, sb->s_blocksize_bits=11 ), the
calculation result of "sbi->s_session << sb->s_blocksize_bits"
will overflow. Therefore, it is necessary to convert the
type of s_session to "loff_t" (when udf_check_vsd starts,
assign a value to _sector, which is also converted in this
way), so that the result will not overflow, and then the
content of the disc can be displayed normally.
Link: https://lore.kernel.org/r/20210114075741.30448-1-changlianzhi@uniontech.com
Signed-off-by: lianzhi chang <changlianzhi@uniontech.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Since we allow removing the timeline map at runtime, there is a risk
that rq->hwsp points into a stale page. To control that risk, we hold
the RCU read lock while reading *rq->hwsp, but we missed a couple of
important barriers. First, the unpinning / removal of the timeline map
must be after all RCU readers into that map are complete, i.e. after an
rcu barrier (in this case courtesy of call_rcu()). Secondly, we must
make sure that the rq->hwsp we are about to dereference under the RCU
lock is valid. In this case, we make the rq->hwsp pointer safe during
i915_request_retire() and so we know that rq->hwsp may become invalid
only after the request has been signaled. Therefore is the request is
not yet signaled when we acquire rq->hwsp under the RCU, we know that
rq->hwsp will remain valid for the duration of the RCU read lock.
This is a very small window that may lead to either considering the
request not completed (causing a delay until the request is checked
again, any wait for the request is not affected) or dereferencing an
invalid pointer.
Fixes: 3adac4689f ("drm/i915: Introduce concept of per-timeline (context) HWSP")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.1+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201218122421.18344-1-chris@chris-wilson.co.uk
(cherry picked from commit 9bb36cf660)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210118101755.476744-1-chris@chris-wilson.co.uk
User-space ALSA matches a card's driver name against an internal list of
aliases in order to select the correct configuration for the system.
When the driver name isn't defined, the match is performed against the
card's name.
With the introduction of RPi4 we now have two HDMI ports with two
distinct audio cards. This is reflected in their names, making them
different from previous RPi versions. With this, ALSA ultimately misses
the board's configuration on RPi4.
In order to avoid this, set "card->driver_name" to "vc4-hdmi"
unanimously.
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Fixes: f437bc1ec7 ("drm/vc4: drv: Support BCM2711")
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20210115191209.12852-1-nsaenzjulienne@suse.de
Some platforms, such as mips64, don't map __u64 to long long unsigned
int so using %llu produces a warning:
gpio-watch.c: In function ‘main’:
gpio-watch.c:89:30: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 4 has type ‘__u64’ {aka ‘long unsigned int’} [-Wformat=]
89 | printf("line %u: %s at %llu\n",
| ~~~^
| |
| long long unsigned int
| %lu
90 | chg.info.offset, event, chg.timestamp_ns);
| ~~~~~~~~~~~~~~~~
| |
| __u64 {aka long unsigned int}
Replace the %llu with PRIu64 and cast the argument to uint64_t.
Fixes: 33f0c47b8f ("tools: gpio: implement gpio-watch")
Signed-off-by: Kent Gibson <warthog618@gmail.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Some platforms, such as mips64, don't map __u64 to long long unsigned
int so using %llu produces a warning:
gpio-event-mon.c:110:37: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘__u64’ {aka ‘long unsigned int’} [-Wformat=]
110 | fprintf(stdout, "GPIO EVENT at %llu on line %d (%d|%d) ",
| ~~~^
| |
| long long unsigned int
| %lu
111 | event.timestamp_ns, event.offset, event.line_seqno,
| ~~~~~~~~~~~~~~~~~~
| |
| __u64 {aka long unsigned int}
Replace the %llu with PRIu64 and cast the argument to uint64_t.
Fixes: 03fd11b033 ("tools/gpio/gpio-event-mon: fix warning")
Signed-off-by: Kent Gibson <warthog618@gmail.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
For addressing the regression on Pioneer devices, we recently
corrected the quirk code to enable the implicit feedback mode on those
devices properly. However, the devices still showed problems with the
full duplex operations with JACK, and after debug sessions, we figured
out that the older kernels that had worked with JACK also didn't use
the implicit feedback mode at all although they had the quirk code to
enable it; instead, the old code worked just to skip the normal sync
endpoint setup that would have been detected without it. IOW, what
broke without the implicit-fb quirk in the past was the application of
the normal sync endpoint that is actually the capture data endpoint on
these devices.
This patch covers the overseen piece: it modifies the quirk code again
not to enable the implicit feedback mode but just to make the driver
skipping the sync endpoint detection. This made the driver working
with JACK full-duplex mode again.
Still it's not quite clear why the implicit feedback doesn't work on
those devices yet; maybe it's about some issues in the URB setup. But
at least, with this patch, the driver should work in the level of the
older kernels again.
Fixes: 167c9dc84e ("ALSA: usb-audio: Fix implicit feedback sync setup for Pioneer devices")
Link: https://lore.kernel.org/r/20210118075816.25068-4-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The UAC2/3 sample rate setup is based on the clock node, which is
usually shared in the interface, and can't be re-setup without
deselecting the interface once, and that's how the current code
behaves. OTOH, the sample rate setup of UAC1 is per endpoint, hence
we basically need to call for each endpoint usage even if those share
the same interface.
This patch fixes the behavior of UAC1 to call always
snd_usb_init_sample_rate() in snd_usb_endpoint_configure().
Fixes: bf6313a0ff ("ALSA: usb-audio: Refactor endpoint management")
Link: https://lore.kernel.org/r/20210118075816.25068-3-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The current sample rate setup function for UAC1 assumes only the first
endpoint retrieved from the interface:altset pair, but the rate set up
may be needed also for the secondary endpoint. Also, retrieving the
endpoint number from the interface descriptor is redundant; we have
already the target endpoint in the given audioformat object.
This patch simplifies the code and corrects the target endpoint as
described in the above. It simply refers to fmt->endpoint directly.
Also, this patch drops the pioneer_djm_set_format_quirk() that is
caleld from snd_usb_set_format_quirk(); this function does the sample
rate setup but for the capture endpoint (0x82), and that's exactly
what the change above fixes.
Link: https://lore.kernel.org/r/20210118075816.25068-2-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
When crtc state need_modeset is true it is not necessary
it is going to be a real modeset, it can turns to be a
fastset instead of modeset.
This turns content protection property to be DESIRED and hdcp
update_pipe left with property to be in DESIRED state but
actual hdcp->value was ENABLED.
This issue is caught with DP MST setup, where we have multiple
connector in same DP_MST topology. When disabling HDCP on one of
DP MST connector leads to set the crtc state need_modeset to true
for all other crtc driving the other DP-MST topology connectors.
This turns up other DP MST connectors CP property to be DESIRED
despite the actual hdcp->value is ENABLED.
Above scenario fails the DP MST HDCP IGT test, disabling HDCP on
one MST stream should not cause to disable HDCP on another MST
stream on same DP MST topology.
v2:
- Fixed connector->base.registration_state == DRM_CONNECTOR_REGISTERED
WARN_ON.
v3:
- Commit log improvement. [Uma]
- Added a comment before scheduling prop_work. [Uma]
Fixes: 33f9a623bf ("drm/i915/hdcp: Update CP as per the kernel internal state")
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
Tested-by: Karthik B S <karthik.b.s@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210111081120.28417-2-anshuman.gupta@intel.com
(cherry picked from commit d276e16702)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
On i.MX8MP, The GPIO3's secondary gpio-ranges's 'gpio controller offset'
cell value should be 26, so correct it.
Signed-off-by: Jacky Bai <ping.bai@nxp.com>
Fixes: 6d9b8d2043 ("arm64: dts: freescale: Add i.MX8MP dtsi support")
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The PHY address bit 2 is configured by the LED pin. Attaching a LED
to this pin is not sufficient to guarantee this configuration pin is
correctly read. This leads to some platforms having their PHY at
address 0 and others at address 4.
If there is no phy-handle specified, the FEC driver will scan the PHY
bus for a PHY and use that. Consequently, adding the DT configuration
of the PHY and the phy properties to the FEC driver broke some boards.
Fix this by removing the phy-handle property, and listing two PHY
entries for both possible PHY addresses, so that the DT configuration
for the PHY can be found by the PHY driver.
Fixes: 86b08bd5b9 ("ARM: dts: imx6-sr-som: add ethernet PHY configuration")
Reported-by: Christoph Mattheis <christoph.mattheis@arcor.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
When the kernel is configured to use the Thumb-2 instruction set
"suspend-to-memory" fails to resume. Observed on a Colibri iMX6ULL
(i.MX 6ULL) and Apalis iMX6 (i.MX 6Q).
It looks like the CPU resumes unconditionally in ARM instruction mode
and then chokes on the presented Thumb-2 code it should execute.
Fix this by using the arm instruction set for all code in
suspend-imx6.S.
Signed-off-by: Max Krummenacher <max.krummenacher@toradex.com>
Fixes: df595746fa ("ARM: imx: add suspend in ocram support for i.mx6q")
Acked-by: Oleksandr Suvorov <oleksandr.suvorov@toradex.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix 'CPU too large' error in Intel PT
- Correct event attribute sizes in 'perf inject'
- Sync build_bug.h and kvm.h kernel copies
- Fix bpf.h header include directive in 5sec.c 'perf trace' bpf example
- libbpf tests fixes
- Fix shadow stat 'perf test' for non-bash shells
- Take cgroups into account for shadow stats in 'perf stat'
* tag 'perf-tools-fixes-2021-01-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf inject: Correct event attribute sizes
perf intel-pt: Fix 'CPU too large' error
perf stat: Take cgroups into account for shadow stats
perf stat: Introduce struct runtime_stat_data
libperf tests: Fail when failing to get a tracepoint id
libperf tests: If a test fails return non-zero
libperf tests: Avoid uninitialized variable warning
perf test: Fix shadow stat test for non-bash shells
tools headers: Syncronize linux/build_bug.h with the kernel sources
tools headers UAPI: Sync kvm.h headers with the kernel sources
perf bpf examples: Fix bpf.h header include directive in 5sec.c example
Pull powerpc fixes from Michael Ellerman:
"One fix for a lack of alignment in our linker script, that can lead to
crashes depending on configuration etc.
One fix for the 32-bit VDSO after the C VDSO conversion.
Thanks to Andreas Schwab, Ariel Marcovitch, and Christophe Leroy"
* tag 'powerpc-5.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/vdso: Fix clock_gettime_fallback for vdso32
powerpc: Fix alignment bug within the init sections
Pull misc vfs fixes from Al Viro:
"Several assorted fixes.
I still think that audit ->d_name race is better fixed this way for
the benefit of backports, with any possibly fancier variants done on
top of it"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
dump_common_audit_data(): fix racy accesses to ->d_name
iov_iter: fix the uaccess area in copy_compat_iovec_from_user
umount(2): move the flag validity checks first
So technically there is nothing wrong with adding a pinned page to the
swap cache, but the pinning obviously means that the page can't actually
be free'd right now anyway, so it's a bit pointless.
However, the real problem is not with it being a bit pointless: the real
issue is that after we've added it to the swap cache, we'll try to unmap
the page. That will succeed, because the code in mm/rmap.c doesn't know
or care about pinned pages.
Even the unmapping isn't fatal per se, since the page will stay around
in memory due to the pinning, and we do hold the connection to it using
the swap cache. But when we then touch it next and take a page fault,
the logic in do_swap_page() will map it back into the process as a
possibly read-only page, and we'll then break the page association on
the next COW fault.
Honestly, this issue could have been fixed in any of those other places:
(a) we could refuse to unmap a pinned page (which makes conceptual
sense), or (b) we could make sure to re-map a pinned page writably in
do_swap_page(), or (c) we could just make do_wp_page() not COW the
pinned page (which was what we historically did before that "mm:
do_wp_page() simplification" commit).
But while all of them are equally valid models for breaking this chain,
not putting pinned pages into the swap cache in the first place is the
simplest one by far.
It's also the safest one: the reason why do_wp_page() was changed in the
first place was that getting the "can I re-use this page" wrong is so
fraught with errors. If you do it wrong, you end up with an incorrectly
shared page.
As a result, using "page_maybe_dma_pinned()" in either do_wp_page() or
do_swap_page() would be a serious bug since it is only a (very good)
heuristic. Re-using the page requires a hard black-and-white rule with
no room for ambiguity.
In contrast, saying "this page is very likely dma pinned, so let's not
add it to the swap cache and try to unmap it" is an obviously safe thing
to do, and if the heuristic might very rarely be a false positive, no
harm is done.
Fixes: 09854ba94c ("mm: do_wp_page() simplification")
Reported-and-tested-by: Martin Raiber <martin@urbackup.org>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With commit 4df4cb9e99, the Hyper-V direct-mode STIMER is actually
initialized before LAPIC is initialized: see
apic_intr_mode_init()
x86_platform.apic_post_init()
hyperv_init()
hv_stimer_alloc()
apic_bsp_setup()
setup_local_APIC()
setup_local_APIC() temporarily disables LAPIC, initializes it and
re-eanble it. The direct-mode STIMER depends on LAPIC, and when it's
registered, it can be programmed immediately and the timer can fire
very soon:
hv_stimer_init
clockevents_config_and_register
clockevents_register_device
tick_check_new_device
tick_setup_device
tick_setup_periodic(), tick_setup_oneshot()
clockevents_program_event
When the timer fires in the hypervisor, if the LAPIC is in the
disabled state, new versions of Hyper-V ignore the event and don't inject
the timer interrupt into the VM, and hence the VM hangs when it boots.
Note: when the VM starts/reboots, the LAPIC is pre-enabled by the
firmware, so the window of LAPIC being temporarily disabled is pretty
small, and the issue can only happen once out of 100~200 reboots for
a 40-vCPU VM on one dev host, and on another host the issue doesn't
reproduce after 2000 reboots.
The issue is more noticeable for kdump/kexec, because the LAPIC is
disabled by the first kernel, and stays disabled until the kdump/kexec
kernel enables it. This is especially an issue to a Generation-2 VM
(for which Hyper-V doesn't emulate the PIT timer) when CONFIG_HZ=1000
(rather than CONFIG_HZ=250) is used.
Fix the issue by moving hv_stimer_alloc() to a later place where the
LAPIC timer is initialized.
Fixes: 4df4cb9e99 ("x86/hyperv: Initialize clockevents earlier in CPU onlining")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20210116223136.13892-1-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The change of ia64's default memory model to SPARSEMEM causes defconfig
build to fail:
CC kernel/async.o
In file included from include/linux/numa.h:25,
from include/linux/async.h:13,
from kernel/async.c:47:
arch/ia64/include/asm/sparsemem.h:14:40: warning: "PAGE_SHIFT" is not defined, evaluates to 0 [-Wundef]
14 | #if ((CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS)
| ^~~~~~~~~~
In file included from include/linux/gfp.h:6,
from include/linux/xarray.h:14,
from include/linux/radix-tree.h:19,
from include/linux/idr.h:15,
from include/linux/kernfs.h:13,
from include/linux/sysfs.h:16,
from include/linux/kobject.h:20,
from include/linux/energy_model.h:7,
from include/linux/device.h:16,
from include/linux/async.h:14,
from kernel/async.c:47:
include/linux/mmzone.h:1156:2: error: #error Allocator MAX_ORDER exceeds SECTION_SIZE
1156 | #error Allocator MAX_ORDER exceeds SECTION_SIZE
| ^~~~~
The error cause is the missing definition of PAGE_SHIFT in the calculation
of SECTION_SIZE_BITS.
Add include of <asm/page.h> to arch/ia64/include/asm/sparsemem.h to solve
the problem.
Fixes: 214496cb18 ("ia64: make SPARSEMEM default and disable DISCONTIGMEM")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
I2C_SMBUS_BLOCK_MAX defines already the maximum number as defined in the
SMBus 2.0 specs. No reason to add one to it.
Fixes: 886f6f8337 ("i2c: octeon: Support I2C_M_RECV_LEN")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Robert Richter <rric@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
VI I2C controller has known hardware bug where immediate multiple
writes to TX_FIFO register gets stuck.
Recommended software work around is to read I2C register after
each write to TX_FIFO register to flush out the data.
This patch implements this work around for VI I2C controller.
Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
In order to not to start returning errors when new I2C_M flags are
added, change behavior to just ignore all flags that we don't know
about. This includes the I2C_M_DMA_SAFE flag that already exists but
causes -EINVAL to be returned for valid transactions.
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Nikhil reported the misc interrupt handler can sometimes miss handling
the command interrupt when an error interrupt happens near the same time.
Have the irq handling thread continue to process the misc interrupts until
all interrupts are processed. This is a low usage interrupt and is not
expected to handle high volume traffic. Therefore there is no concern of
this thread running for a long time.
Fixes: 0d5c10b4c8 ("dmaengine: idxd: add work queue drain support")
Reported-by: Nikhil Rao <nikhil.rao@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/161074755329.2183844.13295528344116907983.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
If there are no requests at the time __io_uring_task_cancel() is called,
tctx_inflight() returns zero and and it terminates not getting a chance
to go through __io_uring_files_cancel() and do
io_disable_sqo_submit(). And we absolutely want them disabled by the
time cancellation ends.
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Fixes: d9d05217cb ("io_uring: stop SQPOLL submit on creator's death")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 3226b158e6 ("net: avoid 32 x truesize under-estimation for
tiny skbs") ensured that skbs with data size lower than 1025 bytes
will be kmalloc'ed to avoid excessive page cache fragmentation and
memory consumption.
However, the fix adressed only __napi_alloc_skb() (primarily for
virtio_net and napi_get_frags()), but the issue can still be achieved
through __netdev_alloc_skb(), which is still used by several drivers.
Drivers often allocate a tiny skb for headers and place the rest of
the frame to frags (so-called copybreak).
Mirror the condition to __netdev_alloc_skb() to handle this case too.
Since v1 [0]:
- fix "Fixes:" tag;
- refine commit message (mention copybreak usecase).
[0] https://lore.kernel.org/netdev/20210114235423.232737-1-alobakin@pm.me
Fixes: a1c7fff7e1 ("net: netdev_alloc_skb() use build_skb()")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Link: https://lore.kernel.org/r/20210115150354.85967-1-alobakin@pm.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull SCSI fixes from James Bottomley:
"Nine minor fixes, seven in drivers and two in the core SCSI disk
driver (sd) which should be harmless involving removing an unused
variable and quietening a spurious warning"
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: Remove obsolete variable in sd_remove()
scsi: sd: Suppress spurious errors when WRITE SAME is being disabled
scsi: scsi_debug: Fix memleak in scsi_debug_init()
scsi: mpt3sas: Fix spelling mistake in Kconfig "compatiblity" -> "compatibility"
scsi: qedi: Correct max length of CHAP secret
scsi: ufs: Correct the LUN used in eh_device_reset_handler() callback
scsi: ufs: Relocate flush of exceptional event
scsi: ufs: Relax the condition of UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL
scsi: ufs: Fix possible power drain during system suspend
We are not guaranteed the locking environment that would prevent
dentry getting renamed right under us. And it's possible for
old long name to be freed after rename, leading to UAF here.
Cc: stable@kernel.org # v2.6.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull block fixes from Jens Axboe:
"Just an nvme pull request via Christoph:
- don't initialize hwmon for discover controllers (Sagi Grimberg)
- fix iov_iter handling in nvme-tcp (Sagi Grimberg)
- fix a preempt warning in nvme-tcp (Sagi Grimberg)
- fix a possible NULL pointer dereference in nvme (Israel Rukshin)"
* tag 'block-5.11-2021-01-16' of git://git.kernel.dk/linux-block:
nvme: don't intialize hwmon for discovery controllers
nvme-tcp: fix possible data corruption with bio merges
nvme-tcp: Fix warning with CONFIG_DEBUG_PREEMPT
nvmet-rdma: Fix NULL deref when setting pi_enable and traddr INADDR_ANY
WARNING: CPU: 1 PID: 11100 at fs/io_uring.c:9096
io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096
RIP: 0010:io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096
Call Trace:
filp_close+0xb4/0x170 fs/open.c:1280
close_files fs/file.c:401 [inline]
put_files_struct fs/file.c:416 [inline]
put_files_struct+0x1cc/0x350 fs/file.c:413
exit_files+0x7e/0xa0 fs/file.c:433
do_exit+0xc22/0x2ae0 kernel/exit.c:820
do_group_exit+0x125/0x310 kernel/exit.c:922
get_signal+0x3e9/0x20a0 kernel/signal.c:2770
arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811
handle_signal_work kernel/entry/common.c:147 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201
__syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
entry_SYSCALL_64_after_hwframe+0x44/0xa9
An SQPOLL ring creator task may have gotten rid of its file note during
exit and called io_disable_sqo_submit(), but the io_uring is still left
referenced through fdtable, which will be put during close_files() and
cause a false positive warning.
First split the warning into two for more clarity when is hit, and the
add sqo_dead check to handle the described case.
Reported-by: syzbot+a32b546d58dde07875a1@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
WARNING: CPU: 1 PID: 9094 at fs/io_uring.c:8884
io_disable_sqo_submit+0x106/0x130 fs/io_uring.c:8884
Call Trace:
io_uring_flush+0x28b/0x3a0 fs/io_uring.c:9099
filp_close+0xb4/0x170 fs/open.c:1280
close_fd+0x5c/0x80 fs/file.c:626
__do_sys_close fs/open.c:1299 [inline]
__se_sys_close fs/open.c:1297 [inline]
__x64_sys_close+0x2f/0xa0 fs/open.c:1297
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
io_uring's final close() may be triggered by any task not only the
creator. It's well handled by io_uring_flush() including SQPOLL case,
though a warning in io_disable_sqo_submit() will fallaciously fire by
moving this warning out to the only call site that matters.
Reported-by: syzbot+2f5d1785dc624932da78@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we're freeing/finishing iopoll requests, ensure we check if the task
is in idling in terms of cancelation. Otherwise we could end up waiting
forever in __io_uring_task_cancel() if the task has active iopoll
requests that need cancelation.
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull io_uring fixes from Jens Axboe:
"We still have a pending fix for a cancelation issue, but it's still
being investigated. In the meantime:
- Dead mm handling fix (Pavel)
- SQPOLL setup error handling (Pavel)
- Flush timeout sequence fix (Marcelo)
- Missing finish_wait() for one exit case"
* tag 'io_uring-5.11-2021-01-16' of git://git.kernel.dk/linux-block:
io_uring: ensure finish_wait() is always called in __io_uring_task_cancel()
io_uring: flush timeouts that should already have expired
io_uring: do sqo disable on install_fd error
io_uring: fix null-deref in io_disable_sqo_submit
io_uring: don't take files/mm for a dead task
io_uring: drop mm and files after task_work_run
Pull RISC-V fixes from Palmer Dabbelt:
"There are a few more fixes than a normal rc4, largely due to the
bubble introduced by the holiday break:
- return -ENOSYS for syscall number -1, which previously returned an
uninitialized value.
- ensure of_clk_init() has been called in time_init(), without which
clock drivers may not be initialized.
- fix sifive,uart0 driver to properly display the baud rate. A fix to
initialize MPIE that allows interrupts to be processed during
system calls.
- avoid erronously begin tracing IRQs when interrupts are disabled,
which at least triggers suprious lockdep failures.
- workaround for a warning related to calling smp_processor_id()
while preemptible. The warning itself is suprious on currently
availiable systems.
- properly include the generic time VDSO calls. A fix to our kasan
address mapping. A fix to the HiFive Unleashed device tree, which
allows the Ethernet PHY to be properly initialized by Linux (as
opposed to relying on the bootloader).
- defconfig update to include SiFive's GPIO driver, which is present
on the HiFive Unleashed and necessary to initialize the PHY.
- avoid allocating memory while initializing reserved memory.
- avoid allocating the last 4K of memory, as pointers there alias
with syscall errors.
There are also two cleanups that should have no functional effect but
do fix build warnings:
- drop a duplicated definition of PAGE_KERNEL_EXEC.
- properly declare the asm register SP shim.
- cleanup the rv32 memory size Kconfig entry, to reflect the actual
size of memory availiable"
* tag 'riscv-for-linus-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Fix maximum allowed phsyical memory for RV32
RISC-V: Set current memblock limit
RISC-V: Do not allocate memblock while iterating reserved memblocks
riscv: stacktrace: Move register keyword to beginning of declaration
riscv: defconfig: enable gpio support for HiFive Unleashed
dts: phy: add GPIO number and active state used for phy reset
dts: phy: fix missing mdio device and probe failure of vsc8541-01 device
riscv: Fix KASAN memory mapping.
riscv: Fixup CONFIG_GENERIC_TIME_VSYSCALL
riscv: cacheinfo: Fix using smp_processor_id() in preemptible
riscv: Trace irq on only interrupt is enabled
riscv: Drop a duplicated PAGE_KERNEL_EXEC
riscv: Enable interrupts during syscalls with M-Mode
riscv: Fix sifive serial driver
riscv: Fix kernel time_init()
riscv: return -ENOSYS for syscall -1
If the set definition provides no stateful expressions, then include the
stateful expression in the ruleset listing. Without this fix, the dynset
rule listing shows the stateful expressions provided by the set
definition.
Fixes: 65038428b2 ("netfilter: nf_tables: allow to specify stateful expression in set definition")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Turning a pinned page read-only breaks the pinning after COW. Don't do it.
The whole "track page soft dirty" state doesn't work with pinned pages
anyway, since the page might be dirtied by the pinning entity without
ever being noticed in the page tables.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Turning page table entries read-only requires the mmap_sem held for
writing.
So stop doing the odd games with turning things from read locks to write
locks and back. Just get the write lock.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Otherwise, the newly create element shows no timeout when listing the
ruleset. If the set definition does not specify a default timeout, then
the set element only shows the expiration time, but not the timeout.
This is a problem when restoring a stateful ruleset listing since it
skips the timeout policy entirely.
Fixes: 22fe54d5fe ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If the set definition contains stateful expressions, allocate them for
the newly added entries from the packet path.
Fixes: 65038428b2 ("netfilter: nf_tables: allow to specify stateful expression in set definition")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linux kernel can only map 1GB of address space for RV32 as the page offset
is set to 0xC0000000. The current description in the Kconfig is confusing
as it indicates that RV32 can support 2GB of physical memory. That is
simply not true for current kernel. In future, a 2GB split support can be
added to allow 2GB physical address space.
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Currently, linux kernel can not use last 4k bytes of addressable space
because IS_ERR_VALUE macro treats those as an error. This will be an issue
for RV32 as any memblock allocator potentially allocate chunk of memory
from the end of DRAM (2GB) leading bad address error even though the
address was technically valid.
Fix this issue by limiting the memblock if available memory spans the
entire address space.
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
sizeof needs to be called on the compat pointer, not the native one.
Fixes: 89cd35c58b ("iov_iter: transparently handle compat iovecs in import_iovec")
Reported-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull device mapper fixes from Mike Snitzer:
- Fix DM-raid's raid1 discard limits so discards work.
- Select missing Kconfig dependencies for DM integrity and zoned
targets.
- Four fixes for DM crypt target's support to optionally bypass kcryptd
workqueues.
- Fix DM snapshot merge supports missing data flushes before committing
metadata.
- Fix DM integrity data device flushing when external metadata is used.
- Fix DM integrity's maximum number of supported constructor arguments
that user can request when creating an integrity device.
- Eliminate DM core ioctl logging noise when an ioctl is issued without
required CAP_SYS_RAWIO permission.
* tag 'for-5.11/dm-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm crypt: defer decryption to a tasklet if interrupts disabled
dm integrity: fix the maximum number of arguments
dm crypt: do not call bio_endio() from the dm-crypt tasklet
dm integrity: fix flush with external metadata device
dm: eliminate potential source of excessive kernel log noise
dm snapshot: flush merged data before committing metadata
dm crypt: use GFP_ATOMIC when allocating crypto requests from softirq
dm crypt: do not wait for backlogged crypto request completion in softirq
dm zoned: select CONFIG_CRC32
dm integrity: select CRYPTO_SKCIPHER
dm raid: fix discard limits for raid1
LinuxSourceTree will unceremoniously crash if the user doesn't call
read_kunitconfig() first in a number of functions.
And currently every place we create an instance, the caller also calls
create_kunitconfig() and read_kunitconfig().
Move these instead into __init__() so they can't be forgotten and to
reduce copy-paste.
The https://github.com/google/pytype type-checker complained that
_config wasn't initialized. With this, kunit_tool now type checks
under both pytype and mypy.
Add an optional boolean that can be used to disable this for use cases
in the future where we might not need/want to load the config.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The code to handle aggregating statuses didn't check that the status
actually got set to some non-None value.
Default the value to SUCCESS instead of adding a bunch of `is None`
checks.
This sorta follows the precedent in commit 3fc48259d5 ("kunit: Don't
fail test suites if one of them is empty").
Also slightly simplify the code and add type annotations.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The authors of this tool were more familiar with a different
type-checker, https://github.com/google/pytype.
That's open source, but mypy seems more prevalent (and runs faster).
And unlike pytype, mypy doesn't try to infer types so it doesn't check
unanotated functions.
So annotate ~all functions in kunit tool to increase type-checking
coverage.
Note: per https://www.python.org/dev/peps/pep-0484/, `__init__()` should
be annotated as `-> None`.
Doing so makes mypy discover a number of new violations.
Exclude main() since we reuse `request` for the different types of
requests, which mypy isn't happy about.
This commit fixes all but one error, where `TestSuite.status` might be
None.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Acked-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Commit fadb08e7c7 ("kunit: Support for Parameterized Testing")
introduced support but lacks documentation for how to use it.
This patch builds on commit 1f0e943df6 ("Documentation: kunit: provide
guidance for testing many inputs") to show a minimal example of the new
feature.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Acked-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Daniel Borkmann says:
====================
pull-request: bpf 2021-01-16
1) Fix a double bpf_prog_put() for BPF_PROG_{TYPE_EXT,TYPE_TRACING} types in
link creation's error path causing a refcount underflow, from Jiri Olsa.
2) Fix BTF validation errors for the case where kernel modules don't declare
any new types and end up with an empty BTF, from Andrii Nakryiko.
3) Fix BPF local storage helpers to first check their {task,inode} owners for
being NULL before access, from KP Singh.
4) Fix a memory leak in BPF setsockopt handling for the case where optlen is
zero and thus temporary optval buffer should be freed, from Stanislav Fomichev.
5) Fix a syzbot memory allocation splat in BPF_PROG_TEST_RUN infra for
raw_tracepoint caused by too big ctx_size_in, from Song Liu.
6) Fix LLVM code generation issues with verifier where PTR_TO_MEM{,_OR_NULL}
registers were spilled to stack but not recognized, from Gilad Reti.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
MAINTAINERS: Update my email address
selftests/bpf: Add verifier test for PTR_TO_MEM spill
bpf: Support PTR_TO_MEM{,_OR_NULL} register spilling
bpf: Reject too big ctx_size_in for raw_tp test run
libbpf: Allow loading empty BTFs
bpf: Allow empty module BTFs
bpf: Don't leak memory in bpf getsockopt when optlen == 0
bpf: Update local storage test to check handling of null ptrs
bpf: Fix typo in bpf_inode_storage.c
bpf: Local storage helpers should check nullness of owner ptr passed
bpf: Prevent double bpf_prog_put call from bpf_tracing_prog_attach
====================
Link: https://lore.kernel.org/r/20210116002025.15706-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge misc fixes from Andrew Morton:
"10 patches.
Subsystems affected by this patch series: MAINTAINERS and mm (slub,
pagealloc, memcg, kasan, vmalloc, migration, hugetlb, memory-failure,
and process_vm_access)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/process_vm_access.c: include compat.h
mm,hwpoison: fix printing of page flags
MAINTAINERS: add Vlastimil as slab allocators maintainer
mm/hugetlb: fix potential missing huge page size info
mm: migrate: initialize err in do_migrate_pages
mm/vmalloc.c: fix potential memory leak
arm/kasan: fix the array size of kasan_early_shadow_pte[]
mm/memcontrol: fix warning in mem_cgroup_page_lruvec()
mm/page_alloc: add a missing mm_page_alloc_zone_locked() tracepoint
mm, slub: consider rest of partial list if acquire_slab() fails
Pull rdma fixes from Jason Gunthorpe:
"A fairly modest set of bug fixes, nothing abnormal from the merge
window
The ucma patch is a bit on the larger side, but given the regression
was recently added I've opted to forward it to the rc stream.
- Fix a ucma memory leak introduced in v5.9 while fixing the
Syzkaller bugs
- Don't fail when the xarray wraps for user verbs objects
- User triggerable oops regression from the umem page size rework
- Error unwind bugs in usnic, ocrdma, mlx5 and cma"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/cma: Fix error flow in default_roce_mode_store
RDMA/mlx5: Fix wrong free of blue flame register on error
IB/mlx5: Fix error unwinding when set_has_smi_cap fails
RDMA/umem: Avoid undefined behavior of rounddown_pow_of_two()
RDMA/ocrdma: Fix use after free in ocrdma_dealloc_ucontext_pd()
RDMA/usnic: Fix memleak in find_free_vf_and_create_qp_grp
RDMA/restrack: Don't treat as an error allocation ID wrapping
RDMA/ucma: Do not miss ctx destruction steps in some cases
If we enter with requests pending and performm cancelations, we'll have
a different inflight count before and after calling prepare_to_wait().
This causes the loop to restart. If we actually ended up canceling
everything, or everything completed in-between, then we'll break out
of the loop without calling finish_wait() on the waitqueue. This can
trigger a warning on exit_signals(), as we leave the task state in
TASK_UNINTERRUPTIBLE.
Put a finish_wait() after the loop to catch that case.
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull ext4 fixes from Ted Ts'o:
"A number of bug fixes for ext4:
- Fix for the new fast_commit feature
- Fix some error handling codepaths in whiteout handling and
mountpoint sampling
- Fix how we write ext4_error information so it goes through the
journal when journalling is active, to avoid races that can lead to
lost error information, superblock checksum failures, or DIF/DIX
features"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: remove expensive flush on fast commit
ext4: fix bug for rename with RENAME_WHITEOUT
ext4: fix wrong list_splice in ext4_fc_cleanup
ext4: use IS_ERR instead of IS_ERR_OR_NULL and set inode null when IS_ERR
ext4: don't leak old mountpoint samples
ext4: drop ext4_handle_dirty_super()
ext4: fix superblock checksum failure when setting password salt
ext4: use sbi instead of EXT4_SB(sb) in ext4_update_super()
ext4: save error info to sb through journal if available
ext4: protect superblock modifications with a buffer lock
ext4: drop sync argument of ext4_commit_super()
ext4: combine ext4_handle_error() and save_error_info()
Pull cifs fixes from Steve French:
"Two small cifs fixes for stable (including an important handle leak
fix) and three small cleanup patches"
* tag '5.11-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6:
cifs: style: replace one-element array with flexible-array
cifs: connect: style: Simplify bool comparison
fs: cifs: remove unneeded variable in smb3_fs_context_dup
cifs: fix interrupted close commands
cifs: check pointer before freeing
Pull arm64 fixes from Catalin Marinas:
- Set the minimum GCC version to 5.1 for arm64 due to earlier compiler
bugs.
- Make atomic helpers __always_inline to avoid a section mismatch when
compiling with clang.
- Fix the CMA and crashkernel reservations to use ZONE_DMA (remove the
arm64_dma32_phys_limit variable, no longer needed with a dynamic
ZONE_DMA sizing in 5.11).
- Remove redundant IRQ flag tracing that was leaving lockdep
inconsistent with the hardware state.
- Revert perf events based hard lockup detector that was causing
smp_processor_id() to be called in preemptible context.
- Some trivial cleanups - spelling fix, renaming S_FRAME_SIZE to
PT_REGS_SIZE, function prototypes added.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: selftests: Fix spelling of 'Mismatch'
arm64: syscall: include prototype for EL0 SVC functions
compiler.h: Raise minimum version of GCC to 5.1 for arm64
arm64: make atomic helpers __always_inline
arm64: rename S_FRAME_SIZE to PT_REGS_SIZE
Revert "arm64: Enable perf events based hard lockup detector"
arm64: entry: remove redundant IRQ flag tracing
arm64: Remove arm64_dma32_phys_limit and its uses
Pull MIPS fixes from Thomas Bogendoerfer:
- fix coredumps on 64bit kernels
- fix for alignment bugs preventing booting
- fix checking for failed irq_alloc_desc calls
* tag 'mips_fixes_5.11.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: OCTEON: fix unreachable code in octeon_irq_init_ciu
MIPS: relocatable: fix possible boot hangup with KASLR enabled
MIPS: Fix malformed NT_FILE and NT_SIGINFO in 32bit coredumps
MIPS: boot: Fix unaligned access with CONFIG_MIPS_RAW_APPENDED_DTB
In some cases, the number of cpus (nr_cpus_online) is confused with the
maximum cpu number (nr_cpus_avail), which results in the error in the
example below:
Example on system with 8 cpus:
Before:
# echo 0 > /sys/devices/system/cpu/cpu2/online
# ./perf record --kcore -e intel_pt// taskset --cpu-list 7 uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.147 MB perf.data ]
# ./perf script --itrace=e
Requested CPU 7 too large. Consider raising MAX_NR_CPUS
0x25908 [0x8]: failed to process type: 68 [Invalid argument]
After:
# ./perf script --itrace=e
#
Fixes: 8c7274691f ("perf machine: Replace MAX_NR_CPUS with perf_env::nr_cpus_online")
Fixes: 7df4e36a47 ("perf session: Replace MAX_NR_CPUS with perf_env::nr_cpus_online")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/20210107174159.24897-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As of now it doesn't consider cgroups when collecting shadow stats and
metrics so counter values from different cgroups will be saved in a same
slot. This resulted in incorrect numbers when those cgroups have
different workloads.
For example, let's look at the scenario below: cgroups A and C runs same
workload which burns a cpu while cgroup B runs a light workload.
$ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C sleep 1
Performance counter stats for 'system wide':
3,958,116,522 cycles A
6,722,650,929 instructions A # 2.53 insn per cycle
1,132,741 cycles B
571,743 instructions B # 0.00 insn per cycle
4,007,799,935 cycles C
6,793,181,523 instructions C # 2.56 insn per cycle
1.001050869 seconds time elapsed
When I run 'perf stat' with single workload, it usually shows IPC around
1.7. We can verify it (6,722,650,929.0 / 3,958,116,522 = 1.698) for cgroup A.
But in this case, since cgroups are ignored, cycles are averaged so it
used the lower value for IPC calculation and resulted in around 2.5.
avg cycle: (3958116522 + 1132741 + 4007799935) / 3 = 2655683066
IPC (A) : 6722650929 / 2655683066 = 2.531
IPC (B) : 571743 / 2655683066 = 0.0002
IPC (C) : 6793181523 / 2655683066 = 2.557
We can simply compare cgroup pointers in the evsel and it'll be NULL
when cgroups are not specified. With this patch, I can see correct
numbers like below:
$ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C sleep 1
Performance counter stats for 'system wide':
4,171,051,687 cycles A
7,219,793,922 instructions A # 1.73 insn per cycle
1,051,189 cycles B
583,102 instructions B # 0.55 insn per cycle
4,171,124,710 cycles C
7,192,944,580 instructions C # 1.72 insn per cycle
1.007909814 seconds time elapsed
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210115071139.257042-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When mounting a cgroup hierarchy with disabled controller in cgroup v1,
all available controllers will be attached.
For example, boot with cgroup_no_v1=cpu or cgroup_disable=cpu, and then
mount with "mount -t cgroup -ocpu cpu /sys/fs/cgroup/cpu", then all
enabled controllers will be attached except cpu.
Fix this by adding disabled controller check in cgroup1_parse_param().
If the specified controller is disabled, just return error with information
"Disabled controller xx" rather than attaching all the other enabled
controllers.
Fixes: f5dfb5315d ("cgroup: take options parsing into ->parse_monolithic()")
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Reviewed-by: Zefan Li <lizefan.x@bytedance.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
We got a "deleted inode referenced" warning cross our fsstress test. The
bug can be reproduced easily with following steps:
cd /dev/shm
mkdir test/
fallocate -l 128M img
mkfs.ext4 -b 1024 img
mount img test/
dd if=/dev/zero of=test/foo bs=1M count=128
mkdir test/dir/ && cd test/dir/
for ((i=0;i<1000;i++)); do touch file$i; done # consume all block
cd ~ && renameat2(AT_FDCWD, /dev/shm/test/dir/file1, AT_FDCWD,
/dev/shm/test/dir/dst_file, RENAME_WHITEOUT) # ext4_add_entry in
ext4_rename will return ENOSPC!!
cd /dev/shm/ && umount test/ && mount img test/ && ls -li test/dir/file1
We will get the output:
"ls: cannot access 'test/dir/file1': Structure needs cleaning"
and the dmesg show:
"EXT4-fs error (device loop0): ext4_lookup:1626: inode #2049: comm ls:
deleted inode referenced: 139"
ext4_rename will create a special inode for whiteout and use this 'ino'
to replace the source file's dir entry 'ino'. Once error happens
latter(the error above was the ENOSPC return from ext4_add_entry in
ext4_rename since all space has been consumed), the cleanup do drop the
nlink for whiteout, but forget to restore 'ino' with source file. This
will trigger the bug describle as above.
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Fixes: cd808deced ("ext4: support RENAME_WHITEOUT")
Link: https://lore.kernel.org/r/20210105062857.3566-1-yangerkun@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
It was using some bash-specific features and failed to parse when
running with a different shell like below:
root@kbl-ppc:~/kbl-ws/perf-dev/lck-9077/acme.tmp/tools/perf# ./perf test 83 -vv
83: perf stat metrics (shadow stat) test :
--- start ---
test child forked, pid 3922
./tests/shell/stat+shadow_stat.sh: 19: ./tests/shell/stat+shadow_stat.sh: [[: not found
./tests/shell/stat+shadow_stat.sh: 24: ./tests/shell/stat+shadow_stat.sh: [[: not found
./tests/shell/stat+shadow_stat.sh: 30: ./tests/shell/stat+shadow_stat.sh: [[: not found
(standard_in) 2: syntax error
./tests/shell/stat+shadow_stat.sh: 36: ./tests/shell/stat+shadow_stat.sh: [[: not found
./tests/shell/stat+shadow_stat.sh: 19: ./tests/shell/stat+shadow_stat.sh: [[: not found
./tests/shell/stat+shadow_stat.sh: 24: ./tests/shell/stat+shadow_stat.sh: [[: not found
./tests/shell/stat+shadow_stat.sh: 30: ./tests/shell/stat+shadow_stat.sh: [[: not found
(standard_in) 2: syntax error
./tests/shell/stat+shadow_stat.sh: 36: ./tests/shell/stat+shadow_stat.sh: [[: not found
./tests/shell/stat+shadow_stat.sh: 45: ./tests/shell/stat+shadow_stat.sh: declare: not found
test child finished with -1
---- end ----
perf stat metrics (shadow stat) test: FAILED!
Reported-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210114050609.1258820-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes in:
3a176b9460 ("Revert "kbuild: avoid static_assert for genksyms"")
And silence this perf build warning:
Warning: Kernel ABI header at 'tools/include/linux/build_bug.h' differs from latest version at 'include/linux/build_bug.h'
diff -u tools/include/linux/build_bug.h include/linux/build_bug.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
647daca25d ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
That don't cause any tooling change, just silences this perf build
warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull ACPI fixes from Rafael Wysocki:
"These address a device ID bounds check error in the device enumeration
code and fix a mistake in the documentation.
Specifics:
- Harden the ACPI device enumeration code against device ID length
overflows to address a Linux VM cash on Hyper-V (Dexuan Cui).
- Fix a mistake in the documentation of error type values for PCIe
errors (Qiuxu Zhuo)"
* tag 'acpi-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Documentation: ACPI: EINJ: Fix error type values for PCIe errors
ACPI: scan: Harden acpi_device_add() against device ID overflows
Pull xen fixes from Juergen Gross:
- A series to fix a regression when running as a fully virtualized
guest on an old Xen hypervisor not supporting PV interrupt callbacks
for HVM guests.
- A patch to add support to query Xen resource sizes (setting was
possible already) from user mode.
* tag 'for-linus-5.11-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: Fix xen_hvm_smp_init() when vector callback not available
x86/xen: Don't register Xen IPIs when they aren't going to be used
x86/xen: Add xen_no_vector_callback option to test PCI INTX delivery
xen: Set platform PCI device INTX affinity to CPU0
xen: Fix event channel callback via INTX/GSI
xen/privcmd: allow fetching resource sizes
Pull iommu fixes from Will Deacon:
"Three IOMMU fixes for -rc4.
The main one is a change to the Intel IOMMU driver to fix the handling
of unaligned addresses when invalidating the TLB.
The fix itself is a bit ugly (the caller does a bunch of shifting
which is then effectively undone later in the callchain), but Lu has
patches to clean all of this up in 5.12.
Summary:
- Fix address alignment handling for VT-D TLB invalidation
- Enable workarounds for buggy Qualcomm firmware on two more SoCs
- Drop duplicate #include"
* tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
iommu/vt-d: Fix duplicate included linux/dma-map-ops.h
iommu: arm-smmu-qcom: Add sdm630/msm8998 compatibles for qcom quirks
iommu/vt-d: Fix unaligned addresses for intel_flush_svm_range_dev()
Pull drm nouveau ampere display support from Dave Airlie:
"Ben has requested if we can include Ampere modesetting support under
fixes, it's for new GPUs and shouldn't affect existing hardware.
It's a bit bigger than just adding a PCI ID, but It has no effect on
older GPUs"
* tag 'topic/nouveau-ampere-modeset-2021-01-15' of git://anongit.freedesktop.org/drm/drm:
drm/nouveau/disp/ga10[24]: initial support
drm/nouveau/dmaobj/ga10[24]: initial support
drm/nouveau/i2c/ga10[024]: initial support
drm/nouveau/gpio/ga10[024]: initial support
drm/nouveau/bar/ga10[024]: initial support
drm/nouveau/mmu/ga10[024]: initial support
drm/nouveau/timer/ga10[024]: initial support
drm/nouveau/fb/ga10[024]: initial support
drm/nouveau/imem/ga10[024]: initial support
drm/nouveau/privring/ga10[024]: initial support
drm/nouveau/mc/ga10[024]: initial support
drm/nouveau/devinit/ga10[024]: initial support
drm/nouveau/bios/ga10[024]: initial support
drm/nouveau/pci/ga10[024]: initial support
drm/nouveau/core: recognise GA10[024]
Right now io_flush_timeouts() checks if the current number of events
is equal to ->timeout.target_seq, but this will miss some timeouts if
there have been more than 1 event added since the last time they were
flushed (possible in io_submit_flush_completions(), for example). Fix
it by recording the last sequence at which timeouts were flushed so
that the number of events seen can be compared to the number of events
needed without overflow.
Signed-off-by: Marcelo Diop-Gonzalez <marcelo827@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The Ux500 platforms have some memory carveouts set aside for
communicating with the modem and for the initial secure software
(ISSW). These areas are protected by the memory controller
and will result in an external abort if accessed like common
read/write memory.
On the legacy boot loaders, these were set aside by using
cmdline arguments such as this:
mem=96M@0 mem_mtrace=15M@96M mem_mshared=1M@111M
mem_modem=16M@112M mali.mali_mem=32M@128M mem=96M@160M
hwmem=127M@256M mem_issw=1M@383M mem_ram_console=1M@384M
mem=638M@385M
Reserve the relevant areas in the device tree instead. The
"mali", "hwmem", "mem_ram_console" and the trailing 1MB at the
end of the memory reservations in the list are not relevant for
the upstream kernel as these are nowadays replaced with
upstream technologies such as CMA. The modem and ISSW
reservations are necessary.
This was manifested in a bug that surfaced in response to
commit 7fef431be9 ("mm/page_alloc: place pages to tail in __free_pages_core()")
which changes the behaviour of memory allocations
in such a way that the platform will sooner run into these
dangerous areas, with "Unhandled fault: imprecise external
abort (0xc06) at 0xb6fd83dc" or similar: the real reason
turns out to be that the PTE is pointing right into one of
the reserved memory areas. We were just lucky until now.
We need to augment the DB8500 and DB8520 SoCs similarly
and also create a new include for the DB9500 used in the
Snowball since this does not have a modem and thus does
not need the modem memory reservation, albeit it needs
the ISSW reservation.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: stable@vger.kernel.org
Cc: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20201213225517.3838501-1-linus.walleij@linaro.org'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
i.MX fixes for 5.11:
- Fix backlight pwm on imx6qdl-kontron-samx6i which is lost from
#pwm-cells conversion.
- Fix duplicated bus node name for i.MX8MN SoC.
- Fix reset register offset on LS1028A SoC.
- Rename MMC node aliases for imx6q-tbs2910 to keep the MMC device
index consistent with previous kernel version.
- Selecting ARM_GIC_V3 on non-CP15 processors to fix one build failure
with i.MX8M SoC driver.
- Fix typos with status property on imx6qdl-kontron-samx6i board.
- Fix duplicated regulator-name on imx6qdl-gw52xx board.
* tag 'imx-fixes-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: dts: imx6qdl-gw52xx: fix duplicate regulator naming
ARM: dts: imx6qdl-kontron-samx6i: fix i2c_lcd/cam default status
ARM: imx: fix imx8m dependencies
ARM: dts: tbs2910: rename MMC node aliases
arm64: dts: ls1028a: fix the offset of the reset register
arm64: dts: imx8mn: Fix duplicate node name
ARM: dts: imx6qdl-kontron-samx6i: fix pwms for lcd-backlight
Link: https://lore.kernel.org/r/20210112131224.GI28365@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Occasionally, we are seeing some SuperSpeed devices resumes right after
being directed to U3. This commits add 500us delay to ensure LFPS
detector is disabled before sending ACK to firmware.
[ 16.099363] tegra-xusb 70090000.usb: entering ELPG
[ 16.104343] tegra-xusb 70090000.usb: 2-1 isn't suspended: 0x0c001203
[ 16.114576] tegra-xusb 70090000.usb: not all ports suspended: -16
[ 16.120789] tegra-xusb 70090000.usb: entering ELPG failed
The register write passes through a few flop stages of 32KHz clock domain.
NVIDIA ASIC designer reviewed RTL and suggests 500us delay.
Cc: stable@vger.kernel.org
Signed-off-by: JC Kuo <jckuo@nvidia.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20210115161907.2875631-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Once the command ring doorbell is rung the xHC controller will parse all
command TRBs on the command ring that have the cycle bit set properly.
If the driver just started writing the next command TRB to the ring when
hardware finished the previous TRB, then HW might fetch an incomplete TRB
as long as its cycle bit set correctly.
A command TRB is 16 bytes (128 bits) long.
Driver writes the command TRB in four 32 bit chunks, with the chunk
containing the cycle bit last. This does however not guarantee that
chunks actually get written in that order.
This was detected in stress testing when canceling URBs with several
connected USB devices.
Two consecutive "Set TR Dequeue pointer" commands got queued right
after each other, and the second one was only partially written when
the controller parsed it, causing the dequeue pointer to be set
to bogus values. This was seen as error messages:
"Mismatch between completed Set TR Deq Ptr command & xHCI internal state"
Solution is to add a write memory barrier before writing the cycle bit.
Cc: <stable@vger.kernel.org>
Tested-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20210115161907.2875631-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After converting am335x to probe devices with simple-pm-bus I noticed
that we are not passing auxdata for of_platform_populate() like we do
with simple-bus.
While device tree using SoCs should no longer need platform data, there
are still quite a few drivers that still need it as can be seen with
git grep OF_DEV_AUXDATA. We want to have simple-pm-bus be usable as a
replacement for simple-bus also for cases where OF_DEV_AUXDATA is still
needed.
Let's fix the issue by passing auxdata as platform data to simple-pm-bus.
That way the SoCs needing this can pass the auxdata with OF_DEV_AUXDATA.
And let's pass the auxdata for omaps to fix the issue for am335x.
As an alternative solution, adding simple-pm-bus handling directly to
drivers/of/platform.c was considered, but we would still need simple-pm-bus
device driver. So passing auxdata as platform data seems like the simplest
solution.
Fixes: 5a230524f8 ("ARM: dts: Use simple-pm-bus for genpd for am3 l4_wkup")
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Tony Lindgren <tony@atomide.com>
We now depend on SIMPLE_PM_BUS for probing devices. While we have it
selected in omap2plus_defconfig, custom configs can fail if it's missing.
As SIMPLE_PM_BUS depends on OF and PM, we must now select PM in Kconfig.
We have already OF selected by ARCH_MULTIPLATFORM.
Let's also drop the earlier PM dependency entries as suggested by
Geert Uytterhoeven <geert@linux-m68k.org>.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Fixes: 5a230524f8 ("ARM: dts: Use simple-pm-bus for genpd for am3 l4_wkup")
Signed-off-by: Tony Lindgren <tony@atomide.com>
We get suspcious RCU usage splats with cpuidle in several places in
omap_enter_idle_coupled() with the kernel debug options enabled:
RCU used illegally from extended quiescent state!
...
(_raw_spin_lock_irqsave)
(omap_enter_idle_coupled+0x17c/0x2d8)
(omap_enter_idle_coupled)
(cpuidle_enter_state)
(cpuidle_enter_state_coupled)
(cpuidle_enter)
Let's use RCU_NONIDLE to suppress these splats. Things got changed around
with commit 1098582a0f ("sched,idle,rcu: Push rcu_idle deeper into the
idle path") that started triggering these warnings.
For the tick_broadcast related calls, ideally we'd just switch over to
using CPUIDLE_FLAG_TIMER_STOP for omap_enter_idle_coupled() to have the
generic cpuidle code handle the tick_broadcast related calls for us and
then just drop the tick_broadcast calls here.
But we're currently missing the call in the common cpuidle code for
tick_broadcast_enable() that CPU1 hotplug needs as described in earlier
commit 50d6b3cf94 ("ARM: OMAP2+: fix lack of timer interrupts on CPU1
after hotplug").
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
kmsg_dump_get_buffer() uses @syslog to determine if the syslog
prefix should be written to the buffer. However, when calculating
the maximum number of records that can fit into the buffer, it
always counts the bytes from the syslog prefix.
Use @syslog when calculating the maximum number of records that can
fit into the buffer.
Fixes: e2ae715d66 ("kmsg - kmsg_dump() use iterator to receive log buffer content")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210113164413.1599-1-john.ogness@linutronix.de
Counting text lines in a record simply involves counting the number
of newline characters (+1). However, it is searching the full data
block for newline characters, even though the text data can be (and
often is) a subset of that area. Since the extra area in the data
block was never initialized, the result is that extra newlines may
be seen and counted.
Restrict newline searching to the text data length.
Fixes: b6cf8b3f33 ("printk: add lockless ringbuffer")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210113144234.6545-1-john.ogness@linutronix.de
The kbuild test robot reports that when building with W=1, GCC will warn
for a couple of missing prototypes in syscall.c:
| arch/arm64/kernel/syscall.c:157:6: warning: no previous prototype for 'do_el0_svc' [-Wmissing-prototypes]
| 157 | void do_el0_svc(struct pt_regs *regs)
| | ^~~~~~~~~~
| arch/arm64/kernel/syscall.c:164:6: warning: no previous prototype for 'do_el0_svc_compat' [-Wmissing-prototypes]
| 164 | void do_el0_svc_compat(struct pt_regs *regs)
| | ^~~~~~~~~~~~~~~~~
While this isn't a functional problem, as a general policy we should
include the prototype for functions wherever possible to catch any
accidental divergence between the prototype and implementation. Here we
can easily include <asm/exception.h>, so let's do so.
While there are a number of warnings elsewhere and some warnings enabled
under W=1 are of questionable benefit, this change helps to make the
code more robust as it evolved and reduces the noise somewhat, so it
seems worthwhile.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/202101141046.n8iPO3mw-lkp@intel.com
Link: https://lore.kernel.org/r/20210114124812.17754-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Jonathan writes:
First set of IIO and counter fixes for the 5.11 cycle.
Counter fixes
ti,eqep
- Remove floor interface as the device always wraps to 0.
IIO
adi,ad5504
- Fix inverted power state control.
bosch,bma255
- Fix a difference in part naming between dt-binding doc and the driver.
melexis,mlx90632
- Add a delay after reset command.
semtech,sx9310
- Off by one error.
- Fix an issue due to need to skip a value in a power of 2 series.
st,st_sensors
- Fix a possible infinite loop if data read is not define or reading it fails.
ti,am335x
- Remove a left over iio_kfifo_free after managed allocation conversion.
* tag 'iio-fixes-for-5.11a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: sx9310: Fix semtech,avg-pos-strength setting when > 16
iio: common: st_sensors: fix possible infinite loop in st_sensors_irq_thread
iio: ad5504: Fix setting power-down state
counter:ti-eqep: remove floor
drivers: iio: temperature: Add delay after the addressed reset command in mlx90632.c
iio: adc: ti_am335x_adc: remove omitted iio_kfifo_free()
dt-bindings: iio: accel: bma255: Fix bmc150/bmi055 compatible
iio: sx9310: Off by one in sx9310_read_thresh()
Pull drm fixes from Dave Airlie:
"Regular fixes for rc4, a bunch of fixes across i915, amdgpu and
nouveau here, along with a couple of TTM fixes, and dma-buf and one
core pageflip/modifier interaction fix.
One notable i915 fix is a HSW GT1 regression fix that has been
outstanding for quite a while. (Thanks to Matt Turner for kicking
Intel into getting it fixed).
dma-buf:
- Fix a memory leak in CMAV heap
core:
- Fix format check for legacy pageflips
ttm:
- Pass correct address to dma_mapping_error()
- Use mutex in pool shrinker
i915:
- Allow the sysadmin to override security mitigations
- Restore clear-residual mitigations for ivb/byt
- Limit VFE threads based on GT
- GVT: fix vfio edid and full display detection
- Fix DSI DSC power refcounting
- Fix LPT CPU mode backlight takeover
- Disable RPM wakeref assertions during driver shutdown
- Fix DSI sequence sleeps
amdgpu:
- Update repo location in MAINTAINERS
- Add some new renoir PCI IDs
- Revert CRC UAPI changes
- Revert OLED display fix which cases clocking problems for some systems
- Misc vangogh fixes
- GFX fix for sienna cichlid
- DCN1.0 fix for pipe split
- Fix incorrect PSP command
amdkfd:
- Fix possible out of bounds read in vcrat creation
nouveau:
- irq handling fix
- expansion ROM fix
- hw init dpcd disable
- aux semaphore owner field fix
- vram heap sizing fix
- notifier at 0 is valid fix"
* tag 'drm-fixes-2021-01-15' of git://anongit.freedesktop.org/drm/drm: (37 commits)
drm/nouveau/kms/nv50-: fix case where notifier buffer is at offset 0
drm/nouveau/mmu: fix vram heap sizing
drm/nouveau/i2c/gm200: increase width of aux semaphore owner fields
drm/nouveau/i2c/gk110-: disable hw-initiated dpcd reads
drm/nouveau/i2c/gk110: split out from i2c/gk104
drm/nouveau/privring: ack interrupts the same way as RM
drm/nouveau/bios: fix issue shadowing expansion ROMs
drm/amd/display: Fix to be able to stop crc calculation
Revert "drm/amd/display: Expose new CRC window property"
Revert "drm/amdgpu/disply: fix documentation warnings in display manager"
Revert "drm/amd/display: Fix unused variable warning"
drm/amdgpu: set power brake sequence
drm/amdgpu: add new device id for Renior
drm/amdgpu: add green_sardine device id (v2)
drm/amdgpu: fix vram type and bandwidth error for DDR5 and DDR4
drm/amdgpu/gfx10: add updated GOLDEN_TSC_COUNT_UPPER/LOWER register offsets for VGH
drm/amdkfd: Fix out-of-bounds read in kdf_create_vcrat_image_cpu()
Revert "drm/amd/display: Fixed Intermittent blue screen on OLED panel"
drm/amd/display: disable dcn10 pipe split by default
drm/amd/display: Add a missing DCN3.01 API mapping
...
Pull bootconfig fix from Steven Rostedt:
"Update bootconf scripts for tracing_on option
The tracing_on option is supported by bootconfig entries, but the
scripts to convert from ftrace to a bootconfig and back were not
updated"
* tag 'trace-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tools/bootconfig: Add tracing_on support to helper scripts
While testing live partition mobility, we have observed occasional crashes
of the Linux partition. What we've seen is that during the live migration,
for specific configurations with large amounts of memory, slow network
links, and workloads that are changing memory a lot, the partition can end
up being suspended for 30 seconds or longer. This resulted in the following
scenario:
CPU 0 CPU 1
------------------------------- ----------------------------------
scsi_queue_rq migration_store
-> blk_mq_start_request -> rtas_ibm_suspend_me
-> blk_add_timer -> on_each_cpu(rtas_percpu_suspend_me
_______________________________________V
|
V
-> IPI from CPU 1
-> rtas_percpu_suspend_me
-> __rtas_suspend_last_cpu
-- Linux partition suspended for > 30 seconds --
-> for_each_online_cpu(cpu)
plpar_hcall_norets(H_PROD
-> scsi_dispatch_cmd
-> scsi_times_out
-> scsi_abort_command
-> queue_delayed_work
-> ibmvfc_queuecommand_lck
-> ibmvfc_send_event
-> ibmvfc_send_crq
- returns H_CLOSED
<- returns SCSI_MLQUEUE_HOST_BUSY
-> __blk_mq_requeue_request
-> scmd_eh_abort_handler
-> scsi_try_to_abort_cmd
- returns SUCCESS
-> scsi_queue_insert
Normally, the SCMD_STATE_COMPLETE bit would protect against the command
completion and the timeout, but that doesn't work here, since we don't
check that at all in the SCSI_MLQUEUE_HOST_BUSY path.
In this case we end up calling scsi_queue_insert on a request that has
already been queued, or possibly even freed, and we crash.
The patch below simply increases the default I/O timeout to avoid this race
condition. This is also the timeout value that nearly all IBM SAN storage
recommends setting as the default value.
Link: https://lore.kernel.org/r/1610463998-19791-1-git-send-email-brking@linux.vnet.ibm.com
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Using global sp_in_global directly to fix the following warning,
arch/riscv/kernel/stacktrace.c:31:3: warning: ‘register’ is not at beginning of declaration [-Wold-style-declaration]
31 | const register unsigned long current_sp = sp_in_global;
| ^~~~~
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
amd-drm-fixes-5.11-2021-01-14:
amdgpu:
- Update repo location in MAINTAINERS
- Add some new renoir PCI IDs
- Revert CRC UAPI changes
- Revert OLED display fix which cases clocking problems for some systems
- Misc vangogh fixes
- GFX fix for sienna cichlid
- DCN1.0 fix for pipe split
- Fix incorrect PSP command
amdkfd:
- Fix possible out of bounds read in vcrat creation
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210114201354.3998-1-alexander.deucher@amd.com
UEFI/RM no longer use IED scripts from the VBIOS, though they appear to
have been updated for use by the x86 VBIOS code, so we should be able to
continue using them for the moment.
Unfortunately, we require some hacks to do so, as the BeforeLinkTraining
IED script became a pointer to an array of scripts instead, without a
revbump of the relevant tables.
There's also some changes to SOR clock divider fiddling, which are
hopefully correct enough that things work as they should.
AFAIK, GA100 shouldn't have display, so it hasn't been added.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Fortunately, all the interrupts we need to bring up basic display support
are contained in a single leaf register, allowing this basic (but hackish)
implementation.
There's a bunch more invasive patches to come implementing all this in a
better/more complete way, but trying to get a minimal series out first.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
VPLL regs changed a bit. There's more stuff to do around these, but it's
less invasive to stick those changes into disp for now.
None of that belongs here anymore anyhow - fix that someday.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Forcing PRAMIN-shadowing off for GA100, as it requires display, and we don't
know if/where the fuse register for detecting its presence is.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
GA100 hidden behind a module option, as it's not been as well verified
since initial bring-up and may need additional changes.
There's no display anyway, so this can wait for a bit.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This issue has generally been covered up by the presence of additional
expansion ROMs after the ones we're interested in, with header fetches
of subsequent images loading enough of the ROM to hide the issue.
Noticed on GA102, which lacks a type 0x70 image compared to TU102,.
[ 906.364197] nouveau 0000:09:00.0: bios: 00000000: type 00, 65024 bytes
[ 906.381205] nouveau 0000:09:00.0: bios: 0000fe00: type 03, 91648 bytes
[ 906.405213] nouveau 0000:09:00.0: bios: 00026400: type e0, 22016 bytes
[ 906.410984] nouveau 0000:09:00.0: bios: 0002ba00: type e0, 366080 bytes
vs
[ 22.961901] nouveau 0000:09:00.0: bios: 00000000: type 00, 60416 bytes
[ 22.984174] nouveau 0000:09:00.0: bios: 0000ec00: type 03, 71168 bytes
[ 23.010446] nouveau 0000:09:00.0: bios: 00020200: type e0, 48128 bytes
[ 23.028220] nouveau 0000:09:00.0: bios: 0002be00: type e0, 140800 bytes
[ 23.080196] nouveau 0000:09:00.0: bios: 0004e400: type 70, 7168 bytes
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Pull NVMe fixes from Christoph:
"nvme fixes for 5.11:
- don't initialize hwmon for discover controllers (Sagi Grimberg)
- fix iov_iter handling in nvme-tcp (Sagi Grimberg)
- fix a preempt warning in nvme-tcp (Sagi Grimberg)
- fix a possible NULL pointer dereference in nvme (Israel Rukshin)"
* tag 'nvme-5.11-2021-01-14' of git://git.infradead.org/nvme:
nvme: don't intialize hwmon for discovery controllers
nvme-tcp: fix possible data corruption with bio merges
nvme-tcp: Fix warning with CONFIG_DEBUG_PREEMPT
nvmet-rdma: Fix NULL deref when setting pi_enable and traddr INADDR_ANY
Pull kselftest fixes from Shuah Khan:
"One single fix to skip BPF selftests by default.
BPF selftests have a hard dependency on cutting edge versions of tools
in the BPF ecosystem including LLVM.
Skipping BPF allows by default will make it easier for users
interested in running kselftest as a whole. Users can include BPF in
Kselftest build by via SKIP_TARGETS variable"
* tag 'linux-kselftest-fixes-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: Skip BPF seftests by default
Pull networking fixes from Jakub Kicinski:
"We have a few fixes for long standing issues, in particular Eric's fix
to not underestimate the skb sizes, and my fix for brokenness of
register_netdevice() error path. They may uncover other bugs so we
will keep an eye on them. Also included are Willem's fixes for
kmap(_atomic).
Looking at the "current release" fixes, it seems we are about one rc
behind a normal cycle. We've previously seen an uptick of "people had
run their test suites" / "humans actually tried to use new features"
fixes between rc2 and rc3.
Summary:
Current release - regressions:
- fix feature enforcement to allow NETIF_F_HW_TLS_TX if IP_CSUM &&
IPV6_CSUM
- dcb: accept RTM_GETDCB messages carrying set-like DCB commands if
user is admin for backward-compatibility
- selftests/tls: fix selftests build after adding ChaCha20-Poly1305
Current release - always broken:
- ppp: fix refcount underflow on channel unbridge
- bnxt_en: clear DEFRAG flag in firmware message when retry flashing
- smc: fix out of bound access in the new netlink interface
Previous releases - regressions:
- fix use-after-free with UDP GRO by frags
- mptcp: better msk-level shutdown
- rndis_host: set proper input size for OID_GEN_PHYSICAL_MEDIUM
request
- i40e: xsk: fix potential NULL pointer dereferencing
Previous releases - always broken:
- skb frag: kmap_atomic fixes
- avoid 32 x truesize under-estimation for tiny skbs
- fix issues around register_netdevice() failures
- udp: prevent reuseport_select_sock from reading uninitialized socks
- dsa: unbind all switches from tree when DSA master unbinds
- dsa: clear devlink port type before unregistering slave netdevs
- can: isotp: isotp_getname(): fix kernel information leak
- mlxsw: core: Thermal control fixes
- ipv6: validate GSO SKB against MTU before finish IPv6 processing
- stmmac: use __napi_schedule() for PREEMPT_RT
- net: mvpp2: remove Pause and Asym_Pause support
Misc:
- remove from MAINTAINERS folks who had been inactive for >5yrs"
* tag 'net-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits)
mptcp: fix locking in mptcp_disconnect()
net: Allow NETIF_F_HW_TLS_TX if IP_CSUM && IPV6_CSUM
MAINTAINERS: dccp: move Gerrit Renker to CREDITS
MAINTAINERS: ipvs: move Wensong Zhang to CREDITS
MAINTAINERS: tls: move Aviad to CREDITS
MAINTAINERS: ena: remove Zorik Machulsky from reviewers
MAINTAINERS: vrf: move Shrijeet to CREDITS
MAINTAINERS: net: move Alexey Kuznetsov to CREDITS
MAINTAINERS: altx: move Jay Cliburn to CREDITS
net: avoid 32 x truesize under-estimation for tiny skbs
nt: usb: USB_RTL8153_ECM should not default to y
net: stmmac: fix taprio configuration when base_time is in the past
net: stmmac: fix taprio schedule configuration
net: tip: fix a couple kernel-doc markups
net: sit: unregister_netdevice on newlink's error path
net: stmmac: Fixed mtu channged by cache aligned
cxgb4/chtls: Fix tid stuck due to wrong update of qid
i40e: fix potential NULL pointer dereferencing
net: stmmac: use __napi_schedule() for PREEMPT_RT
can: mcp251xfd: mcp251xfd_handle_rxif_one(): fix wrong NULL pointer check
...
ieee80211_tx_h_select_key drops any non-mgmt packets without a key when
encryption is used. This is wrong for nulldata packets that can't be
encrypted and are sent out for probing clients and indicating 4-address
mode.
Reported-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Fixes: a0761a3017 ("mac80211: drop data frames without key on encrypted links")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20201218191525.1168-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This fixes strlen mismatch problems happening in some .write callbacks
of debugfs.
When trying to configure airtime_flags in debugfs, an error appeared:
ash: write error: Invalid argument
The error is returned from kstrtou16() since a wrong length makes it
miss the real end of input string. To fix this, use count as the string
length, and set proper end of string for a char buffer.
The debug print is shown - airtime_flags_write: count = 2, len = 8,
where the actual length is 2, but "len = strlen(buf)" gets 8.
Also cleanup the other similar cases for the sake of consistency.
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Shayne Chen <shayne.chen@mediatek.com>
Link: https://lore.kernel.org/r/20210112032028.7482-1-shayne.chen@mediatek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The power-down mask of the ad5504 is actually a power-up mask. Meaning if
a bit is set the corresponding channel is powered up and if it is not set
the channel is powered down.
The driver currently has this the wrong way around, resulting in the
channel being powered up when requested to be powered down and vice versa.
Fixes: 3bbbf150ff ("staging:iio:dac:ad5504: Use strtobool for boolean values")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Link: https://lore.kernel.org/r/20201209104649.5794-1-lars@metafoo.de
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The bmc150-accel-i2c.c driver has an "_accel" suffix for the
compatibles of BMC150 and BMI055. This is necessary because BMC150
contains both accelerometer (bosch,bmc150_accel) and magnetometer
(bosch,bmc150_magn) and therefore "bosch,bmc150" would be ambiguous.
However, the binding documentation suggests using "bosch,bmc150".
Add the "_accel" suffix for BMC150 and BMI055 so the binding docs
match what is expected by the driver.
Fixes: 6259551cf1 ("iio: accel: bmc150-accel: Add DT bindings")
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20201202083551.7753-1-stephan@gerhold.net
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Discovery controllers usually don't support smart log page command.
So when we connect to the discovery controller we see this warning:
nvme nvme0: Failed to read smart log (error 24577)
nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.123.1:8009
nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Introduce a new helper to understand if the controller is a discovery
controller and use this helper to skip nvme_init_hwmon (also use it in
other places that we check if the controller is a discovery controller).
Fixes: 400b6a7b13 ("nvme: Add hardware monitoring support")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When a bio merges, we can get a request that spans multiple
bios, and the overall request payload size is the sum of
all bios. When we calculate how much we need to send
from the existing bio (and bvec), we did not take into
account the iov_iter byte count cap.
Since multipage bvecs support, bvecs can split in the middle
which means that when we account for the last bvec send we
should also take the iov_iter byte count cap as it might be
lower than the last bvec size.
Reported-by: Hao Wang <pkuwangh@gmail.com>
Fixes: 3f2304f8c6 ("nvme-tcp: add NVMe over TCP host driver")
Tested-by: Hao Wang <pkuwangh@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
We shouldn't call smp_processor_id() in a preemptible
context, but this is advisory at best, so instead
call __smp_processor_id().
Fixes: db5ad6b7f8 ("nvme-tcp: try to send request in queue_rq context")
Reported-by: Or Gerlitz <gerlitz.or@gmail.com>
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When setting port traddr to INADDR_ANY, the listening cm_id->device
is NULL. The associate IB device is known only when a connect request
event arrives, so checking T10-PI device capability should be done
at this stage.
Fixes: b09160c399 ("nvmet-rdma: add metadata/T10-PI support")
Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull HID fixes from Jiri Kosina:
- memory leak fix for Wacom driver (Ping Cheng)
- various trivial small fixes, cleanups and device ID additions
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: logitech-hidpp: Add product ID for MX Ergo in Bluetooth mode
HID: Ignore battery for Elan touchscreen on ASUS UX550
HID: logitech-dj: add the G602 receiver
HID: wiimote: remove h from printk format specifier
HID: uclogic: remove h from printk format specifier
HID: sony: select CONFIG_CRC32
HID: sfh: fix address space confusion
HID: multitouch: Enable multi-input for Synaptics pointstick/touchpad device
HID: wacom: Fix memory leakage caused by kfifo_alloc
[Why]
Find out when we try to disable CRC calculation, crc generation is still
enabled. Main reason is that dc_stream_configure_crc() will never get
called when the source is AMDGPU_DM_PIPE_CRC_SOURCE_NONE.
[How]
Add checking condition that when source is
AMDGPU_DM_PIPE_CRC_SOURCE_NONE, we should also call
dc_stream_configure_crc() to disable crc calculation.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cited patch below blocked the TLS TX device offload unless HW_CSUM
is set. This broke devices that use IP_CSUM && IP6_CSUM.
Here we fix it.
Note that the single HW_TLS_TX feature flag indicates support for
both IPv4/6, hence it should still be disabled in case only one of
(IP_CSUM | IPV6_CSUM) is set.
Fixes: ae0b04b238 ("net: Disable NETIF_F_HW_TLS_TX when HW_CSUM is disabled")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reported-by: Rohit Maheshwari <rohitm@chelsio.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Link: https://lore.kernel.org/r/20210114151215.7061-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To make maintainers' lives easier we're trying to nudge people
towards CCing all the relevant folks on patches, in an attempt
to improve review rate. We have a check in patchwork which validates
the CC list against get_maintainers.pl. It's a little awkward, however,
to force people to CC maintainers who we haven't seen on the mailing
list for years. This series removes from maintainers folks who didn't
provide any tag (incl. authoring a patch) in the last 5 years.
To ensure reasonable signal to noise ratio we only considered
MAINTAINERS entries which had more than 100 patches fall under
them in that time period.
All this is purely a process-greasing exercise, I hope nobody
sees this series as an affront. Most folks are moved to CREDITS,
a couple entries are simply removed.
The following inactive maintainers are kept, because they indicated
the intention to come back in the near future:
- Veaceslav Falico (bonding)
- Christian Benvenuti (Cisco drivers)
- Felix Fietkau (mtk-eth)
- Mirko Linder (skge/sky2)
Patches in this series contain report from a script which did
the analysis. Big thanks to Jonathan Corbet for help and writing
the script (although I feel like I used it differently than Jon
may have intended ;)). The output format is thus:
Subsystem $name
Changes $reviewed / $total ($percent%) // how many changes to the subsystem had at least one ack/review
Last activity: $date_of_most_recent_patch
$maintainer/reviewer1:
Author $last_commit_authored_by_the_person $how_many_in_5yrs
Committer $last_committed $how_many
Tags $last_tag_like_review_signoff_etc $how_many
$maintainer/reviewer2:
Author $last_commit_authored_by_the_person $how_many_in_5yrs
Committer $last_committed $how_many
Tags $last_tag_like_review_signoff_etc $how_many
Top reviewers: // Top 3 reviewers (who are not listed in MAINTAINERS)
[$count_of_reviews_and_acks]: $email
INACTIVE MAINTAINER $name // maintainer / reviewer who has done nothing in last 5yrs
v2:
- keep Felix and Mirko
Link: https://lore.kernel.org/r/20210114014912.2519931-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As far as I can tell we haven't heard from Gerrit for roughly
5 years now. DCCP patch would really benefit from some review.
Gerrit was the last maintainer so mark this entry as orphaned.
Subsystem DCCP PROTOCOL
Changes 38 / 166 (22%)
(No activity)
Top reviewers:
[6]: kstewart@linuxfoundation.org
[6]: allison@lohutok.net
[5]: edumazet@google.com
INACTIVE MAINTAINER Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Both virtio net and napi_get_frags() allocate skbs
with a very small skb->head
While using page fragments instead of a kmalloc backed skb->head might give
a small performance improvement in some cases, there is a huge risk of
under estimating memory usage.
For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations
per page (order-3 page in x86), or even 64 on PowerPC
We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
but consuming far more memory for TCP buffers than instructed in tcp_mem[2]
Even if we force napi_alloc_skb() to only use order-0 pages, the issue
would still be there on arches with PAGE_SIZE >= 32768
This patch makes sure that small skb head are kmalloc backed, so that
other objects in the slab page can be reused instead of being held as long
as skbs are sitting in socket queues.
Note that we might in the future use the sk_buff napi cache,
instead of going through a more expensive __alloc_skb()
Another idea would be to use separate page sizes depending
on the allocated length (to never have more than 4 frags per page)
I would like to thank Greg Thelen for his precious help on this matter,
analysing crash dumps is always a time consuming task.
Fixes: fd11a83dd3 ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20210113161819.1155526-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Synopsys TSN MAC supports Qbv base times in the past, but only up to a
certain limit. As a result, a taprio qdisc configuration with a small
base time (for example when treating the base time as a simple phase
offset) is not applied by the hardware and silently ignored.
This was observed on an NXP i.MX8MPlus device, but likely affects all
TSN-variants of the MAC.
Fix the issue by making sure the base time is in the future, pushing it by
an integer amount of cycle times if needed. (a similar check is already
done in several other taprio implementations, see for example
drivers/net/ethernet/intel/igc/igc_tsn.c#L116 or
drivers/net/dsa/sja1105/sja1105_ptp.h#L39).
Fixes: b60189e039 ("net: stmmac: Integrate EST with TAPRIO scheduler API")
Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com>
Link: https://lore.kernel.org/r/20210113131557.24651-2-yannick.vignon@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When configuring a 802.1Qbv schedule through the tc taprio qdisc on an NXP
i.MX8MPlus device, the effective cycle time differed from the requested one
by N*96ns, with N number of entries in the Qbv Gate Control List. This is
because the driver was adding a 96ns margin to each interval of the GCL,
apparently to account for the IPG. The problem was observed on NXP
i.MX8MPlus devices but likely affected all devices relying on the same
configuration callback (dwmac 4.00, 4.10, 5.10 variants).
Fix the issue by removing the margins, and simply setup the MAC with the
provided cycle time value. This is the behavior expected by the user-space
API, as altering the Qbv schedule timings would break standards conformance.
This is also the behavior of several other Ethernet MAC implementations
supporting taprio, including the dwxgmac variant of stmmac.
Fixes: 504723af0d ("net: stmmac: Add basic EST support for GMAC5+")
Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com>
Link: https://lore.kernel.org/r/20210113131557.24651-1-yannick.vignon@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A function has a different name between their prototype
and its kernel-doc markup:
../net/tipc/link.c:2551: warning: expecting prototype for link_reset_stats(). Prototype was for tipc_link_reset_stats() instead
../net/tipc/node.c:1678: warning: expecting prototype for is the general link level function for message sending(). Prototype was for tipc_node_xmit() instead
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The GPIO hog node name should match regex '^(hog-[0-9]+|.+-hog(-[0-9]+)?)$',
make it so and fix the following two make dtbs_check warnings:
arch/arm/boot/dts/stm32mp157c-dhcom-picoitx.dt.yaml: hog-usb-port-power: $nodename:0: 'hog-usb-port-power' does not match '^(hog-[0-9]+|.+-hog(-[0-9]+)?)$'
arch/arm/boot/dts/stm32mp153c-dhcom-drc02.dt.yaml: hog-usb-hub: $nodename:0: 'hog-usb-hub' does not match '^(hog-[0-9]+|.+-hog(-[0-9]+)?)$'
Fixes: ac68793f49 ("ARM: dts: stm32: Add DHCOM based PicoITX board")
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Patrice Chotard <patrice.chotard@st.com>
Cc: Patrick Delaunay <patrick.delaunay@st.com>
Cc: linux-stm32@st-md-mailman.stormreply.com
To: linux-arm-kernel@lists.infradead.org
Signed-off-by: Alexandre Torgue <alexandre.torgue@foss.st.com>
rounddown_pow_of_two() is undefined when the input is 0. Therefore we need
to avoid it in ib_umem_find_best_pgsz and return 0. Otherwise, it could
result in not rejecting an invalid page size which eventually causes a
kernel oops due to the logical inconsistency.
Fixes: 3361c29e92 ("RDMA/umem: Use simpler logic for ib_umem_find_best_pgsz()")
Link: https://lore.kernel.org/r/20210113121703.559778-2-leon@kernel.org
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Clang warns in both mt7615 and mt7915:
drivers/net/wireless/mediatek/mt76/mt7915/mcu.c:271:9: warning: implicit
conversion from enumeration type 'enum mt76_mcuq_id' to different
enumeration type 'enum mt76_txq_id' [-Wenum-conversion]
txq = MT_MCUQ_FWDL;
~ ^~~~~~~~~~~~
drivers/net/wireless/mediatek/mt76/mt7915/mcu.c:278:9: warning: implicit
conversion from enumeration type 'enum mt76_mcuq_id' to different
enumeration type 'enum mt76_txq_id' [-Wenum-conversion]
txq = MT_MCUQ_WA;
~ ^~~~~~~~~~
drivers/net/wireless/mediatek/mt76/mt7915/mcu.c:282:9: warning: implicit
conversion from enumeration type 'enum mt76_mcuq_id' to different
enumeration type 'enum mt76_txq_id' [-Wenum-conversion]
txq = MT_MCUQ_WM;
~ ^~~~~~~~~~
3 warnings generated.
drivers/net/wireless/mediatek/mt76/mt7615/mcu.c:238:9: warning: implicit
conversion from enumeration type 'enum mt76_mcuq_id' to different
enumeration type 'enum mt76_txq_id' [-Wenum-conversion]
qid = MT_MCUQ_WM;
~ ^~~~~~~~~~
drivers/net/wireless/mediatek/mt76/mt7615/mcu.c:240:9: warning: implicit
conversion from enumeration type 'enum mt76_mcuq_id' to different
enumeration type 'enum mt76_txq_id' [-Wenum-conversion]
qid = MT_MCUQ_FWDL;
~ ^~~~~~~~~~~~
2 warnings generated.
Use the proper type for the queue ID variables to fix these warnings.
Additionally, rename the txq variable in mt7915_mcu_send_message to be
more neutral like mt7615_mcu_send_message.
Fixes: e637763b60 ("mt76: move mcu queues to mt76_dev q_mcu array")
Link: https://github.com/ClangBuiltLinux/linux/issues/1229
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20201229211548.1348077-1-natechancellor@gmail.com
Arnd found a randconfig that produces the warning:
arch/x86/entry/thunk_64.o: warning: objtool: missing symbol for insn at
offset 0x3e
when building with LLVM_IAS=1 (Clang's integrated assembler). Josh
notes:
With the LLVM assembler not generating section symbols, objtool has no
way to reference this code when it generates ORC unwinder entries,
because this code is outside of any ELF function.
The limitation now being imposed by objtool is that all code must be
contained in an ELF symbol. And .L symbols don't create such symbols.
So basically, you can use an .L symbol *inside* a function or a code
segment, you just can't use the .L symbol to contain the code using a
SYM_*_START/END annotation pair.
Fangrui notes that this optimization is helpful for reducing image size
when compiling with -ffunction-sections and -fdata-sections. I have
observed on the order of tens of thousands of symbols for the kernel
images built with those flags.
A patch has been authored against GNU binutils to match this behavior
of not generating unused section symbols ([1]), so this will
also become a problem for users of GNU binutils once they upgrade to 2.36.
Omit the .L prefix on a label so that the assembler will emit an entry
into the symbol table for the label, with STB_LOCAL binding. This
enables objtool to generate proper unwind info here with LLVM_IAS=1 or
GNU binutils 2.36+.
[ bp: Massage commit message. ]
Reported-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Suggested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20210112194625.4181814-1-ndesaulniers@google.com
Link: https://github.com/ClangBuiltLinux/linux/issues/1209
Link: https://reviews.llvm.org/D93783
Link: https://sourceware.org/binutils/docs/as/Symbol-Names.html
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1408485ce69f844dcd7ded093a8 [1]
The initial change breaking the logic is
commit 3d1f08b032 ("mtd: spinand: Use the external ECC engine logic")
It inadvertently dropped proper OOB support while doing something
else.
Shortly later, half of it got re-integrated by
commit 868cbe2a6d ("mtd: spinand: Fix OOB read")
(pointing by the way to a more early change which had nothing to do
with the issue). Problem is, this commit failed to revert the faulty
change entirely and missed the logic handling MTD_OPS_AUTO_OOB
requests.
Let's fix this mess by re-inserting the missing part now.
Fixes: 868cbe2a6d ("mtd: spinand: Fix OOB read")
Reported-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20210107083813.24283-1-miquel.raynal@bootlin.com
The Logitech MX Ergo trackball supports HID++ 4.5 over Bluetooth. Add its
product ID to the table so we can get battery monitoring support.
(The hid-logitech-hidpp driver already recognizes it when connected via
a Unifying Receiver.)
[jkosina@suse.cz: fix whitespace damage]
Signed-off-by: Nicholas Miell <nmiell@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Fix the error type value for PCI Express uncorrectable non-fatal
error to 0x00000080 and fix the error type value for PCI Express
uncorrectable fatal error to 0x00000100.
See Advanced Configuration and Power Interface Specification,
version 6.2, table "18-409 Error Type Definition".
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reported-by: Lijian Zhao <lijian.zhao@intel.com>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The Acer Apire E5-575T laptop with codec ALC255 has a terrible
background noise comes from internal mic capture. And the jack
sensing dose not work for headset like some other Acer laptops.
This patch limits the internal mic boost on top of the existing
ALC255_FIXUP_ACER_MIC_NO_PRESENCE quirk for Acer Aspire E5-575T.
Signed-off-by: Chris Chiu <chiu@endlessos.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210114082728.74729-1-chiu@endlessos.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The KVM/arm64 PSCI relay assumes that SYSTEM_OFF and SYSTEM_RESET should
not return, as dictated by the PSCI spec. However, there is firmware out
there which breaks this assumption, leading to a hyp panic. Make KVM
more robust to broken firmware by allowing these to return.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201229160059.64135-1-dbrazdil@google.com
Now that all PMU registers are gated behind a .visibility callback,
remove the other checks against an absent PMU.
Signed-off-by: Marc Zyngier <maz@kernel.org>
It appears that while we are now able to properly hide PMU
registers from the guest when a PMU isn't available (either
because none has been configured, the host doesn't have
the PMU support compiled in, or that the HW doesn't have
one at all), we are still exposing more than we should to
userspace.
Introduce a visibility callback gating all the PMU registers,
which covers both usrespace and guest.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Commit c318840fb2 ("USB: Gadget: dummy-hcd: Fix shift-out-of-bounds
bug") messed up the way dummy-hcd handles requests to turn on the
RESET port feature (I didn't notice that the original switch case
ended with a fallthrough). The call to set_link_state() was
inadvertently removed, as was the code to set the USB_PORT_STAT_RESET
flag when the speed is USB2.
In addition, the original code never checked whether the port was
connected before handling the port-reset request. There was a check
for the port being powered, but it was removed by that commit! In
practice this doesn't matter much because the kernel doesn't try to
reset disconnected ports, but it's still bad form.
This patch fixes these problems by changing the fallthrough to break,
adding back in the missing set_link_state() call, setting the
port-reset status flag, adding a port-is-connected test, and removing
a redundant assignment statement.
Fixes: c318840fb2 ("USB: Gadget: dummy-hcd: Fix shift-out-of-bounds bug")
CC: <stable@vger.kernel.org>
Acked-by: Felipe Balbi <balbi@kernel.org>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20210113194510.GA1290698@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The address of the GOLDEN_TSC_COUNT_UPPER/GOLDEN_TSC_COUNT_LOWER for
Vnagogh are different from the others.
The offset of the GOLDEN_TSC_COUNT_UPPER for Vangogh is 0x0025 by
calculation.
The offset of the GOLDEN_TSC_COUNT_LOWER for Vangogh is 0x0026 by
calculation.
Signed-off-by: chen gong <curry.gong@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
KASAN reported a slab-out-of-bounds read of size 1 in
kdf_create_vcrat_image_cpu().
This occurs when, for example, when on an x86_64 with a single NUMA node
because kfd_fill_iolink_info_for_cpu() is a no-op, but afterwards the
sub_type_hdr->length, which is out-of-bounds, is read and multiplied by
entries. Fortunately, entries is 0 in this case so the overall
crat_table->length is still correct.
Check if there were any entries before de-referencing sub_type_hdr which
may be pointing to out-of-bounds memory.
Fixes: b7b6c38529 ("drm/amdkfd: Calculate CPU VCRAT size dynamically (v2)")
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
The initial purpose of dcn10 pipe split is to support some high
bandwidth mode which requires dispclk greater than max dispclk. By
initial bring up power measurement data, it showed power consumption is
less with pipe split for dcn block. This could be reason for enable pipe
split by default. By battery life measurement of some Chromebooks,
result shows battery life is longer with pipe split disabled.
[How]
Disable pipe split by default. Pipe split could be still enabled when
required dispclk is greater than max dispclk.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Hersen Wu <hersenxs.wu@amd.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Reviewed-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
psp GFX_CTRL_CMD_ID_CONSUME_CMD different for windows and linux,
according to psp, linux cmds are not correct.
v2: only correct GFX_CTRL_CMD_ID_CONSUME_CMD.
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Emily.Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ethernet phy VSC8541-01 on HiFive Unleashed has its reset line
connected to a gpio, so enable GPIO driver's required to reset
the phy.
Signed-off-by: Sagar Shrikant Kadam <sagar.kadam@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
The GEMGXL_RST line on HiFive Unleashed is pulled low and is
using GPIO number 12. Add these reset-gpio details to dt-node
using which the linux phylib can reset the phy.
Signed-off-by: Sagar Shrikant Kadam <sagar.kadam@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
HiFive unleashed A00 board has VSC8541-01 ethernet phy, this device is
identified as a Revision B device as described in device identification
registers. In order to use this phy in the unmanaged mode, it requires
a specific reset sequence of logical 0-1-0-1 transition on the NRESET pin
as documented here [1].
Currently, the bootloader (fsbl or u-boot-spl) takes care of the phy reset.
If due to some reason the phy device hasn't received the reset by the prior
stages before the linux macb driver comes into the picture, the MACB mii
bus gets probed but the mdio scan fails and is not even able to read the
phy ID registers. It gives an error message:
"libphy: MACB_mii_bus: probed
mdio_bus 10090000.ethernet-ffffffff: MDIO device at address 0 is missing."
Thus adding the device OUI (Organizationally Unique Identifier) to the phy
device node helps to probe the phy device.
[1]: VSC8541-01 datasheet:
https://www.mouser.com/ds/2/523/Microsemi_VSC8541-01_Datasheet_10496_V40-1148034.pdf
Signed-off-by: Sagar Shrikant Kadam <sagar.kadam@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Use virtual address instead of physical address when translating
the address to shadow memory by kasan_mem_to_shadow().
Signed-off-by: Nick Hu <nickhu@andestech.com>
Signed-off-by: Nylon Chen <nylon7@andestech.com>
Fixes: b10d6bca87 ("arch, drivers: replace for_each_membock() with for_each_mem_range()")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Add a test to check that the verifier is able to recognize spilling of
PTR_TO_MEM registers, by reserving a ringbuf buffer, forcing the spill
of a pointer holding the buffer address to the stack, filling it back
in from the stack and writing to the memory area pointed by it.
The patch was partially contributed by CyberArk Software, Inc.
Signed-off-by: Gilad Reti <gilad.reti@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210113053810.13518-2-gilad.reti@gmail.com
Add support for pointer to mem register spilling, to allow the verifier
to track pointers to valid memory addresses. Such pointers are returned
for example by a successful call of the bpf_ringbuf_reserve helper.
The patch was partially contributed by CyberArk Software, Inc.
Fixes: 457f44363a ("bpf: Implement BPF ring buffer and verifier support for it")
Suggested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Gilad Reti <gilad.reti@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210113053810.13518-1-gilad.reti@gmail.com
TID stuck is seen when there is a race in
CPL_PASS_ACCEPT_RPL/CPL_ABORT_REQ and abort is arriving
before the accept reply, which sets the queue number.
In this case HW ends up sending CPL_ABORT_RPL_RSS to an
incorrect ingress queue.
V1->V2:
- Removed the unused variable len in chtls_set_quiesce_ctrl().
V2->V3:
- As kfree_skb() has a check for null skb, so removed this
check before calling kfree_skb() in func chtls_send_reset().
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Link: https://lore.kernel.org/r/20210112053600.24590-1-ayush.sawal@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the function i40e_construct_skb_zc only frees the input xdp
buffer when the output skb is successfully built. On error, the
function i40e_clean_rx_irq_zc does not commit anything for the current
packet descriptor and simply exits the packet descriptor processing
loop, with the plan to restart the processing of this descriptor on
the next invocation. Therefore, on error the ring next-to-clean
pointer should not advance, the xdp i.e. *bi buffer should not be
freed and the current buffer info should not be invalidated by setting
*bi to NULL. Therefore, the *bi should only be set to NULL when the
function i40e_construct_skb_zc is successful, otherwise a NULL *bi
will be dereferenced when the work for the current descriptor is
eventually restarted.
Fixes: 3b4f0b66c2 ("i40e, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL")
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/r/20210111181138.49757-1-cristian.dumitrescu@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marc Kleine-Budde says:
====================
pull-request: can 2021-01-13
The first patch is by Oliver Hartkopp for the CAn ISO-TP protocol and fixes a
kernel information leak to userspace.
The last patch is by Qinglang Miao for the mcp251xfd driver and fixes a NULL
pointer check to work on the correct variable.
* tag 'linux-can-fixes-for-5.11-20210113' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: mcp251xfd: mcp251xfd_handle_rxif_one(): fix wrong NULL pointer check
can: isotp: isotp_getname(): fix kernel information leak
====================
Link: https://lore.kernel.org/r/20210113212158.925513-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test robot reported:
drivers/soc/litex/litex_soc_ctrl.c:143:34: warning: unused variable 'litex_soc_ctrl_of_match' [-Wunused-const-variable]
static const struct of_device_id litex_soc_ctrl_of_match[] = {
^
1 warning generated.
As per the random config device tree is not configured causing the
litex_soc_ctrl_of_match match list to not be used. This would usually
mean that we cannot even use this driver as it depends on device tree,
but as we also have COMPILE_TEST configured we allow it.
Fix the warning by surrounding the unused variable with an ifdef
CONFIG_OF.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Stafford Horne <shorne@gmail.com>
Use of __napi_schedule_irqoff() is not safe with PREEMPT_RT in which
hard interrupts are not disabled while running the threaded interrupt.
Using __napi_schedule() works for both PREEMPT_RT and mainline Linux,
just at the cost of an additional check if interrupts are disabled for
mainline (since they are already disabled).
Similar to the fix done for enetc commit 215602a8d2 ("enetc: use
napi_schedule to be compatible with PREEMPT_RT")
Signed-off-by: Seb Laveze <sebastien.laveze@nxp.com>
Link: https://lore.kernel.org/r/20210112140121.1487619-1-sebastien.laveze@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Building with the Clang assembler shows the following warning:
arch/x86/kernel/ftrace_64.o: warning: objtool: missing symbol for insn at offset 0x16
The Clang assembler strips section symbols. That ends up giving
objtool's find_func_containing() much more test coverage than normal.
Turns out, find_func_containing() doesn't work so well for overlapping
symbols:
2: 000000000000000e 0 NOTYPE LOCAL DEFAULT 2 fgraph_trace
3: 000000000000000f 0 NOTYPE LOCAL DEFAULT 2 trace
4: 0000000000000000 165 FUNC GLOBAL DEFAULT 2 __fentry__
5: 000000000000000e 0 NOTYPE GLOBAL DEFAULT 2 ftrace_stub
The zero-length NOTYPE symbols are inside __fentry__(), confusing the
rbtree search for any __fentry__() offset coming after a NOTYPE.
Try to avoid this problem by not adding zero-length symbols to the
rbtree. They're rare and aren't needed in the rbtree anyway.
One caveat, this actually might not end up being the right fix.
Non-empty overlapping symbols, if they exist, could have the same
problem. But that would need bigger changes, let's see if we can get
away with the easy fix for now.
Reported-by: Arnd Bergmann <arnd@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Since the commit 5a6c3e11c9 ("ALSA: usb-audio: Add hw constraint for
implicit fb sync"), we apply the hw constraints for the implicit
feedback sync to make the secondary open aligned with the already
opened stream setup. This change assumed that the secondary open is
performed after the first stream has been already set up, and adds the
hw constraints to sync with the first stream's parameters only when
the EP setup for the first stream was confirmed at the open time.
However, most of applications handling the full-duplex operations do
open both playback and capture streams at first, then set up both
streams. This results in skipping the additional hw constraints since
the counter-part stream hasn't been set up yet at the open of the
second stream, and it eventually leads to "incompatible EP" error in
the end.
This patch corrects the behavior by always applying the hw constraints
for the implicit fb sync. The hw constraint rules are defined so that
they check the sync EP dynamically at each invocation, instead. This
covers the concurrent stream setups better and lets the hw refine
calls resolving to the right configuration.
Also this patch corrects a minor error that has existed in the debug
print that isn't built as default.
Fixes: 5a6c3e11c9 ("ALSA: usb-audio: Add hw constraint for implicit fb sync")
Link: https://lore.kernel.org/r/20210111081611.12790-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull sound fixes from Takashi Iwai:
"Here are some piled fixes, hopefully the last big one for 5.11.
All changes are device-specific small fixes, and majority of commits
are for ASoC while USB-audio got a bit large changes for addressing
the regression for devices with quirks"
* tag 'sound-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (31 commits)
ALSA: hda/hdmi - enable runtime pm for CI AMD display audio
ALSA: firewire-tascam: Fix integer overflow in midi_port_work()
ALSA: fireface: Fix integer overflow in transmit_midi_msg()
ALSA: hda/tegra: fix tegra-hda on tegra30 soc
clk: tegra30: Add hda clock default rates to clock driver
ALSA: doc: Fix reference to mixart.rst
ALSA: usb-audio: Fix implicit feedback sync setup for Pioneer devices
ALSA: usb-audio: Annotate the endpoint index in audioformat
ALSA: usb-audio: Avoid unnecessary interface re-setup
ALSA: usb-audio: Choose audioformat of a counter-part substream
ALSA: usb-audio: Fix the missing endpoints creations for quirks
ALSA: hda/realtek: fix right sounds and mute/micmute LEDs for HP machines
ASoC: AMD Renoir - add DMI entry for Lenovo ThinkPad X395
ASoC: amd: Replacing MSI with Legacy IRQ model
ASoC: AMD Renoir - add DMI entry for Lenovo ThinkPad E14 Gen 2
ASoC: meson: axg-tdm-interface: fix loopback
ASoC: meson: axg-tdmin: fix axg skew offset
ASoC: max98373: don't access volatile registers in bias level off
ASoC: rt711: mutex between calibration and power state changes
ASoC: Intel: haswell: Add missing pm_ops
...
clang static analysis reports this problem
dfs_cache.c:591:2: warning: Argument to kfree() is a constant address
(18446744073709551614), which is not memory allocated by malloc()
kfree(vi);
^~~~~~~~~
In dfs_cache_del_vol() the volume info pointer 'vi' being freed
is the return of a call to find_vol(). The large constant address
is find_vol() returning an error.
Add an error check to dfs_cache_del_vol() similar to the one done
in dfs_cache_update_vol().
Fixes: 54be1f6c1c ("cifs: Add DFS cache routines")
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
CC: <stable@vger.kernel.org> # v5.0+
Signed-off-by: Steve French <stfrench@microsoft.com>
The call state may be changed at any time by the data-ready routine in
response to received packets, so if the call state is to be read and acted
upon several times in a function, READ_ONCE() must be used unless the call
state lock is held.
As it happens, we used READ_ONCE() to read the state a few lines above the
unmarked read in rxrpc_input_data(), so use that value rather than
re-reading it.
Fixes: a158bdd324 ("rxrpc: Fix call timeouts")
Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/161046715522.2450566.488819910256264150.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Clang static analysis reports the following:
net/rxrpc/key.c:657:11: warning: Assigned value is garbage or undefined
toksize = toksizes[tok++];
^ ~~~~~~~~~~~~~~~
rxrpc_read() contains two consecutive loops. The first loop calculates the
token sizes and stores the results in toksizes[] and the second one uses
the array. When there is an error in identifying the token in the first
loop, the token is skipped, no change is made to the toksizes[] array.
When the same error happens in the second loop, the token is not skipped.
This will cause the toksizes[] array to be out of step and will overrun
past the calculated sizes.
Fix this by making both loops log a message and return an error in this
case. This should only happen if a new token type is incompletely
implemented, so it should normally be impossible to trigger this.
Fixes: 9a059cd5ca ("rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read()")
Reported-by: Tom Rix <trix@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/161046503122.2445787.16714129930607546635.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The earlier commit to fix runtime PM in case i915 init fails,
introduces a possibility to hit a page fault.
snd_hdac_ext_bus_device_exit() is designed to be called from
dev.release(). Calling it outside device reference counting, is
not safe and may lead to calling the device_exit() function
twice. Additionally, as part of ext_bus_device_init(), the device
is also registered with snd_hdac_device_register(). Thus before
calling device_exit(), the device must be removed from device
hierarchy first.
Fix the issue by rolling back init actions by calling
hdac_device_unregister() and then releasing device with put_device().
This matches with existing code in hdac-ext module.
To complete the fix, add handling for the case where
hda_codec_load_module() returns -ENODEV, and clean up the hdac_ext
resources also in this case.
In future work, hdac-ext interface should be extended to allow clients
more flexibility to handle the life-cycle of individual devices, beyond
just the current snd_hdac_ext_bus_device_remove(), which removes all
devices.
BugLink: https://github.com/thesofproject/linux/issues/2646
Reported-by: Jaroslav Kysela <perex@perex.cz>
Fixes: 6c63c954e1 ("ASoC: SOF: fix a runtime pm issue in SOF when HDMI codec doesn't work")
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Rander Wang <rander.wang@intel.com>
Reviewed-by: Libin Yang <libin.yang@intel.com>
Reviewed-by: Bard Liao <bard.liao@intel.com>
Link: https://lore.kernel.org/r/20210113150715.3992635-1-kai.vehmanen@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
When lazytime is enabled and an inode is being written due to its
in-memory updated timestamps having expired, either due to a sync() or
syncfs() system call or due to dirtytime_expire_interval having elapsed,
the VFS needs to inform the filesystem so that the filesystem can copy
the inode's timestamps out to the on-disk data structures.
This is done by __writeback_single_inode() calling
mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC).
However, this occurs after __writeback_single_inode() has already
cleared the dirty flags from ->i_state. This causes two bugs:
- mark_inode_dirty_sync() redirties the inode, causing it to remain
dirty. This wastefully causes the inode to be written twice. But
more importantly, it breaks cases where sync_filesystem() is expected
to clean dirty inodes. This includes the FS_IOC_REMOVE_ENCRYPTION_KEY
ioctl (as reported at
https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well
as possibly filesystem freezing (freeze_super()).
- Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is
called from __writeback_single_inode() for lazytime expiration,
xfs_fs_dirty_inode() ignores the notification. (XFS only cares about
lazytime expirations, and it assumes that i_state will contain
I_DIRTY_TIME during those.) Therefore, lazy timestamps aren't
persisted by sync(), syncfs(), or dirtytime_expire_interval on XFS.
Fix this by moving the call to mark_inode_dirty_sync() to earlier in
__writeback_single_inode(), before the dirty flags are cleared from
i_state. This makes filesystems be properly notified of the timestamp
expiration, and it avoids incorrectly redirtying the inode.
This fixes xfstest generic/580 (which tests
FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime
enabled. It also fixes the new lazytime xfstest I've proposed, which
reproduces the above-mentioned XFS bug
(https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org).
Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly. But
due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the
right thing to do because mark_inode_dirty_sync() now knows not to move
the inode to a writeback list if it is currently queued for sync.
Fixes: 0ae45f63d4 ("vfs: add support for a lazytime mount option")
Cc: stable@vger.kernel.org
Depends-on: 5afced3bf2 ("writeback: Avoid skipping inode writeback")
Link: https://lore.kernel.org/r/20210112190253.64307-2-ebiggers@kernel.org
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
WARNING: CPU: 0 PID: 8494 at fs/io_uring.c:8717
io_ring_ctx_wait_and_kill+0x4f2/0x600 fs/io_uring.c:8717
Call Trace:
io_uring_release+0x3e/0x50 fs/io_uring.c:8759
__fput+0x283/0x920 fs/file_table.c:280
task_work_run+0xdd/0x190 kernel/task_work.c:140
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201
__syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
entry_SYSCALL_64_after_hwframe+0x44/0xa9
failed io_uring_install_fd() is a special case, we don't do
io_ring_ctx_wait_and_kill() directly but defer it to fput, though still
need to io_disable_sqo_submit() before.
note: it doesn't fix any real problem, just a warning. That's because
sqring won't be available to the userspace in this case and so SQPOLL
won't submit anything.
Reported-by: syzbot+9c9c35374c0ecac06516@syzkaller.appspotmail.com
Fixes: d9d05217cb ("io_uring: stop SQPOLL submit on creator's death")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
general protection fault, probably for non-canonical address
0xdffffc0000000022: 0000 [#1] KASAN: null-ptr-deref
in range [0x0000000000000110-0x0000000000000117]
RIP: 0010:io_ring_set_wakeup_flag fs/io_uring.c:6929 [inline]
RIP: 0010:io_disable_sqo_submit+0xdb/0x130 fs/io_uring.c:8891
Call Trace:
io_uring_create fs/io_uring.c:9711 [inline]
io_uring_setup+0x12b1/0x38e0 fs/io_uring.c:9739
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
io_disable_sqo_submit() might be called before user rings were
allocated, don't do io_ring_set_wakeup_flag() in those cases.
Reported-by: syzbot+ab412638aeb652ded540@syzkaller.appspotmail.com
Fixes: d9d05217cb ("io_uring: stop SQPOLL submit on creator's death")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
System takes a very long time to suspend after commit 215a22ed31
("ALSA: hda: Refactor codec PM to use direct-complete optimization"):
[ 90.065964] PM: suspend entry (s2idle)
[ 90.067337] Filesystems sync: 0.001 seconds
[ 90.185758] Freezing user space processes ... (elapsed 0.002 seconds) done.
[ 90.188713] OOM killer disabled.
[ 90.188714] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 90.190024] printk: Suspending console(s) (use no_console_suspend to debug)
[ 90.904912] intel_pch_thermal 0000:00:12.0: CPU-PCH is cool [49C], continue to suspend
[ 321.262505] snd_hda_codec_realtek ehdaudio0D0: Unable to sync register 0x2b8000. -5
[ 328.426919] snd_hda_codec_realtek ehdaudio0D0: Unable to sync register 0x2b8000. -5
[ 329.490933] ACPI: EC: interrupt blocked
That commit keeps the codec suspended during the system suspend. However,
mute/micmute LED will clear codec's direct-complete flag by
dpm_clear_superiors_direct_complete().
This doesn't play well with SOF driver. When its runtime resume is
called for system suspend, hda_codec_jack_check() schedules
jackpoll_work which uses snd_hdac_is_power_on() to check whether codec
is suspended. Because the direct-complete path isn't taken,
pm_runtime_disable() isn't called so snd_hdac_is_power_on() returns
false and jackpoll continues to run, and snd_hda_power_up_pm() cannot
power up an already suspended codec in multiple attempts, causes the
long delay on system suspend:
if (dev->power.direct_complete) {
if (pm_runtime_status_suspended(dev)) {
pm_runtime_disable(dev);
if (pm_runtime_status_suspended(dev)) {
pm_dev_dbg(dev, state, "direct-complete ");
goto Complete;
}
pm_runtime_enable(dev);
}
dev->power.direct_complete = false;
}
When direct-complete path is taken, snd_hdac_is_power_on() returns true
and hda_jackpoll_work() is skipped by accident. So this is still not
correct.
If we were to use snd_hdac_is_power_on() in system PM path,
pm_runtime_status_suspended() should be used instead of
pm_runtime_suspended(), otherwise pm_runtime_{enable,disable}() may
change the outcome of snd_hdac_is_power_on().
Because devices suspend in reverse order (i.e. child first), it doesn't
make much sense to resume an already suspended codec from audio
controller. So avoid the issue by making sure jackpoll isn't used in
system PM process.
Fixes: 215a22ed31 ("ALSA: hda: Refactor codec PM to use direct-complete optimization")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Link: https://lore.kernel.org/r/20210112181128.1229827-3-kai.heng.feng@canonical.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Only the IPI-related functions in the smp_ops should be conditional
on the vector callback being available. The rest should still happen:
• xen_hvm_smp_prepare_boot_cpu()
This function does two things, both of which should still happen if
there is no vector callback support.
The call to xen_vcpu_setup() for vCPU0 should still happen as it just
sets up the vcpu_info for CPU0. That does happen for the secondary
vCPUs too, from xen_cpu_up_prepare_hvm().
The second thing it does is call xen_init_spinlocks(), which perhaps
counter-intuitively should *also* still be happening in the case
without vector callbacks, so that it can clear its local xen_pvspin
flag and disable the virt_spin_lock_key accordingly.
Checking xen_have_vector_callback in xen_init_spinlocks() itself
would affect PV guests, so set the global nopvspin flag in
xen_hvm_smp_init() instead, when vector callbacks aren't available.
• xen_hvm_smp_prepare_cpus()
This does some IPI-related setup by calling xen_smp_intr_init() and
xen_init_lock_cpu(), which can be made conditional. And it sets the
xen_vcpu_id to XEN_VCPU_ID_INVALID for all possible CPUS, which does
need to happen.
• xen_smp_cpus_done()
This offlines any vCPUs which doesn't fit in the global shared_info
page, if separate vcpu_info placement isn't available. That part also
needs to happen regardless of vector callback support.
• xen_hvm_cpu_die()
This doesn't actually do anything other than commin_cpu_die() right
right now in the !vector_callback case; all three teardown functions
it calls should be no-ops. But to guard against future regressions
it's useful to call it anyway, and for it to explicitly check for
xen_have_vector_callback before calling those additional functions.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20210106153958.584169-6-dwmw2@infradead.org
Signed-off-by: Juergen Gross <jgross@suse.com>
It's useful to be able to test non-vector event channel delivery, to make
sure Linux will work properly on older Xen which doesn't have it.
It's also useful for those working on Xen and Xen-compatible hypervisors,
because there are guest kernels still in active use which use PCI INTX
even when vector delivery is available.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20210106153958.584169-4-dwmw2@infradead.org
Signed-off-by: Juergen Gross <jgross@suse.com>
With INTX or GSI delivery, Xen uses the event channel structures of CPU0.
If the interrupt gets handled by Linux on a different CPU, then no events
are seen as pending. Rather than introducing locking to allow other CPUs
to process CPU0's events, just ensure that the PCI interrupts happens
only on CPU0.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20210106153958.584169-3-dwmw2@infradead.org
Signed-off-by: Juergen Gross <jgross@suse.com>
For a while, event channel notification via the PCI platform device
has been broken, because we attempt to communicate with xenstore before
we even have notifications working, with the xs_reset_watches() call
in xs_init().
We tend to get away with this on Xen versions below 4.0 because we avoid
calling xs_reset_watches() anyway, because xenstore might not cope with
reading a non-existent key. And newer Xen *does* have the vector
callback support, so we rarely fall back to INTX/GSI delivery.
To fix it, clean up a bit of the mess of xs_init() and xenbus_probe()
startup. Call xs_init() directly from xenbus_init() only in the !XS_HVM
case, deferring it to be called from xenbus_probe() in the XS_HVM case
instead.
Then fix up the invocation of xenbus_probe() to happen either from its
device_initcall if the callback is available early enough, or when the
callback is finally set up. This means that the hack of calling
xenbus_probe() from a workqueue after the first interrupt, or directly
from the PCI platform device setup, is no longer needed.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20210113132606.422794-2-dwmw2@infradead.org
Signed-off-by: Juergen Gross <jgross@suse.com>
With UBSAN enabled and building with clang, there are occasionally
warnings like
WARNING: modpost: vmlinux.o(.text+0xc533ec): Section mismatch in reference from the function arch_atomic64_or() to the variable .init.data:numa_nodes_parsed
The function arch_atomic64_or() references
the variable __initdata numa_nodes_parsed.
This is often because arch_atomic64_or lacks a __initdata
annotation or the annotation of numa_nodes_parsed is wrong.
for functions that end up not being inlined as intended but operating
on __initdata variables. Mark these as __always_inline, along with
the corresponding asm-generic wrappers.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210108092024.4034860-1-arnd@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This reverts commit 367c820ef0.
lockup_detector_init() makes heavy use of per-cpu variables and must be
called with preemption disabled. Usually, it's handled early during boot
in kernel_init_freeable(), before SMP has been initialised.
Since we do not know whether or not our PMU interrupt can be signalled
as an NMI until considerably later in the boot process, the Arm PMU
driver attempts to re-initialise the lockup detector off the back of a
device_initcall(). Unfortunately, this is called from preemptible
context and results in the following splat:
| BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
| caller is debug_smp_processor_id+0x20/0x2c
| CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.10.0+ #276
| Hardware name: linux,dummy-virt (DT)
| Call trace:
| dump_backtrace+0x0/0x3c0
| show_stack+0x20/0x6c
| dump_stack+0x2f0/0x42c
| check_preemption_disabled+0x1cc/0x1dc
| debug_smp_processor_id+0x20/0x2c
| hardlockup_detector_event_create+0x34/0x18c
| hardlockup_detector_perf_init+0x2c/0x134
| watchdog_nmi_probe+0x18/0x24
| lockup_detector_init+0x44/0xa8
| armv8_pmu_driver_init+0x54/0x78
| do_one_initcall+0x184/0x43c
| kernel_init_freeable+0x368/0x380
| kernel_init+0x1c/0x1cc
| ret_from_fork+0x10/0x30
Rather than bodge this with raw_smp_processor_id() or randomly disabling
preemption, simply revert the culprit for now until we figure out how to
do this properly.
Reported-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20201221162249.3119-1-lecopzer.chen@mediatek.com
Link: https://lore.kernel.org/r/20210112221855.10666-1-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Commit 156708adf2 ("SUNRPC: Move the svc_xdr_recvfrom()
tracepoint") tried to capture the correct XID in the trace record,
but this line in svc_recv:
rqstp->rq_xid = svc_getu32(&rqstp->rq_arg.head[0]);
alters the size of rq_arg.head[0].iov_len. The tracepoint records
the correct XID but an incorrect value for the length of the
xdr_buf's head.
To keep the trace callsites simple, I've created two trace classes.
One assumes the xdr_buf contains a full RPC message, and the XID
can be extracted from it. The other assumes the contents of the
xdr_buf are arbitrary, and the xid will be provided by the caller.
Currently there is only one user of each class, but I expect we will
need a few more tracepoints using each class as time goes on.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
All EL0 returns go via ret_to_user(), which masks IRQs and notifies
lockdep and tracing before calling into do_notify_resume(). Therefore,
there's no need for do_notify_resume() to call trace_hardirqs_off(), and
the comment is stale. The call is simply redundant.
In ret_to_user() we call exit_to_user_mode(), which notifies lockdep and
tracing the IRQs will be enabled in userspace, so there's no need for
el0_svc_common() to call trace_hardirqs_on() before returning. Further,
at the start of ret_to_user() we call trace_hardirqs_off(), so not only
is this redundant, but it is immediately undone.
In addition to being redundant, the trace_hardirqs_on() in
el0_svc_common() leaves lockdep inconsistent with the hardware state,
and is liable to cause issues for any C code or instrumentation
between this and the call to trace_hardirqs_off() which undoes it in
ret_to_user().
This patch removes the redundant tracing calls and associated stale
comments.
Fixes: 23529049c6 ("arm64: entry: fix non-NMI user<->kernel transitions")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210107145310.44616-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Allow issuing an IOCTL_PRIVCMD_MMAP_RESOURCE ioctl with num = 0 and
addr = 0 in order to fetch the size of a specific resource.
Add a shortcut to the default map resource path, since fetching the
size requires no address to be passed in, and thus no VMA to setup.
This is missing from the initial implementation, and causes issues
when mapping resources that don't have fixed or known sizes.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: stable@vger.kernel.org # >= 4.18
Link: https://lore.kernel.org/r/20210112115358.23346-1-roger.pau@citrix.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Commit e7b5d63a82 ("mmc: sdhci-brcmstb: Add shutdown callback")
that added a shutdown callback to the diver, is causing "mmc timeout"
errors on S5 suspend. The problem was that the "remove" was queuing
additional MMC commands after the "shutdown" and these caused
timeouts as the MMC queues were cleaned up for "remove". The
shutdown callback will be changed to calling sdhci-pltfm_suspend
which should get better power savings because the clocks will be
shutdown.
Fixes: e7b5d63a82 ("mmc: sdhci-brcmstb: Add shutdown callback")
Signed-off-by: Al Cooper <alcooperx@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210107221509.6597-1-alcooperx@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Newer ideapads (e.g.: Yoga 14s, 720S 14) come with ELAN0634 touchpad do not
use EC to switch touchpad.
Reading VPCCMD_R_TOUCHPAD will return zero thus touchpad may be blocked
unexpectedly.
Writing VPCCMD_W_TOUCHPAD may cause a spurious key press.
Add has_touchpad_switch to workaround these machines.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: stable@vger.kernel.org # 5.4+
--
v2: Specify touchpad to ELAN0634
v3: Stupid missing ! in v2
v4: Correct acpi_dev_present usage (Hans)
Link: https://lore.kernel.org/r/20210107144438.12605-1-jiaxun.yang@flygoat.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
The type of 'r' in octeon_irq_init_ciu is 'unsigned int', so 'r < 0'
can't be true.
Fix this by change the type of 'r' and 'i' from 'unsigned int'
to 'int'. As 'i' won't be negative, this change works.
Fixes: 99fbc70f85 ("MIPS: Octeon: irq: Alloc desc before configuring IRQ")
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
LLVM-built Linux triggered a boot hangup with KASLR enabled.
arch/mips/kernel/relocate.c:get_random_boot() uses linux_banner,
which is a string constant, as a random seed, but accesses it
as an array of unsigned long (in rotate_xor()).
When the address of linux_banner is not aligned to sizeof(long),
such access emits unaligned access exception and hangs the kernel.
Use PTR_ALIGN() to align input address to sizeof(long) and also
align down the input length to prevent possible access-beyond-end.
Fixes: 405bc8fd12 ("MIPS: Kernel: Implement KASLR using CONFIG_RELOCATABLE")
Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Teraoka AD2000 uses the CP210x driver, but the chip VID/PID is
customized with 0988/0578. We need the driver to support the new
VID/PID.
Signed-off-by: Chenxin Jin <bg4akv@hotmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Oded writes:
This tag contains the following bug fixes:
- Fix the dma address that is passed to dma_mmap_coherent. We passed
an address that includes an offset that is needed by our device and
that caused dma_mmap_coherent to do an errounous mapping.
- Fix the reset process in case failures happen during the reset process.
Without this fix, if the user would have asked to perform reset after
the previous reset failed he would get a kernel panic
- WA to prevent soft lockup BUG during unmap of host memory. In case of
tens of thousands of mappings, the unmapping can take a long time that
exceeds the soft lockup timeout. This WA adds a small sleep every 32K
page unmappings to prevent that.
* tag 'misc-habanalabs-fixes-2021-01-13' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux:
habanalabs: prevent soft lockup during unmap
habanalabs: fix reset process in case of failures
habanalabs: fix dma_addr passed to dma_mmap_coherent
The GCC_LPASS_Q6_AXI_CLK and GCC_LPASS_SWAY_CLK clocks may not be
touched on a typical UEFI based SDM845 device, but when the kernel is
built with CONFIG_SDM_LPASSCC_845 this happens, unless they are marked
as protected-clocks in the DT.
This was done for the MTP and the Pocophone, but not for DB845c and the
Lenovo Yoga C630 - causing these to fail to boot if the LPASS clock
controller is enabled (which it typically isn't).
Tested-by: Vinod Koul <vkoul@kernel.org> #on db845c
Reviewed-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20201222001103.3112306-1-bjorn.andersson@linaro.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
The patch fix commit: ad5d112 ("riscv: use vDSO common flow to
reduce the latency of the time-related functions").
The GENERIC_TIME_VSYSCALL should be CONFIG_GENERIC_TIME_VSYSCALL
or vgettimeofday won't work.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Fixes: ad5d1122b8 ("riscv: use vDSO common flow to reduce the latency of the time-related functions")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Pass conntrack -f to specify family in netfilter conntrack helper
selftests, from Chen Yi.
2) Honor hashsize modparam from nf_conntrack_buckets sysctl,
from Jesper D. Brouer.
3) Fix memleak in nf_nat_init() error path, from Dinghao Liu.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
netfilter: nf_nat: Fix memleak in nf_nat_init
netfilter: conntrack: fix reading nf_conntrack_buckets
selftests: netfilter: Pass family parameter "-f" to conntrack tool
====================
Link: https://lore.kernel.org/r/20210112222033.9732-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Karsten Graul says:
====================
net/smc: fix out of bound access in netlink interface
Both patches fix possible out-of-bounds reads. The original code expected
that snprintf() reads len-1 bytes from source and appends the terminating
null, but actually snprintf() first copies len bytes and finally overwrites
the last byte with a null.
Fix this by using memcpy() and terminating the string afterwards.
====================
Link: https://lore.kernel.org/r/20210112162122.26832-1-kgraul@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Using snprintf() to convert not null-terminated strings to null
terminated strings may cause out of bounds read in the source string.
Therefore use memcpy() and terminate the target string with a null
afterwards.
Fixes: a3db10efcc ("net/smc: Add support for obtaining SMCR device list")
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of re-implementing most of inet_shutdown, re-use
such helper, and implement the MPTCP-specific bits at the
'proto' level.
The msk-level disconnect() can now be invoked, lets provide a
suitable implementation.
As a side effect, this fixes bad state management for listener
sockets. The latter could lead to division by 0 oops since
commit ea4ca586b1 ("mptcp: refine MPTCP-level ack scheduling").
Fixes: 43b54c6ee3 ("mptcp: Use full MPTCP-level disconnect state machine")
Fixes: ea4ca586b1 ("mptcp: refine MPTCP-level ack scheduling")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Syzkaller found a way to trigger division by zero
in mptcp_subflow_cleanup_rbuf().
The current checks implemented into tcp_can_send_ack()
are too week, let's be more accurate.
Reported-by: Christoph Paasch <cpaasch@apple.com>
Fixes: ea4ca586b1 ("mptcp: refine MPTCP-level ack scheduling")
Fixes: fd8976790a ("mptcp: be careful on MPTCP-level ack.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A race condition exists between the response handler getting called because
of exchange_mgr_reset() (which clears out all the active XIDs) and the
response we get via an interrupt.
Sequence of events:
rport ba0200: Port timeout, state PLOGI
rport ba0200: Port entered PLOGI state from PLOGI state
xid 1052: Exchange timer armed : 20000 msecs xid timer armed here
rport ba0200: Received LOGO request while in state PLOGI
rport ba0200: Delete port
rport ba0200: work event 3
rport ba0200: lld callback ev 3
bnx2fc: rport_event_hdlr: event = 3, port_id = 0xba0200
bnx2fc: ba0200 - rport not created Yet!!
/* Here we reset any outstanding exchanges before
freeing rport using the exch_mgr_reset() */
xid 1052: Exchange timer canceled
/* Here we got two responses for one xid */
xid 1052: invoking resp(), esb 20000000 state 3
xid 1052: invoking resp(), esb 20000000 state 3
xid 1052: fc_rport_plogi_resp() : ep->resp_active 2
xid 1052: fc_rport_plogi_resp() : ep->resp_active 2
Skip the response if the exchange is already completed.
Link: https://lore.kernel.org/r/20201215194731.2326-1-jhasan@marvell.com
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Michael Chan says:
====================
bnxt_en: Bug fixes.
This series has 2 fixes. The first one fixes a resource accounting error
with the RDMA driver loaded and the second one fixes the firmware
flashing sequence after defragmentation.
====================
Link: https://lore.kernel.org/r/1610357200-30755-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the FW tells the driver to retry the INSTALL_UPDATE command after
it has cleared the NVM area, the driver is not clearing the previously
used ALLOWED_TO_DEFRAG flag. As a result the FW tries to defrag the NVM
area a second time in a loop and can fail the request.
Fixes: 1432c3f6a6 ("bnxt_en: Retry installing FW package under NO_SPACE error condition.")
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The function bnxt_get_ulp_stat_ctxs() does not count the stats contexts
used by the RDMA driver correctly when the RDMA driver is freeing the
MSIX vectors. It assumes that if the RDMA driver is registered, the
additional stats contexts will be needed. This is not true when the
RDMA driver is about to unregister and frees the MSIX vectors.
This slight error leads to over accouting of the stats contexts needed
after the RDMA driver has unloaded. This will cause some firmware
warning and error messages in dmesg during subsequent config. changes
or ifdown/ifup.
Fix it by properly accouting for extra stats contexts only if the
RDMA driver is registered and MSIX vectors have been successfully
requested.
Fixes: c027c6b4e9 ("bnxt_en: get rid of num_stat_ctxs variable")
Reviewed-by: Yongping Zhang <yongping.zhang@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This commit enables the use of the r8153_ecm driver, introduced with
commit c1aedf015e ("net/usb/r8153_ecm: support ECM mode for
RTL8153") for the Lenovo Powered USB-C Hub (17ef:721e) based on the
Realtek RTL8153B chip.
This results in the following driver preference:
- if r8152 is available, use the r8152 driver
- if r8152 is not available, use the r8153_ecm driver
This is done to prevent the NIC from constantly sending pause frames
when the host system enters standby (fixed by using the r8152 driver
in "r8152: Add Lenovo Powered USB-C Travel Hub"), while still allowing
the device to work with the r8153_ecm driver as a fallback.
Signed-off-by: Leon Schuermann <leon@is.currently.online>
Tested-by: Leon Schuermann <leon@is.currently.online>
Link: https://lore.kernel.org/r/20210111190312.12589-3-leon@is.currently.online
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This USB-C Hub (17ef:721e) based on the Realtek RTL8153B chip used to
use the cdc_ether driver. However, using this driver, with the system
suspended the device constantly sends pause-frames as soon as the
receive buffer fills up. This causes issues with other devices, where
some Ethernet switches stop forwarding packets altogether.
Using the Realtek driver (r8152) fixes this issue. Pause frames are no
longer sent while the host system is suspended.
Signed-off-by: Leon Schuermann <leon@is.currently.online>
Tested-by: Leon Schuermann <leon@is.currently.online>
Link: https://lore.kernel.org/r/20210111190312.12589-2-leon@is.currently.online
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 0b2894cd0f ("scsi: docs: ABI: sysfs-driver-ufs: Add DeepSleep
power mode") adds new entries in tables of sysfs-driver-ufs ABI
documentation, but formatted the table incorrectly.
Hence, make htmldocs warns:
./Documentation/ABI/testing/sysfs-driver-ufs:{915,956}:
WARNING: Malformed table. Text in column margin in table line 15.
Rectify table formatting for DeepSleep power mode.
Link: https://lore.kernel.org/r/20210111102212.19377-1-lukas.bulwahn@gmail.com
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Florian reported a use-after-free bug in devlink_nl_port_fill found with
KASAN:
(devlink_nl_port_fill)
(devlink_port_notify)
(devlink_port_unregister)
(dsa_switch_teardown.part.3)
(dsa_tree_teardown_switches)
(dsa_unregister_switch)
(bcm_sf2_sw_remove)
(platform_remove)
(device_release_driver_internal)
(device_links_unbind_consumers)
(device_release_driver_internal)
(device_driver_detach)
(unbind_store)
Allocated by task 31:
alloc_netdev_mqs+0x5c/0x50c
dsa_slave_create+0x110/0x9c8
dsa_register_switch+0xdb0/0x13a4
b53_switch_register+0x47c/0x6dc
bcm_sf2_sw_probe+0xaa4/0xc98
platform_probe+0x90/0xf4
really_probe+0x184/0x728
driver_probe_device+0xa4/0x278
__device_attach_driver+0xe8/0x148
bus_for_each_drv+0x108/0x158
Freed by task 249:
free_netdev+0x170/0x194
dsa_slave_destroy+0xac/0xb0
dsa_port_teardown.part.2+0xa0/0xb4
dsa_tree_teardown_switches+0x50/0xc4
dsa_unregister_switch+0x124/0x250
bcm_sf2_sw_remove+0x98/0x13c
platform_remove+0x44/0x5c
device_release_driver_internal+0x150/0x254
device_links_unbind_consumers+0xf8/0x12c
device_release_driver_internal+0x84/0x254
device_driver_detach+0x30/0x34
unbind_store+0x90/0x134
What happens is that devlink_port_unregister emits a netlink
DEVLINK_CMD_PORT_DEL message which associates the devlink port that is
getting unregistered with the ifindex of its corresponding net_device.
Only trouble is, the net_device has already been unregistered.
It looks like we can stub out the search for a corresponding net_device
if we clear the devlink_port's type. This looks like a bit of a hack,
but also seems to be the reason why the devlink_port_type_clear function
exists in the first place.
Fixes: 3122433eb5 ("net: dsa: Register devlink ports before calling DSA driver setup()")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian fainelli <f.fainelli@gmail.com>
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210112004831.3778323-1-olteanv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently the following happens when a DSA master driver unbinds while
there are DSA switches attached to it:
$ echo 0000:00:00.5 > /sys/bus/pci/drivers/mscc_felix/unbind
------------[ cut here ]------------
WARNING: CPU: 0 PID: 392 at net/core/dev.c:9507
Call trace:
rollback_registered_many+0x5fc/0x688
unregister_netdevice_queue+0x98/0x120
dsa_slave_destroy+0x4c/0x88
dsa_port_teardown.part.16+0x78/0xb0
dsa_tree_teardown_switches+0x58/0xc0
dsa_unregister_switch+0x104/0x1b8
felix_pci_remove+0x24/0x48
pci_device_remove+0x48/0xf0
device_release_driver_internal+0x118/0x1e8
device_driver_detach+0x28/0x38
unbind_store+0xd0/0x100
Located at the above location is this WARN_ON:
/* Notifier chain MUST detach us all upper devices. */
WARN_ON(netdev_has_any_upper_dev(dev));
Other stacked interfaces, like VLAN, do indeed listen for
NETDEV_UNREGISTER on the real_dev and also unregister themselves at that
time, which is clearly the behavior that rollback_registered_many
expects. But DSA interfaces are not VLAN. They have backing hardware
(platform devices, PCI devices, MDIO, SPI etc) which have a life cycle
of their own and we can't just trigger an unregister from the DSA
framework when we receive a netdev notifier that the master unregisters.
Luckily, there is something we can do, and that is to inform the driver
core that we have a runtime dependency to the DSA master interface's
device, and create a device link where that is the supplier and we are
the consumer. Having this device link will make the DSA switch unbind
before the DSA master unbinds, which is enough to avoid the WARN_ON from
rollback_registered_many.
Note that even before the blamed commit, DSA did nothing intelligent
when the master interface got unregistered either. See the discussion
here:
https://lore.kernel.org/netdev/20200505210253.20311-1-f.fainelli@gmail.com/
But this time, at least the WARN_ON is loud enough that the
upper_dev_link commit can be blamed.
The advantage with this approach vs dev_hold(master) in the attached
link is that the latter is not meant for long term reference counting.
With dev_hold, the only thing that will happen is that when the user
attempts an unbind of the DSA master, netdev_wait_allrefs will keep
waiting and waiting, due to DSA keeping the refcount forever. DSA would
not access freed memory corresponding to the master interface, but the
unbind would still result in a freeze. Whereas with device links,
graceful teardown is ensured. It even works with cascaded DSA trees.
$ echo 0000:00:00.2 > /sys/bus/pci/drivers/fsl_enetc/unbind
[ 1818.797546] device swp0 left promiscuous mode
[ 1819.301112] sja1105 spi2.0: Link is Down
[ 1819.307981] DSA: tree 1 torn down
[ 1819.312408] device eno2 left promiscuous mode
[ 1819.656803] mscc_felix 0000:00:00.5: Link is Down
[ 1819.667194] DSA: tree 0 torn down
[ 1819.711557] fsl_enetc 0000:00:00.2 eno2: Link is Down
This approach allows us to keep the DSA framework absolutely unchanged,
and the driver core will just know to unbind us first when the master
goes away - as opposed to the large (and probably impossible) rework
required if attempting to listen for NETDEV_UNREGISTER.
As per the documentation at Documentation/driver-api/device_link.rst,
specifying the DL_FLAG_AUTOREMOVE_CONSUMER flag causes the device link
to be automatically purged when the consumer fails to probe or later
unbinds. So we don't need to keep the consumer_link variable in struct
dsa_switch.
Fixes: 2f1e8ea726 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210111230943.3701806-1-olteanv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The size of kasan_early_shadow_pte[] now is PTRS_PER_PTE which defined
to 512 for arm. This means that it only covers the prev Linux pte
entries, but not the HWTABLE pte entries for arm.
The reason it currently works is that the symbol kasan_early_shadow_page
immediately following kasan_early_shadow_pte in memory is page aligned,
which makes kasan_early_shadow_pte look like a 4KB size array. But we
can't ensure the order is always right with different compiler/linker,
or if more bss symbols are introduced.
We had a test with QEMU + vexpress:put a 512KB-size symbol with
attribute __section(".bss..page_aligned") after kasan_early_shadow_pte,
and poisoned it after kasan_early_init(). Then enabled CONFIG_KASAN, it
failed to boot up.
Link: https://lkml.kernel.org/r/20210109044622.8312-1-hailongliiu@yeah.net
Signed-off-by: Hailong Liu <liu.hailong6@zte.com.cn>
Signed-off-by: Ziliang Guo <guo.ziliang@zte.com.cn>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Boot a CONFIG_MEMCG=y kernel with "cgroup_disabled=memory" and you are
met by a series of warnings from the VM_WARN_ON_ONCE_PAGE(!memcg, page)
recently added to the inline mem_cgroup_page_lruvec().
An earlier attempt to place that warning, in mem_cgroup_lruvec(), had
been careful to do so after weeding out the mem_cgroup_disabled() case;
but was itself invalid because of the mem_cgroup_lruvec(NULL, pgdat) in
clear_pgdat_congested() and age_active_anon().
Warning in mem_cgroup_page_lruvec() was once useful in detecting a KSM
charge bug, so may be worth keeping: but skip if mem_cgroup_disabled().
Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2101032056260.1093@eggly.anvils
Fixes: 9a1ac2288c ("mm/memcontrol:rewrite mem_cgroup_page_lruvec()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Chris Down <chris@chrisdown.name>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hui Su <sh_def@163.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
acquire_slab() fails if there is contention on the freelist of the page
(probably because some other CPU is concurrently freeing an object from
the page). In that case, it might make sense to look for a different page
(since there might be more remote frees to the page from other CPUs, and
we don't want contention on struct page).
However, the current code accidentally stops looking at the partial list
completely in that case. Especially on kernels without CONFIG_NUMA set,
this means that get_partial() fails and new_slab_objects() falls back to
new_slab(), allocating new pages. This could lead to an unnecessary
increase in memory fragmentation.
Link: https://lkml.kernel.org/r/20201228130853.1871516-1-jannh@google.com
Fixes: 7ced371971 ("slub: Acquire_slab() avoid loop")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit 826f328e2b ("net: dcb: Validate netlink message in DCB
handler"), Linux started rejecting RTM_GETDCB netlink messages if they
contained a set-like DCB_CMD_ command.
The reason was that privileges were only verified for RTM_SETDCB messages,
but the value that determined the action to be taken is the command, not
the message type. And validation of message type against the DCB command
was the obvious missing piece.
Unfortunately it turns out that mlnx_qos, a somewhat widely deployed tool
for configuration of DCB, accesses the DCB set-like APIs through
RTM_GETDCB.
Therefore do not bounce the discrepancy between message type and command.
Instead, in addition to validating privileges based on the actual message
type, validate them also based on the expected message type. This closes
the loophole of allowing DCB configuration on non-admin accounts, while
maintaining backward compatibility.
Fixes: 2f90b8657e ("ixgbe: this patch adds support for DCB to the kernel and ixgbe driver")
Fixes: 826f328e2b ("net: dcb: Validate netlink message in DCB handler")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/a3edcfda0825f2aa2591801c5232f2bbf2d8a554.1610384801.git.me@pmachata.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Advance the maximum number of arguments from 9 to 15 to account for
all potential feature flags that may be supplied.
Linux 4.19 added "meta_device"
(356d9d52e1) and "recalculate"
(a3fcf72531) flags.
Commit 468dfca38b added
"sectors_per_bit" and "bitmap_flush_interval".
Commit 84597a44a9 added
"allow_discards".
And the commit d537858ac8 added
"fix_padding".
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pull irqchip fixes from Marc Zyngier:
- Fix the MIPS CPU interrupt controller hierarchy
- Simplify the PRUSS Kconfig entry
- Eliminate trivial build warnings on the MIPS Loongson liointc
- Fix error path in devm_platform_get_irqs_affinity()
- Turn the BCM2836 IPI irq_eoi callback into irq_ack
- Fix initialisation of on-stack msi_alloc_info
- Cleanup spurious comma in irq-sl28cpld
Link: https://lore.kernel.org/r/20210110110001.2328708-1-maz@kernel.org
Due to an integer overflow, RTC synchronization now happens every 2s
instead of the intended 11 minutes. Fix this by forcing 64-bit
arithmetic for the sync period calculation.
Annotate the other place which multiplies seconds for consistency as well.
Fixes: c9e6189fb0 ("ntp: Make the RTC synchronization more reliable")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210111103956.290378-1-geert+renesas@glider.be
Empty BTFs do come up (e.g., simple kernel modules with no new types and
strings, compared to the vmlinux BTF) and there is nothing technically wrong
with them. So remove unnecessary check preventing loading empty BTFs.
Fixes: d812362450 ("libbpf: Fix BTF data layout checks and allow empty BTF")
Reported-by: Christopher William Snowhill <chris@kode54.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210110070341.1380086-2-andrii@kernel.org
Some modules don't declare any new types and end up with an empty BTF,
containing only valid BTF header and no types or strings sections. This
currently causes BTF validation error. There is nothing wrong with such BTF,
so fix the issue by allowing module BTFs with no types or strings.
Fixes: 36e68442d1 ("bpf: Load and verify kernel module BTFs")
Reported-by: Christopher William Snowhill <chris@kode54.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210110070341.1380086-1-andrii@kernel.org
pm_clk_suspend()/pm_clk_resume() are defined as NULL pointers rather than
empty inline stubs without CONFIG_PM:
drivers/clk/mmp/clk-audio.c:402:16: error: called object type 'void *' is not a function or function pointer
pm_clk_suspend(dev);
drivers/clk/mmp/clk-audio.c:411:15: error: called object type 'void *' is not a function or function pointer
pm_clk_resume(dev);
I tried redefining the helper functions, but that caused additional
problems. This is the simple solution of replacing the __maybe_unused
trick with an #ifdef.
Fixes: 725262d291 ("clk: mmp2: Add audio clock controller driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210103135503.3668784-1-arnd@kernel.org
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
optlen == 0 indicates that the kernel should ignore BPF buffer
and use the original one from the user. We, however, forget
to free the temporary buffer that we've allocated for BPF.
Fixes: d8fe449a9c ("bpf: Don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE")
Reported-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210112162829.775079-1-sdf@google.com
A previous patch introduced a harmless randconfig warning:
WARNING: unmet direct dependencies detected for MXC_CLK_SCU
Depends on [n]: COMMON_CLK [=y] && ARCH_MXC [=n] && IMX_SCU [=y] && HAVE_ARM_SMCCC [=y]
Selected by [m]:
- CLK_IMX8QXP [=m] && COMMON_CLK [=y] && (ARCH_MXC [=n] && ARM64 [=y] || COMPILE_TEST [=y]) && IMX_SCU [=y] && HAVE_ARM_SMCCC [=y]
Since the symbol is now hidden and only selected by other symbols,
just remove the dependencies and require the other drivers to
get it right.
Fixes: 6247e31b75 ("clk: imx: scu: fix MXC_CLK_SCU module build break")
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20201230155244.981757-1-arnd@kernel.org
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Sometimes, when dm-crypt executes decryption in a tasklet, we may get
"BUG: KASAN: use-after-free in tasklet_action_common.constprop..."
with a kasan-enabled kernel.
When the decryption fully completes in the tasklet, dm-crypt will call
bio_endio(), which in turn will call clone_endio() from dm.c core code. That
function frees the resources associated with the bio, including per bio private
structures. For dm-crypt it will free the current struct dm_crypt_io, which
contains our tasklet object, causing use-after-free, when the tasklet is being
dequeued by the kernel.
To avoid this, do not call bio_endio() from the current tasklet context, but
delay its execution to the dm-crypt IO workqueue.
Fixes: 39d42fa96b ("dm crypt: add flags to optionally bypass kcryptd workqueues")
Cc: <stable@vger.kernel.org> # v5.9+
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
With the introduction of a dynamic ZONE_DMA range based on DT or IORT
information, there's no need for CMA allocations from the wider
ZONE_DMA32 since on most platforms ZONE_DMA will cover the 32-bit
addressable range. Remove the arm64_dma32_phys_limit and set
arm64_dma_phys_limit to cover the smallest DMA range required on the
platform. CMA allocation and crashkernel reservation now go in the
dynamically sized ZONE_DMA, allowing correct functionality on RPi4.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Zhou <chenzhou10@huawei.com>
Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Tested-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> # On RPi4B
Pull NFS client fixes from Trond Myklebust:
"Highlights include:
- Fix parsing of link-local IPv6 addresses
- Fix confusing logging of mount errors that was introduced by the
fsopen() patchset.
- Fix a tracing use after free in _nfs4_do_setlk()
- Layout return-on-close fixes when called from nfs4_evict_inode()
- Layout segments were being leaked in
pnfs_generic_clear_request_commit()
- Don't leak DS commits in pnfs_generic_retry_commit()
- Fix an Oopsable use-after-free when nfs_delegation_find_inode_server()
calls iput() on an inode after the super block has gone away"
* tag 'nfs-for-5.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: nfs_igrab_and_active must first reference the superblock
NFS: nfs_delegation_find_inode_server must first reference the superblock
NFS/pNFS: Fix a leak of the layout 'plh_outstanding' counter
NFS/pNFS: Don't leak DS commits in pnfs_generic_retry_commit()
NFS/pNFS: Don't call pnfs_free_bucket_lseg() before removing the request
pNFS: Stricter ordering of layoutget and layoutreturn
pNFS: Clean up pnfs_layoutreturn_free_lsegs()
pNFS: We want return-on-close to complete when evicting the inode
pNFS: Mark layout for return if return-on-close was not sent
net: sunrpc: interpret the return value of kstrtou32 correctly
NFS: Adjust fs_context error logging
NFS4: Fix use-after-free in trace_event_raw_event_nfs4_set_lock
Pull SCSI target fix from Martin Petersen:
"This addresses an issue in the SCSI target subsystem. A connected
initiator could specify IDs for any configured backing store device,
not just the ones explicitly made visible to the host.
The remedy is to honor the access control list when doing ID
descriptor lookups"
* tag 'mkp-scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi:
scsi: target: Fix XCOPY NAA identifier lookup
The system that use Synopsys USB host controllers goes to suspend
when using USB audio player. This causes the USB host controller
continuous send interrupt signal to system, When the number of
interrupts exceeds 100000, the system will forcibly close the
interrupts and output a calltrace error.
When the system goes to suspend, the last interrupt is reported to
the driver. At this time, the system has set the state to suspend.
This causes the last interrupt to not be processed by the system and
not clear the interrupt flag. This uncleared interrupt flag constantly
triggers new interrupt event. This causing the driver to receive more
than 100,000 interrupts, which causes the system to forcibly close the
interrupt report and report the calltrace error.
so, when the driver goes to sleep and changes the system state to
suspend, the interrupt flag needs to be cleared.
Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/1610416647-45774-1-git-send-email-liulongfang@huawei.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mika writes:
thunderbolt: Fix for v5.11-rc4
This includes a single format string fix for the firmware connection
manager USB4 NVM authentication proxy implementation introduced in this
merge window.
* tag 'thunderbolt-for-v5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
thunderbolt: Drop duplicated 0x prefix from format string
Tony Luck has maintained arch/ia64 for the past 16 years, but mentioned
that he no longer has working ia64 machines, nor time to look at patches,
so he is stepping down as the maintainer.
Fenghua Yu came in as a temporary co-maintainer when Tony was on
sabbatical in 2009, but has not worked on it after that either.
This leaves the architecture as Orphaned, meaning that patches
will have to get routed through other trees from now on.
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: linux-ia64@vger.kernel.org
Link: https://lore.kernel.org/lkml/20210105153603.GA17644@agluck-desk2.amr.corp.intel.com/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The definition if xchg() causes a harmless warning in some files, like:
In file included from ../arch/ia64/include/uapi/asm/intrinsics.h:22,
from ../arch/ia64/include/asm/intrinsics.h:11,
from ../arch/ia64/include/asm/bitops.h:19,
from ../include/linux/bitops.h:32,
from ../include/linux/kernel.h:11,
from ../fs/nfs/read.c:12:
../fs/nfs/read.c: In function 'nfs_read_completion':
../arch/ia64/include/uapi/asm/cmpxchg.h:57:2: warning: value computed is not used [-Wunused-value]
57 | ((__typeof__(*(ptr))) __xchg((unsigned long) (x), (ptr), sizeof(*(ptr))))
| ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../fs/nfs/read.c:196:5: note: in expansion of macro 'xchg'
196 | xchg(&nfs_req_openctx(req)->error, error);
| ^~~~
Change it to a compound expression like the other architectures have
to get a clean defconfig build.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
A cleanup patch from my legacy timer series broke ia64 and led
to RCU stall errors and a fast system clock:
[ 909.360108] INFO: task systemd-sysv-ge:200 blocked for more than 127 seconds.
[ 909.360108] Not tainted 5.10.0+ #130
[ 909.360108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 909.360108] task:systemd-sysv-ge state:D stack: 0 pid: 200 ppid: 189 flags:0x00000000
[ 909.364108]
[ 909.364108] Call Trace:
[ 909.364423] [<a00000010109b210>] __schedule+0x890/0x21e0
[ 909.364423] sp=e0000100487d7b70 bsp=e0000100487d1748
[ 909.368423] [<a00000010109cc00>] schedule+0xa0/0x240
[ 909.368423] sp=e0000100487d7b90 bsp=e0000100487d16e0
[ 909.368558] [<a00000010109ce70>] io_schedule+0x70/0xa0
[ 909.368558] sp=e0000100487d7b90 bsp=e0000100487d16c0
[ 909.372290] [<a00000010109e1c0>] bit_wait_io+0x20/0xe0
[ 909.372290] sp=e0000100487d7b90 bsp=e0000100487d1698
[ 909.374168] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 909.376290] [<a00000010109d860>] __wait_on_bit+0xc0/0x1c0
[ 909.376290] sp=e0000100487d7b90 bsp=e0000100487d1648
[ 909.374168] rcu: 3-....: (2 ticks this GP) idle=19e/1/0x4000000000000002 softirq=1581/1581 fqs=2
[ 909.374168] (detected by 0, t=5661 jiffies, g=1089, q=3)
[ 909.376290] [<a00000010109da80>] out_of_line_wait_on_bit+0x120/0x140
[ 909.376290] sp=e0000100487d7b90 bsp=e0000100487d1610
[ 909.374168] Task dump for CPU 3:
[ 909.374168] task:khungtaskd state:R running task
Revert most of my patch to make this work again, including the extra
update_process_times()/profile_tick() and the local_irq_enable() in the
loop that I expected not to be needed here.
I have not found out exactly what goes wrong, and would suggest that
someone with hardware access tries to convert this code into a singleshot
clockevent driver, which should give better behavior in all cases.
Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Fixes: 2b49ddcef2 ("ia64: convert to legacy_timer_tick")
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The VT-d hardware will ignore those Addr bits which have been masked by
the AM field in the PASID-based-IOTLB invalidation descriptor. As the
result, if the starting address in the descriptor is not aligned with
the address mask, some IOTLB caches might not invalidate. Hence people
will see below errors.
[ 1093.704661] dmar_fault: 29 callbacks suppressed
[ 1093.704664] DMAR: DRHD: handling fault status reg 3
[ 1093.712738] DMAR: [DMA Read] Request device [7a:02.0] PASID 2
fault addr 7f81c968d000 [fault reason 113]
SM: Present bit in first-level paging entry is clear
Fix this by using aligned address for PASID-based-IOTLB invalidation.
Fixes: 1c4f88b7f1 ("iommu/vt-d: Shared virtual address in scalable mode")
Reported-and-tested-by: Guo Kaijie <Kaijie.Guo@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20201231005323.2178523-2-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
It was found in [1] that bpf_inode_storage_get helper did not check
the nullness of the passed owner ptr which caused an oops when
dereferenced. This change incorporates the example suggested in [1] into
the local storage selftest.
The test is updated to create a temporary directory instead of just
using a tempfile. In order to replicate the issue this copied rm binary
is renamed tiggering the inode_rename with a null pointer for the
new_inode. The logic to verify the setting and deletion of the inode
local storage of the old inode is also moved to this LSM hook.
The change also removes the copy_rm function and simply shells out
to copy files and recursively delete directories and consolidates the
logic of setting the initial inode storage to the bprm_committed_creds
hook and removes the file_open hook.
[1]: https://lore.kernel.org/bpf/CANaYP3HWkH91SN=wTNO9FL_2ztHfqcXKX38SSE-JJ2voh+vssw@mail.gmail.com
Suggested-by: Gilad Reti <gilad.reti@gmail.com>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210112075525.256820-2-kpsingh@kernel.org
The verifier allows ARG_PTR_TO_BTF_ID helper arguments to be NULL, so
helper implementations need to check this before dereferencing them.
This was already fixed for the socket storage helpers but not for task
and inode.
The issue can be reproduced by attaching an LSM program to
inode_rename hook (called when moving files) which tries to get the
inode of the new file without checking for its nullness and then trying
to move an existing file to a new path:
mv existing_file new_file_does_not_exist
The report including the sample program and the steps for reproducing
the bug:
https://lore.kernel.org/bpf/CANaYP3HWkH91SN=wTNO9FL_2ztHfqcXKX38SSE-JJ2voh+vssw@mail.gmail.com
Fixes: 4cf1bc1f10 ("bpf: Implement task local storage")
Fixes: 8ea636848a ("bpf: Implement bpf_local_storage for inodes")
Reported-by: Gilad Reti <gilad.reti@gmail.com>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210112075525.256820-3-kpsingh@kernel.org
Peter writes:
- Several bug-fixes for cdns3 imx driver
- Update Peter Chen and Roger Quadros email address
* tag 'usb-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/peter.chen/usb:
MAINTAINERS: update Peter Chen's email address
MAINTAINERS: Update address for Cadence USB3 driver
usb: cdns3: imx: improve driver .remove API
usb: cdns3: imx: fix can't create core device the second time issue
usb: cdns3: imx: fix writing read-only memory issue
When an incremental send finds an extent that is shared, it checks which
file extent items in the range refer to that extent, and for those it
emits clone operations, while for others it emits regular write operations
to avoid corruption at the destination (as described and fixed by commit
d906d49fc5 ("Btrfs: send, fix file corruption due to incorrect cloning
operations")).
However when the root we are cloning from is the send root, we are cloning
from the inode currently being processed and the source file range has
several extent items that partially point to the desired extent, with an
offset smaller than the offset in the file extent item for the range we
want to clone into, it can cause the algorithm to issue a clone operation
that starts at the current eof of the file being processed in the receiver
side, in which case the receiver will fail, with EINVAL, when attempting
to execute the clone operation.
Example reproducer:
$ cat test-send-clone.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV >/dev/null
mount $DEV $MNT
# Create our test file with a single and large extent (1M) and with
# different content for different file ranges that will be reflinked
# later.
xfs_io -f \
-c "pwrite -S 0xab 0 128K" \
-c "pwrite -S 0xcd 128K 128K" \
-c "pwrite -S 0xef 256K 256K" \
-c "pwrite -S 0x1a 512K 512K" \
$MNT/foobar
btrfs subvolume snapshot -r $MNT $MNT/snap1
btrfs send -f /tmp/snap1.send $MNT/snap1
# Now do a series of changes to our file such that we end up with
# different parts of the extent reflinked into different file offsets
# and we overwrite a large part of the extent too, so no file extent
# items refer to that part that was overwritten. This used to confuse
# the algorithm used by the kernel to figure out which file ranges to
# clone, making it attempt to clone from a source range starting at
# the current eof of the file, resulting in the receiver to fail since
# it is an invalid clone operation.
#
xfs_io -c "reflink $MNT/foobar 64K 1M 960K" \
-c "reflink $MNT/foobar 0K 512K 256K" \
-c "reflink $MNT/foobar 512K 128K 256K" \
-c "pwrite -S 0x73 384K 640K" \
$MNT/foobar
btrfs subvolume snapshot -r $MNT $MNT/snap2
btrfs send -f /tmp/snap2.send -p $MNT/snap1 $MNT/snap2
echo -e "\nFile digest in the original filesystem:"
md5sum $MNT/snap2/foobar
# Now unmount the filesystem, create a new one, mount it and try to
# apply both send streams to recreate both snapshots.
umount $DEV
mkfs.btrfs -f $DEV >/dev/null
mount $DEV $MNT
btrfs receive -f /tmp/snap1.send $MNT
btrfs receive -f /tmp/snap2.send $MNT
# Must match what we got in the original filesystem of course.
echo -e "\nFile digest in the new filesystem:"
md5sum $MNT/snap2/foobar
umount $MNT
When running the reproducer, the incremental send operation fails due to
an invalid clone operation:
$ ./test-send-clone.sh
wrote 131072/131072 bytes at offset 0
128 KiB, 32 ops; 0.0015 sec (80.906 MiB/sec and 20711.9741 ops/sec)
wrote 131072/131072 bytes at offset 131072
128 KiB, 32 ops; 0.0013 sec (90.514 MiB/sec and 23171.6148 ops/sec)
wrote 262144/262144 bytes at offset 262144
256 KiB, 64 ops; 0.0025 sec (98.270 MiB/sec and 25157.2327 ops/sec)
wrote 524288/524288 bytes at offset 524288
512 KiB, 128 ops; 0.0052 sec (95.730 MiB/sec and 24506.9883 ops/sec)
Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1'
At subvol /mnt/sdi/snap1
linked 983040/983040 bytes at offset 1048576
960 KiB, 1 ops; 0.0006 sec (1.419 GiB/sec and 1550.3876 ops/sec)
linked 262144/262144 bytes at offset 524288
256 KiB, 1 ops; 0.0020 sec (120.192 MiB/sec and 480.7692 ops/sec)
linked 262144/262144 bytes at offset 131072
256 KiB, 1 ops; 0.0018 sec (133.833 MiB/sec and 535.3319 ops/sec)
wrote 655360/655360 bytes at offset 393216
640 KiB, 160 ops; 0.0093 sec (66.781 MiB/sec and 17095.8436 ops/sec)
Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2'
At subvol /mnt/sdi/snap2
File digest in the original filesystem:
9c13c61cb0b9f5abf45344375cb04dfa /mnt/sdi/snap2/foobar
At subvol snap1
At snapshot snap2
ERROR: failed to clone extents to foobar: Invalid argument
File digest in the new filesystem:
132f0396da8f48d2e667196bff882cfc /mnt/sdi/snap2/foobar
The clone operation is invalid because its source range starts at the
current eof of the file in the receiver, causing the receiver to get
an EINVAL error from the clone operation when attempting it.
For the example above, what happens is the following:
1) When processing the extent at file offset 1M, the algorithm checks that
the extent is shared and can be (fully or partially) found at file
offset 0.
At this point the file has a size (and eof) of 1M at the receiver;
2) It finds that our extent item at file offset 1M has a data offset of
64K and, since the file extent item at file offset 0 has a data offset
of 0, it issues a clone operation, from the same file and root, that
has a source range offset of 64K, destination offset of 1M and a length
of 64K, since the extent item at file offset 0 refers only to the first
128K of the shared extent.
After this clone operation, the file size (and eof) at the receiver is
increased from 1M to 1088K (1M + 64K);
3) Now there's still 896K (960K - 64K) of data left to clone or write, so
it checks for the next file extent item, which starts at file offset
128K. This file extent item has a data offset of 0 and a length of
256K, so a clone operation with a source range offset of 256K, a
destination offset of 1088K (1M + 64K) and length of 128K is issued.
After this operation the file size (and eof) at the receiver increases
from 1088K to 1216K (1088K + 128K);
4) Now there's still 768K (896K - 128K) of data left to clone or write, so
it checks for the next file extent item, located at file offset 384K.
This file extent item points to a different extent, not the one we want
to clone, with a length of 640K. So we issue a write operation into the
file range 1216K (1088K + 128K, end of the last clone operation), with
a length of 640K and with a data matching the one we can find for that
range in send root.
After this operation, the file size (and eof) at the receiver increases
from 1216K to 1856K (1216K + 640K);
5) Now there's still 128K (768K - 640K) of data left to clone or write, so
we look into the file extent item, which is for file offset 1M and it
points to the extent we want to clone, with a data offset of 64K and a
length of 960K.
However this matches the file offset we started with, the start of the
range to clone into. So we can't for sure find any file extent item
from here onwards with the rest of the data we want to clone, yet we
proceed and since the file extent item points to the shared extent,
with a data offset of 64K, we issue a clone operation with a source
range starting at file offset 1856K, which matches the file extent
item's offset, 1M, plus the amount of data cloned and written so far,
which is 64K (step 2) + 128K (step 3) + 640K (step 4). This clone
operation is invalid since the source range offset matches the current
eof of the file in the receiver. We should have stopped looking for
extents to clone at this point and instead fallback to write, which
would simply the contain the data in the file range from 1856K to
1856K + 128K.
So fix this by stopping the loop that looks for file ranges to clone at
clone_range() when we reach the current eof of the file being processed,
if we are cloning from the same file and using the send root as the clone
root. This ensures any data not yet cloned will be sent to the receiver
through a write operation.
A test case for fstests will follow soon.
Reported-by: Massimo B. <massimo.b@gmx.net>
Link: https://lore.kernel.org/linux-btrfs/6ae34776e85912960a253a8327068a892998e685.camel@gmx.net/
Fixes: 11f2069c11 ("Btrfs: send, allow clone operations within the same file")
CC: stable@vger.kernel.org # 5.5+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The inode number cache has been removed in this dev cycle, there's one
more leftover. We don't need to run the delayed refs again after
commit_fs_roots as stated in the comment, because btrfs_save_ino_cache
is no more since 5297199a8b ("btrfs: remove inode number cache
feature").
Nothing else between commit_fs_roots and btrfs_qgroup_account_extents
could create new delayed refs so the qgroup consistency should be safe.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As snd_fw_async_midi_port.consume_bytes is unsigned int, and
NSEC_PER_SEC is 1000000000L, the second multiplication in
port->consume_bytes * 8 * NSEC_PER_SEC / 31250
always overflows on 32-bit platforms, truncating the result. Fix this
by precalculating "NSEC_PER_SEC / 31250", which is an integer constant.
Note that this assumes port->consume_bytes <= 16777.
Fixes: 531f471834 ("ALSA: firewire-lib/firewire-tascam: localize async midi port")
Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20210111130251.361335-3-geert+renesas@glider.be
Signed-off-by: Takashi Iwai <tiwai@suse.de>
As snd_ff.rx_bytes[] is unsigned int, and NSEC_PER_SEC is 1000000000L,
the second multiplication in
ff->rx_bytes[port] * 8 * NSEC_PER_SEC / 31250
always overflows on 32-bit platforms, truncating the result. Fix this
by precalculating "NSEC_PER_SEC / 31250", which is an integer constant.
Note that this assumes ff->rx_bytes[port] <= 16777.
Fixes: 1917429578 ("ALSA: fireface: add transaction support")
Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20210111130251.361335-2-geert+renesas@glider.be
Signed-off-by: Takashi Iwai <tiwai@suse.de>
If you export a subdirectory of a filesystem, a READDIRPLUS on the root
of that export will return the filehandle of the parent with the ".."
entry.
The filehandle is optional, so let's just not return the filehandle for
".." if we're at the root of an export.
Note that once the client learns one filehandle outside of the export,
they can trivially access the rest of the export using further lookups.
However, it is also not very difficult to guess filehandles outside of
the export. So exporting a subdirectory of a filesystem should
considered equivalent to providing access to the entire filesystem. To
avoid confusion, we recommend only exporting entire filesystems.
Reported-by: Youjipeng <wangzhibei1999@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
My employment with TI is ending tomorrow, so update the email address
entry in the maintainers file. Also, I don't expect to spend that much
time with maintaining TI code anymore, so downgrade the status level to
odd fixes only on areas where I remain as the main contact point for
now, and move myself as secondary contact point where someone else has
taken over the maintainership.
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tero Kristo <kristo@kernel.org>
Signed-off-by: Nishanth Menon <nm@ti.com>
Acked-by: Nishanth Menon <nm@ti.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Nishanth Menon <nm@ti.com>
Cc: Santosh Shilimkar <ssantosh@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20201217130721.23555-1-t-kristo@ti.com
Currently hda on tegra30 fails to open a stream with an input/output error.
For example:
speaker-test -Dhw:0,3 -c 2
speaker-test 1.2.2
Playback device is hw:0,3
Stream parameters are 48000Hz, S16_LE, 2 channels
Using 16 octaves of pink noise
Rate set to 48000Hz (requested 48000Hz)
Buffer size range from 64 to 16384
Period size range from 32 to 8192
Using max buffer size 16384
Periods = 4
was set period_size = 4096
was set buffer_size = 16384
0 - Front Left
Write error: -5,Input/output error
xrun_recovery failed: -5,Input/output error
Transfer failed: Input/output error
The tegra-hda device was introduced in tegra30 but only utilized in
tegra124 until recent chips. Tegra210/186 work only due to a hardware
change. For this reason it is unknown when this issue first manifested.
Discussions with the hardware team show this applies to all current tegra
chips. It has been resolved in the tegra234, which does not have hda
support at this time.
The explanation from the hardware team is this:
Below is the striping formula referenced from HD audio spec.
{ ((num_channels * bits_per_sample) / number of SDOs) >= 8 }
The current issue is seen because Tegra HW has a problem with boundary
condition (= 8) for striping. The reason why it is not seen on
Tegra210/Tegra186 is because it uses max 2SDO lines. Max SDO lines is
read from GCAP register.
For the given stream (channels = 2, bps = 16);
ratio = (channels * bps) / NSDO = 32 / NSDO;
On Tegra30, ratio = 32/4 = 8 (FAIL)
On Tegra210/186, ratio = 32/2 = 16 (PASS)
On Tegra194, ratio = 32/4 = 8 (FAIL) ==> Earlier workaround was
applied for it
If Tegra210/186 is forced to use 4SDO, it fails there as well. So the
behavior is consistent across all these chips.
Applying the fix in [1] universally resolves this issue on tegra30-hda.
Tested on the Ouya game console and the tf201 tablet.
[1] commit 60019d8c65 ("ALSA: hda/tegra: workaround playback failure on
Tegra194")
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Ion Agorria <ion@agorria.com>
Reviewed-by: Sameer Pujar <spujar@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Peter Geis <pgwipeout@gmail.com>
Link: https://lore.kernel.org/r/20210108135913.2421585-3-pgwipeout@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
When using Deep learning framework such as tensorflow or pytorch, there
are tens of thousands of host memory mappings. When the user frees
all those mappings at the same time, the process of unmapping and
unpinning them can take a long time, which may cause a soft lockup
bug.
To prevent this, we need to free the core to do other things during
the unmapping process. For now, we chose to do it every 32K unmappings
(each unmap is a single 4K page).
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
There are some points in the reset process where if the code fails
for some reason, and the system admin tries to initiate the reset
process again we will get a kernel panic.
This is because there aren't any protections in different fini
functions that are called during the reset process.
The protections that are added in this patch make sure that if the fini
functions are called multiple times, without calling init functions
between them, there won't be double release of already released
resources.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
When doing dma_alloc_coherent in the driver, we add a certain hard-coded
offset to the DMA address before returning to the callee function. This
offset is needed when our device use this DMA address to perform
outbound transactions to the host.
However, if we want to map the DMA'able memory to the user via
dma_mmap_coherent(), we need to pass the original dma address, without
this offset. Otherwise, we will get erronouos mapping.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Commit 528222d853 ("media: rc: harmonize infrared durations to
microseconds") missed to switch the min_timeout calculation from ns
to us. This resulted in a minimum timeout of 1.2 seconds instead of 1.2ms,
leading to large delays and long key repeats.
Fix this by applying proper ns->us conversion.
Cc: stable@vger.kernel.org
Fixes: 528222d853 ("media: rc: harmonize infrared durations to microseconds")
Signed-off-by: Matthias Reichl <hias@horus.com>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
This is a bug that causes early crashes in builds with an .exit.text
section smaller than a page and an .init.text section that ends in the
beginning of a physical page (this is kinda random, which might
explain why this wasn't really encountered before).
The init sections are ordered like this:
.init.text
.exit.text
.init.data
Currently, these sections aren't page aligned.
Because the init code might become read-only at runtime and because
the .init.text section can potentially reside on the same physical
page as .init.data, the beginning of .init.data might be mapped
read-only along with .init.text.
Then when the kernel tries to modify a variable in .init.data (like
kthreadd_done, used in kernel_init()) the kernel panics.
To avoid this, make _einittext page aligned and also align .exit.text
to make sure .init.data is always seperated from the text segments.
Fixes: 060ef9d89d ("powerpc32: PAGE_EXEC required for inittext")
Signed-off-by: Ariel Marcovitch <ariel.marcovitch@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210102201156.10805-1-ariel.marcovitch@gmail.com
Willem de Bruijn says:
====================
skb frag: kmap_atomic fixes
skb frags may be backed by highmem and/or compound pages. Various
code calls kmap_atomic to safely access highmem pages. But this
needs additional care for compound pages. Fix a few issues:
patch 1 expect kmap mappings with CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
patch 2 fixes kmap_atomic + compound page support in skb_seq_read
patch 3 fixes kmap_atomic + compound page support in esp
====================
Link: https://lore.kernel.org/r/20210109221834.3459768-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
esp(6)_output_head uses skb_page_frag_refill to allocate a buffer for
the esp trailer.
It accesses the page with kmap_atomic to handle highmem. But
skb_page_frag_refill can return compound pages, of which
kmap_atomic only maps the first underlying page.
skb_page_frag_refill does not return highmem, because flag
__GFP_HIGHMEM is not set. ESP uses it in the same manner as TCP.
That also does not call kmap_atomic, but directly uses page_address,
in skb_copy_to_page_nocache. Do the same for ESP.
This issue has become easier to trigger with recent kmap local
debugging feature CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP.
Fixes: cac2661c53 ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f6a ("esp6: Avoid skb_cow_data whenever possible")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
skb_seq_read iterates over an skb, returning pointer and length of
the next data range with each call.
It relies on kmap_atomic to access highmem pages when needed.
An skb frag may be backed by a compound page, but kmap_atomic maps
only a single page. There are not enough kmap slots to always map all
pages concurrently.
Instead, if kmap_atomic is needed, iterate over each page.
As this increases the number of calls, avoid this unless needed.
The necessary condition is captured in skb_frag_must_loop.
I tried to make the change as obvious as possible. It should be easy
to verify that nothing changes if skb_frag_must_loop returns false.
Tested:
On an x86 platform with
CONFIG_HIGHMEM=y
CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP=y
CONFIG_NETFILTER_XT_MATCH_STRING=y
Run
ip link set dev lo mtu 1500
iptables -A OUTPUT -m string --string 'badstring' -algo bm -j ACCEPT
dd if=/dev/urandom of=in bs=1M count=20
nc -l -p 8000 > /dev/null &
nc -w 1 -q 0 localhost 8000 < in
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Skb frags may be backed by highmem and/or compound pages. Highmem
pages need kmap_atomic mappings to access. But kmap_atomic maps a
single page, not the entire compound page.
skb_foreach_page iterates over an skb frag, in one step in the common
case, page by page only if kmap_atomic must be called for each page.
The decision logic is captured in skb_frag_must_loop.
CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP extends kmap from highmem to all
pages, to increase code coverage.
Extend skb_frag_must_loop to this new condition.
Link: https://lore.kernel.org/linux-mm/20210106180132.41dc249d@gandalf.local.home/
Fixes: 0e91a0c698 ("mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP")
Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MSFT ActiveSync implementation requires that the size of the response for
incoming query is to be provided in the request input length. Failure to
set the input size proper results in failed request transfer, where the
ActiveSync counterpart reports the NDIS_STATUS_INVALID_LENGTH (0xC0010014L)
error.
Set the input size for OID_GEN_PHYSICAL_MEDIUM query to the expected size
of the response in order for the ActiveSync to properly respond to the
request.
Fixes: 039ee17d1b ("rndis_host: Add RNDIS physical medium checking into generic_rndis_bind()")
Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
Link: https://lore.kernel.org/r/20210108095839.3335-1-andrey.zhizhikin@leica-geosystems.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The bpf_tracing_prog_attach error path calls bpf_prog_put
on prog, which causes refcount underflow when it's called
from link_create function.
link_create
prog = bpf_prog_get <-- get
...
tracing_bpf_link_attach(prog..
bpf_tracing_prog_attach(prog..
out_put_prog:
bpf_prog_put(prog); <-- put
if (ret < 0)
bpf_prog_put(prog); <-- put
Removing bpf_prog_put call from bpf_tracing_prog_attach
and making sure its callers call it instead.
Fixes: 4a1e7c0c63 ("bpf: Support attaching freplace programs to multiple attach points")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210111191650.1241578-1-jolsa@kernel.org
Pull tracing fix from Steven Rostedt:
"Blacklist properly on all archs.
The code to blacklist notrace functions for kprobes was not using the
right kconfig option, which caused some archs (powerpc) to possibly
not blacklist them"
* tag 'trace-v5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/kprobes: Do the notrace functions check without kprobes on ftrace
Pull btrfs fixes from David Sterba:
"More material for stable trees.
- tree-checker: check item end overflow
- fix false warning during relocation regarding extent type
- fix inode flushing logic, caused notable performance regression
(since 5.10)
- debugging fixups:
- print correct offset for reloc tree key
- pass reliable fs_info pointer to error reporting helper"
* tag 'for-5.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: shrink delalloc pages instead of full inodes
btrfs: reloc: fix wrong file extent type check to avoid false ENOENT
btrfs: tree-checker: check if chunk item end overflows
btrfs: prevent NULL pointer dereference in extent_io_tree_panic
btrfs: print the actual offset in btrfs_root_name
When attempting to match EXTENDED COPY CSCD descriptors with corresponding
se_devices, target_xcopy_locate_se_dev_e4() currently iterates over LIO's
global devices list which includes all configured backstores.
This change ensures that only initiator-accessible backstores are
considered during CSCD descriptor lookup, according to the session's
se_node_acl LUN list.
To avoid LUN removal race conditions, device pinning is changed from being
configfs based to instead using the se_node_acl lun_ref.
Reference: CVE-2020-28374
Fixes: cbf031f425 ("target: Add support for EXTENDED_COPY copy offload emulation")
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Upon a communication error, the interrupt handler can call
tegra_i2c_disable_packet_mode. This causes a sleeping poll to happen
unless the current transaction was marked atomic. Fix this by
making the poll happen atomically if we are in an IRQ.
This matches the behavior prior to the patch mentioned
in the Fixes tag.
Fixes: ede2299f71 ("i2c: tegra: Support atomic transfers")
Cc: stable@vger.kernel.org
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Julia and 0day report:
Zero-length and one-element arrays are deprecated, see
Documentation/process/deprecated.rst
Flexible-array members should be used instead.
However, a straight conversion to flexible arrays yields:
drivers/acpi/nfit/core.c:2276:4: error: flexible array member in a struct with no named members
drivers/acpi/nfit/core.c:2287:4: error: flexible array member in a struct with no named members
Instead, just use plain arrays not embedded flexible arrays.
Cc: Denis Efremov <efremov@linux.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Linux VM on Hyper-V crashes with the latest mainline:
[ 4.069624] detected buffer overflow in strcpy
[ 4.077733] kernel BUG at lib/string.c:1149!
..
[ 4.085819] RIP: 0010:fortify_panic+0xf/0x11
...
[ 4.085819] Call Trace:
[ 4.085819] acpi_device_add.cold.15+0xf2/0xfb
[ 4.085819] acpi_add_single_object+0x2a6/0x690
[ 4.085819] acpi_bus_check_add+0xc6/0x280
[ 4.085819] acpi_ns_walk_namespace+0xda/0x1aa
[ 4.085819] acpi_walk_namespace+0x9a/0xc2
[ 4.085819] acpi_bus_scan+0x78/0x90
[ 4.085819] acpi_scan_init+0xfa/0x248
[ 4.085819] acpi_init+0x2c1/0x321
[ 4.085819] do_one_initcall+0x44/0x1d0
[ 4.085819] kernel_init_freeable+0x1ab/0x1f4
This is because of the recent buffer overflow detection in the
commit 6a39e62abb ("lib: string.h: detect intra-object overflow in
fortified string functions")
Here acpi_device_bus_id->bus_id can only hold 14 characters, while the
the acpi_device_hid(device) returns a 22-char string
"HYPER_V_GEN_COUNTER_V1".
Per ACPI Spec v6.2, Section 6.1.5 _HID (Hardware ID), if the ID is a
string, it must be of the form AAA#### or NNNN####, i.e. 7 chars or 8
chars.
The field bus_id in struct acpi_device_bus_id was originally defined as
char bus_id[9], and later was enlarged to char bus_id[15] in 2007 in the
commit bb0958544f ("ACPI: use more understandable bus_id for ACPI
devices")
Fix the issue by changing the field bus_id to const char *, and use
kstrdup_const() to initialize it.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Tested-By: Jethro Beekman <jethro@fortanix.com>
[ rjw: Subject change, whitespace adjustment ]
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull hyperv fixes from Wei Liu:
- fix kexec panic/hang (Dexuan Cui)
- fix occasional crashes when flushing TLB (Wei Liu)
* tag 'hyperv-fixes-signed-20210111' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: check cpu mask after interrupt has been disabled
x86/hyperv: Fix kexec panic/hang issues
Don't assume dest/source buffers are userspace addresses when manually
copying data for string I/O or MOVS MMIO, as {get,put}_user() will fail
if handed a kernel address and ultimately lead to a kernel panic.
When invoking INSB/OUTSB instructions in kernel space in a
SEV-ES-enabled VM, the kernel crashes with the following message:
"SEV-ES: Unsupported exception in #VC instruction emulation - can't continue"
Handle that case properly.
[ bp: Massage commit message. ]
Fixes: f980f9c31a ("x86/sev-es: Compile early handler code into kernel image")
Signed-off-by: Hyunwook (Wooky) Baek <baekhw@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Link: https://lkml.kernel.org/r/20210110071102.2576186-1-baekhw@google.com
Automatic Clock Gating is a feature used for the power consumption
optimisation. It turned out that during early init phase it may prevent the
stable voltage switch to 1.8V - due to that on some platforms an endless
printout in dmesg can be observed: "mmc1: 1.8V regulator output did not
became stable" Fix the problem by disabling the ACG at very beginning of
the sdhci_init and let that be enabled later.
Fixes: 3a3748dba8 ("mmc: sdhci-xenon: Add Marvell Xenon SDHC core functionality")
Signed-off-by: Alex Leibovich <alexl@marvell.com>
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Cc: stable@vger.kernel.org
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20201211141656.24915-1-mw@semihalf.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Commit a44f7cb937 ("mmc: core: use mrq->sbc when sending CMD23 for
RPMB") began to use ACMD23 for RPMB if the host supports ACMD23. In
RPMB ACM23 case, we need to set bit 31 to CMD23 argument, otherwise
RPMB write operation will return general fail.
However, no matter V4 is enabled or not, the dwcmshc's ARGUMENT2
register is 32-bit block count register which doesn't support stuff
bits of CMD23 argument. So let's handle this specific ACMD23 case.
From another side, this patch also prepare for future v4 enabling
for dwcmshc, because from the 4.10 spec, the ARGUMENT2 register is
redefined as 32bit block count which doesn't support stuff bits of
CMD23 argument.
Fixes: a44f7cb937 ("mmc: core: use mrq->sbc when sending CMD23 for RPMB")
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20201229161625.38255233@xhacker.debian
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
In rare cases a task may be exiting while io_ring_exit_work() trying to
cancel/wait its requests. It's ok for __io_sq_thread_acquire_mm()
because of SQPOLL check, but is not for __io_sq_thread_acquire_files().
Play safe and fail for both of them.
Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
__io_req_task_submit() run by task_work can set mm and files, but
io_sq_thread() in some cases, and because __io_sq_thread_acquire_mm()
and __io_sq_thread_acquire_files() do a simple current->mm/files check
it may end up submitting IO with mm/files of another task.
We also need to drop it after in the end to drop potentially grabbed
references to them.
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
With TZ system reboot cannot finish successfully. To fix that
enable core clocks by runtime pm before TZ calls and disable
clocks after that.
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Commit 528222d853 ("media: rc: harmonize infrared durations to
microseconds") missed to switch some timeout calculations from
nanoseconds to microseconds. This resulted in spurious key_up+key_down
events at the last scancode if the rc device uses a long timeout
(eg 100ms on nuvoton-cir) as the device timeout wasn't properly
accounted for in the keyup timeout calculation.
Fix this by applying the proper conversion functions.
Cc: stable@vger.kernel.org
Fixes: 528222d853 ("media: rc: harmonize infrared durations to microseconds")
Signed-off-by: Matthias Reichl <hias@horus.com>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
The pch_get_backlight(), lpt_get_backlight(), and lpt_set_backlight()
functions operate directly on the hardware registers. If inverting the
value is needed, using intel_panel_compute_brightness(), it should only
be done in the interface between hardware registers and
panel->backlight.level.
The CPU mode takeover code added in commit 5b1ec9ac7a
("drm/i915/backlight: Fix backlight takeover on LPT, v3.") reads the
hardware register and converts to panel->backlight.level correctly,
however the value written back should remain in the hardware register
"domain".
This hasn't been an issue, because GM45 machines are the only known
users of i915.invert_brightness and the brightness invert quirk, and
without one of them no conversion is made. It's likely nobody's ever hit
the problem.
Fixes: 5b1ec9ac7a ("drm/i915/backlight: Fix backlight takeover on LPT, v3.")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: <stable@vger.kernel.org> # v5.1+
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210108152841.6944-1-jani.nikula@intel.com
(cherry picked from commit 0d4ced1c5b)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Commit 25b4620ee8 ("drm/i915/dsi: Skip delays for v3 VBTs in vid-mode")
added an intel_dsi_msleep() helper which skips sleeping if the
MIPI-sequences have a version of 3 or newer and the panel is in vid-mode;
and it moved a bunch of msleep-s over to this new helper.
This was based on my reading of the big comment around line 730 which
starts with "Panel enable/disable sequences from the VBT spec.",
where the "v3 video mode seq" column does not have any wait t# entries.
Given that this code has been used on a lot of different devices without
issues until now, it seems that my interpretation of the spec here is
mostly correct.
But now I have encountered one device, an Acer Aspire Switch 10 E
SW3-016, where the panel will not light up unless we do actually honor the
panel_on_delay after exexuting the MIPI_SEQ_PANEL_ON sequence.
What seems to set this model apart is that it is lacking a
MIPI_SEQ_DEASSERT_RESET sequence, which is where the power-on
delay usually happens.
Fix the panel not lighting up on this model by using an unconditional
msleep(panel_on_delay) instead of intel_dsi_msleep() when there is
no MIPI_SEQ_DEASSERT_RESET sequence.
Fixes: 25b4620ee8 ("drm/i915/dsi: Skip delays for v3 VBTs in vid-mode")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201118124058.26021-1-hdegoede@redhat.com
(cherry picked from commit 6fdb335f1c)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Selecting ARM_GIC_V3 on non-CP15 processors leads to build failures
like
arch/arm/include/asm/arch_gicv3.h: In function 'write_ICC_AP1R3_EL1':
arch/arm/include/asm/arch_gicv3.h:36:40: error: 'c12' undeclared (first use in this function)
36 | #define __ICC_AP1Rx(x) __ACCESS_CP15(c12, 0, c9, x)
| ^~~
Add a dependency to only enable the gic driver when building for
at an ARMv7 target, which is the closes approximation to the ARMv8
processor that is actually in this chip.
Fixes: fc40200ebf ("soc: imx: increase build coverage for imx8m soc driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
to be consistent with kernel versions up to v5.9 (mmc aliases not used here).
usdhc1 is not wired up on this board and therefore cannot be used.
Start mmc aliases with usdhc2.
Signed-off-by: Soeren Moch <smoch@web.de>
Cc: stable@vger.kernel.org # 5.10.x
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
When register_pernet_subsys() fails, nf_nat_bysource
should be freed just like when nf_ct_extend_register()
fails.
Fixes: 1cd472bf03 ("netfilter: nf_nat: add nat hook register functions to nf_nat")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Before referencing the inode, we must ensure that the superblock can be
referenced. Otherwise, we can end up with iput() calling superblock
operations that are no longer valid or accessible.
Fixes: ea7c38fef0 ("NFSv4: Ensure we reference the inode for return-on-close in delegreturn")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Before referencing the inode, we must ensure that the superblock can be
referenced. Otherwise, we can end up with iput() calling superblock
operations that are no longer valid or accessible.
Fixes: e39d8a186e ("NFSv4: Fix an Oops during delegation callbacks")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Pull Kbuild fixes from Masahiro Yamada:
- Search for <ncurses.h> in the default header path of HOSTCC
- Tweak the option order to be kind to old BSD awk
- Remove 'kvmconfig' and 'xenconfig' shorthands
- Fix documentation
* tag 'kbuild-fixes-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
Documentation: kbuild: Fix section reference
kconfig: remove 'kvmconfig' and 'xenconfig' shorthands
lib/raid6: Let $(UNROLL) rules work with macOS userland
kconfig: Support building mconf with vendor sysroot ncurses
kconfig: config script: add a little user help
MAINTAINERS: adjust GCC PLUGINS after gcc-plugin.sh removal
Pull SCSI fixes from James Bottomley:
"This is two driver fixes (megaraid_sas and hisi_sas).
The megaraid one is a revert of a previous revert of a cpu hotplug fix
which exposed a bug in the block layer which has been fixed in this
merge window.
The hisi_sas performance enhancement comes from switching to interrupt
managed completion queues, which depended on the addition of
devm_platform_get_irqs_affinity() which is now upstream via the irq
tree in the last merge window"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: hisi_sas: Expose HW queues for v2 hw
Revert "Revert "scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug""
Pull block fixes from Jens Axboe:
- Missing CRC32 selections (Arnd)
- Fix for a merge window regression with bdev inode init (Christoph)
- bcache fixes
- rnbd fixes
- NVMe pull request from Christoph:
- fix a race in the nvme-tcp send code (Sagi Grimberg)
- fix a list corruption in an nvme-rdma error path (Israel Rukshin)
- avoid a possible double fetch in nvme-pci (Lalithambika Krishnakumar)
- add the susystem NQN quirk for a Samsung driver (Gopal Tiwari)
- fix two compiler warnings in nvme-fcloop (James Smart)
- don't call sleeping functions from irq context in nvme-fc (James Smart)
- remove an unused argument (Max Gurtovoy)
- remove unused exports (Minwoo Im)
- Use-after-free fix for partition iteration (Ming)
- Missing blk-mq debugfs flag annotation (John)
- Bdev freeze regression fix (Satya)
- blk-iocost NULL pointer deref fix (Tejun)
* tag 'block-5.11-2021-01-10' of git://git.kernel.dk/linux-block: (26 commits)
bcache: set bcache device into read-only mode for BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET
bcache: introduce BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE for large bucket
bcache: check unsupported feature sets for bcache register
bcache: fix typo from SUUP to SUPP in features.h
bcache: set pdev_set_uuid before scond loop iteration
blk-mq-debugfs: Add decode for BLK_MQ_F_TAG_HCTX_SHARED
block/rnbd-clt: avoid module unload race with close confirmation
block/rnbd: Adding name to the Contributors List
block/rnbd-clt: Fix sg table use after free
block/rnbd-srv: Fix use after free in rnbd_srv_sess_dev_force_close
block/rnbd: Select SG_POOL for RNBD_CLIENT
block: pre-initialize struct block_device in bdev_alloc_inode
fs: Fix freeze_bdev()/thaw_bdev() accounting of bd_fsfreeze_sb
nvme: remove the unused status argument from nvme_trace_bio_complete
nvmet-rdma: Fix list_del corruption on queue establishment failure
nvme: unexport functions with no external caller
nvme: avoid possible double fetch in handling CQE
nvme-tcp: Fix possible race of io_work and direct send
nvme-pci: mark Samsung PM1725a as IGNORE_DEV_SUBNQN
nvme-fcloop: Fix sscanf type and list_first_entry_or_null warnings
...
Pull io_uring fixes from Jens Axboe:
"A bit larger than I had hoped at this point, but it's all changes that
will be directed towards stable anyway. In detail:
- Fix a merge window regression on error return (Matthew)
- Remove useless variable declaration/assignment (Ye Bin)
- IOPOLL fixes (Pavel)
- Exit and cancelation fixes (Pavel)
- fasync lockdep complaint fix (Pavel)
- Ensure SQPOLL is synchronized with creator life time (Pavel)"
* tag 'io_uring-5.11-2021-01-10' of git://git.kernel.dk/linux-block:
io_uring: stop SQPOLL submit on creator's death
io_uring: add warn_once for io_uring_flush()
io_uring: inline io_uring_attempt_task_drop()
io_uring: io_rw_reissue lockdep annotations
io_uring: synchronise ev_posted() with waitqueues
io_uring: dont kill fasync under completion_lock
io_uring: trigger eventfd for IOPOLL
io_uring: Fix return value from alloc_fixed_file_ref_node
io_uring: Delete useless variable ‘id’ in io_prep_async_work
io_uring: cancel more aggressively in exit_work
io_uring: drop file refs after task cancel
io_uring: patch up IOPOLL overflow_flush sync
io_uring: synchronise IOPOLL on task_submit fail
Pull USB fixes from Greg KH:
"Here are a number of small USB driver fixes for 5.11-rc3.
Include in here are:
- USB gadget driver fixes for reported issues
- new usb-serial driver ids
- dma from stack bugfixes
- typec bugfixes
- dwc3 bugfixes
- xhci driver bugfixes
- other small misc usb driver bugfixes
All of these have been in linux-next with no reported issues"
* tag 'usb-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (35 commits)
usb: dwc3: gadget: Clear wait flag on dequeue
usb: typec: Send uevent for num_altmodes update
usb: typec: Fix copy paste error for NVIDIA alt-mode description
usb: gadget: enable super speed plus
kcov, usb: hide in_serving_softirq checks in __usb_hcd_giveback_urb
usb: uas: Add PNY USB Portable SSD to unusual_uas
usb: gadget: configfs: Preserve function ordering after bind failure
usb: gadget: select CONFIG_CRC32
usb: gadget: core: change the comment for usb_gadget_connect
usb: gadget: configfs: Fix use-after-free issue with udc_name
usb: dwc3: gadget: Restart DWC3 gadget when enabling pullup
usb: usbip: vhci_hcd: protect shift size
USB: usblp: fix DMA to stack
USB: serial: iuu_phoenix: fix DMA from stack
USB: serial: option: add LongSung M5710 module support
USB: serial: option: add Quectel EM160R-GL
USB: Gadget: dummy-hcd: Fix shift-out-of-bounds bug
usb: gadget: f_uac2: reset wMaxPacketSize
usb: dwc3: ulpi: Fix USB2.0 HS/FS/LS PHY suspend regression
usb: dwc3: ulpi: Replace CPU-based busyloop with Protocol-based one
...
Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for 5.11-rc3. Nothing major,
just resolving some reported issues:
- cleanup some remaining mentions of the ION drivers that were
removed in 5.11-rc1
- comedi driver bugfix
- two error path memory leak fixes
All have been in linux-next for a while with no reported issues"
* tag 'staging-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: ION: remove some references to CONFIG_ION
staging: mt7621-dma: Fix a resource leak in an error handling path
Staging: comedi: Return -EFAULT if copy_to_user() fails
staging: spmi: hisi-spmi-controller: Fix some error handling paths
Pull char/misc driver fixes from Greg KH:
"Here are some small char and misc driver fixes for 5.11-rc3.
The majority here are fixes for the habanalabs drivers, but also in
here are:
- crypto driver fix
- pvpanic driver fix
- updated font file
- interconnect driver fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (26 commits)
Fonts: font_ter16x32: Update font with new upstream Terminus release
misc: pvpanic: Check devm_ioport_map() for NULL
speakup: Add github repository URL and bug tracker
MAINTAINERS: Update Georgi's email address
crypto: asym_tpm: correct zero out potential secrets
habanalabs: Fix memleak in hl_device_reset
interconnect: imx8mq: Use icc_sync_state
interconnect: imx: Remove a useless test
interconnect: imx: Add a missing of_node_put after of_device_is_available
interconnect: qcom: fix rpmh link failures
habanalabs: fix order of status check
habanalabs: register to pci shutdown callback
habanalabs: add validation cs counter, fix misplaced counters
habanalabs/gaudi: retry loading TPC f/w on -EINTR
habanalabs: adjust pci controller init to new firmware
habanalabs: update comment in hl_boot_if.h
habanalabs/gaudi: enhance reset message
habanalabs: full FW hard reset support
habanalabs/gaudi: disable CGM at HW initialization
habanalabs: Revise comment to align with mirror list name
...
Pull ARC fixes from Vineet Gupta:
- Address the 2nd boot failure due to snafu in signal handling code
(first was generic console ttynull issue)
- misc other fixes
* tag 'arc-5.11-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: [hsdk]: Enable FPU_SAVE_RESTORE
ARC: unbork 5.11 bootup: fix snafu in _TIF_NOTIFY_SIGNAL handling
include/soc: remove headers for EZChip NPS
arch/arc: add copy_user_page() to <asm/page.h> to fix build error on ARC
Pull powerpc fixes from Michael Ellerman:
- A fix for machine check handling with VMAP stack on 32-bit.
- A clang build fix.
Thanks to Christophe Leroy and Nathan Chancellor.
* tag 'powerpc-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Handle .text.{hot,unlikely}.* in linker script
powerpc/32s: Fix RTAS machine check with VMAP stack
Pull x86 fixes from Borislav Petkov:
"As expected, fixes started trickling in after the holidays so here is
the accumulated pile of x86 fixes for 5.11:
- A fix for fanotify_mark() missing the conversion of x86_32 native
syscalls which take 64-bit arguments to the compat handlers due to
former having a general compat handler. (Brian Gerst)
- Add a forgotten pmd page destructor call to pud_free_pmd_page()
where a pmd page is freed. (Dan Williams)
- Make IN/OUT insns with an u8 immediate port operand handling for
SEV-ES guests more precise by using only the single port byte and
not the whole s32 value of the insn decoder. (Peter Gonda)
- Correct a straddling end range check before returning the proper
MTRR type, when the end address is the same as top of memory.
(Ying-Tsun Huang)
- Change PQR_ASSOC MSR update scheme when moving a task to a resctrl
resource group to avoid significant performance overhead with some
resctrl workloads. (Fenghua Yu)
- Avoid the actual task move overhead when the task is already in the
resource group. (Fenghua Yu)"
* tag 'x86_urgent_for_v5.11_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Don't move a task to the same resource group
x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR
x86/mtrr: Correct the range check before performing MTRR type lookups
x86/sev-es: Fix SEV-ES OUT/IN immediate opcode vc handling
x86/mm: Fix leak of pmd ptlock
fanotify: Fix sys_fanotify_mark() on native x86-32
If we exit _lgopen_prepare_attached() without setting a layout, we will
currently leak the plh_outstanding counter.
Fixes: 411ae722d1 ("pNFS: Wait for stale layoutget calls to complete in pnfs_update_layout()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We must ensure that we pass a layout segment to nfs_retry_commit() when
we're cleaning up after pnfs_bucket_alloc_ds_commits(). Otherwise,
requests that should be committed to the DS will get committed to the
MDS.
Do so by ensuring that pnfs_bucket_get_committing() always tries to
return a layout segment when it returns a non-empty page list.
Fixes: c84bea5944 ("NFS/pNFS: Simplify bucket layout segment reference counting")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
In pnfs_generic_clear_request_commit(), we try calling
pnfs_free_bucket_lseg() before we remove the request from the DS bucket.
That will always fail, since the point is to test for whether or not
that bucket is empty.
Fixes: c84bea5944 ("NFS/pNFS: Simplify bucket layout segment reference counting")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If a layout return is in progress, we should wait for it to complete,
in case the layout segment we are picking up gets returned too.
Fixes: 30cb3ee299 ("pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the inode is being evicted, it should be safe to run return-on-close,
so we should do it to ensure we don't inadvertently leak layout segments.
Fixes: 1c5bd76d17 ("pNFS: Enable layoutreturn operation for return-on-close")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the layout return-on-close failed because the layoutreturn was never
sent, then we should mark the layout for return again.
Fixes: 9c47b18cf7 ("pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
A return value of 0 means success. This is documented in lib/kstrtox.c.
This was found by trying to mount an NFS share from a link-local IPv6
address with the interface specified by its index:
mount("[fe80::1%1]:/srv/nfs", "/mnt", "nfs", 0, "nolock,addr=fe80::1%1")
Before this commit this failed with EINVAL and also caused the following
message in dmesg:
[...] NFS: bad IP address specified: addr=fe80::1%1
The syscall using the same address based on the interface name instead
of its index succeeds.
Credits for this patch go to my colleague Christian Speich, who traced
the origin of this bug to this line of code.
Signed-off-by: Johannes Nixdorf <j.nixdorf@avm.de>
Fixes: 00cfaa943e ("replace strict_strto calls")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Several existing dprink()/dfprintk() calls were converted to use the new
mount API logging macros by commit ce8866f091 ("NFS: Attach
supplementary error information to fs_context"). If the fs_context was
not created using fsopen() then it will not have had a log buffer
allocated for it, and the new mount API logging macros will wind up
calling printk().
This can result in syslog messages being logged where previously there
were none... most notably "NFS4: Couldn't follow remote path", which can
happen if the client is auto-negotiating a protocol version with an NFS
server that doesn't support the higher v4.x versions.
Convert the nfs_errorf(), nfs_invalf(), and nfs_warnf() macros to check
for the existence of the fs_context's log buffer and call dprintk() if
it doesn't exist. Add nfs_ferrorf(), nfs_finvalf(), and nfs_warnf(),
which do the same thing but take an NFS debug flag as an argument and
call dfprintk(). Finally, modify the "NFS4: Couldn't follow remote
path" message to use nfs_ferrorf().
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207385
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: ce8866f091 ("NFS: Attach supplementary error information to fs_context.")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The offset of the reset request register is 0, the absolute address is
0x1e60000. Boards without PSCI support will fail to perform a reset:
[ 26.734700] reboot: Restarting system
[ 27.743259] Unable to restart system
[ 27.746845] Reboot failed -- System halted
Fixes: 8897f3255c ("arm64: dts: Add support for NXP LS1028A SoC")
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Since commit 5556797662 ("genirq/irqdomain: Allow partial trimming of
irq_data hierarchy") the irq_data chain is valided.
The irq_domain_trim_hierarchy() function doesn't consider the irq + ipi
domain hierarchy as valid, since the ipi domain has the irq domain set
as parent, but the parent domain has no chip set. Hence the boot ends in
a kernel panic.
Set the chip for the parent domain as it is done in the mips gic irq
driver, to have a valid irq_data chain.
Fixes: 3838a547fd ("irqchip: mips-cpu: Introduce IPI IRQ domain support")
Cc: <stable@vger.kernel.org> # v5.10+
Signed-off-by: Mathias Kresin <dev@kresin.me>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210107213603.1637781-1-dev@kresin.me
The TI PRUSS INTC irqchip driver handles the local interrupt controller
which is a child device of it's parent PRUSS/ICSSG device. The driver
was upstreamed in parallel with the PRUSS platform driver, and was
configurable independently previously. The PRUSS interrupt controller
is an integral part of the overall PRUSS software architecture, and is
not useful at all by itself.
Simplify the TI_PRUSS_INTC Kconfig dependencies by making it silent and
selected automatically when the TI_PRUSS platform driver is enabled.
Signed-off-by: Suman Anna <s-anna@ti.com>
Reviewed-by: David Lechner <david@lechnology.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210108162901.6003-1-s-anna@ti.com
The old way of changing the conntrack hashsize runtime was through changing
the module param via file /sys/module/nf_conntrack/parameters/hashsize. This
was extended to sysctl change in commit 3183ab8997 ("netfilter: conntrack:
allow increasing bucket size via sysctl too").
The commit introduced second "user" variable nf_conntrack_htable_size_user
which shadow actual variable nf_conntrack_htable_size. When hashsize is
changed via module param this "user" variable isn't updated. This results in
sysctl net/netfilter/nf_conntrack_buckets shows the wrong value when users
update via the old way.
This patch fix the issue by always updating "user" variable when reading the
proc file. This will take care of changes to the actual variable without
sysctl need to be aware.
Fixes: 3183ab8997 ("netfilter: conntrack: allow increasing bucket size via sysctl too")
Reported-by: Yoel Caspersen <yoel@kviknet.dk>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fix nft_conntrack_helper.sh false fail report:
1) Conntrack tool need "-f ipv6" parameter to show out ipv6 traffic items.
2) Sleep 1 second after background nc send packet, to make sure check
is after this statement executed.
False report:
FAIL: ns1-lkjUemYw did not show attached helper ip set via ruleset
PASS: ns1-lkjUemYw connection on port 2121 has ftp helper attached
...
After fix:
PASS: ns1-2hUniwU2 connection on port 2121 has ftp helper attached
PASS: ns2-2hUniwU2 connection on port 2121 has ftp helper attached
...
Fixes: 619ae8e069 ("selftests: netfilter: add test case for conntrack helper assignment")
Signed-off-by: Chen Yi <yiche@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ido Schimmel says:
====================
mlxsw: core: Thermal control fixes
This series includes two fixes for thermal control in mlxsw.
Patch #1 validates that the alarm temperature threshold read from a
transceiver is above the warning temperature threshold. If not, the
current thresholds are maintained. It was observed that some transceiver
might be unreliable and sometimes report a too low alarm temperature
threshold which would result in thermal shutdown of the system.
Patch #2 increases the temperature threshold above which thermal
shutdown is triggered for the ASIC thermal zone. It is currently too low
and might result in thermal shutdown under perfectly fine operational
conditions.
====================
Link: https://lore.kernel.org/r/20210108145210.1229820-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Increase critical threshold for ASIC thermal zone from 110C to 140C
according to the system hardware requirements. All the supported ASICs
(Spectrum-1, Spectrum-2, Spectrum-3) could be still operational with ASIC
temperature below 140C. With the old critical threshold value system
can perform unjustified shutdown.
All the systems equipped with the above ASICs implement thermal
protection mechanism at firmware level and firmware could decide to
perform system thermal shutdown in case the temperature is below 140C.
So with the new threshold system will not meltdown, while thermal
operating range will be aligned with hardware abilities.
Fixes: 41e760841d ("mlxsw: core: Replace thermal temperature trips with defines")
Fixes: a50c1e3565 ("mlxsw: core: Implement thermal zone")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Validate thresholds to avoid a single failure due to some transceiver
unreliability. Ignore the last readouts in case warning temperature is
above alarm temperature, since it can cause unexpected thermal
shutdown. Stay with the previous values and refresh threshold within
the next iteration.
This is a rare scenario, but it was observed at a customer site.
Fixes: 6a79507cfe ("mlxsw: core: Extend thermal module with per QSFP module thermal zones")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are cases where GSO segment's length exceeds the egress MTU:
- Forwarding of a TCP GRO skb, when DF flag is not set.
- Forwarding of an skb that arrived on a virtualisation interface
(virtio-net/vhost/tap) with TSO/GSO size set by other network
stack.
- Local GSO skb transmitted on an NETIF_F_TSO tunnel stacked over an
interface with a smaller MTU.
- Arriving GRO skb (or GSO skb in a virtualised environment) that is
bridged to a NETIF_F_TSO tunnel stacked over an interface with an
insufficient MTU.
If so:
- Consume the SKB and its segments.
- Issue an ICMP packet with 'Packet Too Big' message containing the
MTU, allowing the source host to reduce its Path MTU appropriately.
Note: These cases are handled in the same manner in IPv4 output finish.
This patch aligns the behavior of IPv6 and the one of IPv4.
Fixes: 9e50849054 ("netfilter: ipv6: move POSTROUTING invocation before fragmentation")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/1610027418-30438-1-git-send-email-ayal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For all PCI functions on the netxen_nic adapter, interrupt
mode (INTx or MSI) configuration is dependent on what has
been configured by the PCI function zero in the shared
interrupt register, as these adapters do not support mixed
mode interrupts among the functions of a given adapter.
Logic for setting MSI/MSI-x interrupt mode in the shared interrupt
register based on PCI function id zero check is not appropriate for
all family of netxen adapters, as for some of the netxen family
adapters PCI function zero is not really meant to be probed/loaded
in the host but rather just act as a management function on the device,
which caused all the other PCI functions on the adapter to always use
legacy interrupt (INTx) mode instead of choosing MSI/MSI-x interrupt mode.
This patch replaces that check with port number so that for all
type of adapters driver attempts for MSI/MSI-x interrupt modes.
Fixes: b37eb210c0 ("netxen_nic: Avoid mixed mode interrupts")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Link: https://lore.kernel.org/r/20210107101520.6735-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull hwmon fixes from Guenter Roeck:
- Fix possible KASAN issue in amd_energy driver
- Avoid configuration problem in pwm-fan driver
- Fix kernel-doc warning in sbtsi_temp documentation
* tag 'hwmon-for-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (amd_energy) fix allocation of hwmon_channel_info config
hwmon: (pwm-fan) Ensure that calculation doesn't discard big period values
hwmon: (sbtsi_temp) Fix Documenation kernel-doc warning
Pull dmaengine fixes from Vinod Koul:
"A bunch of dmaengine driver fixes for:
- coverity discovered issues for xilinx driver
- qcom, gpi driver fix for undefined bhaviour and one off cleanup
- update Peter's email for TI DMA drivers
- one-off for idxd driver
- resource leak fix for mediatek and milbeaut drivers"
* tag 'dmaengine-fix-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: stm32-mdma: fix STM32_MDMA_VERY_HIGH_PRIORITY value
dmaengine: xilinx_dma: fix mixed_enum_type coverity warning
dmaengine: xilinx_dma: fix incompatible param warning in _child_probe()
dmaengine: xilinx_dma: check dma_async_device_register return value
dmaengine: qcom: fix gpi undefined behavior
dt-bindings: dma: ti: Update maintainer and author information
MAINTAINERS: Add entry for Texas Instruments DMA drivers
qcom: bam_dma: Delete useless kfree code
dmaengine: dw-edma: Fix use after free in dw_edma_alloc_chunk()
dmaengine: milbeaut-xdmac: Fix a resource leak in the error handling path of the probe function
dmaengine: mediatek: mtk-hsdma: Fix a resource leak in the error handling path of the probe function
dmaengine: qcom: gpi: Fixes a format mismatch
dmaengine: idxd: off by one in cleanup code
dmaengine: ti: k3-udma: Fix pktdma rchan TPL level setup
Pull i2c fixes from Wolfram Sang:
"Three driver bugfixes for I2C. Buisness as usual"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mediatek: Fix apdma and i2c hand-shake timeout
i2c: i801: Fix the i2c-mux gpiod_lookup_table not being properly terminated
i2c: sprd: use a specific timeout to avoid system hang up issue
Change my email contact ahead of a likely painful eleven-month migration
to a certain cobalt enteprisey groupware cloud product that will totally
break my workflow. Some day I may get used to having to email being
sequestered behind both claret and cerulean oath2+sms 2fa layers, but
for now I'll stick with keying in one password to receive an email vs.
the required four.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the creator of SQPOLL io_uring dies (i.e. sqo_task), we don't want
its internals like ->files and ->mm to be poked by the SQPOLL task, it
have never been nice and recently got racy. That can happen when the
owner undergoes destruction and SQPOLL tasks tries to submit new
requests in parallel, and so calls io_sq_thread_acquire*().
That patch halts SQPOLL submissions when sqo_task dies by introducing
sqo_dead flag. Once set, the SQPOLL task must not do any submission,
which is synchronised by uring_lock as well as the new flag.
The tricky part is to make sure that disabling always happens, that
means either the ring is discovered by creator's do_exit() -> cancel,
or if the final close() happens before it's done by the creator. The
last is guaranteed by the fact that for SQPOLL the creator task and only
it holds exactly one file note, so either it pins up to do_exit() or
removed by the creator on the final put in flush. (see comments in
uring_flush() around file->f_count == 2).
One more place that can trigger io_sq_thread_acquire_*() is
__io_req_task_submit(). Shoot off requests on sqo_dead there, even
though actually we don't need to. That's because cancellation of
sqo_task should wait for the request before going any further.
note 1: io_disable_sqo_submit() does io_ring_set_wakeup_flag() so the
caller would enter the ring to get an error, but it still doesn't
guarantee that the flag won't be cleared.
note 2: if final __userspace__ close happens not from the creator
task, the file note will pin the ring until the task dies.
Fixed: b1b6b5a30d ("kernel/io_uring: cancel io_uring before task works")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
files_cancel() should cancel all relevant requests and drop file notes,
so we should never have file notes after that, including on-exit fput
and flush. Add a WARN_ONCE to be sure.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We expect io_rw_reissue() to take place only during submission with
uring_lock held. Add a lockdep annotation to check that invariant.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET is set in incompat feature
set, it means the cache device is created with obsoleted layout with
obso_bucket_site_hi. Now bcache does not support this feature bit, a new
BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE incompat feature bit is added
for a better layout to support large bucket size.
For the legacy compatibility purpose, if a cache device created with
obsoleted BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit, all bcache
devices attached to this cache set should be set to read-only. Then the
dirty data can be written back to backing device before re-create the
cache device with BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE feature bit
by the latest bcache-tools.
This patch checks BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit
when running a cache set and attach a bcache device to the cache set. If
this bit is set,
- When run a cache set, print an error kernel message to indicate all
following attached bcache device will be read-only.
- When attach a bcache device, print an error kernel message to indicate
the attached bcache device will be read-only, and ask users to update
to latest bcache-tools.
Such change is only for cache device whose bucket size >= 32MB, this is
for the zoned SSD and almost nobody uses such large bucket size at this
moment. If you don't explicit set a large bucket size for a zoned SSD,
such change is totally transparent to your bcache device.
Fixes: ffa4703275 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When large bucket feature was added, BCH_FEATURE_INCOMPAT_LARGE_BUCKET
was introduced into the incompat feature set. It used bucket_size_hi
(which was added at the tail of struct cache_sb_disk) to extend current
16bit bucket size to 32bit with existing bucket_size in struct
cache_sb_disk.
This is not a good idea, there are two obvious problems,
- Bucket size is always value power of 2, if store log2(bucket size) in
existing bucket_size of struct cache_sb_disk, it is unnecessary to add
bucket_size_hi.
- Macro csum_set() assumes d[SB_JOURNAL_BUCKETS] is the last member in
struct cache_sb_disk, bucket_size_hi was added after d[] which makes
csum_set calculate an unexpected super block checksum.
To fix the above problems, this patch introduces a new incompat feature
bit BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE, when this bit is set, it
means bucket_size in struct cache_sb_disk stores the order of power-of-2
bucket size value. When user specifies a bucket size larger than 32768
sectors, BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE will be set to
incompat feature set, and bucket_size stores log2(bucket size) more
than store the real bucket size value.
The obsoleted BCH_FEATURE_INCOMPAT_LARGE_BUCKET won't be used anymore,
it is renamed to BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET and still only
recognized by kernel driver for legacy compatible purpose. The previous
bucket_size_hi is renmaed to obso_bucket_size_hi in struct cache_sb_disk
and not used in bcache-tools anymore.
For cache device created with BCH_FEATURE_INCOMPAT_LARGE_BUCKET feature,
bcache-tools and kernel driver still recognize the feature string and
display it as "obso_large_bucket".
With this change, the unnecessary extra space extend of bcache on-disk
super block can be avoided, and csum_set() may generate expected check
sum as well.
Fixes: ffa4703275 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch adds the check for features which is incompatible for
current supported feature sets.
Now if the bcache device created by bcache-tools has features that
current kernel doesn't support, read_super() will fail with error
messoage. E.g. if an unsupported incompatible feature detected,
bcache register will fail with dmesg "bcache: register_bcache() error :
Unsupported incompatible feature found".
Fixes: d721a43ff6 ("bcache: increase super block version for cache device and backing device")
Fixes: ffa4703275 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch fixes the following typos,
from BCH_FEATURE_COMPAT_SUUP to BCH_FEATURE_COMPAT_SUPP
from BCH_FEATURE_INCOMPAT_SUUP to BCH_FEATURE_INCOMPAT_SUPP
from BCH_FEATURE_INCOMPAT_SUUP to BCH_FEATURE_RO_COMPAT_SUPP
Fixes: d721a43ff6 ("bcache: increase super block version for cache device and backing device")
Fixes: ffa4703275 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is no need to reassign pdev_set_uuid in the second loop iteration,
so move it to the place before second loop.
Signed-off-by: Yi Li <yili@winhong.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It recently became apparent that the lack of a 'device_type = "pci"'
in the PCIe root complex node for rk3399 is a violation of the PCI
binding, as documented in IEEE Std 1275-1994. Changes to the kernel's
parsing of the DT made such violation fatal, as drivers cannot
probe the controller anymore.
Add the missing property makes the PCIe node compliant. While we
are at it, drop the pointless linux,pci-domain property, which only
makes sense when there are multiple host bridges.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200815125112.462652-3-maz@kernel.org
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
On Pinebook Pro laptops with an NVMe SSD installed, prevent random
crashes in the NVMe driver by not attempting to use a PCIe link speed
higher than that supported by the RK3399 SoC.
See commit 712fa17772 ("arm64: dts: rockchip: add max-link-speed for
rk3399").
Fixes: 5a65505a69 ("arm64: dts: rockchip: Add initial support for Pinebook Pro")
Signed-off-by: Simon South <simon@simonsouth.net>
Link: https://lore.kernel.org/r/20200930185627.5918-1-simon@simonsouth.net
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Jakub Kicinski says:
====================
net: fix issues around register_netdevice() failures
This series attempts to clean up the life cycle of struct
net_device. Dave has added dev->needs_free_netdev in the
past to fix double frees, we can lean on that mechanism
a little more to fix remaining issues with register_netdevice().
This is the next chapter of the saga which already includes:
commit 0e0eee2465 ("net: correct error path in rtnl_newlink()")
commit e51fb15231 ("rtnetlink: fix a memory leak when ->newlink fails")
commit cf124db566 ("net: Fix inconsistent teardown and release of private netdev state.")
commit 93ee31f14f ("[NET]: Fix free_netdev on register_netdev failure.")
commit 814152a89e ("net: fix memleak in register_netdevice()")
commit 10cc514f45 ("net: Fix null de-reference of device refcount")
The immediate problem which gets fixed here is that calling
free_netdev() right after unregister_netdevice() is illegal
because we need to release rtnl_lock first, to let the
unregistration finish. Note that unregister_netdevice() is
just a wrapper of unregister_netdevice_queue(), it only
does half of the job.
Where this limitation becomes most problematic is in failure
modes of register_netdevice(). There is a notifier call right
at the end of it, which lets other subsystems veto the entire
thing. At which point we should really go through a full
unregister_netdevice(), but we can't because callers may
go straight to free_netdev() after the failure, and that's
no bueno (see the previous paragraph).
This set makes free_netdev() more lenient, when device
is still being unregistered free_netdev() will simply set
dev->needs_free_netdev and let the unregister process do
the freeing.
With the free_netdev() problem out of the way failures in
register_netdevice() can make use of net_todo, again.
Users are still expected to call free_netdev() right after
failure but that will only set dev->needs_free_netdev.
To prevent the pathological case of:
dev->needs_free_netdev = true;
if (register_netdevice(dev)) {
rtnl_unlock();
free_netdev(dev);
}
make register_netdevice()'s failure clear dev->needs_free_netdev.
Problems described above are only present with register_netdevice() /
unregister_netdevice(). We have two parallel APIs for registration
of devices:
- those called outside rtnl_lock (register_netdev(), and
unregister_netdev());
- and those to be used under rtnl_lock - register_netdevice()
and unregister_netdevice().
The former is trivial and has no problems. The alternative
approach to fix the latter would be to also separate the
freeing functions - i.e. add free_netdevice(). This has been
implemented (incl. converting all relevant calls in the tree)
but it feels a little unnecessary to put the burden of choosing
the right free_netdev{,ice}() call on the programmer when we
can "just do the right thing" by default.
====================
Link: https://lore.kernel.org/r/20210106184007.1821480-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If register_netdevice() fails at the very last stage - the
notifier call - some subsystems may have already seen it and
grabbed a reference. struct net_device can't be freed right
away without calling netdev_wait_all_refs().
Now that we have a clean interface in form of dev->needs_free_netdev
and lenient free_netdev() we can undo what commit 93ee31f14f ("[NET]:
Fix free_netdev on register_netdev failure.") has done and complete
the unregistration path by bringing the net_set_todo() call back.
After registration fails user is still expected to explicitly
free the net_device, so make sure ->needs_free_netdev is cleared,
otherwise rolling back the registration will cause the old double
free for callers who release rtnl_lock before the free.
This also solves the problem of priv_destructor not being called
on notifier error.
net_set_todo() will be moved back into unregister_netdevice_queue()
in a follow up.
Reported-by: Hulk Robot <hulkci@huawei.com>
Reported-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are two flavors of handling netdev registration:
- ones called without holding rtnl_lock: register_netdev() and
unregister_netdev(); and
- those called with rtnl_lock held: register_netdevice() and
unregister_netdevice().
While the semantics of the former are pretty clear, the same can't
be said about the latter. The netdev_todo mechanism is utilized to
perform some of the device unregistering tasks and it hooks into
rtnl_unlock() so the locked variants can't actually finish the work.
In general free_netdev() does not mix well with locked calls. Most
drivers operating under rtnl_lock set dev->needs_free_netdev to true
and expect core to make the free_netdev() call some time later.
The part where this becomes most problematic is error paths. There is
no way to unwind the state cleanly after a call to register_netdevice(),
since unreg can't be performed fully without dropping locks.
Make free_netdev() more lenient, and defer the freeing if device
is being unregistered. This allows error paths to simply call
free_netdev() both after register_netdevice() failed, and after
a call to unregister_netdevice() but before dropping rtnl_lock.
Simplify the error paths which are currently doing gymnastics
around free_netdev() handling.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When setting up a channel bridge, ppp_bridge_channels sets the
pch->bridge field before taking the associated reference on the bridge
file instance.
This opens up a refcount underflow bug if ppp_bridge_channels called
via. iotcl runs concurrently with ppp_unbridge_channels executing via.
file release.
The bug is triggered by ppp_bridge_channels taking the error path
through the 'err_unset' label. In this scenario, pch->bridge is set,
but the reference on the bridged channel will not be taken because
the function errors out. If ppp_unbridge_channels observes pch->bridge
before it is unset by the error path, it will erroneously drop the
reference on the bridged channel and cause a refcount underflow.
To avoid this, ensure that ppp_bridge_channels holds a reference on
each channel in advance of setting the bridge pointers.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Fixes: 4cf476ced4 ("ppp: add PPPIOCBRIDGECHAN and PPPIOCUNBRIDGECHAN ioctls")
Acked-by: Guillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/20210107181315.3128-1-tparkin@katalix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
skbs in fraglist could be shared by a BPF filter loaded at TC. If TC
writes, it will call skb_ensure_writable -> pskb_expand_head to create
a private linear section for the head_skb. And then call
skb_clone_fraglist -> skb_get on each skb in the fraglist.
skb_segment_list overwrites part of the skb linear section of each
fragment itself. Even after skb_clone, the frag_skbs share their
linear section with their clone in PF_PACKET.
Both sk_receive_queue of PF_PACKET and PF_INET (or PF_INET6) can have
a link for the same frag_skbs chain. If a new skb (not frags) is
queued to one of the sk_receive_queue, multiple ptypes can see and
release this. It causes use-after-free.
[ 4443.426215] ------------[ cut here ]------------
[ 4443.426222] refcount_t: underflow; use-after-free.
[ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190
refcount_dec_and_test_checked+0xa4/0xc8
[ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO)
[ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8
[ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8
[ 4443.426808] Call trace:
[ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8
[ 4443.426823] skb_release_data+0x144/0x264
[ 4443.426828] kfree_skb+0x58/0xc4
[ 4443.426832] skb_queue_purge+0x64/0x9c
[ 4443.426844] packet_set_ring+0x5f0/0x820
[ 4443.426849] packet_setsockopt+0x5a4/0xcd0
[ 4443.426853] __sys_setsockopt+0x188/0x278
[ 4443.426858] __arm64_sys_setsockopt+0x28/0x38
[ 4443.426869] el0_svc_common+0xf0/0x1d0
[ 4443.426873] el0_svc_handler+0x74/0x98
[ 4443.426880] el0_svc+0x8/0xc
Fixes: 3a1296a38d (net: Support GRO/GSO fraglist chaining.)
Signed-off-by: Dongseok Yi <dseok.yi@samsung.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/1610072918-174177-1-git-send-email-dseok.yi@samsung.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
At the moment it is quite hard to identify the network interface
provided by IPA in userspace components: The network interface is
created as virtual device, without any link to the IPA device.
The interface name ("rmnet_ipa%d") is the only indication that the
network interface belongs to IPA, but this is not very reliable.
Add SET_NETDEV_DEV() to associate the network interface with the
IPA parent device. This allows userspace services like ModemManager
to properly identify that this network interface is provided by IPA
and belongs to the modem.
Cc: Alex Elder <elder@kernel.org>
Fixes: a646d6ec90 ("soc: qcom: ipa: modem and microcontroller")
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Link: https://lore.kernel.org/r/20210106100755.56800-1-stephan@gerhold.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull zonefs fix from Damien Le Moal:
"A single patch from Arnd to fix a missing dependency in zonefs
Kconfig"
* tag 'zonefs-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: select CONFIG_CRC32
Pull kunit fixes from Shuah Khan:
"One fix to force the use of the 'tty' console for UML.
Given that kunit tool requires the console output, explicitly stating
the dependency makes sense than relying on it being the default"
* tag 'linux-kselftest-kunit-fixes-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: tool: Force the use of the 'tty' console for UML
Pull kselftest fixes from Shuah Khan:
"Two minor fixes to vDSO test changes in this merge window"
* tag 'linux-kselftest-next-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/vDSO: fix -Wformat warning in vdso_test_correctness
selftests/vDSO: add additional binaries to .gitignore
Pull documentation fixes from Jonathan Corbet:
"A handful of relatively small documentation fixes"
* tag 'docs-5.11-3' of git://git.lwn.net/linux:
docs: admin-guide: bootconfig: Fix feils to fails
Documentation/admin-guide: kernel-parameters: hyphenate comma-separated
docs: binfmt-misc: Fix .rst formatting
docs: remove mention of ENABLE_MUST_CHECK
atomic: remove further references to atomic_ops
Documentation: doc-guide: fixes to sphinx.rst
docs/mm: concepts.rst: Correct the threshold to low watermark
Documentation: admin: early_param()s are also listed in kernel-parameters
docs: Fix reST markup when linking to sections
Pull device properties framework fixes from Rafael Wysocki:
"Revert a problematic commit that went in during the 5.10 cycle and
improve the kerneldoc description of the function affected by it (both
changes from Bard Liao)"
* tag 'devprop-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
device property: add description of fwnode cases
Revert "device property: Keep secondary firmware node secondary by type"
Pull ACPI fixes from Rafael Wysocki:
"These address two build issues and drop confusing text from a couple
of Kconfig entries.
Specifics:
- Drop two local variables that are never read and the code updating
their values from the x86 suspend-to-idle code (Rafael Wysocki)
- Add empty stub of an ACPI helper function to avoid build issues
when CONFIG_ACPI is not set (Shawn Guo)
- Remove confusing text regarding modules from Kconfig entries that
correspond to non-modular code (Peter Robinson)"
* tag 'acpi-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: Update Kconfig help text for items that are no longer modular
ACPI: scan: add stub acpi_create_platform_device() for !CONFIG_ACPI
ACPI: PM: s2idle: Drop unused local variables and related code
Pull power management fixes from Rafael Wysocki:
"These address two issues in the intel_pstate driver and one in the
powernow-k8 cpufreq driver.
Specifics:
- Make the powernow-k8 cpufreq driver avoid calling
cpufreq_cpu_get(), which theoretically may return NULL, to get a
policy pointer that is known to it already (Colin Ian King)
- Drop two functions that are not used any more from the intel_pstate
driver (Lukas Bulwahn)
- Make intel_pstate check the HWP capabilities to get the maximum
available P-state in the passive mode to avoid using a stale value
of it in case of out-of-band updates (Rafael Wysocki)"
* tag 'pm-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: remove obsolete functions
cpufreq: powernow-k8: pass policy rather than use cpufreq_cpu_get()
cpufreq: intel_pstate: Use HWP capabilities in intel_cpufreq_adjust_perf()
Pull drm fixes from Daniel Vetter:
"Looks like people are back from the break, usual small pile of fixes
all over. Next week Dave should be back.
The only thing pending I'm aware of is a "this shouldn't have become
uapi" reverts for amdgpu, but they're already on the list and not that
important really so can wait another week.
Summary:
- fix for ttm list corruption in radeon, reported by a few people
- fixes for amdgpu, i915, msm
- dma-buf use-after free fix"
* tag 'drm-fixes-2021-01-08' of git://anongit.freedesktop.org/drm/drm: (29 commits)
drm/msm: Only enable A6xx LLCC code on A6xx
drm/msm: Add modparam to allow vram carveout
drm/msm: Call msm_init_vram before binding the gpu
drm/msm/dp: postpone irq_hpd event during connection pending state
drm/ttm: unexport ttm_pool_init/fini
drm/radeon: stop re-init the TTM page pool
dmabuf: fix use-after-free of dmabuf's file->f_inode
Revert "drm/amd/display: Fix memory leaks in S3 resume"
drm/amdgpu/display: drop DCN support for aarch64
drm/amdgpu: enable ras eeprom support for sienna cichlid
drm/amdgpu: fix no bad_pages issue after umc ue injection
drm/amdgpu: fix potential memory leak during navi12 deinitialization
drm/amd/display: Fix unused variable warning
drm/amd/pm: improve the fine grain tuning function for RV/RV2/PCO
drm/amd/pm: fix the failure when change power profile for renoir
drm/amdgpu: fix a GPU hang issue when remove device
drm/amdgpu: fix a memory protection fault when remove amdgpu device
drm/amdgpu: switched to cached noretry setting for vangogh
drm/amd/display: fix sysfs amdgpu_current_backlight_pwm NULL pointer issue
drm/amd/pm: updated PM to I2C controller port on sienna cichlid
...
Pull kvm fixes from Paolo Bonzini:
"x86:
- Fixes for the new scalable MMU
- Fixes for migration of nested hypervisors on AMD
- Fix for clang integrated assembler
- Fix for left shift by 64 (UBSAN)
- Small cleanups
- Straggler SEV-ES patch
ARM:
- VM init cleanups
- PSCI relay cleanups
- Kill CONFIG_KVM_ARM_PMU
- Fixup __init annotations
- Fixup reg_to_encoding()
- Fix spurious PMCR_EL0 access
Misc:
- selftests cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (38 commits)
KVM: x86: __kvm_vcpu_halt can be static
KVM: SVM: Add support for booting APs in an SEV-ES guest
KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit
KVM: nSVM: mark vmcb as dirty when forcingly leaving the guest mode
KVM: nSVM: correctly restore nested_run_pending on migration
KVM: x86/mmu: Clarify TDP MMU page list invariants
KVM: x86/mmu: Ensure TDP MMU roots are freed after yield
kvm: check tlbs_dirty directly
KVM: x86: change in pv_eoi_get_pending() to make code more readable
MAINTAINERS: Really update email address for Sean Christopherson
KVM: x86: fix shift out of bounds reported by UBSAN
KVM: selftests: Implement perf_test_util more conventionally
KVM: selftests: Use vm_create_with_vcpus in create_vm
KVM: selftests: Factor out guest mode code
KVM/SVM: Remove leftover __svm_vcpu_run prototype from svm.c
KVM: SVM: Add register operand to vmsave call in sev_es_vcpu_load
KVM: x86/mmu: Optimize not-present/MMIO SPTE check in get_mmio_spte()
KVM: x86/mmu: Use raw level to index into MMIO walks' sptes array
KVM: x86/mmu: Get root level from walkers when retrieving MMIO SPTE
KVM: x86/mmu: Use -1 to flag an undefined spte in get_mmio_spte()
...
Pull iommu fixes from Will Deacon:
"This is mainly all Intel VT-D stuff, but there are some fixes for AMD
and ARM as well.
We've also got the revert I promised during the merge window, which
removes a temporary hack to accomodate i915 while we transitioned the
Intel IOMMU driver over to the common DMA-IOMMU API.
Finally, there are still a couple of other VT-D fixes floating around,
so I expect to send you another batch of fixes next week.
Summary:
- Fix VT-D TLB invalidation for subdevices
- Fix VT-D use-after-free on subdevice detach
- Fix VT-D locking so that IRQs are disabled during SVA bind/unbind
- Fix VT-D address alignment when flushing IOTLB
- Fix memory leak in VT-D IRQ remapping failure path
- Revert temporary i915 sglist hack now that it is no longer required
- Fix sporadic boot failure with Arm SMMU on Qualcomm SM8150
- Fix NULL dereference in AMD IRQ remapping code with remapping disabled
- Fix accidental enabling of irqs on AMD resume-from-suspend path
- Fix some typos in comments"
* tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
iommu/vt-d: Fix ineffective devTLB invalidation for subdevices
iommu/vt-d: Fix general protection fault in aux_detach_device()
iommu/vt-d: Move intel_iommu info from struct intel_svm to struct intel_svm_dev
iommu/arm-smmu-qcom: Initialize SCTLR of the bypass context
iommu/vt-d: Fix lockdep splat in sva bind()/unbind()
Revert "iommu: Add quirk for Intel graphic devices in map_sg"
iommu/vt-d: Fix misuse of ALIGN in qi_flush_piotlb()
iommu/amd: Stop irq_remapping_select() matching when remapping is disabled
iommu/amd: Set iommu->int_enabled consistently when interrupts are set up
iommu/intel: Fix memleak in intel_irq_remapping_alloc
iommu/iova: fix 'domain' typos
Pioneer devices have both playback and capture streams sharing the
same iface/altsetting, and those need to be paired as implicit
feedback. Instead of a half-baked (and broken) static quirk entry,
set up more generically for those devices by checking the number of
endpoints and the attribute of the secondary EP.
Fixes: bf6313a0ff ("ALSA: usb-audio: Refactor endpoint management")
Reported-by: František Kučera <konference@frantovo.cz>
Link: https://lore.kernel.org/r/20210108075219.21463-6-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
There are devices that have multiple endpoints sharing the same
iface/altset not only for sync but also for the actual streams, and
the audioformat for such an endpoint needs to be handled with the
proper endpoint index; otherwise it confuses the endpoint management.
This patch extends the audioformat to annotate the endpoint index, and
put the proper ep_idx=1 to Pioneer device quirk entries accordingly.
Fixes: bf6313a0ff ("ALSA: usb-audio: Refactor endpoint management")
Link: https://lore.kernel.org/r/20210108075219.21463-5-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The current endpoint handling assumed (more or less) a unique 1:1
relation between the endpoint and the iface/altset. The exception was
the sync EP without the implicit feedback which has usually the
secondary EP of the same altset. This works fine for most devices,
but it turned out that some unusual devices like Pinoeer's ones have
both playback and capture endpoints in the same iface/altsetting and
use both for the implicit feedback mode. For handling such a case, we
need to extend the endpoint management to take the shared interface
into account.
This patch does that: it adds a new object snd_usb_iface_ref for
managing the reference counts of the each USB interface that is used
by each endpoint. The interface setup is performed only once for the
(sharing) endpoints, and the doubly initialization is avoided.
Along with this, the resource release of endpoints and interface
refcounts are put into a single function, snd_usb_endpoint_free_all()
instead of looping in the caller side.
Fixes: bf6313a0ff ("ALSA: usb-audio: Refactor endpoint management")
Link: https://lore.kernel.org/r/20210108075219.21463-4-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The implicit feedback mode needs to handle two endpoints and the
choice of the audioformat object for the sync EP is important since
this determines the compatibility of the hw_params. The current code
uses the same audioformat object if both the main EP and the sync EP
point to the same iface/altsetting. This was done in consideration of
the non-implicit-fb sync EP handling, and it doesn't match well with
the cases where actually to endpoints are defined in the sameiface /
altsetting like a few Pioneer devices.
Modify snd_usb_find_implicit_fb_sync_format() to pick up the
audioformat that is assigned in the counter-part substreams primarily,
so that the actual capture stream can be opened properly. We keep the
same audioformat object only as a fallback in case nothing found,
though.
Fixes: 9fddc15e80 ("ALSA: usb-audio: Factor out the implicit feedback quirk code")
Link: https://lore.kernel.org/r/20210108075219.21463-3-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The recent change in the endpoint management moved the endpoint object
creation from the stream open time to the parser of the audio
descriptor. It works fine for the standard audio, but it overlooked
the other places that create audio streams via quirks
(QUIRK_AUDIO_FIXED_ENDPOINT) like the reported a few Pioneer devices;
those call snd_usb_add_audio_stream() manually, hence they miss the
endpoints, eventually resulting in the error at opening streams.
Moreover, now the sync EP setup was moved to the explicit call of
snd_usb_audioformat_set_sync_ep(), and this needs to be added for
those places, too.
This patch addresses those regressions for quirks. It adds a local
helper function add_audio_stream_from_fixed_fmt(), which does the all
needed tasks, and replaces the calls of snd_usb_add_audio_stream()
with this new function.
Fixes: 54cb31901b ("ALSA: usb-audio: Create endpoint objects at parsing phase")
Reported-by: František Kučera <konference@frantovo.cz>
Link: https://lore.kernel.org/r/20210108075219.21463-2-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull ARM SoC fixes from Arnd Bergmann:
"These are a small number of bug fixes that all came in before or
during the merge window, most for the omap platform:
- One boot regression fix for Nokia N9 (OMAP3).
- Two small defconfig changes for omap2, to reflect changes in
drivers
- Warning fixes for DT issues on omap2, picoxcell and bitmap SoCs.
The picoxcell platform will be removed in v5.12, but fixing it
first makes it easier to backport to the fix to stable kernels and
get a clean build with new dtc versions"
* tag 'arm-fixes-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: picoxcell: fix missing interrupt-parent properties
ARM: dts: ux500/golden: Set display max brightness
arm64: dts: bitmain: Use generic "ngpios" rather than "snps,nr-gpios"
ARM: omap2: pmic-cpcap: fix maximum voltage to be consistent with defaults on xt875
ARM: omap2plus_defconfig: enable SPI GPIO
ARM: OMAP2+: omap_device: fix idling of devices during probe
ARM: dts: OMAP3: disable AES on N950/N9
ARM: omap2plus_defconfig: drop unused POWER_AVS option
Pull arm64 fixes from Catalin Marinas:
- Clean-ups following the merging window: remove unused variable,
duplicate includes, superfluous barrier, move some inline asm to
separate functions.
- Disable top-byte-ignore on kernel code addresses with KASAN/MTE
enabled (already done when MTE is disabled).
- Fix ARCH_LOW_ADDRESS_LIMIT definition with CONFIG_ZONE_DMA disabled.
- Compiler/linker flags: link with "-z norelno", discard .eh_frame_hdr
instead of --no-eh-frame-hdr.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Move PSTATE.TCO setting to separate functions
arm64: kasan: Set TCR_EL1.TBID1 when KASAN_HW_TAGS is enabled
arm64: vdso: disable .eh_frame_hdr via /DISCARD/ instead of --no-eh-frame-hdr
arm64: traps: remove duplicate include statement
arm64: link with -z norelro for LLD or aarch64-elf
arm64: mm: Fix ARCH_LOW_ADDRESS_LIMIT when !CONFIG_ZONE_DMA
arm64: mte: remove an ISB on kernel exit
arm64/smp: Remove unused irq variable in arch_show_interrupts()
With external metadata device, flush requests are not passed down to the
data device.
Fix this by submitting the flush request in dm_integrity_flush_buffers. In
order to not degrade performance, we overlap the data device flush with
the metadata device flush.
Reported-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
There wasn't ever a real need to log an error in the kernel log for
ioctls issued with insufficient permissions. Simply return an error
and if an admin/user is sufficiently motivated they can enable DM's
dynamic debugging to see an explanation for why the ioctls were
disallowed.
Reported-by: Nir Soffer <nsoffer@redhat.com>
Fixes: e980f62353 ("dm: don't allow ioctls to targets that don't map to whole devices")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pull more networking fixes from Jakub Kicinski:
"Slightly lighter pull request to get back into the Thursday cadence.
Current release - always broken:
- can: mcp251xfd: fix Tx/Rx ring buffer driver race conditions
- dsa: hellcreek: fix led_classdev build errors
Previous releases - regressions:
- ipv6: fib: flush exceptions when purging route to avoid netdev
reference leak
- ip_tunnels: fix pmtu check in nopmtudisc mode
- ip: always refragment ip defragmented packets to avoid MTU issues
when forwarding through tunnels, correct "packet too big" message
is prohibitively tricky to generate
- s390/qeth: fix locking for discipline setup / removal and during
recovery to prevent both deadlocks and races
- mlx5: Use port_num 1 instead of 0 when delete a RoCE address
Previous releases - always broken:
- cdc_ncm: correct overhead calculation in delayed_ndp_size to
prevent out of bound accesses with Huawei 909s-120 LTE module
- fix stmmac dwmac-sun8i suspend/resume:
- PHY being left powered off
- MAC syscon configuration being reset
- reference to the reset controller being improperly dropped
- qrtr: fix null-ptr-deref in qrtr_ns_remove
- can: tcan4x5x: fix bittiming const, use common bittiming from m_can
driver
- mlx5e: CT: Use per flow counter when CT flow accounting is enabled
- mlx5e: Fix SWP offsets when vlan inserted by driver
Misc:
- bpf: Fix a task_iter bug caused by a bpf -> net merge conflict
resolution
And the usual many fixes to various error paths"
* tag 'net-5.11-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
net: dsa: lantiq_gswip: Exclude RMII from modes that report 1 GbE
s390/qeth: fix L2 header access in qeth_l3_osa_features_check()
s390/qeth: fix locking for discipline setup / removal
s390/qeth: fix deadlock during recovery
selftests: fib_nexthops: Fix wrong mausezahn invocation
nexthop: Bounce NHA_GATEWAY in FDB nexthop groups
nexthop: Unlink nexthop group entry in error path
nexthop: Fix off-by-one error in error path
octeontx2-af: fix memory leak of lmac and lmac->name
chtls: Fix chtls resources release sequence
chtls: Added a check to avoid NULL pointer dereference
chtls: Replace skb_dequeue with skb_peek
chtls: Avoid unnecessary freeing of oreq pointer
chtls: Fix panic when route to peer not configured
chtls: Remove invalid set_tcb call
chtls: Fix hardware tid leak
net: ip: always refragment ip defragmented packets
net: fix pmtu check in nopmtudisc mode
selftests: netfilter: add selftest for ipip pmtu discovery with enabled connection tracking
docs: octeontx2: tune rst markup
...
Pull crypto fixes from Herbert Xu:
"This fixes a functional bug in arm/chacha-neon as well as a potential
buffer overflow in ecdh"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ecdh - avoid buffer overflow in ecdh_set_secret()
crypto: arm/chacha-neon - add missing counter increment
The kernel test robot reported a -5.8% performance regression on the
"poll2" test of will-it-scale, and bisected it to commit d55564cfc2
("x86: Make __put_user() generate an out-of-line call").
I didn't expect an out-of-line __put_user() to matter, because no normal
core code should use that non-checking legacy version of user access any
more. But I had overlooked the very odd poll() usage, which does a
__put_user() to update the 'revents' values of the poll array.
Now, Al Viro correctly points out that instead of updating just the
'revents' field, it would be much simpler to just copy the _whole_
pollfd entry, and then we could just use "copy_to_user()" on the whole
array of entries, the same way we use "copy_from_user()" a few lines
earlier to get the original values.
But that is not what we've traditionally done, and I worry that threaded
applications might be concurrently modifying the other fields of the
pollfd array. So while Al's suggestion is simpler - and perhaps worth
trying in the future - this instead keeps the "just update revents"
model.
To fix the performance regression, use the modern "unsafe_put_user()"
instead of __put_user(), with the proper "user_write_access_begin()"
guarding in place. This improves code generation enormously.
Link: https://lore.kernel.org/lkml/20210107134723.GA28532@xsang-OptiPlex-9020/
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Laight <David.Laight@aculab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 757055ae8d.
The commit caused that ttynull was used as the default console
on several systems[1][2][3]. As a result, the console was
blank even when a better alternative existed.
It happened when there was no console configured
on the command line and ttynull_init() was the first initcall
calling register_console().
Or it happened when /dev/ did not exist when console_on_rootfs()
was called. It was not able to open /dev/console even though
a console driver was registered. It tried to add ttynull console
but it obviously did not help. But ttynull became the preferred
console and was used by /dev/console when it was available later.
The commit tried to fix a historical problem that have been there
for ages. The primary motivation was the commit 3cffa06aee
("printk/console: Allow to disable console output by using console=""
or console=null"). It provided a clean solution for a workaround
that was widely used and worked only by chance.
This revert causes that the console="" or console=null command line
options will again work only by chance. These options will cause that
a particular console will be preferred and the default (tty) ones
will not get enabled. There will be no console registered at
all. As a result there won't be stdin, stdout, and stderr for
the init process. But it worked exactly this way even before.
The proper solution has to fulfill many conditions:
+ Register ttynull only when explicitly required or as
the ultimate fallback.
+ ttynull should get associated with /dev/console but it must
not become preferred console when used as a fallback.
Especially, it must still be possible to replace it
by a better console later.
Such a change requires clean up of the register_console() code.
Otherwise, it would be even harder to follow. Especially, the use
of has_preferred_console and CON_CONSDEV flag is tricky. The clean
up is risky. The ordering of consoles is not well defined. And
any changes tend to break existing user settings.
Do the revert at the least risky solution for now.
[1] https://lore.kernel.org/linux-kselftest/20201221144302.GR4077@smile.fi.intel.com/
[2] https://lore.kernel.org/lkml/d2a3b3c0-e548-7dd1-730f-59bc5c04e191@synopsys.com/
[3] https://patchwork.ozlabs.org/project/linux-um/patch/20210105120128.10854-1-thomas@m3y3r.de/
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: Vineet Gupta <vgupta@synopsys.com>
Reported-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The final step in regulator_register() is to call
regulator_resolve_supply() for each registered regulator
(including the one in the process of being registered). The
regulator_resolve_supply() function first checks if rdev->supply
is NULL, then it performs various steps to try to find the supply.
If successful, rdev->supply is set inside of set_supply().
This procedure can encounter a race condition if two concurrent
tasks call regulator_register() near to each other on separate CPUs
and one of the regulators has rdev->supply_name specified. There
is currently nothing guaranteeing atomicity between the rdev->supply
check and set steps. Thus, both tasks can observe rdev->supply==NULL
in their regulator_resolve_supply() calls. This then results in
both creating a struct regulator for the supply. One ends up
actually stored in rdev->supply and the other is lost (though still
present in the supply's consumer_list).
Here is a kernel log snippet showing the issue:
[ 12.421768] gpu_cc_gx_gdsc: supplied by pm8350_s5_level
[ 12.425854] gpu_cc_gx_gdsc: supplied by pm8350_s5_level
[ 12.429064] debugfs: Directory 'regulator.4-SUPPLY' with parent
'17a00000.rsc:rpmh-regulator-gfxlvl-pm8350_s5_level'
already present!
Avoid this race condition by holding the rdev->mutex lock inside
of regulator_resolve_supply() while checking and setting
rdev->supply.
Signed-off-by: David Collins <collinsd@codeaurora.org>
Link: https://lore.kernel.org/r/1610068562-4410-1-git-send-email-collinsd@codeaurora.org
Signed-off-by: Mark Brown <broonie@kernel.org>
* acpi-scan:
ACPI: scan: add stub acpi_create_platform_device() for !CONFIG_ACPI
* acpi-misc:
ACPI: Update Kconfig help text for items that are no longer modular
Song reported a boot regression in a kvm image with 5.11-rc, and bisected
it down to the below patch. Debugging this issue, turns out that the boot
stalled when a task is waiting on a pipe being released. As we no longer
run task_work from get_signal() unless it's queued with TWA_SIGNAL, the
task goes idle without running the task_work. This prevents ->release()
from being called on the pipe, which another boot task is waiting on.
For now, re-instate the unconditional task_work run from get_signal().
For 5.12, we'll collapse TWA_RESUME and TWA_SIGNAL, as it no longer
makes sense to have a distinction between the two. This will turn
task_work notification into a simple boolean, whether to notify or not.
Fixes: 98b89b649f ("signal: kill JOBCTL_TASK_WORK")
Reported-by: Song Liu <songliubraving@fb.com>
Tested-by: John Stultz <john.stultz@linaro.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang version 11.0.1
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Battery status is being reported for the Elan touchscreen on ASUS
UX550 laptops despite not having a batter. It always shows either 0 or
1%.
Signed-off-by: Seth Miller <miller.seth@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The current check of nvec < minvec for nvec returned from
platform_irq_count() will not detect a negative error code in nvec.
This is because minvec is unsigned, and, as such, nvec is promoted to
unsigned in that check, which will make it a huge number (if it contained
-EPROBE_DEFER).
In practice, an error should not occur in nvec for the only in-tree
user, but add a check anyway.
Fixes: e15f2fa959 ("driver core: platform: Add devm_platform_get_irqs_affinity()")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1608561055-231244-1-git-send-email-john.garry@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 38d715f494 ("btrfs: use btrfs_start_delalloc_roots in
shrink_delalloc") cleaned up how we do delalloc shrinking by utilizing
some infrastructure we have in place to flush inodes that we use for
device replace and snapshot. However this introduced a pretty serious
performance regression. To reproduce the user untarred the source
tarball of Firefox (360MiB xz compressed/1.5GiB uncompressed), and would
see it take anywhere from 5 to 20 times as long to untar in 5.10
compared to 5.9. This was observed on fast devices (SSD and better) and
not on HDD.
The root cause is because before we would generally use the normal
writeback path to reclaim delalloc space, and for this we would provide
it with the number of pages we wanted to flush. The referenced commit
changed this to flush that many inodes, which drastically increased the
amount of space we were flushing in certain cases, which severely
affected performance.
We cannot revert this patch unfortunately because of 3d45f221ce
("btrfs: fix deadlock when cloning inline extent and low on free
metadata space") which requires the ability to skip flushing inodes that
are being cloned in certain scenarios, which means we need to keep using
our flushing infrastructure or risk re-introducing the deadlock.
Instead to fix this problem we can go back to providing
btrfs_start_delalloc_roots with a number of pages to flush, and then set
up a writeback_control and utilize sync_inode() to handle the flushing
for us. This gives us the same behavior we had prior to the fix, while
still allowing us to avoid the deadlock that was fixed by Filipe. I
redid the users original test and got the following results on one of
our test machines (256GiB of ram, 56 cores, 2TiB Intel NVMe drive)
5.9 0m54.258s
5.10 1m26.212s
5.10+patch 0m38.800s
5.10+patch is significantly faster than plain 5.9 because of my patch
series "Change data reservations to use the ticketing infra" which
contained the patch that introduced the regression, but generally
improved the overall ENOSPC flushing mechanisms.
Additional testing on consumer-grade SSD (8GiB ram, 8 CPU) confirm
the results:
5.10.5 4m00s
5.10.5+patch 1m08s
5.11-rc2 5m14s
5.11-rc2+patch 1m30s
Reported-by: René Rebe <rene@exactcode.de>
Fixes: 38d715f494 ("btrfs: use btrfs_start_delalloc_roots in shrink_delalloc")
CC: stable@vger.kernel.org # 5.10
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Tested-by: David Sterba <dsterba@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add my test results ]
Signed-off-by: David Sterba <dsterba@suse.com>
There will be memory leak if driver probe failed. Trace as below:
backtrace:
[<000000002415258f>] kmemleak_alloc+0x3c/0x50
[<00000000f447ebe4>] __kmalloc+0x208/0x530
[<0000000048bc7b3a>] of_dma_get_range+0xe4/0x1b0
[<0000000041e39065>] of_dma_configure_id+0x58/0x27c
[<000000006356866a>] platform_dma_configure+0x2c/0x40
......
[<000000000afcf9b5>] ret_from_fork+0x10/0x3c
This issue is introduced by commit e0d072782c73("dma-mapping:
introduce DMA range map, supplanting dma_pfn_offset "). It doesn't
free dma_range_map when driver probe failed and cause above
memory leak. So, add code to free it in error path.
Fixes: e0d072782c ("dma-mapping: introduce DMA range map, supplanting dma_pfn_offset ")
Cc: stable@vger.kernel.org
Signed-off-by: Meng Li <Meng.Li@windriver.com>
Link: https://lore.kernel.org/r/20210105070927.14968-1-Meng.Li@windriver.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Showing the hctx flags for when BLK_MQ_F_TAG_HCTX_SHARED is set gives
something like:
root@debian:/home/john# more /sys/kernel/debug/block/sda/hctx0/flags
alloc_policy=FIFO SHOULD_MERGE|TAG_QUEUE_SHARED|3
Add the decoding for that flag.
Fixes: 32bc15afed ("blk-mq: Facilitate a shared sbitmap per tagset")
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We had kernel panic, it is caused by unload module and last
close confirmation.
call trace:
[1196029.743127] free_sess+0x15/0x50 [rtrs_client]
[1196029.743128] rtrs_clt_close+0x4c/0x70 [rtrs_client]
[1196029.743129] ? rnbd_clt_unmap_device+0x1b0/0x1b0 [rnbd_client]
[1196029.743130] close_rtrs+0x25/0x50 [rnbd_client]
[1196029.743131] rnbd_client_exit+0x93/0xb99 [rnbd_client]
[1196029.743132] __x64_sys_delete_module+0x190/0x260
And in the crashdump confirmation kworker is also running.
PID: 6943 TASK: ffff9e2ac8098000 CPU: 4 COMMAND: "kworker/4:2"
#0 [ffffb206cf337c30] __schedule at ffffffff9f93f891
#1 [ffffb206cf337cc8] schedule at ffffffff9f93fe98
#2 [ffffb206cf337cd0] schedule_timeout at ffffffff9f943938
#3 [ffffb206cf337d50] wait_for_completion at ffffffff9f9410a7
#4 [ffffb206cf337da0] __flush_work at ffffffff9f08ce0e
#5 [ffffb206cf337e20] rtrs_clt_close_conns at ffffffffc0d5f668 [rtrs_client]
#6 [ffffb206cf337e48] rtrs_clt_close at ffffffffc0d5f801 [rtrs_client]
#7 [ffffb206cf337e68] close_rtrs at ffffffffc0d26255 [rnbd_client]
#8 [ffffb206cf337e78] free_sess at ffffffffc0d262ad [rnbd_client]
#9 [ffffb206cf337e88] rnbd_clt_put_dev at ffffffffc0d266a7 [rnbd_client]
The problem is both code path try to close same session, which lead to
panic.
To fix it, just skip the sess if the refcount already drop to 0.
Fixes: f7a7a5c228 ("block/rnbd: client: main functionality")
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
lkp reboot following build error:
drivers/block/rnbd/rnbd-clt.c: In function 'rnbd_softirq_done_fn':
>> drivers/block/rnbd/rnbd-clt.c:387:2: error: implicit declaration of function 'sg_free_table_chained' [-Werror=implicit-function-declaration]
387 | sg_free_table_chained(&iu->sgt, RNBD_INLINE_SG_CNT);
| ^~~~~~~~~~~~~~~~~~~~~
The reason is CONFIG_SG_POOL is not enabled in the config, to
avoid such failure, select SG_POOL in Kconfig for RNBD_CLIENT.
Fixes: 5a1328d0c3 ("block/rnbd-clt: Dynamically allocate sglist for rnbd_iu")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tested. The device gets correctly exported to userspace and I can see
mouse and keyboard events.
Signed-off-by: Filipe Laíns <lains@archlinux.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Sound is broken on the DragonBoard 410c (apq8016_sbc) since 5.10:
hdmi-audio-codec hdmi-audio-codec.1.auto: ASoC: error at snd_soc_component_set_jack on hdmi-audio-codec.1.auto: -95
qcom-apq8016-sbc 7702000.sound: Failed to set jack: -95
ADV7533: ASoC: error at snd_soc_link_init on ADV7533: -95
hdmi-audio-codec hdmi-audio-codec.1.auto: ASoC: error at snd_soc_component_set_jack on hdmi-audio-codec.1.auto: -95
qcom-apq8016-sbc: probe of 7702000.sound failed with error -95
This happens because apq8016_sbc calls snd_soc_component_set_jack() on
all codec DAIs and attempts to ignore failures with return code -ENOTSUPP.
-ENOTSUPP is also excluded from error logging in soc_component_ret().
However, hdmi_codec_set_jack() returns -E*OP*NOTSUPP if jack detection
is not supported, which is not handled in apq8016_sbc and soc_component_ret().
Make it return -ENOTSUPP instead to fix sound and silence the errors.
Cc: Cheng-Yi Chiang <cychiang@chromium.org>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Fixes: 55c5cc63ab ("ASoC: hdmi-codec: Use set_jack ops to set jack")
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Acked-by: Nicolin Chen <nicoleotsuka@gmail.com>
Link: https://lore.kernel.org/r/20210107165131.2535-1-stephan@gerhold.net
Signed-off-by: Mark Brown <broonie@kernel.org>
Currently, when moving a task to a resource group the PQR_ASSOC MSR is
updated with the new closid and rmid in an added task callback. If the
task is running, the work is run as soon as possible. If the task is not
running, the work is executed later in the kernel exit path when the
kernel returns to the task again.
Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
is running is the right thing to do. Queueing work for a task that is
not running is unnecessary (the PQR_ASSOC MSR is already updated when
the task is scheduled in) and causing system resource waste with the way
in which it is implemented: Work to update the PQR_ASSOC register is
queued every time the user writes a task id to the "tasks" file, even if
the task already belongs to the resource group.
This could result in multiple pending work items associated with a
single task even if they are all identical and even though only a single
update with most recent values is needed. Specifically, even if a task
is moved between different resource groups while it is sleeping then it
is only the last move that is relevant but yet a work item is queued
during each move.
This unnecessary queueing of work items could result in significant
system resource waste, especially on tasks sleeping for a long time.
For example, as demonstrated by Shakeel Butt in [1] writing the same
task id to the "tasks" file can quickly consume significant memory. The
same problem (wasted system resources) occurs when moving a task between
different resource groups.
As pointed out by Valentin Schneider in [2] there is an additional issue
with the way in which the queueing of work is done in that the task_struct
update is currently done after the work is queued, resulting in a race with
the register update possibly done before the data needed by the update is
available.
To solve these issues, update the PQR_ASSOC MSR in a synchronous way
right after the new closid and rmid are ready during the task movement,
only if the task is running. If a moved task is not running nothing
is done since the PQR_ASSOC MSR will be updated next time the task is
scheduled. This is the same way used to update the register when tasks
are moved as part of resource group removal.
[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/
[2] https://lore.kernel.org/lkml/20201123022433.17905-1-valentin.schneider@arm.com
[ bp: Massage commit message and drop the two update_task_closid_rmid()
variants. ]
Fixes: e02737d5b8 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <shakeelb@google.com>
Reported-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/17aa2fb38fc12ce7bb710106b3e7c7b45acb9e94.1608243147.git.reinette.chatre@intel.com
The driver was converted to use the crypto engine helper
but is missing the corresponding Kconfig statement to ensure
it is available:
arm-linux-gnueabi-ld: drivers/crypto/omap-sham.o: in function `omap_sham_probe':
omap-sham.c:(.text+0x374): undefined reference to `crypto_engine_alloc_init'
arm-linux-gnueabi-ld: omap-sham.c:(.text+0x384): undefined reference to `crypto_engine_start'
arm-linux-gnueabi-ld: omap-sham.c:(.text+0x510): undefined reference to `crypto_engine_exit'
arm-linux-gnueabi-ld: drivers/crypto/omap-sham.o: in function `omap_sham_finish_req':
omap-sham.c:(.text+0x98c): undefined reference to `crypto_finalize_hash_request'
arm-linux-gnueabi-ld: omap-sham.c:(.text+0x9a0): undefined reference to `crypto_transfer_hash_request_to_engine'
arm-linux-gnueabi-ld: drivers/crypto/omap-sham.o: in function `omap_sham_update':
omap-sham.c:(.text+0xf24): undefined reference to `crypto_transfer_hash_request_to_engine'
arm-linux-gnueabi-ld: drivers/crypto/omap-sham.o: in function `omap_sham_final':
omap-sham.c:(.text+0x1020): undefined reference to `crypto_transfer_hash_request_to_engine'
Fixes: 133c3d434d ("crypto: omap-sham - convert to use crypto engine")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
bdev_evict_inode and bdev_free_inode are also called for the root inode
of bdevfs, for which bdev_alloc is never called. Move the zeroing o
f struct block_device and the initialization of the bd_bdi field into
bdev_alloc_inode to make sure they are initialized for the root inode
as well.
Fixes: e6cb53827e ("block: initialize struct block_device in bdev_alloc")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When non-fatal error like line-reset happens, ufshcd_err_handler() starts
to abort tasks by ufshcd_try_to_abort_task(). When it tries to issue a task
management request, we hit two warnings:
WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82 blk_mq_get_tag+0x438/0x46c
After fixing the above warnings we hit another tm_cmd timeout which may be
caused by unstable controller state:
__ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
Then, ufshcd_err_handler() enters full reset, and kernel gets stuck. It
turned out ufshcd_print_trs() printed too many messages on console which
requires CPU locks. Likewise hba->silence_err_logs, we need to avoid too
verbose messages. This is actually not an error case.
Link: https://lore.kernel.org/r/20210107185316.788815-3-jaegeuk@kernel.org
Fixes: 69a6c269c0 ("scsi: ufs: Use blk_{get,put}_request() to allocate and free TMFs")
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When gate_work/ungate_work experience an error during hibern8_enter or exit
we can livelock:
ufshcd_err_handler()
ufshcd_scsi_block_requests()
ufshcd_reset_and_restore()
ufshcd_clear_ua_wluns() -> stuck
ufshcd_scsi_unblock_requests()
In order to avoid this, ufshcd_clear_ua_wluns() can be called per recovery
flows such as suspend/resume, link_recovery, and error_handler.
Link: https://lore.kernel.org/r/20210107185316.788815-2-jaegeuk@kernel.org
Fixes: 1918651f2d ("scsi: ufs: Clear UAC for RPMB after ufshcd resets")
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 2aa0102c66 ("scsi: ibmvfc: Use correlation token to tag commands")
sets the vfcFrame correlation token to the pointer handle of the associated
ibmvfc_event. However, that commit failed to cast the pointer to an
appropriate type which in this case is a u64. As such sparse warnings are
generated for both correlation token assignments.
ibmvfc.c:2375:36: sparse: incorrect type in argument 1 (different base types)
ibmvfc.c:2375:36: sparse: expected unsigned long long [usertype] val
ibmvfc.c:2375:36: sparse: got struct ibmvfc_event *[assigned] evt
Add the appropriate u64 casts when assigning an ibmvfc_event as a
correlation token.
Link: https://lore.kernel.org/r/20210106203721.1054693-1-tyreld@linux.ibm.com
Fixes: 2aa0102c66 ("scsi: ibmvfc: Use correlation token to tag commands")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Phil Oester reported that a fix for a possible buffer overrun that I sent
caused a regression that manifests in this output:
Event Message: A PCI parity error was detected on a component at bus 0 device 5 function 0.
Severity: Critical
Message ID: PCI1308
The original code tried to handle the sense data pointer differently when
using 32-bit 64-bit DMA addressing, which would lead to a 32-bit dma_addr_t
value of 0x11223344 to get stored
32-bit kernel: 44 33 22 11 ?? ?? ?? ??
64-bit LE kernel: 44 33 22 11 00 00 00 00
64-bit BE kernel: 00 00 00 00 44 33 22 11
or a 64-bit dma_addr_t value of 0x1122334455667788 to get stored as
32-bit kernel: 88 77 66 55 ?? ?? ?? ??
64-bit kernel: 88 77 66 55 44 33 22 11
In my patch, I tried to ensure that the same value is used on both 32-bit
and 64-bit kernels, and picked what seemed to be the most sensible
combination, storing 32-bit addresses in the first four bytes (as 32-bit
kernels already did), and 64-bit addresses in eight consecutive bytes (as
64-bit kernels already did), but evidently this was incorrect.
Always storing the dma_addr_t pointer as 64-bit little-endian,
i.e. initializing the second four bytes to zero in case of 32-bit
addressing, apparently solved the problem for Phil, and is consistent with
what all 64-bit little-endian machines did before.
I also checked in the history that in previous versions of the code, the
pointer was always in the first four bytes without padding, and that
previous attempts to fix 64-bit user space, big-endian architectures and
64-bit DMA were clearly flawed and seem to have introduced made this worse.
Link: https://lore.kernel.org/r/20210104234137.438275-1-arnd@kernel.org
Fixes: 381d34e376 ("scsi: megaraid_sas: Check user-provided offsets")
Fixes: 107a60dd71 ("scsi: megaraid_sas: Add support for 64bit consistent DMA")
Fixes: 94cd65ddf4 ("[SCSI] megaraid_sas: addded support for big endian architecture")
Fixes: 7b2519afa1 ("[SCSI] megaraid_sas: fix 64 bit sense pointer truncation")
Reported-by: Phil Oester <kernel@linuxace.com>
Tested-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Saeed Mahameed says:
====================
mlx5 fixes 2021-01-07
* tag 'mlx5-fixes-2021-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Fix memleak in mlx5e_create_l2_table_groups
net/mlx5e: Fix two double free cases
net/mlx5: Release devlink object if adev fails
net/mlx5e: ethtool, Fix restriction of autoneg with 56G
net/mlx5e: In skb build skip setting mark in switchdev mode
net/mlx5: E-Switch, fix changing vf VLANID
net/mlx5e: Fix SWP offsets when vlan inserted by driver
net/mlx5e: CT: Use per flow counter when CT flow accounting is enabled
net/mlx5: Use port_num 1 instead of 0 when delete a RoCE address
net/mlx5e: Add missing capability check for uplink follow
net/mlx5: Check if lag is supported before creating one
====================
Link: https://lore.kernel.org/r/20210107202845.470205-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Julian Wiedmann says:
====================
s390/qeth: fixes 2021-01-07
This brings two locking fixes for the device control path.
Also one fix for a path where our .ndo_features_check() attempts to
access a non-existent L2 header.
====================
Link: https://lore.kernel.org/r/20210107172442.1737-1-jwi@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ip_finish_output_gso() may call .ndo_features_check() even before the
skb has a L2 header. This conflicts with qeth_get_ip_version()'s attempt
to inspect the L2 header via vlan_eth_hdr().
Switch to vlan_get_protocol(), as already used further down in the
common qeth_features_check() path.
Fixes: f13ade1993 ("s390/qeth: run non-offload L3 traffic over common xmit path")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Due to insufficient locking, qeth_core_set_online() and
qeth_dev_layer2_store() can run in parallel, both attempting to load &
setup the discipline (and stepping on each other toes along the way).
A similar race can also occur between qeth_core_remove_device() and
qeth_dev_layer2_store().
Access to .discipline is meant to be protected by the discipline_mutex,
so add/expand the locking in qeth_core_remove_device() and
qeth_core_set_online().
Adjust the locking in qeth_l*_remove_device() accordingly, as it's now
handled by the callers in a consistent manner.
Based on an initial patch by Ursula Braun.
Fixes: 9dc48ccc68 ("qeth: serialize sysfs-triggered device configurations")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When qeth_dev_layer2_store() - holding the discipline_mutex - waits
inside qeth_l*_remove_device() for a qeth_do_reset() thread to complete,
we can hit a deadlock if qeth_do_reset() concurrently calls
qeth_set_online() and thus tries to aquire the discipline_mutex.
Move the discipline_mutex locking outside of qeth_set_online() and
qeth_set_offline(), and turn the discipline into a parameter so that
callers understand the dependency.
To fix the deadlock, we can now relax the locking:
As already established, qeth_l*_remove_device() waits for
qeth_do_reset() to complete. So qeth_do_reset() itself is under no risk
of having card->discipline ripped out while it's running, and thus
doesn't need to take the discipline_mutex.
Fixes: 9dc48ccc68 ("qeth: serialize sysfs-triggered device configurations")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel says:
====================
nexthop: Various fixes
This series contains various fixes for the nexthop code. The bugs were
uncovered during the development of resilient nexthop groups.
Patches #1-#2 fix the error path of nexthop_create_group(). I was not
able to trigger these bugs with current code, but it is possible with
the upcoming resilient nexthop groups code which adds a user
controllable memory allocation further in the function.
Patch #3 fixes wrong validation of netlink attributes.
Patch #4 fixes wrong invocation of mausezahn in a selftest.
====================
Link: https://lore.kernel.org/r/20210107144824.1135691-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For IPv6 traffic, mausezahn needs to be invoked with '-6'. Otherwise an
error is returned:
# ip netns exec me mausezahn veth1 -B 2001:db8:101::2 -A 2001:db8:91::1 -c 0 -t tcp "dp=1-1023, flags=syn"
Failed to set source IPv4 address. Please check if source is set to a valid IPv4 address.
Invalid command line parameters!
Fixes: 7c741868ce ("selftests: Add torture tests to nexthop tests")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The function nh_check_attr_group() is called to validate nexthop groups.
The intention of that code seems to have been to bounce all attributes
above NHA_GROUP_TYPE except for NHA_FDB. However instead it bounces all
these attributes except when NHA_FDB attribute is present--then it accepts
them.
NHA_FDB validation that takes place before, in rtm_to_nh_config(), already
bounces NHA_OIF, NHA_BLACKHOLE, NHA_ENCAP and NHA_ENCAP_TYPE. Yet further
back, NHA_GROUPS and NHA_MASTER are bounced unconditionally.
But that still leaves NHA_GATEWAY as an attribute that would be accepted in
FDB nexthop groups (with no meaning), so long as it keeps the address
family as unspecified:
# ip nexthop add id 1 fdb via 127.0.0.1
# ip nexthop add id 10 fdb via default group 1
The nexthop code is still relatively new and likely not used very broadly,
and the FDB bits are newer still. Even though there is a reproducer out
there, it relies on an improbable gateway arguments "via default", "via
all" or "via any". Given all this, I believe it is OK to reformulate the
condition to do the right thing and bounce NHA_GATEWAY.
Fixes: 38428d6871 ("nexthop: support for fdb ecmp nexthops")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case of error, remove the nexthop group entry from the list to which
it was previously added.
Fixes: 430a049190 ("nexthop: Add support for nexthop groups")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A reference was not taken for the current nexthop entry, so do not try
to put it in the error path.
Fixes: 430a049190 ("nexthop: Add support for nexthop groups")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When running is M-Mode (no MMU config), MPIE does not get set. This
results in all syscalls being executed with interrupts disabled as
handle_exception never sets SR_IE as it always sees SR_PIE being
cleared. Fix this by always force enabling interrupts in
handle_syscall when CONFIG_RISCV_M_MODE is enabled.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Setup the port uartclk in sifive_serial_probe() so that the base baud
rate is correctly printed during device probe instead of always showing
"0". I.e. the probe message is changed from
38000000.serial: ttySIF0 at MMIO 0x38000000 (irq = 1,
base_baud = 0) is a SiFive UART v0
to the correct:
38000000.serial: ttySIF0 at MMIO 0x38000000 (irq = 1,
base_baud = 115200) is a SiFive UART v0
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
If of_clk_init() is not called in time_init(), clock providers defined
in the system device tree are not initialized, resulting in failures for
other devices to initialize due to missing clocks.
Similarly to other architectures and to the default kernel time_init()
implementation, call of_clk_init() before executing timer_probe() in
time_init().
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Properly return -ENOSYS for syscall -1 instead of leaving the return value
uninitialized. This fixes the strace teststuite.
Fixes: 5340627e3f ("riscv: add support for SECCOMP and SECCOMP_FILTER")
Cc: stable@vger.kernel.org
Signed-off-by: Andreas Schwab <schwab@suse.de>
Reviewed-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
CPL_ABORT_RPL is sent after releasing the resources by calling
chtls_release_resources(sk); and chtls_conn_done(sk);
eventually causing kernel panic. Fixing it by calling release
in appropriate order.
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case of server removal lookup_stid() may return NULL pointer, which
is used as listen_ctx. So added a check before accessing this pointer.
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The skb is unlinked twice, one in __skb_dequeue in function
chtls_reset_synq() and another in cleanup_syn_rcv_conn().
So in this patch using skb_peek() instead of __skb_dequeue(),
so that unlink will be handled only in cleanup_syn_rcv_conn().
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In chtls_pass_accept_request(), removing the chtls_reqsk_free()
call to avoid oreq freeing twice. Here oreq is the pointer to
struct request_sock.
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If route to peer is not configured, we might get non tls
devices from dst_neigh_lookup() which is invalid, adding a
check to avoid it.
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
At the time of SYN_RECV, connection information is not
initialized at FW, updating tcb flag over uninitialized
connection causes adapter crash. We don't need to
update the flag during SYN_RECV state, so avoid this.
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
send_abort_rpl() is not calculating cpl_abort_req_rss offset and
ends up sending wrong TID with abort_rpl WR causng tid leaks.
Replaced send_abort_rpl() with chtls_send_abort_rpl() as it is
redundant.
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull gcc-plugins fix from Kees Cook:
"Bump c++ standard version for latest GCC versions (Valdis Kletnieks)"
* tag 'gcc-plugins-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
gcc-plugins: fix gcc 11 indigestion with plugins...
Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.
Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.
First AP boot (first INIT-SIPI-SIPI sequence):
Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
support. It is up to the guest to transfer control of the AP to the
proper location.
Subsequent AP boot:
KVM will expect to receive an AP Reset Hold exit event indicating that
the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
awaken it. When the AP Reset Hold exit event is received, KVM will place
the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
sequence, KVM will make the vCPU runnable. It is again up to the guest
to then transfer control of the AP to the proper location.
To differentiate between an actual HLT and an AP Reset Hold, a new MP
state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
placed in upon receiving the AP Reset Hold exit event. Additionally, to
communicate the AP Reset Hold exit event up to userspace (if needed), a
new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.
A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It is possible to exit the nested guest mode, entered by
svm_set_nested_state prior to first vm entry to it (e.g due to pending event)
if the nested run was not pending during the migration.
In this case we must not switch to the nested msr permission bitmap.
Also add a warning to catch similar cases in the future.
Fixes: a7d5c7ce41 ("KVM: nSVM: delay MSR permission processing to first nested VM run")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210107093854.882483-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The code to store it on the migration exists, but no code was restoring it.
One of the side effects of fixing this is that L1->L2 injected events
are no longer lost when migration happens with nested run pending.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210107093854.882483-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The tdp_mmu_roots and tdp_mmu_pages in struct kvm_arch should only contain
pages with tdp_mmu_page set to true. tdp_mmu_pages should not contain any
pages with a non-zero root_count and tdp_mmu_roots should only contain
pages with a positive root_count, unless a thread holds the MMU lock and
is in the process of modifying the list. Various functions expect these
invariants to be maintained, but they are not explictily documented. Add
to the comments on both fields to document the above invariants.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210107001935.3732070-2-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Many TDP MMU functions which need to perform some action on all TDP MMU
roots hold a reference on that root so that they can safely drop the MMU
lock in order to yield to other threads. However, when releasing the
reference on the root, there is a bug: the root will not be freed even
if its reference count (root_count) is reduced to 0.
To simplify acquiring and releasing references on TDP MMU root pages, and
to ensure that these roots are properly freed, move the get/put operations
into another TDP MMU root iterator macro.
Moving the get/put operations into an iterator macro also helps
simplify control flow when a root does need to be freed. Note that using
the list_for_each_entry_safe macro would not have been appropriate in
this situation because it could keep a pointer to the next root across
an MMU lock release + reacquire, during which time that root could be
freed.
Reported-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: faaf05b00a ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Fixes: 063afacd87 ("kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU")
Fixes: a6a0b05da9 ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
Fixes: 1488199856 ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU")
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210107001935.3732070-1-bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In kvm_mmu_notifier_invalidate_range_start(), tlbs_dirty is used as:
need_tlb_flush |= kvm->tlbs_dirty;
with need_tlb_flush's type being int and tlbs_dirty's type being long.
It means that tlbs_dirty is always used as int and the higher 32 bits
is useless. We need to check tlbs_dirty in a correct way and this
change checks it directly without propagating it to need_tlb_flush.
Note: it's _extremely_ unlikely this neglecting of higher 32 bits can
cause problems in practice. It would require encountering tlbs_dirty
on a 4 billion count boundary, and KVM would need to be using shadow
paging or be running a nested guest.
Cc: stable@vger.kernel.org
Fixes: a4ee1ca4a3 ("KVM: MMU: delay flush all tlbs on sync_page path")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20201217154118.16497-1-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Alexei Starovoitov says:
====================
pull-request: bpf 2021-01-07
We've added 4 non-merge commits during the last 10 day(s) which contain
a total of 4 files changed, 14 insertions(+), 7 deletions(-).
The main changes are:
1) Fix task_iter bug caused by the merge conflict resolution, from Yonghong.
2) Fix resolve_btfids for multiple type hierarchies, from Jiri.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpftool: Fix compilation failure for net.o with older glibc
tools/resolve_btfids: Warn when having multiple IDs for single type
bpf: Fix a task_iter bug caused by a merge conflict resolution
selftests/bpf: Fix a compile error for BPF_F_BPRM_SECUREEXEC
====================
Link: https://lore.kernel.org/r/20210107221555.64959-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When using LLVM's integrated assembler (LLVM_IAS=1) while building
x86_64_defconfig + CONFIG_KVM=y + CONFIG_KVM_AMD=y, the following build
error occurs:
$ make LLVM=1 LLVM_IAS=1 arch/x86/kvm/svm/sev.o
arch/x86/kvm/svm/sev.c:2004:15: error: too few operands for instruction
asm volatile(__ex("vmsave") : : "a" (__sme_page_pa(sd->save_area)) : "memory");
^
arch/x86/kvm/svm/sev.c:28:17: note: expanded from macro '__ex'
#define __ex(x) __kvm_handle_fault_on_reboot(x)
^
./arch/x86/include/asm/kvm_host.h:1646:10: note: expanded from macro '__kvm_handle_fault_on_reboot'
"666: \n\t" \
^
<inline asm>:2:2: note: instantiated into assembly here
vmsave
^
1 error generated.
This happens because LLVM currently does not support calling vmsave
without the fixed register operand (%rax for 64-bit and %eax for
32-bit). This will be fixed in LLVM 12 but the kernel currently supports
LLVM 10.0.1 and newer so this needs to be handled.
Add the proper register using the _ASM_AX macro, which matches the
vmsave call in vmenter.S.
Fixes: 861377730a ("KVM: SVM: Provide support for SEV-ES vCPU loading")
Link: https://reviews.llvm.org/D93524
Link: https://github.com/ClangBuiltLinux/linux/issues/1216
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Message-Id: <20201219063711.3526947-1-natechancellor@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Check only the terminal leaf for a "!PRESENT || MMIO" SPTE when looking
for reserved bits on valid, non-MMIO SPTEs. The get_walk() helpers
terminate their walks if a not-present or MMIO SPTE is encountered, i.e.
the non-terminal SPTEs have already been verified to be regular SPTEs.
This eliminates an extra check-and-branch in a relatively hot loop.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Bump the size of the sptes array by one and use the raw level of the
SPTE to index into the sptes array. Using the SPTE level directly
improves readability by eliminating the need to reason out why the level
is being adjusted when indexing the array. The array is on the stack
and is not explicitly initialized; bumping its size is nothing more than
a superficial adjustment to the stack frame.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-4-seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Get the so called "root" level from the low level shadow page table
walkers instead of manually attempting to calculate it higher up the
stack, e.g. in get_mmio_spte(). When KVM is using PAE shadow paging,
the starting level of the walk, from the callers perspective, is not
the CR3 root but rather the PDPTR "root". Checking for reserved bits
from the CR3 root causes get_mmio_spte() to consume uninitialized stack
data due to indexing into sptes[] for a level that was not filled by
get_walk(). This can result in false positives and/or negatives
depending on what garbage happens to be on the stack.
Opportunistically nuke a few extra newlines.
Fixes: 95fb5b0258 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
Reported-by: Richard Herbert <rherbert@sympatico.ca>
Cc: Ben Gardon <bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Return -1 from the get_walk() helpers if the shadow walk doesn't fill at
least one spte, which can theoretically happen if the walk hits a
not-present PDPTR. Returning the root level in such a case will cause
get_mmio_spte() to return garbage (uninitialized stack data). In
practice, such a scenario should be impossible as KVM shouldn't get a
reserved-bit page fault with a not-present PDPTR.
Note, using mmu->root_level in get_walk() is wrong for other reasons,
too, but that's now a moot point.
Fixes: 95fb5b0258 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
Cc: Ben Gardon <bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Florian Westphal says:
====================
net: fix netfilter defrag/ip tunnel pmtu blackhole
Christian Perle reported a PMTU blackhole due to unexpected interaction
between the ip defragmentation that comes with connection tracking and
ip tunnels.
Unfortunately setting 'nopmtudisc' on the tunnel breaks the test
scenario even without netfilter.
Christinas setup looks like this:
+--------+ +---------+ +--------+
|Router A|-------|Wanrouter|-------|Router B|
| |.IPIP..| |..IPIP.| |
+--------+ +---------+ +--------+
/ mtu 1400 \
/ \
+--------+ +--------+
|Client A| |Client B|
+--------+ +--------+
MTU is 1500 everywhere, except on Router A to Wanrouter and
Wanrouter to Router B.
Router A and Router B use IPIP tunnel interfaces to tunnel traffic
between Client A and Client B over WAN.
Client A sends a 1400 byte UDP datagram to Client B.
This packet gets encapsulated in the IPIP tunnel.
This works, packet is received on client B.
When conntrack (or anything else that forces ip defragmentation) is
enabled on Router A, the packet gets dropped on Router A after
encapsulation because they exceed the link MTU.
Setting the 'nopmtudisc' flag on the IPIP tunnel makes things worse,
no packets pass even in the no-netfilter scenario.
Patch one is a reproducer script for selftest infra.
Patch two is a fix for 'nopmtudisc' behaviour so ip_tunnel will send
an icmp error to Client A. This allows 'nopmtudisc' tunnel to forward
the UDP datagrams.
Patch three enables ip refragmentation for all reassembled packets, just
like ipv6.
====================
Link: https://lore.kernel.org/r/20210105231523.622-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Conntrack reassembly records the largest fragment size seen in IPCB.
However, when this gets forwarded/transmitted, fragmentation will only
be forced if one of the fragmented packets had the DF bit set.
In that case, a flag in IPCB will force fragmentation even if the
MTU is large enough.
This should work fine, but this breaks with ip tunnels.
Consider client that sends a UDP datagram of size X to another host.
The client fragments the datagram, so two packets, of size y and z, are
sent. DF bit is not set on any of these packets.
Middlebox netfilter reassembles those packets back to single size-X
packet, before routing decision.
packet-size-vs-mtu checks in ip_forward are irrelevant, because DF bit
isn't set. At output time, ip refragmentation is skipped as well
because x is still smaller than the mtu of the output device.
If ttransmit device is an ip tunnel, the packet size increases to
x+overhead.
Also, tunnel might be configured to force DF bit on outer header.
In this case, packet will be dropped (exceeds MTU) and an ICMP error is
generated back to sender.
But sender already respects the announced MTU, all the packets that
it sent did fit the announced mtu.
Force refragmentation as per original sizes unconditionally so ip tunnel
will encapsulate the fragments instead.
The only other solution I see is to place ip refragmentation in
the ip_tunnel code to handle this case.
Fixes: d6b915e29f ("ip_fragment: don't forward defragmented DF packet")
Reported-by: Christian Perle <christian.perle@secunet.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For some reason ip_tunnel insist on setting the DF bit anyway when the
inner header has the DF bit set, EVEN if the tunnel was configured with
'nopmtudisc'.
This means that the script added in the previous commit
cannot be made to work by adding the 'nopmtudisc' flag to the
ip tunnel configuration. Doing so breaks connectivity even for the
without-conntrack/netfilter scenario.
When nopmtudisc is set, the tunnel will skip the mtu check, so no
icmp error is sent to client. Then, because inner header has DF set,
the outer header gets added with DF bit set as well.
IP stack then sends an error to itself because the packet exceeds
the device MTU.
Fixes: 23a3647bc4 ("ip_tunnels: Use skb-len to PMTU check.")
Cc: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linux 5.11.rcX was failing to boot on ARC HSDK board. Turns out we have
a couple of issues, this being the first one, and I'm to blame as I
didn't pay attention during review.
TIF_NOTIFY_SIGNAL support requires checking multiple TIF_* bits in
kernel return code path. Old code only needed to check a single bit so
BBIT0 <TIF_SIGPENDING> worked. New code needs to check multiple bits so
AND <bit-mask> instruction. So needs to use bit mask variant _TIF_SIGPENDING
Cc: Jens Axboe <axboe@kernel.dk>
Fixes: 53855e1258 ("arc: add support for TIF_NOTIFY_SIGNAL")
Link: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/34
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
In ocrdma_dealloc_ucontext_pd() uctx->cntxt_pd is assigned to the variable
pd and then after uctx->cntxt_pd is freed, the variable pd is passed to
function _ocrdma_dealloc_pd() which dereferences pd directly or through
its call to ocrdma_mbx_dealloc_pd().
Reorder the free using the variable pd.
Cc: stable@vger.kernel.org
Fixes: 21a428a019 ("RDMA: Handle PD allocations by IB/core")
Link: https://lore.kernel.org/r/20201230024653.1516495-1-trix@redhat.com
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
When mlx5_create_flow_group() fails, ft->g should be
freed just like when kvzalloc() fails. The caller of
mlx5e_create_l2_table_groups() does not catch this
issue on failure, which leads to memleak.
Fixes: 33cfaaa8f3 ("net/mlx5e: Split the main flow steering table")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5e_create_ttc_table_groups() frees ft->g on failure of
kvzalloc(), but such failure will be caught by its caller
in mlx5e_create_ttc_table() and ft->g will be freed again
in mlx5e_destroy_flow_table(). The same issue also occurs
in mlx5e_create_ttc_table_groups(). Set ft->g to NULL after
kfree() to avoid double free.
Fixes: 7b3722fa9e ("net/mlx5e: Support RSS for GRE tunneled packets")
Fixes: 33cfaaa8f3 ("net/mlx5e: Split the main flow steering table")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Prior to this patch, configuring speed to 50G with autoneg off over
devices supporting 50G per lane failed.
Support for 50G per lane introduced a new set of link-modes, on which
driver always performed a speed validation as if only legacy link-modes
were configured. Fix driver speed validation to force setting autoneg
over 56G only if in legacy link-mode.
Fixes: 3d7cadae51 ("net/mlx5e: ethtool, Fix analysis of speed setting")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
sop_drop_qpn field in the cqe is used by two features, in SWITCHDEV mode
to restore the chain id in case of a miss and in LEGACY mode to support
skbedit mark action. In build RX skb, the skb mark field is set regardless
of the configured mode which cause a corruption of the mark field in case
of switchdev mode.
Fix by overriding the mark value back to 0 in the representor tc update
skb flow.
Fixes: 8f1e0b97cc ("net/mlx5: E-Switch, Mark miss packets with new chain id mapping")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Adding vf VLANID for the first time, or after having cleared previously
defined VLANID works fine, however, attempting to change an existing vf
VLANID clears the rules on the firmware, but does not add new rules for
the new vf VLANID.
Fix this by changing the logic in function esw_acl_egress_lgcy_setup()
so that it will always configure egress rules.
Fixes: ea651a86d4 ("net/mlx5: E-Switch, Refactor eswitch egress acl codes")
Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In case WQE includes inline header the vlan is inserted by driver even
if vlan offload is set. On geneve over vlan interface where software
parser is used the SWP offsets should be updated according to the added
vlan.
Fixes: e3cfc7e6b7 ("net/mlx5e: TX, Add geneve tunnel stateless offload support")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Connection counters may be shared for both directions when the counter
is used for connection aging purposes. However, if TC flow
accounting is enabled then a unique counter is required per direction.
Instantiate a unique counter per direction if the conntrack accounting
extension is enabled. Use a shared counter when the connection accounting
extension is disabled.
Fixes: 1edae2335a ("net/mlx5e: CT: Use the same counter for both directions")
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In multi-port mode, FW reports syndrome 0x2ea48 (invalid vhca_port_number)
if the port_num is not 1 or 2.
Fixes: 80f09dfc23 ("net/mlx5: Eswitch, enable RoCE loopback traffic")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Expose firmware indication that it supports setting eswitch uplink state
to follow (follow the physical link). Condition setting the eswitch
uplink admin-state with this capability bit. Older FW may not support
the uplink state setting.
Fixes: 7d0314b11c ("net/mlx5e: Modify uplink state on interface up/down")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Pull spi fixes from Mark Brown:
"A couple of core fixes here, both to do with handling of drivers which
don't report their maximum speed since we factored some of the
handling for transfer speeds out into the core in the previous
release.
There's also some driver specific fixes, including a relatively large
set for some races around timeouts in spi-geni-qcom"
* tag 'spi-fix-v5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: fix the divide by 0 error when calculating xfer waiting time
spi: Fix the clamping of spi->max_speed_hz
spi: altera: fix return value for altera_spi_txrx()
spi: stm32: FIFO threshold level - fix align packet size
spi: spi-geni-qcom: Print an error when we timeout setting the CS
spi: spi-geni-qcom: Don't try to set CS if an xfer is pending
spi: spi-geni-qcom: Fail new xfers if xfer/cancel/abort pending
spi: spi-geni-qcom: Fix geni_spi_isr() NULL dereference in timeout case
Pull regulator fixes from Mark Brown:
"A few minor driver specific fixes, mostly DT bindings document bits,
plus a new device ID"
* tag 'regulator-fix-v5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: qcom-rpmh: add QCOM_COMMAND_DB dependency
regulator: qcom-rpmh-regulator: correct hfsmps515 definition
dt-bindings: regulator: qcom,rpmh-regulator: add pm8009 revision
regulator: bd718x7: Add enable times
regulator: pf8x00: Use specific compatible strings for devices
This is just a maintenance patch to update font_ter16x32.c with changes
and minor fixes added in new upstream Terminus v4.49.
>From release notes of new version 4.49, this brings:
- Altered ascii grave in some sizes to be more useful as a back quote.
- Fixed 21B5, added 21B2 and 21B3.
Just as my initial submission of the font, above changes were obtained from
new ter-i32b.psf font source.
Terminus font sources are available for download at SourceForge:
https://sourceforge.net/projects/terminus-font/files/terminus-font-4.49/
Simply running `make` in source directory will build the .psf font files.
Signed-off-by: Amanoel Dawod <kernel@amanoeldawod.com>
Link: https://lore.kernel.org/r/20201226235840.26290-1-kernel@amanoeldawod.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Adds new 2 new tests to the PTMU script: pmtu_ipv4/6_route_change.
These tests explicitly test for a recently discovered problem in the
IPv6 routing framework where PMTU exceptions were not properly released
when replacing a route via "ip route change ...".
After creating PMTU exceptions, the route from the device A to R1 will be
replaced with a new route, then device A will be deleted. If the PMTU
exceptions were properly cleaned up by the kernel, this device deletion
will succeed. Otherwise, the unregistration of the device will stall, and
messages such as the following will be logged in dmesg:
unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 4
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/1609892546-11389-2-git-send-email-stranche@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Route removal is handled by two code paths. The main removal path is via
fib6_del_route() which will handle purging any PMTU exceptions from the
cache, removing all per-cpu copies of the DST entry used by the route, and
releasing the fib6_info struct.
The second removal location is during fib6_add_rt2node() during a route
replacement operation. This path also calls fib6_purge_rt() to handle
cleaning up the per-cpu copies of the DST entries and releasing the
fib6_info associated with the older route, but it does not flush any PMTU
exceptions that the older route had. Since the older route is removed from
the tree during the replacement, we lose any way of accessing it again.
As these lingering DSTs and the fib6_info struct are holding references to
the underlying netdevice struct as well, unregistering that device from the
kernel can never complete.
Fixes: 2b760fcf5c ("ipv6: hook up exception table to store dst cache")
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/1609892546-11389-1-git-send-email-stranche@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull regmap fixes from Mark Brown:
"A couple of small fixes for leaks when attaching a device to a
preexisting regmap"
* tag 'regmap-fix-v5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: debugfs: Fix a reversed if statement in regmap_debugfs_init()
regmap: debugfs: Fix a memory leak when calling regmap_attach_dev
Marc Kleine-Budde says:
====================
pull-request: can 2021-01-07
The first patch is by me for the m_can driver and removes an erroneous
m_can_clk_stop() from the driver's unregister function.
The second patch targets the tcan4x5x driver, is by me, and fixes the bit
timing constant parameters.
The next two patches are by me, target the mcp251xfd driver, and fix a race
condition in the optimized TEF path (which was added in net-next for v5.11).
The similar code in the RX path is changed to look the same, although it
doesn't suffer from the race condition.
A patch by Lad Prabhakar updates the description and help text for the rcar CAN
driver to reflect all supported SoCs.
In the last patch Sriram Dash transfers the maintainership of the m_can driver
to Pankaj Sharma.
* tag 'linux-can-fixes-for-5.11-20210107' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
MAINTAINERS: Update MCAN MMIO device driver maintainer
can: rcar: Kconfig: update help description for CAN_RCAR config
can: mcp251xfd: mcp251xfd_handle_rxif_ring(): first increment RX tail pointer in HW, then in driver
can: mcp251xfd: mcp251xfd_handle_tefif(): fix TEF vs. TX race condition
can: tcan4x5x: fix bittiming const, use common bittiming from m_can driver
can: m_can: m_can_class_unregister(): remove erroneous m_can_clk_stop()
====================
Link: https://lore.kernel.org/r/20210107103451.183477-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
xa_alloc_cyclic() call returns positive number if ID allocation
succeeded but wrapped. It is not an error, so normalize the "ret"
variable to zero as marker of not-an-error.
drivers/infiniband/core/restrack.c:261 rdma_restrack_add()
warn: 'ret' can be either negative or positive
Fixes: fd47c2f99f ("RDMA/restrack: Convert internal DB from hash to XArray")
Link: https://lore.kernel.org/r/20201216100753.1127638-1-leon@kernel.org
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Pull NVMe updates from Christoph:
"nvme updates for 5.11:
- fix a race in the nvme-tcp send code (Sagi Grimberg)
- fix a list corruption in an nvme-rdma error path (Israel Rukshin)
- avoid a possible double fetch in nvme-pci (Lalithambika Krishnakumar)
- add the susystem NQN quirk for a Samsung driver (Gopal Tiwari)
- fix two compiler warnings in nvme-fcloop (James Smart)
- don't call sleeping functions from irq context in nvme-fc (James Smart)
- remove an unused argument (Max Gurtovoy)
- remove unused exports (Minwoo Im)"
* tag 'nvme-5.11-2021-01-07' of git://git.infradead.org/nvme:
nvme: remove the unused status argument from nvme_trace_bio_complete
nvmet-rdma: Fix list_del corruption on queue establishment failure
nvme: unexport functions with no external caller
nvme: avoid possible double fetch in handling CQE
nvme-tcp: Fix possible race of io_work and direct send
nvme-pci: mark Samsung PM1725a as IGNORE_DEV_SUBNQN
nvme-fcloop: Fix sscanf type and list_first_entry_or_null warnings
nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context
percent_fp() was used in intel_pstate_pid_reset(), which was removed in
commit 9d0ef7af1f ("cpufreq: intel_pstate: Do not use PID-based P-state
selection") and hence, percent_fp() is unused since then.
percent_ext_fp() was last used in intel_pstate_update_perf_limits(), which
was refactored in commit 1a4fe38add ("cpufreq: intel_pstate: Remove
max/min fractions to limit performance"), and hence, percent_ext_fp() is
unused since then.
make CC=clang W=1 points us those unused functions:
drivers/cpufreq/intel_pstate.c:79:23: warning: unused function 'percent_fp' [-Wunused-function]
static inline int32_t percent_fp(int percent)
^
drivers/cpufreq/intel_pstate.c:94:23: warning: unused function 'percent_ext_fp' [-Wunused-function]
static inline int32_t percent_ext_fp(int percent)
^
Remove those obsolete functions.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Using the GPU with a VRAM Carveout is a security vulnerability.
Nevertheless it is sometimes required, especially when no IOMMU
implementation is available for a certain platform.
Signed-off-by: Iskren Chernev <iskren.chernev@gmail.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
irq_hpd event can only be executed at connected state. Therefore
irq_hpd event should be postponed if it happened at connection
pending state. This patch also make sure both link rate and lane
are valid before start link training.
Signed-off-by: Kuogee Hsieh <khsieh@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
There are only four valid fwnode cases which are
- primary --> secondary --> -ENODEV
- primary --> NULL
- secondary --> -ENODEV
- NULL
dev->fwnode should be converted between the 4 cases above no matter
how/when set_primary_fwnode() and set_secondary_fwnode() are called.
Describe it in the code so people will keep it in mind.
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
[ rjw: Comment edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
While commit d5dcce0c41 ("device property: Keep secondary firmware
node secondary by type") describes everything correct in its commit
message, the change it made does the opposite and original commit
c15e1bdda4 ("device property: Fix the secondary firmware node handling
in set_primary_fwnode()") was fully correct.
Revert the former one here and improve documentation in the next patch.
Fixes: d5dcce0c41 ("device property: Keep secondary firmware node secondary by type")
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: 5.10+ <stable@vger.kernel.org> # 5.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The CONTAINER and HOTPLUG_MEMORY options mention modules but are bool
only, so if selected are always built in.
Drop the help text about modules.
Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It adds a stub acpi_create_platform_device() for !CONFIG_ACPI build, so
that caller doesn't have to deal with !CONFIG_ACPI build issue.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Two local variables in drivers/acpi/x86/s2idle.c are never read, so
drop them along with the code updating their values (in vain).
Fixes: fef9867119 ("ACPI: PM: s2idle: Move x86-specific code to the x86 directory")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently there is an unlikely case where cpufreq_cpu_get() returns a
NULL policy and this will cause a NULL pointer dereference later on.
Fix this by passing the policy to transition_frequency_fidvid() from
the caller and hence eliminating the need for the cpufreq_cpu_get()
and cpufreq_cpu_put().
Thanks to Viresh Kumar for suggesting the fix.
Addresses-Coverity: ("Dereference null return")
Fixes: b43a7ffbf3 ("cpufreq: Notify all policy->cpus in cpufreq_notify_transition()")
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If turbo P-states cannot be used, either due to the configuration of
the processor, or because intel_pstate is not allowed to used them,
the maximum available P-state with HWP enabled corresponds to the
HWP_CAP.GUARANTEED value which is not static. It can be adjusted by
an out-of-band agent or during an Intel Speed Select performance
level change, so long as it remains less than or equal to
HWP_CAP.MAX.
However, if turbo P-states cannot be used, intel_cpufreq_adjust_perf()
always uses pstate.max_pstate (set during the initialization of the
driver only) as the maximum available P-state, so it may miss a change
of the HWP_CAP.GUARANTEED value.
Prevent that from happening by modifyig intel_cpufreq_adjust_perf()
to always read the "guaranteed" and "maximum turbo" performance
levels from the cached HWP_CAP value.
Fixes: a365ab6b9d ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
freeze/thaw_bdev() currently use bdev->bd_fsfreeze_count to infer
whether or not bdev->bd_fsfreeze_sb is valid (it's valid iff
bd_fsfreeze_count is non-zero). thaw_bdev() doesn't nullify
bd_fsfreeze_sb.
But this means a freeze_bdev() call followed by a thaw_bdev() call can
leave bd_fsfreeze_sb with a non-null value, while bd_fsfreeze_count is
zero. If freeze_bdev() is called again, and this time
get_active_super() returns NULL (e.g. because the FS is unmounted),
we'll end up with bd_fsfreeze_count > 0, but bd_fsfreeze_sb is
*untouched* - it stays the same (now garbage) value. A subsequent
thaw_bdev() will decide that the bd_fsfreeze_sb value is legitimate
(since bd_fsfreeze_count > 0), and attempt to use it.
Fix this by always setting bd_fsfreeze_sb to NULL when
bd_fsfreeze_count is successfully decremented to 0 in thaw_sb().
Alternatively, we could set bd_fsfreeze_sb to whatever
get_active_super() returns in freeze_bdev() whenever bd_fsfreeze_count
is successfully incremented to 1 from 0 (which can be achieved cleanly
by moving the line currently setting bd_fsfreeze_sb to immediately
after the "sync:" label, but it might be a little too subtle/easily
overlooked in future).
This fixes the currently panicking xfstests generic/085.
Fixes: 040f04bd2e ("fs: simplify freeze_bdev/thaw_bdev")
Signed-off-by: Satya Tangirala <satyat@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[BUG]
There are several bug reports about recent kernel unable to relocate
certain data block groups.
Sometimes the error just goes away, but there is one reporter who can
reproduce it reliably.
The dmesg would look like:
[438.260483] BTRFS info (device dm-10): balance: start -dvrange=34625344765952..34625344765953
[438.269018] BTRFS info (device dm-10): relocating block group 34625344765952 flags data|raid1
[450.439609] BTRFS info (device dm-10): found 167 extents, stage: move data extents
[463.501781] BTRFS info (device dm-10): balance: ended with status: -2
[CAUSE]
The ENOENT error is returned from the following call chain:
add_data_references()
|- delete_v1_space_cache();
|- if (!found)
return -ENOENT;
The variable @found is set to true if we find a data extent whose
disk bytenr matches parameter @data_bytes.
With extra debugging, the offending tree block looks like this:
leaf bytenr = 42676709441536, data_bytenr = 34626327621632
ctime 1567904822.739884119 (2019-09-08 03:07:02)
mtime 0.0 (1970-01-01 01:00:00)
otime 0.0 (1970-01-01 01:00:00)
item 27 key (51933 EXTENT_DATA 0) itemoff 9854 itemsize 53
generation 1517381 type 2 (prealloc)
prealloc data disk byte 34626327621632 nr 262144 <<<
prealloc data offset 0 nr 262144
item 28 key (52262 ROOT_ITEM 0) itemoff 9415 itemsize 439
generation 2618893 root_dirid 256 bytenr 42677048360960 level 3 refs 1
lastsnap 2618893 byte_limit 0 bytes_used 5557338112 flags 0x0(none)
uuid d0d4361f-d231-6d40-8901-fe506e4b2b53
Although item 27 has disk bytenr 34626327621632, which matches the
data_bytenr, its type is prealloc, not reg.
This makes the existing code skip that item, and return ENOENT.
[FIX]
The code is modified in commit 19b546d7a1 ("btrfs: relocation: Use
btrfs_find_all_leafs to locate data extent parent tree leaves"), before
that commit, we use something like
"if (type == BTRFS_FILE_EXTENT_INLINE) continue;"
But in that offending commit, we use (type == BTRFS_FILE_EXTENT_REG),
ignoring BTRFS_FILE_EXTENT_PREALLOC.
Fix it by also checking BTRFS_FILE_EXTENT_PREALLOC.
Reported-by: Stéphane Lesimple <stephane_btrfs2@lesimple.fr>
Link: https://lore.kernel.org/linux-btrfs/505cabfa88575ed6dbe7cb922d8914fb@lesimple.fr
Fixes: 19b546d7a1 ("btrfs: relocation: Use btrfs_find_all_leafs to locate data extent parent tree leaves")
CC: stable@vger.kernel.org # 5.6+
Tested-By: Stéphane Lesimple <stephane_btrfs2@lesimple.fr>
Reviewed-by: Su Yue <l@damenly.su>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Some extent io trees are initialized with NULL private member (e.g.
btrfs_device::alloc_state and btrfs_fs_info::excluded_extents).
Dereference of a NULL tree->private as inode pointer will cause panic.
Pass tree->fs_info as it's known to be valid in all cases.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208929
Fixes: 05912a3c04 ("btrfs: drop extent_io_ops::tree_fs_info callback")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We're supposed to print the root_key.offset in btrfs_root_name in the
case of a reloc root, not the objectid. Fix this helper to take the key
so we have access to the offset when we need it.
Fixes: 457f1864b5 ("btrfs: pretty print leaked root name")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
waitqueue_active() needs smp_mb() to be in sync with waitqueues
modification, but we miss it in io_cqring_ev_posted*() apart from
cq_wait() case.
Take an smb_mb() out of wq_has_sleeper() making it waitqueue_active(),
and place it a few lines before, so it can synchronise other
waitqueue_active() as well.
The patch doesn't add any additional overhead, so even if there are
no problems currently, it's just safer to have it this way.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
CPU0 CPU1
---- ----
lock(&new->fa_lock);
local_irq_disable();
lock(&ctx->completion_lock);
lock(&new->fa_lock);
<Interrupt>
lock(&ctx->completion_lock);
*** DEADLOCK ***
Move kill_fasync() out of io_commit_cqring() to io_cqring_ev_posted(),
so it doesn't hold completion_lock while doing it. That saves from the
reported deadlock, and it's just nice to shorten the locking time and
untangle nested locks (compl_lock -> wq_head::lock).
Reported-by: syzbot+91ca3f25bd7f795f019c@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Make sure io_iopoll_complete() tries to wake up eventfd, which currently
is skipped together with io_cqring_ev_posted() for non-SQPOLL IOPOLL.
Add an iopoll version of io_cqring_ev_posted(), duplicates a bit of
code, but they actually use different sets of wait queues may be for
better.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Rename v4l2_get_link_rate() as v4l2_get_link_freq(). What the function
returns is the frequency of the link; rename it to reflect the name of the
control where the information is obtained.
Fixes: 1b888b3ceb ("media: v4l: Add a helper for obtaining the link frequency")
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Currently the return error code is in ret is being assigned but not
used. It and should be returned by the return statement and currently
just 0 is being returned. Fix this.
Addresses-Coverity: ("Unused value")
Fixes: b9ad52aafe ("media: rcar-vin: Rework parallel firmware parsing")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Fix obtaining CCS static data version minor part correctly. Instead, the
upper 8 bits were obtained from the major version number.
Fixes: a6b396f410 ("media: ccs: Add CCS static data parser library")
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
The highest fundamental frequency signal for C-PHY is half of the symbol
rate which is similar to D-PHY. Take this into account in ccs-pll.
Also remove the outdated comment.
Fixes: 8030aa4f9c ("media: ccs-pll: Add C-PHY support")
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
iommu_flush_dev_iotlb() is called to invalidate caches on a device but
only loops over the devices which are fully-attached to the domain. For
sub-devices, this is ineffective and can result in invalid caching
entries left on the device.
Fix the missing invalidation by adding a loop over the subdevices and
ensuring that 'domain->has_iotlb_device' is updated when attaching to
subdevices.
Fixes: 67b8e02b5e ("iommu/vt-d: Aux-domain specific domain attach/detach")
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1609949037-25291-4-git-send-email-yi.l.liu@intel.com
Signed-off-by: Will Deacon <will@kernel.org>
On SM8150 it's occasionally observed that the boot hangs in between the
writing of SMEs and context banks in arm_smmu_device_reset().
The problem seems to coincide with a display refresh happening after
updating the stream mapping, but before clearing - and there by
disabling translation - the context bank picked to emulate translation
bypass.
Resolve this by explicitly disabling the bypass context already in
cfg_probe.
Fixes: f9081b8ff5 ("iommu/arm-smmu-qcom: Implement S2CR quirk")
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20210106005038.4152731-1-bjorn.andersson@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
Lock(&iommu->lock) without disabling irq causes lockdep warnings.
========================================================
WARNING: possible irq lock inversion dependency detected
5.11.0-rc1+ #828 Not tainted
--------------------------------------------------------
kworker/0:1H/120 just changed the state of lock:
ffffffffad9ea1b8 (device_domain_lock){..-.}-{2:2}, at:
iommu_flush_dev_iotlb.part.0+0x32/0x120
but this lock took another, SOFTIRQ-unsafe lock in the past:
(&iommu->lock){+.+.}-{2:2}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&iommu->lock);
local_irq_disable();
lock(device_domain_lock);
lock(&iommu->lock);
<Interrupt>
lock(device_domain_lock);
*** DEADLOCK ***
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20201231005323.2178523-5-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
Enable Super speed plus in configfs to support USB3.1 Gen2.
This ensures that when a USB gadget is plugged in, it is
enumerated as Gen 2 and connected at 10 Gbps if the host and
cable are capable of it.
Many in-tree gadget functions (fs, midi, acm, ncm, mass_storage,
etc.) already have SuperSpeed Plus support.
Tested: plugged gadget into Linux host and saw:
[284907.385986] usb 8-2: new SuperSpeedPlus Gen 2 USB device number 3 using xhci_hcd
Tested-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Felipe Balbi <balbi@kernel.org>
Signed-off-by: taehyun.cho <taehyun.cho@samsung.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Link: https://lore.kernel.org/r/20210106154625.2801030-1-lorenzo@google.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is observed 'use-after-free' on the dmabuf's file->f_inode with the
race between closing the dmabuf file and reading the dmabuf's debug
info.
Consider the below scenario where P1 is closing the dma_buf file
and P2 is reading the dma_buf's debug info in the system:
P1 P2
dma_buf_debug_show()
dma_buf_put()
__fput()
file->f_op->release()
dput()
....
dentry_unlink_inode()
iput(dentry->d_inode)
(where the inode is freed)
mutex_lock(&db_list.lock)
read 'dma_buf->file->f_inode'
(the same inode is freed by P1)
mutex_unlock(&db_list.lock)
dentry->d_op->d_release()-->
dma_buf_release()
.....
mutex_lock(&db_list.lock)
removes the dmabuf from the list
mutex_unlock(&db_list.lock)
In the above scenario, when dma_buf_put() is called on a dma_buf, it
first frees the dma_buf's file->f_inode(=dentry->d_inode) and then
removes this dma_buf from the system db_list. In between P2 traversing
the db_list tries to access this dma_buf's file->f_inode that was freed
by P1 which is a use-after-free case.
Since, __fput() calls f_op->release first and then later calls the
d_op->d_release, move the dma_buf's db_list removal from d_release() to
f_op->release(). This ensures that dma_buf's file->f_inode is not
accessed after it is released.
Cc: <stable@vger.kernel.org> # 5.4.x-
Fixes: 4ab59c3c63 ("dma-buf: Move dma_buf_release() from fops to dentry_ops")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/1609857399-31549-1-git-send-email-charante@codeaurora.org
The tb_dbg() call is using %#x that already adds the 0x prefix so don't
duplicate it.
Fixes: 9039387e16 ("thunderbolt: Add USB4 router operation proxy for firmware connection manager")
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Yehezkel Bernat <YehezkelShB@gmail.com>
For consistency with __uaccess_{disable,enable}_hw_pan(), move the
PSTATE.TCO setting into dedicated __uaccess_{disable,enable}_tco()
functions.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
The previous patch fixes a TEF vs. TX race condition, by first updating the TEF
tail pointer in hardware, and then updating the driver internal pointer.
The same pattern exists in the RX-path, too. This should be no problem, as the
driver accesses the RX-FIFO from the interrupt handler only, thus the access is
properly serialized. Fix the order here, too, so that the TEF- and RX-path look
similar.
Fixes: 1f652bb6ba ("can: mcp25xxfd: rx-path: reduce number of SPI core requests to set UINC bit")
Link: https://lore.kernel.org/r/20210105214138.3150886-3-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The mcp251xfd driver uses a TX FIFO for sending CAN frames and a TX Event FIFO
(TEF) for completed TX-requests.
The TEF event handling in the mcp251xfd_handle_tefif() function has a race
condition. It first increments the tx-ring's tail counter to signal that
there's room in the TX and TEF FIFO, then it increments the TEF FIFO in
hardware.
A running mcp251xfd_start_xmit() on a different CPU might not stop the txqueue
(as the tx-ring still shows free space). The next mcp251xfd_start_xmit() will
push a message into the chip and the TX complete event might overflow the TEF
FIFO.
This patch changes the order to fix the problem.
Fixes: 68c0c1c7f9 ("can: mcp251xfd: tef-path: reduce number of SPI core requests to set UINC bit")
Link: https://lore.kernel.org/r/20210105214138.3150886-2-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
See Documentation/core-api/printk-formats.rst.
h should no longer be used in the format specifier for printk.
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
See Documentation/core-api/printk-formats.rst.
h should no longer be used in the format specifier for printk.
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
According to the TCAN4550 datasheet "SLLSF91 - DECEMBER 2018" the tcan4x5x has
the same bittiming constants as a m_can revision 3.2.x/3.3.0.
The tcan4x5x chip I'm using identifies itself as m_can revision 3.2.1, so
remove the tcan4x5x specific bittiming values and rely on the values in the
m_can driver, which are selected according to core revision.
Fixes: 5443c226ba ("can: tcan4x5x: Add tcan4x5x driver to the kernel")
Cc: Dan Murphy <dmurphy@ti.com>
Reviewed-by: Sean Nyekjaer <sean@geanix.com>
Link: https://lore.kernel.org/r/20201215103238.524029-3-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Error log:
sysfs: cannot create duplicate filename '/bus/platform/devices/30000000.bus'
The spba bus name is duplicate with aips bus name.
Refine spba bus name to fix this issue.
Fixes: 970406eaef ("arm64: dts: imx8mn: Enable Asynchronous Sample Rate Converter")
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Restore alignment of the continuation of the devm_ioremap() call in
register_intc_controller().
Fixes: 4bdc0d676a ("remove ioremap_nocache and devm_ioremap_nocache")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rich Felker <dalias@libc.org>
Remove CONFIG_IDE from defconfigs that did not actually select chipset
drivers, and switch ones that have libata drivers to libata.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rich Felker <dalias@libc.org>
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Rich Felker <dalias@libc.org>
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Rich Felker <dalias@libc.org>
When G2_DMA is enabled and SH_DMA is disabled, it results in the following
Kbuild warning:
WARNING: unmet direct dependencies detected for SH_DMA_API
Depends on [n]: SH_DMA [=n]
Selected by [y]:
- G2_DMA [=y] && SH_DREAMCAST [=y]
The reason is that G2_DMA selects SH_DMA_API without depending on or
selecting SH_DMA while SH_DMA_API depends on SH_DMA.
When G2_DMA was first introduced with commit 40f49e7ed7
("sh: dma: Make G2 DMA configurable."), this wasn't an issue since
SH_DMA_API didn't have such dependency, and this way was the only way to
enable it since SH_DMA_API was non-visible. However, later SH_DMA_API was
made visible and dependent on SH_DMA with commit d8902adcc1
("dmaengine: sh: Add Support SuperH DMA Engine driver").
Let G2_DMA depend on SH_DMA_API instead to avoid Kbuild issues.
Fixes: d8902adcc1 ("dmaengine: sh: Add Support SuperH DMA Engine driver")
Signed-off-by: Necip Fazil Yildiran <fazilyildiran@gmail.com>
Signed-off-by: Rich Felker <dalias@libc.org>
ptp_ines.c uses devm_platform_ioremap_resource(), which is only
built/available when CONFIG_HAS_IOMEM is enabled.
CONFIG_HAS_IOMEM is not enabled for arch/s390/, so builds on S390
have a build error:
s390-linux-ld: drivers/ptp/ptp_ines.o: in function `ines_ptp_ctrl_probe':
ptp_ines.c:(.text+0x17e6): undefined reference to `devm_platform_ioremap_resource'
Prevent builds of ptp_ines.c when HAS_IOMEM is not set.
Fixes: bad1eaa6ac ("ptp: Add a driver for InES time stamping IP core.")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Link: lore.kernel.org/r/202101031125.ZEFCUiKi-lkp@intel.com
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20210106042531.1351-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix build errors when LEDS_CLASS=m and NET_DSA_HIRSCHMANN_HELLCREEK=y.
This limits the latter to =m when LEDS_CLASS=m.
microblaze-linux-ld: drivers/net/dsa/hirschmann/hellcreek_ptp.o: in function `hellcreek_ptp_setup':
(.text+0xf80): undefined reference to `led_classdev_register_ext'
microblaze-linux-ld: (.text+0xf94): undefined reference to `led_classdev_register_ext'
microblaze-linux-ld: drivers/net/dsa/hirschmann/hellcreek_ptp.o: in function `hellcreek_ptp_free':
(.text+0x1018): undefined reference to `led_classdev_unregister'
microblaze-linux-ld: (.text+0x1024): undefined reference to `led_classdev_unregister'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Link: lore.kernel.org/r/202101060655.iUvMJqS2-lkp@intel.com
Cc: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20210106021815.31796-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
.dellink does not get called after .newlink fails,
bareudp_newlink() must undo what bareudp_configure()
has done if bareudp_link_config() fails.
v2: call bareudp_dellink(), like bareudp_dev_create() does
Fixes: 571912c69f ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
Link: https://lore.kernel.org/r/20210105190725.1736246-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the origin device has a volatile write-back cache and the following
events occur:
1: After finishing merge operation of one set of exceptions,
merge_callback() is invoked.
2: Update the metadata in COW device tracking the merge completion.
This update to COW device is flushed cleanly.
3: System crashes and the origin device's cache where the recent
merge was completed has not been flushed.
During the next cycle when we read the metadata from the COW device,
we will skip reading those metadata whose merge was completed in
step (1). This will lead to data loss/corruption.
To address this, flush the origin device post merge IO before
updating the metadata.
Cc: stable@vger.kernel.org
Signed-off-by: Akilesh Kailash <akailash@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fedora Rawhide has started including gcc 11,and the g++ compiler
throws a wobbly when it hits scripts/gcc-plugins:
HOSTCXX scripts/gcc-plugins/latent_entropy_plugin.so
In file included from /usr/include/c++/11/type_traits:35,
from /usr/lib/gcc/x86_64-redhat-linux/11/plugin/include/system.h:244,
from /usr/lib/gcc/x86_64-redhat-linux/11/plugin/include/gcc-plugin.h:28,
from scripts/gcc-plugins/gcc-common.h:7,
from scripts/gcc-plugins/latent_entropy_plugin.c:78:
/usr/include/c++/11/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO
C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
32 | #error This file requires compiler and library support \
In fact, it works just fine with c++11, which has been in gcc since 4.8,
and we now require 4.9 as a minimum.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/82487.1609006918@turing-police
From Ard:
"Simply disabling -mgeneral-regs-only left and right is risky, given that
the standard AArch64 ABI permits the use of FP/SIMD registers anywhere,
and GCC is known to use SIMD registers for spilling, and may invent
other uses of the FP/SIMD register file that have nothing to do with the
floating point code in question. Note that putting kernel_neon_begin()
and kernel_neon_end() around the code that does use FP is not sufficient
here, the problem is in all the other code that may be emitted with
references to SIMD registers in it.
So the only way to do this properly is to put all floating point code in
a separate compilation unit, and only compile that unit with
-mgeneral-regs-only."
Disable support until the code can be properly refactored to support this
properly on aarch64.
Acked-by: Will Deacon <will@kernel.org>
Reported-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
old code wrongly used the bad page status as the function return value,
which cause amdgpu_ras_badpages_read always return failed.
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Navi12 HDCP & DTM deinitialization needs continue to free bo if already
created though initialized flag is not set.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jiawei Gu <Jiawei.Gu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some of the newly added code is hidden inside of #ifdef
blocks, but one variable is unused when debugfs is disabled:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:8370:8: error: unused variable 'configure_crc' [-Werror,-Wunused-variable]
Change the #ifdef to an if(IS_ENABLED()) check to fix the warning
and avoid adding more #ifdefs.
Fixes: c920888c60 ("drm/amd/display: Expose new CRC window property")
Reviewed-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch is to improve the fine grain tuning function for RV/RV2/PCO.
The fine grain tuning function uses the sysfs node -- pp_od_clk_voltage
to config gfxclk. Meanwhile, another sysfs
node -- power_dpm_force_perfomance_level also affects the gfx clk.
It will cause confusion when these two sysfs nodes works
together. So this patch adds one flag to avoid this confusion, the flag
will make these two sysfs nodes work separately.
The flag is set as "disabled" by default, so the fine grain tuning function
will be disabled by default.
Only when power_dpm_force_perfomance_level is changed to
"manual" mode, the flag will be set as "enabled",
and the fine grain tuning function will be enabled.
In other profile modes, including "auto", "high", "low",
"profile_peak", "profile_standard", "profile_min_sclk",
"profile_min_mclk", the flag will be set as "disabled",
and the od range of fine grain tuning function will
be restored default value.
Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When GFXOFF is enabled and GPU is idle, driver will fail to access some
registers. Therefore change to disable power gating before all access
registers with MMIO.
Dmesg log is as following:
amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
amdgpu: cp queue pipe 4 queue 0 preemption failed
amdgpu 0000:03:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
amdgpu 0000:03:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The destruction flow is very complicated here because the cm_id can be
destroyed from the event handler at any time if the device is
hot-removed. This leaves behind a partial ctx with no cm_id in the
xarray, and will let user space leak memory.
Make everything consistent in this flow in all places:
- Return the xarray back to XA_ZERO_ENTRY before beginning any
destruction. The thread that reaches this first is responsible to
kfree, everyone else does nothing.
- Test the xarray during the special hot-removal case to block the
queue_work, this has much simpler locking and doesn't require a
'destroying'
- Fix the ref initialization so that it is only positive if cm_id !=
NULL, then rely on that to guide the destruction process in all cases.
Now the new ucma_destroy_private_ctx() can be called in all places that
want to free the ctx, including all the error unwinds, and none of the
details are missed.
Fixes: a1d33b70db ("RDMA/ucma: Rework how new connections are passed through event delivery")
Link: https://lore.kernel.org/r/20210105111327.230270-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This patch is to improve the fine grain tuning function for RV/RV2/PCO.
This patch adds two new commands: "restore" and "commit".
This function uses the pp_od_clk_voltage sysfs file to configure the min
and max value of gfx clock frequency manually or restore the default value.
Command guide:
echo "s level value" > pp_od_clk_voltage
"s" - set the sclk frequency
"level" - 0 or 1, "0" represents the min value, "1" represents
the max value
"value" - the target value of sclk frequency, it should be limited in the
safe range
echo "r" > pp_od_clk_voltage
"r" - reset the sclk frequency, restore the default value instantly
echo "c" > pp_od_clk_voltage
"c" - commit the min and max value of sclk frequency to the system
only after the commit command, the target values set by "s" command
will take effect.
Example:
1)change power profile from "auto" to "manual"
$ cat power_dpm_force_performance_level
auto
$ echo "manual" > power_dpm_force_performance_level
$ cat power_dpm_force_performance_level
manual
2)check the default sclk frequency
$ cat pp_od_clk_voltage
OD_SCLK:
0: 200Mhz
1: 1400Mhz
OD_RANGE:
SCLK: 200MHz 1400MHz
3)use "s" -- set command to configure the min and max sclk frequency
$ echo "s 0 600" > pp_od_clk_voltage
$ echo "s 1 1000" > pp_od_clk_voltage
$ echo "c" > pp_od_clk_voltage
$ cat pp_od_clk_voltage
OD_SCLK:
0: 600Mhz
1: 1000Mhz
OD_RANGE:
SCLK: 200MHz 1400MHz
4)use "r" -- reset command to restore the min or max sclk frequency
$ echo "r" > pp_od_clk_voltage
$ cat pp_od_clk_voltage
OD_SCLK:
0: 200Mhz
1: 1400Mhz
OD_RANGE:
SCLK: 200MHz 1400MHz
Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull btrfs fixes from David Sterba:
"A few more fixes that arrived before the end of the year:
- a bunch of fixes related to transaction handle lifetime wrt various
operations (umount, remount, qgroup scan, orphan cleanup)
- async discard scheduling fixes
- fix item size calculation when item keys collide for extend refs
(hardlinks)
- fix qgroup flushing from running transaction
- fix send, wrong file path when there is an inode with a pending
rmdir
- fix deadlock when cloning inline extent and low on free metadata
space"
* tag 'for-5.11-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: run delayed iputs when remounting RO to avoid leaking them
btrfs: add assertion for empty list of transactions at late stage of umount
btrfs: fix race between RO remount and the cleaner task
btrfs: fix transaction leak and crash after cleaning up orphans on RO mount
btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan
btrfs: merge critical sections of discard lock in workfn
btrfs: fix racy access to discard_ctl data
btrfs: fix async discard stall
btrfs: tests: initialize test inodes location
btrfs: send: fix wrong file path when there is an inode with a pending rmdir
btrfs: qgroup: don't try to wait flushing if we're already holding a transaction
btrfs: correctly calculate item size used when item key collision happens
btrfs: fix deadlock when cloning inline extent and low on free metadata space
Georgi writes:
interconnect fixes for v5.11
This contains a few fixes for iMX and Qualcomm drivers and also
updates my email to my kernel.org address.
- qcom: Fix rpmh link failures when compile test is enabled
- imx: Add a missing of_node_put after of_device_is_available
- imx: Remove a useless test
- imx8mq: Use icc_sync_state
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
* tag 'icc-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc:
MAINTAINERS: Update Georgi's email address
interconnect: imx8mq: Use icc_sync_state
interconnect: imx: Remove a useless test
interconnect: imx: Add a missing of_node_put after of_device_is_available
interconnect: qcom: fix rpmh link failures
Johan writes:
USB-serial fixes for 5.11-rc3
Here's a fix for a DMA-from-stack issue in iuu_phoenix and a couple of
new modem device ids.
All have been in linux-next with no reported issues.
* tag 'usb-serial-5.11-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: iuu_phoenix: fix DMA from stack
USB: serial: option: add LongSung M5710 module support
USB: serial: option: add Quectel EM160R-GL
It is only safe to call the tracepoint before rpc_put_task() because
'data' is freed inside nfs4_lock_release (rpc_release).
Fixes: 48c9579a1a ("Adding stateid information to tracepoints")
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
alloc_fixed_file_ref_node() currently returns an ERR_PTR on failure.
io_sqe_files_unregister() expects it to return NULL and since it can only
return -ENOMEM, it makes more sense to change alloc_fixed_file_ref_node()
to behave that way.
Fixes: 1ffc54220c ("io_uring: fix io_sqe_files_unregister() hangs")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In mtrr_type_lookup(), if the input memory address region is not in the
MTRR, over 4GB, and not over the top of memory, a write-back attribute
is returned. These condition checks are for ensuring the input memory
address region is actually mapped to the physical memory.
However, if the end address is just aligned with the top of memory,
the condition check treats the address is over the top of memory, and
write-back attribute is not returned.
And this hits in a real use case with NVDIMM: the nd_pmem module tries
to map NVDIMMs as cacheable memories when NVDIMMs are connected. If a
NVDIMM is the last of the DIMMs, the performance of this NVDIMM becomes
very low since it is aligned with the top of memory and its memory type
is uncached-minus.
Move the input end address change to inclusive up into
mtrr_type_lookup(), before checking for the top of memory in either
mtrr_type_lookup_{variable,fixed}() helpers.
[ bp: Massage commit message. ]
Fixes: 0cc705f56e ("x86/mm/mtrr: Clean up mtrr_type_lookup()")
Signed-off-by: Ying-Tsun Huang <ying-tsun.huang@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201215070721.4349-1-ying-tsun.huang@amd.com
We've observed crashes due to an empty cpu mask in
hyperv_flush_tlb_others. Obviously the cpu mask in question is changed
between the cpumask_empty call at the beginning of the function and when
it is actually used later.
One theory is that an interrupt comes in between and a code path ends up
changing the mask. Move the check after interrupt has been disabled to
see if it fixes the issue.
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20210105175043.28325-1-wei.liu@kernel.org
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Commit eff8728fe6 ("vmlinux.lds.h: Add PGO and AutoFDO input
sections") added ".text.unlikely.*" and ".text.hot.*" due to an LLVM
change [1].
After another LLVM change [2], these sections are seen in some PowerPC
builds, where there is a orphan section warning then build failure:
$ make -skj"$(nproc)" \
ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 O=out \
distclean powernv_defconfig zImage.epapr
ld.lld: warning: kernel/built-in.a(panic.o):(.text.unlikely.) is being placed in '.text.unlikely.'
...
ld.lld: warning: address (0xc000000000009314) of section .text is not a multiple of alignment (256)
...
ERROR: start_text address is c000000000009400, should be c000000000008000
ERROR: try to enable LD_HEAD_STUB_CATCH config option
ERROR: see comments in arch/powerpc/tools/head_check.sh
...
Explicitly handle these sections like in the main linker script so
there is no more build failure.
[1]: https://reviews.llvm.org/D79600
[2]: https://reviews.llvm.org/D92493
Fixes: 83a092cf95 ("powerpc: Link warning for orphan sections")
Cc: stable@vger.kernel.org
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/ClangBuiltLinux/linux/issues/1218
Link: https://lore.kernel.org/r/20210104205952.1399409-1-natechancellor@gmail.com
When a queue is in NVMET_RDMA_Q_CONNECTING state, it may has some
requests at rsp_wait_list. In case a disconnect occurs at this
state, no one will empty this list and will return the requests to
free_rsps list. Normally nvmet_rdma_queue_established() free those
requests after moving the queue to NVMET_RDMA_Q_LIVE state, but in
this case __nvmet_rdma_queue_disconnect() is called before. The
crash happens at nvmet_rdma_free_rsps() when calling
list_del(&rsp->free_list), because the request exists only at
the wait list. To fix the issue, simply clear rsp_wait_list when
destroying the queue.
Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
There are no callers for nvme_reset_ctrl_sync() and
nvme_alloc_request_qid() so that we keep the symbols exported.
Unexport those functions, mark them static and update the header file
respectively.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
While handling the completion queue, keep a local copy of the command id
from the DMA-accessible completion entry. This silences a time-of-check
to time-of-use (TOCTOU) warning from KF/x[1], with respect to a
Thunderclap[2] vulnerability analysis. The double-read impact appears
benign.
There may be a theoretical window for @command_id to be used as an
adversary-controlled array-index-value for mounting a speculative
execution attack, but that mitigation is saved for a potential follow-on.
A man-in-the-middle attack on the data payload is out of scope for this
analysis and is hopefully mitigated by filesystem integrity mechanisms.
[1] https://github.com/intel/kernel-fuzzer-for-xen-project
[2] http://thunderclap.io/thunderclap-paper-ndss2019.pdf
Signed-off-by: Lalithambika Krishna Kumar <lalithambika.krishnakumar@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
We may send a request (with or without its data) from two paths:
1. From our I/O context nvme_tcp_io_work which is triggered from:
- queue_rq
- r2t reception
- socket data_ready and write_space callbacks
2. Directly from queue_rq if the send_list is empty (because we want to
save the context switch associated with scheduling our io_work).
However, given that now we have the send_mutex, we may run into a race
condition where none of these contexts will send the pending payload to
the controller. Both io_work send path and queue_rq send path
opportunistically attempt to acquire the send_mutex however queue_rq only
attempts to send a single request, and if io_work context fails to
acquire the send_mutex it will complete without rescheduling itself.
The race can trigger with the following sequence:
1. queue_rq sends request (no incapsule data) and blocks
2. RX path receives r2t - prepares data PDU to send, adds h2cdata PDU
to the send_list and schedules io_work
3. io_work triggers and cannot acquire the send_mutex - because of (1),
ends without self rescheduling
4. queue_rq completes the send, and completes
==> no context will send the h2cdata - timeout.
Fix this by having queue_rq sending as much as it can from the send_list
such that if it still has any left, its because the socket buffer is
full and the socket write_space callback will trigger, thus guaranteeing
that a context will be scheduled to send the h2cdata PDU.
Fixes: db5ad6b7f8 ("nvme-tcp: try to send request in queue_rq context")
Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reported-by: Samuel Jones <sjones@kalrayinc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
A system with more than one of these SSDs will only have one usable.
Hence the kernel fails to detect nvme devices due to duplicate cntlids.
[ 6.274554] nvme nvme1: Duplicate cntlid 33 with nvme0, rejecting
[ 6.274566] nvme nvme1: Removing after probe failure status: -22
Adding the NVME_QUIRK_IGNORE_DEV_SUBNQN quirk to resolves the issue.
Signed-off-by: Gopal Tiwari <gtiwari@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Kernel robot had the following warnings:
>> fcloop.c:1506:6: warning: %x in format string (no. 1) requires
>> 'unsigned int *' but the argument type is 'signed int *'.
>> [invalidScanfArgType_int]
>> if (sscanf(buf, "%x:%d:%d", &opcode, &starting, &amount) != 3)
>> ^
Resolve by changing opcode from and int to an unsigned int
and
>> fcloop.c:1632:32: warning: Uninitialized variable: lport [uninitvar]
>> ret = __wait_localport_unreg(lport);
>> ^
>> fcloop.c:1615:28: warning: Uninitialized variable: nport [uninitvar]
>> ret = __remoteport_unreg(nport, rport);
>> ^
These aren't actual issues as the values are assigned prior to use.
It appears the tool doesn't understand list_first_entry_or_null().
Regardless, quiet the tool by initializing the pointers to NULL at
declaration.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Recent patches changed calling sequences. nvme_fc_abort_outstanding_ios
used to be called from a timeout or work context. Now it is being called
in an io completion context, which can be an interrupt handler.
Unfortunately, the abort outstanding ios routine attempts to stop nvme
queues and nested routines that may try to sleep, which is in conflict
with the interrupt handler.
Correct replacing the direct call with a work element scheduling, and the
abort outstanding ios routine will be called in the work element.
Fixes: 95ced8a2c7 ("nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery")
Signed-off-by: James Smart <james.smart@broadcom.com>
Reported-by: Daniel Wagner <dwagner@suse.de>
Tested-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The block layer code will split a large zeroout request into multiple bios
and if WRITE SAME is disabled because the storage device reports that it
does not support it (or support the length used), we can get an error
message from the block layer despite the setting of RQF_QUIET on the first
request. This is because more than one request may have already been
submitted.
Fix this by setting RQF_QUIET when BLK_STS_TARGET is returned to fail the
request early, we don't need to log a message because we did not actually
submit the command to the device, and the block layer code will handle the
error by submitting individual write bios.
Link: https://lore.kernel.org/r/20201207221021.28243-1-emilne@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Users can initiate resets to specific SCSI device/target/host through
IOCTL. When this happens, the SCSI cmd passed to eh_device/target/host
_reset_handler() callbacks is initialized with a request whose tag is -1.
In this case it is not right for eh_device_reset_handler() callback to
count on the LUN get from hba->lrb[-1]. Fix it by getting LUN from the SCSI
device associated with the SCSI cmd.
Link: https://lore.kernel.org/r/1609157080-26283-1-git-send-email-cang@codeaurora.org
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
BXT/APL has different isr/irr/hpd regs compared with other GEN9. If not
setting these regs bits correctly according to the emulated monitor
(currently a DP on PORT_B), although gvt still triggers a virtual HPD
event, the guest driver won't detect a valid HPD pulse thus no full
display detection will be executed to read the updated EDID.
With this patch, the vfio_edid is enabled again on BXT/APL, which is
previously disabled.
Fixes: 642403e359 ("drm/i915/gvt: Temporarily disable vfio_edid for BXT/APL")
Signed-off-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20201201060329.142375-1-colin.xu@intel.com
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Currently if device needs to do flush or BKOP operations, the device VCC
power is kept during runtime-suspend period.
However, if system suspend is happening while device is runtime-suspended,
such power may not be disabled successfully.
The reasons may be,
1. If current PM level is the same as SPM level, device will keep
runtime-suspended by ufshcd_system_suspend().
2. Flush recheck work may not be scheduled successfully during system
suspend period. If it can wake up the system, this is also not the
intention of the recheck work.
To fix this issue, simply runtime-resume the device if the flush is allowed
during runtime suspend period. Flush capability will be disabled while
leaving runtime suspend, and also not be allowed in system suspend period.
Link: https://lore.kernel.org/r/20201222072905.32221-2-stanley.chu@mediatek.com
Fixes: 51dd905bd2 ("scsi: ufs: Fix WriteBooster flush during runtime suspend")
Reviewed-by: Chaotian Jing <chaotian.jing@mediatek.com>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The kernel image can contain multiple types (structs/unions)
with the same name. This causes distinct type hierarchies in
BTF data and makes resolve_btfids fail with error like:
BTFIDS vmlinux
FAILED unresolved symbol udp6_sock
as reported by Qais Yousef [1].
This change adds warning when multiple types of the same name
are detected:
BTFIDS vmlinux
WARN: multiple IDs found for 'file': 526, 113351 - using 526
WARN: multiple IDs found for 'sk_buff': 2744, 113958 - using 2744
We keep the lower ID for the given type instance and let the
build continue.
Also changing the 'nr' variable name to 'nr_types' to avoid confusion.
[1] https://lore.kernel.org/lkml/20201229151352.6hzmjvu3qh6p2qgg@e107158-lin/
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210105234219.970039-1-jolsa@kernel.org
The udpgro.sh will always return 0 (unless the bpf selftest was not
build first) even if there are some failed sub test-cases.
Therefore the kselftest framework will report this case is OK.
Check and return the exit status of each test to make it easier to
spot real failures.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 1d6cd39293 ("modpost: turn missing MODULE_LICENSE()
into error") the ppc32_allmodconfig build fails with:
ERROR: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/freescale/fs_enet/mii-fec.o
ERROR: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/freescale/fs_enet/mii-bitbang.o
Add the missing MODULE_LICENSEs to fix the build. Both files include a
copyright header indicating they are GPL v2.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
A null-ptr-deref bug is reported by Hulk Robot like this:
--------------
KASAN: null-ptr-deref in range [0x0000000000000128-0x000000000000012f]
Call Trace:
qrtr_ns_remove+0x22/0x40 [ns]
qrtr_proto_fini+0xa/0x31 [qrtr]
__x64_sys_delete_module+0x337/0x4e0
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x468ded
--------------
When qrtr_ns_init fails in qrtr_proto_init, qrtr_ns_remove which would
be called later on would raise a null-ptr-deref because qrtr_ns.workqueue
has been destroyed.
Fix it by making qrtr_ns_init have a return value and adding a check in
qrtr_proto_init.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aligning to tx_ndp_modulus is not sufficient because the next align
call can be cdc_ncm_align_tail, which can add up to ctx->tx_modulus +
ctx->tx_remainder - 1 bytes. This used to lead to occasional crashes
on a Huawei 909s-120 LTE module as follows:
- the condition marked /* if there is a remaining skb [...] */ is true
so the swaps happen
- skb_out is set from ctx->tx_curr_skb
- skb_out->len is exactly 0x3f52
- ctx->tx_curr_size is 0x4000 and delayed_ndp_size is 0xac
(note that the sum of skb_out->len and delayed_ndp_size is 0x3ffe)
- the for loop over n is executed once
- the cdc_ncm_align_tail call marked /* align beginning of next frame */
increases skb_out->len to 0x3f56 (the sum is now 0x4002)
- the condition marked /* check if we had enough room left [...] */ is
false so we break out of the loop
- the condition marked /* If requested, put NDP at end of frame. */ is
true so the NDP is written into skb_out
- now skb_out->len is 0x4002, so padding_count is minus two interpreted
as an unsigned number, which is used as the length argument to memset,
leading to a crash with various symptoms but usually including
> Call Trace:
> <IRQ>
> cdc_ncm_fill_tx_frame+0x83a/0x970 [cdc_ncm]
> cdc_mbim_tx_fixup+0x1d9/0x240 [cdc_mbim]
> usbnet_start_xmit+0x5d/0x720 [usbnet]
The cdc_ncm_align_tail call first aligns on a ctx->tx_modulus
boundary (adding at most ctx->tx_modulus-1 bytes), then adds
ctx->tx_remainder bytes. Alternatively, the next alignment call can
occur in cdc_ncm_ndp16 or cdc_ncm_ndp32, in which case at most
ctx->tx_ndp_modulus-1 bytes are added.
A similar problem has occurred before, and the code is nontrivial to
reason about, so add a guard before the crashing call. By that time it
is too late to prevent any memory corruption (we'll have written past
the end of the buffer already) but we can at least try to get a warning
written into an on-disk log by avoiding the hard crash caused by padding
past the buffer with a huge number of zeros.
Signed-off-by: Jouni K. Seppänen <jks@iki.fi>
Fixes: 4a0e3e989d ("cdc_ncm: Add support for moving NDP to end of NCM frame")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209407
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan says:
====================
net: hns3: fixes for -net
There are some bugfixes for the HNS3 ethernet driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
For DEVICE_VERSION_V2, the hardware only supports src-ip,
dst-ip and verification-tag for rss tuple set of sctp6
packet. For DEVICE_VERSION_V3, the hardware supports
src-port and dst-port as well.
Currently, when user queries the sctp6 rss tuples info,
some unsupported information will be showed on V2. So add
a check for hardware version when initializing and queries
sctp6 rss tuple to fix this issue.
Fixes: 46a3df9f97 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
HCLGE_MBX_MAX_ARQ_MSG_NUM is used to apply memory for the number
of queues used by ARQ(Asynchronous Receive Queue), so the head
and tail pointers should also use this macro.
Fixes: 07a0556a3a ("net: hns3: Changes to support ARQ(Asynchronous Receive Queue)")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When phy driver does not implement the set_loopback interface,
phy loopback test will return -EOPNOTSUPP, and the loopback test
will fail. So when phy driver does not implement the set_loopback
interface, don't do phy loopback test.
Fixes: c9765a89d1 ("net: hns3: add phy selftest function")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Samuel Holland says:
====================
Fixes for dwmac-sun8i suspend/resume
This series fixes issues preventing dwmac-sun8i from working after a
suspend/resume cycle. Those issues include the PHY being left powered
off, the MAC syscon configuration being reset, and the reference to the
reset controller being improperly dropped. They also fix related issues
in probe error handling and driver removal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, sun8i_dwmac_set_syscon was called from a chain of functions
in several different files:
sun8i_dwmac_probe
stmmac_dvr_probe
stmmac_hw_init
stmmac_hwif_init
sun8i_dwmac_setup
sun8i_dwmac_set_syscon
which made the lifetime of the syscon values hard to reason about. Part
of the problem is that there is no similar platform driver callback from
stmmac_dvr_remove. As a result, the driver unset the syscon value in
sun8i_dwmac_exit, but this leaves it uninitialized after a suspend/
resume cycle. It was also unset a second time (outside sun8i_dwmac_exit)
in the probe error path.
Move the init to the earliest available place in sun8i_dwmac_probe
(after stmmac_probe_config_dt, which initializes plat_dat), and the
deinit to the corresponding position in the cleanup order.
Since priv is not filled in until stmmac_dvr_probe, this requires
changing the sun8i_dwmac_set_syscon parameters to priv's two relevant
members.
Fixes: 9f93ac8d40 ("net-next: stmmac: Add dwmac-sun8i")
Fixes: 634db83b82 ("net: stmmac: dwmac-sun8i: Handle integrated/external MDIOs")
Signed-off-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
sun8i_dwmac_exit calls sun8i_dwmac_unpower_internal_phy, but
sun8i_dwmac_init did not call sun8i_dwmac_power_internal_phy. This
caused PHY power to remain off after a suspend/resume cycle. Fix this by
recording if PHY power should be restored, and if so, restoring it.
Fixes: 634db83b82 ("net: stmmac: dwmac-sun8i: Handle integrated/external MDIOs")
Signed-off-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While stmmac_pltfr_remove calls sun8i_dwmac_exit, the sun8i_dwmac_init
and sun8i_dwmac_exit functions are also called by the stmmac_platform
suspend/resume callbacks. They may be called many times during the
device's lifetime and should not release resources used by the driver.
Furthermore, there was no error handling in case registering the MDIO
mux failed during probe, and the EPHY clock was never released at all.
Fix all of these issues by moving the deinitialization code to a driver
removal callback. Also ensure the EPHY is powered down before removal.
Fixes: 634db83b82 ("net: stmmac: dwmac-sun8i: Handle integrated/external MDIOs")
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac_pltfr_remove does three things in one function, making it
inapproprate for unwinding the steps in the probe function. Currently,
a failure before the call to stmmac_dvr_probe would leak OF node
references due to missing a call to stmmac_remove_config_dt. And an
error in stmmac_dvr_probe would cause the driver to attempt to remove a
netdevice that was never added. Fix these by reordering the init and
splitting out the error handling steps.
Fixes: 9f93ac8d40 ("net-next: stmmac: Add dwmac-sun8i")
Fixes: 40a1dcee2d ("net: ethernet: dwmac-sun8i: Use the correct function in exit path")
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
VLAN checks for NETREG_UNINITIALIZED to distinguish between
registration failure and unregistration in progress.
Since commit cb626bf566 ("net-sysfs: Fix reference count leak")
registration failure may, however, result in NETREG_UNREGISTERED
as well as NETREG_UNINITIALIZED.
This fix is similer to cebb69754f ("rtnetlink: Fix
memory(net_device) leak when ->newlink fails")
Fixes: cb626bf566 ("net-sysfs: Fix reference count leak")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
From the existing definitions it's unclear which stat to
use to report filtering based on L2 dst addr in old
broadcast-medium Ethernet.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A call to dma_alloc_coherent() is wrapped by sonic_alloc_descriptors().
This is correctly freed in the remove function, but not in the error
handling path of the probe function. Fix this by adding the missing
dma_free_coherent() call.
While at it, rename a label in order to be slightly more informative.
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Chris Zankel <chris@zankel.net>
References: commit 10e3cc180e ("net/sonic: Fix a resource leak in an error handling path in 'jazz_sonic_probe()'")
Fixes: 74f2a5f0ef ("xtensa: Add support for the Sonic Ethernet device for the XT2000 board.")
Fixes: efcce83936 ("[PATCH] macsonic/jazzsonic network drivers update")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without this, the driver runs into a link failure
arm-linux-gnueabi-ld: drivers/net/wan/slic_ds26522.o: in function `slic_ds26522_probe':
slic_ds26522.c:(.text+0x100c): undefined reference to `byte_rev_table'
arm-linux-gnueabi-ld: slic_ds26522.c:(.text+0x1cdc): undefined reference to `byte_rev_table'
arm-linux-gnueabi-ld: drivers/net/wan/slic_ds26522.o: in function `slic_write':
slic_ds26522.c:(.text+0x1e4c): undefined reference to `byte_rev_table'
Fixes: c37d4a0085 ("Maxim/driver: Add driver for maxim ds26522")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without this, we run into a link error
arm-linux-gnueabi-ld: drivers/isdn/mISDN/dsp_audio.o: in function `dsp_audio_generate_law_tables':
(.text+0x30c): undefined reference to `byte_rev_table'
arm-linux-gnueabi-ld: drivers/isdn/mISDN/dsp_audio.o:(.text+0x5e4): more undefined references to `byte_rev_table' follow
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without crc32 support, this fails to link:
arm-linux-gnueabi-ld: net/wireless/scan.o: in function `cfg80211_scan_6ghz':
scan.c:(.text+0x928): undefined reference to `crc32_le'
Fixes: c8cb5b854b ("nl80211/cfg80211: support 6 GHz scanning")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without crc32, the driver fails to link:
arm-linux-gnueabi-ld: drivers/net/wireless/ath/wil6210/fw.o: in function `wil_fw_verify':
fw.c:(.text+0x74c): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/wireless/ath/wil6210/fw.o:fw.c:(.text+0x758): more undefined references to `crc32_le' follow
Fixes: 151a970650 ("wil6210: firmware download")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without crc32, this driver fails to link:
arm-linux-gnueabi-ld: drivers/net/can/kvaser_pciefd.o: in function `kvaser_pciefd_probe':
kvaser_pciefd.c:(.text+0x2b0): undefined reference to `crc32_be'
Fixes: 26ad340e58 ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without crc32, this driver fails to link:
arm-linux-gnueabi-ld: drivers/net/phy/dp83640.o: in function `match':
dp83640.c:(.text+0x476c): undefined reference to `crc32_le'
Fixes: 539e44d268 ("dp83640: Include hash in timestamp/packet matching")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without this, the driver fails to link:
lpc_eth.c:(.text+0x1934): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/ethernet/qlogic/qed/qed_debug.o: in function `qed_grc_dump':
qed_debug.c:(.text+0x4068): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/ethernet/qlogic/qed/qed_debug.o: in function `qed_idle_chk_dump':
qed_debug.c:(.text+0x51fc): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/ethernet/qlogic/qed/qed_debug.o: in function `qed_mcp_trace_dump':
qed_debug.c:(.text+0x6000): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/ethernet/qlogic/qed/qed_debug.o: in function `qed_dbg_reg_fifo_dump':
qed_debug.c:(.text+0x66cc): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/net/ethernet/qlogic/qed/qed_debug.o:qed_debug.c:(.text+0x6aa4): more undefined references to `crc32_le' follow
Fixes: 7a4b21b7d1 ("qed: Add nvram selftest")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current description of ADC keys is not precise enough.
"when this key is pressed" leaves it open if a key is considered pressed
below or above the threshold. This has led to confusion:
drivers/input/keyboard/adc-keys.c ignores the meaning of thresholds and
sets the key that is closest to press-threshold-microvolt.
This patch nails down the definitions and provides an interpretation of the
supplied example.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Acked-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20201222110815.24121-1-xypron.glpk@gmx.de
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Pull vhost bugfix from Michael Tsirkin:
"This fixes configs with vhost vsock behind a viommu"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost/vsock: add IOTLB API support
Pull sound fixes from Takashi Iwai:
"Here is a collection of USB- and HD-audio fixes.
Most of them are device-specific quirks while one fix is for a
regression due to an incorrect mutex unlock introduced in this merge
window"
* tag 'sound-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/via: Fix runtime PM for Clevo W35xSS
ALSA: usb-audio: Add quirk for RC-505
ALSA: hda/hdmi: Fix incorrect mutex unlock in silent_stream_disable()
ALSA: hda/realtek: Enable mute and micmute LED on HP EliteBook 850 G7
ALSA: hda/realtek: Add two "Intel Reference board" SSID in the ALC256.
ALSA: hda/realtek: Add mute LED quirk for more HP laptops
ALSA: hda/conexant: add a new hda codec CX11970
ALSA: usb-audio: Add quirk for BOSS AD-10
ALSA: usb-audio: Fix UBSAN warnings for MIDI jacks
ALSA: hda/realtek - Modify Dell platform name
ALSA: hda/realtek - Fix speaker volume control on Lenovo C940
Pull ARC updates from Vineet Gupta:
"Things are quieter on upstreaming front as we are mostly focusing on
ARCv3/ARC64 port.
This contains just build system updates from Masahiro Yamada"
* tag 'arc-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: build: use $(READELF) instead of hard-coded readelf
ARC: build: remove unneeded extra-y
ARC: build: move symlink creation to arch/arc/Makefile to avoid race
ARC: build: add boot_targets to PHONY
ARC: build: add uImage.lzma to the top-level target
ARC: build: remove non-existing bootpImage from KBUILD_IMAGE
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from netfilter, wireless and bpf
trees.
Current release - regressions:
- mt76: fix NULL pointer dereference in mt76u_status_worker and
mt76s_process_tx_queue
- net: ipa: fix interconnect enable bug
Current release - always broken:
- netfilter: fixes possible oops in mtype_resize in ipset
- ath11k: fix number of coding issues found by static analysis tools
and spurious error messages
Previous releases - regressions:
- e1000e: re-enable s0ix power saving flows for systems with the
Intel i219-LM Ethernet controllers to fix power use regression
- virtio_net: fix recursive call to cpus_read_lock() to avoid a
deadlock
- ipv4: ignore ECN bits for fib lookups in fib_compute_spec_dst()
- sysfs: take the rtnl lock around XPS configuration
- xsk: fix memory leak for failed bind and rollback reservation at
NETDEV_TX_BUSY
- r8169: work around power-saving bug on some chip versions
Previous releases - always broken:
- dcb: validate netlink message in DCB handler
- tun: fix return value when the number of iovs exceeds MAX_SKB_FRAGS
to prevent unnecessary retries
- vhost_net: fix ubuf refcount when sendmsg fails
- bpf: save correct stopping point in file seq iteration
- ncsi: use real net-device for response handler
- neighbor: fix div by zero caused by a data race (TOCTOU)
- bareudp: fix use of incorrect min_headroom size and a false
positive lockdep splat from the TX lock
- mvpp2:
- clear force link UP during port init procedure in case
bootloader had set it
- add TCAM entry to drop flow control pause frames
- fix PPPoE with ipv6 packet parsing
- fix GoP Networking Complex Control config of port 3
- fix pkt coalescing IRQ-threshold configuration
- xsk: fix race in SKB mode transmit with shared cq
- ionic: account for vlan tag len in rx buffer len
- stmmac: ignore the second clock input, current clock framework does
not handle exclusive clock use well, other drivers may reconfigure
the second clock
Misc:
- ppp: change PPPIOCUNBRIDGECHAN ioctl request number to follow
existing scheme"
* tag 'net-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits)
net: dsa: lantiq_gswip: Fix GSWIP_MII_CFG(p) register access
net: dsa: lantiq_gswip: Enable GSWIP_MII_CFG_EN also for internal PHYs
net: lapb: Decrease the refcount of "struct lapb_cb" in lapb_device_event
r8169: work around power-saving bug on some chip versions
net: usb: qmi_wwan: add Quectel EM160R-GL
selftests: mlxsw: Set headroom size of correct port
net: macb: Correct usage of MACB_CAPS_CLK_HW_CHG flag
ibmvnic: fix: NULL pointer dereference.
docs: networking: packet_mmap: fix old config reference
docs: networking: packet_mmap: fix formatting for C macros
vhost_net: fix ubuf refcount incorrectly when sendmsg fails
bareudp: Fix use of incorrect min_headroom size
bareudp: set NETIF_F_LLTX flag
net: hdlc_ppp: Fix issues when mod_timer is called while timer is running
atlantic: remove architecture depends
erspan: fix version 1 check in gre_parse_header()
net: hns: fix return value check in __lb_other_process()
net: sched: prevent invalid Scell_log shift count
net: neighbor: fix a crash caused by mod zero
ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst()
...
Pull AFS fixes from David Howells:
"Two fixes.
The first is the fix for the strnlen() array limit check and the
second fixes the calculation of the number of dirent records used to
represent any particular filename length"
* tag 'afs-fixes-04012021' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix directory entry size calculation
afs: Work around strnlen() oops with CONFIG_FORTIFIED_SOURCE=y
Ever since commit 2a9127fcf2 ("mm: rewrite wait_on_page_bit_common()
logic") we've had some very occasional reports of BUG_ON(PageWriteback)
in write_cache_pages(), which we thought we already fixed in commit
073861ed77 ("mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)").
But syzbot just reported another one, even with that commit in place.
And it turns out that there's a simpler way to trigger the BUG_ON() than
the one Hugh found with page re-use. It all boils down to the fact that
the page writeback is ostensibly serialized by the page lock, but that
isn't actually really true.
Yes, the people _setting_ writeback all do so under the page lock, but
the actual clearing of the bit - and waking up any waiters - happens
without any page lock.
This gives us this fairly simple race condition:
CPU1 = end previous writeback
CPU2 = start new writeback under page lock
CPU3 = write_cache_pages()
CPU1 CPU2 CPU3
---- ---- ----
end_page_writeback()
test_clear_page_writeback(page)
... delayed...
lock_page();
set_page_writeback()
unlock_page()
lock_page()
wait_on_page_writeback();
wake_up_page(page, PG_writeback);
.. wakes up CPU3 ..
BUG_ON(PageWriteback(page));
where the BUG_ON() happens because we woke up the PG_writeback bit
becasue of the _previous_ writeback, but a new one had already been
started because the clearing of the bit wasn't actually atomic wrt the
actual wakeup or serialized by the page lock.
The reason this didn't use to happen was that the old logic in waiting
on a page bit would just loop if it ever saw the bit set again.
The nice proper fix would probably be to get rid of the whole "wait for
writeback to clear, and then set it" logic in the writeback path, and
replace it with an atomic "wait-to-set" (ie the same as we have for page
locking: we set the page lock bit with a single "lock_page()", not with
"wait for lock bit to clear and then set it").
However, out current model for writeback is that the waiting for the
writeback bit is done by the generic VFS code (ie write_cache_pages()),
but the actual setting of the writeback bit is done much later by the
filesystem ".writepages()" function.
IOW, to make the writeback bit have that same kind of "wait-to-set"
behavior as we have for page locking, we'd have to change our roughly
~50 different writeback functions. Painful.
Instead, just make "wait_on_page_writeback()" loop on the very unlikely
situation that the PG_writeback bit is still set, basically re-instating
the old behavior. This is very non-optimal in case of contention, but
since we only ever set the bit under the page lock, that situation is
controlled.
Reported-by: syzbot+2fc0712f8f8b8b8fa0ef@syzkaller.appspotmail.com
Fixes: 2a9127fcf2 ("mm: rewrite wait_on_page_bit_common() logic")
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The AMD IOMMU initialisation registers the IRQ remapping domain for
each IOMMU before doing the final sanity check that every I/OAPIC is
covered.
This means that the AMD irq_remapping_select() function gets invoked
even when IRQ remapping has been disabled, eventually leading to a NULL
pointer dereference in alloc_irq_table().
Unfortunately, the IVRS isn't fully parsed early enough that the sanity
check can be done in time to registering the IRQ domain altogether.
Doing that would be nice, but is a larger and more error-prone task. The
simple fix is just for irq_remapping_select() to refuse to report a
match when IRQ remapping has disabled.
Link: https://lore.kernel.org/lkml/ed4be9b4-24ac-7128-c522-7ef359e8185d@gmx.at
Fixes: a1a785b572 ("iommu/amd: Implement select() method on remapping irqdomain")
Reported-by: Johnathan Smithinovic <johnathan.smithinovic@gmx.at>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/04bbe8bca87f81a3cfa93ec4299e53f47e00e5b3.camel@infradead.org
Signed-off-by: Will Deacon <will@kernel.org>
When I made the INTCAPXT support stop gratuitously pretending to be MSI,
I missed the fact that iommu_setup_msi() also sets the ->int_enabled
flag. I missed this in the iommu_setup_intcapxt() code path, which means
that a resume from suspend will try to allocate the IRQ domains again,
accidentally re-enabling interrupts as it does, resulting in much sadness.
Lift out the bit which sets iommu->int_enabled into the iommu_init_irq()
function which is also where it gets checked.
Link: https://lore.kernel.org/r/20210104132250.GE32151@zn.tnic/
Fixes: d1adcfbb52 ("iommu/amd: Fix IOMMU interrupt generation in X2APIC mode")
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/50cd5f55be8ead0937ac315cd2f5b89364f6a9a5.camel@infradead.org
Signed-off-by: Will Deacon <will@kernel.org>
Fix follow warning:
fs/io_uring.c:1523:22: warning: variable ‘id’ set but not used
[-Wunused-but-set-variable]
struct io_identity *id;
^~
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
BFQ computes number of tags it allows to be allocated for each request type
based on tag bitmap. However it uses 1 << bitmap.shift as number of
available tags which is wrong. 'shift' is just an internal bitmap value
containing logarithm of how many bits bitmap uses in each bitmap word.
Thus number of tags allowed for some request types can be far to low.
Use proper bitmap.depth which has the number of tags instead.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When initializing iocost for a queue, its rqos should be registered before
the blkcg policy is activated to allow policy data initiailization to lookup
the associated ioc. This unfortunately means that the rqos methods can be
called on bios before iocgs are attached to all existing blkgs.
While the race is theoretically possible on ioc_rqos_throttle(), it mostly
happened in ioc_rqos_merge() due to the difference in how they lookup ioc.
The former determines it from the passed in @rqos and then bails before
dereferencing iocg if the looked up ioc is disabled, which most likely is
the case if initialization is still in progress. The latter looked up ioc by
dereferencing the possibly NULL iocg making it a lot more prone to actually
triggering the bug.
* Make ioc_rqos_merge() use the same method as ioc_rqos_throttle() to look
up ioc for consistency.
* Make ioc_rqos_throttle() and ioc_rqos_merge() test for NULL iocg before
dereferencing it.
* Explain the danger of NULL iocgs in blk_iocost_init().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jonathan Lemon <bsd@fb.com>
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The IN and OUT instructions with port address as an immediate operand
only use an 8-bit immediate (imm8). The current VC handler uses the
entire 32-bit immediate value but these instructions only set the first
bytes.
Cast the operand to an u8 for that.
[ bp: Massage commit message. ]
Fixes: 25189d08e5 ("x86/sev-es: Add support for handling IOIO exceptions")
Signed-off-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Link: https://lkml.kernel.org/r/20210105163311.221490-1-pgonda@google.com
Commit 49b3cf035e ("kasan: arm64: set TCR_EL1.TBID1 when enabled") set
the TBID1 bit for the KASAN_SW_TAGS configuration, freeing up 8 bits to
be used by PAC. With in-kernel MTE now in mainline, also set this bit
for the KASAN_HW_TAGS configuration.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Acked-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Andrey Konovalov <andreyknvl@google.com>
Currently with ld.lld we emit an empty .eh_frame_hdr section (and a
corresponding program header) into the vDSO. With ld.bfd the section
is not emitted but the program header is, with p_vaddr set to 0. This
can lead to unwinders attempting to interpret the data at whichever
location the program header happens to point to as an unwind info
header. This happens to be mostly harmless as long as the byte at
that location (interpreted as a version number) has a value other
than 1, causing both libgcc and LLVM libunwind to ignore the section
(in libunwind's case, after printing an error message to stderr),
but it could lead to worse problems if the byte happened to be 1 or
the program header points to non-readable memory (e.g. if the empty
section was placed at a page boundary).
Instead of disabling .eh_frame_hdr via --no-eh-frame-hdr (which
also has the downside of being unsupported by older versions of GNU
binutils), disable it by discarding the section, and stop emitting
the program header that points to it.
I understand that we intend to emit valid unwind info for the vDSO
at some point. Once that happens this patch can be reverted.
Signed-off-by: Peter Collingbourne <pcc@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://linux-review.googlesource.com/id/If745fd9cadcb31b4010acbf5693727fe111b0863
Link: https://lore.kernel.org/r/20201230221954.2007257-1-pcc@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently the kexec kernel can panic or hang due to 2 causes:
1) hv_cpu_die() is not called upon kexec, so the hypervisor corrupts the
old VP Assist Pages when the kexec kernel runs. The same issue is fixed
for hibernation in commit 421f090c81 ("x86/hyperv: Suspend/resume the
VP assist page for hibernation"). Now fix it for kexec.
2) hyperv_cleanup() is called too early. In the kexec path, the other CPUs
are stopped in hv_machine_shutdown() -> native_machine_shutdown(), so
between hv_kexec_handler() and native_machine_shutdown(), the other CPUs
can still try to access the hypercall page and cause panic. The workaround
"hv_hypercall_pg = NULL;" in hyperv_cleanup() is unreliabe. Move
hyperv_cleanup() to a better place.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20201222065541.24312-1-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
This code will leak "map->debugfs_name" because the if statement is
reversed so it only frees NULL pointers instead of non-NULL. In
fact the if statement is not required and should just be removed
because kfree() accepts NULL pointers.
Fixes: cffa4b2122 ("regmap: debugfs: Fix a memory leak when calling regmap_attach_dev")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/X/RQpfAwRdLg0GqQ@mwanda
Signed-off-by: Mark Brown <broonie@kernel.org>
With GNU binutils 2.35+, linking with BFD produces warnings for vmlinux:
aarch64-linux-gnu-ld: warning: -z norelro ignored
BFD can produce this warning when the target emulation mode does not
support RELRO program headers, and -z relro or -z norelro is passed.
Alan Modra clarifies:
The default linker emulation for an aarch64-linux ld.bfd is
-maarch64linux, the default for an aarch64-elf linker is
-maarch64elf. They are not equivalent. If you choose -maarch64elf
you get an emulation that doesn't support -z relro.
The ARCH=arm64 kernel prefers -maarch64elf, but may fall back to
-maarch64linux based on the toolchain configuration.
LLD will always create RELRO program header regardless of target
emulation.
To avoid the above warning when linking with BFD, pass -z norelro only
when linking with LLD or with -maarch64linux.
Fixes: 3b92fa7485 ("arm64: link with -z norelro regardless of CONFIG_RELOCATABLE")
Fixes: 3bbd3db864 ("arm64: relocatable: fix inconsistencies in linker script and options")
Cc: <stable@vger.kernel.org> # 5.0.x-
Reported-by: kernelci.org bot <bot@kernelci.org>
Reported-by: Quentin Perret <qperret@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Alan Modra <amodra@gmail.com>
Cc: Fāng-ruì Sòng <maskray@google.com>
Link: https://lore.kernel.org/r/20201218002432.788499-1-ndesaulniers@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Commit
28ee90fe60 ("x86/mm: implement free pmd/pte page interfaces")
introduced a new location where a pmd was released, but neglected to
run the pmd page destructor. In fact, this happened previously for a
different pmd release path and was fixed by commit:
c283610e44 ("x86, mm: do not leak page->ptl for pmd page tables").
This issue was hidden until recently because the failure mode is silent,
but commit:
b2b29d6d01 ("mm: account PMD tables like PTE tables")
turns the failure mode into this signature:
BUG: Bad page state in process lt-pmem-ns pfn:15943d
page:000000007262ed7b refcount:0 mapcount:-1024 mapping:0000000000000000 index:0x0 pfn:0x15943d
flags: 0xaffff800000000()
raw: 00affff800000000 dead000000000100 0000000000000000 0000000000000000
raw: 0000000000000000 ffff913a029bcc08 00000000fffffbff 0000000000000000
page dumped because: nonzero mapcount
[..]
dump_stack+0x8b/0xb0
bad_page.cold+0x63/0x94
free_pcp_prepare+0x224/0x270
free_unref_page+0x18/0xd0
pud_free_pmd_page+0x146/0x160
ioremap_pud_range+0xe3/0x350
ioremap_page_range+0x108/0x160
__ioremap_caller.constprop.0+0x174/0x2b0
? memremap+0x7a/0x110
memremap+0x7a/0x110
devm_memremap+0x53/0xa0
pmem_attach_disk+0x4ed/0x530 [nd_pmem]
? __devm_release_region+0x52/0x80
nvdimm_bus_probe+0x85/0x210 [libnvdimm]
Given this is a repeat occurrence it seemed prudent to look for other
places where this destructor might be missing and whether a better
helper is needed. try_to_free_pmd_page() looks like a candidate, but
testing with setting up and tearing down pmd mappings via the dax unit
tests is thus far not triggering the failure.
As for a better helper pmd_free() is close, but it is a messy fit
due to requiring an @mm arg. Also, ___pmd_free_tlb() wants to call
paravirt_tlb_remove_table() instead of free_page(), so open-coded
pgtable_pmd_page_dtor() seems the best way forward for now.
Debugged together with Matthew Wilcox <willy@infradead.org>.
Fixes: 28ee90fe60 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/160697689204.605323.17629854984697045602.stgit@dwillia2-desk3.amr.corp.intel.com
With the apdma remove hand-shake signal, it requirs special
operation timing to reset i2c manually, otherwise the interrupt
will not be triggered, i2c transmission will be timeout.
Fixes: 8426fe70cfa4("i2c: mediatek: Add apdma sync in i2c driver")
Signed-off-by: Qii Wang <qii.wang@mediatek.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
The pwms property have to specify the no-/inverted flag since
commit fa28d8212e ("ARM: dts: imx: default to #pwm-cells = <3>
in the SoC dtsi files").
Fixes: fa28d8212e ("ARM: dts: imx: default to #pwm-cells = <3> in the SoC dtsi files")
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
While io_ring_exit_work() is running new requests of all sorts may be
issued, so it should do a bit more to cancel them, otherwise they may
just get stuck. e.g. in io-wq, in poll lists, etc.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_uring fds marked O_CLOEXEC and we explicitly cancel all requests
before going through exec, so we don't want to leave task's file
references to not our anymore io_uring instances.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
IOPOLL skips completion locking but keeps it under uring_lock, thus
io_cqring_overflow_flush() and so io_cqring_events() need additional
locking with uring_lock in some cases for IOPOLL.
Remove __io_cqring_overflow_flush() from io_cqring_events(), introduce a
wrapper around flush doing needed synchronisation and call it by hand.
Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_req_task_submit() might be called for IOPOLL, do the fail path under
uring_lock to comply with IOPOLL synchronisation based solely on it.
Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Missing sanitization of rateest userspace string, bug has been
triggered by syzbot, patch from Florian Westphal.
2) Report EOPNOTSUPP on missing set features in nft_dynset, otherwise
error reporting to userspace via EINVAL is misleading since this is
reserved for malformed netlink requests.
3) New binaries with old kernels might silently accept several set
element expressions. New binaries set on the NFT_SET_EXPR and
NFT_DYNSET_F_EXPR flags to request for several expressions per
element, hence old kernels which do not support for this bail out
with EOPNOTSUPP.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
netfilter: nftables: add set expression flags
netfilter: nft_dynset: report EOPNOTSUPP on missing set feature
netfilter: xt_RATEEST: reject non-null terminated string from userspace
====================
Link: https://lore.kernel.org/r/20210103192920.18639-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Martin Blumenstingl says:
====================
net: dsa: lantiq_gswip: two fixes for -net/-stable
While testing the lantiq_gswip driver in OpenWrt at least one board had
a non-working Ethernet port connected to an internal 100Mbit/s PHY22F
GPHY. The problem which could be observed:
- the PHY would detect the link just fine
- ethtool stats would see the TX counter rise
- the RX counter in ethtool was stuck at zero
It turns out that two independent patches are needed to fix this:
- first we need to enable the MII data lines also for internal PHYs
- second we need to program the GSWIP_MII_CFG registers for all ports
except the CPU port
These two patches have also been tested by back-porting them on top of
Linux 5.4.86 in OpenWrt.
Special thanks to Hauke for debugging and brainstorming this on IRC
with me!
====================
Link: https://lore.kernel.org/r/20210103012544.3259029-1-martin.blumenstingl@googlemail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is one GSWIP_MII_CFG register for each switch-port except the CPU
port. The register offset for the first port is 0x0, 0x02 for the
second, 0x04 for the third and so on.
Update the driver to not only restrict the GSWIP_MII_CFG registers to
ports 0, 1 and 5. Handle ports 0..5 instead but skip the CPU port. This
means we are not overwriting the configuration for the third port (port
two since we start counting from zero) with the settings for the sixth
port (with number five) anymore.
The GSWIP_MII_PCDU(p) registers are not updated because there's really
only three (one for each of the following ports: 0, 1, 5).
Fixes: 14fceff477 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Enable GSWIP_MII_CFG_EN also for internal PHYs to make traffic flow.
Without this the PHY link is detected properly and ethtool statistics
for TX are increasing but there's no RX traffic coming in.
Fixes: 14fceff477 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Suggested-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In lapb_device_event, lapb_devtostruct is called to get a reference to
an object of "struct lapb_cb". lapb_devtostruct increases the refcount
of the object and returns a pointer to it. However, we didn't decrease
the refcount after we finished using the pointer. This patch fixes this
problem.
Fixes: a4989fa911 ("net/lapb: support netdev events")
Cc: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Link: https://lore.kernel.org/r/20201231174331.64539-1-xie.he.0141@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test was setting the headroom size of the wrong port. This was not
visible because of a firmware bug that canceled this bug.
Set the headroom size of the correct port, so that the test will pass
with both old and new firmware versions.
Fixes: bfa804784e ("selftests: mlxsw: Add a PFC test")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20201230114251.394009-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A new flag MACB_CAPS_CLK_HW_CHG was added and all callers of
macb_set_tx_clk were gated on the presence of this flag.
- if (!clk)
+ if (!bp->tx_clk || !(bp->caps & MACB_CAPS_CLK_HW_CHG))
However the flag was not added to anything other than the new
sama7g5_gem, turning that function call into a no op for all other
systems. This breaks the networking on Zynq.
The commit message adding this states: a new capability so that
macb_set_tx_clock() to not be called for IPs having this
capability
This strongly implies that present of the flag was intended to skip
the function not absence of the flag. Update the if statement to
this effect, which repairs the existing users.
Fixes: daafa1d33c ("net: macb: add capability to not set the clock rate")
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210104103802.13091-1-ckeepax@opensource.cirrus.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Before commit 889b8f964f ("packet: Kill CONFIG_PACKET_MMAP.") there
used to be a CONFIG_PACKET_MMAP config symbol that depended on
CONFIG_PACKET. The text still implies that PACKET_MMAP can be disabled.
Remove that from the text, as well as reference to old kernel versions.
Also, drop reference to broken link to information for pre 2.6.5
kernels.
Make a slight working improvement (s/In/On/) while at it.
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Link: https://lore.kernel.org/r/80089f3783372c8fd7833f28ce774a171b2ef252.1609232919.git.baruch@tkos.co.il
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Like other tunneling interfaces, the bareudp doesn't need TXLOCK.
So, It is good to set the NETIF_F_LLTX flag to improve performance and
to avoid lockdep's false-positive warning.
Test commands:
ip netns add A
ip netns add B
ip link add veth0 netns A type veth peer name veth1 netns B
ip netns exec A ip link set veth0 up
ip netns exec A ip a a 10.0.0.1/24 dev veth0
ip netns exec B ip link set veth1 up
ip netns exec B ip a a 10.0.0.2/24 dev veth1
for i in {2..1}
do
let A=$i-1
ip netns exec A ip link add bareudp$i type bareudp \
dstport $i ethertype ip
ip netns exec A ip link set bareudp$i up
ip netns exec A ip a a 10.0.$i.1/24 dev bareudp$i
ip netns exec A ip r a 10.0.$i.2 encap ip src 10.0.$A.1 \
dst 10.0.$A.2 via 10.0.$i.2 dev bareudp$i
ip netns exec B ip link add bareudp$i type bareudp \
dstport $i ethertype ip
ip netns exec B ip link set bareudp$i up
ip netns exec B ip a a 10.0.$i.2/24 dev bareudp$i
ip netns exec B ip r a 10.0.$i.1 encap ip src 10.0.$A.2 \
dst 10.0.$A.1 via 10.0.$i.1 dev bareudp$i
done
ip netns exec A ping 10.0.2.2
Splat looks like:
[ 96.992803][ T822] ============================================
[ 96.993954][ T822] WARNING: possible recursive locking detected
[ 96.995102][ T822] 5.10.0+ #819 Not tainted
[ 96.995927][ T822] --------------------------------------------
[ 96.997091][ T822] ping/822 is trying to acquire lock:
[ 96.998083][ T822] ffff88810f753898 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960
[ 96.999813][ T822]
[ 96.999813][ T822] but task is already holding lock:
[ 97.001192][ T822] ffff88810c385498 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960
[ 97.002908][ T822]
[ 97.002908][ T822] other info that might help us debug this:
[ 97.004401][ T822] Possible unsafe locking scenario:
[ 97.004401][ T822]
[ 97.005784][ T822] CPU0
[ 97.006407][ T822] ----
[ 97.007010][ T822] lock(_xmit_NONE#2);
[ 97.007779][ T822] lock(_xmit_NONE#2);
[ 97.008550][ T822]
[ 97.008550][ T822] *** DEADLOCK ***
[ 97.008550][ T822]
[ 97.010057][ T822] May be due to missing lock nesting notation
[ 97.010057][ T822]
[ 97.011594][ T822] 7 locks held by ping/822:
[ 97.012426][ T822] #0: ffff888109a144f0 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x12f7/0x2b00
[ 97.014191][ T822] #1: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0x249/0x2020
[ 97.016045][ T822] #2: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x1fd/0x2960
[ 97.017897][ T822] #3: ffff88810c385498 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960
[ 97.019684][ T822] #4: ffffffffbce2f600 (rcu_read_lock){....}-{1:2}, at: bareudp_xmit+0x31b/0x3690 [bareudp]
[ 97.021573][ T822] #5: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0x249/0x2020
[ 97.023424][ T822] #6: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x1fd/0x2960
[ 97.025259][ T822]
[ 97.025259][ T822] stack backtrace:
[ 97.026349][ T822] CPU: 3 PID: 822 Comm: ping Not tainted 5.10.0+ #819
[ 97.027609][ T822] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 97.029407][ T822] Call Trace:
[ 97.030015][ T822] dump_stack+0x99/0xcb
[ 97.030783][ T822] __lock_acquire.cold.77+0x149/0x3a9
[ 97.031773][ T822] ? stack_trace_save+0x81/0xa0
[ 97.032661][ T822] ? register_lock_class+0x1910/0x1910
[ 97.033673][ T822] ? register_lock_class+0x1910/0x1910
[ 97.034679][ T822] ? rcu_read_lock_sched_held+0x91/0xc0
[ 97.035697][ T822] ? rcu_read_lock_bh_held+0xa0/0xa0
[ 97.036690][ T822] lock_acquire+0x1b2/0x730
[ 97.037515][ T822] ? __dev_queue_xmit+0x1f52/0x2960
[ 97.038466][ T822] ? check_flags+0x50/0x50
[ 97.039277][ T822] ? netif_skb_features+0x296/0x9c0
[ 97.040226][ T822] ? validate_xmit_skb+0x29/0xb10
[ 97.041151][ T822] _raw_spin_lock+0x30/0x70
[ 97.041977][ T822] ? __dev_queue_xmit+0x1f52/0x2960
[ 97.042927][ T822] __dev_queue_xmit+0x1f52/0x2960
[ 97.043852][ T822] ? netdev_core_pick_tx+0x290/0x290
[ 97.044824][ T822] ? mark_held_locks+0xb7/0x120
[ 97.045712][ T822] ? lockdep_hardirqs_on_prepare+0x12c/0x3e0
[ 97.046824][ T822] ? __local_bh_enable_ip+0xa5/0xf0
[ 97.047771][ T822] ? ___neigh_create+0x12a8/0x1eb0
[ 97.048710][ T822] ? trace_hardirqs_on+0x41/0x120
[ 97.049626][ T822] ? ___neigh_create+0x12a8/0x1eb0
[ 97.050556][ T822] ? __local_bh_enable_ip+0xa5/0xf0
[ 97.051509][ T822] ? ___neigh_create+0x12a8/0x1eb0
[ 97.052443][ T822] ? check_chain_key+0x244/0x5f0
[ 97.053352][ T822] ? rcu_read_lock_bh_held+0x56/0xa0
[ 97.054317][ T822] ? ip_finish_output2+0x6ea/0x2020
[ 97.055263][ T822] ? pneigh_lookup+0x410/0x410
[ 97.056135][ T822] ip_finish_output2+0x6ea/0x2020
[ ... ]
Acked-by: Guillaume Nault <gnault@redhat.com>
Fixes: 571912c69f ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://lore.kernel.org/r/20201228152136.24215-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Unfortunately, there's userland code that used to rely upon these
checks being done before anything else to check for UMOUNT_NOFOLLOW
support. That broke in 41525f56e2 ("fs: refactor ksys_umount").
Separate those from the rest of checks and move them to ksys_umount();
unlike everything else in there, this can be sanely done there.
Reported-by: Sargun Dhillon <sargun@sargun.me>
Fixes: 41525f56e2 ("fs: refactor ksys_umount")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Without crc32 support, this driver fails to link:
arm-linux-gnueabi-ld: drivers/md/dm-zoned-metadata.o: in function `dmz_write_sb':
dm-zoned-metadata.c:(.text+0xe98): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/md/dm-zoned-metadata.o: in function `dmz_check_sb':
dm-zoned-metadata.c:(.text+0x7978): undefined reference to `crc32_le'
Fixes: 3b1a94c88b ("dm zoned: drive-managed zoned block device target")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The integrity target relies on skcipher for encryption/decryption, but
certain kernel configurations may not enable CRYPTO_SKCIPHER, leading to
compilation errors due to unresolved symbols. Explicitly select
CRYPTO_SKCIPHER for DM_INTEGRITY, since it is unconditionally dependent
on it.
Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Block core warned that discard_granularity was 0 for dm-raid with
personality of raid1. Reason is that raid_io_hints() was incorrectly
special-casing raid1 rather than raid0.
Fix raid_io_hints() by removing discard limits settings for
raid1. Check for raid0 instead.
Fixes: 61697a6abd ("dm: eliminate 'split_discard_bios' flag from DM target interface")
Cc: stable@vger.kernel.org
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Stephan Bärwolf <stephan@matrixstorm.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pull RCU fix from Paul McKenney:
"This is a fix for a regression in the v5.10 merge window, but it was
reported quite late in the v5.10 process, plus generating and testing
the fix took some time.
The regression is due to commit 36dadef23f ("kprobes: Init kprobes
in early_initcall") which on powerpc can use RCU Tasks before
initialization, resulting in boot failures.
The fix is straightforward, simply moving initialization of RCU Tasks
before the early_initcall()s. The fix has been exposed to -next and
kbuild test robot testing, and has been tested by the PowerPC guys"
* 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
rcu-tasks: Move RCU-tasks initialization to before early_initcall()
Pull ENABLE_MUST_CHECK removal from Miguel Ojeda:
"Remove CONFIG_ENABLE_MUST_CHECK (Masahiro Yamada)"
Note that this removes the config option by making the must-check
unconditional, not by removing must check itself.
* tag 'compiler-attributes-for-linus-v5.11' of git://github.com/ojeda/linux:
Compiler Attributes: remove CONFIG_ENABLE_MUST_CHECK
Merge two commits that had dependencies on other 5.11 trees (the block
and the irq trees respectively).
- We reverted a megaraid_sas change in 5.10 due to missing block
layer plumbing. Now that this is in place, reinstate the change.
- The hisi_sas driver had a dependency on a driver core irq change
that went in through Thomas' tree.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
gpiod_add_lookup_table() expects the gpiod_lookup_table->table passed to
it to be terminated with a zero-ed out entry.
So we need to allocate one more entry then we will use.
Fixes: d308dfbf62 ("i2c: mux/i801: Switch to use descriptor passing")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Jean Delvare <jdelvare@suse.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
If the i2c device SCL bus being pulled up due to some exception before
message transfer done, the system cannot receive the completing interrupt
signal any more, it would not exit waiting loop until MAX_SCHEDULE_TIMEOUT
jiffies eclipse, that would make the system seemed hang up. To avoid that
happen, this patch adds a specific timeout for message transfer.
Fixes: 8b9ec07198 ("i2c: Add Spreadtrum I2C controller driver")
Signed-off-by: Linhua Xu <linhua.xu@unisoc.com>
Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com>
[wsa: changed errno to ETIMEDOUT]
Signed-off-by: Wolfram Sang <wsa@kernel.org>
KVM_ARM_PMU only existed for the benefit of 32bit ARM hosts,
and makes no sense now that we are 64bit only. Get rid of it.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Since a few years, kernel addresses are no longer included in oops
dumps, at least on x86. All we get is a symbol name with offset and
size.
This is a problem for ceph_connection_operations handlers, especially
con->ops->dispatch(). All three handlers have the same name and there
is little context to disambiguate between e.g. monitor and OSD clients
because almost everything is inlined. gdb sneakily stops at the first
matching symbol, so one has to resort to nm and addr2line.
Some of these are already prefixed with mon_, osd_ or mds_. Let's do
the same for all others.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Try and avoid leaving bits and pieces of session key and connection
secret (gets split into GCM key and a pair of GCM IVs) around.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
The BPF selftests have build time dependencies on cutting edge versions
of tools in the BPF ecosystem including LLVM which are more involved
to satisfy than more typical requirements like installing a package from
your distribution. This causes issues for users looking at kselftest in
as a whole who find that a default build of kselftest fails and that
resolving this is time consuming and adds administrative overhead. The
fast pace of BPF development and the need for a full BPF stack to do
substantial development or validation work on the code mean that people
working directly on it don't see a reasonable way to keep supporting
older environments without causing problems with the usability of the
BPF tests in BPF development so these requirements are unlikely to be
relaxed in the immediate future.
There is already support for skipping targets so in order to reduce the
barrier to entry for people interested in kselftest as a whole let's use
that to skip the BPF tests by default when people work with the top
level kselftest build system. Users can still build the BPF selftests
as part of the wider kselftest build by specifying SKIP_TARGETS,
including setting an empty SKIP_TARGETS to build everything. They can
also continue to build the BPF selftests individually in cases where
they are specifically focused on BPF.
This isn't ideal since it means people will need to take special steps
to build the BPF tests but the dependencies mean that realistically this
is already the case to some extent and it makes it easier for people to
pick up and work with the other selftests which is hopefully a net win.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Fix the following -Wformat warnings in vdso_test_correctness.c:
vdso_test_correctness.c: In function ‘test_one_clock_gettime64’:
vdso_test_correctness.c:352:21: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘long long int’ [-Wformat=]
352 | printf("\t%llu.%09ld %llu.%09ld %llu.%09ld\n",
| ~~~~^
| |
| long int
| %09lld
353 | (unsigned long long)start.tv_sec, start.tv_nsec,
| ~~~~~~~~~~~~~
| |
| long long int
vdso_test_correctness.c:352:32: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 5 has type ‘long long int’ [-Wformat=]
352 | printf("\t%llu.%09ld %llu.%09ld %llu.%09ld\n",
| ~~~~^
| |
| long int
| %09lld
353 | (unsigned long long)start.tv_sec, start.tv_nsec,
354 | (unsigned long long)vdso.tv_sec, vdso.tv_nsec,
| ~~~~~~~~~~~~
| |
| long long int
vdso_test_correctness.c:352:43: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 7 has type ‘long long int’ [-Wformat=]
The tv_sec member of __kernel_timespec is long long, both in
uapi/linux/time_types.h and locally in vdso_test_correctness.c.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Add the test binaries introduced by commit 693f5ca08c ("kselftest:
Extend vDSO selftest"), commit 03f55c7952 ("kselftest: Extend vDSO
selftest to clock_getres") and commit c7e5789b24 ("kselftest: Move
test_vdso to the vDSO test suite") to .gitignore.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When binding the ConfigFS gadget to a UDC, the functions in each
configuration are added in list order. However, if usb_add_function()
fails, the failed function is put back on its configuration's
func_list and purge_configs_funcs() is called to further clean up.
purge_configs_funcs() iterates over the configurations and functions
in forward order, calling unbind() on each of the previously added
functions. But after doing so, each function gets moved to the
tail of the configuration's func_list. This results in reshuffling
the original order of the functions within a configuration such
that the failed function now appears first even though it may have
originally appeared in the middle or even end of the list. At this
point if the ConfigFS gadget is attempted to re-bind to the UDC,
the functions will be added in a different order than intended,
with the only recourse being to remove and relink the functions all
over again.
An example of this as follows:
ln -s functions/mass_storage.0 configs/c.1
ln -s functions/ncm.0 configs/c.1
ln -s functions/ffs.adb configs/c.1 # oops, forgot to start adbd
echo "<udc device>" > UDC # fails
start adbd
echo "<udc device>" > UDC # now succeeds, but...
# bind order is
# "ADB", mass_storage, ncm
[30133.118289] configfs-gadget gadget: adding 'Mass Storage Function'/ffffff810af87200 to config 'c'/ffffff817d6a2520
[30133.119875] configfs-gadget gadget: adding 'cdc_network'/ffffff80f48d1a00 to config 'c'/ffffff817d6a2520
[30133.119974] using random self ethernet address
[30133.120002] using random host ethernet address
[30133.139604] usb0: HOST MAC 3e:27:46:ba:3e:26
[30133.140015] usb0: MAC 6e:28:7e:42:66:6a
[30133.140062] configfs-gadget gadget: adding 'Function FS Gadget'/ffffff80f3868438 to config 'c'/ffffff817d6a2520
[30133.140081] configfs-gadget gadget: adding 'Function FS Gadget'/ffffff80f3868438 --> -19
[30133.140098] configfs-gadget gadget: unbind function 'Mass Storage Function'/ffffff810af87200
[30133.140119] configfs-gadget gadget: unbind function 'cdc_network'/ffffff80f48d1a00
[30133.173201] configfs-gadget a600000.dwc3: failed to start g1: -19
[30136.661933] init: starting service 'adbd'...
[30136.700126] read descriptors
[30136.700413] read strings
[30138.574484] configfs-gadget gadget: adding 'Function FS Gadget'/ffffff80f3868438 to config 'c'/ffffff817d6a2520
[30138.575497] configfs-gadget gadget: adding 'Mass Storage Function'/ffffff810af87200 to config 'c'/ffffff817d6a2520
[30138.575554] configfs-gadget gadget: adding 'cdc_network'/ffffff80f48d1a00 to config 'c'/ffffff817d6a2520
[30138.575631] using random self ethernet address
[30138.575660] using random host ethernet address
[30138.595338] usb0: HOST MAC 2e:cf:43:cd:ca:c8
[30138.597160] usb0: MAC 6a:f0:9f:ee:82:a0
[30138.791490] configfs-gadget gadget: super-speed config #1: c
Fix this by reversing the iteration order of the functions in
purge_config_funcs() when unbinding them, and adding them back to
the config's func_list at the head instead of the tail. This
ensures that we unbind and unwind back to the original list order.
Fixes: 88af8bbe4e ("usb: gadget: the start of the configfs interface")
Signed-off-by: Chandana Kishori Chiluveru <cchiluve@codeaurora.org>
Signed-off-by: Jack Pham <jackp@codeaurora.org>
Reviewed-by: Peter Chen <peter.chen@nxp.com>
Link: https://lore.kernel.org/r/20201229224443.31623-1-jackp@codeaurora.org
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The pullup does not need to be enabled at below situations:
- For platforms which the hardware pullup starts after setting
register even they do not see the VBUS. If the pullup is always
enabled for these platforms, it will consume more power and
break the USB IF compliance tests [1].
- For platforms which need to do BC 1.2 charger detection after
seeing the VBUS. Pullup D+ will break the charger detection
flow.
- For platforms which the system suspend is allowed when the
connection with host is there but the bus is not at suspend.
For these platforms, it is better to disable pullup when
entering the system suspend otherwise the host may confuse
the device behavior after controller is in low power mode.
Disable pullup is considered as a disconnection event from
host side.
[1] USB-IF Full and Low Speed Compliance Test Procedure
F Back-voltage Testing
Section 7.2.1 of the USB specification requires that no device
shall supply (source) current on VBUS at its upstream facing
port at any time. From VBUS on its upstream facing port,
a device may only draw (sink) current. They may not provide power
to the pull-up resistor on D+/D- unless VBUS is present (see
Section 7.1.5).
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Link: https://lore.kernel.org/r/20201230051114.21370-1-peter.chen@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a use-after-free issue, if access udc_name
in function gadget_dev_desc_UDC_store after another context
free udc_name in function unregister_gadget.
Context 1:
gadget_dev_desc_UDC_store()->unregister_gadget()->
free udc_name->set udc_name to NULL
Context 2:
gadget_dev_desc_UDC_show()-> access udc_name
Call trace:
dump_backtrace+0x0/0x340
show_stack+0x14/0x1c
dump_stack+0xe4/0x134
print_address_description+0x78/0x478
__kasan_report+0x270/0x2ec
kasan_report+0x10/0x18
__asan_report_load1_noabort+0x18/0x20
string+0xf4/0x138
vsnprintf+0x428/0x14d0
sprintf+0xe4/0x12c
gadget_dev_desc_UDC_show+0x54/0x64
configfs_read_file+0x210/0x3a0
__vfs_read+0xf0/0x49c
vfs_read+0x130/0x2b4
SyS_read+0x114/0x208
el0_svc_naked+0x34/0x38
Add mutex_lock to protect this kind of scenario.
Signed-off-by: Eddie Hung <eddie.hung@mediatek.com>
Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
Reviewed-by: Peter Chen <peter.chen@nxp.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1609239215-21819-1-git-send-email-macpaul.lin@mediatek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stack-allocated buffers cannot be used for DMA (on all architectures).
Replace the HP-channel macro with a helper function that allocates a
dedicated transfer buffer so that it can continue to be used with
arguments from the stack.
Note that the buffer is cleared on allocation as usblp_ctrl_msg()
returns success also on short transfers (the buffer is only used for
debugging).
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20210104145302.2087-1-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Clevo W35xSS_370SS with VIA codec has had the runtime PM problem that
looses the power state of some nodes after the runtime resume. This
was worked around by disabling the default runtime PM via a denylist
entry. Since 5.10.x made the runtime PM applied (casually) even
though it's disabled in the denylist, this problem was revisited. The
result was that disabling power_save_node feature suffices for the
runtime PM problem.
This patch implements the disablement of power_save_node feature in
VIA codec for the device. It also drops the former denylist entry,
too, as the runtime PM should work in the codec side properly now.
Fixes: b529ef2464 ("ALSA: hda: Add Clevo W35xSS_370SS to the power_save blacklist")
Reported-by: Christian Labisch <clnetbox@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210104153046.19993-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Stack-allocated buffers cannot be used for DMA (on all architectures) so
allocate the flush command buffer using kmalloc().
Fixes: 60a8fc0171 ("USB: add iuu_phoenix driver")
Cc: stable <stable@vger.kernel.org> # 2.6.25
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Without crc32 support, this driver fails to link:
arm-linux-gnueabi-ld: drivers/hid/hid-sony.o: in function `sony_raw_event':
hid-sony.c:(.text+0x8f4): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: hid-sony.c:(.text+0x900): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/hid/hid-sony.o:hid-sony.c:(.text+0x4408): more undefined references to `crc32_le' follow
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The new driver uses a phys_addr_t to store a DMA address,
which does not work when the two are different size:
drivers/hid/amd-sfh-hid/amd_sfh_client.c:157:11: error: incompatible pointer types passing 'phys_addr_t *' (aka 'unsigned int *') to parameter of type 'dma_addr_t *' (aka 'unsigned long long *') [-Werror,-Wincompatible-pointer-types]
&cl_data->sensor_phys_addr[i],
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/dma-mapping.h:393:15: note: passing argument to parameter 'dma_handle' here
dma_addr_t *dma_handle, gfp_t gfp)
^
Change both the type and the variable name to dma_addr for consistency.
Fixes: 4b2c53d93a ("SFH:Transport Driver to add support of AMD Sensor Fusion Hub (SFH)")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The previous commit adding functionality for the palm sensor had a
mistake which meant the error conditions on initialisation was not checked
correctly. On some older platforms this meant that if the sensor wasn't
available an error would be returned and the driver would fail to load.
This commit corrects the error condition. Many thanks to Mario Oenning
for reporting and determining the issue
Signed-off-by: Mark Pearson <markpearson@lenovo.com>
Link: https://lore.kernel.org/r/20201230024726.7861-1-markpearson@lenovo.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
The Estar Beauty HD (MID 7316R) tablet uses a Goodix touchscreen,
with the X and Y coordinates swapped compared to the LCD panel.
Add a touchscreen_dmi entry for this adding a "touchscreen-swapped-x-y"
device-property to the i2c-client instantiated for this device before
the driver binds.
This is the first entry of a Goodix touchscreen to touchscreen_dmi.c,
so far DMI quirks for Goodix touchscreen's have been added directly
to drivers/input/touchscreen/goodix.c. Currently there are 3
DMI tables in goodix.c:
1. rotated_screen[] for devices where the touchscreen is rotated
180 degrees vs the LCD panel
2. inverted_x_screen[] for devices where the X axis is inverted
3. nine_bytes_report[] for devices which use a non standard touch
report size
Arguably only 3. really needs to be inside the driver and the other
2 cases are better handled through the generic touchscreen DMI quirk
mechanism from touchscreen_dmi.c, which allows adding device-props to
any i2c-client. Esp. now that goodix.c is using the generic
touchscreen_properties code.
Alternative to the approach from this patch we could add a 4th
dmi_system_id table for devices with swapped-x-y axis to goodix.c,
but that seems undesirable.
Cc: Bastien Nocera <hadess@hadess.net>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20201224135158.10976-1-hdegoede@redhat.com
There are several reports about the tps6598x causing
interrupt flood on boards with the INT3515 ACPI node, which
then causes instability. There appears to be several
problems with the interrupt. One problem is that the
I2CSerialBus resources do not always map to the Interrupt
resource with the same index, but that is not the only
problem. We have not been able to come up with a solution
for all the issues, and because of that disabling the device
for now.
The PD controller on these platforms is autonomous, and the
purpose for the driver is primarily to supply status to the
userspace, so this will not affect any functionality.
Reported-by: Moody Salem <moody@uniswap.org>
Fixes: a3dd034a17 ("ACPI / scan: Create platform device for INT3515 ACPI nodes")
Cc: stable@vger.kernel.org
BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1883511
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20201223143644.33341-1-heikki.krogerus@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
All Microsoft Surface platform-specific device drivers depend on ACPI,
but the gatekeeper symbol SURFACE_PLATFORMS does not. Hence when the
user is configuring a kernel without ACPI support, he is still asked
about Microsoft Surface drivers, even though this question is
irrelevant.
Fix this by moving the dependency on ACPI from the individual driver
symbols to SURFACE_PLATFORMS.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com>
Link: https://lore.kernel.org/r/20201216133752.1321978-1-geert@linux-m68k.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Fix build warnings when CONFIG_PM_SLEEP is not enabled and these
functions are not used:
../drivers/platform/surface/surface_gpe.c:189:12: warning: ‘surface_gpe_resume’ defined but not used [-Wunused-function]
static int surface_gpe_resume(struct device *dev)
^~~~~~~~~~~~~~~~~~
../drivers/platform/surface/surface_gpe.c:184:12: warning: ‘surface_gpe_suspend’ defined but not used [-Wunused-function]
static int surface_gpe_suspend(struct device *dev)
^~~~~~~~~~~~~~~~~~~
Fixes: 274335f1c5 ("platform/surface: Add Driver to set up lid GPEs on MS Surface device")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Maximilian Luz <luzmaximilian@gmail.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: platform-driver-x86@vger.kernel.org
Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com>
Link: https://lore.kernel.org/r/20201214233336.19782-1-rdunlap@infradead.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
When BIOS disables turbo, The scaling_max_freq in cpufreq sysfs will be
limited to config level 0 base frequency. But when user selects a higher
config levels, this will result in higher base frequency. But since
scaling_max_freq is still old base frequency, the performance will still
be limited. So when the turbo is disabled and cpufreq base_frequency is
higher than scaling_max_freq, update the scaling_max_freq to the
base_frequency.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20201221071859.2783957-2-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
- JZ4760 and JZ4760B have a similar register layout as the JZ4740, and
don't use the new register layout, which was introduced with the
JZ4770 SoC and not the JZ4760 or JZ4760B SoCs.
- The JZ4740 code path only expected two function modes to be
configurable for each pin, and wouldn't work with more than two. Fix
it for the JZ4760, which has four configurable function modes.
Fixes: 0257595a5c ("pinctrl: Ingenic: Add pinctrl driver for JZ4760 and JZ4760B.")
Cc: <stable@vger.kernel.org> # 5.3
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Link: https://lore.kernel.org/r/20201211232810.261565-1-paul@crapouillou.net
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
A built-in regulator driver cannot link against a modular cmd_db driver:
qcom-rpmh-regulator.c:(.text+0x174): undefined reference to `cmd_db_read_addr'
There is already a dependency for RPMh, so add another one of this
type for cmd_db.
Fixes: 34c5aa2666 ("regulator: Kconfig: Fix REGULATOR_QCOM_RPMH dependencies to avoid build error")
Fixes: 46fc033eba ("regulator: add QCOM RPMh regulator driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20201230145712.3133110-1-arnd@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Pointstick and its left/right buttons on HP EliteBook 850 G7 need
multi-input quirk to work correctly.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
gcc points out an incorrect error handling loop:
drivers/dma/qcom/gpi.c: In function 'gpi_ch_init':
drivers/dma/qcom/gpi.c:1254:15: error: iteration 2 invokes undefined behavior [-Werror=aggressive-loop-optimizations]
1254 | struct gpii *gpii = gchan->gpii;
| ^~~~
drivers/dma/qcom/gpi.c:1951:2: note: within this loop
1951 | for (i = i - 1; i >= 0; i++) {
| ^~~
Change the loop to correctly walk backwards through the
initialized fields rather than off into the woods.
Fixes: 5d0c3533a1 ("dmaengine: qcom: Add GPI dma driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210103135738.3741123-1-arnd@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The number of dirent records used by an AFS directory entry should be
calculated using the assumption that there is a 16-byte name field in the
first block, rather than a 20-byte name field (which is actually the case).
This miscalculation is historic and effectively standard, so we have to use
it.
The calculation we need to use is:
1 + (((strlen(name) + 1) + 15) >> 5)
where we are adding one to the strlen() result to account for the NUL
termination.
Fix this by the following means:
(1) Create an inline function to do the calculation for a given name
length.
(2) Use the function to calculate the number of records used for a dirent
in afs_dir_iterate_block().
Use this to move the over-end check out of the loop since it only
needs to be done once.
Further use this to only go through the loop for the 2nd+ records
composing an entry. The only test there now is for if the record is
allocated - and we already checked the first block at the top of the
outer loop.
(3) Add a max name length check in afs_dir_iterate_block().
(4) Make afs_edit_dir_add() and afs_edit_dir_remove() use the function
from (1) to calculate the number of blocks rather than doing it
incorrectly themselves.
Fixes: 63a4681ff3 ("afs: Locally edit directory data for mkdir/create/unlink/...")
Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
AFS has a structured layout in its directory contents (AFS dirs are
downloaded as files and parsed locally by the client for lookup/readdir).
The slots in the directory are defined by union afs_xdr_dirent. This,
however, only directly allows a name of a length that will fit into that
union. To support a longer name, the next 1-8 contiguous entries are
annexed to the first one and the name flows across these.
afs_dir_iterate_block() uses strnlen(), limited to the space to the end of
the page, to find out how long the name is. This worked fine until
6a39e62abb. With that commit, the compiler determines the size of the
array and asserts that the string fits inside that array. This is a
problem for AFS because we *expect* it to overflow one or more arrays.
A similar problem also occurs in afs_dir_scan_block() when a directory file
is being locally edited to avoid the need to redownload it. There strlen()
was being used safely because each page has the last byte set to 0 when the
file is downloaded and validated (in afs_dir_check_page()).
Fix this by changing the afs_xdr_dirent union name field to an
indeterminate-length array and dropping the overflow field.
(Note that whilst looking at this, I realised that the calculation of the
number of slots a dirent used is non-standard and not quite right, but I'll
address that in a separate patch.)
The issue can be triggered by something like:
touch /afs/example.com/thisisaveryveryverylongname
and it generates a report that looks like:
detected buffer overflow in strnlen
------------[ cut here ]------------
kernel BUG at lib/string.c:1149!
...
RIP: 0010:fortify_panic+0xf/0x11
...
Call Trace:
afs_dir_iterate_block+0x12b/0x35b
afs_dir_iterate+0x14e/0x1ce
afs_do_lookup+0x131/0x417
afs_lookup+0x24f/0x344
lookup_open.isra.0+0x1bb/0x27d
open_last_lookups+0x166/0x237
path_openat+0xe0/0x159
do_filp_open+0x48/0xa4
? kmem_cache_alloc+0xf5/0x16e
? __clear_close_on_exec+0x13/0x22
? _raw_spin_unlock+0xa/0xb
do_sys_openat2+0x72/0xde
do_sys_open+0x3b/0x58
do_syscall_64+0x2d/0x3a
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 6a39e62abb ("lib: string.h: detect intra-object overflow in fortified string functions")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: Daniel Axtens <dja@axtens.net>
Systems configured with CONFIG_ZONE_DMA32, CONFIG_ZONE_NORMAL and
!CONFIG_ZONE_DMA will fail to properly setup ARCH_LOW_ADDRESS_LIMIT. The
limit will default to ~0ULL, effectively spanning the whole memory,
which is too high for a configuration that expects low memory to be
capped at 4GB.
Fix ARCH_LOW_ADDRESS_LIMIT by falling back to arm64_dma32_phys_limit
when arm64_dma_phys_limit isn't set. arm64_dma32_phys_limit will honour
CONFIG_ZONE_DMA32, or span the entire memory when not enabled.
Fixes: 1a8e1cef76 ("arm64: use both ZONE_DMA and ZONE_DMA32")
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Link: https://lore.kernel.org/r/20201218163307.10150-1-nsaenzjulienne@suse.de
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
arch/arm64/kernel/smp.c: In function ‘arch_show_interrupts’:
arch/arm64/kernel/smp.c:808:16: warning: unused variable ‘irq’ [-Wunused-variable]
808 | unsigned int irq = irq_desc_get_irq(ipi_desc[i]);
| ^~~
The removal of the last user forgot to remove the variable.
Fixes: 5089bc51f8 ("arm64/smp: Use irq_desc_kstat_cpu() in arch_show_interrupts()")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20201215103026.2872532-1-geert+renesas@glider.be
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Use three-way comparison for address components to avoid integer
wraparound in the result of xfrm_policy_addr_delta(). This ensures
that the search trees are built and traversed correctly.
Treat IPv4 and IPv6 similarly by returning 0 when prefixlen == 0.
Prefix /0 has only one equivalence class.
Fixes: 9cf545ebd5 ("xfrm: policy: store inexact policies in a tree ordered by destination address")
Signed-off-by: Visa Hankala <visa@hankala.org>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
When running this xfrm_policy.sh test script, even with some cases
marked as FAIL, the overall test result will still be PASS:
$ sudo ./xfrm_policy.sh
PASS: policy before exception matches
FAIL: expected ping to .254 to fail (exceptions)
PASS: direct policy matches (exceptions)
PASS: policy matches (exceptions)
FAIL: expected ping to .254 to fail (exceptions and block policies)
PASS: direct policy matches (exceptions and block policies)
PASS: policy matches (exceptions and block policies)
FAIL: expected ping to .254 to fail (exceptions and block policies after hresh changes)
PASS: direct policy matches (exceptions and block policies after hresh changes)
PASS: policy matches (exceptions and block policies after hresh changes)
FAIL: expected ping to .254 to fail (exceptions and block policies after hthresh change in ns3)
PASS: direct policy matches (exceptions and block policies after hthresh change in ns3)
PASS: policy matches (exceptions and block policies after hthresh change in ns3)
FAIL: expected ping to .254 to fail (exceptions and block policies after htresh change to normal)
PASS: direct policy matches (exceptions and block policies after htresh change to normal)
PASS: policy matches (exceptions and block policies after htresh change to normal)
PASS: policies with repeated htresh change
$ echo $?
0
This is because the $lret in check_xfrm() is not a local variable.
Therefore when a test failed in check_exceptions(), the non-zero $lret
will later get reset to 0 when the next test calls check_xfrm().
With this fix, the final return value will be 1. Make it easier for
testers to spot this failure.
Fixes: 39aa6928d4 ("xfrm: policy: fix netlink/pf_key policy lookups")
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
xfrm_probe_algs() probes kernel crypto modules and changes the
availability of struct xfrm_algo_desc. But there is a small window
where ealg->available and aalg->available get changed between
count_ah_combs()/count_esp_combs() and dump_ah_combs()/dump_esp_combs(),
in this case we may allocate a smaller skb but later put a larger
amount of data and trigger the panic in skb_put().
Fix this by relaxing the checks when counting the size, that is,
skipping the test of ->available. We may waste some memory for a few
of sizeof(struct sadb_comb), but it is still much better than a panic.
Reported-by: syzbot+b2bf2652983d23734c5c@syzkaller.appspotmail.com
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The disable_xfrm flag signals that xfrm should not be performed during
routing towards a device before reaching device xmit.
For xfrm interfaces this is usually desired as they perform the outbound
policy lookup as part of their xmit using their if_id.
Before this change enabling this flag on xfrm interfaces prevented them
from xmitting as xfrm_lookup_with_ifid() would not perform a policy lookup
in case the original dst had the DST_NOXFRM flag.
This optimization is incorrect when the lookup is done by the xfrm
interface xmit logic.
Fix by performing policy lookup when invoked by xfrmi as if_id != 0.
Similarly it's unlikely for the 'no policy exists on net' check to yield
any performance benefits when invoked from xfrmi.
Fixes: f203b76d78 ("xfrm: Add virtual xfrm interfaces")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Randconfig builds show another broken dependency:
WARNING: unmet direct dependencies detected for PHY_MTK_MIPI_DSI
Depends on [n]: ARCH_MEDIATEK [=n] && OF [=y]
Selected by [m]:
- DRM_MEDIATEK [=m] && HAS_IOMEM [=y] && DRM [=m] && (ARCH_MEDIATEK [=n] || ARM [=y] && COMPILE_TEST [=y]) && COMMON_CLK [=y] &&
HAVE_ARM_SMCCC [=y] && OF [=y] && MTK_MMSYS [=y]
This is similar to the hdmi driver I fixed earlier, and I guess the
common-clk bug would sooner or later also manifest here, so just use
the exact same solution I chose for the other driver, and hope that
any future drivers just copy it from here.
Fixes: 90f80d9599 ("phy: mediatek: Move mtk_mipi_dsi_phy driver into drivers/phy/mediatek folder")
Fixes: f5f6e01f91 ("phy: mediatek: allow compile-testing the hdmi phy")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210103135524.3678664-1-arnd@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
On deferred probe, we will get the following splat:
cpcap-usb-phy cpcap-usb-phy.0: could not initialize VBUS or ID IIO: -517
WARNING: CPU: 0 PID: 21 at drivers/regulator/core.c:2123 regulator_put+0x68/0x78
...
(regulator_put) from [<c068ebf0>] (release_nodes+0x1b4/0x1fc)
(release_nodes) from [<c068a9a4>] (really_probe+0x104/0x4a0)
(really_probe) from [<c068b034>] (driver_probe_device+0x58/0xb4)
Signed-off-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/r/20201230102105.11826-1-tony@atomide.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The ILI251x seems to report pressure information in the 5th byte of
each per-finger touch data element. On the available hardware, this
information has the values ranging from 0x0 to 0xa, which is also
matching the downstream example code. Report pressure information
on the ILI251x.
Signed-off-by: Marek Vasut <marex@denx.de>
Link: https://lore.kernel.org/r/20201224071238.160098-1-marex@denx.de
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Older versions of BSD awk are fussy about the order of '-v' and '-f'
flags, and require a space after the flag name. This causes build
failures on platforms with an old awk, such as macOS and NetBSD.
Since GNU awk and modern versions of BSD awk (distributed with
FreeBSD/OpenBSD) are fine with either form, the definition of
'cmd_unroll' can be trivially tweaked to let the lib/raid6 Makefile
work with both old and new awk flag dialects.
Signed-off-by: John Millikin <john@john-millikin.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Changes the final fallback path in the ncurses locator for mconf
to support host compilers with a non-default sysroot.
This is similar to the hardcoded search for ncurses under
'/usr/include', but can support compilers that keep their default
header and library directories elsewhere.
For nconfig, do nothing because the only vendor compiler I'm aware
of with this layout (Apple Clang) ships an ncurses version that's too
old for nconfig anyway.
Signed-off-by: John Millikin <john@john-millikin.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
When CRC32 is disabled, zonefs cannot be linked:
ld: fs/zonefs/super.o: in function `zonefs_fill_super':
Add a Kconfig 'select' statement for it.
Fixes: 8dcc1a9d90 ("fs: New zonefs file system")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Without CRC32 support, this fails to link:
arm-linux-gnueabi-ld: drivers/lightnvm/pblk-init.o: in function `pblk_init':
pblk-init.c:(.text+0x2654): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/lightnvm/pblk-init.o: in function `pblk_exit':
pblk-init.c:(.text+0x2a7c): undefined reference to `crc32_le'
Fixes: a4bd217b43 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Without crc32, the driver fails to link:
arm-linux-gnueabi-ld: drivers/block/rsxx/config.o: in function `rsxx_load_config':
config.c:(.text+0x124): undefined reference to `crc32_le'
Fixes: 8722ff8cdb ("block: IBM RamSan 70/80 device driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Indicated by AML code in ACPI table, the touchpad in-use could be found
on two possible slave addresses on &i2c3, i.e. hid@15 and hid@2c. And
which one is in-use can be determined by reading another address on the
I2C bus. Unfortunately, for DT boot, there is currently no support in
firmware to make this check and patch DT accordingly. This results in
a non-functional touchpad on those C630 devices with hid@2c.
As i2c-hid driver will stop probing the device if there is nothing on
the slave address, we can actually keep both devices enabled in DT, and
i2c-hid driver will only probe the existing one. The only problem is
that we cannot set up pinctrl in both device nodes, as two devices with
the same pinctrl will cause pin conflict that makes the second device
fail to probe. Let's move the pinctrl state up to parent node to solve
this problem. As the pinctrl state of parent node is already defined in
sdm845.dtsi, it ends up with overwriting pinctrl-0 with i2c3_hid_active
state added in there.
Fixes: 11d0e4f281 ("arm64: dts: qcom: c630: Polish i2c-hid devices")
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Link: https://lore.kernel.org/r/20210102045940.26874-1-shawn.guo@linaro.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
The kernel test robot reports the following warning in [1]:
drivers/gpio/gpiolib-cdev.c: In function 'gpio_ioctl':
>>drivers/gpio/gpiolib-cdev.c:1437:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=]
Refactor gpio_ioctl() to handle each ioctl in its own helper function
and so reduce the variables stored on the stack to those explicitly
required to service the ioctl at hand.
The lineinfo_get_v1() helper handles both the GPIO_GET_LINEINFO_IOCTL
and GPIO_GET_LINEINFO_WATCH_IOCTL, as per the corresponding v2
implementation - lineinfo_get().
[1] https://lore.kernel.org/lkml/202012270910.VW3qc1ER-lkp@intel.com/
Fixes: aad955842d ("gpiolib: cdev: support GPIO_V2_GET_LINEINFO_IOCTL and GPIO_V2_GET_LINEINFO_WATCH_IOCTL")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kent Gibson <warthog618@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
BOSS RC-505 (shown by lsusb as "Roland Corp. RC-505") does require the
same quirk as these other BOSS devices.
Without this quirk it is neither possible to capture audio from nor to
write audio to the RC-505. Both just result in an empty audio
stream. With these changes both capture and playback seem to work
quite fine. MIDI funtionality was not tested.
Tested-by: Harry Reinold <harry.reinold@posteo.de>
Signed-off-by: Timon Reinold <tirei@agon.one>
Link: https://lore.kernel.org/r/20210102210835.21268-1-tirei@agon.one
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Latest bpf tree has a bug for bpf_iter selftest:
$ ./test_progs -n 4/25
test_bpf_sk_storage_get:PASS:bpf_iter_bpf_sk_storage_helpers__open_and_load 0 nsec
test_bpf_sk_storage_get:PASS:socket 0 nsec
...
do_dummy_read:PASS:read 0 nsec
test_bpf_sk_storage_get:FAIL:bpf_map_lookup_elem map value wasn't set correctly
(expected 1792, got -1, err=0)
#4/25 bpf_sk_storage_get:FAIL
#4 bpf_iter:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 2 FAILED
When doing merge conflict resolution, Commit 4bfc471484 missed to
save curr_task to seq_file private data. The task pointer in seq_file
private data is passed to bpf program. This caused NULL-pointer task
passed to bpf program which will immediately return upon checking
whether task pointer is NULL.
This patch added back the assignment of curr_task to seq_file private
data and fixed the issue.
Fixes: 4bfc471484 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20201231052418.577024-1-yhs@fb.com
Pavel reports that commit 17858b140b ("crypto: ecdh - avoid unaligned
accesses in ecdh_set_secret()") fixes one problem but introduces another:
the unconditional memcpy() introduced by that commit may overflow the
target buffer if the source data is invalid, which could be the result of
intentional tampering.
So check params.key_size explicitly against the size of the target buffer
before validating the key further.
Fixes: 17858b140b ("crypto: ecdh - avoid unaligned accesses in ecdh_set_secret()")
Reported-by: Pavel Machek <pavel@denx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit 86cd97ec4b ("crypto: arm/chacha-neon - optimize for non-block
size multiples") refactored the chacha block handling in the glue code in
a way that may result in the counter increment to be omitted when calling
chacha_block_xor_neon() to process a full block. This violates the skcipher
API, which requires that the output IV is suitable for handling more input
as long as the preceding input has been presented in round multiples of the
block size. Also, the same code is exposed via the chacha library interface
whose callers may actually rely on this increment to occur even for final
blocks that are smaller than the chacha block size.
So increment the counter after calling chacha_block_xor_neon().
Fixes: 86cd97ec4b ("crypto: arm/chacha-neon - optimize for non-block size multiples")
Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
According to the st1232 datasheet, the host has to wait for the device
to change into Normal state before accessing registers other than the
Status Register.
If the reset GPIO is wired, the device is powered on during driver
probe, just before reading the resolution. However, the latter may
happen before the device is ready, leading to a probe failure:
st1232-ts 1-0055: Failed to read resolution: -6
Fix this by waiting until the device is ready, by trying to read the
Status Register until it indicates so, or until timeout.
On Armadillo 800 EVA, typically the first read fails with an I2C
transfer error, while the second read indicates the device is ready.
Fixes: 3a54a21541 ("Input: st1232 - add support resolution reading")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20201229162601.2154566-4-geert+renesas@glider.be
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
st1232_ts_read_data() already reads ts->read_buf_len bytes (8 or 20
bytes) from the touchscreen controller. This was fine when it was used
to read touch point coordinates only, but is overkill for reading the
touchscreen resolution, which just needs 3 bytes.
Optimize transfers by passing the wanted number of bytes.
Fixes: 3a54a21541 ("Input: st1232 - add support resolution reading")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20201229162601.2154566-3-geert+renesas@glider.be
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Before, the maximum coordinates were fixed to (799, 479) or (319, 479),
depending on touchscreen controller type. The driver was changed to
read the actual values from the touchscreen controller, but did not take
into account the returned values are not the maximum coordinates, but
the touchscreen resolution (e.g. 800 and 480).
Fix this by subtracting 1.
Fixes: 3a54a21541 ("Input: st1232 - add support resolution reading")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20201229162601.2154566-2-geert+renesas@glider.be
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
During the process of converting the documentation to reST, some links
were converted using the following wrong syntax (and sometimes using %20
instead of spaces):
`Display text <#section-name-in-html>`__
This syntax isn't valid according to the docutils' spec [1], but more
importantly, it is specific to HTML, since it uses '#' to link to an
HTML anchor.
The right syntax would instead use a docutils hyperlink reference as the
embedded URI to point to the section [2], that is:
`Display text <Section Name_>`__
This syntax works in both HTML and PDF.
The LaTeX toolchain doesn't mind the HTML anchor syntax when generating
the pdf documentation (make pdfdocs), that is, the build succeeds but
the links don't work, but that syntax causes errors when trying to build
using the not-yet-merged rst2pdf:
ValueError: format not resolved, probably missing URL scheme or undefined destination target for 'Forcing%20Quiescent%20States'
So, use the correct syntax in order to have it work in all different
output formats.
[1]: https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#reference-names
[2]: https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#embedded-uris-and-aliases
Fixes: ccc9971e21 ("docs: rcu: convert some articles from html to ReST")
Fixes: c8cce10a62 ("docs: Fix the reference labels in Locking.rst")
Fixes: e548cdeffc ("docs-rst: convert kernel-locking to ReST")
Fixes: 7ddedebb03 ("ALSA: doc: ReSTize writing-an-alsa-driver document")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@protonmail.com>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/20201228144537.135353-1-nfraprado@protonmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Clang warns:
../drivers/i3c/master/mipi-i3c-hci/core.c:780:21: warning: attribute
declaration must precede definition [-Wignored-attributes]
static const struct __maybe_unused of_device_id i3c_hci_of_match[] = {
^
../include/linux/compiler_attributes.h:267:56: note: expanded from macro
'__maybe_unused'
#define __maybe_unused __attribute__((__unused__))
^
../include/linux/mod_devicetable.h:262:8: note: previous definition is
here
struct of_device_id {
^
1 warning generated.
'struct of_device_id' should not be split, as it is a type. Move the
__maybe_unused attribute after the static and const qualifiers so that
there are no warnings about this variable, period.
Fixes: 95393f3e07 ("i3c/master/mipi-i3c-hci: quiet maybe-unused variable warning")
Link: https://github.com/ClangBuiltLinux/linux/issues/1221
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Nicolas Pitre <npitre@baylibre.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20201222025931.3043480-1-natechancellor@gmail.com
Although not a problem right now, it flared up while working
on some other aspects of the code-base. Remove the useless
semicolon.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Since commit 4466bf8282 ("efi/apple-properties: use
PROPERTY_ENTRY_U8_ARRAY_LEN"), my MacBook Pro issues a -ENODATA error
when trying to assign EFI properties to the discrete GPU:
pci 0000:01:00.0: assigning 56 device properties
pci 0000:01:00.0: error -61 assigning properties
That's because some of the properties have no value. They're booleans
whose presence can be checked by drivers, e.g. "use-backlight-blanking".
Commit 6e98503dba ("efi/apple-properties: Remove redundant attribute
initialization from unmarshal_key_value_pairs()") employed a trick to
store such booleans as u8 arrays (which is the data type used for all
other EFI properties on Macs): It cleared the property_entry's
"is_array" flag, thereby denoting that the value is stored inline in the
property_entry.
Commit 4466bf8282 erroneously removed that trick. It was probably a
little fragile to begin with.
Reinstate support for boolean properties by explicitly invoking the
PROPERTY_ENTRY_BOOL() initializer for properties with zero-length value.
Fixes: 4466bf8282 ("efi/apple-properties: use PROPERTY_ENTRY_U8_ARRAY_LEN")
Cc: <stable@vger.kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: https://lore.kernel.org/r/be958bda75331a011d53c696d1deec8dccd06fd2.1609388549.git.lukas@wunner.de
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Oded writes:
This tag contains the following fixes for 5.11-rc2:
- Fixes that are needed for supporting the new F/W with security features:
- Correctly fetch PLL information in GOYA when security is enabled in F/W
- Fix hard-reset support when F/W is in its preboot stage
- Disable clock gating when initializing the H/W
- Fix hard-reset procedure
- Fix PCI controller initialization
- Remove setting of Engine-Barrier in collective wait operations. This
barrier created a drop in performance
- Retry loading the TPC firmware in case of EINTR during loading
- Fix CS counters
- Register to PCI shutdown callback to fix handling of VM shutdown
- Fix order of status check
- Fix memory leak in reset procedure
- Fix and add comments and fix indentations
* tag 'misc-habanalabs-fixes-2020-12-30' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux:
habanalabs: Fix memleak in hl_device_reset
habanalabs: fix order of status check
habanalabs: register to pci shutdown callback
habanalabs: add validation cs counter, fix misplaced counters
habanalabs/gaudi: retry loading TPC f/w on -EINTR
habanalabs: adjust pci controller init to new firmware
habanalabs: update comment in hl_boot_if.h
habanalabs/gaudi: enhance reset message
habanalabs: full FW hard reset support
habanalabs/gaudi: disable CGM at HW initialization
habanalabs: Revise comment to align with mirror list name
habanalabs/gaudi: do not set EB in collective slave queues
habanalabs: preboot hard reset support
habanalabs: remove generic gaudi get_pll_freq function
habanalabs: fetch PSOC PLL frequency from F/W in goya
habanalabs: add comment for pll frequency ioctl opcode
habanalabs: Fix a missing-braces warning
The dummy-hcd driver was written under the assumption that all the
parameters in URBs sent to its root hub would be valid. With URBs
sent from userspace via usbfs, that assumption can be violated.
In particular, the driver doesn't fully check the port-feature values
stored in the wValue entry of Clear-Port-Feature and Set-Port-Feature
requests. Values that are too large can cause the driver to perform
an invalid left shift of more than 32 bits. Ironically, two of those
left shifts are unnecessary, because they implement Set-Port-Feature
requests that hubs are not required to support, according to section
11.24.2.13 of the USB-2.0 spec.
This patch adds the appropriate checks for the port feature selector
values and removes the unnecessary feature settings. It also rejects
requests to set the TEST feature or to set or clear the INDICATOR and
C_OVERCURRENT features, as none of these are relevant to dummy-hcd's
root-hub emulation.
CC: <stable@vger.kernel.org>
Reported-and-tested-by: syzbot+5925509f78293baa7331@syzkaller.appspotmail.com
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20201230162044.GA727759@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With MAX_PWM being defined to 255 the code
unsigned long period;
...
period = ctx->pwm->args.period;
state.duty_cycle = DIV_ROUND_UP(pwm * (period - 1), MAX_PWM);
calculates a too small value for duty_cycle if the configured period is
big (either by discarding the 64 bit value ctx->pwm->args.period or by
overflowing the multiplication). As this results in a too slow fan and
so maybe an overheating machine better be safe than sorry and error out
in .probe.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20201215092031.152243-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
If spi->controller->max_speed_hz is zero, a non-zero spi->max_speed_hz
will be overwritten by zero. Make sure spi->controller->max_speed_hz
is not zero when clamping spi->max_speed_hz.
Put the spi->controller->max_speed_hz non-zero check higher in the if,
so that we avoid a superfluous init to zero when both spi->max_speed_hz
and spi->controller->max_speed_hz are zero.
Fixes: 9326e4f1e5 ("spi: Limit the spi device max speed to controller's max speed")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20201216092321.413262-1-tudor.ambarus@microchip.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The __init annotations on hyp_cpu_pm_{init,exit} are obviously incorrect,
and the build system shouts at you if you enable DEBUG_SECTION_MISMATCH.
Nothing really bad happens as we never execute that code outside of the
init context, but we can't label the callers as __int either, as kvm_init
isn't __init itself. Oh well.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://lore.kernel.org/r/20201223120854.255347-1-maz@kernel.org
With the arrival of
commit 2fee958319 ("spi: dt-bindings: clarify CS behavior for spi-cs-high and gpio descriptors")
it was clarified what the proper state for cs-gpios should be, even if the
flag is ignored. The driver code is doing the right thing since
766c6b63aa ("spi: fix client driver breakages when using GPIO descriptors")
The chip-select of the td028ttec1 panel is active-low, so we must omit spi-cs-high;
attribute (already removed by separate patch) and should now use GPIO_ACTIVE_LOW for
the client device description to be fully consistent.
Fixes: 766c6b63aa ("spi: fix client driver breakages when using GPIO descriptors")
CC: stable@vger.kernel.org
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
We have rst_map_012 used for various accelerators like dsp, ipu and iva.
For these use cases, we have rstctrl bit 2 control the subsystem module
reset, and have and bits 0 and 1 control the accelerator specific
features.
If the bootloader, or kexec boot, has left any accelerator specific
reset bits deasserted, deasserting bit 2 reset will potentially enable
an accelerator with unconfigured MMU and no firmware. And we may get
spammed with a lot by warnings on boot with "Data Access in User mode
during Functional access", or depending on the accelerator, the system
can also just hang.
This issue can be quite easily reproduced by setting a rst_map_012 type
rstctrl register to 0 or 4 in the bootloader, and booting the system.
Let's just assert all reset bits for rst_map_012 type resets. So far
it looks like the other rstctrl types don't need this. If it turns out
that the other type rstctrl bits also need reset on init, we need to
add an instance specific reset mask for the bits to avoid resetting
unwanted bits.
Reported-by: Carl Philipp Klemm <philipp@uvos.xyz>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Santosh Shilimkar <ssantosh@kernel.org>
Cc: Suman Anna <s-anna@ti.com>
Cc: Tero Kristo <t-kristo@ti.com>
Tested-by: Carl Philipp Klemm <philipp@uvos.xyz>
Signed-off-by: Tony Lindgren <tony@atomide.com>
When kzalloc() fails, we should execute hl_mmu_fini()
to release the MMU module. It's the same when
hl_ctx_init() fails.
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
get_pages is only called in a locked context. Add a WARN_ON to make sure
it stays that way.
Signed-off-by: Iskren Chernev <iskren.chernev@gmail.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
When CONFIG_BPF_LSM is not configured, running bpf selftesting will show
BPF_F_BPRM_SECUREEXEC undefined error for bprm_opts.c.
The problem is that bprm_opts.c includes vmliunx.h. The vmlinux.h is
generated by "bpftool btf dump file ./vmlinux format c". On the other
hand, BPF_F_BPRM_SECUREEXEC is defined in include/uapi/linux/bpf.h
and used only in bpf_lsm.c. When CONFIG_BPF_LSM is not set, bpf_lsm
will not be compiled, so vmlinux.h will not include definition of
BPF_F_BPRM_SECUREEXEC.
Ideally, we want to compile bpf selftest regardless of the configuration
setting, so change the include file from vmlinux.h to bpf.h.
Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20201224011242.585967-1-jiang.wang@bytedance.com
This patch fixes the return value for altera_spi_txrx. It should return
1 for interrupt transfer mode, and return 0 for polling transfer mode.
The altera_spi_txrx() implements the spi_controller.transfer_one
callback. According to the spi-summary.rst, the transfer_one should
return 0 when transfer is finished, return 1 when transfer is still in
progress.
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Link: https://lore.kernel.org/r/1609219662-27057-2-git-send-email-yilun.xu@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
After initializing the regmap through
syscon_regmap_lookup_by_compatible, then regmap_attach_dev to the
device, because the debugfs_name has been allocated, there is no
need to redistribute it again
unreferenced object 0xd8399b80 (size 64):
comm "swapper/0", pid 1, jiffies 4294937641 (age 278.590s)
hex dump (first 32 bytes):
64 75 6d 6d 79 2d 69 6f 6d 75 78 63 2d 67 70 72
dummy-iomuxc-gpr
40 32 30 65 34 30 30 30 00 7f 52 5b d8 7e 42 69
@20e4000..R[.~Bi
backtrace:
[<ca384d6f>] kasprintf+0x2c/0x54
[<6ad3bbc2>] regmap_debugfs_init+0xdc/0x2fc
[<bc4181da>] __regmap_init+0xc38/0xd88
[<1f7e0609>] of_syscon_register+0x168/0x294
[<735e8766>] device_node_get_regmap+0x6c/0x98
[<d96c8982>] imx6ul_init_machine+0x20/0x88
[<0456565b>] customize_machine+0x1c/0x30
[<d07393d8>] do_one_initcall+0x80/0x3ac
[<7e584867>] kernel_init_freeable+0x170/0x1f0
[<80074741>] kernel_init+0x8/0x120
[<285d6f28>] ret_from_fork+0x14/0x20
[<00000000>] 0x0
Fixes: 9b947a13e7 ("regmap: use debugfs even when no device")
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Link: https://lore.kernel.org/r/20201229105046.41984-1-xiaolei.wang@windriver.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The cdns3 core device is populated by calling of_platform_populate,
the flag OF_POPULATED is set for core device node, if this flag
is not cleared, when calling of_platform_populate the second time
after loading parent module again, the OF code will not try to create
platform device for core device.
To fix it, it uses of_platform_depopulate to depopulate the core
device which the parent created, and the flag OF_POPULATED for
core device node will be cleared accordingly.
Cc: <stable@vger.kernel.org>
Fixes: 1e056efab9 ("usb: cdns3: add NXP imx8qm glue layer")
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Currently drivers/phy/ingenic/Makefile adds phy-ingenic-usb to targets
not depending on actual Kconfig symbol CONFIG_PHY_INGENIC_USB, so this
driver always gets built[-in] on every system.
Add missing dependency.
Fixes: 31de313dfd ("PHY: Ingenic: Add USB PHY driver using generic PHY framework.")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Link: https://lore.kernel.org/r/20201222131021.4751-1-alobakin@pm.me
Signed-off-by: Vinod Koul <vkoul@kernel.org>
If the dw_edma_alloc_burst() function fails then we free "chunk" but
it's still on the "desc->chunk->list" list so it will lead to a use
after free. Also the "->chunks_alloc" count is incremented when it
shouldn't be.
In current kernels small allocations are guaranteed to succeed and
dw_edma_alloc_burst() can't fail so this will not actually affect
runtime.
Fixes: e63d79d1ff ("dmaengine: Add Synopsys eDMA IP core driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Link: https://lore.kernel.org/r/X9dTBFrUPEvvW7qc@mwanda
Signed-off-by: Vinod Koul <vkoul@kernel.org>
drivers/dma/qcom/gpi.c:1419:3: warning: format '%lu' expects argument of
type 'long unsigned int', but argument 8 has type 'size_t {aka unsigned
int}' [-Wformat=]
drivers/dma/qcom/gpi.c:1427:31: warning: format '%lu' expects argument of
type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned
int}' [-Wformat=]
drivers/dma/qcom/gpi.c:1447:3: warning: format '%llx' expects argument of
type 'long long unsigned int', but argument 4 has type 'dma_addr_t {aka
unsigned int}' [-Wformat=]
drivers/dma/qcom/gpi.c:1447:3: warning: format '%llx' expects argument of
type 'long long unsigned int', but argument 5 has type 'phys_addr_t {aka
unsigned int}' [-Wformat=]
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20201218104137.59200-1-nixiaoming@huawei.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf 2020-12-28
The following pull-request contains BPF updates for your *net* tree.
There is a small merge conflict between bpf tree commit 69ca310f34
("bpf: Save correct stopping point in file seq iteration") and net tree
commit 66ed594409 ("bpf/task_iter: In task_file_seq_get_next use
task_lookup_next_fd_rcu"). The get_files_struct() does not exist anymore
in net, so take the hunk in HEAD and add the `info->tid = curr_tid` to
the error path:
[...]
curr_task = task_seq_get_next(ns, &curr_tid, true);
if (!curr_task) {
info->task = NULL;
info->tid = curr_tid;
return NULL;
}
/* set info->task and info->tid */
[...]
We've added 10 non-merge commits during the last 9 day(s) which contain
a total of 11 files changed, 75 insertions(+), 20 deletions(-).
The main changes are:
1) Various AF_XDP fixes such as fill/completion ring leak on failed bind and
fixing a race in skb mode's backpressure mechanism, from Magnus Karlsson.
2) Fix latency spikes on lockdep enabled kernels by adding a rescheduling
point to BPF hashtab initialization, from Eric Dumazet.
3) Fix a splat in task iterator by saving the correct stopping point in the
seq file iteration, from Jonathan Lemon.
4) Fix BPF maps selftest by adding retries in case hashtab returns EBUSY
errors on update/deletes, from Andrii Nakryiko.
5) Fix BPF selftest error reporting to something more user friendly if the
vmlinux BTF cannot be found, from Kamal Mostafa.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ppp_cp_event is called directly or indirectly by ppp_rx with "ppp->lock"
held. It may call mod_timer to add a new timer. However, at the same time
ppp_timer may be already running and waiting for "ppp->lock". In this
case, there's no need for ppp_timer to continue running and it can just
exit.
If we let ppp_timer continue running, it may call add_timer. This causes
kernel panic because add_timer can't be called with a timer pending.
This patch fixes this problem.
Fixes: e022c2f07a ("WAN: new synchronous PPP implementation for generic HDLC.")
Cc: Krzysztof Halasa <khc@pm.waw.pl>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This was tested on a RaptorCS Talos II with IBM POWER9 DD2.2 CPUs and an
ASUS XG-C100F PCI-e card without any issue. Speeds of ~8Gbps could be
attained with not-very-scientific (wget HTTP) both-ways measurements on
a local network. No warning or error reported in kernel logs. The
drivers seems to be portable enough for it not to be gated like such.
Signed-off-by: Léo Le Bouter <lle-bout@zaclys.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function skb_copy() could return NULL, the return value
need to be checked.
Fixes: b5996f11ea ("net: add Hisilicon Network Subsystem basic ethernet support")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pneigh_enqueue() tries to obtain a random delay by mod
NEIGH_VAR(p, PROXY_DELAY). However, NEIGH_VAR(p, PROXY_DELAY)
migth be zero at that point because someone could write zero
to /proc/sys/net/ipv4/neigh/[device]/proxy_delay after the
callers check it.
This patch uses prandom_u32_max() to get a random delay instead
which avoids potential division by zero.
Signed-off-by: weichenchen <weichen.chen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RT_TOS() only clears one of the ECN bits. Therefore, when
fib_compute_spec_dst() resorts to a fib lookup, it can return
different results depending on the value of the second ECN bit.
For example, ECT(0) and ECT(1) packets could be treated differently.
$ ip netns add ns0
$ ip netns add ns1
$ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
$ ip -netns ns0 link set dev lo up
$ ip -netns ns1 link set dev lo up
$ ip -netns ns0 link set dev veth01 up
$ ip -netns ns1 link set dev veth10 up
$ ip -netns ns0 address add 192.0.2.10/24 dev veth01
$ ip -netns ns1 address add 192.0.2.11/24 dev veth10
$ ip -netns ns1 address add 192.0.2.21/32 dev lo
$ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10 src 192.0.2.21
$ ip netns exec ns1 sysctl -wq net.ipv4.icmp_echo_ignore_broadcasts=0
With TOS 4 and ECT(1), ns1 replies using source address 192.0.2.21
(ping uses -Q to set all TOS and ECN bits):
$ ip netns exec ns0 ping -c 1 -b -Q 5 192.0.2.255
[...]
64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.544 ms
But with TOS 4 and ECT(0), ns1 replies using source address 192.0.2.11
because the "tos 4" route isn't matched:
$ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
[...]
64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.597 ms
After this patch the ECN bits don't affect the result anymore:
$ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
[...]
64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.591 ms
Fixes: 35ebf65e85 ("ipv4: Create and use fib_compute_spec_dst() helper.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patches that introduced NT_FILE and NT_SIGINFO notes back in 2012
had taken care of native (fs/binfmt_elf.c) and compat (fs/compat_binfmt_elf.c)
coredumps; unfortunately, compat on mips (which does not go through the
usual compat_binfmt_elf.c) had not been noticed.
As the result, both N32 and O32 coredumps on 64bit mips kernels
have those sections malformed enough to confuse the living hell out of
all gdb and readelf versions (up to and including the tip of binutils-gdb.git).
Longer term solution is to make both O32 and N32 compat use the
regular compat_binfmt_elf.c, but that's too much for backports. The minimal
solution is to do in arch/mips/kernel/binfmt_elf[on]32.c the same thing
those patches have done in fs/compat_binfmt_elf.c
Cc: stable@kernel.org # v3.7+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
The compressed payload is not necesarily 4-byte aligned, at least when
compiling with Clang. In that case, the 4-byte value appended to the
compressed payload that corresponds to the uncompressed kernel image
size must be read using get_unaligned_le32().
This fixes Clang-built kernels not booting on MIPS (tested on a Ingenic
JZ4770 board).
Fixes: b8f54f2cde ("MIPS: ZBOOT: copy appended dtb to the end of the kernel")
Cc: <stable@vger.kernel.org> # v4.7
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Alex Elder says:
====================
net: ipa: fix some new build warnings
I got a super friendly message from the Intel kernel test robot that
pointed out that two patches I posted last week caused new build
warnings. I already had these problems fixed in my own tree but
the fix was not included in what I sent out last week.
====================
Link: https://lore.kernel.org/r/20201226213737.338928-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Callers of evt_ring_command() no longer care whether the command
times out, and don't use what evt_ring_command() returns. Redefine
that function to have void return type.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 428b448ee7 ("net: ipa: use state to determine event ring command success")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Callers of gsi_channel_command() no longer care whether the command
times out, and don't use what gsi_channel_command() returns. Redefine
that function to have void return type.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 6ffddf3b3d ("net: ipa: use state to determine channel command success")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
TQM rings are hardware resources that require host context memory
managed by the driver. The driver supports up to 9 TQM rings and
the number of rings to use is requested by firmware during run-time.
Cap this number to the maximum supported to prevent accessing beyond
the array. Future firmware may request more than 9 TQM rings. Define
macros to remove the magic number 9 from the C code.
Fixes: ac3158cb01 ("bnxt_en: Allocate TQM ring context memory according to fw specification.")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A recent change skips sending firmware messages to the firmware when
pci_channel_offline() is true during fatal AER error. To make this
complete, we need to move the re-initialization sequence to
bnxt_io_resume(), otherwise the firmware messages to re-initialize
will all be skipped. In any case, it is more correct to re-initialize
in bnxt_io_resume().
Also, fix the reverse x-mas tree format when defining variables
in bnxt_io_slot_reset().
Fixes: b340dc680e ("bnxt_en: Avoid sending firmware messages when AER error is detected.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2020-12-23
Commit e086ba2fcc ("e1000e: disable s0ix entry and exit flows for ME
systems") disabled S0ix flows for systems that have various incarnations of
the i219-LM ethernet controller. This was done because of some regressions
caused by an earlier commit 632fbd5eb5 ("e1000e: fix S0ix flows for
cable connected case") with i219-LM controller.
Per discussion with Intel architecture team this direction should be
changed and allow S0ix flows to be used by default. This patch series
includes directional changes for their conclusions in
https://lkml.org/lkml/2020/12/13/15.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
e1000e: Export S0ix flags to ethtool
Revert "e1000e: disable s0ix entry and exit flows for ME systems"
e1000e: bump up timeout to wait when ME un-configures ULP mode
e1000e: Only run S0ix flows if shutdown succeeded
====================
Link: https://lore.kernel.org/r/20201223233625.92519-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently the tun_napi_alloc_frags() function returns -ENOMEM when the
number of iovs exceeds MAX_SKB_FRAGS + 1. However this is inappropriate,
we should use -EMSGSIZE instead of -ENOMEM.
The following distinctions are matters:
1. the caller need to drop the bad packet when -EMSGSIZE is returned,
which means meeting a persistent failure.
2. the caller can try again when -ENOMEM is returned, which means
meeting a transient failure.
Fixes: 90e33d4594 ("tun: enable napi_gro_frags() for TUN/TAP driver")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/1608864736-24332-1-git-send-email-wangyunjian@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The CPTS driver registers PTP PHC clock when first netif is going up and
unregister it when all netif are down. Now ethtool will show:
- PTP PHC clock index 0 after boot until first netif is up;
- the last assigned PTP PHC clock index even if PTP PHC clock is not
registered any more after all netifs are down.
This patch ensures that -1 is returned by ethtool when PTP PHC clock is not
registered any more.
Fixes: 8a2c9a5ab4 ("net: ethernet: ti: cpts: rework initialization/deinitialization")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20201224162405.28032-1-grygorii.strashko@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Antoine Tenart says:
====================
net-sysfs: fix race conditions in the xps code
This series fixes race conditions in the xps code, where out of bound
accesses can occur when dev->num_tc is updated, triggering oops. The
root cause is linked to locking issues. An explanation is given in each
of the commit logs.
We had a discussion on the v1 of this series about using the xps_map
mutex instead of the rtnl lock. While that seemed a better compromise,
v2 showed the added complexity wasn't best for fixes. So we decided to
go back to v1 and use the rtnl lock.
Because of this, the only differences between v1 and v3 are improvements
in the commit messages.
====================
Link: https://lore.kernel.org/r/20201223212323.3603139-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Accesses to dev->xps_rxqs_map (when using dev->num_tc) should be
protected by the rtnl lock, like we do for netif_set_xps_queue. I didn't
see an actual bug being triggered, but let's be safe here and take the
rtnl lock while accessing the map in sysfs.
Fixes: 8af2c06ff4 ("net-sysfs: Add interface for Rx queue(s) map per Tx queue")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Two race conditions can be triggered when storing xps rxqs, resulting in
various oops and invalid memory accesses:
1. Calling netdev_set_num_tc while netif_set_xps_queue:
- netif_set_xps_queue uses dev->tc_num as one of the parameters to
compute the size of new_dev_maps when allocating it. dev->tc_num is
also used to access the map, and the compiler may generate code to
retrieve this field multiple times in the function.
- netdev_set_num_tc sets dev->tc_num.
If new_dev_maps is allocated using dev->tc_num and then dev->tc_num
is set to a higher value through netdev_set_num_tc, later accesses to
new_dev_maps in netif_set_xps_queue could lead to accessing memory
outside of new_dev_maps; triggering an oops.
2. Calling netif_set_xps_queue while netdev_set_num_tc is running:
2.1. netdev_set_num_tc starts by resetting the xps queues,
dev->tc_num isn't updated yet.
2.2. netif_set_xps_queue is called, setting up the map with the
*old* dev->num_tc.
2.3. netdev_set_num_tc updates dev->tc_num.
2.4. Later accesses to the map lead to out of bound accesses and
oops.
A similar issue can be found with netdev_reset_tc.
One way of triggering this is to set an iface up (for which the driver
uses netdev_set_num_tc in the open path, such as bnx2x) and writing to
xps_rxqs in a concurrent thread. With the right timing an oops is
triggered.
Both issues have the same fix: netif_set_xps_queue, netdev_set_num_tc
and netdev_reset_tc should be mutually exclusive. We do that by taking
the rtnl lock in xps_rxqs_store.
Fixes: 8af2c06ff4 ("net-sysfs: Add interface for Rx queue(s) map per Tx queue")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Accesses to dev->xps_cpus_map (when using dev->num_tc) should be
protected by the rtnl lock, like we do for netif_set_xps_queue. I didn't
see an actual bug being triggered, but let's be safe here and take the
rtnl lock while accessing the map in sysfs.
Fixes: 184c449f91 ("net: Add support for XPS with QoS via traffic classes")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Two race conditions can be triggered when storing xps cpus, resulting in
various oops and invalid memory accesses:
1. Calling netdev_set_num_tc while netif_set_xps_queue:
- netif_set_xps_queue uses dev->tc_num as one of the parameters to
compute the size of new_dev_maps when allocating it. dev->tc_num is
also used to access the map, and the compiler may generate code to
retrieve this field multiple times in the function.
- netdev_set_num_tc sets dev->tc_num.
If new_dev_maps is allocated using dev->tc_num and then dev->tc_num
is set to a higher value through netdev_set_num_tc, later accesses to
new_dev_maps in netif_set_xps_queue could lead to accessing memory
outside of new_dev_maps; triggering an oops.
2. Calling netif_set_xps_queue while netdev_set_num_tc is running:
2.1. netdev_set_num_tc starts by resetting the xps queues,
dev->tc_num isn't updated yet.
2.2. netif_set_xps_queue is called, setting up the map with the
*old* dev->num_tc.
2.3. netdev_set_num_tc updates dev->tc_num.
2.4. Later accesses to the map lead to out of bound accesses and
oops.
A similar issue can be found with netdev_reset_tc.
One way of triggering this is to set an iface up (for which the driver
uses netdev_set_num_tc in the open path, such as bnx2x) and writing to
xps_cpus in a concurrent thread. With the right timing an oops is
triggered.
Both issues have the same fix: netif_set_xps_queue, netdev_set_num_tc
and netdev_reset_tc should be mutually exclusive. We do that by taking
the rtnl lock in xps_cpus_store.
Fixes: 184c449f91 ("net: Add support for XPS with QoS via traffic classes")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The cdc_ncm driver passes network connection notifications up to
usbnet_link_change(), which is the right place for any logging.
Remove the netdev_info() duplicating this from the driver itself.
This stops devices such as my "TRENDnet USB 10/100/1G/2.5G LAN"
(ID 20f4:e02b) adapter from spamming the kernel log with
cdc_ncm 2-2:2.0 enp0s2u2c2: network connection: connected
messages every 60 msec or so.
Signed-off-by: Roland Dreier <roland@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20201224032116.2453938-1-roland@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit c685af1108 ("serial: mvebu-uart: fix tx lost characters") fixed tx
lost characters at low baud rates but started causing tx lost characters
when kernel is going to power off or reboot.
TX_EMP tells us when transmit queue is empty therefore all characters were
transmitted. TX_RDY tells us when CPU can send a new character.
Therefore we need to use different check prior transmitting new character
and different check after all characters were sent.
This patch splits polling code into two functions: wait_for_xmitr() which
waits for TX_RDY and wait_for_xmite() which waits for TX_EMP.
When rebooting A3720 platform without this patch on UART is print only:
[ 42.699�
And with this patch on UART is full output:
[ 39.530216] reboot: Restarting system
Fixes: c685af1108 ("serial: mvebu-uart: fix tx lost characters")
Signed-off-by: Pali Rohár <pali@kernel.org>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20201223191931.18343-1-pali@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With commit 913e4a90b6 ("usb: gadget: f_uac2: finalize wMaxPacketSize according to bandwidth")
wMaxPacketSize is computed dynamically but the value is never reset.
Because of this, the actual maximum packet size can only decrease each time
the audio gadget is instantiated.
Reset the endpoint maximum packet size and mark wMaxPacketSize as dynamic
to solve the problem.
Fixes: 913e4a90b6 ("usb: gadget: f_uac2: finalize wMaxPacketSize according to bandwidth")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20201221173531.215169-2-jbrunet@baylibre.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
First of all the commit e0082698b6 ("usb: dwc3: ulpi: conditionally
resume ULPI PHY") introduced the Suspend USB2.0 HS/FS/LS PHY regression,
as by design of the fix any attempt to read/write from/to the PHY control
registers will completely disable the PHY suspension, which consequently
will increase the USB bus power consumption. Secondly the fix won't work
well for the very first attempt of the ULPI PHY control registers IO,
because after disabling the USB2.0 PHY suspension functionality it will
still take some time for the bus to resume from the sleep state if one has
been reached before it. So the very first PHY register read/write
operation will take more time than the busy-loop provides and the IO
timeout error might be returned anyway.
Here we suggest to fix the denoted problems in the following way. First of
all let's not disable the Suspend USB2.0 HS/FS/LS PHY functionality so to
make the controller and the USB2.0 bus more power efficient. Secondly
instead of that we'll extend the PHY IO op wait procedure with 1 - 1.2 ms
sleep if the PHY suspension is enabled (1ms should be enough as by LPM
specification it is at most how long it takes for the USB2.0 bus to resume
from L1 (Sleep) state). Finally in case if the USB2.0 PHY suspension
functionality has been disabled on the DWC USB3 controller setup procedure
we'll compensate the USB bus resume process latency by extending the
busy-loop attempts counter.
Fixes: e0082698b6 ("usb: dwc3: ulpi: conditionally resume ULPI PHY")
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Link: https://lore.kernel.org/r/20201210085008.13264-4-Sergey.Semin@baikalelectronics.ru
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Originally the procedure of the ULPI transaction finish detection has been
developed as a simple busy-loop with just decrementing counter and no
delays. It's wrong since on different systems the loop will take a
different time to complete. So if the system bus and CPU are fast enough
to overtake the ULPI bus and the companion PHY reaction, then we'll get to
take a false timeout error. Fix this by converting the busy-loop procedure
to take the standard bus speed, address value and the registers access
mode into account for the busy-loop delay calculation.
Here is the way the fix works. It's known that the ULPI bus is clocked
with 60MHz signal. In accordance with [1] the ULPI bus protocol is created
so to spend 5 and 6 clock periods for immediate register write and read
operations respectively, and 6 and 7 clock periods - for the extended
register writes and reads. Based on that we can easily pre-calculate the
time which will be needed for the controller to perform a requested IO
operation. Note we'll still preserve the attempts counter in case if the
DWC USB3 controller has got some internals delays.
[1] UTMI+ Low Pin Interface (ULPI) Specification, Revision 1.1,
October 20, 2004, pp. 30 - 36.
Fixes: 88bc9d194f ("usb: dwc3: add ULPI interface support")
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Link: https://lore.kernel.org/r/20201210085008.13264-3-Sergey.Semin@baikalelectronics.ru
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In accordance with [1] the DWC_usb3 core sets the GUSB2PHYACCn.VStsDone
bit when the PHY vendor control access is done and clears it when the
application initiates a new transaction. The doc doesn't say anything
about the GUSB2PHYACCn.VStsBsy flag serving for the same purpose. Moreover
we've discovered that the VStsBsy flag can be cleared before the VStsDone
bit. So using the former as a signal of the PHY control registers
completion might be dangerous. Let's have the VStsDone flag utilized
instead then.
[1] Synopsys DesignWare Cores SuperSpeed USB 3.0 xHCI Host Controller
Databook, 2.70a, December 2013, p.388
Fixes: 88bc9d194f ("usb: dwc3: add ULPI interface support")
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Link: https://lore.kernel.org/r/20201210085008.13264-2-Sergey.Semin@baikalelectronics.ru
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make sure to always cancel the control URB in write() so that it can be
reused after a timeout or spurious CMD_ACK.
Currently any further write requests after a timeout would fail after
triggering a WARN() in usb_submit_urb() when attempting to submit the
already active URB.
Reported-by: syzbot+e87ebe0f7913f71f2ea5@syzkaller.appspotmail.com
Fixes: 6bc235a2e2 ("USB: add driver for Meywa-Denki & Kayac YUREX")
Cc: stable <stable@vger.kernel.org> # 2.6.37
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the MTU size issue with RX packet size as the host sends the packet
with extra bytes containing ethernet header. This causes failure when
user sets the MTU size to the maximum i.e. 15412. In this case the
ethernet packet received will be of length 15412 plus the ethernet header
length. This patch fixes the issue where there is a check that RX packet
length must not be more than max packet length.
Fixes: bba787a860 ("usb: gadget: ether: Allow jumbo frames")
Signed-off-by: Manish Narani <manish.narani@xilinx.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/1605597215-122027-1-git-send-email-manish.narani@xilinx.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a spinlock lockup as part of composite_disconnect
when it tries to acquire cdev->lock as part of usb_gadget_deactivate.
This is because the usb_gadget_deactivate is called from
usb_function_deactivate with the same spinlock held.
This would result in the below call stack and leads to stall.
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 3-...0: (1 GPs behind) idle=162/1/0x4000000000000000
softirq=10819/10819 fqs=2356
(detected by 2, t=5252 jiffies, g=20129, q=3770)
Task dump for CPU 3:
task:uvc-gadget_wlhe state:R running task stack: 0 pid: 674 ppid:
636 flags:0x00000202
Call trace:
__switch_to+0xc0/0x170
_raw_spin_lock_irqsave+0x84/0xb0
composite_disconnect+0x28/0x78
configfs_composite_disconnect+0x68/0x70
usb_gadget_disconnect+0x10c/0x128
usb_gadget_deactivate+0xd4/0x108
usb_function_deactivate+0x6c/0x80
uvc_function_disconnect+0x20/0x58
uvc_v4l2_release+0x30/0x88
v4l2_release+0xbc/0xf0
__fput+0x7c/0x230
____fput+0x14/0x20
task_work_run+0x88/0x140
do_notify_resume+0x240/0x6f0
work_pending+0x8/0x200
Fix this by doing an unlock on cdev->lock before the usb_gadget_deactivate
call from usb_function_deactivate.
The same lockup can happen in the usb_gadget_activate path. Fix that path
as well.
Reported-by: Peter Chen <peter.chen@nxp.com>
Link: https://lore.kernel.org/linux-usb/20201102094936.GA29581@b29397-desktop/
Tested-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Sriharsha Allenki <sallenki@codeaurora.org>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20201202130220.24926-1-sallenki@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The commit 0472bf06c6 ("xhci: Prevent U1/U2 link pm states if exit
latency is too long") was constraining the xhci code not to allow U1/U2
sleep states if the latency to wake up from the U-states reached the
service interval of an periodic endpoint. This fix was not taking into
account that in case the quirk XHCI_INTEL_HOST is set, the wakeup time
will be calculated and configured differently.
It checks for u1_params.mel/u2_params.mel as a limit. But the code could
decide to write another MEL into the hardware. This leads to broken
cases where not enough bandwidth is available for other devices:
usb 1-2: can't set config #1, error -28
This patch is fixing that case by checking for timeout_ns after the
wakeup time was calculated depending on the quirks.
Fixes: 0472bf06c6 ("xhci: Prevent U1/U2 link pm states if exit latency is too long")
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20201215193147.11738-1-m.grzeschik@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Warm reboot scenarios some times type C Mux driver gets Mux configuration
request as HPD=1,IRQ=1. In that scenario typeC Mux driver need to configure
Mux as follows as per IOM requirement:
(1). Confgiure Mux HPD = 1, IRQ = 0
(2). Configure Mux with HPD = 1, IRQ = 1
IOM expects TypeC Mux configuration as follows:
(1). HPD=1, IRQ=0
(2). HPD=1, IRQ=1
if IOM gets mux config request (2) without configuring (1), it will ignore
the request. The impact of this is there is no DP_alt mode display.
Fixes: 43d596e322 ("usb: typec: intel_pmc_mux: Check the port status before connect")
Cc: stable@vger.kernel.org
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Madhusudanarao Amara <madhusudanarao.amara@intel.com>
Link: https://lore.kernel.org/r/20201216140918.49197-1-madhusudanarao.amara@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
syzbot is reporting UAF at usb_submit_urb() [1], for
service_outstanding_interrupt() is not checking WDM_DISCONNECTING
before calling usb_submit_urb(). Close the race by doing same checks
wdm_read() does upon retry.
Also, while wdm_read() checks WDM_DISCONNECTING with desc->rlock held,
service_interrupt_work() does not hold desc->rlock. Thus, it is possible
that usb_submit_urb() is called from service_outstanding_interrupt() from
service_interrupt_work() after WDM_DISCONNECTING was set and kill_urbs()
from wdm_disconnect() completed. Thus, move kill_urbs() in
wdm_disconnect() to after cancel_work_sync() (which makes sure that
service_interrupt_work() is no longer running) completed.
Although it seems to be safe to dereference desc->intf->dev in
service_outstanding_interrupt() even if WDM_DISCONNECTING was already set
because desc->rlock or cancel_work_sync() prevents wdm_disconnect() from
reaching list_del() before service_outstanding_interrupt() completes,
let's not emit error message if WDM_DISCONNECTING is set by
wdm_disconnect() while usb_submit_urb() is in progress.
[1] https://syzkaller.appspot.com/bug?extid=9e04e2df4a32fb661daf
Reported-by: syzbot <syzbot+9e04e2df4a32fb661daf@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/620e2ee0-b9a3-dbda-a25b-a93e0ed03ec5@i-love.sakura.ne.jp
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we try to play and capture simultaneously we see that
interrupts are genrated but our handler is not being acknowledged,
After investigating further more in detail on this issue we found
that IRQ delivery via MSI from the ACP IP is unreliable and so sometimes
interrupt generated will not be acknowledged so MSI model shouldn't be used
and using legacy IRQs will resolve interrupt handling issue.
This patch replaces MSI interrupt handling with legacy IRQ model.
Issue can be reproduced easily by running below python script:
import subprocess
import time
import threading
def do2():
cmd = 'aplay -f dat -D hw:2,1 /dev/zero -d 1'
subprocess.call(cmd, stdin=subprocess.PIPE,
stderr=subprocess.PIPE, shell=True)
print('Play Done')
def run():
for i in range(1000):
do2()
def do(i):
cmd = 'arecord -f dat -D hw:2,2 /dev/null -d 1'
subprocess.call(cmd, stdout=subprocess.PIPE,
stderr=subprocess.PIPE, shell=True)
print(datetime.datetime.now(), i)
t = threading.Thread(target=run)
t.start()
for i in range(1000):
do(i)
t.join()
After applying this patch issue is resolved.
Signed-off-by: Ravulapati Vishnu vardhan rao <Vishnuvardhanrao.Ravulapati@amd.com>
Link: https://lore.kernel.org/r/20201222115929.11222-1-Vishnuvardhanrao.Ravulapati@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
When CONFIG_COMPILE_TEST is set, it is possible to build some
of the interconnect drivers into the kernel while their dependencies
are loadable modules, which is bad:
arm-linux-gnueabi-ld: drivers/interconnect/qcom/bcm-voter.o: in function `qcom_icc_bcm_voter_commit':
(.text+0x1f8): undefined reference to `rpmh_invalidate'
arm-linux-gnueabi-ld: (.text+0x20c): undefined reference to `rpmh_write_batch'
arm-linux-gnueabi-ld: (.text+0x2b0): undefined reference to `rpmh_write_batch'
arm-linux-gnueabi-ld: (.text+0x2e8): undefined reference to `rpmh_write_batch'
arm-linux-gnueabi-ld: drivers/interconnect/qcom/icc-rpmh.o: in function `qcom_icc_bcm_init':
(.text+0x2ac): undefined reference to `cmd_db_read_addr'
arm-linux-gnueabi-ld: (.text+0x2c8): undefined reference to `cmd_db_read_aux_data'
The exact dependencies are a bit complicated, so split them out into a
hidden Kconfig symbol that all drivers can in turn depend on to get it
right.
Fixes: 976daac4a1 ("interconnect: qcom: Consolidate interconnect RPMh support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20201204165030.3747484-1-arnd@kernel.org
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
Commit
121b32a58a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments")
converted native x86-32 which take 64-bit arguments to use the
compat handlers to allow conversion to passing args via pt_regs.
sys_fanotify_mark() was however missed, as it has a general compat
handler. Add a config option that will use the syscall wrapper that
takes the split args for native 32-bit.
[ bp: Fix typo in Kconfig help text. ]
Fixes: 121b32a58a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments")
Reported-by: Paweł Jasiak <pawel@jasiak.xyz>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20201130223059.101286-1-brgerst@gmail.com
The set flag NFT_SET_EXPR provides a hint to the kernel that userspace
supports for multiple expressions per set element. In the same
direction, NFT_DYNSET_F_EXPR specifies that dynset expression defines
multiple expressions per set element.
This allows new userspace software with old kernels to bail out with
EOPNOTSUPP. This update is similar to ef516e8625 ("netfilter:
nf_tables: reintroduce the NFT_SET_CONCAT flag"). The NFT_SET_EXPR flag
needs to be set on when the NFTA_SET_EXPRESSIONS attribute is specified.
The NFT_SET_EXPR flag is not set on with NFTA_SET_EXPR to retain
backward compatibility in old userspace binaries.
Fixes: 48b0ae046e ("netfilter: nftables: netlink support for several set element expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If userspace requests a feature which is not available the original set
definition, then bail out with EOPNOTSUPP. If userspace sends
unsupported dynset flags (new feature not supported by this kernel),
then report EOPNOTSUPP to userspace. EINVAL should be only used to
report malformed netlink messages from userspace.
Fixes: 22fe54d5fe ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We need to make sure our device is idle when rebooting a virtual
machine. This is done in the driver level.
The firmware will later handle FLR but we want to be extra safe and
stop the devices until the FLR is handled.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Up until now validation errors were counted in the parsing field
of the cs_counters struct, so we added a new counter and increased
it when needed.
In addition, there were some locations where only one of the counters
was updated (ctx or aggregate) so add the second one to be updated
as well.
Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
When the firmware security is enabled, the pcie_aux_dbi_reg_addr
register in the PCI controller is blocked. Therefore, ignore
the result of writing to this register and assume it worked. Also
remove the prints on errors in the internal ELBI write function.
If the security is enabled, the firmware is responsible for setting
this register correctly so we won't have any problem.
If the security is disabled, the write will work (unless something
is totally broken at the PCI level and then the whole sequence
will fail).
In addition, remove a write to register pcie_aux_dbi_reg_addr+4,
which was never actually needed.
Moreover, PCIE_DBI registers are blocked to access from host when
firmware security is enabled. Use a different register to flush the
writes.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Driver must fetch FW hard reset capability at every FW boot stage:
preboot, CPU boot, CPU application.
If hard reset is triggered, driver will take into consideration
only the last capability received.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In case the clock gating was enabled in preboot we need to disable it
at the H/W initialization stage before touching the MME/TPC registers.
Otherwise, the ASIC can get stuck. If the security is enabled in
the firmware level, the CGM is always disabled and the driver can't
enable it.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
hw_queues_mirror was renamed to cs_mirror, so revise accordingly a
comment that refers to this list.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
We don't need to set EB on signal packets from collective slave
queues as it degrades performance. Because the slaves are the network
queues, the engine barrier doesn't actually guarantee that the
packet has been sent.
Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
FW hard reset capability indication is now moved to preboot stage.
Driver will check if HW is dirty only after it validated preboot
is up. If HW is dirty, driver will perform a hard reset according
to the FW capability.
In addition, FW defines a new message which driver need to send in
order to initiate a hard reset.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
As we only fetch the CPU_PLL frequency in gaudi, we don't need a
generic get_pll_frequency function which takes a pll index as input
Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
When the F/W security is enabled, goya needs to fetch the PSOC pll
frequency through a dedicated interface
Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
dist->ready setting is pointlessly spread across the two vgic
backends, while it could be consolidated in kvm_vgic_map_resources().
Move it there, and slightly simplify the flows in both backends.
Suggested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
KVM_ARM_VCPU_INIT ioctl calls kvm_reset_vcpu(), which in turn resets the
PMU with a call to kvm_pmu_vcpu_reset(). The function zeroes the PMU
chained counters bitmap and stops all the counters with a perf event
attached. Because it is called before the VCPU has had the chance to run,
no perf events are in use and none are released.
kvm_arm_pmu_v3_enable(), called by kvm_vcpu_first_run_init() only if the
VCPU has been initialized, also resets the PMU. kvm_pmu_vcpu_reset() in
this case does the exact same thing as the previous call, so remove it.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201201150157.223625-6-alexandru.elisei@arm.com
This patch enables the IOTLB API support for vhost-vsock devices,
allowing the userspace to emulate an IOMMU for the guest.
These changes were made following vhost-net, in details this patch:
- exposes VIRTIO_F_ACCESS_PLATFORM feature and inits the iotlb
device if the feature is acked
- implements VHOST_GET_BACKEND_FEATURES and
VHOST_SET_BACKEND_FEATURES ioctls
- calls vq_meta_prefetch() before vq processing to prefetch vq
metadata address in IOTLB
- provides .read_iter, .write_iter, and .poll callbacks for the
chardev; they are used by the userspace to exchange IOTLB messages
This patch was tested specifying "intel_iommu=strict" in the guest
kernel command line. I used QEMU with a patch applied [1] to fix a
simple issue (that patch was merged in QEMU v5.2.0):
$ qemu -M q35,accel=kvm,kernel-irqchip=split \
-drive file=fedora.qcow2,format=qcow2,if=virtio \
-device intel-iommu,intremap=on,device-iotlb=on \
-device vhost-vsock-pci,guest-cid=3,iommu_platform=on,ats=on
[1] https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg09077.html
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201223143638.123417-1-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
20b6cc34ea ("bpf: Avoid hashtab deadlock with map_locked") introduced
a possibility of getting EBUSY error on lock contention, which seems to happen
very deterministically in test_maps when running 1024 threads on low-CPU
machine. In libbpf CI case, it's a 2 CPU VM and it's hitting this 100% of the
time. Work around by retrying on EBUSY (and EAGAIN, while we are at it) after
a small sleep. sched_yield() is too agressive and fails even after 20 retries,
so I went with usleep(1) for backoff.
Also log actual error returned to make it easier to see what's going on.
Fixes: 20b6cc34ea ("bpf: Avoid hashtab deadlock with map_locked")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201223200652.3417075-1-andrii@kernel.org
This flag can be used by an end user to disable S0ix flows on a
buggy system or by an OEM for development purposes.
If you need this flag to be persisted across reboots, it's suggested
to use a udev rule to call adjust it until the kernel could have your
configuration in a disallow list.
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Yijun Shen <Yijun.shen@dell.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
commit e086ba2fcc ("e1000e: disable s0ix entry and exit flows for ME
systems") disabled s0ix flows for systems that have various incarnations of
the i219-LM ethernet controller. This changed caused power consumption
regressions on the following shipping Dell Comet Lake based laptops:
* Latitude 5310
* Latitude 5410
* Latitude 5410
* Latitude 5510
* Precision 3550
* Latitude 5411
* Latitude 5511
* Precision 3551
* Precision 7550
* Precision 7750
This commit was introduced because of some regressions on certain Thinkpad
laptops. This comment was potentially caused by an earlier
commit 632fbd5eb5 ("e1000e: fix S0ix flows for cable connected case").
or it was possibly caused by a system not meeting platform architectural
requirements for low power consumption. Other changes made in the driver
with extended timeouts are expected to make the driver more impervious to
platform firmware behavior.
Fixes: e086ba2fcc ("e1000e: disable s0ix entry and exit flows for ME systems")
Reviewed-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Yijun Shen <Yijun.shen@dell.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Commit f9c6cea0b3 ("ibmvnic: Skip fatal error reset after passive init")
says "If the passive
CRQ initialization occurs before the FATAL reset task is processed,
the FATAL error reset task would try to access a CRQ message queue
that was freed, causing an oops. The problem may be most likely to
occur during DLPAR add vNIC with a non-default MTU, because the DLPAR
process will automatically issue a change MTU request.
Fix this by not processing fatal error reset if CRQ is passively
initialized after client-driven CRQ initialization fails."
The original commit skips a specific reset condition, but that does
not fix the problem it claims to fix, and misses a reset condition.
The effective fix is commit 0e435befae ("ibmvnic: fix NULL pointer
dereference in ibmvic_reset_crq") and commit a0faaa27c7 ("ibmvnic:
fix NULL pointer dereference in reset_sub_crq_queues"). With above
two fixes, there are no more crashes seen as described even without
the original commit, so I would like to revert the original commit.
Fixes: f9c6cea0b3 ("ibmvnic: Skip fatal error reset after passive init")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/20201223204904.12677-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
DCB uses the same handler function for both RTM_GETDCB and RTM_SETDCB
messages. dcb_doit() bounces RTM_SETDCB mesasges if the user does not have
the CAP_NET_ADMIN capability.
However, the operation to be performed is not decided from the DCB message
type, but from the DCB command. Thus DCB_CMD_*_GET commands are used for
reading DCB objects, the corresponding SET and DEL commands are used for
manipulation.
The assumption is that set-like commands will be sent via an RTM_SETDCB
message, and get-like ones via RTM_GETDCB. However, this assumption is not
enforced.
It is therefore possible to manipulate DCB objects without CAP_NET_ADMIN
capability by sending the corresponding command in an RTM_GETDCB message.
That is a bug. Fix it by validating the type of the request message against
the type used for the response.
Fixes: 2f90b8657e ("ixgbe: this patch adds support for DCB to the kernel and ixgbe driver")
Signed-off-by: Petr Machata <me@pmachata.org>
Link: https://lore.kernel.org/r/a2a9b88418f3a58ef211b718f2970128ef9e3793.1608673640.git.me@pmachata.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder says:
====================
net: ipa: GSI interrupt handling fixes
This series implements fixes for some issues related to handling
interrupts when GSI channel and event ring commands complete.
The first issue is that the completion condition for an event ring
or channel command could occur while the associated interrupt is
disabled. This would cause the interrupt to fire when it is
subsequently enabled, even if the condition it signals had already
been handled. The fix is to clear any pending interrupt conditions
before re-enabling the interrupt.
The second and third patches change how the success of an event ring
or channel command is determined. These commands change the state
of an event ring or channel. Previously the receipt of a completion
interrupt was required to consider a command successful. Instead, a
command is successful if it changes the state of the target event
ring or channel in the way expected. This way the command can
succeed even if the completion interrupt did not arrive while it was
enabled.
====================
Link: https://lore.kernel.org/r/20201222180012.22489-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch implements the same basic fix for event rings as the
previous one does for channels.
The result of issuing an event ring control command should be that
the event ring changes state. If enabled, a completion interrupt
signals that the event ring state has changed. This interrupt is
enabled by gsi_evt_ring_command() and disabled again after the
command has completed (or we time out).
There is a window of time during which the command could complete
successfully without interrupting. This would cause the event ring
to transition to the desired new state.
So whether a event ring command ends via completion interrupt or
timeout, we can consider the command successful if the event ring
has entered the desired state (and a failure if it has not,
regardless of the cause).
Fixes: b4175f8731 ("net: ipa: only enable GSI event control IRQs when needed")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The result of issuing a channel control command should be that the
channel changes state. If enabled, a completion interrupt signals
that the channel state has changed. This interrupt is enabled by
gsi_channel_command() and disabled again after the command has
completed (or we time out).
There is a window of time--after the completion interrupt is disabled
but before the channel state is read--during which the command could
complete successfully without interrupting. This would cause the
channel to transition to the desired new state.
So whether a channel command ends via completion interrupt or
timeout, we can consider the command successful if the channel
has entered the desired state (and a failure if it has not,
regardless of the cause).
Fixes: d6c9e3f506 ("net: ipa: only enable generic command completion IRQ when needed");
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We enable the completion interrupt for channel or event ring
commands only when we issue them. The interrupt is disabled after
the interrupt has fired, or after we have timed out waiting for it.
If we time out, the command could complete after the interrupt has
been disabled, causing a state change in the channel or event ring.
The interrupt associated with that state change would be delivered
the next time the completion interrupt is enabled.
To avoid previous command completions interfering with new commands,
clear all pending completion interrupts before re-enabling them for
a new command.
Fixes: b4175f8731 ("net: ipa: only enable GSI event control IRQs when needed")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the core clock rate and interconnect bandwidth specifications
were moved into configuration data, a copy/paste bug was introduced,
causing the memory interconnect bandwidth to be set three times
rather than enabling the three different interconnects.
Fix this bug.
Fixes: 91d02f9551 ("net: ipa: use config data for clocking")
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Georgi Djakov <georgi.djakov@linaro.org>
Link: https://lore.kernel.org/r/20201222151613.5730-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
virtnet_set_channels can recursively call cpus_read_lock if CONFIG_XPS
and CONFIG_HOTPLUG are enabled.
The path is:
virtnet_set_channels - calls get_online_cpus(), which is a trivial
wrapper around cpus_read_lock()
netif_set_real_num_tx_queues
netif_reset_xps_queues_gt
netif_reset_xps_queues - calls cpus_read_lock()
This call chain and potential deadlock happens when the number of TX
queues is reduced.
This commit the removes netif_set_real_num_[tr]x_queues calls from
inside the get/put_online_cpus section, as they don't require that it
be held.
Fixes: 47be24796c ("virtio-net: fix the set affinity bug when CPU IDs are not consecutive")
Signed-off-by: Jeff Dike <jdike@akamai.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20201223025421.671-1-jdike@akamai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
kvm_vgic_map_resources() is called when a VCPU if first run and it maps all
the VGIC MMIO regions. To prevent double-initialization, the VGIC uses the
ready variable to keep track of the state of resources and the global KVM
mutex to protect against concurrent accesses. After the lock is taken, the
variable is checked again in case another VCPU took the lock between the
current VCPU reading ready equals false and taking the lock.
The double-checked lock pattern is spread across four different functions:
in kvm_vcpu_first_run_init(), in kvm_vgic_map_resource() and in
vgic_{v2,v3}_map_resources(), which makes it hard to reason about and
introduces minor code duplication. Consolidate the checks in
kvm_vgic_map_resources(), where the lock is taken.
No functional change intended.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201201150157.223625-4-alexandru.elisei@arm.com
kvm_timer_enable() is called in kvm_vcpu_first_run_init() after
kvm_vgic_map_resources() if the VGIC wasn't ready. kvm_vgic_map_resources()
is the only place where kvm->arch.vgic.ready is set to true.
For a v2 VGIC, kvm_vgic_map_resources() will attempt to initialize the VGIC
and set the initialized flag.
For a v3 VGIC, kvm_vgic_map_resources() will return an error code if the
VGIC isn't already initialized.
The end result is that if we've reached kvm_timer_enable(), the VGIC is
initialzed and ready and vgic_initialized() will always be true, so remove
this check.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
[maz: added comment about vgic initialisation, as suggested by Eric]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201201150157.223625-3-alexandru.elisei@arm.com
The API documentation states that general error codes are not detailed, but
errors with specific meanings are. On arm64, KVM_RUN can return error
numbers with a different meaning than what is described by POSIX or the C99
standard (as taken from man 3 errno).
Absent from the newly documented error codes is ERANGE which can be
returned when making a change to the EL2 stage 1 tables if the address is
larger than the largest supported input address. Assuming no bugs in the
implementation, that is not possible because the input addresses which are
mapped are the result of applying the macro kern_hyp_va() on kernel virtual
addresses.
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201201150157.223625-2-alexandru.elisei@arm.com
Cannot adjust speaker's volume on Lenovo C940.
Applying the alc298_fixup_speaker_volume function can fix the issue.
[ Additional note: C940 has I2S amp for the speaker and this needs the
same initialization as Dell machines.
The patch was slightly modified so that the quirk entry is moved
next to the corresponding Dell quirk entry. -- tiwai ]
Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/ea25b4e5c468491aa2e9d6cb1f2fced3@realtek.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Kalle Valo says:
====================
wireless-drivers fixes for v5.11
First set of fixes for v5.11, more fixes than usual this time. For
ath11k we have several fixes for QCA6390 PCI support and mt76 has
several. Also one build fix for mt76.
mt76
* fix two NULL pointer dereference
* fix build error when CONFIG_MAC80211_MESH is disabled
rtlwifi
* fix use-after-free in firmware handling code
ath11k
* error handling fixes
* fix crash found during connect and disconnect test
* handle HT disable better
* avoid printing qmi memory failure during firmware bootup
* disable ASPM during firmware bootup
* tag 'wireless-drivers-2020-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers:
MAINTAINERS: switch to different email address
mt76: mt7915: fix MESH ifdef block
mt76: mt76s: fix NULL pointer dereference in mt76s_process_tx_queue
mt76: sdio: remove wake logic in mt76s_process_tx_queue
mt76: usb: remove wake logic in mt76u_status_worker
ath11k: pci: disable ASPM L0sLs before downloading firmware
ath11k: qmi: try to allocate a big block of DMA memory first
rtlwifi: rise completion at the last step of firmware callback
mt76: mt76u: fix NULL pointer dereference in mt76u_status_worker
ath11k: Fix ath11k_pci_fix_l1ss()
ath11k: Fix error code in ath11k_core_suspend()
ath11k: start vdev if a bss peer is already created
ath11k: fix crash caused by NULL rx_channel
ath11k: add missing null check on allocated skb
====================
Link: https://lore.kernel.org/r/20201222163727.D4336C433C6@smtp.codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 34f0f4e3f4 ("ibmvnic: Fix login buffer memory leaks") frees
login_rsp_buffer in release_resources() and send_login()
because handle_login_rsp() does not free it.
Commit f3ae59c0c0 ("ibmvnic: store RX and TX subCRQ handle array in
ibmvnic_adapter struct") frees login_rsp_buffer in handle_login_rsp().
It seems unnecessary to free it in release_resources() and send_login().
There are chances that handle_login_rsp returns earlier without freeing
buffers. Double-checking the buffer is harmless since
release_login_buffer and release_login_rsp_buffer will
do nothing if buffer is already freed.
Fixes: f3ae59c0c0 ("ibmvnic: store RX and TX subCRQ handle array in ibmvnic_adapter struct")
Fixes: 34f0f4e3f4 ("ibmvnic: Fix login buffer memory leaks")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/20201219213919.21045-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When searching for inactive maintainers it's useful to filter
out mailing list addresses. Such "maintainers" will obviously
never feature in a "From:" line of an email or a review tag.
Since "L:" entries only provide the address of a mailing list
without a fancy name extend this pattern to "M:" entries.
Alternatively we could reserve M: entries for humans only
and move the fake "maintainers" to L:. While I'd personally
prefer to reserve M: for humans only, I'm not 100% that's
a great choice either, given most L: entries are in fact
open mailing lists with public archives.
Link: https://lore.kernel.org/r/20201219185538.750076-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The dwmac glue registers on Amlogic Meson8b and newer SoCs has two clock
inputs:
- Meson8b and Meson8m2: MPLL2 and MPLL2 (the same parent is wired to
both inputs)
- GXBB, GXL, GXM, AXG, G12A, G12B, SM1: FCLK_DIV2 and MPLL2
All known vendor kernels and u-boots are using the first input only. We
let the common clock framework automatically choose the "right" parent.
For some boards this causes a problem though, specificially with G12A and
newer SoCs. The clock input is used for generating the 125MHz RGMII TX
clock. For the two input clocks this means on G12A:
- FCLK_DIV2: 999999985Hz / 8 = 124999998.125Hz
- MPLL2: 499999993Hz / 4 = 124999998.25Hz
In theory MPLL2 is the "better" clock input because it's gets us 0.125Hz
closer to the requested frequency than FCLK_DIV2. In reality however
there is a resource conflict because MPLL2 is needed to generate some of
the audio clocks. dwmac-meson8b probes first and sets up the clock tree
with MPLL2. This works fine until the audio driver comes and "steals"
the MPLL2 clocks and configures it with it's own rate (294909637Hz). The
common clock framework happily changes the MPLL2 rate but does not
reconfigure our RGMII TX clock tree, which then ends up at 73727409Hz,
which is more than 40% off the requested 125MHz.
Don't use the second clock input for now to force the common clock
framework to always select the first parent. This mimics the behavior
from the vendor driver and fixes the clock resource conflict with the
audio driver on G12A boards. Once the common clock framework can handle
this situation this change can be reverted again.
Fixes: 566e825162 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
Reported-by: Thomas Graichen <thomas.graichen@gmail.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Tested-by: thomas graichen <thomas.graichen@gmail.com>
Link: https://lore.kernel.org/r/20201219135036.3216017-1-martin.blumenstingl@googlemail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Late defconfig changes for omaps for v5.11 merge window
Drop drop unused POWER_AVS option, and enable CONFIG_SPI_GPIO as
a loadable module so gta04 needs it for controlling the td028ttec1
panel.
* tag 'omap-for-v5.11/defconfig-late-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: omap2plus_defconfig: enable SPI GPIO
ARM: omap2plus_defconfig: drop unused POWER_AVS option
ARM: omap2plus_defconfig: Enable TI eQEP counter driver
ARM: omap2plus_defconfig: add CONFIG_AK8975=m and CONFIG_KXCJK1013=m
ARM: omap2plus_defconfig: Enable OMAP3_THERMAL
Link: https://lore.kernel.org/r/pull-1607675790-251347@atomide.com-2
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
When the first file is opened, ext4 samples the mountpoint of the
filesystem in 64 bytes of the super block. It does so using
strlcpy(), this means that the remaining bytes in the super block
string buffer are untouched. If the mount point before had a longer
path than the current one, it can be reconstructed.
Consider the case where the fs was mounted to "/media/johnjdeveloper"
and later to "/". The super block buffer then contains
"/\x00edia/johnjdeveloper".
This case was seen in the wild and caused confusion how the name
of a developer ands up on the super block of a filesystem used
in production...
Fix this by using strncpy() instead of strlcpy(). The superblock
field is defined to be a fixed-size char array, and it is already
marked using __nonstring in fs/ext4/ext4.h. The consumer of the field
in e2fsprogs already assumes that in the case of a 64+ byte mount
path, that s_last_mounted will not be NUL terminated.
Link: https://lore.kernel.org/r/X9ujIOJG/HqMr88R@mit.edu
Reported-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
If journalling is still working at the moment we get to writing error
information to the superblock we cannot write directly to the superblock
as such write could race with journalled update of the superblock and
cause journal checksum failures, writing inconsistent information to the
journal or other problems. We cannot journal the superblock directly
from the error handling functions as we are running in uncertain context
and could deadlock so just punt journalled superblock update to a
workqueue.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201216101844.22917-5-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Protect all superblock modifications (including checksum computation)
with a superblock buffer lock. That way we are sure computed checksum
matches current superblock contents (a mismatch could cause checksum
failures in nojournal mode or if an unjournalled superblock update races
with a journalled one). Also we avoid modifying superblock contents
while it is being written out (which can cause DIF/DIX failures if we
are running in nojournal mode).
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201216101844.22917-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Although there is nothing wrong with the current host PSCI relay
implementation, we can clean it up and remove some of the helpers
that do not improve the overall readability of the legacy PSCI 0.1
handling.
Opportunity is taken to turn the bitmap into a set of booleans,
and creative use of preprocessor macros make init and check
more concise/readable.
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
PSCI driver exposes a struct containing the PSCI v0.1 function IDs
configured in the DT. However, the struct does not convey the
information whether these were set from DT or contain the default value
zero. This could be a problem for PSCI proxy in KVM protected mode.
Extend config passed to KVM with a bit mask with individual bits set
depending on whether the corresponding function pointer in psci_ops is
set, eg. set bit for PSCI_CPU_SUSPEND if psci_ops.cpu_suspend != NULL.
Previously config was split into multiple global variables. Put
everything into a single struct for convenience.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201208142452.87237-2-dbrazdil@google.com
We reset the guest's view of PMCR_EL0 unconditionally, based on
the host's view of this register. It is however legal for an
implementation not to provide any PMU, resulting in an UNDEF.
The obvious fix is to skip the reset of this shadow register
when no PMU is available, sidestepping the issue entirely.
If no PMU is available, the guest is not able to request
a virtual PMU anyway, so not doing nothing is the right thing
to do!
It is unlikely that this bug can hit any HW implementation
though, as they all provide a PMU. It has been found using nested
virt with the host KVM not implementing the PMU itself.
Fixes: ab9468340d ("arm64: KVM: Add access handler for PMCR register")
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201210083059.1277162-1-maz@kernel.org
The current check of nvec < minvec for nvec returned from
platform_irq_count() will not detect a negative error code in nvec.
This is because minvec is unsigned, and, as such, nvec is promoted to
unsigned in that check, which will make it a huge number (if it contained
-EPROBE_DEFER).
In practice, an error should not occur in nvec for the only in-tree
user, but add a check anyway.
Fixes: e15f2fa959 ("driver core: platform: Add devm_platform_get_irqs_affinity()")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1608561055-231244-1-git-send-email-john.garry@huawei.com
As a performance enhancement, make the completion queue interrupts managed.
In addition, in commit bf0beec060 ("blk-mq: drain I/O when all CPUs in a
hctx are offline"), CPU hotplug for MQ devices using managed interrupts is
made safe. So expose HW queues to blk-mq to take advantage of this.
Flag Scsi_host.host_tagset is also set to ensure that the HBA is not sent
more commands than it can handle. However the driver still does not use
request tag for IPTT as there are many HW bugs means that special rules
apply for IPTT allocation.
Link: https://lore.kernel.org/r/1606905417-183214-6-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
ugeth is the netdiv_priv() part of the netdevice. Accessing the memory
pointed to by ugeth (such as done by ucc_geth_memclean() and the two
of_node_puts) after free_netdev() is thus use-after-free.
Fixes: 80a9fad8e8 ("ucc_geth: fix module removal")
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Table 8-53 in the QUICC Engine Reference manual shows definitions of
fields up to a size of 192 bytes, not just 128. But in table 8-111,
one does find the text
Base Address of the Global Transmitter Parameter RAM Page. [...]
The user needs to allocate 128 bytes for this page. The address must
be aligned to the page size.
I've checked both rev. 7 (11/2015) and rev. 9 (05/2018) of the manual;
they both have this inconsistency (and the table numbers are the
same).
Adding a bit of debug printing, on my board the struct
ucc_geth_tx_global_pram is allocated at offset 0x880, while
the (opaque) ucc_geth_thread_data_tx gets allocated immediately
afterwards, at 0x900. So whatever the engine writes into the thread
data overlaps with the tail of the global tx pram (and devmem says
that something does get written during a simple ping).
I haven't observed any failure that could be attributed to this, but
it seems to be the kind of thing that would be extremely hard to
debug. So extend the struct definition so that we do allocate 192
bytes.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All the buffers and registers are already set up appropriately for an
MTU slightly above 1500, so we just need to expose this to the
networking stack. AFAICT, there's no need to implement .ndo_change_mtu
when the receive buffers are always set up to support the max_mtu.
This fixes several warnings during boot on our mpc8309-board with an
embedded mv88e6250 switch:
mv88e6085 mdio@e0102120:10: nonfatal error -34 setting MTU 1500 on port 0
...
mv88e6085 mdio@e0102120:10: nonfatal error -34 setting MTU 1500 on port 4
ucc_geth e0102000.ethernet eth1: error -22 setting MTU to 1504 to include DSA overhead
The last line explains what the DSA stack tries to do: achieving an MTU
of 1500 on-the-wire requires that the master netdevice connected to
the CPU port supports an MTU of 1500+the tagging overhead.
Fixes: bfcb813203 ("net: dsa: configure the MTU for switch ports")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
if cur_bpw <= 8 and xfer_len < 4 then the value of fthlv will be 1 and
SPI registers content may have been lost.
* If SPI data register is accessed as a 16-bit register and DSIZE <= 8bit,
better to select FTHLV = 2, 4, 6 etc
* If SPI data register is accessed as a 32-bit register and DSIZE > 8bit,
better to select FTHLV = 2, 4, 6 etc, while if DSIZE <= 8bit,
better to select FTHLV = 4, 8, 12 etc
Signed-off-by: Roman Guskov <rguskov@dh-electronics.com>
Fixes: dcbe0d84df ("spi: add driver for STM32 SPI controller")
Reviewed-by: Marek Vasut <marex@denx.de>
Link: https://lore.kernel.org/r/20201221123532.27272-1-rguskov@dh-electronics.com
Signed-off-by: Mark Brown <broonie@kernel.org>
ath.git fixes for 5.11. Major changes:
ath11k
* add null check for skb allocation
* fix crash found during connect/disconnect stress testing
* fix for HT disabled case
* brown paperbag fixes for my bugs in suspend code
* fix an unnecessary qmi allocation during firmware bootup
* disable ASPM during firmware bootup to avoid issues
Issue:
Flow control frame used to pause GoP(MAC) was delivered to the CPU
and created a load on the CPU. Since XOFF/XON frames are used only
by MAC, these frames should be dropped inside MAC.
Fix:
According to 802.3-2012 - IEEE Standard for Ethernet pause frame
has unique destination MAC address 01-80-C2-00-00-01.
Add TCAM parser entry to track and drop pause frames by destination MAC.
Fixes: 3f518509de ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Link: https://lore.kernel.org/r/1608229817-21951-1-git-send-email-stefanc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When setting xfrm replay_window to values higher than 32, a rare
page-fault occurs in xfrm_replay_advance_bmp:
BUG: unable to handle page fault for address: ffff8af350ad7920
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD ad001067 P4D ad001067 PUD 0
Oops: 0002 [#1] SMP PTI
CPU: 3 PID: 30 Comm: ksoftirqd/3 Kdump: loaded Not tainted 5.4.52-050452-generic #202007160732
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:xfrm_replay_advance_bmp+0xbb/0x130
RSP: 0018:ffffa1304013ba40 EFLAGS: 00010206
RAX: 000000000000010d RBX: 0000000000000002 RCX: 00000000ffffff4b
RDX: 0000000000000018 RSI: 00000000004c234c RDI: 00000000ffb3dbff
RBP: ffffa1304013ba50 R08: ffff8af330ad7920 R09: 0000000007fffffa
R10: 0000000000000800 R11: 0000000000000010 R12: ffff8af29d6258c0
R13: ffff8af28b95c700 R14: 0000000000000000 R15: ffff8af29d6258fc
FS: 0000000000000000(0000) GS:ffff8af339ac0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8af350ad7920 CR3: 0000000015ee4000 CR4: 00000000001406e0
Call Trace:
xfrm_input+0x4e5/0xa10
xfrm4_rcv_encap+0xb5/0xe0
xfrm4_udp_encap_rcv+0x140/0x1c0
Analysis revealed offending code is when accessing:
replay_esn->bmp[nr] |= (1U << bitnr);
with 'nr' being 0x07fffffa.
This happened in an SMP system when reordering of packets was present;
A packet arrived with a "too old" sequence number (outside the window,
i.e 'diff > replay_window'), and therefore the following calculation:
bitnr = replay_esn->replay_window - (diff - pos);
yields a negative result, but since bitnr is u32 we get a large unsigned
quantity (in crash dump above: 0xffffff4b seen in ecx).
This was supposed to be protected by xfrm_input()'s former call to:
if (x->repl->check(x, skb, seq)) {
However, the state's spinlock x->lock is *released* after '->check()'
is performed, and gets re-acquired before '->advance()' - which gives a
chance for a different core to update the xfrm state, e.g. by advancing
'replay_esn->seq' when it encounters more packets - leading to a
'diff > replay_window' situation when original core continues to
xfrm_replay_advance_bmp().
An attempt to fix this issue was suggested in commit bcf66bf54a
("xfrm: Perform a replay check after return from async codepaths"),
by calling 'x->repl->recheck()' after lock is re-acquired, but fix
applied only to asyncronous crypto algorithms.
Augment the fix, by *always* calling 'recheck()' - irrespective if we're
using async crypto.
Fixes: 0ebea8ef35 ("[IPSEC]: Move state lock into x->type->input")
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Incorrect loop in error path of nft_set_elem_expr_clone(),
from Colin Ian King.
2) Missing xt_table_get_private_protected() to access table
private data in x_tables, from Subash Abhinov Kasiviswanathan.
3) Possible oops in ipset hash type resize, from Vasily Averin.
4) Fix shift-out-of-bounds in ipset hash type, also from Vasily.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
netfilter: ipset: fix shift-out-of-bounds in htable_bits()
netfilter: ipset: fixes possible oops in mtype_resize
netfilter: x_tables: Update remaining dereference to RCU
netfilter: nftables: fix incorrect increment of loop counter
====================
Link: https://lore.kernel.org/r/20201218120409.3659-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
taprio_graft() can insert a NULL element in the array of child qdiscs. As
a consquence, taprio_reset() might not reset child qdiscs completely, and
taprio_destroy() might leak resources. Fix it by ensuring that loops that
iterate over q->qdiscs[] don't end when they find the first NULL item.
Fixes: 44d4775ca5 ("net/sched: sch_taprio: reset child qdiscs before freeing them")
Fixes: 5a781ccbd1 ("tc: Add support for configuring the taprio scheduler")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/13edef6778fef03adc751582562fba4a13e06d6a.1608240532.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2020-12-17
Sylwester fixes an issue where PF was not properly being rebuilt
following VF removal for i40e.
Jakub Kicinski fixes a double release of rtnl_lock on
iavf_lan_add_device() error for iavf.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: fix double-release of rtnl_lock
i40e: Fix Error I40E_AQ_RC_EINVAL when removing VFs
====================
Link: https://lore.kernel.org/r/20201217223418.3134992-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On 64-bit systems the packet procfs header field names following 'sk'
are not aligned correctly:
sk RefCnt Type Proto Iface R Rmem User Inode
00000000605d2c64 3 3 0003 7 1 450880 0 16643
00000000080e9b80 2 2 0000 0 0 0 0 17404
00000000b23b8a00 2 2 0000 0 0 0 0 17421
...
With this change field names are correctly aligned:
sk RefCnt Type Proto Iface R Rmem User Inode
000000005c3b1d97 3 3 0003 7 1 21568 0 16178
000000007be55bb7 3 3 fbce 8 1 0 0 16250
00000000be62127d 3 3 fbcd 8 1 0 0 16254
...
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Link: https://lore.kernel.org/r/54917251d8433735d9a24e935a6cb8eb88b4058a.1608103684.git.baruch@tkos.co.il
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It appears that despite its name, the bcm2836_arm_irqchip_ipi_eoi()
callback is an acknowledgement, and not an EOI. This means that
we lose IPIs that are made pending between the handling of the
IPI and the write to LOCAL_MAILBOX0_CLR0. With the right timing,
things fail nicely.
This used to work with handle_percpu_devid_fasteoi_ipi(), which
started by eoi-ing the interrupt. With the standard fasteoi flow,
this doesn't work anymore.
So let's use this callback for what it is, an ack. Your favourite
RPi-2/3 is back up and running.
Fixes: ffdad793d5 ("irqchip/bcm2836: Make IPIs use handle_percpu_devid_irq()")
Cc: Valentin Schneider <valentin.schneider@arm.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/c9fb4ab3-a5cb-648c-6de3-c6a871e60870@roeck-us.net
Since commit 5fe71d271d ("irqchip/gic-v3-its: Tag ITS device as shared if
allocating for a proxy device"), some of the devices are wrongly marked as
"shared" by the ITS driver on systems equipped with the ITS(es). The
problem is that the @info->flags may not be initialized anywhere and we end
up looking at random bits on the stack. That's obviously not good.
We can perform the initialization in the IRQ core layer before calling
msi_domain_prepare_irqs(), which is neat enough.
Fixes: 5fe71d271d ("irqchip/gic-v3-its: Tag ITS device as shared if allocating for a proxy device")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201218060039.1770-1-yuzenghui@huawei.com
If we get a timeout sending then this happens:
spi_transfer_one_message()
->transfer_one() AKA spi_geni_transfer_one()
setup_fifo_xfer()
mas->cur_xfer = non-NULL
spi_transfer_wait() => TIMES OUT
if (msg->status != -EINPROGRESS)
goto out
if (ret != 0 ...)
spi_set_cs()
->set_cs AKA spi_geni_set_cs()
# mas->cur_xfer is non-NULL
The above happens _before_ the SPI core calls ->handle_err() AKA
handle_fifo_timeout().
Unfortunately that won't work so well on geni. If we got a timeout
transferring then it's likely that our interrupt handler is blocked,
but we need that same interrupt handler to run and the command channel
to be unblocked in order to adjust the chip select. Trying to set the
chip select doesn't crash us but ends up confusing our state machine
and leads to messages like: Premature done. rx_rem = 32 bpw8
Let's just drop the chip select request in this case. We can detect
the case because cur_xfer is non-NULL--it would have been set to NULL
in the interrupt handler if the previous transfer had finished. Sure,
we might leave the chip select in the wrong state but it's likely it
was going to fail anyway and this avoids getting the driver even more
confused about what it's doing.
The SPI core in general assumes that setting chip select is a simple
operation that doesn't fail. Yet another reason to just reconfigure
the chip select line as GPIOs.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20201217142842.v3.3.I07afdedcc49655c5d26880f8df9170aac5792378@changeid
Signed-off-by: Mark Brown <broonie@kernel.org>
If we got a timeout when trying to send an abort command then it means
that we just got 3 timeouts in a row:
1. The original timeout that caused handle_fifo_timeout() to be
called.
2. A one second timeout waiting for the cancel command to finish.
3. A one second timeout waiting for the abort command to finish.
SPI is clocked by the controller, so nothing (aside from a hardware
fault or a totally broken sequencer) should be causing the actual
commands to fail in hardware. However, even though the hardware
itself is not expected to fail (and it'd be hard to predict how we
should handle things if it did), it's easy to hit the timeout case by
simply blocking our interrupt handler from running for a long period
of time. Obviously the system is in pretty bad shape if a interrupt
handler is blocked for > 2 seconds, but there are certainly bugs (even
bugs in other unrelated drivers) that can make this happen.
Let's make things a bit more robust against this case. If we fail to
abort we'll set a flag and then we'll block all future transfers until
we have no more interrupts pending.
Fixes: 561de45f72 ("spi: spi-geni-qcom: Add SPI driver support for GENI based QUP")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20201217142842.v3.2.Ibade998ed587e070388b4bf58801f1107a40eb53@changeid
Signed-off-by: Mark Brown <broonie@kernel.org>
In commit 7ba9bdcb91 ("spi: spi-geni-qcom: Don't keep a local state
variable") we changed handle_fifo_timeout() so that we set
"mas->cur_xfer" to NULL to make absolutely sure that we don't mess
with the buffers from the previous transfer in the timeout case.
Unfortunately, this caused the IRQ handler to dereference NULL in some
cases. One case:
CPU0 CPU1
---- ----
setup_fifo_xfer()
geni_se_setup_m_cmd()
<hardware starts transfer>
<transfer completes in hardware>
<hardware sets M_RX_FIFO_WATERMARK_EN in m_irq>
...
handle_fifo_timeout()
spin_lock_irq(mas->lock)
mas->cur_xfer = NULL
geni_se_cancel_m_cmd()
spin_unlock_irq(mas->lock)
geni_spi_isr()
spin_lock(mas->lock)
if (m_irq & M_RX_FIFO_WATERMARK_EN)
geni_spi_handle_rx()
mas->cur_xfer NULL dereference!
tl;dr: Seriously delayed interrupts for RX/TX can lead to timeout
handling setting mas->cur_xfer to NULL.
Let's check for the NULL transfer in the TX and RX cases and reset the
watermark or clear out the fifo respectively to put the hardware back
into a sane state.
NOTE: things still could get confused if we get timeouts all the way
through handle_fifo_timeout() and then start a new transfer because
interrupts from the old transfer / cancel / abort could still be
pending. A future patch will help this corner case.
Fixes: 561de45f72 ("spi: spi-geni-qcom: Add SPI driver support for GENI based QUP")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20201217142842.v3.1.I99ee04f0cb823415df59bd4f550d6ff5756e43d6@changeid
Signed-off-by: Mark Brown <broonie@kernel.org>
While converting the NFSv4 decoder to use xdr_stream-based XDR
processing, I removed the old SAVEMEM() macro. This macro wrapped
a bit of logic that avoided a memory allocation by recognizing when
the decoded item resides in a linear section of the Receive buffer.
In that case, it returned a pointer into that buffer instead of
allocating a bounce buffer.
The bounce buffer is necessary only when xdr_inline_decode() has
placed the decoded item in the xdr_stream's scratch buffer, which
disappears the next time xdr_inline_decode() is called with that
xdr_stream. That happens only if the data item crosses a page
boundary in the receive buffer, an exceedingly rare occurrence.
Allocating a bounce buffer every time results in a minor performance
regression that was introduced by the recent NFSv4 decoder overhaul.
Let's restore the previous behavior. On average, it saves about 1.5
kmalloc() calls per COMPOUND.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Daire Byrne reports a ~50% aggregrate throughput regression on his
Linux NFS server after commit da1661b93b ("SUNRPC: Teach server to
use xprt_sock_sendmsg for socket sends"), which replaced
kernel_send_page() calls in NFSD's socket send path with calls to
sock_sendmsg() using iov_iter.
Investigation showed that tcp_sendmsg() was not using zero-copy to
send the xdr_buf's bvec pages, but instead was relying on memcpy.
This means copying every byte of a large NFS READ payload.
It looks like TLS sockets do indeed support a ->sendpage method,
so it's really not necessary to use xprt_sock_sendmsg() to support
TLS fully on the server. A mechanical reversion of da1661b93b is
not possible at this point, but we can re-implement the server's
TCP socket sendmsg path using kernel_sendpage().
Reported-by: Daire Byrne <daire@dneg.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209439
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
fs/nfsd/nfssvc.c:36:6: warning: symbol 'inter_copy_offload_enable' was not declared. Should it be static?
The parameter was added by commit ce0887ac96 ("NFSD add nfs4 inter
ssc to nfsd4_copy"). Relocate it into the source file that uses it,
and make it static. This approach is similar to the
nfs4_disable_idmapping, cltrack_prog, and cltrack_legacy_disable
module parameters.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
If the READ_PLUS operation was truncated due to an error, then ensure we
clear the 'eof' flag.
Fixes: 9f0b5792f0 ("NFSD: Encode a full READ_PLUS reply")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Ensure that we encode the data payload + padding, and that we truncate
the preallocated buffer to the actual read size.
Fixes: 528b84934e ("NFSD: Add READ_PLUS data support")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Rollback the reservation in the completion ring when we get a
NETDEV_TX_BUSY. When this error is received from the driver, we are
supposed to let the user application retry the transmit again. And in
order to do this, we need to roll back the failed send so it can be
retried. Unfortunately, we did not cancel the reservation we had made
in the completion ring. By not doing this, we actually make the
completion ring one entry smaller per NETDEV_TX_BUSY error we get, and
after enough of these errors the completion ring will be of size zero
and transmit will stop working.
Fix this by cancelling the reservation when we get a NETDEV_TX_BUSY
error.
Fixes: 642e450b6b ("xsk: Do not discard packet when NETDEV_TX_BUSY")
Reported-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20201218134525.13119-3-magnus.karlsson@gmail.com
Fix a race when multiple sockets are simultaneously calling sendto()
when the completion ring is shared in the SKB case. This is the case
when you share the same netdev and queue id through the
XDP_SHARED_UMEM bind flag. The problem is that multiple processes can
be in xsk_generic_xmit() and call the backpressure mechanism in
xskq_prod_reserve(xs->pool->cq). As this is a shared resource in this
specific scenario, a race might occur since the rings are
single-producer single-consumer.
Fix this by moving the tx_completion_lock from the socket to the pool
as the pool is shared between the sockets that share the completion
ring. (The pool is not shared when this is not the case.) And then
protect the accesses to xskq_prod_reserve() with this lock. The
tx_completion_lock is renamed cq_lock to better reflect that it
protects accesses to the potentially shared completion ring.
Fixes: 35fcde7f8d ("xsk: support for Tx")
Reported-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20201218134525.13119-2-magnus.karlsson@gmail.com
When remounting RO, after setting the superblock with the RO flag, the
cleaner task will start sleeping and do nothing, since the call to
btrfs_need_cleaner_sleep() keeps returning 'true'. However, when the
cleaner task goes to sleep, the list of delayed iputs may not be empty.
As long as we are in RO mode, the cleaner task will keep sleeping and
never run the delayed iputs. This means that if a filesystem unmount
is started, we get into close_ctree() with a non-empty list of delayed
iputs, and because the filesystem is in RO mode and is not in an error
state (or a transaction aborted), btrfs_error_commit_super() and
btrfs_commit_super(), which run the delayed iputs, are never called,
and later we fail the assertion that checks if the delayed iputs list
is empty:
assertion failed: list_empty(&fs_info->delayed_iputs), in fs/btrfs/disk-io.c:4049
------------[ cut here ]------------
kernel BUG at fs/btrfs/ctree.h:3153!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
CPU: 1 PID: 3780621 Comm: umount Tainted: G L 5.6.0-rc2-btrfs-next-73 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
RIP: 0010:assertfail.constprop.0+0x18/0x26 [btrfs]
Code: 8b 7b 58 48 85 ff 74 (...)
RSP: 0018:ffffb748c89bbdf8 EFLAGS: 00010246
RAX: 0000000000000051 RBX: ffff9608f2584000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff91998988 RDI: 00000000ffffffff
RBP: ffff9608f25870d8 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc0cbc500
R13: ffffffff92411750 R14: 0000000000000000 R15: ffff9608f2aab250
FS: 00007fcbfaa66c80(0000) GS:ffff960936c80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fffc2c2dd38 CR3: 0000000235e54002 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
close_ctree+0x1a2/0x2e6 [btrfs]
generic_shutdown_super+0x6c/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20 [btrfs]
deactivate_locked_super+0x31/0x70
cleanup_mnt+0x100/0x160
task_work_run+0x93/0xc0
exit_to_usermode_loop+0xf9/0x100
do_syscall_64+0x20d/0x260
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fcbfaca6307
Code: eb 0b 00 f7 d8 64 89 (...)
RSP: 002b:00007fffc2c2ed68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 0000558203b559b0 RCX: 00007fcbfaca6307
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000558203b55bc0
RBP: 0000000000000000 R08: 0000000000000001 R09: 00007fffc2c2dad0
R10: 0000558203b55bf0 R11: 0000000000000246 R12: 0000558203b55bc0
R13: 00007fcbfadcc204 R14: 0000558203b55aa8 R15: 0000000000000000
Modules linked in: btrfs dm_flakey dm_log_writes (...)
---[ end trace d44d303790049ef6 ]---
So fix this by making the remount RO path run any remaining delayed iputs
after waiting for the cleaner to become inactive.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add an assertion to close_ctree(), after destroying all the work queues,
to verify we do not have any transaction still open or committing at that
at that point. If we have any, it means something is seriously wrong and
that can cause memory leaks and use-after-free problems. This is motivated
by the previous patches that fixed bugs where we ended up leaking an open
transaction after unmounting the filesystem.
Tested-by: Fabian Vogt <fvogt@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we are remounting a filesystem in RO mode we can race with the cleaner
task and result in leaking a transaction if the filesystem is unmounted
shortly after, before the transaction kthread had a chance to commit that
transaction. That also results in a crash during unmount, due to a
use-after-free, if hardware acceleration is not available for crc32c.
The following sequence of steps explains how the race happens.
1) The filesystem is mounted in RW mode and the cleaner task is running.
This means that currently BTRFS_FS_CLEANER_RUNNING is set at
fs_info->flags;
2) The cleaner task is currently running delayed iputs for example;
3) A filesystem RO remount operation starts;
4) The RO remount task calls btrfs_commit_super(), which commits any
currently open transaction, and it finishes;
5) At this point the cleaner task is still running and it creates a new
transaction by doing one of the following things:
* When running the delayed iput() for an inode with a 0 link count,
in which case at btrfs_evict_inode() we start a transaction through
the call to evict_refill_and_join(), use it and then release its
handle through btrfs_end_transaction();
* When deleting a dead root through btrfs_clean_one_deleted_snapshot(),
a transaction is started at btrfs_drop_snapshot() and then its handle
is released through a call to btrfs_end_transaction_throttle();
* When the remount task was still running, and before the remount task
called btrfs_delete_unused_bgs(), the cleaner task also called
btrfs_delete_unused_bgs() and it picked and removed one block group
from the list of unused block groups. Before the cleaner task started
a transaction, through btrfs_start_trans_remove_block_group() at
btrfs_delete_unused_bgs(), the remount task had already called
btrfs_commit_super();
6) So at this point the filesystem is in RO mode and we have an open
transaction that was started by the cleaner task;
7) Shortly after a filesystem unmount operation starts. At close_ctree()
we stop the transaction kthread before it had a chance to commit the
transaction, since less than 30 seconds (the default commit interval)
have elapsed since the last transaction was committed;
8) We end up calling iput() against the btree inode at close_ctree() while
there is an open transaction, and since that transaction was used to
update btrees by the cleaner, we have dirty pages in the btree inode
due to COW operations on metadata extents, and therefore writeback is
triggered for the btree inode.
So btree_write_cache_pages() is invoked to flush those dirty pages
during the final iput() on the btree inode. This results in creating a
bio and submitting it, which makes us end up at
btrfs_submit_metadata_bio();
9) At btrfs_submit_metadata_bio() we end up at the if-then-else branch
that calls btrfs_wq_submit_bio(), because check_async_write() returned
a value of 1. This value of 1 is because we did not have hardware
acceleration available for crc32c, so BTRFS_FS_CSUM_IMPL_FAST was not
set in fs_info->flags;
10) Then at btrfs_wq_submit_bio() we call btrfs_queue_work() against the
workqueue at fs_info->workers, which was already freed before by the
call to btrfs_stop_all_workers() at close_ctree(). This results in an
invalid memory access due to a use-after-free, leading to a crash.
When this happens, before the crash there are several warnings triggered,
since we have reserved metadata space in a block group, the delayed refs
reservation, etc:
------------[ cut here ]------------
WARNING: CPU: 4 PID: 1729896 at fs/btrfs/block-group.c:125 btrfs_put_block_group+0x63/0xa0 [btrfs]
Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
CPU: 4 PID: 1729896 Comm: umount Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:btrfs_put_block_group+0x63/0xa0 [btrfs]
Code: f0 01 00 00 48 39 c2 75 (...)
RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206
RAX: 0000000000000001 RBX: ffff947ed73e4000 RCX: ffff947ebc8b29c8
RDX: 0000000000000001 RSI: ffffffffc0b150a0 RDI: ffff947ebc8b2800
RBP: ffff947ebc8b2800 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110
R13: ffff947ed73e4160 R14: ffff947ebc8b2988 R15: dead000000000100
FS: 00007f15edfea840(0000) GS:ffff9481ad600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f37e2893320 CR3: 0000000138f68001 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
btrfs_free_block_groups+0x17f/0x2f0 [btrfs]
close_ctree+0x2ba/0x2fa [btrfs]
generic_shutdown_super+0x6c/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20 [btrfs]
deactivate_locked_super+0x31/0x70
cleanup_mnt+0x100/0x160
task_work_run+0x68/0xb0
exit_to_user_mode_prepare+0x1bb/0x1c0
syscall_exit_to_user_mode+0x4b/0x260
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f15ee221ee7
Code: ff 0b 00 f7 d8 64 89 01 48 (...)
RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
softirqs last enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
softirqs last disabled at (0): [<0000000000000000>] 0x0
---[ end trace dd74718fef1ed5c6 ]---
------------[ cut here ]------------
WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-rsv.c:459 btrfs_release_global_block_rsv+0x70/0xc0 [btrfs]
Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
CPU: 2 PID: 1729896 Comm: umount Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:btrfs_release_global_block_rsv+0x70/0xc0 [btrfs]
Code: 48 83 bb b0 03 00 00 00 (...)
RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206
RAX: 000000000033c000 RBX: ffff947ed73e4000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffffc0b0d8c1 RDI: 00000000ffffffff
RBP: ffff947ebc8b7000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110
R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100
FS: 00007f15edfea840(0000) GS:ffff9481aca00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000561a79f76e20 CR3: 0000000138f68006 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
btrfs_free_block_groups+0x24c/0x2f0 [btrfs]
close_ctree+0x2ba/0x2fa [btrfs]
generic_shutdown_super+0x6c/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20 [btrfs]
deactivate_locked_super+0x31/0x70
cleanup_mnt+0x100/0x160
task_work_run+0x68/0xb0
exit_to_user_mode_prepare+0x1bb/0x1c0
syscall_exit_to_user_mode+0x4b/0x260
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f15ee221ee7
Code: ff 0b 00 f7 d8 64 89 01 (...)
RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
softirqs last enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
softirqs last disabled at (0): [<0000000000000000>] 0x0
---[ end trace dd74718fef1ed5c7 ]---
------------[ cut here ]------------
WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-group.c:3377 btrfs_free_block_groups+0x25d/0x2f0 [btrfs]
Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
CPU: 5 PID: 1729896 Comm: umount Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:btrfs_free_block_groups+0x25d/0x2f0 [btrfs]
Code: ad de 49 be 22 01 00 (...)
RSP: 0018:ffffb270826bbde8 EFLAGS: 00010206
RAX: ffff947ebeae1d08 RBX: ffff947ed73e4000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff947e9d823ae8 RDI: 0000000000000246
RBP: ffff947ebeae1d08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ebeae1c00
R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100
FS: 00007f15edfea840(0000) GS:ffff9481ad200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1475d98ea8 CR3: 0000000138f68005 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
close_ctree+0x2ba/0x2fa [btrfs]
generic_shutdown_super+0x6c/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20 [btrfs]
deactivate_locked_super+0x31/0x70
cleanup_mnt+0x100/0x160
task_work_run+0x68/0xb0
exit_to_user_mode_prepare+0x1bb/0x1c0
syscall_exit_to_user_mode+0x4b/0x260
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f15ee221ee7
Code: ff 0b 00 f7 d8 64 89 (...)
RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
softirqs last enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
softirqs last disabled at (0): [<0000000000000000>] 0x0
---[ end trace dd74718fef1ed5c8 ]---
BTRFS info (device sdc): space_info 4 has 268238848 free, is not full
BTRFS info (device sdc): space_info total=268435456, used=114688, pinned=0, reserved=16384, may_use=0, readonly=65536
BTRFS info (device sdc): global_block_rsv: size 0 reserved 0
BTRFS info (device sdc): trans_block_rsv: size 0 reserved 0
BTRFS info (device sdc): chunk_block_rsv: size 0 reserved 0
BTRFS info (device sdc): delayed_block_rsv: size 0 reserved 0
BTRFS info (device sdc): delayed_refs_rsv: size 524288 reserved 0
And the crash, which only happens when we do not have crc32c hardware
acceleration, produces the following trace immediately after those
warnings:
stack segment: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
CPU: 2 PID: 1749129 Comm: umount Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:btrfs_queue_work+0x36/0x190 [btrfs]
Code: 54 55 53 48 89 f3 (...)
RSP: 0018:ffffb27082443ae8 EFLAGS: 00010282
RAX: 0000000000000004 RBX: ffff94810ee9ad90 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff94810ee9ad90 RDI: ffff947ed8ee75a0
RBP: a56b6b6b6b6b6b6b R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000007 R11: 0000000000000001 R12: ffff947fa9b435a8
R13: ffff94810ee9ad90 R14: 0000000000000000 R15: ffff947e93dc0000
FS: 00007f3cfe974840(0000) GS:ffff9481ac600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1b42995a70 CR3: 0000000127638003 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
btrfs_wq_submit_bio+0xb3/0xd0 [btrfs]
btrfs_submit_metadata_bio+0x44/0xc0 [btrfs]
submit_one_bio+0x61/0x70 [btrfs]
btree_write_cache_pages+0x414/0x450 [btrfs]
? kobject_put+0x9a/0x1d0
? trace_hardirqs_on+0x1b/0xf0
? _raw_spin_unlock_irqrestore+0x3c/0x60
? free_debug_processing+0x1e1/0x2b0
do_writepages+0x43/0xe0
? lock_acquired+0x199/0x490
__writeback_single_inode+0x59/0x650
writeback_single_inode+0xaf/0x120
write_inode_now+0x94/0xd0
iput+0x187/0x2b0
close_ctree+0x2c6/0x2fa [btrfs]
generic_shutdown_super+0x6c/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20 [btrfs]
deactivate_locked_super+0x31/0x70
cleanup_mnt+0x100/0x160
task_work_run+0x68/0xb0
exit_to_user_mode_prepare+0x1bb/0x1c0
syscall_exit_to_user_mode+0x4b/0x260
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f3cfebabee7
Code: ff 0b 00 f7 d8 64 89 01 (...)
RSP: 002b:00007ffc9c9a05f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 00007f3cfecd1264 RCX: 00007f3cfebabee7
RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 0000562b6b478000
RBP: 0000562b6b473a30 R08: 0000000000000000 R09: 00007f3cfec6cbe0
R10: 0000562b6b479fe0 R11: 0000000000000246 R12: 0000000000000000
R13: 0000562b6b478000 R14: 0000562b6b473b40 R15: 0000562b6b473c60
Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
---[ end trace dd74718fef1ed5cc ]---
Finally when we remove the btrfs module (rmmod btrfs), there are several
warnings about objects that were allocated from our slabs but were never
freed, consequence of the transaction that was never committed and got
leaked:
=============================================================================
BUG btrfs_delayed_ref_head (Tainted: G B W ): Objects remaining in btrfs_delayed_ref_head on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
INFO: Slab 0x0000000094c2ae56 objects=24 used=2 fp=0x000000002bfa2521 flags=0x17fffc000010200
CPU: 5 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0x8d/0xb5
slab_err+0xb7/0xdc
? lock_acquired+0x199/0x490
__kmem_cache_shutdown+0x1ac/0x3c0
? lock_release+0x20e/0x4c0
kmem_cache_destroy+0x55/0x120
btrfs_delayed_ref_exit+0x11/0x35 [btrfs]
exit_btrfs_fs+0xa/0x59 [btrfs]
__x64_sys_delete_module+0x194/0x260
? fpregs_assert_state_consistent+0x1e/0x40
? exit_to_user_mode_prepare+0x55/0x1c0
? trace_hardirqs_on+0x1b/0xf0
do_syscall_64+0x33/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f693e305897
Code: 73 01 c3 48 8b 0d f9 f5 (...)
RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
INFO: Object 0x0000000050cbdd61 @offset=12104
INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1894 cpu=6 pid=1729873
__slab_alloc.isra.0+0x109/0x1c0
kmem_cache_alloc+0x7bb/0x830
btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs]
btrfs_free_tree_block+0x128/0x360 [btrfs]
__btrfs_cow_block+0x489/0x5f0 [btrfs]
btrfs_cow_block+0xf7/0x220 [btrfs]
btrfs_search_slot+0x62a/0xc40 [btrfs]
btrfs_del_orphan_item+0x65/0xd0 [btrfs]
btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
open_ctree+0x125a/0x18a0 [btrfs]
btrfs_mount_root.cold+0x13/0xed [btrfs]
legacy_get_tree+0x30/0x60
vfs_get_tree+0x28/0xe0
fc_mount+0xe/0x40
vfs_kern_mount.part.0+0x71/0x90
btrfs_mount+0x13b/0x3e0 [btrfs]
INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=4292 cpu=2 pid=1729526
kmem_cache_free+0x34c/0x3c0
__btrfs_run_delayed_refs+0x1117/0x1290 [btrfs]
btrfs_run_delayed_refs+0x81/0x210 [btrfs]
commit_cowonly_roots+0xfb/0x300 [btrfs]
btrfs_commit_transaction+0x367/0xc40 [btrfs]
sync_filesystem+0x74/0x90
generic_shutdown_super+0x22/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20 [btrfs]
deactivate_locked_super+0x31/0x70
cleanup_mnt+0x100/0x160
task_work_run+0x68/0xb0
exit_to_user_mode_prepare+0x1bb/0x1c0
syscall_exit_to_user_mode+0x4b/0x260
entry_SYSCALL_64_after_hwframe+0x44/0xa9
INFO: Object 0x0000000086e9b0ff @offset=12776
INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1900 cpu=6 pid=1729873
__slab_alloc.isra.0+0x109/0x1c0
kmem_cache_alloc+0x7bb/0x830
btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs]
btrfs_alloc_tree_block+0x2bf/0x360 [btrfs]
alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
__btrfs_cow_block+0x12d/0x5f0 [btrfs]
btrfs_cow_block+0xf7/0x220 [btrfs]
btrfs_search_slot+0x62a/0xc40 [btrfs]
btrfs_del_orphan_item+0x65/0xd0 [btrfs]
btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
open_ctree+0x125a/0x18a0 [btrfs]
btrfs_mount_root.cold+0x13/0xed [btrfs]
legacy_get_tree+0x30/0x60
vfs_get_tree+0x28/0xe0
fc_mount+0xe/0x40
vfs_kern_mount.part.0+0x71/0x90
INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=3141 cpu=6 pid=1729803
kmem_cache_free+0x34c/0x3c0
__btrfs_run_delayed_refs+0x1117/0x1290 [btrfs]
btrfs_run_delayed_refs+0x81/0x210 [btrfs]
btrfs_write_dirty_block_groups+0x17d/0x3d0 [btrfs]
commit_cowonly_roots+0x248/0x300 [btrfs]
btrfs_commit_transaction+0x367/0xc40 [btrfs]
close_ctree+0x113/0x2fa [btrfs]
generic_shutdown_super+0x6c/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20 [btrfs]
deactivate_locked_super+0x31/0x70
cleanup_mnt+0x100/0x160
task_work_run+0x68/0xb0
exit_to_user_mode_prepare+0x1bb/0x1c0
syscall_exit_to_user_mode+0x4b/0x260
entry_SYSCALL_64_after_hwframe+0x44/0xa9
kmem_cache_destroy btrfs_delayed_ref_head: Slab cache still has objects
CPU: 5 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0x8d/0xb5
kmem_cache_destroy+0x119/0x120
btrfs_delayed_ref_exit+0x11/0x35 [btrfs]
exit_btrfs_fs+0xa/0x59 [btrfs]
__x64_sys_delete_module+0x194/0x260
? fpregs_assert_state_consistent+0x1e/0x40
? exit_to_user_mode_prepare+0x55/0x1c0
? trace_hardirqs_on+0x1b/0xf0
do_syscall_64+0x33/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f693e305897
Code: 73 01 c3 48 8b 0d f9 f5 0b (...)
RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
=============================================================================
BUG btrfs_delayed_tree_ref (Tainted: G B W ): Objects remaining in btrfs_delayed_tree_ref on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
INFO: Slab 0x0000000011f78dc0 objects=37 used=2 fp=0x0000000032d55d91 flags=0x17fffc000010200
CPU: 3 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0x8d/0xb5
slab_err+0xb7/0xdc
? lock_acquired+0x199/0x490
__kmem_cache_shutdown+0x1ac/0x3c0
? lock_release+0x20e/0x4c0
kmem_cache_destroy+0x55/0x120
btrfs_delayed_ref_exit+0x1d/0x35 [btrfs]
exit_btrfs_fs+0xa/0x59 [btrfs]
__x64_sys_delete_module+0x194/0x260
? fpregs_assert_state_consistent+0x1e/0x40
? exit_to_user_mode_prepare+0x55/0x1c0
? trace_hardirqs_on+0x1b/0xf0
do_syscall_64+0x33/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f693e305897
Code: 73 01 c3 48 8b 0d f9 f5 (...)
RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
INFO: Object 0x000000001a340018 @offset=4408
INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1917 cpu=6 pid=1729873
__slab_alloc.isra.0+0x109/0x1c0
kmem_cache_alloc+0x7bb/0x830
btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs]
btrfs_free_tree_block+0x128/0x360 [btrfs]
__btrfs_cow_block+0x489/0x5f0 [btrfs]
btrfs_cow_block+0xf7/0x220 [btrfs]
btrfs_search_slot+0x62a/0xc40 [btrfs]
btrfs_del_orphan_item+0x65/0xd0 [btrfs]
btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
open_ctree+0x125a/0x18a0 [btrfs]
btrfs_mount_root.cold+0x13/0xed [btrfs]
legacy_get_tree+0x30/0x60
vfs_get_tree+0x28/0xe0
fc_mount+0xe/0x40
vfs_kern_mount.part.0+0x71/0x90
btrfs_mount+0x13b/0x3e0 [btrfs]
INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=4167 cpu=4 pid=1729795
kmem_cache_free+0x34c/0x3c0
__btrfs_run_delayed_refs+0x63d/0x1290 [btrfs]
btrfs_run_delayed_refs+0x81/0x210 [btrfs]
btrfs_commit_transaction+0x60/0xc40 [btrfs]
create_subvol+0x56a/0x990 [btrfs]
btrfs_mksubvol+0x3fb/0x4a0 [btrfs]
__btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs]
btrfs_ioctl_snap_create+0x58/0x80 [btrfs]
btrfs_ioctl+0x1a92/0x36f0 [btrfs]
__x64_sys_ioctl+0x83/0xb0
do_syscall_64+0x33/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
INFO: Object 0x000000002b46292a @offset=13648
INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1923 cpu=6 pid=1729873
__slab_alloc.isra.0+0x109/0x1c0
kmem_cache_alloc+0x7bb/0x830
btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs]
btrfs_alloc_tree_block+0x2bf/0x360 [btrfs]
alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
__btrfs_cow_block+0x12d/0x5f0 [btrfs]
btrfs_cow_block+0xf7/0x220 [btrfs]
btrfs_search_slot+0x62a/0xc40 [btrfs]
btrfs_del_orphan_item+0x65/0xd0 [btrfs]
btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
open_ctree+0x125a/0x18a0 [btrfs]
btrfs_mount_root.cold+0x13/0xed [btrfs]
legacy_get_tree+0x30/0x60
vfs_get_tree+0x28/0xe0
fc_mount+0xe/0x40
vfs_kern_mount.part.0+0x71/0x90
INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=3164 cpu=6 pid=1729803
kmem_cache_free+0x34c/0x3c0
__btrfs_run_delayed_refs+0x63d/0x1290 [btrfs]
btrfs_run_delayed_refs+0x81/0x210 [btrfs]
commit_cowonly_roots+0xfb/0x300 [btrfs]
btrfs_commit_transaction+0x367/0xc40 [btrfs]
close_ctree+0x113/0x2fa [btrfs]
generic_shutdown_super+0x6c/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20 [btrfs]
deactivate_locked_super+0x31/0x70
cleanup_mnt+0x100/0x160
task_work_run+0x68/0xb0
exit_to_user_mode_prepare+0x1bb/0x1c0
syscall_exit_to_user_mode+0x4b/0x260
entry_SYSCALL_64_after_hwframe+0x44/0xa9
kmem_cache_destroy btrfs_delayed_tree_ref: Slab cache still has objects
CPU: 5 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0x8d/0xb5
kmem_cache_destroy+0x119/0x120
btrfs_delayed_ref_exit+0x1d/0x35 [btrfs]
exit_btrfs_fs+0xa/0x59 [btrfs]
__x64_sys_delete_module+0x194/0x260
? fpregs_assert_state_consistent+0x1e/0x40
? exit_to_user_mode_prepare+0x55/0x1c0
? trace_hardirqs_on+0x1b/0xf0
do_syscall_64+0x33/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f693e305897
Code: 73 01 c3 48 8b 0d f9 f5 (...)
RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
=============================================================================
BUG btrfs_delayed_extent_op (Tainted: G B W ): Objects remaining in btrfs_delayed_extent_op on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
INFO: Slab 0x00000000f145ce2f objects=22 used=1 fp=0x00000000af0f92cf flags=0x17fffc000010200
CPU: 5 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0x8d/0xb5
slab_err+0xb7/0xdc
? lock_acquired+0x199/0x490
__kmem_cache_shutdown+0x1ac/0x3c0
? __mutex_unlock_slowpath+0x45/0x2a0
kmem_cache_destroy+0x55/0x120
exit_btrfs_fs+0xa/0x59 [btrfs]
__x64_sys_delete_module+0x194/0x260
? fpregs_assert_state_consistent+0x1e/0x40
? exit_to_user_mode_prepare+0x55/0x1c0
? trace_hardirqs_on+0x1b/0xf0
do_syscall_64+0x33/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f693e305897
Code: 73 01 c3 48 8b 0d f9 f5 (...)
RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
INFO: Object 0x000000004cf95ea8 @offset=6264
INFO: Allocated in btrfs_alloc_tree_block+0x1e0/0x360 [btrfs] age=1931 cpu=6 pid=1729873
__slab_alloc.isra.0+0x109/0x1c0
kmem_cache_alloc+0x7bb/0x830
btrfs_alloc_tree_block+0x1e0/0x360 [btrfs]
alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
__btrfs_cow_block+0x12d/0x5f0 [btrfs]
btrfs_cow_block+0xf7/0x220 [btrfs]
btrfs_search_slot+0x62a/0xc40 [btrfs]
btrfs_del_orphan_item+0x65/0xd0 [btrfs]
btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
open_ctree+0x125a/0x18a0 [btrfs]
btrfs_mount_root.cold+0x13/0xed [btrfs]
legacy_get_tree+0x30/0x60
vfs_get_tree+0x28/0xe0
fc_mount+0xe/0x40
vfs_kern_mount.part.0+0x71/0x90
btrfs_mount+0x13b/0x3e0 [btrfs]
INFO: Freed in __btrfs_run_delayed_refs+0xabd/0x1290 [btrfs] age=3173 cpu=6 pid=1729803
kmem_cache_free+0x34c/0x3c0
__btrfs_run_delayed_refs+0xabd/0x1290 [btrfs]
btrfs_run_delayed_refs+0x81/0x210 [btrfs]
commit_cowonly_roots+0xfb/0x300 [btrfs]
btrfs_commit_transaction+0x367/0xc40 [btrfs]
close_ctree+0x113/0x2fa [btrfs]
generic_shutdown_super+0x6c/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20 [btrfs]
deactivate_locked_super+0x31/0x70
cleanup_mnt+0x100/0x160
task_work_run+0x68/0xb0
exit_to_user_mode_prepare+0x1bb/0x1c0
syscall_exit_to_user_mode+0x4b/0x260
entry_SYSCALL_64_after_hwframe+0x44/0xa9
kmem_cache_destroy btrfs_delayed_extent_op: Slab cache still has objects
CPU: 3 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0x8d/0xb5
kmem_cache_destroy+0x119/0x120
exit_btrfs_fs+0xa/0x59 [btrfs]
__x64_sys_delete_module+0x194/0x260
? fpregs_assert_state_consistent+0x1e/0x40
? exit_to_user_mode_prepare+0x55/0x1c0
? trace_hardirqs_on+0x1b/0xf0
do_syscall_64+0x33/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f693e305897
Code: 73 01 c3 48 8b 0d f9 (...)
RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
BTRFS: state leak: start 30408704 end 30425087 state 1 in tree 1 refs 1
So fix this by making the remount path to wait for the cleaner task before
calling btrfs_commit_super(). The remount path now waits for the bit
BTRFS_FS_CLEANER_RUNNING to be cleared from fs_info->flags before calling
btrfs_commit_super() and this ensures the cleaner can not start a
transaction after that, because it sleeps when the filesystem is in RO
mode and we have already flagged the filesystem as RO before waiting for
BTRFS_FS_CLEANER_RUNNING to be cleared.
This also introduces a new flag BTRFS_FS_STATE_RO to be used for
fs_info->fs_state when the filesystem is in RO mode. This is because we
were doing the RO check using the flags of the superblock and setting the
RO mode simply by ORing into the superblock's flags - those operations are
not atomic and could result in the cleaner not seeing the update from the
remount task after it clears BTRFS_FS_CLEANER_RUNNING.
Tested-by: Fabian Vogt <fvogt@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_discard_workfn() drops discard_ctl->lock just to take it again in
a moment in btrfs_discard_schedule_work(). Avoid that and also reuse
ktime.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Because only one discard worker may be running at any given point, it
could have been safe to modify ->prev_discard, etc. without
synchronization, if not for @override flag in
btrfs_discard_schedule_work() and delayed_work_pending() returning false
while workfn is running.
That may lead to torn reads of u64 for some architectures, but that's
not a big problem as only slightly affects the discard rate.
Suggested-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Might happen that bg->discard_eligible_time was changed without
rescheduling, so btrfs_discard_workfn() wakes up earlier than that new
time, peek_discard_list() returns NULL, and all work halts and goes to
sleep without further rescheduling even there are block groups to
discard.
It happens pretty often, but not so visible from the userspace because
after some time it usually will be kicked off anyway by someone else
calling btrfs_discard_reschedule_work().
Fix it by continue rescheduling if block group discard lists are not
empty.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
I noticed that sometimes the module failed to load because the self
tests failed like this:
BTRFS: selftest: fs/btrfs/tests/inode-tests.c:963 miscount, wanted 1, got 0
This turned out to be because sometimes the btrfs ino would be the btree
inode number, and thus we'd skip calling the set extent delalloc bit
helper, and thus not adjust ->outstanding_extents.
Fix this by making sure we initialize test inodes with a valid inode
number so that we don't get random failures during self tests.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When doing an incremental send, if we have a new inode that happens to
have the same number that an old directory inode had in the base snapshot
and that old directory has a pending rmdir operation, we end up computing
a wrong path for the new inode, causing the receiver to fail.
Example reproducer:
$ cat test-send-rmdir.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV >/dev/null
mount $DEV $MNT
mkdir $MNT/dir
touch $MNT/dir/file1
touch $MNT/dir/file2
touch $MNT/dir/file3
# Filesystem looks like:
#
# . (ino 256)
# |----- dir/ (ino 257)
# |----- file1 (ino 258)
# |----- file2 (ino 259)
# |----- file3 (ino 260)
#
btrfs subvolume snapshot -r $MNT $MNT/snap1
btrfs send -f /tmp/snap1.send $MNT/snap1
# Now remove our directory and all its files.
rm -fr $MNT/dir
# Unmount the filesystem and mount it again. This is to ensure that
# the next inode that is created ends up with the same inode number
# that our directory "dir" had, 257, which is the first free "objectid"
# available after mounting again the filesystem.
umount $MNT
mount $DEV $MNT
# Now create a new file (it could be a directory as well).
touch $MNT/newfile
# Filesystem now looks like:
#
# . (ino 256)
# |----- newfile (ino 257)
#
btrfs subvolume snapshot -r $MNT $MNT/snap2
btrfs send -f /tmp/snap2.send -p $MNT/snap1 $MNT/snap2
# Now unmount the filesystem, create a new one, mount it and try to apply
# both send streams to recreate both snapshots.
umount $DEV
mkfs.btrfs -f $DEV >/dev/null
mount $DEV $MNT
btrfs receive -f /tmp/snap1.send $MNT
btrfs receive -f /tmp/snap2.send $MNT
umount $MNT
When running the test, the receive operation for the incremental stream
fails:
$ ./test-send-rmdir.sh
Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1'
At subvol /mnt/sdi/snap1
Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2'
At subvol /mnt/sdi/snap2
At subvol snap1
At snapshot snap2
ERROR: chown o257-9-0 failed: No such file or directory
So fix this by tracking directories that have a pending rmdir by inode
number and generation number, instead of only inode number.
A test case for fstests follows soon.
Reported-by: Massimo B. <massimo.b@gmx.net>
Tested-by: Massimo B. <massimo.b@gmx.net>
Link: https://lore.kernel.org/linux-btrfs/6ae34776e85912960a253a8327068a892998e685.camel@gmx.net/
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is a chance of racing for qgroup flushing which may lead to
deadlock:
Thread A | Thread B
(not holding trans handle) | (holding a trans handle)
--------------------------------+--------------------------------
__btrfs_qgroup_reserve_meta() | __btrfs_qgroup_reserve_meta()
|- try_flush_qgroup() | |- try_flush_qgroup()
|- QGROUP_FLUSHING bit set | |
| | |- test_and_set_bit()
| | |- wait_event()
|- btrfs_join_transaction() |
|- btrfs_commit_transaction()|
!!! DEAD LOCK !!!
Since thread A wants to commit transaction, but thread B is holding a
transaction handle, blocking the commit.
At the same time, thread B is waiting for thread A to finish its commit.
This is just a hot fix, and would lead to more EDQUOT when we're near
the qgroup limit.
The proper fix would be to make all metadata/data reservations happen
without holding a transaction handle.
CC: stable@vger.kernel.org # 5.9+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Item key collision is allowed for some item types, like dir item and
inode refs, but the overall item size is limited by the nodesize.
item size(ins_len) passed from btrfs_insert_empty_items to
btrfs_search_slot already contains size of btrfs_item.
When btrfs_search_slot reaches leaf, we'll see if we need to split leaf.
The check incorrectly reports that split leaf is required, because
it treats the space required by the newly inserted item as
btrfs_item + item data. But in item key collision case, only item data
is actually needed, the newly inserted item could merge into the existing
one. No new btrfs_item will be inserted.
And split_leaf return EOVERFLOW from following code:
if (extend && data_size + btrfs_item_size_nr(l, slot) +
sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(fs_info))
return -EOVERFLOW;
In most cases, when callers receive EOVERFLOW, they either return
this error or handle in different ways. For example, in normal dir item
creation the userspace will get errno EOVERFLOW; in inode ref case
INODE_EXTREF is used instead.
However, this is not the case for rename. To avoid the unrecoverable
situation in rename, btrfs_check_dir_item_collision is called in
early phase of rename. In this function, when item key collision is
detected leaf space is checked:
data_size = sizeof(*di) + name_len;
if (data_size + btrfs_item_size_nr(leaf, slot) +
sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(root->fs_info))
the sizeof(struct btrfs_item) + btrfs_item_size_nr(leaf, slot) here
refers to existing item size, the condition here correctly calculates
the needed size for collision case rather than the wrong case above.
The consequence of inconsistent condition check between
btrfs_check_dir_item_collision and btrfs_search_slot when item key
collision happens is that we might pass check here but fail
later at btrfs_search_slot. Rename fails and volume is forced readonly
[436149.586170] ------------[ cut here ]------------
[436149.586173] BTRFS: Transaction aborted (error -75)
[436149.586196] WARNING: CPU: 0 PID: 16733 at fs/btrfs/inode.c:9870 btrfs_rename2+0x1938/0x1b70 [btrfs]
[436149.586227] CPU: 0 PID: 16733 Comm: python Tainted: G D 4.18.0-rc5+ #1
[436149.586228] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[436149.586238] RIP: 0010:btrfs_rename2+0x1938/0x1b70 [btrfs]
[436149.586254] RSP: 0018:ffffa327043a7ce0 EFLAGS: 00010286
[436149.586255] RAX: 0000000000000000 RBX: ffff8d8a17d13340 RCX: 0000000000000006
[436149.586256] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff8d8a7fc164b0
[436149.586257] RBP: ffffa327043a7da0 R08: 0000000000000560 R09: 7265282064657472
[436149.586258] R10: 0000000000000000 R11: 6361736e61725420 R12: ffff8d8a0d4c8b08
[436149.586258] R13: ffff8d8a17d13340 R14: ffff8d8a33e0a540 R15: 00000000000001fe
[436149.586260] FS: 00007fa313933740(0000) GS:ffff8d8a7fc00000(0000) knlGS:0000000000000000
[436149.586261] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[436149.586262] CR2: 000055d8d9c9a720 CR3: 000000007aae0003 CR4: 00000000003606f0
[436149.586295] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[436149.586296] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[436149.586296] Call Trace:
[436149.586311] vfs_rename+0x383/0x920
[436149.586313] ? vfs_rename+0x383/0x920
[436149.586315] do_renameat2+0x4ca/0x590
[436149.586317] __x64_sys_rename+0x20/0x30
[436149.586324] do_syscall_64+0x5a/0x120
[436149.586330] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[436149.586332] RIP: 0033:0x7fa3133b1d37
[436149.586348] RSP: 002b:00007fffd3e43908 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
[436149.586349] RAX: ffffffffffffffda RBX: 00007fa3133b1d30 RCX: 00007fa3133b1d37
[436149.586350] RDX: 000055d8da06b5e0 RSI: 000055d8da225d60 RDI: 000055d8da2c4da0
[436149.586351] RBP: 000055d8da2252f0 R08: 00007fa313782000 R09: 00000000000177e0
[436149.586351] R10: 000055d8da010680 R11: 0000000000000246 R12: 00007fa313840b00
Thanks to Hans van Kranenburg for information about crc32 hash collision
tools, I was able to reproduce the dir item collision with following
python script.
https://github.com/wutzuchieh/misc_tools/blob/master/crc32_forge.py Run
it under a btrfs volume will trigger the abort transaction. It simply
creates files and rename them to forged names that leads to
hash collision.
There are two ways to fix this. One is to simply revert the patch
878f2d2cb3 ("Btrfs: fix max dir item size calculation") to make the
condition consistent although that patch is correct about the size.
The other way is to handle the leaf space check correctly when
collision happens. I prefer the second one since it correct leaf
space check in collision case. This fix will not account
sizeof(struct btrfs_item) when the item already exists.
There are two places where ins_len doesn't contain
sizeof(struct btrfs_item), however.
1. extent-tree.c: lookup_inline_extent_backref
2. file-item.c: btrfs_csum_file_blocks
to make the logic of btrfs_search_slot more clear, we add a flag
search_for_extension in btrfs_path.
This flag indicates that ins_len passed to btrfs_search_slot doesn't
contain sizeof(struct btrfs_item). When key exists, btrfs_search_slot
will use the actual size needed to calculate the required leaf space.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: ethanwu <ethanwu@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When cloning an inline extent there are cases where we can not just copy
the inline extent from the source range to the target range (e.g. when the
target range starts at an offset greater than zero). In such cases we copy
the inline extent's data into a page of the destination inode and then
dirty that page. However, after that we will need to start a transaction
for each processed extent and, if we are ever low on available metadata
space, we may need to flush existing delalloc for all dirty inodes in an
attempt to release metadata space - if that happens we may deadlock:
* the async reclaim task queued a delalloc work to flush delalloc for
the destination inode of the clone operation;
* the task executing that delalloc work gets blocked waiting for the
range with the dirty page to be unlocked, which is currently locked
by the task doing the clone operation;
* the async reclaim task blocks waiting for the delalloc work to complete;
* the cloning task is waiting on the waitqueue of its reservation ticket
while holding the range with the dirty page locked in the inode's
io_tree;
* if metadata space is not released by some other task (like delalloc for
some other inode completing for example), the clone task waits forever
and as a consequence the delalloc work and async reclaim tasks will hang
forever as well. Releasing more space on the other hand may require
starting a transaction, which will hang as well when trying to reserve
metadata space, resulting in a deadlock between all these tasks.
When this happens, traces like the following show up in dmesg/syslog:
[87452.323003] INFO: task kworker/u16:11:1810830 blocked for more than 120 seconds.
[87452.323644] Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1
[87452.324248] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[87452.324852] task:kworker/u16:11 state:D stack: 0 pid:1810830 ppid: 2 flags:0x00004000
[87452.325520] Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs]
[87452.326136] Call Trace:
[87452.326737] __schedule+0x5d1/0xcf0
[87452.327390] schedule+0x45/0xe0
[87452.328174] lock_extent_bits+0x1e6/0x2d0 [btrfs]
[87452.328894] ? finish_wait+0x90/0x90
[87452.329474] btrfs_invalidatepage+0x32c/0x390 [btrfs]
[87452.330133] ? __mod_memcg_state+0x8e/0x160
[87452.330738] __extent_writepage+0x2d4/0x400 [btrfs]
[87452.331405] extent_write_cache_pages+0x2b2/0x500 [btrfs]
[87452.332007] ? lock_release+0x20e/0x4c0
[87452.332557] ? trace_hardirqs_on+0x1b/0xf0
[87452.333127] extent_writepages+0x43/0x90 [btrfs]
[87452.333653] ? lock_acquire+0x1a3/0x490
[87452.334177] do_writepages+0x43/0xe0
[87452.334699] ? __filemap_fdatawrite_range+0xa4/0x100
[87452.335720] __filemap_fdatawrite_range+0xc5/0x100
[87452.336500] btrfs_run_delalloc_work+0x17/0x40 [btrfs]
[87452.337216] btrfs_work_helper+0xf1/0x600 [btrfs]
[87452.337838] process_one_work+0x24e/0x5e0
[87452.338437] worker_thread+0x50/0x3b0
[87452.339137] ? process_one_work+0x5e0/0x5e0
[87452.339884] kthread+0x153/0x170
[87452.340507] ? kthread_mod_delayed_work+0xc0/0xc0
[87452.341153] ret_from_fork+0x22/0x30
[87452.341806] INFO: task kworker/u16:1:2426217 blocked for more than 120 seconds.
[87452.342487] Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1
[87452.343274] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[87452.344049] task:kworker/u16:1 state:D stack: 0 pid:2426217 ppid: 2 flags:0x00004000
[87452.344974] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
[87452.345655] Call Trace:
[87452.346305] __schedule+0x5d1/0xcf0
[87452.346947] ? kvm_clock_read+0x14/0x30
[87452.347676] ? wait_for_completion+0x81/0x110
[87452.348389] schedule+0x45/0xe0
[87452.349077] schedule_timeout+0x30c/0x580
[87452.349718] ? _raw_spin_unlock_irqrestore+0x3c/0x60
[87452.350340] ? lock_acquire+0x1a3/0x490
[87452.351006] ? try_to_wake_up+0x7a/0xa20
[87452.351541] ? lock_release+0x20e/0x4c0
[87452.352040] ? lock_acquired+0x199/0x490
[87452.352517] ? wait_for_completion+0x81/0x110
[87452.353000] wait_for_completion+0xab/0x110
[87452.353490] start_delalloc_inodes+0x2af/0x390 [btrfs]
[87452.353973] btrfs_start_delalloc_roots+0x12d/0x250 [btrfs]
[87452.354455] flush_space+0x24f/0x660 [btrfs]
[87452.355063] btrfs_async_reclaim_metadata_space+0x1bb/0x480 [btrfs]
[87452.355565] process_one_work+0x24e/0x5e0
[87452.356024] worker_thread+0x20f/0x3b0
[87452.356487] ? process_one_work+0x5e0/0x5e0
[87452.356973] kthread+0x153/0x170
[87452.357434] ? kthread_mod_delayed_work+0xc0/0xc0
[87452.357880] ret_from_fork+0x22/0x30
(...)
< stack traces of several tasks waiting for the locks of the inodes of the
clone operation >
(...)
[92867.444138] RSP: 002b:00007ffc3371bbe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
[92867.444624] RAX: ffffffffffffffda RBX: 00007ffc3371bea0 RCX: 00007f61efe73f97
[92867.445116] RDX: 0000000000000000 RSI: 0000560fbd5d7a40 RDI: 0000560fbd5d8960
[92867.445595] RBP: 00007ffc3371beb0 R08: 0000000000000001 R09: 0000000000000003
[92867.446070] R10: 00007ffc3371b996 R11: 0000000000000246 R12: 0000000000000000
[92867.446820] R13: 000000000000001f R14: 00007ffc3371bea0 R15: 00007ffc3371beb0
[92867.447361] task:fsstress state:D stack: 0 pid:2508238 ppid:2508153 flags:0x00004000
[92867.447920] Call Trace:
[92867.448435] __schedule+0x5d1/0xcf0
[92867.448934] ? _raw_spin_unlock_irqrestore+0x3c/0x60
[92867.449423] schedule+0x45/0xe0
[92867.449916] __reserve_bytes+0x4a4/0xb10 [btrfs]
[92867.450576] ? finish_wait+0x90/0x90
[92867.451202] btrfs_reserve_metadata_bytes+0x29/0x190 [btrfs]
[92867.451815] btrfs_block_rsv_add+0x1f/0x50 [btrfs]
[92867.452412] start_transaction+0x2d1/0x760 [btrfs]
[92867.453216] clone_copy_inline_extent+0x333/0x490 [btrfs]
[92867.453848] ? lock_release+0x20e/0x4c0
[92867.454539] ? btrfs_search_slot+0x9a7/0xc30 [btrfs]
[92867.455218] btrfs_clone+0x569/0x7e0 [btrfs]
[92867.455952] btrfs_clone_files+0xf6/0x150 [btrfs]
[92867.456588] btrfs_remap_file_range+0x324/0x3d0 [btrfs]
[92867.457213] do_clone_file_range+0xd4/0x1f0
[92867.457828] vfs_clone_file_range+0x4d/0x230
[92867.458355] ? lock_release+0x20e/0x4c0
[92867.458890] ioctl_file_clone+0x8f/0xc0
[92867.459377] do_vfs_ioctl+0x342/0x750
[92867.459913] __x64_sys_ioctl+0x62/0xb0
[92867.460377] do_syscall_64+0x33/0x80
[92867.460842] entry_SYSCALL_64_after_hwframe+0x44/0xa9
(...)
< stack traces of more tasks blocked on metadata reservation like the clone
task above, because the async reclaim task has deadlocked >
(...)
Another thing to notice is that the worker task that is deadlocked when
trying to flush the destination inode of the clone operation is at
btrfs_invalidatepage(). This is simply because the clone operation has a
destination offset greater than the i_size and we only update the i_size
of the destination file after cloning an extent (just like we do in the
buffered write path).
Since the async reclaim path uses btrfs_start_delalloc_roots() to trigger
the flushing of delalloc for all inodes that have delalloc, add a runtime
flag to an inode to signal it should not be flushed, and for inodes with
that flag set, start_delalloc_inodes() will simply skip them. When the
cloning code needs to dirty a page to copy an inline extent, set that flag
on the inode and then clear it when the clone operation finishes.
This could be sporadically triggered with test case generic/269 from
fstests, which exercises many fsstress processes running in parallel with
several dd processes filling up the entire filesystem.
CC: stable@vger.kernel.org # 5.9+
Fixes: 05a5a7621c ("Btrfs: implement full reflink support for inline extents")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Not all firmware versions support allocating DMA memory in smaller blocks so
first try to allocate big block of DMA memory for QMI. If the allocation fails,
let firmware request multiple blocks of DMA memory with smaller size.
This also fixes an unnecessary error message seen during ath11k probe on
QCA6390:
ath11k_pci 0000:06:00.0: Respond mem req failed, result: 1, err: 0
ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-22
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1608127593-15192-1-git-send-email-kvalo@codeaurora.org
Commit 45c5775460 ("usb: ohci-omap: Fix descriptor conversion") tried to
fix all issues related to ohci-omap descriptor conversion, but a wrong
patch was applied, and one needed change to the OSK board file is still
missing. Fix that.
Fixes: 45c5775460 ("usb: ohci-omap: Fix descriptor conversion")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
[aaro.koskinen@iki.fi: rebased and updated the changelog]
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Tony Lindgren <tony@atomide.com>
This reverts
commit f1f028ff89 ("DTS: ARM: gta04: introduce legacy spi-cs-high to make display work again")
which had to be intruduced after
commit 6953c57ab1 ("gpio: of: Handle SPI chipselect legacy bindings")
broke the GTA04 display. This contradicted the data sheet but was the only
way to get it as an spi client operational again.
The panel data sheet defines the chip-select to be active low.
Now, with the arrival of
commit 766c6b63aa ("spi: fix client driver breakages when using GPIO descriptors")
the logic of interaction between spi-cs-high and the gpio descriptor flags
has been changed a second time, making the display broken again. So we have
to remove the original fix which in retrospect was a workaround of a bug in
the spi subsystem and not a feature of the panel or bug in the device tree.
With this fix the device tree is back in sync with the data sheet and
spi subsystem code.
Fixes: 766c6b63aa ("spi: fix client driver breakages when using GPIO descriptors")
CC: stable@vger.kernel.org
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Fix a possible memory leak when a bind of an AF_XDP socket fails. When
the fill and completion rings are created, they are tied to the
socket. But when the buffer pool is later created at bind time, the
ownership of these two rings are transferred to the buffer pool as
they might be shared between sockets (and the buffer pool cannot be
created until we know what we are binding to). So, before the buffer
pool is created, these two rings are cleaned up with the socket, and
after they have been transferred they are cleaned up together with
the buffer pool.
The problem is that ownership was transferred before it was absolutely
certain that the buffer pool could be created and initialized
correctly and when one of these errors occurred, the fill and
completion rings did neither belong to the socket nor the pool and
where therefore leaked. Solve this by moving the ownership transfer
to the point where the buffer pool has been completely set up and
there is no way it can fail.
Fixes: 7361f9c3d7 ("xsk: Move fill and completion rings to buffer pool")
Reported-by: syzbot+cfa88ddd0655afa88763@syzkaller.appspotmail.com
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20201214085127.3960-1-magnus.karlsson@gmail.com
When removing VFs for PF added to bridge there was
an error I40E_AQ_RC_EINVAL. It was caused by not properly
resetting and reinitializing PF when adding/removing VFs.
Changed how reset is performed when adding/removing VFs
to properly reinitialize PFs VSI.
Fixes: fc60861e9b ("i40e: start up in VEPA mode by default")
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
htable_bits() can call jhash_size(32) and trigger shift-out-of-bounds
UBSAN: shift-out-of-bounds in net/netfilter/ipset/ip_set_hash_gen.h:151:6
shift exponent 32 is too large for 32-bit type 'unsigned int'
CPU: 0 PID: 8498 Comm: syz-executor519
Not tainted 5.10.0-rc7-next-20201208-syzkaller #0
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:120
ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
__ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
htable_bits net/netfilter/ipset/ip_set_hash_gen.h:151 [inline]
hash_mac_create.cold+0x58/0x9b net/netfilter/ipset/ip_set_hash_gen.h:1524
ip_set_create+0x610/0x1380 net/netfilter/ipset/ip_set_core.c:1115
nfnetlink_rcv_msg+0xecc/0x1180 net/netfilter/nfnetlink.c:252
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
nfnetlink_rcv+0x1ac/0x420 net/netfilter/nfnetlink.c:600
netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
netlink_sendmsg+0x907/0xe40 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:672
____sys_sendmsg+0x6e8/0x810 net/socket.c:2345
___sys_sendmsg+0xf3/0x170 net/socket.c:2399
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2432
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This patch replaces htable_bits() by simple fls(hashsize - 1) call:
it alone returns valid nbits both for round and non-round hashsizes.
It is normal to set any nbits here because it is validated inside
following htable_size() call which returns 0 for nbits>31.
Fixes: 1feab10d7e6d("netfilter: ipset: Unified hash type generation")
Reported-by: syzbot+d66bfadebca46cf61a2b@syzkaller.appspotmail.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
currently mtype_resize() can cause oops
t = ip_set_alloc(htable_size(htable_bits));
if (!t) {
ret = -ENOMEM;
goto out;
}
t->hregion = ip_set_alloc(ahash_sizeof_regions(htable_bits));
Increased htable_bits can force htable_size() to return 0.
In own turn ip_set_alloc(0) returns not 0 but ZERO_SIZE_PTR,
so follwoing access to t->hregion should trigger an OOPS.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This fixes the dereference to fetch the RCU pointer when holding
the appropriate xtables lock.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: cc00bcaa58 ("netfilter: x_tables: Switch synchronization to RCU")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When the axg-tdm-interface was introduced, the backend DAI was marked as an
endpoint when DPCM was walking the DAPM graph to find a its BE.
It is no longer the case since this
commit 8dd26dff00 ("ASoC: dapm: Fix handling of custom_stop_condition on DAPM graph walks")
Because of this, when DPCM finds a BE it does everything it needs on the
DAIs but it won't power up the widgets between the FE and the BE if there
is no actual endpoint after the BE.
On meson-axg HWs, the loopback is a special DAI of the tdm-interface BE.
It is only linked to the dummy codec since there no actual HW after it.
>From the DAPM perspective, the DAI has no endpoint. Because of this, the TDM
decoder, which is a widget between the FE and BE is not powered up.
>From the user perspective, everything seems fine but no data is produced.
Connecting the Loopback DAI to a dummy DAPM endpoint solves the problem.
Fixes: 8dd26dff00 ("ASoC: dapm: Fix handling of custom_stop_condition on DAPM graph walks")
Cc: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20201217150812.3247405-1-jbrunet@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
This patch set is to add support for playback recover after hard suspend and resume.
It includes:
1. Reverting part of previous commit, which is for handling registers invalid state
after hard suspend.
2. Adding pm ops in component driver and do regcache sync.
Changes Since v1 and v2:
-- Subject lines changed
Changes Since v3:
-- Patch is splitted into 2 patches
Changes Since v4:
-- Subject lines changed
Changes Since v5:
-- Removed redundant initialization of map variable in lpass-platform.c
Srinivasa Rao Mandadapu (2):
ASoC: qcom: Fix incorrect volatile registers
ASoC: qcom: Add support for playback recover after resume
sound/soc/qcom/lpass-cpu.c | 20 ++---------------
sound/soc/qcom/lpass-platform.c | 50 ++++++++++++++++++++++++++++-------------
2 files changed, 37 insertions(+), 33 deletions(-)
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
MI2S and DMA control registers are not volatile, so remove these from volatile registers list.
Registers reset state check by reading non volatile registers makes no use,
so remove error check from cpu and platform trigger callbacks.
Initialized map variable two times in lpass platform trigger API,
so remove redundant initialization.
Fixes commit b182496822 ("ASoC: qcom: Fix enabling BCLK and LRCLK in LPAIF invalid state")
Signed-off-by: V Sujith Kumar Reddy <vsujithk@codeaurora.org>
Signed-off-by: Srinivasa Rao Mandadapu <srivasam@codeaurora.org>
Link: https://lore.kernel.org/r/1608192514-29695-2-git-send-email-srivasam@codeaurora.org
Signed-off-by: Mark Brown <broonie@kernel.org>
The "if (!ret)" condition is inverted and it should be "if (ret)". It means
that we return success when we had intended to return an error code. This also
caused a spurious warning even when the suspend was successful:
[ 297.186612] ath11k_pci 0000:06:00.0: failed to suspend hif: 0
Fixes: d1b0c33850 ("ath11k: implement suspend for QCA6390 PCI devices")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/X9nF17L2/EKOSbn/@mwanda
For QCA6390, bss peer must be created before vdev is to start. This
change is to start vdev if a bss peer is created. Otherwise, ath11k
delays to start vdev.
This fixes an issue in a case where HT/VHT/HE settings change between
authentication and association, e.g., due to the user space request
to disable HT.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20201211051358.9191-1-cjhuang@codeaurora.org
This reverts commit 1a0e1943d8.
Commit b3c6a59975 ("block: Fix a lockdep complaint triggered by
request queue flushing") has been reverted and commit fb01a2932e has
been introduced in its place. Consequently, it is now safe to
reinstate the megaraid_sas tagset changes that led to boot problems in
5.10.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If Makefile cannot find any of the vmlinux's in its VMLINUX_BTF_PATHS list,
it tries to run btftool incorrectly, with VMLINUX_BTF unset:
bpftool btf dump file $(VMLINUX_BTF) format c
Such that the keyword 'format' is misinterpreted as the path to vmlinux.
The resulting build error message is fairly cryptic:
GEN vmlinux.h
Error: failed to load BTF from format: No such file or directory
This patch makes the failure reason clearer by yielding this instead:
Makefile:...: *** Cannot find a vmlinux for VMLINUX_BTF at any of
"{paths}". Stop.
Fixes: acbd06206b ("selftests/bpf: Add vmlinux.h selftest exercising tracing of syscalls")
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201215182011.15755-1-kamal@canonical.com
The pf8x00 driver supports three devices, the DT compatible strings
and I2C IDs should enumerate these specifically rather than using a
wildcard so that we don't collide with anything incompatible in the
same ID range in the future and so that we can handle any software
visible differences between the variants we find.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20201215132024.13356-1-broonie@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
The intention of the err_expr cleanup path is to iterate over the
allocated expr_array objects and free them, starting from i - 1 and
working down to the start of the array. Currently the loop counter
is being incremented instead of decremented and also the index i is
being used instead of k, repeatedly destroying the same expr_array
element. Fix this by decrementing k and using k as the index into
expr_array.
Addresses-Coverity: ("Infinite loop")
Fixes: 8cfd9b0f85 ("netfilter: nftables: generalize set expressions support")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Hi,
My TI address is going to bounce after Friday (18.12.2020), switch my email
address to my private one for now.
Regards,
Peter
---
Peter Ujfalusi (2):
MAINTAINERS: Update email address for TI ASoC and twl4030 codec
drivers
ASoC: dt-bindings: ti,j721e: Update maintainer and author information
.../devicetree/bindings/sound/ti,j721e-cpb-audio.yaml | 4 +++-
.../devicetree/bindings/sound/ti,j721e-cpb-ivi-audio.yaml | 4 +++-
MAINTAINERS | 6 +++---
3 files changed, 9 insertions(+), 5 deletions(-)
--
Peter
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
A widget's "dirty" list_head, much like its "list" list_head, eventually
chains back to a list_head on the snd_soc_card itself. This means that
the list can stick around even after the widget (or all widgets) have
been freed. Currently, however, widgets that are in the dirty list when
freed remain there, corrupting the entire list and leading to memory
errors and undefined behavior when the list is next accessed or
modified.
I encountered this issue when a component failed to probe relatively
late in snd_soc_bind_card(), causing it to bail out and call
soc_cleanup_card_resources(), which eventually called
snd_soc_dapm_free() with widgets that were still dirty from when they'd
been added.
Fixes: db432b414e ("ASoC: Do DAPM power checks only for widgets changed since last run")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Hebb <tommyhebb@gmail.com>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://lore.kernel.org/r/f8b5f031d50122bf1a9bfc9cae046badf4a7a31a.1607822410.git.tommyhebb@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
PowerPC testing encountered boot failures due to RCU Tasks not being
fully initialized until core_initcall() time. This commit therefore
initializes RCU Tasks (along with Rude RCU and RCU Tasks Trace) just
before early_initcall() time, thus allowing waiting on RCU Tasks grace
periods from early_initcall() handlers.
Link: https://lore.kernel.org/rcu/87eekfh80a.fsf@dja-thinkpad.axtens.net/
Fixes: 36dadef23f ("kprobes: Init kprobes in early_initcall")
Tested-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Few regression fixes for omaps for v5.10
Few fixes for regression issues recently noticed by folks testing
the current kernel:
- We need to disable AES for n950 as it's not accessible because of the
secure mode configuration and kernel fails to boot
- On gta04 wlan probe exposed a bug for BUS_NOTIFY_BIND_DRIVER that has
been around for a long time
- Droid bionic exposed an issue where we configure an invalid range on
the PMIC that adds boot time warnings
Obviously these fixes can be merged whenever suitable.
* tag 'omap-for-v5.10/fixes-rc6-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: omap2: pmic-cpcap: fix maximum voltage to be consistent with defaults on xt875
ARM: OMAP2+: omap_device: fix idling of devices during probe
ARM: dts: OMAP3: disable AES on N950/N9
Link: https://lore.kernel.org/r/pull-1607674932-973902@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
xt875 comes up with a iva voltage of 1375000 and android runs at this too. fix
maximum voltage to be consistent with this.
Signed-off-by: Carl Philipp Klemm <philipp@uvos.xyz>
Signed-off-by: Tony Lindgren <tony@atomide.com>
GTA04 uses that for controlling the td028ttec1 panel. So
for easier testing/bisecting it is useful to have it
enabled in the defconfig.
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
Signed-off-by: Tony Lindgren <tony@atomide.com>
On the GTA04A5 od->_driver_status was not set to BUS_NOTIFY_BIND_DRIVER
during probe of the second mmc used for wifi. Therefore
omap_device_late_idle idled the device during probing causing oopses when
accessing the registers.
It was not set because od->_state was set to OMAP_DEVICE_STATE_IDLE
in the notifier callback. Therefore set od->_driver_status also in that
case.
This came apparent after commit 21b2cec61c ("mmc: Set
PROBE_PREFER_ASYNCHRONOUS for drivers that existed in v4.4") causing this
oops:
omap_hsmmc 480b4000.mmc: omap_device_late_idle: enabled but no driver. Idling
8<--- cut here ---
Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa0b402c
...
(omap_hsmmc_set_bus_width) from [<c07996bc>] (omap_hsmmc_set_ios+0x11c/0x258)
(omap_hsmmc_set_ios) from [<c077b2b0>] (mmc_power_up.part.8+0x3c/0xd0)
(mmc_power_up.part.8) from [<c077c14c>] (mmc_start_host+0x88/0x9c)
(mmc_start_host) from [<c077d284>] (mmc_add_host+0x58/0x84)
(mmc_add_host) from [<c0799190>] (omap_hsmmc_probe+0x5fc/0x8c0)
(omap_hsmmc_probe) from [<c0666728>] (platform_drv_probe+0x48/0x98)
(platform_drv_probe) from [<c066457c>] (really_probe+0x1dc/0x3b4)
Fixes: 04abaf07f6 ("ARM: OMAP2+: omap_device: Sync omap_device and pm_runtime after probe defer")
Fixes: 21b2cec61c ("mmc: Set PROBE_PREFER_ASYNCHRONOUS for drivers that existed in v4.4")
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
[tony@atomide.com: left out extra parens, trimmed description stack trace]
Signed-off-by: Tony Lindgren <tony@atomide.com>
Revert commit cebc04ba9a ("add CONFIG_ENABLE_MUST_CHECK").
A lot of warn_unused_result warnings existed in 2006, but until now
they have been fixed thanks to people doing allmodconfig tests.
Our goal is to always enable __must_check where appropriate, so this
CONFIG option is no longer needed.
I see a lot of defconfig (arch/*/configs/*_defconfig) files having:
# CONFIG_ENABLE_MUST_CHECK is not set
I did not touch them for now since it would be a big churn. If arch
maintainers want to clean them up, please go ahead.
While I was here, I also moved __must_check to compiler_attributes.h
from compiler_types.h
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
[Moved addition in compiler_attributes.h to keep it sorted]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
The top Makefile defines READELF as the readelf in the cross-toolchains.
Use it rather than the host readelf.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Adding vmlinux.* to extra-y has no point because we expect they are
built on demand while building uImage.*
Add them to 'targets' is enough to include the corresponding .cmd file.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
If you run 'make uImage uImage.gz' with the parallel option, uImage.gz
will be created by two threads simultaneously.
This is because arch/arc/Makefile does not specify the dependency
between uImage and uImage.gz. Hence, GNU Make assumes they can be
built in parallel. One thread descends into arch/arc/boot/ to create
uImage, and another to create uImage.gz.
Please notice the same log is displayed twice in the following steps:
$ export CROSS_COMPILE=<your-arc-compiler-prefix>
$ make -s ARCH=arc defconfig
$ make -j$(nproc) ARCH=arc uImage uImage.gz
[ snip ]
LD vmlinux
SORTTAB vmlinux
SYSMAP System.map
OBJCOPY arch/arc/boot/vmlinux.bin
OBJCOPY arch/arc/boot/vmlinux.bin
GZIP arch/arc/boot/vmlinux.bin.gz
GZIP arch/arc/boot/vmlinux.bin.gz
UIMAGE arch/arc/boot/uImage.gz
UIMAGE arch/arc/boot/uImage.gz
Image Name: Linux-5.10.0-rc4-00003-g62f23044
Created: Sun Nov 22 02:52:26 2020
Image Type: ARC Linux Kernel Image (gzip compressed)
Data Size: 2109376 Bytes = 2059.94 KiB = 2.01 MiB
Load Address: 80000000
Entry Point: 80004000
Image arch/arc/boot/uImage is ready
Image Name: Linux-5.10.0-rc4-00003-g62f23044
Created: Sun Nov 22 02:52:26 2020
Image Type: ARC Linux Kernel Image (gzip compressed)
Data Size: 2815455 Bytes = 2749.47 KiB = 2.69 MiB
Load Address: 80000000
Entry Point: 80004000
This is a race between the two threads trying to write to the same file
arch/arc/boot/uImage.gz. This is a potential problem that can generate
a broken file.
I fixed a similar problem for ARM by commit 3939f33450 ("ARM: 8418/1:
add boot image dependencies to not generate invalid images").
I highly recommend to avoid such build rules that cause a race condition.
Move the uImage rule to arch/arc/Makefile.
Another strangeness is that arch/arc/boot/Makefile compares the
timestamps between $(obj)/uImage and $(obj)/uImage.*:
$(obj)/uImage: $(obj)/uImage.$(suffix-y)
@ln -sf $(notdir $<) $@
@echo ' Image $@ is ready'
This does not work as expected since $(obj)/uImage is a symlink.
The symlink should be created in a phony target rule.
I used $(kecho) instead of echo to suppress the message
'Image arch/arc/boot/uImage is ready' when the -s option is given.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
The top-level boot_targets (uImage and uImage.*) should be phony
targets. They just let Kbuild descend into arch/arc/boot/ and create
files there.
If a file exists in the top directory with the same name, the boot
image will not be created.
You can confirm it by the following steps:
$ export CROSS_COMPILE=<your-arc-compiler-prefix>
$ make -s ARCH=arc defconfig all # vmlinux will be built
$ touch uImage.gz
$ make ARCH=arc uImage.gz
CALL scripts/atomic/check-atomics.sh
CALL scripts/checksyscalls.sh
CHK include/generated/compile.h
# arch/arc/boot/uImage.gz is not created
Specify the targets as PHONY to fix this.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
arch/arc/boot/Makefile supports uImage.lzma, but you cannot do
'make uImage.lzma' because the corresponding target is missing
in arch/arc/Makefile. Add it.
I also changed the assignment operator '+=' to ':=' since this is the
only place where we expect this variable to be set.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
The deb-pkg builds for ARCH=arc fail.
$ export CROSS_COMPILE=<your-arc-compiler-prefix>
$ make -s ARCH=arc defconfig
$ make ARCH=arc bindeb-pkg
SORTTAB vmlinux
SYSMAP System.map
MODPOST Module.symvers
make KERNELRELEASE=5.10.0-rc4 ARCH=arc KBUILD_BUILD_VERSION=2 -f ./Makefile intdeb-pkg
sh ./scripts/package/builddeb
cp: cannot stat 'arch/arc/boot/bootpImage': No such file or directory
make[4]: *** [scripts/Makefile.package:87: intdeb-pkg] Error 1
make[3]: *** [Makefile:1527: intdeb-pkg] Error 2
make[2]: *** [debian/rules:13: binary-arch] Error 2
dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2
make[1]: *** [scripts/Makefile.package:83: bindeb-pkg] Error 2
make: *** [Makefile:1527: bindeb-pkg] Error 2
The reason is obvious; arch/arc/Makefile sets $(boot)/bootpImage as
the default image, but there is no rule to build it.
Remove the meaningless KBUILD_IMAGE assignment so it will fallback
to the default vmlinux. With this change, you can build the deb package.
I removed the 'bootpImage' target as well. At best, it provides
'make bootpImage' as an alias of 'make vmlinux', but I do not see
much sense in doing so.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
AES needs to be disabled on Nokia N950/N9 as well (HS devices), otherwise
kernel fails to boot.
Fixes: c312f06631 ("ARM: dts: omap3: Migrate AES from hwmods to sysc-omap2")
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Commit 785b5bb41b ("PM: AVS: Drop the avs directory and the
corresponding Kconfig") moved AVS code to SOC-specific folders, and
removed corresponding Kconfig from drivers/power, leaving original
POWER_AVS config option enabled in omap2plus_defconfig file.
Remove the option, which has no references in the tree anymore.
Fixes: 785b5bb41b ("PM: AVS: Drop the avs directory and the corresponding Kconfig")
Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Nishanth Menon <nm@ti.com>
Cc: Nishanth Menon <nm@ti.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2020-12-01 09:29:20 +02:00
1642 changed files with 19379 additions and 9654 deletions
Q: Should I request it via stable@vger.kernel.org like the references in
the kernel's Documentation/process/stable-kernel-rules.rst file say?
A: No, not for networking. Check the stable queues as per above first
I see a network patch and I think it should be backported to stable. Should I request it via stable@vger.kernel.org like the references in the kernel's Documentation/process/stable-kernel-rules.rst file say?
Q: Should I add a Cc: stable@vger.kernel.org like the references in the
kernel's Documentation/ directory say?
A: No. See above answer. In short, if you think it really belongs in
I have created a network patch and I think it should be backported to stable. Should I add a Cc: stable@vger.kernel.org like the references in the kernel's Documentation/ directory say?
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.