Pull locking fixes from Thomas Gleixner:
"Two more places which invoke tracing from RCU disabled regions in the
idle path.
Similar to the entry path the low level idle functions have to be
non-instrumentable"
* tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intel_idle: Fix intel_idle() vs tracing
sched/idle: Fix arch_cpu_idle() vs tracing
Pull irq fixes from Thomas Gleixner:
"Two fixes for irqchip drivers:
- Save and restore the GICV3 ITS state unconditionally on
suspend/resume to handle firmware which fails to do so.
- Use the correct index into the fwspec parameters to read the irq
trigger type in the EXIU chip driver"
* tag 'irq-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Unconditionally save/restore the ITS state on suspend
irqchip/exiu: Fix the index of fwspec for IRQ type
Pull EFI fixes from Borislav Petkov:
"More EFI fixes forwarded from Ard Biesheuvel:
- revert efivarfs kmemleak fix again - it was a false positive
- make CONFIG_EFI_EARLYCON depend on CONFIG_EFI explicitly so it does
not pull in other dependencies unnecessarily if CONFIG_EFI is not
set
- defer attempts to load SSDT overrides from EFI vars until after the
efivar layer is up"
* tag 'efi-urgent-for-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: EFI_EARLYCON should depend on EFI
efivarfs: revert "fix memory leak in efivarfs_create()"
efi/efivars: Set generic ops before loading SSDT
Pull x86 fixes from Borislav Petkov:
"A couple of urgent fixes which accumulated this last week:
- Two resctrl fixes to prevent refcount leaks when manipulating the
resctrl fs (Xiaochen Shen)
- Correct prctl(PR_GET_SPECULATION_CTRL) reporting (Anand K Mistry)
- A fix to not lose already seen MCE severity which determines
whether the machine can recover (Gabriele Paoloni)"
* tag 'x86_urgent_for_v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Do not overwrite no_way_out if mce_end() fails
x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb
x86/resctrl: Add necessary kernfs_put() calls to prevent refcount leak
x86/resctrl: Remove superfluous kernfs_get() calls to prevent refcount leak
Pull RISC-V fixes from Palmer Dabbelt:
"I've collected a handful of fixes over the past few weeks:
- A fix to un-break the build-id argument to the vDSO build, which is
necessary for the LLVM linker.
- A fix to initialize the jump label subsystem, without which it (and
all the stuff that uses it) doesn't actually function.
- A fix to include <asm/barrier.h> from <vdso/processor.h>, without
which some drivers won't compile"
* tag 'riscv-for-linus-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: fix barrier() use in <vdso/processor.h>
RISC-V: Add missing jump label initialization
riscv: Explicitly specify the build id style in vDSO Makefile again
Pull perf tool fixes from Arnaldo Carvalho de Melo:
- Fix die_entrypc() when DW_AT_ranges DWARF attribute not available
- Cope with broken DWARF (missing DW_AT_declaration) generated by some
recent gcc versions
- Do not generate CGROUP metadata events when not asked to in 'perf
record'
- Use proper CPU for shadow stats in 'perf stat'
- Update copy of libbpf's hashmap.c, silencing tools/perf build warning
- Fix return value in 'perf diff'
* tag 'perf-tools-fixes-for-v5.10-2020-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf probe: Change function definition check due to broken DWARF
perf probe: Fix to die_entrypc() returns error correctly
perf stat: Use proper cpu for shadow stats
perf record: Synthesize cgroup events only if needed
perf diff: Fix error return value in __cmd_diff()
perf tools: Update copy of libbpf's hashmap.c
Pull USB / PHY driver fixes from Greg KH:
"Here are a few small USB and PHY driver fixes for 5.10-rc6. They
include:
- small PHY driver fixes to resolve reported issues
- USB quirks added for "broken" devices
- typec fixes for reported problems
- USB gadget fixes for small issues
Full details are in the shortlog, nothing major in here and all have
been in linux-next with no reported issues"
* tag 'usb-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: stusb160x: fix power-opmode property with typec-power-opmode
USB: core: Change %pK for __user pointers to %px
USB: core: Fix regression in Hercules audio card
usb: gadget: Fix memleak in gadgetfs_fill_super
usb: gadget: f_midi: Fix memleak in f_midi_alloc
USB: quirks: Add USB_QUIRK_DISCONNECT_SUSPEND quirk for Lenovo A630Z TIO built-in usb-audio card
usb: typec: qcom-pmic-typec: fix builtin build errors
phy: mediatek: fix spelling mistake in Kconfig "veriosn" -> "version"
phy: qualcomm: Fix 28 nm Hi-Speed USB PHY OF dependency
phy: qualcomm: usb: Fix SuperSpeed PHY OF dependency
phy: intel: PHY_INTEL_KEEMBAY_EMMC should depend on ARCH_KEEMBAY
usb: cdns3: gadget: calculate TD_SIZE based on TD
usb: cdns3: gadget: initialize link_trb as NULL
phy: cpcap-usb: Use IRQF_ONESHOT
phy: qcom-qmp: Initialize another pointer to NULL
phy: tegra: xusb: Fix dangling pointer on probe failure
phy: usb: Fix incorrect clearing of tca_drv_sel bit in SETUP reg for 7211
Pull char/misc driver fixes from Greg KH:
"Here are some small misc driver fixes for 5.10-rc6. They include:
- interconnect fixes for reported problems
- habanalabs bugfix for found issue when doing the switch fallthrough
patches
- MAINTAINERS file update for coresight reviewers/maintainers
All have been in linux-next with no reported issues"
* tag 'char-misc-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
MAINTAINERS: Adding help for coresight subsystem
habanalabs/gaudi: fix missing code in ECC handling
interconnect: fix memory trashing in of_count_icc_providers()
interconnect: qcom: qcs404: Remove GPU and display RPM IDs
interconnect: qcom: msm8916: Remove rpm-ids from non-RPM nodes
interconnect: qcom: msm8974: Don't boost the NoC rate during boot
interconnect: qcom: msm8974: Prevent integer overflow in rate
Pull asm-generic fix from Arnd Bergmann:
"Add correct MAX_POSSIBLE_PHYSMEM_BITS setting to asm-generic.
This is a single bugfix for a bug that Stefan Agner found on 32-bit
Arm, but that exists on several other architectures"
* tag 'asm-generic-fixes-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed
Pull ARM SoC fixes from Arnd Bergmann:
"Another set of patches for devicetree files and Arm SoC specific
drivers:
- A fix for OP-TEE shared memory on non-SMP systems
- multiple code fixes for the OMAP platform, including one regression
for the CPSW network driver and a few runtime warning fixes
- Some DT patches for the Rockchip RK3399 platform, in particular
fixing the MMC device ordering that recently became
nondeterministic with async probe.
- Multiple DT fixes for the Tegra platform, including a regression
fix for suspend/resume on TX2
- A regression fix for a user-triggered fault in the NXP dpio driver
- A regression fix for a bug caused by an earlier bug fix in the
xilinx firmware driver
- Two more DTC warning fixes
- Sylvain Lemieux steps down as maintainer for the NXP LPC32xx
platform"
* tag 'arm-soc-fixes-v5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (24 commits)
arm64: tegra: Fix Tegra234 VDK node names
arm64: tegra: Wrong AON HSP reg property size
arm64: tegra: Fix USB_VBUS_EN0 regulator on Jetson TX1
arm64: tegra: Correct the UART for Jetson Xavier NX
arm64: tegra: Disable the ACONNECT for Jetson TX2
optee: add writeback to valid memory type
firmware: xilinx: Use hash-table for api feature check
firmware: xilinx: Fix SD DLL node reset issue
soc: fsl: dpio: Get the cpumask through cpumask_of(cpu)
ARM: dts: dra76x: m_can: fix order of clocks
bus: ti-sysc: suppress err msg for timers used as clockevent/source
MAINTAINERS: Remove myself as LPC32xx maintainers
arm64: dts: qcom: clear the warnings caused by empty dma-ranges
arm64: dts: broadcom: clear the warnings caused by empty dma-ranges
ARM: dts: am437x-l4: fix compatible for cpsw switch dt node
arm64: dts: rockchip: Reorder LED triggers from mmc devices on rk3399-roc-pc.
arm64: dts: rockchip: Assign a fixed index to mmc devices on rk3399 boards.
arm64: dts: rockchip: Remove system-power-controller from pmic on Odroid Go Advance
arm64: dts: rockchip: fix NanoPi R2S GMAC clock name
ARM: OMAP2+: Manage MPU state properly for omap_enter_idle_coupled()
...
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.10-rc6, including fixes from the WiFi driver,
and CAN subtrees.
Current release - regressions:
- gro_cells: reduce number of synchronize_net() calls
- ch_ktls: release a lock before jumping to an error path
Current release - always broken:
- tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header
Previous release - regressions:
- net/tls: fix missing received data after fast remote close
- vsock/virtio: discard packets only when socket is really closed
- sock: set sk_err to ee_errno on dequeue from errq
- cxgb4: fix the panic caused by non smac rewrite
Previous release - always broken:
- tcp: fix corner cases around setting ECN with BPF selection of
congestion control
- tcp: fix race condition when creating child sockets from syncookies
on loopback interface
- usbnet: ipheth: fix connectivity with iOS 14
- tun: honor IOCB_NOWAIT flag
- net/packet: fix packet receive on L3 devices without visible hard
header
- devlink: Make sure devlink instance and port are in same net
namespace
- net: openvswitch: fix TTL decrement action netlink message format
- bonding: wait for sysfs kobject destruction before freeing struct
slave
- net: stmmac: fix upstream patch applied to the wrong context
- bnxt_en: fix return value and unwind in probe error paths
Misc:
- devlink: add extra layer of categorization to the reload stats uAPI
before it's released"
* tag 'net-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits)
sock: set sk_err to ee_errno on dequeue from errq
mptcp: fix NULL ptr dereference on bad MPJ
net: openvswitch: fix TTL decrement action netlink message format
can: af_can: can_rx_unregister(): remove WARN() statement from list operation sanity check
can: m_can: m_can_dev_setup(): add support for bosch mcan version 3.3.0
can: m_can: fix nominal bitiming tseg2 min for version >= 3.1
can: m_can: m_can_open(): remove IRQF_TRIGGER_FALLING from request_threaded_irq()'s flags
can: mcp251xfd: mcp251xfd_probe(): bail out if no IRQ was given
can: gs_usb: fix endianess problem with candleLight firmware
ch_ktls: lock is not freed
net/tls: Protect from calling tls_dev_del for TLS RX twice
devlink: Make sure devlink instance and port are in same net namespace
devlink: Hold rtnl lock while reading netdev attributes
ptp: clockmatrix: bug fix for idtcm_strverscmp
enetc: Let the hardware auto-advance the taprio base-time of 0
gro_cells: reduce number of synchronize_net() calls
net: stmmac: fix incorrect merge of patch upstream
ipv6: addrlabel: fix possible memory leak in ip6addrlbl_net_init
Documentation: netdev-FAQ: suggest how to post co-dependent series
ibmvnic: enhance resetting status check during module exit
...
Pull SCSI fixes from James Bottomley:
"Three small fixes in the UFS driver: two are for power management
issues and the third is to fix a slew of problem in the sysfs code"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: Fix race between shutdown and runtime resume flow
scsi: ufs: Make sure clk scaling happens only when HBA is runtime ACTIVE
scsi: ufs: Fix unexpected values from ufshcd_read_desc_param()
Pull io_uring fixes from Jens Axboe:
- Out of bounds fix for the cq size cap from earlier this release (Joseph)
- iov_iter type check fix (Pavel)
- Files grab + cancelation fix (Pavel)
* tag 'io_uring-5.10-2020-11-27' of git://git.kernel.dk/linux-block:
io_uring: fix files grab/cancel race
io_uring: fix ITER_BVEC check
io_uring: fix shift-out-of-bounds when round up cq size
Pull block fix from Jens Axboe:
"Just a single fix, for a crash in the keyslot manager"
* tag 'block-5.10-2020-11-27' of git://git.kernel.dk/linux-block:
block/keyslot-manager: prevent crash when num_slots=1
Pull btrfs fixes from David Sterba:
"A few fixes for various warnings that accumulated over past two weeks:
- tree-checker: add missing return values for some errors
- lockdep fixes
- when reading qgroup config and starting quota rescan
- reverse order of quota ioctl lock and VFS freeze lock
- avoid accessing potentially stale fs info during device scan,
reported by syzbot
- add scope NOFS protection around qgroup relation changes
- check for running transaction before flushing qgroups
- fix tracking of new delalloc ranges for some cases"
* tag 'for-5.10-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix lockdep splat when enabling and disabling qgroups
btrfs: do nofs allocations when adding and removing qgroup relations
btrfs: fix lockdep splat when reading qgroup config on mount
btrfs: tree-checker: add missing returns after data_ref alignment checks
btrfs: don't access possibly stale fs_info data for printing duplicate device
btrfs: tree-checker: add missing return after error in root_item
btrfs: qgroup: don't commit transaction when we already hold the handle
btrfs: fix missing delalloc new bit for new delalloc ranges
Pull rdma fixes from Jason Gunthorpe:
"Two security issues and several small bug fixes. Things seem to have
stabilized for this release here.
Summary:
- Significant out of bounds access security issue in i40iw
- Fix misuse of mmu notifiers in hfi1
- Several errors in the register map/usage in hns
- Missing error returns in mthca"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/hns: Bugfix for memory window mtpt configuration
RDMA/hns: Fix retry_cnt and rnr_cnt when querying QP
RDMA/hns: Fix wrong field of SRQ number the device supports
IB/hfi1: Ensure correct mm is used at all times
RDMA/i40iw: Address an mmap handler exploit in i40iw
IB/mthca: fix return value of error branch in mthca_init_cq()
Pull mtd fixes from Miquel Raynal:
"Because of a recent change in the core, NAND controller drivers
initializing the ECC engine too early in the probe path are broken.
Drivers should wait for the NAND device to be discovered and its
memory layout known before doing any ECC related initialization, so
instead of reverting the faulty change which is actually moving in the
right direction, let's fix the drivers directly: socrates, sharpsl,
r852, plat_nand, pasemi, tmio, txx9ndfmc, orion, mpc5121, lpc32xx_slc,
lpc32xx_mlc, fsmc, diskonchip, davinci, cs553x, au1550, ams-delta,
xway and gpio"
* tag 'mtd/fixes-for-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: socrates: Move the ECC initialization to ->attach_chip()
mtd: rawnand: sharpsl: Move the ECC initialization to ->attach_chip()
mtd: rawnand: r852: Move the ECC initialization to ->attach_chip()
mtd: rawnand: plat_nand: Move the ECC initialization to ->attach_chip()
mtd: rawnand: pasemi: Move the ECC initialization to ->attach_chip()
mtd: rawnand: tmio: Move the ECC initialization to ->attach_chip()
mtd: rawnand: txx9ndfmc: Move the ECC initialization to ->attach_chip()
mtd: rawnand: orion: Move the ECC initialization to ->attach_chip()
mtd: rawnand: mpc5121: Move the ECC initialization to ->attach_chip()
mtd: rawnand: lpc32xx_slc: Move the ECC initialization to ->attach_chip()
mtd: rawnand: lpc32xx_mlc: Move the ECC initialization to ->attach_chip()
mtd: rawnand: fsmc: Move the ECC initialization to ->attach_chip()
mtd: rawnand: diskonchip: Move the ECC initialization to ->attach_chip()
mtd: rawnand: davinci: Move the ECC initialization to ->attach_chip()
mtd: rawnand: cs553x: Move the ECC initialization to ->attach_chip()
mtd: rawnand: au1550: Move the ECC initialization to ->attach_chip()
mtd: rawnand: ams-delta: Move the ECC initialization to ->attach_chip()
mtd: rawnand: xway: Move the ECC initialization to ->attach_chip()
mtd: rawnand: gpio: Move the ECC initialization to ->attach_chip()
Pull spi fixes from Mark Brown:
"A few fixes for v5.10, one for the core which fixes some potential
races for controllers with multiple chip selects when configuration of
the chip select for one client device races with the addition and
initial setup of an additional client"
* tag 'spi-fix-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: dw: Fix spi registration for controllers overriding CS
spi: imx: fix the unbalanced spi runtime pm management
spi: spi-nxp-fspi: fix fspi panic by unexpected interrupts
spi: Take the SPI IO-mutex in the spi_setup() method
Pull virtual digital TV driver fixes from Mauro Carvalho Chehab:
"A series of fixes for the new virtual digital TV driver (vidtv), which
is meant to help doing tests with the digital TV core and media
userspace apps and libraries.
They cover a series of issues I found on it, together with a few new
things in order to make it easier to detect problems at the DVB core"
* tag 'media/v5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (36 commits)
media: vidtv.rst: add kernel-doc markups
media: vidtv.rst: update vidtv documentation
media: vidtv: simplify EIT write function
media: vidtv: simplify NIT write function
media: vidtv: simplify SDT write function
media: vidtv: cleanup PMT write table function
media: vidtv: cleanup PAT write function
media: vidtv: cleanup PSI table header function
media: vidtv: cleanup PSI descriptor write function
media: vidtv: simplify the crc writing logic
media: vidtv: simplify PSI write function
media: vidtv: add date to the current event
media: vidtv: fix service_id at SDT table
media: vidtv: fix service type
media: vidtv: add a PID entry for the NIT table
media: vidtv: properly fill EIT service_id
media: vidtv: fix the network ID range
media: vidtv: improve EIT data
media: vidtv: cleanup null packet initialization logic
media: vidtv: pre-initialize mux arrays
...
Pull drm fixes from Dave Airlie:
"Unfortunately this has a bit of thanksgiving stuffing in it, as it a
bit larger (at least the vc4 patches) than I like at this point in
time.
The main thing is it has a bunch of regressions fixes for reports in
the last couple of weeks, ast, nouveau and the amdgpu ttm init fix,
along with the usual selection of amdgpu and i915 fixes.
The vc4 fixes are a few but they are fixes and the nastiest one is a
fix for when you have a 2.4Ghz Wifi and a HDMI signal with a clock in
that range and there isn't enough shielding and interference happen
between the two, the fix adjusts the mode clock to try and avoid the
wifi channels in that case.
Hopefully you can merge this between turkey slices, and next week
should be quieter.
ast:
- LUT loading regression fix
nouveau:
- relocations regression fix
amdgpu:
- ttm init oops fix
- Runtime pm fix
- SI UVD suspend/resume fix
- HDCP fix for headless cards
- Sienna Cichlid golden register update
i915:
- Fix Perf/OA workaround register corruption (Lionel)
- Correct a comment statement in GVT (Yan)
- Fix GT enable/disable iterrupts, including a race condition that
prevented GPU to go idle (Chris)
- Free stale request on destroying the virtual engine (Chris)
exynos:
- config dependency fix
mediatek:
- unused var removal
- horizonal front/back porch formula fix
vc4:
- wifi and hdmi interference fix
- mode rejection fixes
- use after free fix
- cleanup some code"
* tag 'drm-fixes-2020-11-27-1' of git://anongit.freedesktop.org/drm/drm: (28 commits)
drm/nouveau: fix relocations applying logic and a double-free
drm/ast: Reload gamma LUT after changing primary plane's color format
drm/amdgpu: Fix size calculation when init onchip memory
drm/amdgpu: update golden setting for sienna_cichlid
drm/amd/display: Avoid HDCP initialization in devices without output
drm/i915/gt: Free stale request on destroying the virtual engine
drm/i915/gt: Don't cancel the interrupt shadow too early
drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock
drm/amdgpu: fix a page fault
drm/amdgpu: fix SI UVD firmware validate resume fail
drm/amd/amdgpu: fix null pointer in runtime pm
drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission
drm/i915/gvt: correct a false comment of flag F_UNALIGN
drm/i915/perf: workaround register corruption in OATAILPTR
drm/vc4: kms: Don't disable the muxing of an active CRTC
drm/vc4: kms: Store the unassigned channel list in the state
drm/exynos: depend on COMMON_CLK to fix compile tests
drm/mediatek: dsi: Modify horizontal front/back porch byte formula
drm/vc4: hdmi: Disable Wifi Frequencies
dt-bindings: display: Add a property to deal with WiFi coexistence
...
Marc Kleine-Budde says:
====================
pull-request: can 2020-11-27
The first patch is by me and target the gs_usb driver and fixes the endianess
problem with candleLight firmware.
Another patch by me for the mcp251xfd driver add sanity checking to bail out if
no IRQ is configured.
The next three patches target the m_can driver. A patch by me removes the
hardcoded IRQF_TRIGGER_FALLING from the request_threaded_irq() as this clashes
with the trigger level specified in the DT. Further a patch by me fixes the
nominal bitiming tseg2 min value for modern m_can cores. Pankaj Sharma's patch
add support for cores version 3.3.x.
The last patch by Oliver Hartkopp is for af_can and converts a WARN() into a
pr_warn(), which is triggered by the syzkaller. It was able to create a
situation where the closing of a socket runs simultaneously to the notifier
call chain for removing the CAN network device in use.
* tag 'linux-can-fixes-for-5.10-20201127' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: af_can: can_rx_unregister(): remove WARN() statement from list operation sanity check
can: m_can: m_can_dev_setup(): add support for bosch mcan version 3.3.0
can: m_can: fix nominal bitiming tseg2 min for version >= 3.1
can: m_can: m_can_open(): remove IRQF_TRIGGER_FALLING from request_threaded_irq()'s flags
can: mcp251xfd: mcp251xfd_probe(): bail out if no IRQ was given
can: gs_usb: fix endianess problem with candleLight firmware
====================
Link: https://lore.kernel.org/r/20201127100301.512603-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull x86 platform driver fixes from Hans de Goede:
- thinkpad_acpi fixes: two bug-fixes and three model specific quirks
- fixes for misc other drivers: two bug-fixes and three model specific
quirks
* tag 'platform-drivers-x86-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: touchscreen_dmi: Add info for the Irbis TW118 tablet
platform/x86: touchscreen_dmi: Add info for the Predia Basic tablet
platform/x86: intel-vbtn: Support for tablet mode on HP Pavilion 13 x360 PC
platform/x86: toshiba_acpi: Fix the wrong variable assignment
platform/x86: acer-wmi: add automatic keyboard background light toggle key as KEY_LIGHTS_TOGGLE
platform/x86: thinkpad_acpi: Whitelist P15 firmware for dual fan control
platform/x86: thinkpad_acpi: Send tablet mode switch at wakeup time
platform/x86: thinkpad_acpi: Add BAT1 is primary battery quirk for Thinkpad Yoga 11e 4th gen
platform/x86: thinkpad_acpi: Do not report SW_TABLET_MODE on Yoga 11e
platform/x86: thinkpad_acpi: add P1 gen3 second fan support
When setting sk_err, set it to ee_errno, not ee_origin.
Commit f5f99309fa ("sock: do not set sk_err in
sock_dequeue_err_skb") disabled updating sk_err on errq dequeue,
which is correct for most error types (origins):
- sk->sk_err = err;
Commit 38b257938a ("sock: reset sk_err when the error queue is
empty") reenabled the behavior for IMCP origins, which do require it:
+ if (icmp_next)
+ sk->sk_err = SKB_EXT_ERR(skb_next)->ee.ee_origin;
But read from ee_errno.
Fixes: 38b257938a ("sock: reset sk_err when the error queue is empty")
Reported-by: Ayush Ranjan <ayushranjan@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Link: https://lore.kernel.org/r/20201126151220.2819322-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If an msk listener receives an MPJ carrying an invalid token, it
will zero the request socket msk entry. That should later
cause fallback and subflow reset - as per RFC - at
subflow_syn_recv_sock() time due to failing hmac validation.
Since commit 4cf8b7e48a ("subflow: introduce and use
mptcp_can_accept_new_subflow()"), we unconditionally dereference
- in mptcp_can_accept_new_subflow - the subflow request msk
before performing hmac validation. In the above scenario we
hit a NULL ptr dereference.
Address the issue doing the hmac validation earlier.
Fixes: 4cf8b7e48a ("subflow: introduce and use mptcp_can_accept_new_subflow()")
Tested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/03b2cfa3ac80d8fc18272edc6442a9ddf0b1e34e.1606400227.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix alignment of the new HYP sections
- Fix GICR_TYPER access from userspace
S390:
- do not reset the global diag318 data for per-cpu reset
- do not mark memory as protected too early
- fix for destroy page ultravisor call
x86:
- fix for SEV debugging
- fix incorrect return code
- fix for 'noapic' with PIC in userspace and LAPIC in kernel
- fix for 5-level paging"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT
KVM: x86: Fix split-irqchip vs interrupt injection window request
KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extint
MAINTAINERS: Update email address for Sean Christopherson
MAINTAINERS: add uv.c also to KVM/s390
s390/uv: handle destroy page legacy interface
KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace
KVM: SVM: fix error return code in svm_create_vcpu()
KVM: SVM: Fix offset computation bug in __sev_dbg_decrypt().
KVM: arm64: Correctly align nVHE percpu data
KVM: s390: remove diag318 reset code
KVM: s390: pv: Mark mm as protected after the set secure parameters and improve cleanup
Currently, the openvswitch module is not accepting the correctly formated
netlink message for the TTL decrement action. For both setting and getting
the dec_ttl action, the actions should be nested in the
OVS_DEC_TTL_ATTR_ACTION attribute as mentioned in the openvswitch.h uapi.
When the original patch was sent, it was tested with a private OVS userspace
implementation. This implementation was unfortunately not upstreamed and
reviewed, hence an erroneous version of this patch was sent out.
Leaving the patch as-is would cause problems as the kernel module could
interpret additional attributes as actions and vice-versa, due to the
actions not being encapsulated/nested within the actual attribute, but
being concatinated after it.
Fixes: 744676e777 ("openvswitch: add TTL decrement action")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/160622121495.27296.888010441924340582.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 5.10:
- regression fix for a boot failure on some 32-bit machines.
- fix for host crashes in the KVM system reset handling.
- fix for a possible oops in the KVM XIVE interrupt handling on
Power9.
- fix for host crashes triggerable via the KVM emulated MMIO handling
when running HPT guests.
- a couple of small build fixes.
Thanks to Andreas Schwab, Cédric Le Goater, Christophe Leroy, Erhard
Furtner, Greg Kurz, Greg Kurz, Németh Márton, Nicholas Piggin, Nick
Desaulniers, Serge Belyshev, and Stephen Rothwell"
* tag 'powerpc-5.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Fix allnoconfig build since uaccess flush
powerpc/64s/exception: KVM Fix for host DSI being taken in HPT guest MMU context
powerpc: Drop -me200 addition to build flags
KVM: PPC: Book3S HV: XIVE: Fix possible oops when accessing ESB page
powerpc/64s: Fix KVM system reset handling when CONFIG_PPC_PSERIES=y
powerpc/32s: Use relocation offset when setting early hash table
Pull arm64 fixes from Will Deacon:
"The main changes are relating to our handling of access/dirty bits,
where our low-level page-table helpers could lead to stale young
mappings and loss of the dirty bit in some cases (the latter has not
been observed in practice, but could happen when clearing "soft-dirty"
if we enabled that). These were posted as part of a larger series, but
the rest of that is less urgent and needs a v2 which I'll get to
shortly.
In other news, we've now got a set of fixes to resolve the
lockdep/tracing problems that have been plaguing us for a while, but
they're still a bit "fresh" and I plan to send them to you next week
after we've got some more confidence in them (although initial CI
results look good).
Summary:
- Fix kerneldoc warnings generated by ACPI IORT code
- Fix pte_accessible() so that access flag is ignored
- Fix missing header #include
- Fix loss of software dirty bit across pte_wrprotect() when HW DBM
is enabled"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: pgtable: Ensure dirty bit is preserved across pte_wrprotect()
arm64: pgtable: Fix pte_accessible()
ACPI/IORT: Fix doc warnings in iort.c
arm64/fpsimd: add <asm/insn.h> to <asm/kprobes.h> to fix fpsimd build
Pull iommu fixes from Will Deacon:
"Here's another round of IOMMU fixes for -rc6 consisting mainly of a
bunch of independent driver fixes. Thomas agreed for me to take the
x86 'tboot' fix here, as it fixes a regression introduced by a vt-d
change.
- Fix intel iommu driver when running on devices without VCCAP_REG
- Fix swiotlb and "iommu=pt" interaction under TXT (tboot)
- Fix missing return value check during device probe()
- Fix probe ordering for Qualcomm SMMU implementation
- Ensure page-sized mappings are used for AMD IOMMU buffers with SNP
RMP"
* tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
iommu/vt-d: Don't read VCCAP register unless it exists
x86/tboot: Don't disable swiotlb when iommu is forced on
iommu: Check return of __iommu_attach_device()
arm-smmu-qcom: Ensure the qcom_scm driver has finished probing
iommu/amd: Enforce 4k mapping for certain IOMMU data structures
Pull printk fixes from Petr Mladek:
- do not lose trailing newline in pr_cont() calls
- two trivial fixes for a dead store and a config description
* tag 'printk-for-5.10-rc6-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: finalize records with trailing newlines
printk: remove unneeded dead-store assignment
init/Kconfig: Fix CPU number in LOG_CPU_MAX_BUF_SHIFT description
Pull writeback fix from Jan Kara:
"A fix of possible missing string termination in writeback tracepoints"
* tag 'writeback_for_v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
trace: fix potenial dangerous pointer
Since some gcc generates a broken DWARF which lacks DW_AT_declaration
attribute from the subprogram DIE of function prototype.
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97060)
So, in addition to the DW_AT_declaration check, we also check the
subprogram DIE has DW_AT_inline or actual entry pc.
Committer testing:
# cat /etc/fedora-release
Fedora release 33 (Thirty Three)
#
Before:
# perf test vfs_getname
78: Use vfs_getname probe to get syscall args filenames : FAILED!
79: Check open filename arg using perf trace + vfs_getname : FAILED!
81: Add vfs_getname probe to get syscall args filenames : FAILED!
#
After:
# perf test vfs_getname
78: Use vfs_getname probe to get syscall args filenames : Ok
79: Check open filename arg using perf trace + vfs_getname : Ok
81: Add vfs_getname probe to get syscall args filenames : Ok
#
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Link: http://lore.kernel.org/lkml/160645613571.2824037.7441351537890235895.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently perf stat shows some metrics (like IPC) for defined events.
But when no aggregation mode is used (-A option), it shows incorrect
values since it used a value from a different cpu.
Before:
$ perf stat -aA -e cycles,instructions sleep 1
Performance counter stats for 'system wide':
CPU0 116,057,380 cycles
CPU1 86,084,722 cycles
CPU2 99,423,125 cycles
CPU3 98,272,994 cycles
CPU0 53,369,217 instructions # 0.46 insn per cycle
CPU1 33,378,058 instructions # 0.29 insn per cycle
CPU2 58,150,086 instructions # 0.50 insn per cycle
CPU3 40,029,703 instructions # 0.34 insn per cycle
1.001816971 seconds time elapsed
So the IPC for CPU1 should be 0.38 (= 33,378,058 / 86,084,722)
but it was 0.29 (= 33,378,058 / 116,057,380) and so on.
After:
$ perf stat -aA -e cycles,instructions sleep 1
Performance counter stats for 'system wide':
CPU0 109,621,384 cycles
CPU1 159,026,454 cycles
CPU2 99,460,366 cycles
CPU3 124,144,142 cycles
CPU0 44,396,706 instructions # 0.41 insn per cycle
CPU1 120,195,425 instructions # 0.76 insn per cycle
CPU2 44,763,978 instructions # 0.45 insn per cycle
CPU3 69,049,079 instructions # 0.56 insn per cycle
1.001910444 seconds time elapsed
Fixes: 44d49a6002 ("perf stat: Support metrics in --per-core/socket mode")
Reported-by: Sam Xi <xyzsam@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201127041404.390276-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It didn't check the tool->cgroup_events bit which is set when the
--all-cgroups option is given. Without it, samples will not have cgroup
info so no reason to synthesize.
We can check the PERF_RECORD_CGROUP records after running perf record
*WITHOUT* the --all-cgroups option:
Before:
$ perf report -D | grep CGROUP
0 0 0x8430 [0x38]: PERF_RECORD_CGROUP cgroup: 1 /
CGROUP events: 1
CGROUP events: 0
CGROUP events: 0
After:
$ perf report -D | grep CGROUP
CGROUP events: 0
CGROUP events: 0
CGROUP events: 0
Committer testing:
Before:
# perf record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.208 MB perf.data (10003 samples) ]
# perf report -D | grep "CGROUP events"
CGROUP events: 146
CGROUP events: 0
CGROUP events: 0
#
After:
# perf record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.208 MB perf.data (10448 samples) ]
# perf report -D | grep "CGROUP events"
CGROUP events: 0
CGROUP events: 0
CGROUP events: 0
#
With all-cgroups:
# perf record --all-cgroups -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.374 MB perf.data (11526 samples) ]
# perf report -D | grep "CGROUP events"
CGROUP events: 146
CGROUP events: 0
CGROUP events: 0
#
Fixes: 8fb4b67939 ("perf record: Add --all-cgroups option")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201127054356.405481-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
7a078d2d18 ("libbpf, hashmap: Fix undefined behavior in hash_bits")
That don't entail any changes in tools/perf.
This addresses this perf build warning:
Warning: Kernel ABI header at 'tools/perf/util/hashmap.h' differs from latest version at 'tools/lib/bpf/hashmap.h'
diff -u tools/perf/util/hashmap.h tools/lib/bpf/hashmap.h
Not a kernel ABI, its just that this uses the mechanism in place for
checking kernel ABI files drift.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add writeback to valid OP-TEE shared memory types
Allows OP-TEE to work with ARMv7 based single CPU systems by allowing
writeback cache policy for shared memory.
* tag 'optee-valid-memory-type-for-v5.11' of git://git.linaro.org/people/jens.wiklander/linux-tee:
optee: add writeback to valid memory type
Link: https://lore.kernel.org/r/20201125120134.GA1642471@jade
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Currently, if mce_end() fails, no_way_out - the variable denoting
whether the machine can recover from this MCE - is determined by whether
the worst severity that was found across the MCA banks associated with
the current CPU, is of panic severity.
However, at this point no_way_out could have been already set by
mca_start() after looking at all severities of all CPUs that entered the
MCE handler. If mce_end() fails, check first if no_way_out is already
set and, if so, stick to it, otherwise use the local worst value.
[ bp: Massage. ]
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201127161819.3106432-2-gabriele.paoloni@intel.com
Commit 95fb5b0258 ("kvm: x86/mmu: Support MMIO in the TDP MMU") caused
the following WARNING on an Intel Ice Lake CPU:
get_mmio_spte: detect reserved bits on spte, addr 0xb80a0, dump hierarchy:
------ spte 0xb80a0 level 5.
------ spte 0xfcd210107 level 4.
------ spte 0x1004c40107 level 3.
------ spte 0x1004c41107 level 2.
------ spte 0x1db00000000b83b6 level 1.
WARNING: CPU: 109 PID: 10254 at arch/x86/kvm/mmu/mmu.c:3569 kvm_mmu_page_fault.cold.150+0x54/0x22f [kvm]
...
Call Trace:
? kvm_io_bus_get_first_dev+0x55/0x110 [kvm]
vcpu_enter_guest+0xaa1/0x16a0 [kvm]
? vmx_get_cs_db_l_bits+0x17/0x30 [kvm_intel]
? skip_emulated_instruction+0xaa/0x150 [kvm_intel]
kvm_arch_vcpu_ioctl_run+0xca/0x520 [kvm]
The guest triggering this crashes. Note, this happens with the traditional
MMU and EPT enabled, not with the newly introduced TDP MMU. Turns out,
there was a subtle change in the above mentioned commit. Previously,
walk_shadow_page_get_mmio_spte() was setting 'root' to 'iterator.level'
which is returned by shadow_walk_init() and this equals to
'vcpu->arch.mmu->shadow_root_level'. Now, get_mmio_spte() sets it to
'int root = vcpu->arch.mmu->root_level'.
The difference between 'root_level' and 'shadow_root_level' on CPUs
supporting 5-level page tables is that in some case we don't want to
use 5-level, in particular when 'cpuid_maxphyaddr(vcpu) <= 48'
kvm_mmu_get_tdp_level() returns '4'. In case upper layer is not used,
the corresponding SPTE will fail '__is_rsvd_bits_set()' check.
Revert to using 'shadow_root_level'.
Fixes: 95fb5b0258 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20201126110206.2118959-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_cpu_accept_dm_intr and kvm_vcpu_ready_for_interrupt_injection are
a hodge-podge of conditions, hacked together to get something that
more or less works. But what is actually needed is much simpler;
in both cases the fundamental question is, do we have a place to stash
an interrupt if userspace does KVM_INTERRUPT?
In userspace irqchip mode, that is !vcpu->arch.interrupt.injected.
Currently kvm_event_needs_reinjection(vcpu) covers it, but it is
unnecessarily restrictive.
In split irqchip mode it's a bit more complicated, we need to check
kvm_apic_accept_pic_intr(vcpu) (the IRQ window exit is basically an INTACK
cycle and thus requires ExtINTs not to be masked) as well as
!pending_userspace_extint(vcpu). However, there is no need to
check kvm_event_needs_reinjection(vcpu), since split irqchip keeps
pending ExtINT state separate from event injection state, and checking
kvm_cpu_has_interrupt(vcpu) is wrong too since ExtINT has higher
priority than APIC interrupts. In fact the latter fixes a bug:
when userspace requests an IRQ window vmexit, an interrupt in the
local APIC can cause kvm_cpu_has_interrupt() to be true and thus
kvm_vcpu_ready_for_interrupt_injection() to return false. When this
happens, vcpu_run does not exit to userspace but the interrupt window
vmexits keep occurring. The VM loops without any hope of making progress.
Once we try to fix these with something like
return kvm_arch_interrupt_allowed(vcpu) &&
- !kvm_cpu_has_interrupt(vcpu) &&
- !kvm_event_needs_reinjection(vcpu) &&
- kvm_cpu_accept_dm_intr(vcpu);
+ (!lapic_in_kernel(vcpu)
+ ? !vcpu->arch.interrupt.injected
+ : (kvm_apic_accept_pic_intr(vcpu)
+ && !pending_userspace_extint(v)));
we realize two things. First, thanks to the previous patch the complex
conditional can reuse !kvm_cpu_has_extint(vcpu). Second, the interrupt
window request in vcpu_enter_guest()
bool req_int_win =
dm_request_for_irq_injection(vcpu) &&
kvm_cpu_accept_dm_intr(vcpu);
should be kept in sync with kvm_vcpu_ready_for_interrupt_injection():
it is unnecessary to ask the processor for an interrupt window
if we would not be able to return to userspace. Therefore,
kvm_cpu_accept_dm_intr(vcpu) is basically !kvm_cpu_has_extint(vcpu)
ANDed with the existing check for masked ExtINT. It all makes sense:
- we can accept an interrupt from userspace if there is a place
to stash it (and, for irqchip split, ExtINTs are not masked).
Interrupts from userspace _can_ be accepted even if right now
EFLAGS.IF=0.
- in order to tell userspace we will inject its interrupt ("IRQ
window open" i.e. kvm_vcpu_ready_for_interrupt_injection), both
KVM and the vCPU need to be ready to accept the interrupt.
... and this is what the patch implements.
Reported-by: David Woodhouse <dwmw@amazon.co.uk>
Analyzed-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Nikos Tsironis <ntsironis@arrikto.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Centralize handling of interrupts from the userspace APIC
in kvm_cpu_has_extint and kvm_cpu_get_extint, since
userspace APIC interrupts are handled more or less the
same as ExtINTs are with split irqchip. This removes
duplicated code from kvm_cpu_has_injectable_intr and
kvm_cpu_has_interrupt, and makes the code more similar
between kvm_cpu_has_{extint,interrupt} on one side
and kvm_cpu_get_{extint,interrupt} on the other.
Cc: stable@vger.kernel.org
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes for omaps for various issues noticed during the -rc cycle:
- Earlier omap4 cpuidle fix was incomplete and needs to use a
configured idle state instead
- Fix am4 cpsw driver compatible to avoid invalid resource error
for the legacy driver
- Two kconfig fixes for genpd support that we added for for v5.10
for proper location of the option and adding missing option
- Fix ti-sysc reset status checking on enabling modules to ignore
quirky modules with reset status only usable when the quirk is
activated during reset. Also fix bogus resetdone warning for
cpsw and modules with no sysst register reset status bit
- Suppress a ti-sysc warning for timers reserved as system timers
- Fix the ordering of clocks for dra7 m_can
* tag 'omap-for-v5.10/fixes-rc5-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: dra76x: m_can: fix order of clocks
bus: ti-sysc: suppress err msg for timers used as clockevent/source
ARM: dts: am437x-l4: fix compatible for cpsw switch dt node
ARM: OMAP2+: Manage MPU state properly for omap_enter_idle_coupled()
bus: ti-sysc: Fix bogus resetdone warning on enable for cpsw
bus: ti-sysc: Fix reset status check for modules with quirks
ARM: OMAP2+: Fix missing select PM_GENERIC_DOMAINS_OF
ARM: OMAP2+: Fix location for select PM_GENERIC_DOMAINS
Link: https://lore.kernel.org/r/pull-1606460270-864284@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
At lest the revision 3.3.0 of the bosch m_can IP core specifies that valid
register values for "Nominal Time segment after sample point (NTSEG2)" are from
1 to 127. As the hardware uses a value of one more than the programmed value,
mean tseg2_min is 2.
This patch fixes the tseg2_min value accordingly.
Cc: Dan Murphy <dmurphy@ti.com>
Cc: Mario Huettel <mario.huettel@gmx.net>
Acked-by: Sriram Dash <sriram.dash@samsung.com>
Link: https://lore.kernel.org/r/20201124190751.3972238-1-mkl@pengutronix.de
Fixes: b03cfc5bb0 ("can: m_can: Enable M_CAN version dependent initialization")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The threaded IRQ handler is used for the tcan4x5x driver only. The IRQ pin of
the tcan4x5x controller is active low, so better not use IRQF_TRIGGER_FALLING
when requesting the IRQ. As this can result in missing interrupts.
Further, if the device tree specified the interrupt as "IRQ_TYPE_LEVEL_LOW",
unloading and reloading of the driver results in the following error during
ifup:
| irq: type mismatch, failed to map hwirq-31 for gpio@20a8000!
| tcan4x5x spi1.1: m_can device registered (irq=0, version=32)
| tcan4x5x spi1.1 can2: TCAN4X5X successfully initialized.
| tcan4x5x spi1.1 can2: failed to request interrupt
This patch fixes the problem by removing the IRQF_TRIGGER_FALLING from the
request_threaded_irq().
Fixes: f524f829b7 ("can: m_can: Create a m_can platform framework")
Cc: Dan Murphy <dmurphy@ti.com>
Cc: Sriram Dash <sriram.dash@samsung.com>
Cc: Pankaj Sharma <pankj.sharma@samsung.com>
Link: https://lore.kernel.org/r/20201127093548.509253-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
arm64: tegra: Device tree fixes for v5.10-rc6
This contains a couple of fixes to device trees. Among other things,
this restores suspend/resume on Jetson TX2 and makes USB OTG work on
Jetson TX1.
* tag 'tegra-for-5.10-arm64-dt-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
arm64: tegra: Fix Tegra234 VDK node names
arm64: tegra: Wrong AON HSP reg property size
arm64: tegra: Fix USB_VBUS_EN0 regulator on Jetson TX1
arm64: tegra: Correct the UART for Jetson Xavier NX
arm64: tegra: Disable the ACONNECT for Jetson TX2
Link: https://lore.kernel.org/r/20201125170306.1095734-1-thierry.reding@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
arm64: soc: ZynqMP SoC fixes for v5.10-rc6
- Fix SD dll reset issue by using proper macro
- Fix PM feature checking for Xilinx Versal SoC
* tag 'zynqmp-soc-fixes-for-v5.10-rc6' of https://github.com/Xilinx/linux-xlnx: (337 commits)
firmware: xilinx: Use hash-table for api feature check
firmware: xilinx: Fix SD DLL node reset issue
Linux 5.10-rc4
kvm: mmu: fix is_tdp_mmu_check when the TDP MMU is not in use
afs: Fix afs_write_end() when called with copied == 0 [ver #3]
ocfs2: initialize ip_next_orphan
panic: don't dump stack twice on warn
hugetlbfs: fix anon huge page migration race
mm: memcontrol: fix missing wakeup polling thread
kernel/watchdog: fix watchdog_allowed_mask not used warning
reboot: fix overflow parsing reboot cpu number
Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint"
compiler.h: fix barrier_data() on clang
mm/gup: use unpin_user_pages() in __gup_longterm_locked()
mm/slub: fix panic in slab_alloc_node()
mailmap: fix entry for Dmitry Baryshkov/Eremin-Solenikov
mm/vmscan: fix NR_ISOLATED_FILE corruption on 64-bit
mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate
mm/compaction: count pages and stop correctly during page isolation
drm/nouveau/kms/nv50-: Use atomic encoder callbacks everywhere
...
Link: https://lore.kernel.org/r/fd5ab967-f3cf-95fb-7947-5477ff85f97e@monstr.eu
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull power management fix from Rafael Wysocki:
"Fix a recently introduced build issue in the cpufreq SCMI driver
(Sudeep Holla)"
* tag 'pm-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: scmi: Fix build for !CONFIG_COMMON_CLK
0day reported one -22.7% regression for will-it-scale page_fault2
case [1] on a 4 sockets 144 CPU platform, and bisected to it to be
caused by Waiman's optimization (commit bd0b230fe1) of saving one
'struct page_counter' space for 'struct mem_cgroup'.
Initially we thought it was due to the cache alignment change introduced
by the patch, but further debug shows that it is due to some hot data
members ('vmstats_local', 'vmstats_percpu', 'vmstats') sit in 2 adjacent
cacheline (2N and 2N+1 cacheline), and when adjacent cache line prefetch
is enabled, it triggers an "extended level" of cache false sharing for
2 adjacent cache lines.
So exchange the 2 member blocks, while keeping mostly the original
cache alignment, which can restore and even enhance the performance,
and save 64 bytes of space for 'struct mem_cgroup' (from 2880 to 2816,
with 0day's default RHEL-8.3 kernel config)
[1]. https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/
Fixes: bd0b230fe1 ("mm/memcg: unify swap and memsw page counters")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When one task is in io_uring_cancel_files() and another is doing
io_prep_async_work() a race may happen. That's because after accounting
a request inflight in first call to io_grab_identity() it still may fail
and go to io_identity_cow(), which migh briefly keep dangling
work.identity and not only.
Grab files last, so io_prep_async_work() won't fail if it did get into
->inflight_list.
note: the bug shouldn't exist after making io_uring_cancel_files() not
poking into other tasks' requests.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The Thinkpad Yoga 11e 4th gen with the N3450 / Celeron CPU only has
one battery which is named BAT1 instead of the expected BAT0, add a
quirk for this. This fixes not being able to set the charging tresholds
on this model; and this alsoe fixes the following errors in dmesg:
ACPI: \_SB_.PCI0.LPCB.EC__.HKEY: BCTG evaluated but flagged as error
thinkpad_acpi: Error probing battery 2
battery: extension failed to load: ThinkPad Battery Extension
battery: extension unregistered: ThinkPad Battery Extension
Note that the added quirk is for the "R0K" BIOS versions which are
used on the Thinkpad Yoga 11e 4th gen's with a Celeron CPU, there
is a separate "R0L" BIOS for the i3/i5 based versions. This may also
need the same quirk, but if that really is necessary is unknown.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20201109103550.16265-1-hdegoede@redhat.com
The Yoga 11e series has 2 accelerometers described by a BOSC0200 ACPI node.
This setup relies on a Windows service which reads both accelerometers and
then calculates the angle between the 2 halves to determine laptop / tent /
tablet mode and then reports the calculated mode back to the EC by calling
special ACPI methods on the BOSC0200 node.
The bmc150 iio driver does not support this (it involves double
calculations requiring sqrt and arccos so this really needs to be done
in userspace), as a result of this on the Yoga 11e the thinkpad_acpi
code always reports SW_TABLET_MODE=0, starting with GNOME 3.38 reporting
SW_TABLET_MODE=0 causes GNOME to:
1. Not show the onscreen keyboard when a text-input field is focussed
with the touchscreen.
2. Disable accelerometer based auto display-rotation.
This makes sense when in laptop-mode but not when in tablet-mode. But
since for the Yoga 11e the thinkpad_acpi code always reports
SW_TABLET_MODE=0, GNOME does not know when the device is in tablet-mode.
Stop reporting the broken (always 0) SW_TABLET_MODE on Yoga 11e models
to fix this.
Note there are plans for userspace to support 360 degree hinges style
2-in-1s with 2 accelerometers and figure out the mode by itself, see:
https://gitlab.freedesktop.org/hadess/iio-sensor-proxy/-/issues/216
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20201106140130.46820-1-hdegoede@redhat.com
This patch add a check to the mcp251xfd_probe() function to bail out and give
the user a proper error message if no IRQ is specified. Otherwise the driver
will probe just fine but ifup will fail with a meaningless "RTNETLINK answers:
Invalid argument" error message.
Link: https://lore.kernel.org/r/20201123113522.3820052-1-mkl@pengutronix.de
Reported-by: Niels Petter <petter@ka-long.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The firmware on the original USB2CAN by Geschwister Schneider Technologie
Entwicklungs- und Vertriebs UG exchanges all data between the host and the
device in host byte order. This is done with the struct
gs_host_config::byte_order member, which is sent first to indicate the desired
byte order.
The widely used open source firmware candleLight doesn't support this feature
and exchanges the data in little endian byte order. This breaks if a device
with candleLight firmware is used on big endianess systems.
To fix this problem, all u32 (but not the struct gs_host_frame::echo_id, which
is a transparent cookie) are converted to __le32.
Cc: Maximilian Schneider <max@schneidersoft.net>
Cc: Hubert Denkmair <hubert@denkmair.de>
Reported-by: Michael Rausch <mr@netadair.de>
Link: https://lore.kernel.org/r/b58aace7-61f3-6df7-c6df-69fee2c66906@netadair.de
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Fixes: d08e973a77 ("can: gs_usb: Added support for the GS_USB CAN devices")
Link: https://lore.kernel.org/r/20201120103818.3386964-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Avoid initializing the structs multiple times and pass the
PAT struct as a pointer, instead of a var.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
- Pass struct header_write_args as a pointer, instead of
passing as a var;
- Initialize the psi_args struct only once.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
This function initializes the psi_args twice, and receives
a struct, instead of a pointer to a struct.
Clean it up.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cleanup the table_section_crc32_write_into() function
by initializing struct psi_write_args only once and by
passing the args as a pointer.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
The function vidtv_psi_ts_psi_write_into() initializes the
ts_header fields several times, and receives a struct
as argument, instead of using a pointer to struct.
Cleanup the function, in order to reduce its stack usage
and to avoid initializing the ts_header multiple times.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
The current event is using an undefined date. Instead, it
should be the timestamp when the EIT table was generated.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
The service_id there should be equal to the one used
on other tables, otherwise, EIT entries won't be valid.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
As the service currently broadcasts just audio, change the
service type to reflect that.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
On normal TS streams, the NIT table has its own entry at PAT,
but not at PMT.
While here, properly handle alloc problems when creating
PMT entries.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
The EIT header ID field should not contain the network ID, but,
instead, the service_id of the program described at EIT.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
As defined at ETSI TS 101 162, original network IDs up to 0xfebf
are reserved for registration at dvb.org.
Let's use, instead, an original network ID at the range
0xff00-0xffff, as this is for private temporary usage.
As the same value is also used for the network ID,
the range 0xff01-0xffff also fits better, as values
lower than that depend if the network is used for
satellite, terrestrial, cable of CI.
While here, move the TS ID to the bridge code, where it
is used, and change its value, as it was identical to
the value previously used by network ID. While we could
keep the same value, let's change it, just to make easier
to check for the new code while reading it with DVB tools
like dvbinspector.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Place some text at EIT data, and use ISO 8859-15 encoding for
the German letter "ü" (u mit umlat) letter.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Initialize the destination buffer/size and the initial
offset when creating the local var.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Instead of first zeroing all fields at the mux structs and
then filling, do some initialization for the const data
when they're created.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Right now, there's no need to access the length of some
tables. So, drop the unused functions.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Do some cleanups at the coding style of the driver:
- remove "inline" declarations;
- use reverse xmas-tree for local var declarations;
- Adjust some indent to avoid breaking 80-cols;
- Cleanup some comments.
No functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Minimize the number of data copies and initialization at
the code, passing them as pointers instead of duplicating
the data.
The only case where we're keeping the data copy is at
vidtv_pes_write_h(), as it needs a copy of the passed
arguments. On such case, we're being more explicit.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Initialize the fields of the arguments directly when
declaring it, and pass the args as a pointer, instead of
copying them.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
The sheet music used to generate the tones had a few
polyphonic notes. Due to that, its conversion to a
tones sequence had a few errors.
Also, due to a bug at the tone generator, it was missing
the pause at the initial compass.
Fix them.
While here, reduce the compass to 100bpm.
The music was converted from a Music XML file using
this small script:
<snip>
my $count = 0;
my $silent = 0;
my $note;
my $octave;
print "\t";
while (<>) {
$note = $1 if (m,\<step\>(.*)\</step\>,);
$octave = "_$1" if (m,\<octave\>(.*)\</octave\>,);
if (m,\<alter\>1\</alter\>,) {
$note .= "S";
$sharp = 1;
}
if (m,\<rest/\>,) {
$note = "SILENT";
$silent = 1;
}
if (m,\<duration\>(.*)\</duration\>,) {
printf "{ NOTE_${note}${octave}, %d},", $1 * 128 / 480;
$count++;
if ($silent || $count >= 3) {
print "\n\t";
$count = 0;
$silent = 0;
} else {
print " ";
print " " if (!$sharp);
}
$sharp = 0;
$note = "";
$octave = "";
};
};
print "\n";
</snip>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
The tone generator logic were repeating the song after the
first silent. There's also a wrong logic at the note
offset calculus, which may create some noise.
Fix it.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
While the original plan was to use the first movement of
the 5th Symphony, it was opted to use the Für Elise song,
instead.
Fix it.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
The Linux stack is too short. So, using recursive functions
is a very bad idea. Convert those into non-recursive ones.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Currently, there are not checks if something gets bad during
memory allocation: it will simply use NULL pointers and
crash.
Add error path at the logic which allocates memory for the
MPEG-TS generator code, propagating the errors up to the
vidtv_bridge. Now, if something wents bad, start_streaming
will return an error that userspace can detect:
ERROR DMX_SET_PES_FILTER failed (PID = 0x2000): 12 Cannot allocate memory
and the driver doesn't crash.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
- Place the includes on alphabetical order;
- get rid of asm/byteorder.h;
- add bug.h at vidtv_s302m.c, as it is needed by
inux/fixp-arith.h
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Some variables were only assigned once but were used in while
loops as if they had been updated at every iteration. Fix this.
Signed-off-by: Daniel W. S. Almeida <dwlsalmeida@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
A few fields used only by the tone generator in the s302m encoder
are stored in struct vidtv_encoder. Move them into
struct vidtv_s302m_ctx instead. While we are at it: fix a
checkpatch warning for long lines.
Signed-off-by: Daniel W. S. Almeida <dwlsalmeida@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
The code to append a descriptor to the end of a chain is repeated
throughout the psi generator code. Extract it into its own helper
function to avoid cluttering.
Signed-off-by: Daniel W. S. Almeida <dwlsalmeida@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Implement an Event Information Table (EIT) as per EN 300 468
5.2.4.
The EIT provides information in chronological order regarding
the events contained within each service.
For now only present event information is supported.
[mchehab+huawei@kernel.org: removed an extra blank line]
Signed-off-by: Daniel W. S. Almeida <dwlsalmeida@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Add a Network Information Table (NIT) as specified in ETSI EN 300 468.
This table conveys information relating to the physical organization of
the multiplexes carried via a given network and the characteristics of
the network itself.
It is conveyed in the output of vidtv as packets with TS PID of 0x0010
[mchehab+huawei@kernel.org: removed an extra blank line]
Signed-off-by: Daniel W. S. Almeida <dwlsalmeida@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
The same constant (0xffffffff) is used in three different functions.
Extract it into a #define to avoid repetition.
Signed-off-by: Daniel W. S. Almeida <dwlsalmeida@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
tls_device_offload_cleanup_rx doesn't clear tls_ctx->netdev after
calling tls_dev_del if TLX TX offload is also enabled. Clearing
tls_ctx->netdev gets postponed until tls_device_gc_task. It leaves a
time frame when tls_device_down may get called and call tls_dev_del for
RX one extra time, confusing the driver, which may lead to a crash.
This patch corrects this racy behavior by adding a flag to prevent
tls_device_down from calling tls_dev_del the second time.
Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20201125221810.69870-1-saeedm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Parav Pandit says:
====================
devlink port attribute fixes
This patchset contains 2 small fixes for devlink port attributes.
Patch summary:
Patch-1 synchronize the devlink port attribute reader
with net namespace change operation
Patch-2 Ensure to return devlink port's netdevice attributes
when netdev and devlink instance belong to same net namespace
====================
Link: https://lore.kernel.org/r/20201125091620.6781-1-parav@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When devlink reload operation is not used, netdev of an Ethernet port may
be present in different net namespace than the net namespace of the
devlink instance.
Ensure that both the devlink instance and devlink port netdev are located
in same net namespace.
Fixes: 070c63f20f ("net: devlink: allow to change namespaces during reload")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A netdevice of a devlink port can be moved to different net namespace
than its parent devlink instance.
This scenario occurs when devlink reload is not used.
When netdevice is undergoing migration to net namespace, its ifindex
and name may change.
In such use case, devlink port query may read stale netdev attributes.
Fix it by reading them under rtnl lock.
Fixes: bfcd3a4661 ("Introduce devlink infrastructure")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Two earlier bug fixes have created a security problem in the hfi1
driver. One fix aimed to solve an issue where current->mm was not valid
when closing the hfi1 cdev. It attempted to do this by saving a cached
value of the current->mm pointer at file open time. This is a problem if
another process with access to the FD calls in via write() or ioctl() to
pin pages via the hfi driver. The other fix tried to solve a use after
free by taking a reference on the mm.
To fix this correctly we use the existing cached value of the mm in the
mmu notifier. Now we can check in the insert, evict, etc. routines that
current->mm matched what the notifier was registered for. If not, then
don't allow access. The register of the mmu notifier will save the mm
pointer.
Since in do_exit() the exit_mm() is called before exit_files(), which
would call our close routine a reference is needed on the mm. We rely on
the mmgrab done by the registration of the notifier, whereas before it was
explicit. The mmu notifier deregistration happens when the user context is
torn down, the creation of which triggered the registration.
Also of note is we do not do any explicit work to protect the interval
tree notifier. It doesn't seem that this is going to be needed since we
aren't actually doing anything with current->mm. The interval tree
notifier stuff still has a FIXME noted from a previous commit that will be
addressed in a follow on patch.
Cc: <stable@vger.kernel.org>
Fixes: e0cf75deab ("IB/hfi1: Fix mm_struct use after free")
Fixes: 3faa3d9a30 ("IB/hfi1: Make use of mm consistent")
Link: https://lore.kernel.org/r/20201125210112.104301.51331.stgit@awfm-01.aw.intel.com
Suggested-by: Jann Horn <jannh@google.com>
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The tc-taprio base time indicates the beginning of the tc-taprio
schedule, which is cyclic by definition (where the length of the cycle
in nanoseconds is called the cycle time). The base time is a 64-bit PTP
time in the TAI domain.
Logically, the base-time should be a future time. But that imposes some
restrictions to user space, which has to retrieve the current PTP time
from the NIC first, then calculate a base time that will still be larger
than the base time by the time the kernel driver programs this value
into the hardware. Actually ensuring that the programmed base time is in
the future is still a problem even if the kernel alone deals with this.
Luckily, the enetc hardware already advances a base-time that is in the
past into a congruent time in the immediate future, according to the
same formula that can be found in the software implementation of taprio
(in taprio_get_start_time):
/* Schedule the start time for the beginning of the next
* cycle.
*/
n = div64_s64(ktime_sub_ns(now, base), cycle);
*start = ktime_add_ns(base, (n + 1) * cycle);
There's only one problem: the driver doesn't let the hardware do that.
It interferes with the base-time passed from user space, by special-casing
the situation when the base-time is zero, and replaces that with the
current PTP time. This changes the intended effective base-time of the
schedule, which will in the end have a different phase offset than if
the base-time of 0.000000000 was to be advanced by an integer multiple
of the cycle-time.
Fixes: 34c6adf197 ("enetc: Configure the Time-Aware Scheduler via tc-taprio offload")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20201124220259.3027991-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After cited commit, gro_cells_destroy() became damn slow
on hosts with a lot of cores.
This is because we have one additional synchronize_net() per cpu as
stated in the changelog.
gro_cells_init() is setting NAPI_STATE_NO_BUSY_POLL, and this was enough
to not have one synchronize_net() call per netif_napi_del()
We can factorize all the synchronize_net() to a single one,
right before freeing per-cpu memory.
Fixes: 5198d545db ("net: remove napi_hash_del() from driver-facing API")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201124203822.1360107-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When spectre_v2_user={seccomp,prctl},ibpb is specified on the command
line, IBPB is force-enabled and STIPB is conditionally-enabled (or not
available).
However, since
21998a3515 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.")
the spectre_v2_user_ibpb variable is set to SPECTRE_V2_USER_{PRCTL,SECCOMP}
instead of SPECTRE_V2_USER_STRICT, which is the actual behaviour.
Because the issuing of IBPB relies on the switch_mm_*_ibpb static
branches, the mitigations behave as expected.
Since
1978b3a53a ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP")
this discrepency caused the misreporting of IB speculation via prctl().
On CPUs with STIBP always-on and spectre_v2_user=seccomp,ibpb,
prctl(PR_GET_SPECULATION_CTRL) would return PR_SPEC_PRCTL |
PR_SPEC_ENABLE instead of PR_SPEC_DISABLE since both IBPB and STIPB are
always on. It also allowed prctl(PR_SET_SPECULATION_CTRL) to set the IB
speculation mode, even though the flag is ignored.
Similarly, for CPUs without SMT, prctl(PR_GET_SPECULATION_CTRL) should
also return PR_SPEC_DISABLE since IBPB is always on and STIBP is not
available.
[ bp: Massage commit message. ]
Fixes: 21998a3515 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.")
Fixes: 1978b3a53a ("x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP")
Signed-off-by: Anand K Mistry <amistry@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201110123349.1.Id0cbf996d2151f4c143c90f9028651a5b49a5908@changeid
Pull media fixes from Mauro Carvalho Chehab:
- a rand Kconfig fixup for mtk-vcodec
- a fix at h264 handling at cedrus codec driver
- some warning fixes when config PM is not enabled at marvell-ccic
- two fixes at venus codec driver: one related to codec profile and the
other one related to a bad error path which causes an OOPS on module
re-bind
* tag 'media/v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: venus: pm_helpers: Fix kernel module reload
media: venus: venc: Fix setting of profile and level
media: cedrus: h264: Fix check for presence of scaling matrix
media: media/platform/marvell-ccic: fix warnings when CONFIG_PM is not enabled
media: mtk-vcodec: fix build breakage when one of VPU or SCP is enabled
media: mtk-vcodec: move firmware implementations into their own files
riscv's <vdso/processor.h> uses barrier() so it should include
<asm/barrier.h>
Fixes this build error:
CC [M] drivers/net/ethernet/emulex/benet/be_main.o
In file included from ./include/vdso/processor.h:10,
from ./arch/riscv/include/asm/processor.h:11,
from ./include/linux/prefetch.h:15,
from drivers/net/ethernet/emulex/benet/be_main.c:14:
./arch/riscv/include/asm/vdso/processor.h: In function 'cpu_relax':
./arch/riscv/include/asm/vdso/processor.h:14:2: error: implicit declaration of function 'barrier' [-Werror=implicit-function-declaration]
14 | barrier();
This happens with a total of 5 networking drivers -- they all use
<linux/prefetch.h>.
rv64 allmodconfig now builds cleanly after this patch.
Fixes fallout from:
815f0ddb34 ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
Fixes: ad5d1122b8 ("riscv: use vDSO common flow to reduce the latency of the time-related functions")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
The jump_label_init() should be called from setup_arch() very
early for proper functioning of jump label support.
Fixes: ebc00dde8a ("riscv: Add jump-label implementation")
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Commit a968433723 ("kbuild: explicitly specify the build id style")
explicitly set the build ID style to SHA1. Commit c2c81bb2f6 ("RISC-V:
Fix the VDSO symbol generaton for binutils-2.35+") undid this change,
likely unintentionally.
Restore it so that the build ID style stays consistent across the tree
regardless of linker.
Fixes: c2c81bb2f6 ("RISC-V: Fix the VDSO symbol generaton for binutils-2.35+")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Bill Wendling <morbo@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
CONFIG_EFI_EARLYCON defaults to yes, and thus is enabled on systems that
do not support EFI, or do not have EFI support enabled, but do satisfy
the symbol's other dependencies.
While drivers/firmware/efi/ won't be entered during the build phase if
CONFIG_EFI=n, and drivers/firmware/efi/earlycon.c itself thus won't be
built, enabling EFI_EARLYCON does force-enable CONFIG_FONT_SUPPORT and
CONFIG_ARCH_USE_MEMREMAP_PROT, and CONFIG_FONT_8x16, which is
undesirable.
Fix this by making CONFIG_EFI_EARLYCON depend on CONFIG_EFI.
This reduces kernel size on headless systems by more than 4 KiB.
Fixes: 69c1f396f2 ("efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/20201124191646.3559757-1-geert@linux-m68k.org
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The memory leak addressed by commit fe5186cf12 is a false positive:
all allocations are recorded in a linked list, and freed when the
filesystem is unmounted. This leads to double frees, and as reported
by David, leads to crashes if SLUB is configured to self destruct when
double frees occur.
So drop the redundant kfree() again, and instead, mark the offending
pointer variable so the allocation is ignored by kmemleak.
Cc: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
Fixes: fe5186cf12 ("efivarfs: fix memory leak in efivarfs_create()")
Reported-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
i40iw_mmap manipulates the vma->vm_pgoff to differentiate a push page mmap
vs a doorbell mmap, and uses it to compute the pfn in remap_pfn_range
without any validation. This is vulnerable to an mmap exploit as described
in: https://lore.kernel.org/r/20201119093523.7588-1-zhudi21@huawei.com
The push feature is disabled in the driver currently and therefore no push
mmaps are issued from user-space. The feature does not work as expected in
the x722 product.
Remove the push module parameter and all VMA attribute manipulations for
this feature in i40iw_mmap. Update i40iw_mmap to only allow DB user
mmapings at offset = 0. Check vm_pgoff for zero and if the mmaps are bound
to a single page.
Cc: <stable@kernel.org>
Fixes: d374984179 ("i40iw: add files for iwarp interface")
Link: https://lore.kernel.org/r/20201125005616.1800-2-shiraz.saleem@intel.com
Reported-by: Di Zhu <zhudi21@huawei.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
When the device-tree board file was added for the Tegra234 VDK simulator
it incorrectly used the names 'cbb' and 'sdhci' instead of 'bus' and
'mmc', respectively. The names 'bus' and 'mmc' are required by the
device-tree json-schema validation tools. Therefore, fix this by
renaming these nodes accordingly.
Fixes: 639448912b ("arm64: tegra: Initial Tegra234 VDK support")
Reported-by: Ashish Singhal <ashishsingha@nvidia.com>
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The AON HSP node's "reg" property size 0xa0000 will overlap with other
resources. This patch fixes that wrong value with correct size 0x90000.
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Dipen Patel <dipenp@nvidia.com>
Fixes: a38570c22e ("arm64: tegra: Add nodes for TCU on Tegra194")
Signed-off-by: Thierry Reding <treding@nvidia.com>
USB host mode is broken on the OTG port of Jetson TX1 platform because
the USB_VBUS_EN0 regulator (regulator@11) is being overwritten by the
vdd-cam-1v2 regulator. This commit rearranges USB_VBUS_EN0 to be
regulator@14.
Fixes: 257c8047be ("arm64: tegra: jetson-tx1: Add camera supplies")
Cc: stable@vger.kernel.org
Signed-off-by: JC Kuo <jckuo@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The Jetson Xavier NX board routes UARTA to the 40-pin header and UARTC
to a 12-pin debug header. The UARTs can be used by either the Tegra
Combined UART (TCU) driver or the Tegra 8250 driver. By default, the
TCU will use UARTC on Jetson Xavier NX. Currently, device-tree for
Xavier NX enables the TCU and the Tegra 8250 node for UARTC. Fix this
by disabling the Tegra 8250 node for UARTC and enabling the Tegra 8250
node for UARTA.
Fixes: 3f9efbbe57 ("arm64: tegra: Add support for Jetson Xavier NX")
Cc: stable@vger.kernel.org
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Commit ff4c371d2b ("arm64: defconfig: Build ADMA and ACONNECT driver")
enable the Tegra ADMA and ACONNECT drivers and this is causing resume
from system suspend to fail on Jetson TX2. Resume is failing because the
ACONNECT driver is being resumed before the BPMP driver, and the ACONNECT
driver is attempting to power on a power-domain that is provided by the
BPMP. While a proper fix for the resume sequencing problem is identified,
disable the ACONNECT for Jetson TX2 temporarily to avoid breaking system
suspend.
Please note that ACONNECT driver is used by the Audio Processing Engine
(APE) on Tegra, but because there is no mainline support for APE on
Jetson TX2 currently, disabling the ACONNECT does not disable any useful
feature at the moment.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
When SPI DW memory ops support was introduced, there was a check for
excluding controllers which supplied their own CS function. Even so,
the mem_ops pointer is *always* presented to the SPI core.
This causes the SPI core sanity check in spi_controller_check_ops() to
refuse registration, since a mem_ops pointer is being supplied without
an exec_op member function.
The end result is failure of the SPI DW driver on sparx5 and similar
platforms.
The fix in the core SPI DW driver is to avoid presenting the mem_ops
pointer if the exec_op function is not set.
Fixes: 6423207e57 (spi: dw: Add memory operations support)
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Acked-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://lore.kernel.org/r/20201120213414.339701-1-lars.povlsen@microchip.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Only in smp systems the cache policy is setup as write alloc, in
single cpu systems the cache policy is set as writeback and it is
normal memory, so, it should pass the is_normal_memory check in the
share memory registration.
Add the right condition to make it work in no smp systems.
Fixes: cdbcf83d29 ("tee: optee: check type of registered shared memory")
Signed-off-by: Rui Miguel Silva <rui.silva@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Make an explicit suggestion how to post user space side of kernel
patches to avoid reposts when patchwork groups the wrong patches.
v2: mention the cases unlike iproute2 explicitly
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Wunderlich says:
====================
Here is a batman-adv bugfix:
- set module owner to THIS_MODULE, by Taehee Yoo
* tag 'batadv-net-pullrequest-20201124' of git://git.open-mesh.org/linux-merge:
batman-adv: set .owner to THIS_MODULE
====================
Link: https://lore.kernel.org/r/20201124134417.17269-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Based on the discussion with Sukadev Bhattiprolu and Dany Madden,
we believe that checking adapter->resetting bit is preferred
since RESETTING state flag is not as strict as resetting bit.
RESETTING state flag is removed since it is verbose now.
Fixes: 7d7195a026 ("ibmvnic: Do not process device remove during device reset")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shay Agroskin says:
====================
Fixes for ENA driver
- fix wrong data offset on machines that support rx offset
- work-around Intel iommu issue
- fix out of bound access when request id is wrong
====================
Link: https://lore.kernel.org/r/20201123190859.21298-1-shayagr@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch fixes two lines in which the rx_offset received by the device
wasn't taken into account:
- prefetch function:
In our driver the copied data would reside in
rx_info->page + rx_headroom + rx_offset
so the prefetch function is changed accordingly.
- setting page_offset to zero for descriptors > 1:
for every descriptor but the first, the rx_offset is zero. Hence
the page_offset value should be set to rx_headroom.
The previous implementation changed the value of rx_info after
the descriptor was added to the SKB (essentially providing wrong
page offset).
Fixes: 68f236df93 ("net: ena: add support for the rx offset feature")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ENA driver uses the readless mechanism, which uses DMA, to find
out what the DMA mask is supposed to be.
If DMA is used without setting the dma_mask first, it causes the
Intel IOMMU driver to think that ENA is a 32-bit device and therefore
disables IOMMU passthrough permanently.
This patch sets the dma_mask to be ENA_MAX_PHYS_ADDR_SIZE_BITS=48
before readless initialization in
ena_device_init()->ena_com_mmio_reg_read_request_init(),
which is large enough to workaround the intel_iommu issue.
DMA mask is set again to the correct value after it's received from the
device after readless is initialized.
The patch also changes the driver to use dma_set_mask_and_coherent()
function instead of the two pci_set_dma_mask() and
pci_set_consistent_dma_mask() ones. Both methods achieve the same
effect.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Mike Cui <mikecui@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After request id is checked in validate_rx_req_id() its value is still
used in the line
rx_ring->free_ids[next_to_clean] =
rx_ring->ena_bufs[i].req_id;
even if it was found to be out-of-bound for the array free_ids.
The patch moves the request id to an earlier stage in the napi routine and
makes sure its value isn't used if it's found out-of-bounds.
Fixes: 30623e1ed1 ("net: ena: avoid memory access violation by validating req_id properly")
Signed-off-by: Ido Segev <idose@amazon.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull irqchip fixes from Marc Zyngier:
- Fix Exiu driver trigger type when using ACPI
- Fix GICv3 ITS suspend/resume to use the in-kernel path
at all times, sidestepping braindead firmware support
Link: https://lore.kernel.org/r/20201122184752.553990-1-maz@kernel.org
Pull cifs fixes from Steve French:
"Four smb3 fixes for stable: one fixes a memleak, the other three
address a problem found with decryption offload that can cause a use
after free"
* tag '5.10-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3: Handle error case during offload read path
smb3: Avoid Mid pending list corruption
smb3: Call cifs reconnect from demultiplex thread
cifs: fix a memleak with modefromsid
Twice now, when exercising ext4 looped on shmem huge pages, I have crashed
on the PF_ONLY_HEAD check inside PageWaiters(): ext4_finish_bio() calling
end_page_writeback() calling wake_up_page() on tail of a shmem huge page,
no longer an ext4 page at all.
The problem is that PageWriteback is not accompanied by a page reference
(as the NOTE at the end of test_clear_page_writeback() acknowledges): as
soon as TestClearPageWriteback has been done, that page could be removed
from page cache, freed, and reused for something else by the time that
wake_up_page() is reached.
https://lore.kernel.org/linux-mm/20200827122019.GC14765@casper.infradead.org/
Matthew Wilcox suggested avoiding or weakening the PageWaiters() tail
check; but I'm paranoid about even looking at an unreferenced struct page,
lest its memory might itself have already been reused or hotremoved (and
wake_up_page_bit() may modify that memory with its ClearPageWaiters()).
Then on crashing a second time, realized there's a stronger reason against
that approach. If my testing just occasionally crashes on that check,
when the page is reused for part of a compound page, wouldn't it be much
more common for the page to get reused as an order-0 page before reaching
wake_up_page()? And on rare occasions, might that reused page already be
marked PageWriteback by its new user, and already be waited upon? What
would that look like?
It would look like BUG_ON(PageWriteback) after wait_on_page_writeback()
in write_cache_pages() (though I have never seen that crash myself).
Matthew Wilcox explaining this to himself:
"page is allocated, added to page cache, dirtied, writeback starts,
--- thread A ---
filesystem calls end_page_writeback()
test_clear_page_writeback()
--- context switch to thread B ---
truncate_inode_pages_range() finds the page, it doesn't have writeback set,
we delete it from the page cache. Page gets reallocated, dirtied, writeback
starts again. Then we call write_cache_pages(), see
PageWriteback() set, call wait_on_page_writeback()
--- context switch back to thread A ---
wake_up_page(page, PG_writeback);
... thread B is woken, but because the wakeup was for the old use of
the page, PageWriteback is still set.
Devious"
And prior to 2a9127fcf2 ("mm: rewrite wait_on_page_bit_common() logic")
this would have been much less likely: before that, wake_page_function()'s
non-exclusive case would stop walking and not wake if it found Writeback
already set again; whereas now the non-exclusive case proceeds to wake.
I have not thought of a fix that does not add a little overhead: the
simplest fix is for end_page_writeback() to get_page() before calling
test_clear_page_writeback(), then put_page() after wake_up_page().
Was there a chance of missed wakeups before, since a page freed before
reaching wake_up_page() would have PageWaiters cleared? I think not,
because each waiter does hold a reference on the page. This bug comes
when the old use of the page, the one we do TestClearPageWriteback on,
had *no* waiters, so no additional page reference beyond the page cache
(and whoever racily freed it). The reuse of the page has a waiter
holding a reference, and its own PageWriteback set; but the belated
wake_up_page() has woken the reuse to hit that BUG_ON(PageWriteback).
Reported-by: syzbot+3622cea378100f45d59f@syzkaller.appspotmail.com
Reported-by: Qian Cai <cai@lca.pw>
Fixes: 2a9127fcf2 ("mm: rewrite wait_on_page_bit_common() logic")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
GPIOs - as returned by of_get_named_gpio() and used by the gpiolib - are
signed integers, where negative number indicates error. The return
value of of_get_named_gpio() should not be assigned to an unsigned int
because in case of !CONFIG_GPIOLIB such number would be a valid GPIO.
Fixes: c04c674fad ("nfc: s3fwrn5: Add driver for Samsung S3FWRN5 NFC Chip")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20201123162351.209100-1-krzk@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The dpaa2 driver depends on devlink, so it should select
NET_DEVLINK in order to fix compile errors, such as:
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.o: in function `dpaa2_eth_rx_err':
dpaa2-eth.c:(.text+0x3cec): undefined reference to `devlink_trap_report'
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth-devlink.o: in function `dpaa2_eth_dl_info_get':
dpaa2-eth-devlink.c:(.text+0x160): undefined reference to `devlink_info_driver_name_put'
Fixes: ceeb03ad8e ("dpaa2-eth: add basic devlink support")
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20201123163553.1666476-1-ciorneiioana@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a BPF program is used to select between a type of TCP congestion
control algorithm that uses either ECN or not there is a case where the
synack for the frame was coming up without the ECT0 bit set. A bit of
research found that this was due to the final socket being configured to
dctcp while the listener socket was staying in cubic.
To reproduce it all that is needed is to monitor TCP traffic while running
the sample bpf program "samples/bpf/tcp_cong_kern.c". What is observed,
assuming tcp_dctcp module is loaded or compiled in and the traffic matches
the rules in the sample file, is that for all frames with the exception of
the synack the ECT0 bit is set.
To address that it is necessary to make one additional call to
tcp_bpf_ca_needs_ecn using the request socket and then use the output of
that to set the ECT0 bit for the tos/tclass of the packet.
Fixes: 91b5b21c7c ("bpf: Add support for changing congestion control")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/160593039663.2604.1374502006916871573.stgit@localhost.localdomain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix reload stats structure exposed to the user. Change stats structure
hierarchy to have the reload action as a parent of the stat entry and
then stat entry includes value per limit. This will also help to avoid
string concatenation on iproute2 output.
Reload stats structure before this fix:
"stats": {
"reload": {
"driver_reinit": 2,
"fw_activate": 1,
"fw_activate_no_reset": 0
}
}
After this fix:
"stats": {
"reload": {
"driver_reinit": {
"unspecified": 2
},
"fw_activate": {
"unspecified": 1,
"no_reset": 0
}
}
Fixes: a254c26426 ("devlink: Add reload stats")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/1606109785-25197-1-git-send-email-moshe@mellanox.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull s390 fix from Heiko Carstens:
"Disable interrupts when restoring fpu and vector registers, otherwise
KVM guests might see corrupted register contents"
* tag 's390-5.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: fix fpu restore in entry.S
Pull ARC fixes from Vineet Gupta:
"A couple more stack unwinder related fixes:
- More stack unwinding updates
- Misc minor fixes"
* tag 'arc-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: stack unwinding: reorganize how initial register state setup
ARC: stack unwinding: don't assume non-current task is sleeping
ARC: mm: fix spelling mistakes
ARC: bitops: Remove unecessary operation and value
When performing IPv6 forwarding, there is an expectation that SKBs
will have some headroom. When forwarding a packet from the aquantia
driver, this does not always happen, triggering a kernel warning.
aq_ring.c has this code (edited slightly for brevity):
if (buff->is_eop && buff->len <= AQ_CFG_RX_FRAME_MAX - AQ_SKB_ALIGN) {
skb = build_skb(aq_buf_vaddr(&buff->rxdata), AQ_CFG_RX_FRAME_MAX);
} else {
skb = napi_alloc_skb(napi, AQ_CFG_RX_HDR_SIZE);
There is a significant difference between the SKB produced by these
2 code paths. When napi_alloc_skb creates an SKB, there is a certain
amount of headroom reserved. However, this is not done in the
build_skb codepath.
As the hardware buffer that build_skb is built around does not
handle the presence of the SKB header, this code path is being
removed and the napi_alloc_skb path will always be used. This code
path does have to copy the packet header into the SKB, but it adds
the packet data as a frag.
Fixes: 018423e90b ("net: ethernet: aquantia: Add ring support code")
Signed-off-by: Lincoln Ramsay <lincoln.ramsay@opengear.com>
Link: https://lore.kernel.org/r/MWHPR1001MB23184F3EAFA413E0D1910EC9E8FC0@MWHPR1001MB2318.namprd10.prod.outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The HDCP feature requires at least one connector attached to the device;
however, some GPUs do not have a physical output, making the HDCP
initialization irrelevant. This patch disables HDCP initialization when
the graphic card does not have output.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Since preempt-to-busy, we may unsubmit a request while it is still on
the HW and completes asynchronously. That means it may be retired and in
the process destroy the virtual engine (as the user has closed their
context), but that engine may still be holding onto the unsubmitted
compelted request. Therefore we need to potentially cleanup the old
request on destroying the virtual engine. We also have to keep the
virtual_engine alive until after the sibling's execlists_dequeue() have
finished peeking into the virtual engines, for which we serialise with
RCU.
v2: Be paranoid and flush the tasklet as well.
v3: And flush the tasklet before the engines, as the tasklet may
re-attach an rb_node after our removal from the siblings.
Fixes: 6d06779e86 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-4-chris@chris-wilson.co.uk
(cherry picked from commit 46eecfccb4)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The UVD firmware is copied to cpu addr in uvd_resume, so it
should be used after that. This is to fix a bug introduced by
patch drm/amdgpu: fix SI UVD firmware validate resume fail.
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
The SI UVD firmware validate key is stored at the end of firmware,
which is changed during resume while playing video. So get the key
at sw_init and store it for fw validate using.
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Move the register slow register write and readback from out of the
critical path for execlists submission and delay it until the following
worker, shaving off around 200us. Note that the same signal_irq_work() is
allowed to run concurrently on each CPU (but it will only be queued once,
once running though it can be requeued and reexecuted) so we have to
remember to lock the global interactions as we cannot rely on the
signal_irq_work() itself providing the serialisation (in constrast to a
tasklet).
By pushing the arm/disarm into the central signaling worker we can close
the race for disarming the interrupt (and dropping its associated
GT wakeref) on parking the engine. If we loose the race, that GT wakeref
may be held indefinitely, preventing the machine from sleeping while
the GPU is ostensibly idle.
v2: Move the self-arming parking of the signal_irq_work to a flush of
the irq-work from intel_breadcrumbs_park().
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2271
Fixes: e23005604b ("drm/i915/gt: Hold context/request reference while breadcrumbs are active")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-1-chris@chris-wilson.co.uk
(cherry picked from commit 9d5612ca16)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
After having written the entire OA buffer with reports, the HW will
write again at the beginning of the OA buffer. It'll indicate it by
setting the WRAP bits in the OASTATUS register.
When a wrap happens and that at the end of the read vfunc we write the
OASTATUS register back to clear the REPORT_LOST bit, we sometimes see
that the OATAILPTR register is reset to a previous position on Gen8/9
(apparently not the case on Gen11+). This leads the next call to the
read vfunc to process reports we've already read. Because we've marked
those as read by clearing the reason & timestamp dwords, they're
discarded and a "Skipping spurious, invalid OA report" message is
emitted.
The workaround to avoid this OATAILPTR value reset seems to be to set
the wrap bits when writing back OASTATUS.
This change has no impact on userspace, it only avoids a bunch of
DRM_NOTE("Skipping spurious, invalid OA report\n") messages.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 19f81df285 ("drm/i915/perf: Add OA unit support for Gen 8+")
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201117130124.829979-1-lionel.g.landwerlin@intel.com
(cherry picked from commit 059a0beb48)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We call arch_cpu_idle() with RCU disabled, but then use
local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.
Switch all arch_cpu_idle() implementations to use
raw_local_irq_{en,dis}able() and carefully manage the
lockdep,rcu,tracing state like we do in entry.
(XXX: we really should change arch_cpu_idle() to not return with
interrupts enabled)
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
iov_iter::type is a bitmask that also keeps direction etc., so it
shouldn't be directly compared against ITER_*. Use proper helper.
Fixes: ff6165b2d7 ("io_uring: retain iov_iter state over io_read/io_write calls")
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Cc: <stable@vger.kernel.org> # 5.9
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Abaci Fuzz reported a shift-out-of-bounds BUG in io_uring_create():
[ 59.598207] UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
[ 59.599665] shift exponent 64 is too large for 64-bit type 'long unsigned int'
[ 59.601230] CPU: 0 PID: 963 Comm: a.out Not tainted 5.10.0-rc4+ #3
[ 59.602502] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 59.603673] Call Trace:
[ 59.604286] dump_stack+0x107/0x163
[ 59.605237] ubsan_epilogue+0xb/0x5a
[ 59.606094] __ubsan_handle_shift_out_of_bounds.cold+0xb2/0x20e
[ 59.607335] ? lock_downgrade+0x6c0/0x6c0
[ 59.608182] ? rcu_read_lock_sched_held+0xaf/0xe0
[ 59.609166] io_uring_create.cold+0x99/0x149
[ 59.610114] io_uring_setup+0xd6/0x140
[ 59.610975] ? io_uring_create+0x2510/0x2510
[ 59.611945] ? lockdep_hardirqs_on_prepare+0x286/0x400
[ 59.613007] ? syscall_enter_from_user_mode+0x27/0x80
[ 59.614038] ? trace_hardirqs_on+0x5b/0x180
[ 59.615056] do_syscall_64+0x2d/0x40
[ 59.615940] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 59.617007] RIP: 0033:0x7f2bb8a0b239
This is caused by roundup_pow_of_two() if the input entries larger
enough, e.g. 2^32-1. For sq_entries, it will check first and we allow
at most IORING_MAX_ENTRIES, so it is okay. But for cq_entries, we do
round up first, that may overflow and truncate it to 0, which is not
the expected behavior. So check the cq size first and then do round up.
Fixes: 88ec3211e4 ("io_uring: round-up cq size before comparing with rounded sq size")
Reported-by: Abaci Fuzz <abaci@linux.alibaba.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If set active without increase the usage count of pm, the dont use
autosuspend function will call the suspend callback to close the two
clocks of spi because the usage count is reduced to -1.
This will cause the warning dump below when the defer-probe occurs.
[ 129.379701] ecspi2_root_clk already disabled
[ 129.384005] WARNING: CPU: 1 PID: 33 at drivers/clk/clk.c:952 clk_core_disable+0xa4/0xb0
So add the get noresume function before set active.
Fixes: 43b6bf406c spi: imx: fix runtime pm support for !CONFIG_PM
Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
Link: https://lore.kernel.org/r/20201124085247.18025-1-xiaoning.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
On resource group creation via a mkdir an extra kernfs_node reference is
obtained by kernfs_get() to ensure that the rdtgroup structure remains
accessible for the rdtgroup_kn_unlock() calls where it is removed on
deletion. Currently the extra kernfs_node reference count is only
dropped by kernfs_put() in rdtgroup_kn_unlock() while the rdtgroup
structure is removed in a few other locations that lack the matching
reference drop.
In call paths of rmdir and umount, when a control group is removed,
kernfs_remove() is called to remove the whole kernfs nodes tree of the
control group (including the kernfs nodes trees of all child monitoring
groups), and then rdtgroup structure is freed by kfree(). The rdtgroup
structures of all child monitoring groups under the control group are
freed by kfree() in free_all_child_rdtgrp().
Before calling kfree() to free the rdtgroup structures, the kernfs node
of the control group itself as well as the kernfs nodes of all child
monitoring groups still take the extra references which will never be
dropped to 0 and the kernfs nodes will never be freed. It leads to
reference count leak and kernfs_node_cache memory leak.
For example, reference count leak is observed in these two cases:
(1) mount -t resctrl resctrl /sys/fs/resctrl
mkdir /sys/fs/resctrl/c1
mkdir /sys/fs/resctrl/c1/mon_groups/m1
umount /sys/fs/resctrl
(2) mkdir /sys/fs/resctrl/c1
mkdir /sys/fs/resctrl/c1/mon_groups/m1
rmdir /sys/fs/resctrl/c1
The same reference count leak issue also exists in the error exit paths
of mkdir in mkdir_rdt_prepare() and rdtgroup_mkdir_ctrl_mon().
Fix this issue by following changes to make sure the extra kernfs_node
reference on rdtgroup is dropped before freeing the rdtgroup structure.
(1) Introduce rdtgroup removal helper rdtgroup_remove() to wrap up
kernfs_put() and kfree().
(2) Call rdtgroup_remove() in rdtgroup removal path where the rdtgroup
structure is about to be freed by kfree().
(3) Call rdtgroup_remove() or kernfs_put() as appropriate in the error
exit paths of mkdir where an extra reference is taken by kernfs_get().
Fixes: f3cbeacaa0 ("x86/intel_rdt/cqm: Add rmdir support")
Fixes: e02737d5b8 ("x86/intel_rdt: Add tasks files")
Fixes: 60cf5e101f ("x86/intel_rdt: Add mkdir to resctrl file system")
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1604085088-31707-1-git-send-email-xiaochen.shen@intel.com
Willem reported growing of kernfs_node_cache entries in slabtop when
repeatedly creating and removing resctrl subdirectories as well as when
repeatedly mounting and unmounting the resctrl filesystem.
On resource group (control as well as monitoring) creation via a mkdir
an extra kernfs_node reference is obtained to ensure that the rdtgroup
structure remains accessible for the rdtgroup_kn_unlock() calls where it
is removed on deletion. The kernfs_node reference count is dropped by
kernfs_put() in rdtgroup_kn_unlock().
With the above explaining the need for one kernfs_get()/kernfs_put()
pair in resctrl there are more places where a kernfs_node reference is
obtained without a corresponding release. The excessive amount of
reference count on kernfs nodes will never be dropped to 0 and the
kernfs nodes will never be freed in the call paths of rmdir and umount.
It leads to reference count leak and kernfs_node_cache memory leak.
Remove the superfluous kernfs_get() calls and expand the existing
comments surrounding the remaining kernfs_get()/kernfs_put() pair that
remains in use.
Superfluous kernfs_get() calls are removed from two areas:
(1) In call paths of mount and mkdir, when kernfs nodes for "info",
"mon_groups" and "mon_data" directories and sub-directories are
created, the reference count of newly created kernfs node is set to 1.
But after kernfs_create_dir() returns, superfluous kernfs_get() are
called to take an additional reference.
(2) kernfs_get() calls in rmdir call paths.
Fixes: 17eafd0762 ("x86/intel_rdt: Split resource group removal in two")
Fixes: 4af4a88e0c ("x86/intel_rdt/cqm: Add mount,umount support")
Fixes: f3cbeacaa0 ("x86/intel_rdt/cqm: Add rmdir support")
Fixes: d89b737901 ("x86/intel_rdt/cqm: Add mon_data")
Fixes: c7d9aac613 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring")
Fixes: 5dc1d5c6ba ("x86/intel_rdt: Simplify info and base file lists")
Fixes: 60cf5e101f ("x86/intel_rdt: Add mkdir to resctrl file system")
Fixes: 4e978d06de ("x86/intel_rdt: Add "info" files to resctrl file system")
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1604085053-31639-1-git-send-email-xiaochen.shen@intel.com
In the patchset merged by commit b9fcf0a0d8
("Merge branch 'support-AF_PACKET-for-layer-3-devices'") L3 devices which
did not have header_ops were given one for the purpose of protocol parsing
on af_packet transmit path.
That change made af_packet receive path regard these devices as having a
visible L3 header and therefore aligned incoming skb->data to point to the
skb's mac_header. Some devices, such as ipip, xfrmi, and others, do not
reset their mac_header prior to ingress and therefore their incoming
packets became malformed.
Ideally these devices would reset their mac headers, or af_packet would be
able to rely on dev->hard_header_len being 0 for such cases, but it seems
this is not the case.
Fix by changing af_packet RX ll visibility criteria to include the
existence of a '.create()' header operation, which is used when creating
a device hard header - via dev_hard_header() - by upper layers, and does
not exist in these L3 devices.
As this predicate may be useful in other situations, add it as a common
dev_has_header() helper in netdevice.h.
Fixes: b9fcf0a0d8 ("Merge branch 'support-AF_PACKET-for-layer-3-devices'")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20201121062817.3178900-1-eyal.birger@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Prevent VFs from resetting when PF driver is being unloaded:
- introduce new pf state: __I40E_VF_RESETS_DISABLED;
- check if pf state has __I40E_VF_RESETS_DISABLED state set,
if so, disable any further VFLR event notifications;
- when i40e_remove (rmmod i40e) is called, disable any resets on
the VFs;
Previously if there were bare-metal VFs passing traffic and PF
driver was removed, there was a possibility of VFs triggering a Tx
timeout right before iavf_remove. This was causing iavf_close to
not be called because there is a check in the beginning of iavf_remove
that bails out early if adapter->state < IAVF_DOWN_PENDING. This
makes it so some resources do not get cleaned up.
Fixes: 6a9ddb36ee ("i40e: disable IOV before freeing resources")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20201120180640.3654474-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Starting from commit 8692cefc43 ("virtio_vsock: Fix race condition
in virtio_transport_recv_pkt"), we discard packets in
virtio_transport_recv_pkt() if the socket has been released.
When the socket is connected, we schedule a delayed work to wait the
RST packet from the other peer, also if SHUTDOWN_MASK is set in
sk->sk_shutdown.
This is done to complete the virtio-vsock shutdown algorithm, releasing
the port assigned to the socket definitively only when the other peer
has consumed all the packets.
If we discard the RST packet received, the socket will be closed only
when the VSOCK_CLOSE_TIMEOUT is reached.
Sergio discovered the issue while running ab(1) HTTP benchmark using
libkrun [1] and observing a latency increase with that commit.
To avoid this issue, we discard packet only if the socket is really
closed (SOCK_DONE flag is set).
We also set SOCK_DONE in virtio_transport_release() when we don't need
to wait any packets from the other peer (we didn't schedule the delayed
work). In this case we remove the socket from the vsock lists, releasing
the port assigned.
[1] https://github.com/containers/libkrun
Fixes: 8692cefc43 ("virtio_vsock: Fix race condition in virtio_transport_recv_pkt")
Cc: justin.he@arm.com
Reported-by: Sergio Lopez <slp@redhat.com>
Tested-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jia He <justin.he@arm.com>
Link: https://lore.kernel.org/r/20201120104736.73749-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the TCP stack is in SYN flood mode, the server child socket is
created from the SYN cookie received in a TCP packet with the ACK flag
set.
The child socket is created when the server receives the first TCP
packet with a valid SYN cookie from the client. Usually, this packet
corresponds to the final step of the TCP 3-way handshake, the ACK
packet. But is also possible to receive a valid SYN cookie from the
first TCP data packet sent by the client, and thus create a child socket
from that SYN cookie.
Since a client socket is ready to send data as soon as it receives the
SYN+ACK packet from the server, the client can send the ACK packet (sent
by the TCP stack code), and the first data packet (sent by the userspace
program) almost at the same time, and thus the server will equally
receive the two TCP packets with valid SYN cookies almost at the same
instant.
When such event happens, the TCP stack code has a race condition that
occurs between the momement a lookup is done to the established
connections hashtable to check for the existence of a connection for the
same client, and the moment that the child socket is added to the
established connections hashtable. As a consequence, this race condition
can lead to a situation where we add two child sockets to the
established connections hashtable and deliver two sockets to the
userspace program to the same client.
This patch fixes the race condition by checking if an existing child
socket exists for the same client when we are adding the second child
socket to the established connections socket. If an existing child
socket exists, we drop the packet and discard the second child socket
to the same client.
Signed-off-by: Ricardo Dias <rdias@singlestore.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201120111133.GA67501@rdias-suse-pc.lan
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull Hyper-V fix from Wei Liu:
"One patch from Dexuan to fix VRAM cache type in Hyper-V framebuffer
driver"
* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
video: hyperv_fb: Fix the cache type when mapping the VRAM
Kalle Valo says:
====================
wireless-drivers fixes for v5.10
First set of fixes for v5.10. One fix for iwlwifi kernel panic, others
less notable.
rtw88
* fix a bogus test found by clang
iwlwifi
* fix long memory reads causing soft lockup warnings
* fix kernel panic during Channel Switch Announcement (CSA)
* other smaller fixes
MAINTAINERS
* email address updates
* tag 'wireless-drivers-2020-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers:
iwlwifi: mvm: fix kernel panic in case of assert during CSA
iwlwifi: pcie: set LTR to avoid completion timeout
iwlwifi: mvm: write queue_sync_state only for sync
iwlwifi: mvm: properly cancel a session protection for P2P
iwlwifi: mvm: use the HOT_SPOT_CMD to cancel an AUX ROC
iwlwifi: sta: set max HE max A-MPDU according to HE capa
MAINTAINERS: update maintainers list for Cypress
MAINTAINERS: update Yan-Hsuan's email address
iwlwifi: pcie: limit memory read spin time
rtw88: fix fw_fifo_addr check
====================
Link: https://lore.kernel.org/r/20201123161037.C11D1C43460@smtp.codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When adding or removing a qgroup relation we are doing a GFP_KERNEL
allocation which is not safe because we are holding a transaction
handle open and that can make us deadlock if the allocator needs to
recurse into the filesystem. So just surround those calls with a
nofs context.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are sectorsize alignment checks that are reported but then
check_extent_data_ref continues. This was not intended, wrong alignment
is not a minor problem and we should return with error.
CC: stable@vger.kernel.org # 5.4+
Fixes: 0785a9aacf ("btrfs: tree-checker: Add EXTENT_DATA_REF check")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Syzbot reported a possible use-after-free when printing a duplicate device
warning device_list_add().
At this point it can happen that a btrfs_device::fs_info is not correctly
setup yet, so we're accessing stale data, when printing the warning
message using the btrfs_printk() wrappers.
==================================================================
BUG: KASAN: use-after-free in btrfs_printk+0x3eb/0x435 fs/btrfs/super.c:245
Read of size 8 at addr ffff8880878e06a8 by task syz-executor225/7068
CPU: 1 PID: 7068 Comm: syz-executor225 Not tainted 5.9.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1d6/0x29e lib/dump_stack.c:118
print_address_description+0x66/0x620 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report+0x132/0x1d0 mm/kasan/report.c:530
btrfs_printk+0x3eb/0x435 fs/btrfs/super.c:245
device_list_add+0x1a88/0x1d60 fs/btrfs/volumes.c:943
btrfs_scan_one_device+0x196/0x490 fs/btrfs/volumes.c:1359
btrfs_mount_root+0x48f/0xb60 fs/btrfs/super.c:1634
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x88/0x270 fs/super.c:1547
fc_mount fs/namespace.c:978 [inline]
vfs_kern_mount+0xc9/0x160 fs/namespace.c:1008
btrfs_mount+0x33c/0xae0 fs/btrfs/super.c:1732
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x88/0x270 fs/super.c:1547
do_new_mount fs/namespace.c:2875 [inline]
path_mount+0x179d/0x29e0 fs/namespace.c:3192
do_mount fs/namespace.c:3205 [inline]
__do_sys_mount fs/namespace.c:3413 [inline]
__se_sys_mount+0x126/0x180 fs/namespace.c:3390
do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x44840a
RSP: 002b:00007ffedfffd608 EFLAGS: 00000293 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00007ffedfffd670 RCX: 000000000044840a
RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffedfffd630
RBP: 00007ffedfffd630 R08: 00007ffedfffd670 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000001a
R13: 0000000000000004 R14: 0000000000000003 R15: 0000000000000003
Allocated by task 6945:
kasan_save_stack mm/kasan/common.c:48 [inline]
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc+0x100/0x130 mm/kasan/common.c:461
kmalloc_node include/linux/slab.h:577 [inline]
kvmalloc_node+0x81/0x110 mm/util.c:574
kvmalloc include/linux/mm.h:757 [inline]
kvzalloc include/linux/mm.h:765 [inline]
btrfs_mount_root+0xd0/0xb60 fs/btrfs/super.c:1613
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x88/0x270 fs/super.c:1547
fc_mount fs/namespace.c:978 [inline]
vfs_kern_mount+0xc9/0x160 fs/namespace.c:1008
btrfs_mount+0x33c/0xae0 fs/btrfs/super.c:1732
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x88/0x270 fs/super.c:1547
do_new_mount fs/namespace.c:2875 [inline]
path_mount+0x179d/0x29e0 fs/namespace.c:3192
do_mount fs/namespace.c:3205 [inline]
__do_sys_mount fs/namespace.c:3413 [inline]
__se_sys_mount+0x126/0x180 fs/namespace.c:3390
do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 6945:
kasan_save_stack mm/kasan/common.c:48 [inline]
kasan_set_track+0x3d/0x70 mm/kasan/common.c:56
kasan_set_free_info+0x17/0x30 mm/kasan/generic.c:355
__kasan_slab_free+0xdd/0x110 mm/kasan/common.c:422
__cache_free mm/slab.c:3418 [inline]
kfree+0x113/0x200 mm/slab.c:3756
deactivate_locked_super+0xa7/0xf0 fs/super.c:335
btrfs_mount_root+0x72b/0xb60 fs/btrfs/super.c:1678
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x88/0x270 fs/super.c:1547
fc_mount fs/namespace.c:978 [inline]
vfs_kern_mount+0xc9/0x160 fs/namespace.c:1008
btrfs_mount+0x33c/0xae0 fs/btrfs/super.c:1732
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x88/0x270 fs/super.c:1547
do_new_mount fs/namespace.c:2875 [inline]
path_mount+0x179d/0x29e0 fs/namespace.c:3192
do_mount fs/namespace.c:3205 [inline]
__do_sys_mount fs/namespace.c:3413 [inline]
__se_sys_mount+0x126/0x180 fs/namespace.c:3390
do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff8880878e0000
which belongs to the cache kmalloc-16k of size 16384
The buggy address is located 1704 bytes inside of
16384-byte region [ffff8880878e0000, ffff8880878e4000)
The buggy address belongs to the page:
page:0000000060704f30 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x878e0
head:0000000060704f30 order:3 compound_mapcount:0 compound_pincount:0
flags: 0xfffe0000010200(slab|head)
raw: 00fffe0000010200 ffffea00028e9a08 ffffea00021e3608 ffff8880aa440b00
raw: 0000000000000000 ffff8880878e0000 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8880878e0580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880878e0600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8880878e0680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8880878e0700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880878e0780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
The syzkaller reproducer for this use-after-free crafts a filesystem image
and loop mounts it twice in a loop. The mount will fail as the crafted
image has an invalid chunk tree. When this happens btrfs_mount_root() will
call deactivate_locked_super(), which then cleans up fs_info and
fs_info::sb. If a second thread now adds the same block-device to the
filesystem, it will get detected as a duplicate device and
device_list_add() will reject the duplicate and print a warning. But as
the fs_info pointer passed in is non-NULL this will result in a
use-after-free.
Instead of printing possibly uninitialized or already freed memory in
btrfs_printk(), explicitly pass in a NULL fs_info so the printing of the
device name will be skipped altogether.
There was a slightly different approach discussed in
https://lore.kernel.org/linux-btrfs/20200114060920.4527-1-anand.jain@oracle.com/t/#u
Link: https://lore.kernel.org/linux-btrfs/000000000000c9e14b05afcc41ba@google.com
Reported-by: syzbot+582e66e5edf36a22c7b0@syzkaller.appspotmail.com
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Oded writes:
This tag contains the following habanalabs driver fix for 5.10-rc6:
- Add missing statements and break; in case switch of ECC handling. Without
this fix, the handling of that interrupt will be erroneous.
* tag 'misc-habanalabs-fixes-2020-11-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux:
habanalabs/gaudi: fix missing code in ECC handling
There is missing statement and missing "break;" in the ECC handling
code in gaudi.c
This will cause a wrong behavior upon certain ECC interrupts.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
The current HVS muxing code will consider the CRTCs in a given state to
setup their muxing in the HVS, and disable the other CRTCs muxes.
However, it's valid to only update a single CRTC with a state, and in this
situation we would mux out a CRTC that was enabled but left untouched by
the new state.
Fix this by setting a flag on the CRTC state when the muxing has been
changed, and only change the muxing configuration when that flag is there.
Fixes: 87ebcd42fb ("drm/vc4: crtc: Assign output to channel automatically")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Reviewed-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20201120144245.398711-3-maxime@cerno.tech
If a CRTC is enabled but not active, and that we're then doing a page
flip on another CRTC, drm_atomic_get_crtc_state will bring the first
CRTC state into the global state, and will make us wait for its vblank
as well, even though that might never occur.
Instead of creating the list of the free channels each time atomic_check
is called, and calling drm_atomic_get_crtc_state to retrieve the
allocated channels, let's create a private state object in the main
atomic state, and use it to store the available channels.
Since vc4 has a semaphore (with a value of 1, so a lock) in its commit
implementation to serialize all the commits, even the nonblocking ones, we
are free from the use-after-free race if two subsequent commits are not ran
in their submission order.
Fixes: 87ebcd42fb ("drm/vc4: crtc: Assign output to channel automatically")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Reviewed-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20201120144245.398711-2-maxime@cerno.tech
Georgi writes:
interconnect fixes for v5.10
This contains a few driver fixes and one core fix:
- Fix an excessive of_node_put() in the core.
- Fix boot regression and integer overflow on msm8974 platforms.
- Fix a minor issue on qcs404 and msm8916 platforms.
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
* tag 'icc-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc:
interconnect: fix memory trashing in of_count_icc_providers()
interconnect: qcom: qcs404: Remove GPU and display RPM IDs
interconnect: qcom: msm8916: Remove rpm-ids from non-RPM nodes
interconnect: qcom: msm8974: Don't boost the NoC rate during boot
interconnect: qcom: msm8974: Prevent integer overflow in rate
Fixed ordering for MMC devices on rk3399, due to a mmc change jumbling
all ordering, a fix to make the Odroig Go Advance actually power down
and using the correct clock name on the NanoPi R2S.
* tag 'v5.10-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: Reorder LED triggers from mmc devices on rk3399-roc-pc.
arm64: dts: rockchip: Assign a fixed index to mmc devices on rk3399 boards.
arm64: dts: rockchip: Remove system-power-controller from pmic on Odroid Go Advance
arm64: dts: rockchip: fix NanoPi R2S GMAC clock name
Link: https://lore.kernel.org/r/11641389.O9o76ZdvQC@phil
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
With hardware dirty bit management, calling pte_wrprotect() on a writable,
dirty PTE will lose the dirty state and return a read-only, clean entry.
Move the logic from ptep_set_wrprotect() into pte_wrprotect() to ensure that
the dirty bit is preserved for writable entries, as this is required for
soft-dirty bit management if we enable it in the future.
Cc: <stable@vger.kernel.org>
Fixes: 2f4b829c62 ("arm64: Add support for hardware updates of the access and dirty pte bits")
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20201120143557.6715-3-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
pte_accessible() is used by ptep_clear_flush() to figure out whether TLB
invalidation is necessary when unmapping pages for reclaim. Although our
implementation is correct according to the architecture, returning true
only for valid, young ptes in the absence of racing page-table
modifications, this is in fact flawed due to lazy invalidation of old
ptes in ptep_clear_flush_young() where we elide the expensive DSB
instruction for completing the TLB invalidation.
Rather than penalise the aging path, adjust pte_accessible() to return
true for any valid pte, even if the access flag is cleared.
Cc: <stable@vger.kernel.org>
Fixes: 76c714be0e ("arm64: pgtable: implement pte_accessible()")
Reported-by: Yu Zhao <yuzhao@google.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20201120143557.6715-2-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Robin Murphy pointed out that if the arm-smmu driver probes before
the qcom_scm driver, we may call qcom_scm_qsmmu500_wait_safe_toggle()
before the __scm is initialized.
Now, getting this to happen is a bit contrived, as in my efforts it
required enabling asynchronous probing for both drivers, moving the
firmware dts node to the end of the dtsi file, as well as forcing a
long delay in the qcom_scm_probe function.
With those tweaks we ran into the following crash:
[ 2.631040] arm-smmu 15000000.iommu: Stage-1: 48-bit VA -> 48-bit IPA
[ 2.633372] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
...
[ 2.633402] [0000000000000000] user address but active_mm is swapper
[ 2.633409] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 2.633415] Modules linked in:
[ 2.633427] CPU: 5 PID: 117 Comm: kworker/u16:2 Tainted: G W 5.10.0-rc1-mainline-00025-g272a618fc36-dirty #3971
[ 2.633430] Hardware name: Thundercomm Dragonboard 845c (DT)
[ 2.633448] Workqueue: events_unbound async_run_entry_fn
[ 2.633456] pstate: 80c00005 (Nzcv daif +PAN +UAO -TCO BTYPE=--)
[ 2.633465] pc : qcom_scm_qsmmu500_wait_safe_toggle+0x78/0xb0
[ 2.633473] lr : qcom_smmu500_reset+0x58/0x78
[ 2.633476] sp : ffffffc0105a3b60
...
[ 2.633567] Call trace:
[ 2.633572] qcom_scm_qsmmu500_wait_safe_toggle+0x78/0xb0
[ 2.633576] qcom_smmu500_reset+0x58/0x78
[ 2.633581] arm_smmu_device_reset+0x194/0x270
[ 2.633585] arm_smmu_device_probe+0xc94/0xeb8
[ 2.633592] platform_drv_probe+0x58/0xa8
[ 2.633597] really_probe+0xec/0x398
[ 2.633601] driver_probe_device+0x5c/0xb8
[ 2.633606] __driver_attach_async_helper+0x64/0x88
[ 2.633610] async_run_entry_fn+0x4c/0x118
[ 2.633617] process_one_work+0x20c/0x4b0
[ 2.633621] worker_thread+0x48/0x460
[ 2.633628] kthread+0x14c/0x158
[ 2.633634] ret_from_fork+0x10/0x18
[ 2.633642] Code: a9034fa0 d0007f73 29107fa0 91342273 (f9400020)
To avoid this, this patch adds a check on qcom_scm_is_available() in
the qcom_smmu_impl_init() function, returning -EPROBE_DEFER if its
not ready.
This allows the driver to try to probe again later after qcom_scm has
finished probing.
Reported-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andy Gross <agross@kernel.org>
Cc: Maulik Shah <mkshah@codeaurora.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Lina Iyer <ilina@codeaurora.org>
Cc: iommu@lists.linux-foundation.org
Cc: linux-arm-msm <linux-arm-msm@vger.kernel.org>
Link: https://lore.kernel.org/r/20201112220520.48159-1-john.stultz@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
Given the case that bootloader(such as UEFI)'s FSPI driver might not
handle all interrupts before loading kernel, those legacy interrupts
would assert immidiately once kernel's FSPI driver enable them. Further,
if it was FSPI_INTR_IPCMDDONE, the irq handler nxp_fspi_irq_handler()
would call complete(&f->c) to notify others. However, f->c might not be
initialized yet at that time, then cause kernel panic.
Of cause, we should fix this issue within bootloader. But it would be
better to have this pacth to make dirver more robust (by clearing all
interrupt status bits before enabling interrupts).
Suggested-by: Han Xu <han.xu@nxp.com>
Signed-off-by: Ran Wang <ran.wang_1@nxp.com>
Link: https://lore.kernel.org/r/20201123025715.14635-1-ran.wang_1@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
AMD IOMMU requires 4k-aligned pages for the event log, the PPR log,
and the completion wait write-back regions. However, when allocating
the pages, they could be part of large mapping (e.g. 2M) page.
This causes #PF due to the SNP RMP hardware enforces the check based
on the page level for these data structures.
So, fix by calling set_memory_4k() on the allocated pages.
Fixes: c69d89aff3 ("iommu/amd: Use 4K page for completion wait write-back semaphore")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Link: https://lore.kernel.org/r/20201105145832.3065-1-suravee.suthikulpanit@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
Pull SCMI cpufreq driver fix for 5.10-rc6 from Viresh Kumar:
"This fixes a build issues with SCMI cpufreq driver in the
!CONFIG_COMMON_CLK case."
* 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: scmi: Fix build for !CONFIG_COMMON_CLK
Fix following warnings caused by mismatch between
function parameters and function comments.
drivers/acpi/arm64/iort.c:55: warning: Function parameter or member 'iort_node' not described in 'iort_set_fwnode'
drivers/acpi/arm64/iort.c:55: warning: Excess function parameter 'node' description in 'iort_set_fwnode'
drivers/acpi/arm64/iort.c:682: warning: Function parameter or member 'id' not described in 'iort_get_device_domain'
drivers/acpi/arm64/iort.c:682: warning: Function parameter or member 'bus_token' not described in 'iort_get_device_domain'
drivers/acpi/arm64/iort.c:682: warning: Excess function parameter 'req_id' description in 'iort_get_device_domain'
drivers/acpi/arm64/iort.c:1142: warning: Function parameter or member 'dma_size' not described in 'iort_dma_setup'
drivers/acpi/arm64/iort.c:1142: warning: Excess function parameter 'size' description in 'iort_dma_setup'
drivers/acpi/arm64/iort.c:1534: warning: Function parameter or member 'ops' not described in 'iort_add_platform_device'
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Acked-by: Hanjun Guo <guohanjun@huawei.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/20201014093139.1580-1-shiju.jose@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
Adding <asm/exception.h> brought in <asm/kprobes.h> which uses
<asm/probes.h>, which uses 'pstate_check_t' so the latter needs to
#include <asm/insn.h> for this typedef.
Fixes this build error:
In file included from arch/arm64/include/asm/kprobes.h:24,
from arch/arm64/include/asm/exception.h:11,
from arch/arm64/kernel/fpsimd.c:35:
arch/arm64/include/asm/probes.h:16:2: error: unknown type name 'pstate_check_t'
16 | pstate_check_t *pstate_cc;
Fixes: c6b90d5cf6 ("arm64/fpsimd: Fix missing-prototypes in fpsimd.c")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Tian Tao <tiantao6@hisilicon.com>
Link: https://lore.kernel.org/r/20201123044510.9942-1-rdunlap@infradead.org
Signed-off-by: Will Deacon <will@kernel.org>
We need to disable interrupts in load_fpu_regs(). Otherwise an
interrupt might come in after the registers are loaded, but before
CIF_FPU is cleared in load_fpu_regs(). When the interrupt returns,
CIF_FPU will be cleared and the registers will never be restored.
The entry.S code usually saves the interrupt state in __SF_EMPTY on the
stack when disabling/restoring interrupts. sie64a however saves the pointer
to the sie control block in __SF_SIE_CONTROL, which references the same
location. This is non-obvious to the reader. To avoid thrashing the sie
control block pointer in load_fpu_regs(), move the __SIE_* offsets eight
bytes after __SF_EMPTY on the stack.
Cc: <stable@vger.kernel.org> # 5.8
Fixes: 0b0ed657fe ("s390: remove critical section cleanup from entry.S")
Reported-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Using DECLARE_STATIC_KEY_FALSE needs linux/jump_table.h.
Otherwise the build fails with eg:
arch/powerpc/include/asm/book3s/64/kup-radix.h:66:1: warning: data definition has no type or storage class
66 | DECLARE_STATIC_KEY_FALSE(uaccess_flush_key);
Fixes: 9a32a7e78b ("powerpc/64s: flush L1D after user accesses")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[mpe: Massage change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201123184016.693fe464@canb.auug.org.au
From Daniel's cover letter:
IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.
However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern.
This patch series flushes the L1 cache on kernel entry (patch 2) and after the
kernel performs any user accesses (patch 3). It also adds a self-test and
performs some related cleanups.
Commit 8410e7f3b3 ("cpufreq: scmi: Fix OPP addition failure with a
dummy clock provider") registers a dummy clock provider using
devm_of_clk_add_hw_provider. These *_hw_provider functions are defined
only when CONFIG_COMMON_CLK=y. One possible fix is to add the Kconfig
dependency, but since we plan to move away from the clock dependency
for scmi cpufreq, it is preferrable to avoid that.
Let us just conditionally compile out the offending call to
devm_of_clk_add_hw_provider. It also uses the variable 'dev' outside
of the #ifdef block to avoid build warning.
Fixes: 8410e7f3b3 ("cpufreq: scmi: Fix OPP addition failure with a dummy clock provider")
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The Exynos DRM uses Common Clock Framework thus it cannot be built on
platforms without it (e.g. compile test on MIPS with RALINK and
SOC_RT305X):
/usr/bin/mips-linux-gnu-ld: drivers/gpu/drm/exynos/exynos_mixer.o: in function `mixer_bind':
exynos_mixer.c:(.text+0x958): undefined reference to `clk_set_parent'
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Pull HID fixes from Jiri Kosina:
- Various functionality / regression fixes for Logitech devices from
Hans de Goede
- Fix for (recently added) GPIO support in mcp2221 driver from Lars
Povlsen
- Power management handling fix/quirk in i2c-hid driver for certain
BIOSes that have strange aproach to power-cycle from Hans de Goede
- a few device ID additions and device-specific quirks
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: logitech-dj: Fix Dinovo Mini when paired with a MX5x00 receiver
HID: logitech-dj: Fix an error in mse_bluetooth_descriptor
HID: Add Logitech Dinovo Edge battery quirk
HID: logitech-hidpp: Add HIDPP_CONSUMER_VENDOR_KEYS quirk for the Dinovo Edge
HID: logitech-dj: Handle quad/bluetooth keyboards with a builtin trackpad
HID: add HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE for Gamevice devices
HID: mcp2221: Fix GPIO output handling
HID: hid-sensor-hub: Fix issue with devices with no report ID
HID: i2c-hid: Put ACPI enumerated devices in D3 on shutdown
HID: add support for Sega Saturn
HID: cypress: Support Varmilo Keyboards' media hotkeys
HID: ite: Replace ABS_MISC 120/121 events with touchpad on/off keypresses
HID: logitech-hidpp: Add PID for MX Anywhere 2
HID: uclogic: Add ID for Trust Flex Design Tablet
Pull scheduler fixes from Thomas Gleixner:
"A couple of scheduler fixes:
- Make the conditional update of the overutilized state work
correctly by caching the relevant flags state before overwriting
them and checking them afterwards.
- Fix a data race in the wakeup path which caused loadavg on ARM64
platforms to become a random number generator.
- Fix the ordering of the iowaiter accounting operations so it can't
be decremented before it is incremented.
- Fix a bug in the deadline scheduler vs. priority inheritance when a
non-deadline task A has inherited the parameters of a deadline task
B and then blocks on a non-deadline task C.
The second inheritance step used the static deadline parameters of
task A, which are usually 0, instead of further propagating task
B's parameters. The zero initialized parameters trigger a bug in
the deadline scheduler"
* tag 'sched-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Fix priority inheritance with multiple scheduling classes
sched: Fix rq->nr_iowait ordering
sched: Fix data-race in wakeup
sched/fair: Fix overutilized update in enqueue_task_fair()
Pull perf fix from Thomas Gleixner:
"A single fix for the x86 perf sysfs interfaces which used kobject
attributes instead of device attributes and therefore making clang's
control flow integrity checker upset"
* tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: fix sysfs type mismatches
Pull locking fix from Thomas Gleixner:
"A single fix for lockdep which makes the recursion protection cover
graph lock/unlock"
* tag 'locking-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep: Put graph lock/unlock under lock_recursion protection
Pull EFI fixes from Borislav Petkov:
"Forwarded EFI fixes from Ard Biesheuvel:
- fix memory leak in efivarfs driver
- fix HYP mode issue in 32-bit ARM version of the EFI stub when built
in Thumb2 mode
- avoid leaking EFI pgd pages on allocation failure"
* tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/x86: Free efi_pgd with free_pages()
efivarfs: fix memory leak in efivarfs_create()
efi/arm: set HSCTLR Thumb2 bit correctly for HVC calls from HYP
Pull x86 fixes from Borislav Petkov:
- An IOMMU VT-d build fix when CONFIG_PCI_ATS=n along with a revert of
same because the proper one is going through the IOMMU tree (Thomas
Gleixner)
- An Intel microcode loader fix to save the correct microcode patch to
apply during resume (Chen Yu)
- A fix to not access user memory of other processes when dumping
opcode bytes (Thomas Gleixner)
* tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "iommu/vt-d: Take CONFIG_PCI_ATS into account"
x86/dumpstack: Do not try to access user space code of other tasks
x86/microcode/intel: Check patch signature before saving microcode for early loading
iommu/vt-d: Take CONFIG_PCI_ATS into account
Merge misc fixes from Andrew Morton:
"8 patches.
Subsystems affected by this patch series: mm (madvise, pagemap,
readahead, memcg, userfaultfd), kbuild, and vfs"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: fix madvise WILLNEED performance problem
libfs: fix error cast of negative value in simple_attr_write()
mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
mm: memcg/slab: fix root memcg vmstats
mm: fix readahead_page_batch for retry entries
mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports
compiler-clang: remove version check for BPF Tracing
mm/madvise: fix memory leak from process_madvise
Pull staging and IIO fixes from Greg KH:
"Here are some small Staging and IIO driver fixes for 5.10-rc5. They
include:
- IIO fixes for reported regressions and problems
- new device ids for IIO drivers
- new device id for rtl8723bs driver
- staging ralink driver Kconfig dependency fix
- staging mt7621-pci bus resource fix
All of these have been in linux-next all week with no reported issues"
* tag 'staging-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: accel: kxcjk1013: Add support for KIOX010A ACPI DSM for setting tablet-mode
iio: accel: kxcjk1013: Replace is_smo8500_device with an acpi_type enum
docs: ABI: testing: iio: stm32: remove re-introduced unsupported ABI
iio: light: fix kconfig dependency bug for VCNL4035
iio/adc: ingenic: Fix AUX/VBAT readings when touchscreen is used
iio/adc: ingenic: Fix battery VREF for JZ4770 SoC
staging: rtl8723bs: Add 024c:0627 to the list of SDIO device-ids
staging: ralink-gdma: fix kconfig dependency bug for DMA_RALINK
staging: mt7621-pci: avoid to request pci bus resources
iio: imu: st_lsm6dsx: set 10ms as min shub slave timeout
counter/ti-eqep: Fix regmap max_register
iio: adc: stm32-adc: fix a regression when using dma and irq
iio: adc: mediatek: fix unset field
iio: cros_ec: Use default frequencies when EC returns invalid information
Pull tty fixes from Greg KH:
"Here are some small tty/serial fixes for 5.10-rc5 that resolve some
reported issues:
- speakup crash when telling the kernel to use a device that isn't
really there
- imx serial driver fixes for reported problems
- ar933x_uart driver fix for probe error handling path
All have been in linux-next for a while with no reported issues"
* tag 'tty-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: ar933x_uart: disable clk on error handling path in probe
tty: serial: imx: keep console clocks always on
speakup: Do not let the line discipline be used several times
tty: serial: imx: fix potential deadlock
Pull ext4 fixes from Ted Ts'o:
"A final set of miscellaneous bug fixes for ext4"
* tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix bogus warning in ext4_update_dx_flag()
jbd2: fix kernel-doc markups
ext4: drop fast_commit from /proc/mounts
When doing a lookup in a directory, the afs filesystem uses a bulk
status fetch to speculatively retrieve the statuses of up to 48 other
vnodes found in the same directory and it will then either update extant
inodes or create new ones - effectively doing 'lookup ahead'.
To avoid the possibility of deadlocking itself, however, the filesystem
doesn't lock all of those inodes; rather just the directory inode is
locked (by the VFS).
When the operation completes, afs_inode_init_from_status() or
afs_apply_status() is called, depending on whether the inode already
exists, to commit the new status.
A case exists, however, where the speculative status fetch operation may
straddle a modification operation on one of those vnodes. What can then
happen is that the speculative bulk status RPC retrieves the old status,
and whilst that is happening, the modification happens - which returns
an updated status, then the modification status is committed, then we
attempt to commit the speculative status.
This results in something like the following being seen in dmesg:
kAFS: vnode modified {100058:861} 8->9 YFS.InlineBulkStatus
showing that for vnode 861 on volume 100058, we saw YFS.InlineBulkStatus
say that the vnode had data version 8 when we'd already recorded version
9 due to a local modification. This was causing the cache to be
invalidated for that vnode when it shouldn't have been. If it happens
on a data file, this might lead to local changes being lost.
Fix this by ignoring speculative status updates if the data version
doesn't match the expected value.
Note that it is possible to get a DV regression if a volume gets
restored from a backup - but we should get a callback break in such a
case that should trigger a recheck anyway. It might be worth checking
the volume creation time in the volsync info and, if a change is
observed in that (as would happen on a restore), invalidate all caches
associated with the volume.
Fixes: 5cf9dd55a0 ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The attr->set() receive a value of u64, but simple_strtoll() is used for
doing the conversion. It will lead to the error cast if user inputs a
negative value.
Use kstrtoull() instead of simple_strtoll() to convert a string got from
the user to an unsigned value. The former will return '-EINVAL' if it
gets a negetive value, but the latter can't handle the situation
correctly. Make 'val' unsigned long long as what kstrtoull() takes,
this will eliminate the compile warning on no 64-bit architectures.
Fixes: f7b88631a8 ("fs/libfs.c: fix simple_attr_write() on 32bit machines")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/1605341356-11872-1-git-send-email-yangyicong@hisilicon.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexander reported a syzkaller / KASAN finding on s390, see below for
complete output.
In do_huge_pmd_anonymous_page(), the pre-allocated pagetable will be
freed in some cases. In the case of userfaultfd_missing(), this will
happen after calling handle_userfault(), which might have released the
mmap_lock. Therefore, the following pte_free(vma->vm_mm, pgtable) will
access an unstable vma->vm_mm, which could have been freed or re-used
already.
For all architectures other than s390 this will go w/o any negative
impact, because pte_free() simply frees the page and ignores the
passed-in mm. The implementation for SPARC32 would also access
mm->page_table_lock for pte_free(), but there is no THP support in
SPARC32, so the buggy code path will not be used there.
For s390, the mm->context.pgtable_list is being used to maintain the 2K
pagetable fragments, and operating on an already freed or even re-used
mm could result in various more or less subtle bugs due to list /
pagetable corruption.
Fix this by calling pte_free() before handle_userfault(), similar to how
it is already done in __do_huge_pmd_anonymous_page() for the WRITE /
non-huge_zero_page case.
Commit 6b251fc96c ("userfaultfd: call handle_userfault() for
userfaultfd_missing() faults") actually introduced both, the
do_huge_pmd_anonymous_page() and also __do_huge_pmd_anonymous_page()
changes wrt to calling handle_userfault(), but only in the latter case
it put the pte_free() before calling handle_userfault().
BUG: KASAN: use-after-free in do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
Read of size 8 at addr 00000000962d6988 by task syz-executor.0/9334
CPU: 1 PID: 9334 Comm: syz-executor.0 Not tainted 5.10.0-rc1-syzkaller-07083-g4c9720875573 #0
Hardware name: IBM 3906 M04 701 (KVM/Linux)
Call Trace:
do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
create_huge_pmd mm/memory.c:4256 [inline]
__handle_mm_fault+0xe6e/0x1068 mm/memory.c:4480
handle_mm_fault+0x288/0x748 mm/memory.c:4607
do_exception+0x394/0xae0 arch/s390/mm/fault.c:479
do_dat_exception+0x34/0x80 arch/s390/mm/fault.c:567
pgm_check_handler+0x1da/0x22c arch/s390/kernel/entry.S:706
copy_from_user_mvcos arch/s390/lib/uaccess.c:111 [inline]
raw_copy_from_user+0x3a/0x88 arch/s390/lib/uaccess.c:174
_copy_from_user+0x48/0xa8 lib/usercopy.c:16
copy_from_user include/linux/uaccess.h:192 [inline]
__do_sys_sigaltstack kernel/signal.c:4064 [inline]
__s390x_sys_sigaltstack+0xc8/0x240 kernel/signal.c:4060
system_call+0xe0/0x28c arch/s390/kernel/entry.S:415
Allocated by task 9334:
slab_alloc_node mm/slub.c:2891 [inline]
slab_alloc mm/slub.c:2899 [inline]
kmem_cache_alloc+0x118/0x348 mm/slub.c:2904
vm_area_dup+0x9c/0x2b8 kernel/fork.c:356
__split_vma+0xba/0x560 mm/mmap.c:2742
split_vma+0xca/0x108 mm/mmap.c:2800
mlock_fixup+0x4ae/0x600 mm/mlock.c:550
apply_vma_lock_flags+0x2c6/0x398 mm/mlock.c:619
do_mlock+0x1aa/0x718 mm/mlock.c:711
__do_sys_mlock2 mm/mlock.c:738 [inline]
__s390x_sys_mlock2+0x86/0xa8 mm/mlock.c:728
system_call+0xe0/0x28c arch/s390/kernel/entry.S:415
Freed by task 9333:
slab_free mm/slub.c:3142 [inline]
kmem_cache_free+0x7c/0x4b8 mm/slub.c:3158
__vma_adjust+0x7b2/0x2508 mm/mmap.c:960
vma_merge+0x87e/0xce0 mm/mmap.c:1209
userfaultfd_release+0x412/0x6b8 fs/userfaultfd.c:868
__fput+0x22c/0x7a8 fs/file_table.c:281
task_work_run+0x200/0x320 kernel/task_work.c:151
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
do_notify_resume+0x100/0x148 arch/s390/kernel/signal.c:538
system_call+0xe6/0x28c arch/s390/kernel/entry.S:416
The buggy address belongs to the object at 00000000962d6948 which belongs to the cache vm_area_struct of size 200
The buggy address is located 64 bytes inside of 200-byte region [00000000962d6948, 00000000962d6a10)
The buggy address belongs to the page: page:00000000313a09fe refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x962d6 flags: 0x3ffff00000000200(slab)
raw: 3ffff00000000200 000040000257e080 0000000c0000000c 000000008020ba00
raw: 0000000000000000 000f001e00000000 ffffffff00000001 0000000096959501
page dumped because: kasan: bad access detected
page->mem_cgroup:0000000096959501
Memory state around the buggy address:
00000000962d6880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000000962d6900: 00 fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
>00000000962d6980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
00000000962d6a00: fb fb fc fc fc fc fc fc fc fc 00 00 00 00 00 00
00000000962d6a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Fixes: 6b251fc96c ("userfaultfd: call handle_userfault() for userfaultfd_missing() faults")
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: <stable@vger.kernel.org> [4.3+]
Link: https://lkml.kernel.org/r/20201110190329.11920-1-gerald.schaefer@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The core-mm has a default __weak implementation of phys_to_target_node()
to mirror the weak definition of memory_add_physaddr_to_nid(). That
symbol is exported for modules. However, while the export in
mm/memory_hotplug.c exported the symbol in the configuration cases of:
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_HOTPLUG=y
...and:
CONFIG_NUMA_KEEP_MEMINFO=n
CONFIG_MEMORY_HOTPLUG=y
...it failed to export the symbol in the case of:
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_HOTPLUG=n
Not only is that broken, but Christoph points out that the kernel should
not be exporting any __weak symbol, which means that
memory_add_physaddr_to_nid() example that phys_to_target_node() copied
is broken too.
Rework the definition of phys_to_target_node() and
memory_add_physaddr_to_nid() to not require weak symbols. Move to the
common arch override design-pattern of an asm header defining a symbol
to replace the default implementation.
The only common header that all memory_add_physaddr_to_nid() producing
architectures implement is asm/sparsemem.h. In fact, powerpc already
defines its memory_add_physaddr_to_nid() helper in sparsemem.h.
Double-down on that observation and define phys_to_target_node() where
necessary in asm/sparsemem.h. An alternate consideration that was
discarded was to put this override in asm/numa.h, but that entangles
with the definition of MAX_NUMNODES relative to the inclusion of
linux/nodemask.h, and requires powerpc to grow a new header.
The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid
now that the symbol is properly exported / stubbed in all combinations
of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG.
[dan.j.williams@intel.com: v4]
Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com
[dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning]
Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: a035b6bf86 ("mm/memory_hotplug: introduce default phys_to_target_node() implementation")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On systems without HW-based collections (i.e. anything except GIC-500),
we rely on firmware to perform the ITS save/restore. This doesn't
really work, as although FW can properly save everything, it cannot
fully restore the state of the command queue (the read-side is reset
to the head of the queue). This results in the ITS consuming previously
processed commands, potentially corrupting the state.
Instead, let's always save the ITS state on suspend, disabling it in the
process, and restore the full state on resume. This saves us from broken
FW as long as it doesn't enable the ITS by itself (for which we can't do
anything).
This amounts to simply dropping the ITS_FLAGS_SAVE_SUSPEND_STATE.
Signed-off-by: Xu Qiang <xuqiang36@huawei.com>
[maz: added warning on resume, rewrote commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201107104226.14282-1-xuqiang36@huawei.com
Lijun Pan says:
====================
ibmvnic: fixes in reset path
Patch 1/3 and 2/3 notify peers in failover and migration reset.
Patch 3/3 skips timeout reset if it is already resetting.
====================
Link: https://lore.kernel.org/r/20201120224013.46891-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sometimes it takes longer than 5 seconds (watchdog timeout) to complete
failover, migration, and other resets. In stead of scheduling another
timeout reset, we wait for the current one to complete.
Suggested-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 61d3e1d9bc ("ibmvnic: Remove netdev notify for failover resets")
excluded the failover case for notify call because it said
netdev_notify_peers() can cause network traffic to stall or halt.
Current testing does not show network traffic stall
or halt because of the notify call for failover event.
netdev_notify_peers may be used when a device wants to inform the
rest of the network about some sort of a reconfiguration
such as failover or migration.
It is unnecessary to call that in other events like
FATAL, NON_FATAL, CHANGE_PARAM, and TIMEOUT resets
since in those scenarios the hardware does not change.
If the driver must do a hard reset, it is necessary to notify peers.
Fixes: 61d3e1d9bc ("ibmvnic: Remove netdev notify for failover resets")
Suggested-by: Brian King <brking@linux.vnet.ibm.com>
Suggested-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When netdev_notify_peers was substituted in
commit 986103e792 ("net/ibmvnic: Fix RTNL deadlock during device reset"),
call_netdevice_notifiers(NETDEV_RESEND_IGMP, dev) was missed.
Fix it now.
Fixes: 986103e792 ("net/ibmvnic: Fix RTNL deadlock during device reset")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tun only checks the file O_NONBLOCK flag, but it should also be checking
the iocb IOCB_NOWAIT flag. Any fops using ->read/write_iter() should check
both, otherwise it breaks users that correctly expect O_NONBLOCK semantics
if IOCB_NOWAIT is set.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/e9451860-96cc-c7c7-47b8-fe42cadd5f4c@kernel.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Child sockets erroneously inherit their parent's sk_type (ie. SOCK_*),
instead of the PF_IUCV protocol that the parent was created with in
iucv_sock_create().
We're currently not using sk->sk_protocol ourselves, so this shouldn't
have much impact (except eg. getting the output in skb_dump() right).
Fixes: eac3731bd0 ("[S390]: Add AF_IUCV socket support")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Link: https://lore.kernel.org/r/20201120100657.34407-1-jwi@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Starting with iOS 14 released in September 2020, connectivity using the
personal hotspot USB tethering function of iOS devices is broken.
Communication between the host and the device (for example ICMP traffic
or DNS resolution using the DNS service running in the device itself)
works fine, but communication to endpoints further away doesn't work.
Investigation on the matter shows that no UDP and ICMP traffic from the
tethered host is reaching the Internet at all. For TCP traffic there are
exchanges between tethered host and server but packets are modified in
transit leading to impossible communication.
After some trials Matti Vuorela discovered that reducing the URB buffer
size by two bytes restored the previous behavior. While a better
solution might exist to fix the issue, since the protocol is not
publicly documented and considering the small size of the fix, let's do
that.
Tested-by: Matti Vuorela <matti.vuorela@bitfactor.fi>
Signed-off-by: Yves-Alexis Perez <corsac@corsac.net>
Link: https://lore.kernel.org/linux-usb/CAAn0qaXmysJ9vx3ZEMkViv_B19ju-_ExN8Yn_uSefxpjS6g4Lw@mail.gmail.com/
Link: https://github.com/libimobiledevice/libimobiledevice/issues/1038
Link: https://lore.kernel.org/r/20201119172439.94988-1-corsac@corsac.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After commit 9d2e5e9eeb ("cxgb4/ch_ktls: decrypted bit is not enough")
whenever CONFIG_TLS=m and CONFIG_CHELSIO_T4=y, the following build
failure occurs:
ld: drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.o: in function
`cxgb_select_queue':
cxgb4_main.c:(.text+0x2dac): undefined reference to `tls_validate_xmit_skb'
Fix this by ensuring that if TLS is set to be a module, CHELSIO_T4 will
also be compiled as a module. As otherwise the cxgb4 driver will not be
able to access TLS' symbols.
Fixes: 9d2e5e9eeb ("cxgb4/ch_ktls: decrypted bit is not enough")
Signed-off-by: Tom Seewald <tseewald@gmail.com>
Link: https://lore.kernel.org/r/20201120192528.615-1-tseewald@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull xfs fixes from Darrick Wong:
"The critical fixes are for a crash that someone reported in the xattr
code on 32-bit arm last week; and a revert of the rmap key comparison
change from last week as it was totally wrong. I need a vacation. :(
Summary:
- Fix various deficiencies in online fsck's metadata checking code
- Fix an integer casting bug in the xattr code on 32-bit systems
- Fix a hang in an inode walk when the inode index is corrupt
- Fix error codes being dropped when initializing per-AG structures
- Fix nowait directio writes that partially succeed but return EAGAIN
- Revert last week's rmap comparison patch because it was wrong"
* tag 'xfs-5.10-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: revert "xfs: fix rmap key and record comparison functions"
xfs: don't allow NOWAIT DIO across extent boundaries
xfs: return corresponding errcode if xfs_initialize_perag() fail
xfs: ensure inobt record walks always make forward progress
xfs: fix forkoff miscalculation related to XFS_LITINO(mp)
xfs: directory scrub should check the null bestfree entries too
xfs: strengthen rmap record flags checking
xfs: fix the minrecs logic when dealing with inode root child blocks
Pull fanotify fix from Jan Kara:
"A single fanotify fix from Amir"
* tag 'fsnotify_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: fix logic of reporting name info with watched parent
Pull seccomp fixes from Kees Cook:
"This gets the seccomp selftests running again on powerpc and sh, and
fixes an audit reporting oversight noticed in both seccomp and ptrace.
- Fix typos in seccomp selftests on powerpc and sh (Kees Cook)
- Fix PF_SUPERPRIV audit marking in seccomp and ptrace (Mickaël
Salaün)"
* tag 'seccomp-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
selftests/seccomp: sh: Fix register names
selftests/seccomp: powerpc: Fix typo in macro variable name
seccomp: Set PF_SUPERPRIV when checking capability
ptrace: Set PF_SUPERPRIV when checking capability
In the patch to be fixed, horizontal_backporch_byte become too large
for some panel, so roll back that patch. For small hfp or hbp panel,
using vm->hfront_porch + vm->hback_porch to calculate
horizontal_backporch_byte would make it negtive, so
use horizontal_backporch_byte itself to make it positive.
Fixes: 35bf948f1e ("drm/mediatek: dsi: Fix scrolling of panel with small hfp or hbp")
Signed-off-by: CK Hu <ck.hu@mediatek.com>
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Tested-by: Bilal Wasim <bilal.wasim@imgtec.com>
Julian Wiedmann says:
====================
s390/qeth: fixes 2020-11-20
This brings several fixes for qeth's af_iucv-specific code paths.
Also one fix by Alexandra for the recently added BR_LEARNING_SYNC
support. We want to trust the feature indication bit, so that HW can
mask it out if there's any issues on their end.
====================
Link: https://lore.kernel.org/r/20201120090939.101406-1-jwi@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When qeth_iqd_tx_complete() detects that a TX buffer requires additional
async completion via QAOB, it might fail to replace the queue entry's
metadata (and ends up triggering recovery).
Assume now that the device gets torn down, overruling the recovery.
If the QAOB notification then arrives before the tear down has
sufficiently progressed, the buffer state is changed to
QETH_QDIO_BUF_HANDLED_DELAYED by qeth_qdio_handle_aob().
The tear down code calls qeth_drain_output_queue(), where
qeth_cleanup_handled_pending() will then attempt to replace such a
buffer _again_. If it succeeds this time, the buffer ends up dangling in
its replacement's ->next_pending list ... where it will never be freed,
since there's no further call to qeth_cleanup_handled_pending().
But the second attempt isn't actually needed, we can simply leave the
buffer on the queue and re-use it after a potential recovery has
completed. The qeth_clear_output_buffer() in qeth_drain_output_queue()
will ensure that it's in a clean state again.
Fixes: 72861ae792 ("qeth: recovery through asynchronous delivery")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The two expected notification sequences are
1. TX_NOTIFY_PENDING with a subsequent TX_NOTIFY_DELAYED_*, when
our TX completion code first observed the pending TX and the QAOB
then completes at a later time; or
2. TX_NOTIFY_OK, when qeth_qdio_handle_aob() picked up the QAOB
completion before our TX completion code even noticed that the TX
was pending.
But as qeth_iqd_tx_complete() and qeth_qdio_handle_aob() can run
concurrently, we may end up with a race that results in a sequence of
TX_NOTIFY_DELAYED_* followed by TX_NOTIFY_PENDING. Which would confuse
the af_iucv code in its tracking of pending transmits.
Rework the notification code, so that qeth_qdio_handle_aob() defers its
notification if the TX completion code is still active.
Fixes: b333293058 ("qeth: add support for af_iucv HiperSockets transport")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Calling into socket code is ugly already, at least check whether we are
dealing with the expected sk_family. Only looking at skb->protocol is
bound to cause troubles (consider eg. af_packet).
Fixes: b333293058 ("qeth: add support for af_iucv HiperSockets transport")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove workaround that supported early hardware implementations
of PNSO OC3. Rely on the 'enarf' feature bit instead.
Fixes: fa115adff2 ("s390/qeth: Detect PNSO OC3 capability")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
[jwi: use logical instead of bit-wise AND]
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexander Duyck says:
====================
tcp: Address issues with ECT0 not being set in DCTCP packets
This patch set is meant to address issues seen with SYN/ACK packets not
containing the ECT0 bit when DCTCP is configured as the congestion control
algorithm for a TCP socket.
A simple test using "tcpdump" and "test_progs -t bpf_tcp_ca" makes the
issue obvious. Looking at the packets will result in the SYN/ACK packet
with an ECT0 bit that does not match the other packets for the flow when
the congestion control agorithm is switch from the default. So for example
going from non-DCTCP to a DCTCP congestion control algorithm we will see
the SYN/ACK IPV6 header will not have ECT0 set while the other packets in
the flow will. Likewise if we switch from a default of DCTCP to cubic we
will see the ECT0 bit set in the SYN/ACK while the other packets in the
flow will not.
====================
Link: https://lore.kernel.org/r/160582070138.66684.11785214534154816097.stgit@localhost.localdomain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When setting congestion control via a BPF program it is seen that the
SYN/ACK for packets within a given flow will not include the ECT0 flag. A
bit of simple printk debugging shows that when this is configured without
BPF we will see the value INET_ECN_xmit value initialized in
tcp_assign_congestion_control however when we configure this via BPF the
socket is in the closed state and as such it isn't configured, and I do not
see it being initialized when we transition the socket into the listen
state. The result of this is that the ECT0 bit is configured based on
whatever the default state is for the socket.
Any easy way to reproduce this is to monitor the following with tcpdump:
tools/testing/selftests/bpf/test_progs -t bpf_tcp_ca
Without this patch the SYN/ACK will follow whatever the default is. If dctcp
all SYN/ACK packets will have the ECT0 bit set, and if it is not then ECT0
will be cleared on all SYN/ACK packets. With this patch applied the SYN/ACK
bit matches the value seen on the other packets in the given stream.
Fixes: 91b5b21c7c ("bpf: Add support for changing congestion control")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
An issue was recently found where DCTCP SYN/ACK packets did not have the
ECT bit set in the L3 header. A bit of code review found that the recent
change referenced below had gone though and added a mask that prevented the
ECN bits from being populated in the L3 header.
This patch addresses that by rolling back the mask so that it is only
applied to the flags coming from the incoming TCP request instead of
applying it to the socket tos/tclass field. Doing this the ECT bits were
restored in the SYN/ACK packets in my testing.
One thing that is not addressed by this patch set is the fact that
tcp_reflect_tos appears to be incompatible with ECN based congestion
avoidance algorithms. At a minimum the feature should likely be documented
which it currently isn't.
Fixes: ac8f1710c1 ("tcp: reflect tos value received in SYN to the socket")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull SCSI fixes from James Bottomley:
"Fixes for two fairly obscure but annoying when triggered races in
iSCSI"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: target: iscsi: Fix cmd abort fabric stop race
scsi: libiscsi: Fix NOP race condition
Pull io_uring fixes from Jens Axboe:
"Mostly regression or stable fodder:
- Disallow async path resolution of /proc/self
- Tighten constraints for segmented async buffered reads
- Fix double completion for a retry error case
- Fix for fixed file life times (Pavel)"
* tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block:
io_uring: order refnode recycling
io_uring: get an active ref_node from files_data
io_uring: don't double complete failed reissue request
mm: never attempt async page lock if we've transferred data already
io_uring: handle -EOPNOTSUPP on path resolution
proc: don't allow async path resolution of /proc/self components
If there is only one keyslot, then blk_ksm_init() computes
slot_hashtable_size=1 and log_slot_ht_size=0. This causes
blk_ksm_find_keyslot() to crash later because it uses
hash_ptr(key, log_slot_ht_size) to find the hash bucket containing the
key, and hash_ptr() doesn't support the bits == 0 case.
Fix this by making the hash table always have at least 2 buckets.
Tested by running:
kvm-xfstests -c ext4 -g encrypt -m inlinecrypt \
-o blk-crypto-fallback.num_keyslots=1
Fixes: 1b26283970 ("block: Keyslot Manager for Inline Encryption")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull xen fix from Juergen Gross:
"A single fix for avoiding WARN splats when booting a Xen guest with
nosmt"
* tag 'for-linus-5.10b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: don't unbind uninitialized lock_kicker_irq
In case when tcp socket received FIN after some data and the
parser haven't started before reading data caller will receive
an empty buffer. This behavior differs from plain TCP socket and
leads to special treating in user-space.
The flow that triggers the race is simple. Server sends small
amount of data right after the connection is configured to use TLS
and closes the connection. In this case receiver sees TLS Handshake
data, configures TLS socket right after Change Cipher Spec record.
While the configuration is in process, TCP socket receives small
Application Data record, Encrypted Alert record and FIN packet. So
the TCP socket changes sk_shutdown to RCV_SHUTDOWN and sk_flag with
SK_DONE bit set. The received data is not parsed upon arrival and is
never sent to user-space.
Patch unpauses parser directly if we have unparsed data in tcp
receive queue.
Fixes: fcf4793e27 ("tls: check RCV_SHUTDOWN in tls_wait_data")
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Link: https://lore.kernel.org/r/1605801588-12236-1-git-send-email-vfedorenko@novek.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull iommu fixes from Will Deacon:
"Two straightforward vt-d fixes:
- Fix boot when intel iommu initialisation fails under TXT (tboot)
- Fix intel iommu compilation error when DMAR is enabled without ATS
and temporarily update IOMMU MAINTAINERs entry"
* tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
MAINTAINERS: Temporarily add myself to the IOMMU entry
iommu/vt-d: Fix compile error with CONFIG_PCI_ATS not set
iommu/vt-d: Avoid panic if iommu init fails in tboot system
Pull MMC fixes from Ulf Hansson:
"A couple of MMC fixes:
- sdhci-of-arasan: Stabilize communication by fixing tap value configs
- sdhci-pci: Use SDR25 timing for HS mode for BYT-based Intel HWs"
* tag 'mmc-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-of-arasan: Issue DLL reset explicitly
mmc: sdhci-of-arasan: Use Mask writes for Tap delays
mmc: sdhci-of-arasan: Allow configuring zero tap values
mmc: sdhci-pci: Prefer SDR25 timing for High Speed mode for BYT-based Intel controllers
Pull sound fixes from Takashi Iwai:
"A collection of small fixes: the only core change is a minor error
code handling in the control API, and all the rest are device-specific
fixes, mostly quirks, fixups and ASoC Intel fixes.
It looks boring, and good so"
* tag 'sound-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: mixart: Fix mutex deadlock
ALSA: hda/ca0132: Fix compile warning without PCI
ASOC: Intel: kbl_rt5663_rt5514_max98927: Do not try to disable disabled clock
ALSA: usb-audio: Add delay quirk for all Logitech USB devices
ASoC: Intel: catpt: Correct clock selection for dai trigger
ASoC: Intel: catpt: Skip position update for unprepared streams
ASoC: qcom: lpass-platform: Fix memory leak
ASoC: Intel: KMB: Fix S24_LE configuration
ALSA: hda: Add Alderlake-S PCI ID and HDMI codec vid
ALSA: usb-audio: Use ALC1220-VB-DT mapping for ASUS ROG Strix TRX40 mobo
ALSA: firewire: Clean up a locking issue in copy_resp_to_buf()
ASoC: rt1015: increase the time to detect BCLK
ALSA: ctl: fix error path at adding user-defined element set
ALSA: hda/realtek - HP Headset Mic can't detect after boot
ALSA: hda/realtek - Add supported mute Led for HP
ALSA: hda/realtek: Add some Clove SSID in the ALC293(ALC1220)
ALSA: hda/realtek - Add supported for Lenovo ThinkPad Headset Button
ASoC: rt1015: add delay to fix pop noise from speaker
Pull drm fixes from Dave Airlie:
"Weekly fixes pull.
This contains some fixes for sun4i/dw-hdmi probing, then amdgpu
enables arcturus hw without experimental flag and two other fixes and
a group of i915 fixes.
It also has a backported from next fix for the warn on reported in
ast/drm_gem_vram_helper code in the merge window. There's a separate
report which initially looked to be the same problem, but I'm going to
chase that up next week a bit more as I don't think the bisect landed
anywhere useful.
Summary:
core:
- vram helper TTM regression fix
amdgpu:
- Pageflip fix for navi1x with 5 or 6 displays
- Remove experimental flag for Arcturus
- Fix regression in atomic commit tail rework
i915:
- Fix tgl power gating issue
- Memory leak fixes
- Selftest fixes
- Display bpc fix
- Fix TGL MOCS for PTE tracking
dw-hdmi:
- probing fix
sun4i:
- probing fix"
* tag 'drm-fixes-2020-11-20-2' of git://anongit.freedesktop.org/drm/drm:
drm/i915/gt: Fixup tgl mocs for PTE tracking
drm/vram-helper: Fix use of top-down placement
drm/i915/gt: Remember to free the virtual breadcrumbs
drm/i915: Handle max_bpc==16
drm/amd/display: Always get CRTC updated constant values inside commit tail
drm/sun4i: backend: Fix probe failure with multiple backends
drm/sun4i: dw-hdmi: fix error return code in sun8i_dw_hdmi_bind()
drm/i915/selftests: Fix wrong return value of perf_request_latency()
drm/i915/selftests: Fix wrong return value of perf_series_engines()
drm/i915: Avoid memory leak with more than 16 workarounds on a list
drm/i915/tgl: Fix Media power gate sequence.
drm/amdgpu: remove experimental flag from arcturus
drm/amd/display: Add missing pflip irq for dcn2.0
drm/i915/gvt: return error when failing to take the module reference
drm: bridge: dw-hdmi: Avoid resetting force in the detect function
drm/i915/gvt: Set ENHANCED_FRAME_CAP bit
drm/i915/gvt: Temporarily disable vfio_edid for BXT/APL
I've discovered that due to the recent commit 49d7d695ca ("spi: dw:
Explicitly de-assert CS on SPI transfer completion") a concurrent usage of
the spidev devices with different chip-selects causes the "SPI transfer
timed out" error. The root cause of the problem has turned to be in a race
condition of the SPI-transfer execution procedure and the spi_setup()
method being called at the same time. In particular in calling the
spi_set_cs(false) while there is an SPI-transfer being executed. In my
case due to the commit cited above all CSs get to be switched off by
calling the spi_setup() for /dev/spidev0.1 while there is an concurrent
SPI-transfer execution performed on /dev/spidev0.0. Of course a situation
of the spi_setup() being called while there is an SPI-transfer being
executed for two different SPI peripheral devices of the same controller
may happen not only for the spidev driver, but for instance for MMC SPI +
some another device, or spi_setup() being called from an SPI-peripheral
probe method while some other device has already been probed and is being
used by a corresponding driver...
Of course I could have provided a fix affecting the DW APB SSI driver
only, for instance, by creating a mutual exclusive access to the set_cs
callback and setting/clearing only the bit responsible for the
corresponding chip-select. But after a short research I've discovered that
the problem most likely affects a lot of the other drivers:
- drivers/spi/spi-sun4i.c - RMW the chip-select register;
- drivers/spi/spi-rockchip.c - RMW the chip-select register;
- drivers/spi/spi-qup.c - RMW a generic force-CS flag in a CSR.
- drivers/spi/spi-sifive.c - set a generic CS-mode flag in a CSR.
- drivers/spi/spi-bcm63xx-hsspi.c - uses an internal mutex to serialize
the bus config changes, but still isn't protected from the race
condition described above;
- drivers/spi/spi-geni-qcom.c - RMW a chip-select internal flag and set the
CS state in HW;
- drivers/spi/spi-orion.c - RMW a chip-select register;
- drivers/spi/spi-cadence.c - RMW a chip-select register;
- drivers/spi/spi-armada-3700.c - RMW a chip-select register;
- drivers/spi/spi-lantiq-ssc.c - overwrites the chip-select register;
- drivers/spi/spi-sun6i.c - RMW a chip-select register;
- drivers/spi/spi-synquacer.c - RMW a chip-select register;
- drivers/spi/spi-altera.c - directly sets the chip-select state;
- drivers/spi/spi-omap2-mcspi.c - RMW an internally cached CS state and
writes it to HW;
- drivers/spi/spi-mt65xx.c - RMW some CSR;
- drivers/spi/spi-jcore.c - directly sets the chip-selects state;
- drivers/spi/spi-mt7621.c - RMW a chip-select register;
I could have missed some drivers, but a scale of the problem is obvious.
As you can see most of the drivers perform an unprotected
Read-modify-write chip-select register modification in the set_cs callback.
Seeing the spi_setup() function is calling the spi_set_cs() and it can be
executed concurrently with SPI-transfers exec procedure, which also calls
spi_set_cs() in the SPI core spi_transfer_one_message() method, the race
condition of the register modification turns to be obvious.
To sum up the problem denoted above affects each driver for a controller
having more than one chip-select lane and which:
1) performs the RMW to some CS-related register with no serialization;
2) directly disables any CS on spi_set_cs(dev, false).
* the later is the case of the DW APB SSI driver.
The controllers which equipped with a single CS theoretically can also
experience the problem, but in practice will not since normally the
spi_setup() isn't called concurrently with the SPI-transfers executed on
the same SPI peripheral device.
In order to generically fix the denoted bug I'd suggest to serialize an
access to the controller IO by taking the IO mutex in the spi_setup()
callback. The mutex is held while there is an SPI communication going on
on the SPI-bus of the corresponding SPI-controller. So calling the
spi_setup() method and disabling/updating the CS state within it would be
safe while there is no any SPI-transfers being executed. Also note I
suppose it would be safer to protect the spi_controller->setup() callback
invocation too, seeing some of the SPI-controller drivers update a HW
state in there.
Fixes: 49d7d695ca ("spi: dw: Explicitly de-assert CS on SPI transfer completion")
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Link: https://lore.kernel.org/r/20201117094517.5654-1-Sergey.Semin@baikalelectronics.ru
Signed-off-by: Mark Brown <broonie@kernel.org>
Commit 2f964780c0 ("USB: core: replace %p with %pK") used the %pK
format specifier for a bunch of __user pointers. But as the 'K' in
the specifier indicates, it is meant for kernel pointers. The reason
for the %pK specifier is to avoid leaks of kernel addresses, but when
the pointer is to an address in userspace the security implications
are minimal. In particular, no kernel information is leaked.
This patch changes the __user %pK specifiers (used in a bunch of
debugging output lines) to %px, which will always print the actual
address with no mangling. (Notably, there is no printk format
specifier particularly intended for __user pointers.)
Fixes: 2f964780c0 ("USB: core: replace %p with %pK")
CC: Vamsi Krishna Samavedam <vskrishn@codeaurora.org>
CC: <stable@vger.kernel.org>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20201119170228.GB576844@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 3e4f8e21c4 ("USB: core: fix check for duplicate endpoints")
aimed to make the USB stack more reliable by detecting and skipping
over endpoints that are duplicated between interfaces. This caused a
regression for a Hercules audio card (reported as Bugzilla #208357),
which contains such non-compliant duplications. Although the
duplications are harmless, skipping the valid endpoints prevented the
device from working.
This patch fixes the regression by adding ENDPOINT_IGNORE quirks for
the Hercules card, telling the kernel to ignore the invalid duplicate
endpoints and thereby allowing the valid endpoints to be used as
intended.
Fixes: 3e4f8e21c4 ("USB: core: fix check for duplicate endpoints")
CC: <stable@vger.kernel.org>
Reported-by: Alexander Chalikiopoulos <bugzilla.kernel.org@mrtoasted.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20201119170040.GA576844@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a USB_QUIRK_DISCONNECT_SUSPEND quirk for the Lenovo TIO built-in
usb-audio. when A630Z going into S3,the system immediately wakeup 7-8
seconds later by usb-audio disconnect interrupt to avoids the issue.
eg dmesg:
....
[ 626.974091 ] usb 7-1.1: USB disconnect, device number 3
....
....
[ 1774.486691] usb 7-1.1: new full-speed USB device number 5 using xhci_hcd
[ 1774.947742] usb 7-1.1: New USB device found, idVendor=17ef, idProduct=a012, bcdDevice= 0.55
[ 1774.956588] usb 7-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1774.964339] usb 7-1.1: Product: Thinkcentre TIO24Gen3 for USB-audio
[ 1774.970999] usb 7-1.1: Manufacturer: Lenovo
[ 1774.975447] usb 7-1.1: SerialNumber: 000000000000
[ 1775.048590] usb 7-1.1: 2:1: cannot get freq at ep 0x1
.......
Seeking a better fix, we've tried a lot of things, including:
- Check that the device's power/wakeup is disabled
- Check that remote wakeup is off at the USB level
- All the quirks in drivers/usb/core/quirks.c
e.g. USB_QUIRK_RESET_RESUME,
USB_QUIRK_RESET,
USB_QUIRK_IGNORE_REMOTE_WAKEUP,
USB_QUIRK_NO_LPM.
but none of that makes any difference.
There are no errors in the logs showing any suspend/resume-related issues.
When the system wakes up due to the modem, log-wise it appears to be a
normal resume.
Introduce a quirk to disable the port during suspend when the modem is
detected.
Signed-off-by: penghao <penghao@uniontech.com>
Link: https://lore.kernel.org/r/20201118123039.11696-1-penghao@uniontech.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
of_count_icc_providers() function uses for_each_available_child_of_node()
helper to recursively check all the available nodes. This helper already
properly handles child nodes' reference count, so there is no need to do
it explicitly. Remove the excessive call to of_node_put(). This fixes
memory trashing when CONFIG_OF_DYNAMIC is enabled (for example
arm/multi_v7_defconfig).
Fixes: b1d681d8d3 ("interconnect: Add sync state support")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20201119103746.32564-1-m.szyprowski@samsung.com
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
The following errors are noticed during boot on a QCS404 board:
[ 2.926647] qcom_icc_rpm_smd_send mas 6 error -6
[ 2.934573] qcom_icc_rpm_smd_send mas 8 error -6
These errors show when we try to configure the GPU and display nodes.
Since these particular nodes aren't supported on RPM and are purely
local, we should just change their mas_rpm_id to -1 to avoid any
requests being sent for these master IDs.
Reviewed-by: Mike Tipton <mdtipton@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20201118111044.26056-1-georgi.djakov@linaro.org
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
Some nodes are incorrectly marked as RPM-controlled (they have RPM
master and slave ids assigned), but are actually controlled by the
application CPU instead. The RPM complains when we send requests for
resources that it can't control. Let's fix this by replacing the IDs,
with the default "-1" in which case no requests are sent.
Reviewed-by: Mike Tipton <mdtipton@codeaurora.org>
Link: https://lore.kernel.org/r/20201112105140.10092-1-georgi.djakov@linaro.org
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
x86 Hyper-V used to essentially always overwrite the effective cache type
of guest memory accesses to WB. This was problematic in cases where there
is a physical device assigned to the VM, since that often requires that
the VM should have control over cache types. Thus, on newer Hyper-V since
2018, Hyper-V always honors the VM's cache type, but unexpectedly Linux VM
users start to complain that Linux VM's VRAM becomes very slow, and it
turns out that Linux VM should not map the VRAM uncacheable by ioremap().
Fix this slowness issue by using ioremap_cache().
On ARM64, ioremap_cache() is also required as the host also maps the VRAM
cacheable, otherwise VM Connect can't display properly with ioremap() or
ioremap_wc().
With this change, the VRAM on new Hyper-V is as fast as regular RAM, so
it's no longer necessary to use the hacks we added to mitigate the
slowness, i.e. we no longer need to allocate physical memory and use
it to back up the VRAM in Generation-1 VM, and we also no longer need to
allocate physical memory to back up the framebuffer in a Generation-2 VM
and copy the framebuffer to the real VRAM. A further big change will
address these for v5.11.
Fixes: 68a2d20b79 ("drivers/video: add Hyper-V Synthetic Video Frame Buffer Driver")
Tested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/20201118000305.24797-1-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The probe function is only supposed to initialize the controller
hardware but not the ECC engine. Indeed, we don't know anything about
the NAND chip(s) at this stage. Let's move the logic initializing the
ECC engine, even pretty simple, to the ->attach_chip() hook which gets
called during nand_scan() routine, after the NAND chip discovery. As
the previously mentioned logic is supposed to parse the DT for us, it
is likely that the chip->ecc.* entries be overwritten. So let's avoid
this by moving these lines to ->attach_chip().
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201113123424.32233-20-miquel.raynal@bootlin.com
The probe function is only supposed to initialize the controller
hardware but not the ECC engine. Indeed, we don't know anything about
the NAND chip(s) at this stage. Let's move the logic initializing the
ECC engine, even pretty simple, to the ->attach_chip() hook which gets
called during nand_scan() routine, after the NAND chip discovery. As
the previously mentioned logic is supposed to parse the DT for us, it
is likely that the chip->ecc.* entries be overwritten. So let's avoid
this by moving these lines to ->attach_chip().
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
The probe function is only supposed to initialize the controller
hardware but not the ECC engine. Indeed, we don't know anything about
the NAND chip(s) at this stage. Let's move the logic initializing the
ECC engine, even pretty simple, to the ->attach_chip() hook which gets
called during nand_scan() routine, after the NAND chip discovery. As
the previously mentioned logic is supposed to parse the DT for us, it
is likely that the chip->ecc.* entries be overwritten. So let's avoid
this by moving these lines to ->attach_chip().
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Link: https://lore.kernel.org/linux-mtd/20201113123424.32233-18-miquel.raynal@bootlin.com
The idea of the warning in ext4_update_dx_flag() is that we should warn
when we are clearing EXT4_INODE_INDEX on a filesystem with metadata
checksums enabled since after clearing the flag, checksums for internal
htree nodes will become invalid. So there's no need to warn (or actually
do anything) when EXT4_INODE_INDEX is not set.
Link: https://lore.kernel.org/r/20201118153032.17281-1-jack@suse.cz
Fixes: 48a3431195 ("ext4: fix checksum errors with indexed dirs")
Reported-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
If UFS host device is in runtime-suspended state while UFS shutdown
callback is invoked, UFS device shall be resumed for register
accesses. Currently only UFS local runtime resume function will be invoked
to wake up the host. This is not enough because if someone triggers
runtime resume from block layer, then race may happen between shutdown and
runtime resume flow, and finally lead to unlocked register access.
To fix this, in ufshcd_shutdown(), use pm_runtime_get_sync() instead of
resuming UFS device by ufshcd_runtime_resume() "internally" to let runtime
PM framework manage the whole resume flow.
Link: https://lore.kernel.org/r/20201119062916.12931-1-stanley.chu@mediatek.com
Fixes: 57d104c153 ("ufs: add UFS power management support")
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
- Fix tgl power gating issue (Rodrigo)
- Memory leak fixes (Tvrtko, Chris)
- Selftest fixes (Zhang)
- Display bpc fix (Ville)
- Fix TGL MOCS for PTE tracking (Chris)
GVT Fixes: It temporarily disables VFIO edid
feature on BXT/APL until its virtual display is really fixed to make
it work properly. And fixes for DPCD 1.2 and error return in taking
module reference.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201119203417.GA1795798@intel.com
This reverts commit 6ff646b2ce.
Your maintainer committed a major braino in the rmap code by adding the
attr fork, bmbt, and unwritten extent usage bits into rmap record key
comparisons. While XFS uses the usage bits *in the rmap records* for
cross-referencing metadata in xfs_scrub and xfs_repair, it only needs
the owner and offset information to distinguish between reverse mappings
of the same physical extent into the data fork of a file at multiple
offsets. The other bits are not important for key comparisons for index
lookups, and never have been.
Eric Sandeen reports that this causes regressions in generic/299, so
undo this patch before it does more damage.
Reported-by: Eric Sandeen <sandeen@sandeen.net>
Fixes: 6ff646b2ce ("xfs: fix rmap key and record comparison functions")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.10-rc5, including fixes from the WiFi
(mac80211), can and bpf (including the strncpy_from_user fix).
Current release - regressions:
- mac80211: fix memory leak of filtered powersave frames
- mac80211: free sta in sta_info_insert_finish() on errors to avoid
sleeping in atomic context
- netlabel: fix an uninitialized variable warning added in -rc4
Previous release - regressions:
- vsock: forward all packets to the host when no H2G is registered,
un-breaking AWS Nitro Enclaves
- net: Exempt multicast addresses from five-second neighbor lifetime
requirement, decreasing the chances neighbor tables fill up
- net/tls: fix corrupted data in recvmsg
- qed: fix ILT configuration of SRC block
- can: m_can: process interrupt only when not runtime suspended
Previous release - always broken:
- page_frag: Recover from memory pressure by not recycling pages
allocating from the reserves
- strncpy_from_user: Mask out bytes after NUL terminator
- ip_tunnels: Set tunnel option flag only when tunnel metadata is
present, always setting it confuses Open vSwitch
- bpf, sockmap:
- Fix partial copy_page_to_iter so progress can still be made
- Fix socket memory accounting and obeying SO_RCVBUF
- net: Have netpoll bring-up DSA management interface
- net: bridge: add missing counters to ndo_get_stats64 callback
- tcp: brr: only postpone PROBE_RTT if RTT is < current min_rtt
- enetc: Workaround MDIO register access HW bug
- net/ncsi: move netlink family registration to a subsystem init,
instead of tying it to driver probe
- net: ftgmac100: unregister NC-SI when removing driver to avoid
crash
- lan743x:
- prevent interrupt storm on open
- fix freeing skbs in the wrong context
- net/mlx5e: Fix socket refcount leak on kTLS RX resync
- net: dsa: mv88e6xxx: Avoid VLAN database corruption on 6097
- fix 21 unset return codes and other mistakes on error paths, mostly
detected by the Hulk Robot"
* tag 'net-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (115 commits)
fail_function: Remove a redundant mutex unlock
selftest/bpf: Test bpf_probe_read_user_str() strips trailing bytes after NUL
lib/strncpy_from_user.c: Mask out bytes after NUL terminator.
net/smc: fix direct access to ib_gid_addr->ndev in smc_ib_determine_gid()
net/smc: fix matching of existing link groups
ipv6: Remove dependency of ipv6_frag_thdr_truncated on ipv6 module
libbpf: Fix VERSIONED_SYM_COUNT number parsing
net/mlx4_core: Fix init_hca fields offset
atm: nicstar: Unmap DMA on send error
page_frag: Recover from memory pressure
net: dsa: mv88e6xxx: Wait for EEPROM done after HW reset
mlxsw: core: Use variable timeout for EMAD retries
mlxsw: Fix firmware flashing
net: Have netpoll bring-up DSA management interface
atl1e: fix error return code in atl1e_probe()
atl1c: fix error return code in atl1c_probe()
ah6: fix error return code in ah6_input()
net: usb: qmi_wwan: Set DTR quirk for MR400
can: m_can: process interrupt only when not runtime suspended
can: flexcan: flexcan_chip_start(): fix erroneous flexcan_transceiver_enable() during bus-off recovery
...
Pull rdma fixes from Jason Gunthorpe:
"The last two weeks have been quiet here, just the usual smattering of
long standing bug fixes.
A collection of error case bug fixes:
- Improper nesting of spinlock types in cm
- Missing error codes and kfree()
- Ensure dma_virt_ops users have the right kconfig symbols to work
properly
- Compilation failure of tools/testing"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
tools/testing/scatterlist: Fix test to compile and run
IB/hfi1: Fix error return code in hfi1_init_dd()
RMDA/sw: Don't allow drivers using dma_virt_ops on highmem configs
RDMA/pvrdma: Fix missing kfree() in pvrdma_register_device()
RDMA/cm: Make the local_id_table xarray non-irq
The probe function is only supposed to initialize the controller
hardware but not the ECC engine. Indeed, we don't know anything about
the NAND chip(s) at this stage. Let's move the logic initializing the
ECC engine, even pretty simple, to the ->attach_chip() hook which gets
called during nand_scan() routine, after the NAND chip discovery. As
the previously mentioned logic is supposed to parse the DT for us, it
is likely that the chip->ecc.* entries be overwritten. So let's avoid
this by moving these lines to ->attach_chip().
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201113123424.32233-17-miquel.raynal@bootlin.com
The probe function is only supposed to initialize the controller
hardware but not the ECC engine. Indeed, we don't know anything about
the NAND chip(s) at this stage. Let's move the logic initializing the
ECC engine, even pretty simple, to the ->attach_chip() hook which gets
called during nand_scan() routine, after the NAND chip discovery. As
the previously mentioned logic is supposed to parse the DT for us, it
is likely that the chip->ecc.* entries be overwritten. So let's avoid
this by moving these lines to ->attach_chip().
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201113123424.32233-16-miquel.raynal@bootlin.com
The probe function is only supposed to initialize the controller
hardware but not the ECC engine. Indeed, we don't know anything about
the NAND chip(s) at this stage. Let's move the logic initializing the
ECC engine, even pretty simple, to the ->attach_chip() hook which gets
called during nand_scan() routine, after the NAND chip discovery. As
the previously mentioned logic is supposed to parse the DT for us, it
is likely that the chip->ecc.* entries be overwritten. So let's avoid
this by moving these lines to ->attach_chip().
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201113123424.32233-15-miquel.raynal@bootlin.com
The probe function is only supposed to initialize the controller
hardware but not the ECC engine. Indeed, we don't know anything about
the NAND chip(s) at this stage. Let's move the logic initializing the
ECC engine, even pretty simple, to the ->attach_chip() hook which gets
called during nand_scan() routine, after the NAND chip discovery. As
the previously mentioned logic is supposed to parse the DT for us, it
is likely that the chip->ecc.* entries be overwritten. So let's avoid
this by moving these lines to ->attach_chip().
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201113123424.32233-14-miquel.raynal@bootlin.com
The probe function is only supposed to initialize the controller
hardware but not the ECC engine. Indeed, we don't know anything about
the NAND chip(s) at this stage. Let's move the logic initializing the
ECC engine, even pretty simple, to the ->attach_chip() hook which gets
called during nand_scan() routine, after the NAND chip discovery. As
the previously mentioned logic is supposed to parse the DT for us, it
is likely that the chip->ecc.* entries be overwritten. So let's avoid
this by moving these lines to ->attach_chip().
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201113123424.32233-13-miquel.raynal@bootlin.com
The probe function is only supposed to initialize the controller
hardware but not the ECC engine. Indeed, we don't know anything about
the NAND chip(s) at this stage. Let's move the logic initializing the
ECC engine, even pretty simple, to the ->attach_chip() hook which gets
called during nand_scan() routine, after the NAND chip discovery. As
the previously mentioned logic is supposed to parse the DT for us, it
is likely that the chip->ecc.* entries be overwritten. So let's avoid
this by moving these lines to ->attach_chip().
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201113123424.32233-12-miquel.raynal@bootlin.com
The probe function is only supposed to initialize the controller
hardware but not the ECC engine. Indeed, we don't know anything about
the NAND chip(s) at this stage. Let's move the logic initializing the
ECC engine, even pretty simple, to the ->attach_chip() hook which gets
called during nand_scan() routine, after the NAND chip discovery. As
the previously mentioned logic is supposed to parse the DT for us, it
is likely that the chip->ecc.* entries be overwritten. So let's avoid
this by moving these lines to ->attach_chip().
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Vladimir Zapolskiy <vz@mleia.com>
Cc: Sylvain Lemieux <slemieux.tyco@gmail.com>
The probe function is only supposed to initialize the controller
hardware but not the ECC engine. Indeed, we don't know anything about
the NAND chip(s) at this stage. Let's move the logic initializing the
ECC engine, even pretty simple, to the ->attach_chip() hook which gets
called during nand_scan() routine, after the NAND chip discovery. As
the previously mentioned logic is supposed to parse the DT for us, it
is likely that the chip->ecc.* entries be overwritten. So let's avoid
this by moving these lines to ->attach_chip().
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Vladimir Zapolskiy <vz@mleia.com>
Cc: Sylvain Lemieux <slemieux.tyco@gmail.com>
Link: https://lore.kernel.org/linux-mtd/20201113123424.32233-10-miquel.raynal@bootlin.com
The probe function is only supposed to initialize the controller
hardware but not the ECC engine. Indeed, we don't know anything about
the NAND chip(s) at this stage. Let's move the logic initializing the
ECC engine, even pretty simple, to the ->attach_chip() hook which gets
called during nand_scan() routine, after the NAND chip discovery. As
the previously mentioned logic is supposed to parse the DT for us, it
is likely that the chip->ecc.* entries be overwritten. So let's avoid
this by moving these lines to ->attach_chip().
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
The probe function is only supposed to initialize the controller
hardware but not the ECC engine. Indeed, we don't know anything about
the NAND chip(s) at this stage. Let's move the logic initializing the
ECC engine, even pretty simple, to the ->attach_chip() hook which gets
called during nand_scan() routine, after the NAND chip discovery. As
the previously mentioned logic is supposed to parse the DT for us, it
is likely that the chip->ecc.* entries be overwritten. So let's avoid
this by moving these lines to ->attach_chip().
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201113123424.32233-8-miquel.raynal@bootlin.com
The probe function is only supposed to initialize the controller
hardware but not the ECC engine. Indeed, we don't know anything about
the NAND chip(s) at this stage. Let's move the logic initializing the
ECC engine, even pretty simple, to the ->attach_chip() hook which gets
called during nand_scan() routine, after the NAND chip discovery. As
the previously mentioned logic is supposed to parse the DT for us, it
is likely that the chip->ecc.* entries be overwritten. So let's avoid
this by moving these lines to ->attach_chip().
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Link: https://lore.kernel.org/linux-mtd/20201113123424.32233-7-miquel.raynal@bootlin.com
The probe function is only supposed to initialize the controller
hardware but not the ECC engine. Indeed, we don't know anything about
the NAND chip(s) at this stage. Let's move the logic initializing the
ECC engine, even pretty simple, to the ->attach_chip() hook which gets
called during nand_scan() routine, after the NAND chip discovery. As
the previously mentioned logic is supposed to parse the DT for us, it
is likely that the chip->ecc.* entries be overwritten. So let's avoid
this by moving these lines to ->attach_chip().
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
The options in /proc/mounts must be valid mount options --- and
fast_commit is not a mount option. Otherwise, command sequences like
this will fail:
# mount /dev/vdc /vdc
# mkdir -p /vdc/phoronix_test_suite /pts
# mount --bind /vdc/phoronix_test_suite /pts
# mount -o remount,nodioread_nolock /pts
mount: /pts: mount point not mounted or bad option.
And in the system logs, you'll find:
EXT4-fs (vdc): Unrecognized mount option "fast_commit" or missing value
Fixes: 995a3ed67f ("ext4: add fast_commit feature and handling for extended mount options")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The probe function is only supposed to initialize the controller
hardware but not the ECC engine. Indeed, we don't know anything about
the NAND chip(s) at this stage. Let's move the logic initializing the
ECC engine, even pretty simple, to the ->attach_chip() hook which gets
called during nand_scan() routine, after the NAND chip discovery. As
the previously mentioned logic is supposed to parse the DT for us, it
is likely that the chip->ecc.* entries be overwritten. So let's avoid
this by moving these lines to ->attach_chip().
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201113123424.32233-5-miquel.raynal@bootlin.com
The probe function is only supposed to initialize the controller
hardware but not the ECC engine. Indeed, we don't know anything about
the NAND chip(s) at this stage. Let's move the logic initializing the
ECC engine, even pretty simple, to the ->attach_chip() hook which gets
called during nand_scan() routine, after the NAND chip discovery. As
the previously mentioned logic is supposed to parse the DT for us, it
is likely that the chip->ecc.* entries be overwritten. So let's avoid
this by moving these lines to ->attach_chip().
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201113123424.32233-4-miquel.raynal@bootlin.com
The probe function is only supposed to initialize the controller
hardware but not the ECC engine. Indeed, we don't know anything about
the NAND chip(s) at this stage. Let's move the logic initializing the
ECC engine, even pretty simple, to the ->attach_chip() hook which gets
called during nand_scan() routine, after the NAND chip discovery. As
the previously mentioned logic is supposed to parse the DT for us, it
is likely that the chip->ecc.* entries be overwritten. So let's avoid
this by moving these lines to ->attach_chip(), a NAND controller
hook.
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201113123424.32233-3-miquel.raynal@bootlin.com
The probe function is only supposed to initialize the controller
hardware but not the ECC engine. Indeed, we don't know anything about
the NAND chip(s) at this stage. Let's move the logic initializing the
ECC engine, even pretty simple, to the ->attach_chip() hook which gets
called during nand_scan() routine, after the NAND chip discovery. As
the previously mentioned logic is supposed to parse the DT for us, it
is likely that the chip->ecc.* entries be overwritten. So let's avoid
this by moving these lines to ->attach_chip(), a NAND controller
hook.
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/linux-mtd/20201113123424.32233-2-miquel.raynal@bootlin.com
Alexei Starovoitov says:
====================
1) libbpf should not attempt to load unused subprogs, from Andrii.
2) Make strncpy_from_user() mask out bytes after NUL terminator, from Daniel.
3) Relax return code check for subprograms in the BPF verifier, from Dmitrii.
4) Fix several sockmap issues, from John.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
fail_function: Remove a redundant mutex unlock
selftest/bpf: Test bpf_probe_read_user_str() strips trailing bytes after NUL
lib/strncpy_from_user.c: Mask out bytes after NUL terminator.
libbpf: Fix VERSIONED_SYM_COUNT number parsing
bpf, sockmap: Avoid failures from skb_to_sgvec when skb has frag_list
bpf, sockmap: Handle memory acct if skb_verdict prog redirects to self
bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to self
bpf, sockmap: Use truesize with sk_rmem_schedule()
bpf, sockmap: Ensure SO_RCVBUF memory is observed on ingress redirect
bpf, sockmap: Fix partial copy_page_to_iter so progress can still be made
selftests/bpf: Fix error return code in run_getsockopt_test()
bpf: Relax return code check for subprograms
tools, bpftool: Add missing close before bpftool net attach exit
MAINTAINERS/bpf: Update Andrii's entry.
selftests/bpf: Fix unused attribute usage in subprogs_unused test
bpf: Fix unsigned 'datasec_id' compared with zero in check_pseudo_btf_id
bpf: Fix passing zero to PTR_ERR() in bpf_btf_printf_prepare
libbpf: Don't attempt to load unused subprog as an entry-point BPF program
====================
Link: https://lore.kernel.org/r/20201119200721.288-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Xu says:
====================
6ae08ae3de ("bpf: Add probe_read_{user, kernel} and probe_read_{user,
kernel}_str helpers") introduced a subtle bug where
bpf_probe_read_user_str() would potentially copy a few extra bytes after
the NUL terminator.
This issue is particularly nefarious when strings are used as map keys,
as seemingly identical strings can occupy multiple entries in a map.
This patchset fixes the issue and introduces a selftest to prevent
future regressions.
v6 -> v7:
* Add comments
v5 -> v6:
* zero-pad up to sizeof(unsigned long) after NUL
v4 -> v5:
* don't read potentially uninitialized memory
v3 -> v4:
* directly pass userspace pointer to prog
* test more strings of different length
v2 -> v3:
* set pid filter before attaching prog in selftest
* use long instead of int as bpf_probe_read_user_str() retval
* style changes
v1 -> v2:
* add Fixes: tag
* add selftest
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
do_strncpy_from_user() may copy some extra bytes after the NUL
terminator into the destination buffer. This usually does not matter for
normal string operations. However, when BPF programs key BPF maps with
strings, this matters a lot.
A BPF program may read strings from user memory by calling the
bpf_probe_read_user_str() helper which eventually calls
do_strncpy_from_user(). The program can then key a map with the
destination buffer. BPF map keys are fixed-width and string-agnostic,
meaning that map keys are treated as a set of bytes.
The issue is when do_strncpy_from_user() overcopies bytes after the NUL
terminator, it can result in seemingly identical strings occupying
multiple slots in a BPF map. This behavior is subtle and totally
unexpected by the user.
This commit masks out the bytes following the NUL while preserving
long-sized stride in the fast path.
Fixes: 6ae08ae3de ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers")
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/21efc982b3e9f2f7b0379eed642294caaa0c27a7.1605642949.git.dxu@dxuuu.xyz
Pull powerpc fixes from Michael Ellerman:
"Fixes for CVE-2020-4788.
From Daniel's cover letter:
IBM Power9 processors can speculatively operate on data in the L1
cache before it has been completely validated, via a way-prediction
mechanism. It is not possible for an attacker to determine the
contents of impermissible memory using this method, since these
systems implement a combination of hardware and software security
measures to prevent scenarios where protected data could be leaked.
However these measures don't address the scenario where an attacker
induces the operating system to speculatively execute instructions
using data that the attacker controls. This can be used for example to
speculatively bypass "kernel user access prevention" techniques, as
discovered by Anthony Steinhauser of Google's Safeside Project. This
is not an attack by itself, but there is a possibility it could be
used in conjunction with side-channels or other weaknesses in the
privileged code to construct an attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern.
This patch series flushes the L1 cache on kernel entry (patch 2) and
after the kernel performs any user accesses (patch 3). It also adds a
self-test and performs some related cleanups"
* tag 'powerpc-cve-2020-4788' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: rename pnv|pseries_setup_rfi_flush to _setup_security_mitigations
selftests/powerpc: refactor entry and rfi_flush tests
selftests/powerpc: entry flush test
powerpc: Only include kup-radix.h for 64-bit Book3S
powerpc/64s: flush L1D after user accesses
powerpc/64s: flush L1D on kernel entry
selftests/powerpc: rfi_flush: disable entry flush if present
Pull xtensa fixes from Max Filippov:
- fix placement of cache alias remapping area
- disable preemption around cache alias management calls
- add missing __user annotation to strncpy_from_user argument
* tag 'xtensa-20201119' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: uaccess: Add missing __user to strncpy_from_user() prototype
xtensa: disable preemption around cache alias management calls
xtensa: fix TLBTEMP area placement
Commit 7053e0eab4 ("drm/vram-helper: stop using TTM placement flags")
cleared the BO placement flags if top-down placement had been selected.
Hence, BOs that were supposed to go into VRAM are now placed in a default
location in system memory.
Trying to scanout the incorrectly pinned BO results in displayed garbage
and an error message.
[ 146.108127] ------------[ cut here ]------------
[ 146.1V08180] WARNING: CPU: 0 PID: 152 at drivers/gpu/drm/drm_gem_vram_helper.c:284 drm_gem_vram_offset+0x59/0x60 [drm_vram_helper]
...
[ 146.108591] ast_cursor_page_flip+0x3e/0x150 [ast]
[ 146.108622] ast_cursor_plane_helper_atomic_update+0x8a/0xc0 [ast]
[ 146.108654] drm_atomic_helper_commit_planes+0x197/0x4c0
[ 146.108699] drm_atomic_helper_commit_tail_rpm+0x59/0xa0
[ 146.108718] commit_tail+0x103/0x1c0
...
[ 146.109302] ---[ end trace d901a1ba1d949036 ]---
Fix the bug by keeping the placement flags. The top-down placement flag
is stored in a separate variable.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Fixes: 7053e0eab4 ("drm/vram-helper: stop using TTM placement flags")
Reported-by: Pu Wen <puwen@hygon.cn> [for 5.10-rc1]
Tested-by: Pu Wen <puwen@hygon.cn>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Link: https://patchwork.freedesktop.org/patch/msgid/20200921142536.4392-1-tzimmermann@suse.de
(cherry picked from commit b8f8dbf649)
[pulled into fixes from drm-next]
Signed-off-by: Dave Airlie <airlied@redhat.com>
Pull ACPI fixes from Rafael Wysocki:
"These fix recent regression in the APEI code and initialization issue
in the ACPI fan driver.
Specifics:
- Make the APEI code avoid attempts to obtain logical addresses for
registers located in the I/O address space to fix initialization
issues (Aili Yao)
- Fix sysfs attribute initialization in the ACPI fan driver (Guenter
Roeck)"
* tag 'acpi-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI, APEI, Fix error return value in apei_map_generic_address()
ACPI: fan: Initialize performance state sysfs attribute
Pull power management fixes from Rafael Wysocki:
"These fix two issues in ARM cpufreq drivers and one cpuidle driver
issue.
Specifics:
- Add missing RCU_NONIDLE() annotations to the Tegra cpuidle driver
(Dmitry Osipenko)
- Fix boot frequency computation in the tegra186 cpufreq driver (Jon
Hunter)
- Make the SCMI cpufreq driver register a dummy clock provider to
avoid OPP addition failures (Sudeep Holla)"
* tag 'pm-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: scmi: Fix OPP addition failure with a dummy clock provider
cpufreq: tegra186: Fix get frequency callback
cpuidle: tegra: Annotate tegra_pm_set_cpu_in_lp2() with RCU_NONIDLE
Pull spi fixes from Mark Brown:
"This is a relatively large set of fixes, the bulk of it being a series
from Lukas Wunner which fixes confusion with the lifetime of driver
data allocated along with the SPI controller structure that's been
created as part of the conversion to devm APIs.
The simplest fix, explained in detail in Lukas' commit message, is to
move to a devm_ function for allocation of the controller and hence
driver data in order to push the free of that after anything tries to
reference the driver data in the remove path. This results in a
relatively large diff due to the addition of a new function but isn't
particularly complex.
There's also a fix from Sven van Asbroeck which fixes yet more fallout
from the conflicts between the various different places one can
configure the polarity of GPIOs in modern systems.
Otherwise everything is fairly small and driver specific"
* tag 'spi-fix-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: npcm-fiu: Don't leak SPI master in probe error path
spi: dw: Set transfer handler before unmasking the IRQs
spi: cadence-quadspi: Fix error return code in cqspi_probe
spi: bcm2835aux: Restore err assignment in bcm2835aux_spi_probe
spi: lpspi: Fix use-after-free on unbind
spi: bcm-qspi: Fix use-after-free on unbind
spi: bcm2835aux: Fix use-after-free on unbind
spi: bcm2835: Fix use-after-free on unbind
spi: Introduce device-managed SPI controller allocation
spi: fsi: Fix transfer returning without finalizing message
spi: fix client driver breakages when using GPIO descriptors
Karsten Graul says:
====================
net/smc: fixes 2020-11-18
Patch 1 fixes the matching of link groups because with SMC-Dv2 the vlanid
should no longer be part of this matching. Patch 2 removes a sparse message.
====================
Link: https://lore.kernel.org/r/20201118214038.24039-1-kgraul@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sparse complaints 3 times about:
net/smc/smc_ib.c:203:52: warning: incorrect type in argument 1 (different address spaces)
net/smc/smc_ib.c:203:52: expected struct net_device const *dev
net/smc/smc_ib.c:203:52: got struct net_device [noderef] __rcu *const ndev
Fix that by using the existing and validated ndev variable instead of
accessing attr->ndev directly.
Fixes: 5102eca903 ("net/smc: Use rdma_read_gid_l2_fields to L2 fields")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With the multi-subnet support of SMC-Dv2 the match for existing link
groups should not include the vlanid of the network device.
Set ini->smcd_version accordingly before the call to smc_conn_create()
and use this value in smc_conn_create() to skip the vlanid check.
Fixes: 5c21c4ccaf ("net/smc: determine accepted ISM devices")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ASoC: Fixes for v5.11
A collection of driver specific fixes, mostly for x86 systems (or CODECs
used mostly on x86) and all for relatively minor issues, the biggest one
being fixing S24_LE format on Keem Bay systems.
Pull regulator fixes from Mark Brown:
"Mostly core fixes here, one set from Michał Mirosław which cleans up
some issues introduced as part of the coupled regulators work, one
memory leak during probe and two due to regulators which have an input
supply name and regulator name which are identical, which is very
unusual.
There's also a fix for our handling of the similarly unusual case
where we can't determine if a regulator is enabled during boot"
* tag 'regulator-fix-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: ti-abb: Fix array out of bound read access on the first transition
regulator: workaround self-referent regulators
regulator: avoid resolve_supply() infinite recursion
regulator: fix memory leak with repeated set_machine_constraints()
regulator: pfuze100: limit pfuze-support-disable-sw to pfuze{100,200}
regulator: core: don't disable regulator if is_enabled return error.
IPV6=m
NF_DEFRAG_IPV6=y
ld: net/ipv6/netfilter/nf_conntrack_reasm.o: in function
`nf_ct_frag6_gather':
net/ipv6/netfilter/nf_conntrack_reasm.c:462: undefined reference to
`ipv6_frag_thdr_truncated'
Netfilter is depending on ipv6 symbol ipv6_frag_thdr_truncated. This
dependency is forcing IPV6=y.
Remove this dependency by moving ipv6_frag_thdr_truncated out of ipv6. This
is the same solution as used with a similar issues: Referring to
commit 70b095c843 ("ipv6: remove dependency of nf_defrag_ipv6 on ipv6
module")
Fixes: 9d9e937b1c ("ipv6/netfilter: Discard first fragment not including all headers")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Georg Kohmann <geokohma@cisco.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/r/20201119095833.8409-1-geokohma@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull thermal fix from Daniel Lezcano:
"Disable the CPU PM notifier for OMAP4430 for suspend in order to
prevent wrong temperature leading to a critical shutdown (Peter
Ujfalusi)"
* tag 'thermal-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
thermal: ti-soc-thermal: Disable the CPU PM notifier for OMAP4430
The code change for switching to non-atomic mode brought the
unexpected mutex deadlock in get_msg(). It converted the spinlock
with the existing mutex, but there were calls with the already holding
the mutex. Since the only place that needs the extra lock is the code
path from snd_mixart_send_msg(), remove the mutex lock in get_msg()
and apply in the caller side for fixing the mutex deadlock.
Fixes: 8d3a8b5cb5 ("ALSA: mixart: Use nonatomic PCM ops")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20201119121440.18945-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Jens has reported a situation where partial direct IOs can be issued
and completed yet still return -EAGAIN. We don't want this to report
a short IO as we want XFS to complete user DIO entirely or not at
all.
This partial IO situation can occur on a write IO that is split
across an allocated extent and a hole, and the second mapping is
returning EAGAIN because allocation would be required.
The trivial reproducer:
$ sudo xfs_io -fdt -c "pwrite 0 4k" -c "pwrite -V 1 -b 8k -N 0 8k" /mnt/scr/foo
wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 0.0001 sec (27.509 MiB/sec and 7042.2535 ops/sec)
pwrite: Resource temporarily unavailable
$
The pwritev2(0, 8kB, RWF_NOWAIT) call returns EAGAIN having done
the first 4kB write:
xfs_file_direct_write: dev 259:1 ino 0x83 size 0x1000 offset 0x0 count 0x2000
iomap_apply: dev 259:1 ino 0x83 pos 0 length 8192 flags WRITE|DIRECT|NOWAIT (0x31) ops xfs_direct_write_iomap_ops caller iomap_dio_rw actor iomap_dio_actor
xfs_ilock_nowait: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_ilock_for_iomap
xfs_iunlock: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_direct_write_iomap_begin
xfs_iomap_found: dev 259:1 ino 0x83 size 0x1000 offset 0x0 count 8192 fork data startoff 0x0 startblock 24 blockcount 0x1
iomap_apply_dstmap: dev 259:1 ino 0x83 bdev 259:1 addr 102400 offset 0 length 4096 type MAPPED flags DIRTY
Here the first iomap loop has mapped the first 4kB of the file and
issued the IO, and we enter the second iomap_apply loop:
iomap_apply: dev 259:1 ino 0x83 pos 4096 length 4096 flags WRITE|DIRECT|NOWAIT (0x31) ops xfs_direct_write_iomap_ops caller iomap_dio_rw actor iomap_dio_actor
xfs_ilock_nowait: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_ilock_for_iomap
xfs_iunlock: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_direct_write_iomap_begin
And we exit with -EAGAIN out because we hit the allocate case trying
to make the second 4kB block.
Then IO completes on the first 4kB and the original IO context
completes and unlocks the inode, returning -EAGAIN to userspace:
xfs_end_io_direct_write: dev 259:1 ino 0x83 isize 0x1000 disize 0x1000 offset 0x0 count 4096
xfs_iunlock: dev 259:1 ino 0x83 flags IOLOCK_SHARED caller xfs_file_dio_aio_write
There are other vectors to the same problem when we re-enter the
mapping code if we have to make multiple mappinfs under NOWAIT
conditions. e.g. failing trylocks, COW extents being found,
allocation being required, and so on.
Avoid all these potential problems by only allowing IOMAP_NOWAIT IO
to go ahead if the mapping we retrieve for the IO spans an entire
allocated extent. This avoids the possibility of subsequent mappings
to complete the IO from triggering NOWAIT semantics by any means as
NOWAIT IO will now only enter the mapping code once per NOWAIT IO.
Reported-and-tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
We remove "other info" from "readelf -s --wide" output when
parsing GLOBAL_SYM_COUNT variable, which was added in [1].
But we don't do that for VERSIONED_SYM_COUNT and it's failing
the check_abi target on powerpc Fedora 33.
The extra "other info" wasn't problem for VERSIONED_SYM_COUNT
parsing until commit [2] added awk in the pipe, which assumes
that the last column is symbol, but it can be "other info".
Adding "other info" removal for VERSIONED_SYM_COUNT the same
way as we did for GLOBAL_SYM_COUNT parsing.
[1] aa915931ac ("libbpf: Fix readelf output parsing for Fedora")
[2] 746f534a48 ("tools/libbpf: Avoid counting local symbols in ABI check")
Fixes: 746f534a48 ("tools/libbpf: Avoid counting local symbols in ABI check")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201118211350.1493421-1-jolsa@kernel.org
Some users are pairing the Dinovo keyboards with the MX5000 or MX5500
receivers, instead of with the Dinovo receivers. The receivers are
mostly the same (and the air protocol obviously is compatible) but
currently the Dinovo receivers are handled by hid-lg.c while the
MX5x00 receivers are handled by logitech-dj.c.
When using a Dinovo keyboard, with its builtin touchpad, through
logitech-dj.c then the touchpad stops working because when asking the
receiver for paired devices, we get only 1 paired device with
a device_type of REPORT_TYPE_KEYBOARD. And since we don't see a paired
mouse, we have nowhere to send mouse-events to, so we drop them.
Extend the existing fix for the Dinovo Edge for this to also cover the
Dinovo Mini keyboard and also add a mapping to logitech-hidpp for the
Media key on the Dinovo Mini, so that that keeps working too.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1811424
Fixes: f2113c3020 ("HID: logitech-dj: add support for Logitech Bluetooth Mini-Receiver")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Fix an error in the mouse / INPUT(2) descriptor used for quad/bt2.0 combo
receivers. Replace INPUT with INPUT (Data,Var,Abs) for the field for the
4 extra buttons which share their report-byte with the low-res hwheel.
This is likely a copy and paste error. I've verified that the new
0x81, 0x02 value matches both the mouse descriptor for the currently
supported MX5000 / MX5500 receivers, as well as the INPUT(2) mouse
descriptors for the Dinovo receivers for which support is being
worked on.
Cc: stable@vger.kernel.org
Fixes: f2113c3020 ("HID: logitech-dj: add support for Logitech Bluetooth Mini-Receiver")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
The RaspberryPi4 has both a WiFi chip and HDMI outputs capable of doing
4k. Unfortunately, the 1440p resolution at 60Hz has a TMDS rate on the
HDMI cable right in the middle of the first Wifi channel.
Add a property to our HDMI controller, that could be reused by other
similar HDMI controllers, to allow the OS to take whatever measure is
necessary to avoid that crosstalk.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20201029134018.1948636-1-maxime@cerno.tech
The HDMI controller cannot go above a certain pixel rate limit depending on
the generations, but that limit is only enforced in mode_valid at the
moment, which means that we won't advertise modes that exceed that limit,
but the userspace is still free to try to setup a mode that would.
Implement atomic_check to make sure we check it in that scenario too.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201029122522.1917579-1-maxime@cerno.tech
pseries|pnv_setup_rfi_flush already does the count cache flush setup, and
we just added entry and uaccess flushes. So the name is not very accurate
any more. In both platforms we then also immediately setup the STF flush.
Rename them to _setup_security_mitigations and fold the STF flush in.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For simplicity in backporting, the original entry_flush test contained
a lot of duplicated code from the rfi_flush test. De-duplicate that code.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add a test modelled on the RFI flush test which counts the number
of L1D misses doing a simple syscall with the entry flush on and off.
For simplicity of backporting, this test duplicates a lot of code from
rfi_flush. We clean that up in the next patch.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In kup.h we currently include kup-radix.h for all 64-bit builds, which
includes Book3S and Book3E. The latter doesn't make sense, Book3E
never uses the Radix MMU.
This has worked up until now, but almost by accident, and the recent
uaccess flush changes introduced a build breakage on Book3E because of
the bad structure of the code.
So disentangle things so that we only use kup-radix.h for Book3S. This
requires some more stubs in kup.h and fixing an include in
syscall_64.c.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.
However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This patch flushes the L1 cache after user accesses.
This is part of the fix for CVE-2020-4788.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.
However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This patch flushes the L1 cache on kernel entry.
This is part of the fix for CVE-2020-4788.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We are about to add an entry flush. The rfi (exit) flush test measures
the number of L1D flushes over a syscall with the RFI flush enabled and
disabled. But if the entry flush is also enabled, the effect of enabling
and disabling the RFI flush is masked.
If there is a debugfs entry for the entry flush, disable it during the RFI
flush and restore it later.
Reported-by: Spoorthy S <spoorts2@in.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
CONFIG_PCI=n leads to a compile warning like:
sound/pci/hda/patch_ca0132.c:8214:10: warning: no case matching constant switch condition '0'
due to the missed handling of QUIRK_NONE in ca0132_mmio_init().
Fix it.
Fixes: bf2aa9ccc8 ("ALSA: hda/ca0132 - Cleanup ca0132_mmio_init function.")
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20201119120404.16833-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull in x86 fixes from Thomas, as they include a change to the Intel DMAR
code on which we depend:
* tag 'x86-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
iommu/vt-d: Cure VF irqdomain hickup
x86/platform/uv: Fix copied UV5 output archtype
x86/platform/uv: Drop last traces of uv_flush_tlb_others
GP Timers used as clockevent/source are not available for ti-sysc bus and
handled by Kernel timekeeping core. Now ti-sysc produces error message
every time such timer is detected:
"ti-sysc: probe of 48040000.target-module failed with error -16"
Such messages are not necessary, so suppress them by returning -ENXIO
instead of -EBUSY.
Fixes: 6cfcd5563b ("clocksource/drivers/timer-ti-dm: Fix suspend and resume for am3 and am4")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Fix for drm/sun4i shared with arm-soc
This patch is a preliminary fix that will conflict with subsequent work merged
through arm-soc.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
# gpg: Signature made Wed 18 Nov 2020 09:51:53 AM CET
# gpg: using EDDSA key 5C1337A45ECA9AEB89060E9EE3EF0D6F671851C5
# gpg: Good signature from "Maxime Ripard <maxime.ripard@anandra.org>" [unknown]
# gpg: aka "Maxime Ripard <mripard@kernel.org>" [unknown]
# gpg: aka "Maxime Ripard (Work Address) <maxime.ripard@bootlin.com>" [unknown]
# gpg: aka "Maxime Ripard (Work Address) <maxime@bootlin.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: BE56 75C3 7E81 8C8B 5764 241C 254B CFC5 6BF6 CE8D
# Subkey fingerprint: 5C13 37A4 5ECA 9AEB 8906 0E9E E3EF 0D6F 6718 51C5
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20201118090455.sznrgpduuytlc22k@gilmour.lan
EDID can declare the maximum supported bpc up to 16,
and apparently there are displays that do so. Currently
we assume 12 bpc is tha max. Fix the assumption and
toss in a MISSING_CASE() for any other value we don't
expect to see.
This fixes modesets with a display with EDID max bpc > 12.
Previously any modeset would just silently fail on platforms
that didn't otherwise limit this via the max_bpc property.
In particular we don't add the max_bpc property to HDMI
ports on gmch platforms, and thus we would see the raw
max_bpc coming from the EDID.
I suppose we could already adjust this to also allow 16bpc,
but seeing as no current platform supports that there is
little point.
Cc: stable@vger.kernel.org
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2632
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201110210447.27454-1-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
(cherry picked from commit 2ca5a7b85b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
If someone plays with the UFS clk scaling devfreq governor through sysfs,
ufshcd_devfreq_scale may be called even when HBA is not runtime ACTIVE.
This can lead to unexpected error. We cannot just protect it by calling
pm_runtime_get_sync() because that may cause a race condition since HBA
runtime suspend ops need to suspend clk scaling. To fix this call
pm_runtime_get_noresume() and check HBA's runtime status. Only proceed if
HBA is runtime ACTIVE, otherwise just bail.
governor_store
devfreq_performance_handler
update_devfreq
devfreq_set_target
ufshcd_devfreq_target
ufshcd_devfreq_scale
Link: https://lore.kernel.org/r/1600758548-28576-1-git-send-email-cang@codeaurora.org
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Marc Kleine-Budde says:
====================
pull-request: can 2020-11-18
Jimmy Assarsson provides two patches for the kvaser_pciefd and kvaser_usb
drivers, where the can_bittiming_const are fixed.
The next patch is by me and fixes an erroneous flexcan_transceiver_enable()
during bus-off recovery in the flexcan driver.
Jarkko Nikula's patch for the m_can driver fixes the IRQ handler to only
process the interrupts if the device is not suspended.
* tag 'linux-can-fixes-for-5.10-20201118' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: m_can: process interrupt only when not runtime suspended
can: flexcan: flexcan_chip_start(): fix erroneous flexcan_transceiver_enable() during bus-off recovery
can: kvaser_usb: kvaser_usb_hydra: Fix KCAN bittiming limits
can: kvaser_pciefd: Fix KCAN bittiming limits
====================
Link: https://lore.kernel.org/r/20201118160414.2731659-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Slave function read the following capabilities from the wrong offset:
1. log_mc_entry_sz
2. fs_log_entry_sz
3. log_mc_hash_sz
Fix that by adjusting these capabilities offset to match firmware
layout.
Due to the wrong offset read, the following issues might occur:
1+2. Negative value reported at max_mcast_qp_attach.
3. Driver to init FW with multicast hash size of zero.
Fixes: a40ded6043 ("net/mlx4_core: Add masking for a few queries on HCA caps")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20201118081922.553-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Saeed Mahameed says:
====================
mlx5 fixes 2020-11-17
This series introduces some fixes to mlx5 driver.
* tag 'mlx5-fixes-2020-11-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: fix error return code in mlx5e_tc_nic_init()
net/mlx5: E-Switch, Fail mlx5_esw_modify_vport_rate if qos disabled
net/mlx5: Disable QoS when min_rates on all VFs are zero
net/mlx5: Clear bw_share upon VF disable
net/mlx5: Add handling of port type in rule deletion
net/mlx5e: Fix check if netdev is bond slave
net/mlx5e: Fix IPsec packet drop by mlx5e_tc_update_skb
net/mlx5e: Set IPsec WAs only in IP's non checksum partial case.
net/mlx5e: Fix refcount leak on kTLS RX resync
====================
Link: https://lore.kernel.org/r/20201117195702.386113-1-saeedm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The `skb' is mapped for DMA in ns_send() but does not unmap DMA in case
push_scqe() fails to submit the `skb'. The memory of the `skb' is
released so only the DMA mapping is leaking.
Unmap the DMA mapping in case push_scqe() failed.
Fixes: 864a3ff635 ("atm: [nicstar] remove virt_to_bus() and support 64-bit platforms")
Cc: Chas Williams <3chas3@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ethernet driver may allocate skb (and skb->data) via napi_alloc_skb().
This ends up to page_frag_alloc() to allocate skb->data from
page_frag_cache->va.
During the memory pressure, page_frag_cache->va may be allocated as
pfmemalloc page. As a result, the skb->pfmemalloc is always true as
skb->data is from page_frag_cache->va. The skb will be dropped if the
sock (receiver) does not have SOCK_MEMALLOC. This is expected behaviour
under memory pressure.
However, once kernel is not under memory pressure any longer (suppose large
amount of memory pages are just reclaimed), the page_frag_alloc() may still
re-use the prior pfmemalloc page_frag_cache->va to allocate skb->data. As a
result, the skb->pfmemalloc is always true unless page_frag_cache->va is
re-allocated, even if the kernel is not under memory pressure any longer.
Here is how kernel runs into issue.
1. The kernel is under memory pressure and allocation of
PAGE_FRAG_CACHE_MAX_ORDER in __page_frag_cache_refill() will fail. Instead,
the pfmemalloc page is allocated for page_frag_cache->va.
2: All skb->data from page_frag_cache->va (pfmemalloc) will have
skb->pfmemalloc=true. The skb will always be dropped by sock without
SOCK_MEMALLOC. This is an expected behaviour.
3. Suppose a large amount of pages are reclaimed and kernel is not under
memory pressure any longer. We expect skb->pfmemalloc drop will not happen.
4. Unfortunately, page_frag_alloc() does not proactively re-allocate
page_frag_alloc->va and will always re-use the prior pfmemalloc page. The
skb->pfmemalloc is always true even kernel is not under memory pressure any
longer.
Fix this by freeing and re-allocating the page instead of recycling it.
References: https://lore.kernel.org/lkml/20201103193239.1807-1-dongli.zhang@oracle.com/
References: https://lore.kernel.org/linux-mm/20201105042140.5253-1-willy@infradead.org/
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Bert Barbe <bert.barbe@oracle.com>
Cc: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Cc: Manjunath Patil <manjunath.b.patil@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Cc: SRINIVAS <srinivas.eeda@oracle.com>
Fixes: 79930f5892 ("net: do not deplete pfmemalloc reserve")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201115201029.11903-1-dongli.zhang@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We recently improved our display atomic commit and tail sequence to
avoid some issues related to concurrency. One of the major changes
consisted of moving the interrupt disable and the stream release from
our atomic commit to our atomic tail (commit 6d90a208cf
("drm/amd/display: Move disable interrupt into commit tail")) .
However, the new code introduced inside our commit tail function was
inserted right after the function
drm_atomic_helper_update_legacy_modeset_state(), which has routines for
updating internal data structs related to timestamps. As a result, in
certain conditions, the display module can reach a situation where we
update our constants and, after that, clean it. This situation generates
the following warning:
amdgpu 0000:03:00.0: drm_WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev))
WARNING: CPU: 6 PID: 1269 at drivers/gpu/drm/drm_vblank.c:722
drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x32b/0x340 [drm]
...
RIP:
0010:drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x32b/0x340
[drm]
...
Call Trace:
? dc_stream_get_vblank_counter+0x57/0x60 [amdgpu]
drm_crtc_vblank_helper_get_vblank_timestamp+0x1c/0x20 [drm]
drm_get_last_vbltimestamp+0xad/0xc0 [drm]
drm_reset_vblank_timestamp+0x63/0xd0 [drm]
drm_crtc_vblank_on+0x85/0x150 [drm]
amdgpu_dm_atomic_commit_tail+0xaf1/0x2330 [amdgpu]
commit_tail+0x99/0x130 [drm_kms_helper]
drm_atomic_helper_commit+0x123/0x150 [drm_kms_helper]
amdgpu_dm_atomic_commit+0x11/0x20 [amdgpu]
drm_atomic_commit+0x4a/0x50 [drm]
drm_atomic_helper_set_config+0x7c/0xc0 [drm_kms_helper]
drm_mode_setcrtc+0x20b/0x7e0 [drm]
? tomoyo_path_number_perm+0x6f/0x200
? drm_mode_getcrtc+0x190/0x190 [drm]
drm_ioctl_kernel+0xae/0xf0 [drm]
drm_ioctl+0x245/0x400 [drm]
? drm_mode_getcrtc+0x190/0x190 [drm]
amdgpu_drm_ioctl+0x4e/0x80 [amdgpu]
__x64_sys_ioctl+0x91/0xc0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9
...
For fixing this issue we rely upon a refactor introduced on
drm_atomic_helper_update_legacy_modeset_state ("Remove the timestamping
constant update from drm_atomic_helper_update_legacy_modeset_state()")
which decouples constant values update from
drm_atomic_helper_update_legacy_modeset_state to a new helper.
Basically, this commit uses this new helper and place it right after our
release module to avoid a situation where our CRTC struct gets wrong
values.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1373
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1349
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull gfs2 fix from Andreas Gruenbacher:
"Fix gfs2 freeze/thaw"
* tag 'gfs2-v5.10-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fix regression in freeze_go_sync
Pull nfsd fix from Bruce Fields:
"Just one quick fix for a tracing oops"
* tag 'nfsd-5.10-2' of git://linux-nfs.org/~bfields/linux:
SUNRPC: Fix oops in the rpc_xdr_buf event class
Pull Kunit fixes from Shuah Khan:
"Several fixes to Kunit documentation and tools, and to not pollute
the source directory.
Also remove the incorrect kunit .gitattributes file"
* tag 'linux-kselftest-kunit-fixes-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: fix display of failed expectations for strings
kunit: tool: fix extra trailing \n in raw + parsed test output
kunit: tool: print out stderr from make (like build warnings)
KUnit: Docs: usage: wording fixes
KUnit: Docs: style: fix some Kconfig example issues
KUnit: Docs: fix a wording typo
kunit: Do not pollute source directory with generated files (test.log)
kunit: Do not pollute source directory with generated files (.kunitconfig)
kunit: tool: fix pre-existing python type annotation errors
kunit: Fix kunit.py parse subcommand (use null build_dir)
kunit: tool: unmark test_data as binary blobs
When the switch is hardware reset, it reads the contents of the
EEPROM. This can contain instructions for programming values into
registers and to perform waits between such programming. Reading the
EEPROM can take longer than the 100ms mv88e6xxx_hardware_reset() waits
after deasserting the reset GPIO. So poll the EEPROM done bit to
ensure it is complete.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Ruslan Sushko <rus@sushko.dev>
Link: https://lore.kernel.org/r/20201116164301.977661-1-rus@sushko.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel says:
====================
mlxsw: Couple of fixes
Patch #1 fixes firmware flashing when CONFIG_MLXSW_CORE=y and
CONFIG_MLXFW=m.
Patch #2 prevents EMAD transactions from needlessly failing when the
system is under heavy load by using exponential backoff.
Please consider patch #2 for stable.
====================
Link: https://lore.kernel.org/r/20201117173352.288491-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The driver sends Ethernet Management Datagram (EMAD) packets to the
device for configuration purposes and waits for up to 200ms for a reply.
A request is retried up to 5 times.
When the system is under heavy load, replies are not always processed in
time and EMAD transactions fail.
Make the process more robust to such delays by using exponential
backoff. First wait for up to 200ms, then retransmit and wait for up to
400ms and so on.
Fixes: caf7297e7a ("mlxsw: core: Introduce support for asynchronous EMAD register access")
Reported-by: Denis Yulevich <denisyu@nvidia.com>
Tested-by: Denis Yulevich <denisyu@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The commit cited below moved firmware flashing functionality from
mlxsw_spectrum to mlxsw_core, but did not adjust the Kconfig
dependencies. This makes it possible to have mlxsw_core as built-in and
mlxfw as a module. The mlxfw code is therefore not reachable from
mlxsw_core and firmware flashing fails:
# devlink dev flash pci/0000:01:00.0 file mellanox/mlxsw_spectrum-13.2008.1310.mfa2
devlink answers: Operation not supported
Fix by having mlxsw_core select mlxfw.
Fixes: b79cb787ac ("mlxsw: Move fw flashing code into core.c")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: Vadim Pasternak <vadimp@nvidia.com>
Tested-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
DSA network devices rely on having their DSA management interface up and
running otherwise their ndo_open() will return -ENETDOWN. Without doing
this it would not be possible to use DSA devices as netconsole when
configured on the command line. These devices also do not utilize the
upper/lower linking so the check about the netpoll device having upper
is not going to be a problem.
The solution adopted here is identical to the one done for
net/ipv4/ipconfig.c with 728c02089a ("net: ipv4: handle DSA enabled
master network devices"), with the network namespace scope being
restricted to that of the process configuring netpoll.
Fixes: 04ff53f96a ("net: dsa: Add netconsole support")
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20201117035236.22658-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
At the start of driver initialization, we do not know what bias
setting the bootloader has configured the system for and we only know
for certain the very first time we do a transition.
However, since the initial value of the comparison index is -EINVAL,
this negative value results in an array out of bound access on the
very first transition.
Since we don't know what the setting is, we just set the bias
configuration as there is nothing to compare against. This prevents
the array out of bound access.
NOTE: Even though we could use a more relaxed check of "< 0" the only
valid values(ignoring cosmic ray induced bitflips) are -EINVAL, 0+.
Fixes: 40b1936efe ("regulator: Introduce TI Adaptive Body Bias(ABB) on-chip LDO driver")
Link: https://lore.kernel.org/linux-mm/CA+G9fYuk4imvhyCN7D7T6PMDH6oNp6HDCRiTUKMQ6QXXjBa4ag@mail.gmail.com/
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Nishanth Menon <nm@ti.com>
Link: https://lore.kernel.org/r/20201118145009.10492-1-nm@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>
In kabylake_set_bias_level(), enabling mclk may fail if the clock has
already been enabled by the firmware. Attempts to disable that clock
later will fail with a warning backtrace.
mclk already disabled
WARNING: CPU: 2 PID: 108 at drivers/clk/clk.c:952 clk_core_disable+0x1b6/0x1cf
...
Call Trace:
clk_disable+0x2d/0x3a
kabylake_set_bias_level+0x72/0xfd [snd_soc_kbl_rt5663_rt5514_max98927]
snd_soc_card_set_bias_level+0x2b/0x6f
snd_soc_dapm_set_bias_level+0xe1/0x209
dapm_pre_sequence_async+0x63/0x96
async_run_entry_fn+0x3d/0xd1
process_one_work+0x2a9/0x526
...
Only disable the clock if it has been enabled.
Fixes: 15747a8020 ("ASoC: eve: implement set_bias_level function for rt5514")
Cc: Brent Lu <brent.lu@intel.com>
Cc: Curtis Malainey <cujomalainey@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20201111205434.207610-1-linux@roeck-us.net
Signed-off-by: Mark Brown <broonie@kernel.org>
In xfs_initialize_perag(), if kmem_zalloc(), xfs_buf_hash_init(), or
radix_tree_preload() failed, the returned value 'error' is not set
accordingly.
Reported-as-fixing: 8b26c5825e ("xfs: handle ENOMEM correctly during initialisation of perag structures")
Fixes: 9b24717979 ("xfs: cache unlinked pointers in an rhashtable")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The aim of the inode btree record iterator function is to call a
callback on every record in the btree. To avoid having to tear down and
recreate the inode btree cursor around every callback, it caches a
certain number of records in a memory buffer. After each batch of
callback invocations, we have to perform a btree lookup to find the
next record after where we left off.
However, if the keys of the inode btree are corrupt, the lookup might
put us in the wrong part of the inode btree, causing the walk function
to loop forever. Therefore, we add extra cursor tracking to make sure
that we never go backwards neither when performing the lookup nor when
jumping to the next inobt record. This also fixes an off by one error
where upon resume the lookup should have been for the inode /after/ the
point at which we stopped.
Found by fuzzing xfs/460 with keys[2].startino = ones causing bulkstat
and quotacheck to hang.
Fixes: a211432c27 ("xfs: create simplified inode walk function")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Currently, commit e9e2eae89d dropped a (int) decoration from
XFS_LITINO(mp), and since sizeof() expression is also involved,
the result of XFS_LITINO(mp) is simply as the size_t type
(commonly unsigned long).
Considering the expression in xfs_attr_shortform_bytesfit():
offset = (XFS_LITINO(mp) - bytes) >> 3;
let "bytes" be (int)340, and
"XFS_LITINO(mp)" be (unsigned long)336.
on 64-bit platform, the expression is
offset = ((unsigned long)336 - (int)340) >> 3 =
(int)(0xfffffffffffffffcUL >> 3) = -1
but on 32-bit platform, the expression is
offset = ((unsigned long)336 - (int)340) >> 3 =
(int)(0xfffffffcUL >> 3) = 0x1fffffff
instead.
so offset becomes a large positive number on 32-bit platform, and
cause xfs_attr_shortform_bytesfit() returns maxforkoff rather than 0.
Therefore, one result is
"ASSERT(new_size <= XFS_IFORK_SIZE(ip, whichfork));"
assertion failure in xfs_idata_realloc(), which was also the root
cause of the original bugreport from Dennis, see:
https://bugzilla.redhat.com/show_bug.cgi?id=1894177
And it can also be manually triggered with the following commands:
$ touch a;
$ setfattr -n user.0 -v "`seq 0 80`" a;
$ setfattr -n user.1 -v "`seq 0 80`" a
on 32-bit platform.
Fix the case in xfs_attr_shortform_bytesfit() by bailing out
"XFS_LITINO(mp) < bytes" in advance suggested by Eric and a misleading
comment together with this bugfix suggested by Darrick. It seems the
other users of XFS_LITINO(mp) are not impacted.
Fixes: e9e2eae89d ("xfs: only check the superblock version for dinode size calculation")
Cc: <stable@vger.kernel.org> # 5.7+
Reported-and-tested-by: Dennis Gilmore <dgilmore@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Teach the directory scrubber to check all the bestfree entries,
including the null ones. We want to be able to detect the case where
the entry is null but there actually /is/ a directory data block.
Found by fuzzing lbests[0] = ones in xfs/391.
Fixes: df481968f3 ("xfs: scrub directory freespace")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
We always know the correct state of the rmap record flags (attr, bmbt,
unwritten) so check them by direct comparison.
Fixes: d852657ccf ("xfs: cross-reference reverse-mapping btree")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The comment and logic in xchk_btree_check_minrecs for dealing with
inode-rooted btrees isn't quite correct. While the direct children of
the inode root are allowed to have fewer records than what would
normally be allowed for a regular ondisk btree block, this is only true
if there is only one child block and the number of records don't fit in
the inode root.
Fixes: 08a3a692ef ("xfs: btree scrub should check minrecs")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Avoid processing bogus interrupt statuses when the HW is runtime suspended and
the M_CAN_IR register read may get all bits 1's. Handler can be called if the
interrupt request is shared with other peripherals or at the end of free_irq().
Therefore check the runtime suspended status before processing.
Fixes: cdf8259d65 ("can: m_can: Add PM Support")
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Link: https://lore.kernel.org/r/20200915134715.696303-1-jarkko.nikula@linux.intel.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Patch 541656d3a5 ("gfs2: freeze should work on read-only mounts") changed
the check for glock state in function freeze_go_sync() from "gl->gl_state
== LM_ST_SHARED" to "gl->gl_req == LM_ST_EXCLUSIVE". That's wrong and it
regressed gfs2's freeze/thaw mechanism because it caused only the freezing
node (which requests the glock in EX) to queue freeze work.
All nodes go through this go_sync code path during the freeze to drop their
SHared hold on the freeze glock, allowing the freezing node to acquire it
in EXclusive mode. But all the nodes must freeze access to the file system
locally, so they ALL must queue freeze work. The freeze_work calls
freeze_func, which makes a request to reacquire the freeze glock in SH,
effectively blocking until the thaw from the EX holder. Once thawed, the
freezing node drops its EX hold on the freeze glock, then the (blocked)
freeze_func reacquires the freeze glock in SH again (on all nodes, including
the freezer) so all nodes go back to a thawed state.
This patch changes the check back to gl_state == LM_ST_SHARED like it was
prior to 541656d3a5.
Fixes: 541656d3a5 ("gfs2: freeze should work on read-only mounts")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
If the CAN controller goes into bus off, the do_set_mode() callback with
CAN_MODE_START can be used to recover the controller, which then calls
flexcan_chip_start(). If configured, this is done automatically by the
framework or manually by the user.
In flexcan_chip_start() there is an explicit call to
flexcan_transceiver_enable(), which does a regulator_enable() on the
transceiver regulator. This results in a net usage counter increase, as there
is no corresponding flexcan_transceiver_disable() in the bus off code path.
This further leads to the transceiver stuck enabled, even if the CAN interface
is shut down.
To fix this problem the
flexcan_transceiver_enable()/flexcan_transceiver_disable() are moved out of
flexcan_chip_start()/flexcan_chip_stop() into flexcan_open()/flexcan_close().
Fixes: e955cead03 ("CAN: Add Flexcan CAN controller driver")
Link: https://lore.kernel.org/r/20201118150148.2664024-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
An active ref_node always can be found in ctx->files_data, it's much
safer to get it this way instead of poking into files_data->ref_list.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
"intel_iommu=off" command line is used to disable iommu but iommu is force
enabled in a tboot system for security reason.
However for better performance on high speed network device, a new option
"intel_iommu=tboot_noforce" is introduced to disable the force on.
By default kernel should panic if iommu init fail in tboot for security
reason, but it's unnecessory if we use "intel_iommu=tboot_noforce,off".
Fix the code setting force_on and move intel_iommu_tboot_noforce
from tboot code to intel iommu code.
Fixes: 7304e8f28b ("iommu/vt-d: Correctly disable Intel IOMMU force on")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com>
Tested-by: Lukasz Hawrylko <lukasz.hawrylko@linux.intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20201110071908.3133-1-zhenzhong.duan@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
Most changes in uv.c are related to KVM. Involve also the KVM team
regarding changes to uv.c.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Older firmware can return rc=0x107 rrc=0xd for destroy page if the
page is already non-secure. This should be handled like a success
as already done by newer firmware.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 1a80b54d1c ("s390/uv: add destroy page call")
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
sysrq-t ends up invoking show_opcodes() for each task which tries to access
the user space code of other processes, which is obviously bogus.
It either manages to dump where the foreign task's regs->ip points to in a
valid mapping of the current task or triggers a pagefault and prints "Code:
Bad RIP value.". Both is just wrong.
Add a safeguard in copy_code() and check whether the @regs pointer matches
currents pt_regs. If not, do not even try to access it.
While at it, add commentary why using copy_from_user_nmi() is safe in
copy_code() even if the function name suggests otherwise.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201117202753.667274723@linutronix.de
Commit e0d072782c ("dma-mapping: introduce DMA range map, supplanting
dma_pfn_offset") introduced a regression in our code since the second
backed to probe will now get -EINVAL back from dma_direct_set_offset and
will prevent the entire DRM device from probing.
Ignore -EINVAL as a temporary measure to get it back working, before
removing that call entirely.
Fixes: e0d072782c ("dma-mapping: introduce DMA range map, supplanting dma_pfn_offset")
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
To start stack unwinding (SP, PC and BLINK) are needed. When the
explicit execution context (pt_regs etc) is not available, unwinder
assumes the task is sleeping (in __switch_to()) and fetches SP and BLINK
from kernel mode stack.
But this assumption is not true, specially in a SMP system, when top
runs on 1 core, there may be active running processes on all cores.
So when unwinding non courrent tasks, ensure they are NOT running.
And while at it, handle the self unwinding case explicitly.
This came out of investigation of a customer reported hang with
rcutorture+top
Link: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/31
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
The 1-bit shift rotation to the left on x variable located on
4 last if statement can be removed because the computed value is will
not be used afront.
Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Commit 2284ffea8f ("powerpc/64s/exception: Only test KVM in SRR
interrupts when PR KVM is supported") removed KVM guest tests from
interrupts that do not set HV=1, when PR-KVM is not configured.
This is wrong for HV-KVM HPT guest MMIO emulation case which attempts
to load the faulting instruction word with MSR[DR]=1 and MSR[HV]=1 with
the guest MMU context loaded. This can cause host DSI, DSLB interrupts
which must test for KVM guest. Restore this and add a comment.
Fixes: 2284ffea8f ("powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supported")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201117135617.3521127-1-npiggin@gmail.com
Checking for ifdef CONFIG_x fails if CONFIG_x=m.
Use IS_ENABLED instead, which is true for both built-ins and modules.
Otherwise, a
> ip -4 route add 1.2.3.4/32 via inet6 fe80::2 dev eth1
fails with the message "Error: IPv6 support not enabled in kernel." if
CONFIG_IPV6 is `m`.
In the spirit of b8127113d0.
Fixes: d15662682d ("ipv4: Allow ipv6 gateway with ipv4 routes")
Cc: Kim Phillips <kim.phillips@arm.com>
Signed-off-by: Florian Klink <flokli@flokli.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20201115224509.2020651-1-flokli@flokli.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add missing define of ALIGN_DOWN to make the test build and run. In
addition, __sg_alloc_table_from_pages now support unaligned maximum
segment, so adapt the test result accordingly.
Fixes: 07da1223ec ("lib/scatterlist: Add support in dynamic allocation of SG table from pages")
Link: https://lore.kernel.org/r/20201115120623.139113-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
When skb has a frag_list its possible for skb_to_sgvec() to fail. This
happens when the scatterlist has fewer elements to store pages than would
be needed for the initial skb plus any of its frags.
This case appears rare, but is possible when running an RX parser/verdict
programs exposed to the internet. Currently, when this happens we throw
an error, break the pipe, and kfree the msg. This effectively breaks the
application or forces it to do a retry.
Lets catch this case and handle it by doing an skb_linearize() on any
skb we receive with frags. At this point skb_to_sgvec should not fail
because the failing conditions would require frags to be in place.
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/160556576837.73229.14800682790808797635.stgit@john-XPS-13-9370
If the skb_verdict_prog redirects an skb knowingly to itself, fix your
BPF program this is not optimal and an abuse of the API please use
SK_PASS. That said there may be cases, such as socket load balancing,
where picking the socket is hashed based or otherwise picks the same
socket it was received on in some rare cases. If this happens we don't
want to confuse userspace giving them an EAGAIN error if we can avoid
it.
To avoid double accounting in these cases. At the moment even if the
skb has already been charged against the sockets rcvbuf and forward
alloc we check it again and do set_owner_r() causing it to be orphaned
and recharged. For one this is useless work, but more importantly we
can have a case where the skb could be put on the ingress queue, but
because we are under memory pressure we return EAGAIN. The trouble
here is the skb has already been accounted for so any rcvbuf checks
include the memory associated with the packet already. This rolls
up and can result in unnecessary EAGAIN errors in userspace read()
calls.
Fix by doing an unlikely check and skipping checks if skb->sk == sk.
Fixes: 51199405f9 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/160556574804.73229.11328201020039674147.stgit@john-XPS-13-9370
If a socket redirects to itself and it is under memory pressure it is
possible to get a socket stuck so that recv() returns EAGAIN and the
socket can not advance for some time. This happens because when
redirecting a skb to the same socket we received the skb on we first
check if it is OK to enqueue the skb on the receiving socket by checking
memory limits. But, if the skb is itself the object holding the memory
needed to enqueue the skb we will keep retrying from kernel side
and always fail with EAGAIN. Then userspace will get a recv() EAGAIN
error if there are no skbs in the psock ingress queue. This will continue
until either some skbs get kfree'd causing the memory pressure to
reduce far enough that we can enqueue the pending packet or the
socket is destroyed. In some cases its possible to get a socket
stuck for a noticeable amount of time if the socket is only receiving
skbs from sk_skb verdict programs. To reproduce I make the socket
memory limits ridiculously low so sockets are always under memory
pressure. More often though if under memory pressure it looks like
a spurious EAGAIN error on user space side causing userspace to retry
and typically enough has moved on the memory side that it works.
To fix skip memory checks and skb_orphan if receiving on the same
sock as already assigned.
For SK_PASS cases this is easy, its always the same socket so we
can just omit the orphan/set_owner pair.
For backlog cases we need to check skb->sk and decide if the orphan
and set_owner pair are needed.
Fixes: 51199405f9 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/160556572660.73229.12566203819812939627.stgit@john-XPS-13-9370
If copy_page_to_iter() fails or even partially completes, but with fewer
bytes copied than expected we currently reset sg.start and return EFAULT.
This proves problematic if we already copied data into the user buffer
before we return an error. Because we leave the copied data in the user
buffer and fail to unwind the scatterlist so kernel side believes data
has been copied and user side believes data has _not_ been received.
Expected behavior should be to return number of bytes copied and then
on the next read we need to return the error assuming its still there. This
can happen if we have a copy length spanning multiple scatterlist elements
and one or more complete before the error is hit.
The error is rare enough though that my normal testing with server side
programs, such as nginx, httpd, envoy, etc., I have never seen this. The
only reliable way to reproduce that I've found is to stream movies over
my browser for a day or so and wait for it to hang. Not very scientific,
but with a few extra WARN_ON()s in the code the bug was obvious.
When we review the errors from copy_page_to_iter() it seems we are hitting
a page fault from copy_page_to_iter_iovec() where the code checks
fault_in_pages_writeable(buf, copy) where buf is the user buffer. It
also seems typical server applications don't hit this case.
The other way to try and reproduce this is run the sockmap selftest tool
test_sockmap with data verification enabled, but it doesn't reproduce the
fault. Perhaps we can trigger this case artificially somehow from the
test tools. I haven't sorted out a way to do that yet though.
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/160556566659.73229.15694973114605301063.stgit@john-XPS-13-9370
In async_resync mode, we log the TCP seq of records until the async request
is completed. Later, in case one of the logged seqs matches the resync
request, we return it, together with its record serial number. Before this
fix, we mistakenly returned the serial number of the current record
instead.
Fixes: ed9b7646b0 ("net/tls: Add asynchronous resync")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Link: https://lore.kernel.org/r/20201115131448.2702-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When sync_state support got introduced recently, by default we try to
set the NoCs to run initially at maximum rate. But as these values are
aggregated, we may end with a really big clock rate value, which is
then converted from "u64" to "long" during the clock rate rounding.
But on 32bit platforms this may result an overflow. Fix it by making
sure that the rate is within range.
Reported-by: Luca Weiss <luca@z3ntu.xyz>
Reviewed-by: Brian Masney <masneyb@onstation.org>
Link: https://lore.kernel.org/r/20201106144847.7726-1-georgi.djakov@linaro.org
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
Zorro reports that an xfstest test case is failing, and it turns out that
for the reissue path we can potentially issue a double completion on the
request for the failure path. There's an issue around the retry as well,
but for now, at least just make sure that we handle the error path
correctly.
Cc: stable@vger.kernel.org
Fixes: b63534c41e ("io_uring: re-issue block requests that failed because of resources")
Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If THIS_MODULE is not set, the module would be removed while debugfs is
being used.
It eventually makes kernel panic.
Fixes: 82c93a87bf ("netdevsim: implement couple of testing devlink health reporters")
Fixes: 424be63ad8 ("netdevsim: add UDP tunnel port offload support")
Fixes: 4418f862d6 ("netdevsim: implement support for devlink region and snapshots")
Fixes: d3cbb907ae ("netdevsim: add ACL trap reporting cookie as a metadata")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://lore.kernel.org/r/20201115103041.30701-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Due to a hardware issue, an access to MDIO registers
that is concurrent with other ENETC register accesses
may lead to the MDIO access being dropped or corrupted.
The workaround introduces locking for all register accesses
to the ENETC register space. To reduce performance impact,
a readers-writers locking scheme has been implemented.
The writer in this case is the MDIO access code (irrelevant
whether that MDIO access is a register read or write), and
the reader is any access code to non-MDIO ENETC registers.
Also, the datapath functions acquire the read lock fewer times
and use _hot accessors. All the rest of the code uses the _wa
accessors which lock every register access.
The commit introducing MDIO support is -
commit ebfcb23d62 ("enetc: Add ENETC PF level external MDIO support")
but due to subsequent refactoring this patch is applicable on
top of a later commit.
Fixes: 6517798dd3 ("enetc: Make MDIO accessors more generic and export to include/linux/fsl")
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20201112182608.26177-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I appreciate my time as a kernel maintainer for the last few years.
I am no longer working on the LPC32xx platform
and cannot commit time for support and discussions.
Signed-off-by: Sylvain Lemieux <slemieux@tycoint.com>
Acked-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull input fixes from Dmitry Torokhov:
"A fix for use-after-free in the Sun keyboard driver, a fix to firmware
updates on newer ICs in the Elan touchpad diver, and a couple misc
driver fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: elan_i2c - fix firmware update on newer ICs
Input: resistive-adc-touch - fix kconfig dependency on IIO_BUFFER
Input: sunkbd - avoid use-after-free in teardown paths
Input: i8042 - allow insmod to succeed on devices without an i8042 controller
Input: adxl34x - clean up a data type in adxl34x_probe()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: aedd133d17 ("net/mlx5e: Support CT offload for tc nic flows")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Avoid calling mlx5_esw_modify_vport_rate() if qos is not enabled and
avoid unnecessary syndrome messages from firmware.
Fixes: fcb64c0f56 ("net/mlx5: E-Switch, add ingress rate support")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently when QoS is enabled for VF and any min_rate is configured,
the driver sets bw_share value to at least 1 and doesn’t allow to set
it to 0 to make minimal rate unlimited. It means there is always a
minimal rate configured for every VF, even if user tries to remove it.
In order to make QoS disable possible, check whether all vports have
configured min_rate = 0. If this is true, set their bw_share to 0 to
disable min_rate limitations.
Fixes: c9497c9890 ("net/mlx5: Add support for setting VF min rate")
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, if user disables VFs with some min and max rates configured,
they are cleared. But QoS data is not cleared and restored upon next VF
enable placing limits on minimal rate for given VF, when user expects
none.
To match cleared vport->info struct with QoS-related min and max rates
upon VF disable, clear vport->qos struct too.
Fixes: 556b9d16d3 ("net/mlx5: Clear VF's configuration on disabling SRIOV")
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Bond events handler uses bond_slave_get_rtnl to check if net device
is bond slave. bond_slave_get_rtnl return the rcu rx_handler pointer
from the netdev which exists for bond slaves but also exists for
devices that are attached to linux bridge so using it as indication
for bond slave is wrong.
Fix by using netif_is_lag_port instead.
Fixes: 7e51891a23 ("net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Both TC and IPsec crypto offload use metadata_regB to store
private information. Since TC does not use bit 31 of regB, IPsec
will use bit 31 as the IPsec packet marker. The IPsec's regB usage
is changed to:
Bit31: IPsec marker
Bit30-24: IPsec syndrome
Bit23-0: IPsec obj id
Fixes: b2ac7541e3 ("net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The IP's checksum partial still requires L4 csum flag on Ethernet WQE.
Make the IPsec WAs only for the IP's non checksum partial case
(for example icmd packet)
Fixes: 5be019040c ("net/mlx5e: IPsec: Add Connect-X IPsec Tx data path offload")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
On resync, the driver calls inet_lookup_established
(__inet6_lookup_established) that increases sk_refcnt of the socket. To
decrease it, the driver set skb->destructor to sock_edemux. However, it
didn't work well, because the TCP stack also sets this destructor for
early demux, and the refcount gets decreased only once, while increased
two times (in mlx5e and in the TCP stack). It leads to a socket leak, a
TLS context leak, which in the end leads to calling tls_dev_del twice:
on socket close and on driver unload, which in turn leads to a crash.
This commit fixes the refcount leak by calling sock_gen_put right away
after using the socket, thus fixing all the subsequent issues.
Fixes: 0419d8c9d8 ("net/mlx5e: kTLS, Add kTLS RX resync support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Pull s390 fixes from Heiko Carstens:
- fix system call exit path; avoid return to user space with any
TIF/CIF/PIF set
- fix file permission for cpum_sfb_size parameter
- another small defconfig update
* tag 's390-5.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cpum_sf.c: fix file permission for cpum_sfb_size
s390: update defconfigs
s390: fix system call exit path
Pull MIPS fixes from Thomas Bogendoerfer:
- fix bug preventing booting on several platforms
- fix for build error, when modules need has_transparent_hugepage
- fix for memleak in alchemy clk setup
* tag 'mips_fixes_5.10_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Alchemy: Fix memleak in alchemy_clk_setup_cpu
MIPS: kernel: Fix for_each_memblock conversion
MIPS: export has_transparent_hugepage() for modules
During loss recovery, retransmitted packets are forced to use TCP
timestamps to calculate the RTT samples, which have a millisecond
granularity. BBR is designed using a microsecond granularity. As a
result, multiple RTT samples could be truncated to the same RTT value
during loss recovery. This is problematic, as BBR will not enter
PROBE_RTT if the RTT sample is <= the current min_rtt sample, meaning
that if there are persistent losses, PROBE_RTT will constantly be
pushed off and potentially never re-entered. This patch makes sure
that BBR enters PROBE_RTT by checking if RTT sample is < the current
min_rtt sample, rather than <=.
The Netflix transport/TCP team discovered this bug in the Linux TCP
BBR code during lab tests.
Fixes: 0f8782ea14 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Ryan Sharpelletti <sharpelletti@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Link: https://lore.kernel.org/r/20201116174412.1433277-1-sharpelletti.kdev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When removing the driver we would hit BUG_ON(!list_empty(&dev->ptype_specific))
in net/core/dev.c due to still having the NC-SI packet handler
registered.
# echo 1e660000.ethernet > /sys/bus/platform/drivers/ftgmac100/unbind
------------[ cut here ]------------
kernel BUG at net/core/dev.c:10254!
Internal error: Oops - BUG: 0 [#1] SMP ARM
CPU: 0 PID: 115 Comm: sh Not tainted 5.10.0-rc3-next-20201111-00007-g02e0365710c4 #46
Hardware name: Generic DT based system
PC is at netdev_run_todo+0x314/0x394
LR is at cpumask_next+0x20/0x24
pc : [<806f5830>] lr : [<80863cb0>] psr: 80000153
sp : 855bbd58 ip : 00000001 fp : 855bbdac
r10: 80c03d00 r9 : 80c06228 r8 : 81158c54
r7 : 00000000 r6 : 80c05dec r5 : 80c05d18 r4 : 813b9280
r3 : 813b9054 r2 : 8122c470 r1 : 00000002 r0 : 00000002
Flags: Nzcv IRQs on FIQs off Mode SVC_32 ISA ARM Segment none
Control: 00c5387d Table: 85514008 DAC: 00000051
Process sh (pid: 115, stack limit = 0x7cb5703d)
...
Backtrace:
[<806f551c>] (netdev_run_todo) from [<80707eec>] (rtnl_unlock+0x18/0x1c)
r10:00000051 r9:854ed710 r8:81158c54 r7:80c76bb0 r6:81158c10 r5:8115b410
r4:813b9000
[<80707ed4>] (rtnl_unlock) from [<806f5db8>] (unregister_netdev+0x2c/0x30)
[<806f5d8c>] (unregister_netdev) from [<805a8180>] (ftgmac100_remove+0x20/0xa8)
r5:8115b410 r4:813b9000
[<805a8160>] (ftgmac100_remove) from [<805355e4>] (platform_drv_remove+0x34/0x4c)
Fixes: bd466c3fb5 ("net/faraday: Support NCSI mode")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Link: https://lore.kernel.org/r/20201117024448.1170761-1-joel@jms.id.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It was recently reported that if GICR_TYPER is accessed before the RD base
address is set, we'll suffer from the unset @rdreg dereferencing. Oops...
gpa_t last_rdist_typer = rdreg->base + GICR_TYPER +
(rdreg->free_index - 1) * KVM_VGIC_V3_REDIST_SIZE;
It's "expected" that users will access registers in the redistributor if
the RD has been properly configured (e.g., the RD base address is set). But
it hasn't yet been covered by the existing documentation.
Per discussion on the list [1], the reporting of the GICR_TYPER.Last bit
for userspace never actually worked. And it's difficult for us to emulate
it correctly given that userspace has the flexibility to access it any
time. Let's just drop the reporting of the Last bit for userspace for now
(userspace should have full knowledge about it anyway) and it at least
prevents kernel from panic ;-)
[1] https://lore.kernel.org/kvmarm/c20865a267e44d1e2c0d52ce4e012263@kernel.org/
Fixes: ba7b3f1275 ("KVM: arm/arm64: Revisit Redistributor TYPER last bit computation")
Reported-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20201117151629.1738-1-yuzenghui@huawei.com
Cc: stable@vger.kernel.org
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix file corruption due to event deletion in 'perf inject'.
- Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem
memcpy', silencing perf build warning.
- Avoid an msan warning in a copied stack in 'perf test'.
- Correct tracepoint field name "flags" in ARM's CS-ETM hardware
tracing 'perf test' entry.
- Update branch sample pattern for cs-etm to cope with excluding guest
in userspace counting.
- Don't free "lock_seq_stat" if read_count isn't zero in 'perf lock'.
* tag 'perf-tools-fixes-for-v5.10-2020-11-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf test: Avoid an msan warning in a copied stack.
perf inject: Fix file corruption due to event deletion
perf test: Update branch sample pattern for cs-etm
perf test: Fix a typo in cs-etm testing
tools arch: Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy'
perf lock: Don't free "lock_seq_stat" if read_count isn't zero
perf lock: Correct field name "flags"
Pull RCU fix from Paul McKenney:
"A single commit that fixes a bug that was introduced a couple of merge
windows ago, but which rather more recently converged to an
agreed-upon fix. The bug is that interrupts can be incorrectly enabled
while holding an irq-disabled spinlock. This can of course result in
self-deadlocks.
The bug is a bit difficult to trigger. It requires that a preempted
task be blocking a preemptible-RCU grace period long enough to trigger
an RCU CPU stall warning. In addition, an interrupt must occur at just
the right time, and that interrupt's handler must acquire that same
irq-disabled spinlock. Still, a deadlock is a deadlock.
Furthermore, we do now have a fix, and that fix survives kernel test
robot, -next, and rcutorture testing. It has also been verified by
Sebastian as fixing the bug. Therefore..."
* 'urgent-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled
It turns out the IRQs most like can be unmasked before the controller is
enabled with no problematic consequences. The manual doesn't explicitly
state that, but the examples perform the controller initialization
procedure in that order. So the commit da8f58909e ("spi: dw: Unmask IRQs
after enabling the chip") hasn't been that required as I thought. But
anyway setting the IRQs up after the chip enabling still worth adding
since it has simplified the code a bit. The problem is that it has
introduced a potential bug. The transfer handler pointer is now
initialized after the IRQs are enabled. That may and eventually will cause
an invalid or uninitialized callback invocation. Fix that just by
performing the callback initialization before the IRQ unmask procedure.
Fixes: da8f58909e ("spi: dw: Unmask IRQs after enabling the chip")
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Link: https://lore.kernel.org/r/20201117094054.4696-1-Sergey.Semin@baikalelectronics.ru
Signed-off-by: Mark Brown <broonie@kernel.org>
The scripts/dtc/checks.c requires that the node have empty "dma-ranges"
property must have the same "#address-cells" and "#size-cells" values as
the parent node. Otherwise, the following warnings is reported:
arch/arm64/boot/dts/qcom/ipq6018.dtsi:185.3-14: Warning \
(dma_ranges_format): /soc:dma-ranges: empty "dma-ranges" property but \
its #address-cells (1) differs from / (2)
arch/arm64/boot/dts/qcom/ipq6018.dtsi:185.3-14: Warning \
(dma_ranges_format): /soc:dma-ranges: empty "dma-ranges" property but \
its #size-cells (1) differs from / (2)
Arnd Bergmann figured out why it's necessary:
Also note that the #address-cells=<1> means that any device under
this bus is assumed to only support 32-bit addressing, and DMA will
have to go through a slow swiotlb in the absence of an IOMMU.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20201016090833.1892-3-thunder.leizhen@huawei.com'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The scripts/dtc/checks.c requires that the node have empty "dma-ranges"
property must have the same "#address-cells" and "#size-cells" values as
the parent node. Otherwise, the following warnings is reported:
arch/arm64/boot/dts/broadcom/stingray/stingray-usb.dtsi:7.3-14: Warning \
(dma_ranges_format): /usb:dma-ranges: empty "dma-ranges" property but \
its #address-cells (1) differs from / (2)
arch/arm64/boot/dts/broadcom/stingray/stingray-usb.dtsi:7.3-14: Warning \
(dma_ranges_format): /usb:dma-ranges: empty "dma-ranges" property but \
its #size-cells (1) differs from / (2)
Arnd Bergmann figured out why it's necessary:
Also note that the #address-cells=<1> means that any device under
this bus is assumed to only support 32-bit addressing, and DMA will
have to go through a slow swiotlb in the absence of an IOMMU.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20201016090833.1892-2-thunder.leizhen@huawei.com'
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull cpufreq-arm fixes for 5.10-rc5 from Viresh Kumar:
"- tegra186: Fix ->get() callback.
- arm/scmi: Add dummy clock provider to fix failure."
* 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: scmi: Fix OPP addition failure with a dummy clock provider
cpufreq: tegra186: Fix get frequency callback
A warning was hit when running xfstests/generic/068 in a Hyper-V guest:
[...] ------------[ cut here ]------------
[...] DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
[...] WARNING: CPU: 2 PID: 1350 at kernel/locking/lockdep.c:5280 check_flags.part.0+0x165/0x170
[...] ...
[...] Workqueue: events pwq_unbound_release_workfn
[...] RIP: 0010:check_flags.part.0+0x165/0x170
[...] ...
[...] Call Trace:
[...] lock_is_held_type+0x72/0x150
[...] ? lock_acquire+0x16e/0x4a0
[...] rcu_read_lock_sched_held+0x3f/0x80
[...] __send_ipi_one+0x14d/0x1b0
[...] hv_send_ipi+0x12/0x30
[...] __pv_queued_spin_unlock_slowpath+0xd1/0x110
[...] __raw_callee_save___pv_queued_spin_unlock_slowpath+0x11/0x20
[...] .slowpath+0x9/0xe
[...] lockdep_unregister_key+0x128/0x180
[...] pwq_unbound_release_workfn+0xbb/0xf0
[...] process_one_work+0x227/0x5c0
[...] worker_thread+0x55/0x3c0
[...] ? process_one_work+0x5c0/0x5c0
[...] kthread+0x153/0x170
[...] ? __kthread_bind_mask+0x60/0x60
[...] ret_from_fork+0x1f/0x30
The cause of the problem is we have call chain lockdep_unregister_key()
-> <irq disabled by raw_local_irq_save()> lockdep_unlock() ->
arch_spin_unlock() -> __pv_queued_spin_unlock_slowpath() -> pv_kick() ->
__send_ipi_one() -> trace_hyperv_send_ipi_one().
Although this particular warning is triggered because Hyper-V has a
trace point in ipi sending, but in general arch_spin_unlock() may call
another function having a trace point in it, so put the arch_spin_lock()
and arch_spin_unlock() after lock_recursion protection to fix this
problem and avoid similiar problems.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201113110512.1056501-1-boqun.feng@gmail.com
Glenn reported that "an application [he developed produces] a BUG in
deadline.c when a SCHED_DEADLINE task contends with CFS tasks on nested
PTHREAD_PRIO_INHERIT mutexes. I believe the bug is triggered when a CFS
task that was boosted by a SCHED_DEADLINE task boosts another CFS task
(nested priority inheritance).
------------[ cut here ]------------
kernel BUG at kernel/sched/deadline.c:1462!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 12 PID: 19171 Comm: dl_boost_bug Tainted: ...
Hardware name: ...
RIP: 0010:enqueue_task_dl+0x335/0x910
Code: ...
RSP: 0018:ffffc9000c2bbc68 EFLAGS: 00010002
RAX: 0000000000000009 RBX: ffff888c0af94c00 RCX: ffffffff81e12500
RDX: 000000000000002e RSI: ffff888c0af94c00 RDI: ffff888c10b22600
RBP: ffffc9000c2bbd08 R08: 0000000000000009 R09: 0000000000000078
R10: ffffffff81e12440 R11: ffffffff81e1236c R12: ffff888bc8932600
R13: ffff888c0af94eb8 R14: ffff888c10b22600 R15: ffff888bc8932600
FS: 00007fa58ac55700(0000) GS:ffff888c10b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa58b523230 CR3: 0000000bf44ab003 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
? intel_pstate_update_util_hwp+0x13/0x170
rt_mutex_setprio+0x1cc/0x4b0
task_blocks_on_rt_mutex+0x225/0x260
rt_spin_lock_slowlock_locked+0xab/0x2d0
rt_spin_lock_slowlock+0x50/0x80
hrtimer_grab_expiry_lock+0x20/0x30
hrtimer_cancel+0x13/0x30
do_nanosleep+0xa0/0x150
hrtimer_nanosleep+0xe1/0x230
? __hrtimer_init_sleeper+0x60/0x60
__x64_sys_nanosleep+0x8d/0xa0
do_syscall_64+0x4a/0x100
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fa58b52330d
...
---[ end trace 0000000000000002 ]—
He also provided a simple reproducer creating the situation below:
So the execution order of locking steps are the following
(N1 and N2 are non-deadline tasks. D1 is a deadline task. M1 and M2
are mutexes that are enabled * with priority inheritance.)
Time moves forward as this timeline goes down:
N1 N2 D1
| | |
| | |
Lock(M1) | |
| | |
| Lock(M2) |
| | |
| | Lock(M2)
| | |
| Lock(M1) |
| (!!bug triggered!) |
Daniel reported a similar situation as well, by just letting ksoftirqd
run with DEADLINE (and eventually block on a mutex).
Problem is that boosted entities (Priority Inheritance) use static
DEADLINE parameters of the top priority waiter. However, there might be
cases where top waiter could be a non-DEADLINE entity that is currently
boosted by a DEADLINE entity from a different lock chain (i.e., nested
priority chains involving entities of non-DEADLINE classes). In this
case, top waiter static DEADLINE parameters could be null (initialized
to 0 at fork()) and replenish_dl_entity() would hit a BUG().
Fix this by keeping track of the original donor and using its parameters
when a task is boosted.
Reported-by: Glenn Elliott <glenn@aurora.tech>
Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201117061432.517340-1-juri.lelli@redhat.com
schedule() ttwu()
deactivate_task(); if (p->on_rq && ...) // false
atomic_dec(&task_rq(p)->nr_iowait);
if (prev->in_iowait)
atomic_inc(&rq->nr_iowait);
Allows nr_iowait to be decremented before it gets incremented,
resulting in more dodgy IO-wait numbers than usual.
Note that because we can now do ttwu_queue_wakelist() before
p->on_cpu==0, we lose the natural ordering and have to further delay
the decrement.
Fixes: c6e7bd7afa ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lkml.kernel.org/r/20201117093829.GD3121429@hirez.programming.kicks-ass.net
Mel reported that on some ARM64 platforms loadavg goes bananas and
Will tracked it down to the following race:
CPU0 CPU1
schedule()
prev->sched_contributes_to_load = X;
deactivate_task(prev);
try_to_wake_up()
if (p->on_rq &&) // false
if (smp_load_acquire(&p->on_cpu) && // true
ttwu_queue_wakelist())
p->sched_remote_wakeup = Y;
smp_store_release(prev->on_cpu, 0);
where both p->sched_contributes_to_load and p->sched_remote_wakeup are
in the same word, and thus the stores X and Y race (and can clobber
one another's data).
Whereas prior to commit c6e7bd7afa ("sched/core: Optimize ttwu()
spinning on p->on_cpu") the p->on_cpu handoff serialized access to
p->sched_remote_wakeup (just as it still does with
p->sched_contributes_to_load) that commit broke that by calling
ttwu_queue_wakelist() with p->on_cpu != 0.
However, due to
p->XXX = X ttwu()
schedule() if (p->on_rq && ...) // false
smp_mb__after_spinlock() if (smp_load_acquire(&p->on_cpu) &&
deactivate_task() ttwu_queue_wakelist())
p->on_rq = 0; p->sched_remote_wakeup = Y;
We can be sure any 'current' store is complete and 'current' is
guaranteed asleep. Therefore we can move p->sched_remote_wakeup into
the current flags word.
Note: while the observed failure was loadavg accounting gone wrong due
to ttwu() cobbering p->sched_contributes_to_load, the reverse problem
is also possible where schedule() clobbers p->sched_remote_wakeup,
this could result in enqueue_entity() wrecking ->vruntime and causing
scheduling artifacts.
Fixes: c6e7bd7afa ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Debugged-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201117083016.GK3121392@hirez.programming.kicks-ass.net
If the clk_register fails, we should free h before
function returns to prevent memleak.
Fixes: 474402291a ("MIPS: Alchemy: clock framework integration of onchip clocks")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
A UHS setting of SDR25 can give better results for High Speed mode.
This is because there is no setting corresponding to high speed. Currently
SDHCI sets no value, which means zero which is also the setting for SDR12.
There was an attempt to change this in sdhci.c but it caused problems for
some drivers, so it was reverted and the change was made to sdhci-brcmstb
in commit 2fefc7c5f7 ("mmc: sdhci-brcmstb: Fix incorrect switch to HS
mode"). Several other drivers also do this.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org # v5.4+
Link: https://lore.kernel.org/r/20201112133656.20317-1-adrian.hunter@intel.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Currently a build with CONFIG_E200=y will fail with:
Error: invalid switch -me200
Error: unrecognized option -me200
Upstream binutils has never supported an -me200 option. Presumably it
was supported at some point by either a fork or Freescale internal
binutils.
We can't support code that we can't even build test, so drop the
addition of -me200 to the build flags, so we can at least build with
CONFIG_E200=y.
Reported-by: Németh Márton <nm127@freemail.hu>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Scott Wood <oss@buserror.net>
Link: https://lore.kernel.org/r/20201116120913.165317-1-mpe@ellerman.id.au
Jonathan writes:
First set of IIO and counter fixes for the 5.10 cycle.
IIO
cros_ec
- Provide defauts for max and min frequency when older machines fail
to return them correctly.
ingenic-adc
- Fix wrong vref value for JZ4770 SoC
- Fix AUX / VBAT readings when touchscreen in use by pausing touchscreen
readings during a read of these channels.
kxcjk1013
- Fix an issue with KIOX010A ACPI id using devices which need to run
a ACPI device specific method to avoid leaving the keyboard disabled.
Includes a minor precursor patch to make this fix easier to do.
mt6577-auxadc
- Fix an issue with dev_comp not being set resulting in a null ptr deref.
st_lsm6dsx
- Set a 10ms min shub slave timeout to handle fast snesors where more time
is needed to set up the config than the cycles allowed.
stm32-adc
- Fix an issue due to a clash between an ADC configured to use IRQs and
a second configured to use DMA cause by some incorrect register masking.
vcnl4035
- Kconfig missing dependency
Counter
ti-eqep
- wrong value for max_register as one beyond the end instead of the end.
* tag 'iio-fixes-for-5.10a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: accel: kxcjk1013: Add support for KIOX010A ACPI DSM for setting tablet-mode
iio: accel: kxcjk1013: Replace is_smo8500_device with an acpi_type enum
iio: light: fix kconfig dependency bug for VCNL4035
iio/adc: ingenic: Fix AUX/VBAT readings when touchscreen is used
iio/adc: ingenic: Fix battery VREF for JZ4770 SoC
iio: imu: st_lsm6dsx: set 10ms as min shub slave timeout
counter/ti-eqep: Fix regmap max_register
iio: adc: stm32-adc: fix a regression when using dma and irq
iio: adc: mediatek: fix unset field
iio: cros_ec: Use default frequencies when EC returns invalid information
Currently, scan_microcode() leverages microcode_matches() to check
if the microcode matches the CPU by comparing the family and model.
However, the processor stepping and flags of the microcode signature
should also be considered when saving a microcode patch for early
update.
Use find_matching_signature() in scan_microcode() and get rid of the
now-unused microcode_matches() which is a good cleanup in itself.
Complete the verification of the patch being saved for early loading in
save_microcode_patch() directly. This needs to be done there too because
save_mc_for_early() will call save_microcode_patch() too.
The second reason why this needs to be done is because the loader still
tries to support, at least hypothetically, mixed-steppings systems and
thus adds all patches to the cache that belong to the same CPU model
albeit with different steppings.
For example:
microcode: CPU: sig=0x906ec, pf=0x2, rev=0xd6
microcode: mc_saved[0]: sig=0x906e9, pf=0x2a, rev=0xd6, total size=0x19400, date = 2020-04-23
microcode: mc_saved[1]: sig=0x906ea, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27
microcode: mc_saved[2]: sig=0x906eb, pf=0x2, rev=0xd6, total size=0x19400, date = 2020-04-23
microcode: mc_saved[3]: sig=0x906ec, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27
microcode: mc_saved[4]: sig=0x906ed, pf=0x22, rev=0xd6, total size=0x19400, date = 2020-04-23
The patch which is being saved for early loading, however, can only be
the one which fits the CPU this runs on so do the signature verification
before saving.
[ bp: Do signature verification in save_microcode_patch()
and rewrite commit message. ]
Fixes: ec400ddeff ("x86/microcode_intel_early.c: Early update ucode on Intel's CPU")
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208535
Link: https://lkml.kernel.org/r/20201113015923.13960-1-yu.c.chen@intel.com
The loop over all memblocks works with PFNs and not physical
addresses, so we need for_each_mem_pfn_range().
Fixes: b10d6bca87 ("arch, drivers: replace for_each_membock() with for_each_mem_range()")
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Commit dd461cd918 ("opp: Allow dev_pm_opp_get_opp_table() to return
-EPROBE_DEFER") handles -EPROBE_DEFER for the clock/interconnects within
_allocate_opp_table() which is called from dev_pm_opp_add and it
now propagates the error back to the caller.
SCMI performance domain re-used clock bindings to keep it simple. However
with the above mentioned change, if clock property is present in a device
node, opps fails to get added with below errors until clk_get succeeds.
cpu0: failed to add opp 450000000Hz
cpu0: failed to add opps to the device
....(errors on cpu1-cpu4)
cpu5: failed to add opp 450000000Hz
cpu5: failed to add opps to the device
So, in order to fix the issue, we need to register dummy clock provider.
With the dummy clock provider, clk_get returns NULL(no errors!), then opp
core proceeds to add OPPs for the CPUs.
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Fixes: dd461cd918 ("opp: Allow dev_pm_opp_get_opp_table() to return -EPROBE_DEFER")
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Commit b89c01c960 ("cpufreq: tegra186: Fix initial frequency")
implemented the CPUFREQ 'get' callback to determine the current
operating frequency for each CPU. This implementation used a simple
looked up to determine the current operating frequency. The problem
with this is that frequency table for different Tegra186 devices may
vary and so the default boot frequency for Tegra186 device may or may
not be present in the frequency table. If the default boot frequency is
not present in the frequency table, this causes the function
tegra186_cpufreq_get() to return 0 and in turn causes cpufreq_online()
to fail which prevents CPUFREQ from working.
Fix this by always calculating the CPU frequency based upon the current
'ndiv' setting for the CPU. Note that the CPU frequency for Tegra186 is
calculated by reading the current 'ndiv' setting, multiplying by the
CPU reference clock and dividing by a constant divisor.
Fixes: b89c01c960 ("cpufreq: tegra186: Fix initial frequency")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Maurizio found a race where the abort and cmd stop paths can race as
follows:
1. thread1 runs iscsit_release_commands_from_conn and sets
CMD_T_FABRIC_STOP.
2. thread2 runs iscsit_aborted_task and then does __iscsit_free_cmd. It
then returns from the aborted_task callout and we finish
target_handle_abort and do:
target_handle_abort -> transport_cmd_check_stop_to_fabric ->
lio_check_stop_free -> target_put_sess_cmd
The cmd is now freed.
3. thread1 now finishes iscsit_release_commands_from_conn and runs
iscsit_free_cmd while accessing a command we just released.
In __target_check_io_state we check for CMD_T_FABRIC_STOP and set the
CMD_T_ABORTED if the driver is not cleaning up the cmd because of a session
shutdown. However, iscsit_release_commands_from_conn only sets the
CMD_T_FABRIC_STOP and does not check to see if the abort path has claimed
completion ownership of the command.
This adds a check in iscsit_release_commands_from_conn so only the abort or
fabric stop path cleanup the command.
Link: https://lore.kernel.org/r/1605318378-9269-1-git-send-email-michael.christie@oracle.com
Reported-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
iSCSI NOPs are sometimes "lost", mistakenly sent to the user-land iscsid
daemon instead of handled in the kernel, as they should be, resulting in a
message from the daemon like:
iscsid: Got nop in, but kernel supports nop handling.
This can occur because of the new forward- and back-locks, and the fact
that an iSCSI NOP response can occur before processing of the NOP send is
complete. This can result in "conn->ping_task" being NULL in
iscsi_nop_out_rsp(), when the pointer is actually in the process of being
set.
To work around this, we add a new state to the "ping_task" pointer. In
addition to NULL (not assigned) and a pointer (assigned), we add the state
"being set", which is signaled with an INVALID pointer (using "-1").
Link: https://lore.kernel.org/r/20201106193317.16993-1-leeman.duncan@gmail.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Michael Chan says:
====================
bnxt_en: Bug fixes.
This first patch fixes a module eeprom A2h addressing issue. The next
2 patches fix counter related issues. The last one skips an
unsupported firmware call on the VF to avoid the error log.
====================
Link: https://lore.kernel.org/r/1605486472-28156-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
VFs do not have access permissions to issue NVM_GET_DEV_INFO
firmware command.
Fixes: 4933f6753b ("bnxt_en: Add bnxt_hwrm_nvm_get_dev_info() to query NVM info.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bnxt_add_one_ctr() adds a hardware counter to a software counter and
adjusts for the hardware counter wraparound against the mask. The logic
assumes that the hardware counter is always smaller than or equal to
the mask.
This assumption is mostly correct. But in some cases if the firmware
is older and does not provide the accurate mask, the driver can use
a mask that is smaller than the actual hardware mask. This can cause
some extra carry bits to be added to the software counter, resulting in
counters that far exceed the actual value. Fix it by masking the
hardware counter with the mask passed into bnxt_add_one_ctr().
Fixes: fea6b33355 ("bnxt_en: Accumulate all counters.")
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Firmware is unable to retain the port counters during any kind of
fatal or non-fatal resets, so we must clear the port counters to
avoid false detection of port counter overflow.
Fixes: fea6b33355 ("bnxt_en: Accumulate all counters.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The module eeprom address range returned by bnxt_get_module_eeprom()
should be 256 bytes of A0h address space, the lower half of the A2h
address space, and page 0 for the upper half of the A2h address space.
Fix the firmware call by passing page_number 0 for the A2h slave address
space.
Fixes: 42ee18fe4c ("bnxt_en: Add Support for ETHTOOL_GMODULEINFO and ETHTOOL_GMODULEEEPRO")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Transactions sit on one of several lists, depending on their state
(allocated, pending, complete, or polled). A spinlock protects
against concurrent access when transactions are moved between these
lists.
Transactions are also reference counted. A newly-allocated
transaction has an initial count of 1; a transaction is released in
gsi_trans_free() only if its decremented reference count reaches 0.
Releasing a transaction includes removing it from the polled (or if
unused, allocated) list, so the spinlock is acquired when we release
a transaction.
The reference count is used to allow a caller to synchronously wait
for a committed transaction to complete. In this case, the waiter
takes an extra reference to the transaction *before* committing it
(so it won't be freed), and releases its reference (calls
gsi_trans_free()) when it is done with it.
Similarly, gsi_channel_update() takes an extra reference to ensure a
transaction isn't released before the function is done operating on
it. Until the transaction is moved to the completed list (by this
function) it won't be freed, so this reference is taken "safely."
But in the quiesce path, we want to wait for the "last" transaction,
which we find in the completed or polled list. Transactions on
these lists can be freed at any time, so we (try to) prevent that
by taking the reference while holding the spinlock.
Currently gsi_trans_free() decrements a transaction's reference
count unconditionally, acquiring the lock to remove the transaction
from its list *only* when the count reaches 0. This does not
protect the quiesce path, which depends on the lock to ensure its
extra reference prevents release of the transaction.
Fix this by only dropping the last reference to a transaction
in gsi_trans_free() while holding the spinlock.
Fixes: 9dd441e4ed ("soc: qcom: ipa: GSI transactions")
Reported-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20201114182017.28270-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If tcp socket has more data than Encrypted Handshake Message then
tls_sw_recvmsg will try to decrypt next record instead of returning
full control message to userspace as mentioned in comment. The next
message - usually Application Data - gets corrupted because it uses
zero copy for decryption that's why the data is not stored in skb
for next iteration. Revert check to not decrypt next record if
current is not Application Data.
Fixes: 692d7b5d1f ("tls: Fix recvmsg() to be able to peek across multiple records")
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Link: https://lore.kernel.org/r/1605413760-21153-1-git-send-email-vfedorenko@novek.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During rmnet unregistration, the real device rx_handler is first cleared
followed by the removal of rx_handler_data after the rcu synchronization.
Any packets in the receive path may observe that the rx_handler is NULL.
However, there is no check when dereferencing this value to use the
rmnet_port information.
This fixes following splat by adding the NULL check.
Unable to handle kernel NULL pointer dereference at virtual
address 000000000000000d
pc : rmnet_rx_handler+0x124/0x284
lr : rmnet_rx_handler+0x124/0x284
rmnet_rx_handler+0x124/0x284
__netif_receive_skb_core+0x758/0xd74
__netif_receive_skb+0x50/0x17c
process_backlog+0x15c/0x1b8
napi_poll+0x88/0x284
net_rx_action+0xbc/0x23c
__do_softirq+0x20c/0x48c
Fixes: ceed73a2cf ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Link: https://lore.kernel.org/r/1605298325-3705-1-git-send-email-subashab@codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull ARM SoC fixes from Arnd Bergmann:
"Around one third of the fixes this time are for dts files that list
their ethernet controller as using 'phy-mode="rgmii"' but are changed
to 'phy-mode="rgmii-id"' now, because the PHY drivers (realtek,
ksz9031, dp83867, ...) now configure the internal delay based on that
when they used to stay on the hardware default.
The long story is archived at
https://lore.kernel.org/netdev/CAMj1kXEEF_Un-4NTaD5iUN0NoZYaJQn-rPediX0S6oRiuVuW-A@mail.gmail.com/
I was trying to hold off on the bugfixes until there was a solution
that would avoid breaking all boards, but that does not seem to be
happening any time soon, so I am now sending the correct version of
the dts files to ensure that at least these machines can use their
network devices again.
The other changes this time are:
- Updating the MAINTAINER lists for Allwinner and Samsung SoCs
- Multiple i.MX8MN machines get updates for their CPU operating
points to match the data sheet
- A revert for a dts patch that caused a regression in USB support on
Odroid U3
- Two fixes for the AMD Tee driver, addressing a memory leak and
missing locking
- Mark the network subsystem on qoriq-fman3 as cache coherent for
correctness as better performance.
- Minor dts fixes elsewhere, addressing dtc warnings and similar
problems"
* tag 'arm-soc-fixes-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (48 commits)
ARM: dts: exynos: revert "add input clock to CMU in Exynos4412 Odroid"
ARM: dts: imx50-evk: Fix the chip select 1 IOMUX
arm64: dts: imx8mm: fix voltage for 1.6GHz CPU operating point
ARM: dts: stm32: Keep VDDA LDO1 always on on DHCOM
ARM: dts: stm32: Enable thermal sensor support on stm32mp15xx-dhcor
ARM: dts: stm32: Define VIO regulator supply on DHCOM
ARM: dts: stm32: Fix LED5 on STM32MP1 DHCOM PDK2
ARM: dts: stm32: Fix TA3-GPIO-C key on STM32MP1 DHCOM PDK2
arm64: dts: renesas: r8a774e1: Add missing audio_clk_b
tee: amdtee: synchronize access to shm list
tee: amdtee: fix memory leak due to reset of global shm list
arm64: dts: agilex/stratix10: Fix qspi node compatible
ARM: dts: imx6q-prti6q: fix PHY address
ARM: dts: vf610-zii-dev-rev-b: Fix MDIO over clocking
arm: dts: imx6qdl-udoo: fix rgmii phy-mode for ksz9031 phy
arm64: dts imx8mn: Remove non-existent USB OTG2
arm64: dts: imx8mm-beacon-som: Fix Choppy BT audio
arm64: dts: fsl: DPAA FMan DMA operations are coherent
arm64: dts: fsl: fix endianness issue of rcpm
arm64: dts: imx8mn-evk: fix missing PMIC's interrupt line pull-up
...
Pull Hyper-V fix from Wei Liu:
"One patch from Chris to fix kexec on Hyper-V"
* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Drivers: hv: vmbus: Allow cleanup of VMBUS_CONNECT_CPU if disconnected
Pull vhost fixes from Michael Tsirkin:
"Fixes all over the place, most notably vhost scsi IO error fixes"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost scsi: Add support for LUN resets.
vhost scsi: add lun parser helper
vhost scsi: fix cmd completion race
vhost scsi: alloc cmds per vq instead of session
vhost: add helper to check if a vq has been setup
vdpasim: fix "mac_pton" undefined error
swiotlb: using SIZE_MAX needs limits.h included
A user reports (slightly shortened from the original message):
libphy: lantiq,xrx200-mdio: probed
mdio_bus 1e108000.switch-mii: MDIO device at address 17 is missing.
gswip 1e108000.switch lan: no phy at 2
gswip 1e108000.switch lan: failed to connect to port 2: -19
lantiq,xrx200-net 1e10b308.eth eth0: error -19 setting up slave phy
This is a single-port board using the internal Fast Ethernet PHY. The
user reports that switching to PHY scanning instead of configuring the
PHY within device-tree works around this issue.
The documentation for the standalone variant of the PHY11G (which is
probably very similar to what is used inside the xRX200 SoCs but having
the firmware burnt onto that standalone chip in the factory) states that
the PHY needs 300ms to be ready for MDIO communication after releasing
the reset.
Add a 300ms delay after initializing all GPHYs to ensure that the GPHY
firmware had enough time to initialize and to appear on the MDIO bus.
Unfortunately there is no (known) documentation on what the minimum time
to wait after releasing the reset on an internal PHY so play safe and
take the one for the external variant. Only wait after the last GPHY
firmware is loaded to not slow down the initialization too much (
xRX200 has two GPHYs but newer SoCs have at least three GPHYs).
Fixes: 14fceff477 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20201115165757.552641-1-martin.blumenstingl@googlemail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We catch the case where we enter generic_file_buffered_read() with data
already transferred, but we also need to be careful not to allow an async
page lock if we're looping transferring data. If not, we could be
returning -EIOCBQUEUED instead of the transferred amount, and it could
result in double waitqueue additions as well.
Cc: stable@vger.kernel.org # v5.9
Fixes: 1a0a7853b9 ("mm: support async buffered reads in generic_file_buffered_read()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
During stream start DSP firmware requires LPCS disabled as that moment in
time is resource heavy. Currently high-clock is selected on start of
second stream onwards while low-clock is re-selected before stream
actually leaves RESUME state i.e. PAUSE_STREAM call. Fix this by always
updating clock before RESUME_STREAM and directly after PAUSE_STREAM.
Fixes: a126750fc8 ("ASoC: Intel: catpt: PCM operations")
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://lore.kernel.org/r/20201116133332.8530-3-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Playing with very low period sizes may lead to timeouts when awaiting
RESET_STREAM reply for offload streams. This is caused by NOTIFY_POSITION
appearing in the middle of trigger(stop).
Stream is unprepared during trigger(stop) where PAUSE_STREAM IPC gets
invoked. However, all data that is already mixed in DSP firmware's mixer
stream will still be played regardless of the pause. For offload streams,
this means possibility for another NOTIFY_POSITION to process. Keep these
notifications in check by only handling them when stream is in prepared
state.
Fixes: a126750fc8 ("ASoC: Intel: catpt: PCM operations")
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://lore.kernel.org/r/20201116133332.8530-2-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Fix offset computation in __sev_dbg_decrypt() to include the
source paddr before it is rounded down to be aligned to 16 bytes
as required by SEV API. This fixes incorrect guest memory dumps
observed when using qemu monitor.
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-Id: <20201110224205.29444-1-Ashish.Kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
From commit 6915564dc5 ("ACPI: OSL: Change the type of
acpi_os_map_generic_address() return value"),
acpi_os_map_generic_address() will return logical address or NULL
for error, but for ACPI_ADR_SPACE_SYSTEM_IO case, it should be also
return 0 as it's a normal case, but now it will return -ENXIO.
So check it out for such case to avoid einj module initialization
fail.
Fixes: 6915564dc5 ("ACPI: OSL: Change the type of acpi_os_map_generic_address() return value")
Cc: <stable@vger.kernel.org>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Martin Schiller is an active developer and reviewer for the X.25 code.
His company is providing products based on the Linux X.25 stack.
So he is a good candidate for maintainers of the X.25 code.
The original maintainer of the X.25 network layer (Andrew Hendry) has
not sent any email to the netdev mail list since 2013. So he is probably
inactive now.
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Acked-by: Martin Schiller <ms@dev.tdt.de>
Link: https://lore.kernel.org/r/20201114111029.326972-1-xie.he.0141@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Packets are processed even though the first fragment don't include all
headers through the upper layer header. This breaks TAHI IPv6 Core
Conformance Test v6LC.1.3.6.
Referring to RFC8200 SECTION 4.5: "If the first fragment does not include
all headers through an Upper-Layer header, then that fragment should be
discarded and an ICMP Parameter Problem, Code 3, message should be sent to
the source of the fragment, with the Pointer field set to zero."
The fragment needs to be validated the same way it is done in
commit 2efdaaaf88 ("IPv6: reply ICMP error if the first fragment don't
include all headers") for ipv6. Wrap the validation into a common function,
ipv6_frag_thdr_truncated() to check for truncation in the upper layer
header. This validation does not fullfill all aspects of RFC 8200,
section 4.5, but is at the moment sufficient to pass mentioned TAHI test.
In netfilter, utilize the fragment offset returned by find_prev_fhdr() to
let ipv6_frag_thdr_truncated() start it's traverse from the fragment
header.
Return 0 to drop the fragment in the netfilter. This is the same behaviour
as used on other protocol errors in this function, e.g. when
nf_ct_frag6_queue() returns -EPROTO. The Fragment will later be picked up
by ipv6_frag_rcv() in reassembly.c. ipv6_frag_rcv() will then send an
appropriate ICMP Parameter Problem message back to the source.
References commit 2efdaaaf88 ("IPv6: reply ICMP error if the first
fragment don't include all headers")
Signed-off-by: Georg Kohmann <geokohma@cisco.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Link: https://lore.kernel.org/r/20201111115025.28879-1-geokohma@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The profile and level in op_set_ctrl was recently changed but during
v4l2_ctrl_handler_setup profile and level control values are mangled.
Fixes: 435c53c369 ("media: venus: venc: Use helper to set profile and level")
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Zhang Qilong says:
====================
Fix usage counter leak by adding a general sync ops
In many case, we need to check return value of pm_runtime_get_sync,
but it brings a trouble to the usage counter processing. Many callers
forget to decrease the usage counter when it failed, which could
resulted in reference leak. It has been discussed a lot[0][1]. So we
add a function to deal with the usage counter for better coding and
view. Then, we replace pm_runtime_resume_and_get with it in fec_main.c
to avoid it.
[0] https://lkml.org/lkml/2020/6/14/88
[1] https://patchwork.ozlabs.org/project/linux-tegra/list/?series=178139
====================
Link: https://lore.kernel.org/r/20201110092933.3342784-1-zhangqilong3@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
pm_runtime_get_sync() will increment pm usage at first and it will
resume the device later. If runtime of the device has error or
device is in inaccessible state(or other error state), resume
operation will fail. If we do not call put operation to decrease
the reference, it will result in reference count leak. Moreover,
this device cannot enter the idle state and always stay busy or other
non-idle state later. So we fixed it by replacing it with
pm_runtime_resume_and_get.
Fixes: 8fff755e9f ("net: fec: Ensure clocks are enabled while using mdio bus")
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 7f832645d0 ("dmaengine: ioatdma: remove ioatdma v2 registration")
missed to remove dca2_tag_map_valid() during its removal. Hence, since
then, dca2_tag_map_valid() is unused and make CC=clang W=1 warns:
drivers/dma/ioat/dca.c:44:19:
warning: unused function 'dca2_tag_map_valid' [-Wunused-function]
So, remove this unused function and get rid of a -Wused-function warning.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://lore.kernel.org/r/20201113081248.26416-1-lukas.bulwahn@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
This fix is for a failure that occurred in the DWARF unwind perf test.
Stack unwinders may probe memory when looking for frames.
Memory sanitizer will poison and track uninitialized memory on the
stack, and on the heap if the value is copied to the heap.
This can lead to false memory sanitizer failures for the use of an
uninitialized value.
Avoid this problem by removing the poison on the copied stack.
The full msan failure with track origins looks like:
==2168==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x559ceb10755b in handle_cfi elfutils/libdwfl/frame_unwind.c:648:8
#1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
#2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
#3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
#4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
#5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
#6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
#7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
#8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
#9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
#10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
#11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
#12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
#13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
#14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
#15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
#16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
#17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
#18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
#19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
#20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
#21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
#22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
#23 0x559cea95fbce in main tools/perf/perf.c:539:3
Uninitialized value was stored to memory at
#0 0x559ceb106acf in __libdwfl_frame_reg_set elfutils/libdwfl/frame_unwind.c:77:22
#1 0x559ceb106acf in handle_cfi elfutils/libdwfl/frame_unwind.c:627:13
#2 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
#3 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
#4 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
#5 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
#6 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
#7 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
#8 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
#9 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
#10 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
#11 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
#12 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
#13 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
#14 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
#15 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
#16 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
#17 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
#18 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
#19 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
#20 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
#21 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
#22 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
#23 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
#24 0x559cea95fbce in main tools/perf/perf.c:539:3
Uninitialized value was stored to memory at
#0 0x559ceb106a54 in handle_cfi elfutils/libdwfl/frame_unwind.c:613:9
#1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
#2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
#3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
#4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
#5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
#6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
#7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
#8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
#9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
#10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
#11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
#12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
#13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
#14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
#15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
#16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
#17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
#18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
#19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
#20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
#21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
#22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
#23 0x559cea95fbce in main tools/perf/perf.c:539:3
Uninitialized value was stored to memory at
#0 0x559ceaff8800 in memory_read tools/perf/util/unwind-libdw.c:156:10
#1 0x559ceb10f053 in expr_eval elfutils/libdwfl/frame_unwind.c:501:13
#2 0x559ceb1060cc in handle_cfi elfutils/libdwfl/frame_unwind.c:603:18
#3 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
#4 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
#5 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
#6 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
#7 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
#8 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
#9 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
#10 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
#11 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
#12 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
#13 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
#14 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
#15 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
#16 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
#17 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
#18 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
#19 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
#20 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
#21 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
#22 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
#23 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
#24 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
#25 0x559cea95fbce in main tools/perf/perf.c:539:3
Uninitialized value was stored to memory at
#0 0x559cea9027d9 in __msan_memcpy llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1558:3
#1 0x559cea9d2185 in sample_ustack tools/perf/arch/x86/tests/dwarf-unwind.c:41:2
#2 0x559cea9d202c in test__arch_unwind_sample tools/perf/arch/x86/tests/dwarf-unwind.c:72:9
#3 0x559ceabc9cbd in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:106:6
#4 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
#5 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
#6 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
#7 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
#8 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
#9 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
#10 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
#11 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
#12 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
#13 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
#14 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
#15 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
#16 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
#17 0x559cea95fbce in main tools/perf/perf.c:539:3
Uninitialized value was created by an allocation of 'bf' in the stack frame of function 'perf_event__synthesize_mmap_events'
#0 0x559ceafc5f60 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:445
SUMMARY: MemorySanitizer: use-of-uninitialized-value elfutils/libdwfl/frame_unwind.c:648:8 in handle_cfi
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: clang-built-linux@googlegroups.com
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandeep Dasgupta <sdasgup@google.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201113182053.754625-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
"perf inject" can create corrupt files when synthesizing sample events from AUX
data. This happens when in the input file, the first event (for the AUX data)
has a different sample_type from the second event (generally dummy).
Specifically, they differ in the bits that indicate the standard fields
appended to perf records in the mmap buffer. "perf inject" deletes the first
event and moves up the second event to first position.
The problem is with the synthetic PERF_RECORD_MMAP (etc.) events created
by "perf record".
Since these are synthetic versions of events which are normally produced
by the kernel, they have to have the standard fields appended as
described by sample_type.
"perf record" fills these in with zeroes, including the IDENTIFIER
field; perf readers interpret records with zero IDENTIFIER using the
descriptor for the first event in the file.
Since "perf inject" changes the first event, these synthetic records are
then processed with the wrong value of sample_type, and the perf reader
reads bad data, reports on incorrect length records etc.
Mismatching sample_types are seen with "perf record -e cs_etm//", where the AUX
event has TID|TIME|CPU|IDENTIFIER and the dummy event has TID|TIME|IDENTIFIER.
Perhaps they could be the same, but it isn't normally a problem if they aren't
- perf has no problems reading the file.
The sample_types have to agree on the position of IDENTIFIER, because
that's how perf finds the right event descriptor in the first place, but
they don't normally have to agree on other fields, and perf doesn't
check that they do.
The problem is specific to the way "perf inject" reorganizes the events
and the way synthetic MMAP events are recorded with a zero identifier. A
simple solution is to stop "perf inject" deleting the tracing event.
Committer testing
Removed the now unused 'evsel' variable, update the comment about the
evsel removal not being performed anymore, and apply the patch manually
as it failed with this warning:
warning: Patch sent with format=flowed; space at the end of lines might be lost.
Testing it with:
$ perf bench internals inject-build-id
# Running 'internals/inject-build-id' benchmark:
Average build-id injection took: 8.543 msec (+- 0.130 msec)
Average time per event: 0.838 usec (+- 0.013 usec)
Average memory usage: 12717 KB (+- 9 KB)
Average build-id-all injection took: 5.710 msec (+- 0.058 msec)
Average time per event: 0.560 usec (+- 0.006 usec)
Average memory usage: 12079 KB (+- 7 KB)
$
Signed-off-by: Al Grant <al.grant@arm.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LPU-Reference: b9cf5611-daae-2390-3439-6617f8f0a34b@foss.arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
i.MX fixes for 5.10, round 4:
- Fix MDIO over clocking on vf610-zii-dev-rev-b board to get switch
device work reliably.
- Fix imx50-evk IOMUX for the chip select 1 to use GPIO4_13 instead of
the native CSPI_SSI function.
- Fix voltage for 1.6GHz CPU operating point on i.MX8MM to match
hardware datasheet.
- Fix phy-mode for KSZ9031 PHY on imx6qdl-udoo board.
* tag 'imx-fixes-5.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: dts: imx50-evk: Fix the chip select 1 IOMUX
arm64: dts: imx8mm: fix voltage for 1.6GHz CPU operating point
ARM: dts: vf610-zii-dev-rev-b: Fix MDIO over clocking
arm: dts: imx6qdl-udoo: fix rgmii phy-mode for ksz9031 phy
Link: https://lore.kernel.org/r/20201116090702.GM5849@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Stefan Agner reported a bug when using zsram on 32-bit Arm machines
with RAM above the 4GB address boundary:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = a27bd01c
[00000000] *pgd=236a0003, *pmd=1ffa64003
Internal error: Oops: 207 [#1] SMP ARM
Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet
CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1
Hardware name: BCM2711
PC is at zs_map_object+0x94/0x338
LR is at zram_bvec_rw.constprop.0+0x330/0xa64
pc : [<c0602b38>] lr : [<c0bda6a0>] psr: 60000013
sp : e376bbe0 ip : 00000000 fp : c1e2921c
r10: 00000002 r9 : c1dda730 r8 : 00000000
r7 : e8ff7a00 r6 : 00000000 r5 : 02f9ffa0 r4 : e3710000
r3 : 000fdffe r2 : c1e0ce80 r1 : ebf979a0 r0 : 00000000
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 30c5383d Table: 235c2a80 DAC: fffffffd
Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6)
Stack: (0xe376bbe0 to 0xe376c000)
As it turns out, zsram needs to know the maximum memory size, which
is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in
MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture.
The same problem will be hit on all 32-bit architectures that have a
physical address space larger than 4GB and happen to not enable sparsemem
and include asm/sparsemem.h from asm/pgtable.h.
After the initial discussion, I suggested just always defining
MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is
set, or provoking a build error otherwise. This addresses all
configurations that can currently have this runtime bug, but
leaves all other configurations unchanged.
I looked up the possible number of bits in source code and
datasheets, here is what I found:
- on ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used
- on ARM, CONFIG_LPAE enables 40 bit addressing, without it we never
support more than 32 bits, even though supersections in theory allow
up to 40 bits as well.
- on MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5
XPA supports up to 60 bits in theory, but 40 bits are more than
anyone will ever ship
- On PowerPC, there are three different implementations of 36 bit
addressing, but 32-bit is used without CONFIG_PTE_64BIT
- On RISC-V, the normal page table format can support 34 bit
addressing. There is no highmem support on RISC-V, so anything
above 2GB is unused, but it might be useful to eventually support
CONFIG_ZRAM for high pages.
Fixes: 61989a80fb ("staging: zsmalloc: zsmalloc memory allocation library")
Fixes: 02390b87a9 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS")
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Stefan Agner <stefan@agner.ch>
Tested-by: Stefan Agner <stefan@agner.ch>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Marc Kleine-Budde says:
====================
pull-request: can 2020-11-15
Anant Thazhemadam contributed two patches for the AF_CAN that prevent potential
access of uninitialized member in can_rcv() and canfd_rcv().
The next patch is by Alejandro Concepcion Rodriguez and changes can_restart()
to use the correct function to push a skb into the networking stack from
process context.
Zhang Qilong's patch fixes a memory leak in the error path of the ti_hecc's
probe function.
A patch by me fixes mcba_usb_start_xmit() function in the mcba_usb driver, to
first fill the skb and then pass it to can_put_echo_skb().
Colin Ian King's patch fixes a potential integer overflow on shift in the
peak_usb driver.
The next two patches target the flexcan driver, a patch by me adds the missing
"req_bit" to the stop mode property comment (which was broken during net-next
for v5.10). Zhang Qilong's patch fixes the failure handling of
pm_runtime_get_sync().
The next seven patches target the m_can driver including the tcan4x5x spi
driver glue code. Enric Balletbo i Serra's patch for the tcan4x5x Kconfig fix
the REGMAP_SPI dependency handling. A patch by me for the tcan4x5x driver's
probe() function adds missing error handling to for devm_regmap_init(), and in
tcan4x5x_can_remove() the order of deregistration is fixed. Wu Bo's patch for
the m_can driver fixes the state change handling in
m_can_handle_state_change(). Two patches by Dan Murphy first introduce
m_can_class_free_dev() and then make use of it to fix the freeing of the can
device. A patch by Faiz Abbas add a missing shutdown of the CAN controller in
the m_can_stop() function.
* tag 'linux-can-fixes-for-5.10-20201115' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: m_can: m_can_stop(): set device to software init mode before closing
can: m_can: Fix freeing of can device from peripherials
can: m_can: m_can_class_free_dev(): introduce new function
can: m_can: m_can_handle_state_change(): fix state change
can: tcan4x5x: tcan4x5x_can_remove(): fix order of deregistration
can: tcan4x5x: tcan4x5x_can_probe(): add missing error checking for devm_regmap_init()
can: tcan4x5x: replace depends on REGMAP_SPI with depends on SPI
can: flexcan: fix failure handling of pm_runtime_get_sync()
can: flexcan: flexcan_setup_stop_mode(): add missing "req_bit" to stop mode property comment
can: peak_usb: fix potential integer overflow on shift of a int
can: mcba_usb: mcba_usb_start_xmit(): first fill skb, then pass to can_put_echo_skb()
can: ti_hecc: Fix memleak in ti_hecc_probe
can: dev: can_restart(): post buffer from the right context
can: af_can: prevent potential access of uninitialized member in canfd_rcv()
can: af_can: prevent potential access of uninitialized member in can_rcv()
====================
Link: https://lore.kernel.org/r/20201115174131.2089251-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When requeueing all requests on the device request queue to the blocklayer
we might get to an ERP (error recovery) request that is a copy of an
original CQR.
Those requests do not have blocklayer request information or a pointer to
the dasd_queue set. When trying to access those data it will lead to a
null pointer dereference in dasd_requeue_all_requests().
Fix by checking if the request is an ERP request that can simply be
ignored. The blocklayer request will be requeued by the original CQR that
is on the device queue right behind the ERP request.
Fixes: 9487cfd343 ("s390/dasd: fix handling of internal requests")
Cc: <stable@vger.kernel.org> #4.16
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The following warning is reported if lock debugging is enabled.
DEBUG_LOCKS_WARN_ON(1)
WARNING: CPU: 1 PID: 1 at kernel/locking/lockdep.c:4617 lockdep_init_map_waits+0x141/0x222
...
Call Trace:
__kernfs_create_file+0x7a/0xd8
sysfs_add_file_mode_ns+0x135/0x189
sysfs_create_file_ns+0x70/0xa0
acpi_fan_probe+0x547/0x621
platform_drv_probe+0x67/0x8b
...
Dynamically allocated sysfs attributes need to be initialized to avoid
the warning.
Fixes: d19e470b66 ("ACPI: fan: Expose fan performance state information")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: 5.6+ <stable@vger.kernel.org> # 5.6+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
pci_dev::physfn is only available when CONFIG_PCI_ATS is set. The recent
fix for the irqdomain rework missed that dependency which makes the build
fail when CONFIG_PCI_ATS=n.
Add the necessary #ifdeffery.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: ff828729be ("iommu/vt-d: Cure VF irqdomain hickup")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Joerg Roedel <joro@8bytes.org>
Annotate tegra_pm_set[clear]_cpu_in_lp2() with RCU_NONIDLE in order to
fix lockdep warning about suspicious RCU usage of a spinlock during late
idling phase.
WARNING: suspicious RCU usage
...
include/trace/events/lock.h:13 suspicious rcu_dereference_check() usage!
...
(dump_stack) from (lock_acquire)
(lock_acquire) from (_raw_spin_lock)
(_raw_spin_lock) from (tegra_pm_set_cpu_in_lp2)
(tegra_pm_set_cpu_in_lp2) from (tegra_cpuidle_enter)
(tegra_cpuidle_enter) from (cpuidle_enter_state)
(cpuidle_enter_state) from (cpuidle_enter_state_coupled)
(cpuidle_enter_state_coupled) from (cpuidle_enter)
(cpuidle_enter) from (do_idle)
...
Tested-by: Peter Geis <pgwipeout@gmail.com>
Reported-by: Peter Geis <pgwipeout@gmail.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Although cache alias management calls set up and tear down TLB entries
and fast_second_level_miss is able to restore TLB entry should it be
evicted they absolutely cannot preempt each other because they use the
same TLBTEMP area for different purposes.
Disable preemption around all cache alias management calls to enforce
that.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
fast_second_level_miss handler for the TLBTEMP area has an assumption
that page table directory entry for the TLBTEMP address range is 0. For
it to be true the TLBTEMP area must be aligned to 4MB boundary and not
share its 4MB region with anything that may use a page table. This is
not true currently: TLBTEMP shares space with vmalloc space which
results in the following kinds of runtime errors when
fast_second_level_miss loads page table directory entry for the vmalloc
space instead of fixing up the TLBTEMP area:
Unable to handle kernel paging request at virtual address c7ff0e00
pc = d0009275, ra = 90009478
Oops: sig: 9 [#1] PREEMPT
CPU: 1 PID: 61 Comm: kworker/u9:2 Not tainted 5.10.0-rc3-next-20201110-00007-g1fe4962fa983-dirty #58
Workqueue: xprtiod xs_stream_data_receive_workfn
a00: 90009478 d11e1dc0 c7ff0e00 00000020 c7ff0000 00000001 7f8b8107 00000000
a08: 900c5992 d11e1d90 d0cc88b8 5506e97c 00000000 5506e97c d06c8074 d11e1d90
pc: d0009275, ps: 00060310, depc: 00000014, excvaddr: c7ff0e00
lbeg: d0009275, lend: d0009287 lcount: 00000003, sar: 00000010
Call Trace:
xs_stream_data_receive_workfn+0x43c/0x770
process_one_work+0x1a1/0x324
worker_thread+0x1cc/0x3c0
kthread+0x10d/0x124
ret_from_kernel_thread+0xc/0x18
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
The nVHE percpu data is partially linked but the nVHE linker script did
not align the percpu section. The PERCPU_INPUT macro would then align
the data to a page boundary:
#define PERCPU_INPUT(cacheline) \
__per_cpu_start = .; \
*(.data..percpu..first) \
. = ALIGN(PAGE_SIZE); \
*(.data..percpu..page_aligned) \
. = ALIGN(cacheline); \
*(.data..percpu..read_mostly) \
. = ALIGN(cacheline); \
*(.data..percpu) \
*(.data..percpu..shared_aligned) \
PERCPU_DECRYPTED_SECTION \
__per_cpu_end = .;
but then when the final vmlinux linking happens the hypervisor percpu
data is included after page alignment and so the offsets potentially
don't match. On my build I saw that the .hyp.data..percpu section was
at address 0x20 and then the percpu data would begin at 0x1000 (because
of the page alignment in PERCPU_INPUT), but when linked into vmlinux,
everything would be shifted down by 0x20 bytes.
This manifests as one of the CPUs getting lost when running
kvm-unit-tests or starting any VM and subsequent soft lockup on a Cortex
A72 device.
Fixes: 30c953911c ("kvm: arm64: Set up hyp percpu data for nVHE")
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Brazdil <dbrazdil@google.com>
Cc: David Brazdil <dbrazdil@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201113150406.14314-1-jamie@nuviainc.com
Fix build errors when CONFIG_TYPEC_QCOM_PMIC=y and
CONFIG_USB_ROLE_SWITCH=m by limiting the former to =m when
USB_ROLE_SWITCH also =m.
powerpc64-linux-ld: drivers/usb/typec/qcom-pmic-typec.o: in function `.qcom_pmic_typec_remove':
qcom-pmic-typec.c:(.text+0x28): undefined reference to `.usb_role_switch_set_role'
powerpc64-linux-ld: qcom-pmic-typec.c:(.text+0x64): undefined reference to `.usb_role_switch_put'
powerpc64-linux-ld: drivers/usb/typec/qcom-pmic-typec.o: in function `.qcom_pmic_typec_check_connection':
qcom-pmic-typec.c:(.text+0x120): undefined reference to `.usb_role_switch_set_role'
powerpc64-linux-ld: drivers/usb/typec/qcom-pmic-typec.o: in function `.qcom_pmic_typec_probe':
qcom-pmic-typec.c:(.text+0x360): undefined reference to `.fwnode_usb_role_switch_get'
powerpc64-linux-ld: qcom-pmic-typec.c:(.text+0x4e4): undefined reference to `.usb_role_switch_put'
Fixes: 6c8cf36951 ("usb: typec: Add QCOM PMIC typec detection driver")
Cc: linux-usb@vger.kernel.org
Cc: Wesley Cheng <wcheng@codeaurora.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20201116040653.7943-1-rdunlap@infradead.org
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter writes:
Two bugs for Cadence USB3 gadget driver
- TD_SIZE entry at descriptor is error for multiple-trb use case
- Possible use uninitialized variables
* tag 'usb-fixes-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/peter.chen/usb:
usb: cdns3: gadget: calculate TD_SIZE based on TD
usb: cdns3: gadget: initialize link_trb as NULL
The TRB entry TD_SIZE is the packet number for the TD (request) but not the
each TRB, so it only needs to be assigned for the first TRB during the TD,
and the value of it is for TD too.
Fixes: 7733f6c32e ("usb: cdns3: Add Cadence USB3 DRD Driver")
Signed-off-by: Peter Chen <peter.chen@nxp.com>
There is an uninitialized variable "link_trb" usage at function cdns3_ep_run_transfer.
Fixed it by initialize "link_trb" as NULL.
Fixes: 4e218882eb ("usb: cdns3: gadget: improve the dump TRB operation at cdns3_ep_run_transfer")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Mid callback needs to be called only when valid data is
read into pages.
These patches address a problem found during decryption offload:
CIFS: VFS: trying to dequeue a deleted mid
that could cause a refcount use after free:
Workqueue: smb3decryptd smb2_decrypt_offload [cifs]
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org> #5.4+
Signed-off-by: Steve French <stfrench@microsoft.com>
When reconnect happens Mid queue can be corrupted when both
demultiplex and offload thread try to dequeue the MID from the
pending list.
These patches address a problem found during decryption offload:
CIFS: VFS: trying to dequeue a deleted mid
that could cause a refcount use after free:
Workqueue: smb3decryptd smb2_decrypt_offload [cifs]
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org> #5.4+
Signed-off-by: Steve French <stfrench@microsoft.com>
pseries guest kernels have a FWNMI handler for SRESET and MCE NMIs,
which is basically the same as the regular handlers for those
interrupts.
The system reset FWNMI handler did not have a KVM guest test in it,
although it probably should have because the guest can itself run
guests.
Commit 4f50541f67 ("powerpc/64s/exception: Move all interrupt
handlers to new style code gen macros") convert the handler faithfully
to avoid a KVM test with a "clever" trick to modify the IKVM_REAL
setting to 0 when the fwnmi handler is to be generated (PPC_PSERIES=y).
This worked when the KVM test was generated in the interrupt entry
handlers, but a later patch moved the KVM test to the common handler,
and the common handler macro is expanded below the fwnmi entry. This
prevents the KVM test from being generated even for the 0x100 entry
point as well.
The result is NMI IPIs in the host kernel when a guest is running will
use gest registers. This goes particularly badly when an HPT guest is
running and the MMU is set to guest mode.
Remove this trickery and just generate the test always.
Fixes: 9600f261ac ("powerpc/64s/exception: Move KVM test to common code")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201114114743.3306283-1-npiggin@gmail.com
In newer versions of virtio-scsi we just reset the timer when an a
command times out, so TMFs are never sent for the cmd time out case.
However, in older kernels and for the TMF inject cases, we can still get
resets and we end up just failing immediately so the guest might see the
device get offlined and IO errors.
For the older kernel cases, we want the same end result as the
modern virtio-scsi driver where we let the lower levels fire their error
handling and handle the problem. And at the upper levels we want to
wait. This patch ties the LUN reset handling into the LIO TMF code which
will just wait for outstanding commands to complete like we are doing in
the modern virtio-scsi case.
Note: I did not handle the ABORT case to keep this simple. For ABORTs
LIO just waits on the cmd like how it does for the RESET case. If
an ABORT fails, the guest OS ends up escalating to LUN RESET, so in
the end we get the same behavior where we wait on the outstanding
cmds.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/1604986403-4931-6-git-send-email-michael.christie@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
We might not do the final se_cmd put from vhost_scsi_complete_cmd_work.
When the last put happens a little later then we could race where
vhost_scsi_complete_cmd_work does vhost_signal, the guest runs and sends
more IO, and vhost_scsi_handle_vq runs but does not find any free cmds.
This patch has us delay completing the cmd until the last lio core ref
is dropped. We then know that once we signal to the guest that the cmd
is completed that if it queues a new command it will find a free cmd.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Maurizio Lombardi <mlombard@redhat.com>
Link: https://lore.kernel.org/r/1604986403-4931-4-git-send-email-michael.christie@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
We currently are limited to 256 cmds per session. This leads to problems
where if the user has increased virtqueue_size to more than 2 or
cmd_per_lun to more than 256 vhost_scsi_get_tag can fail and the guest
will get IO errors.
This patch moves the cmd allocation to per vq so we can easily match
whatever the user has specified for num_queues and
virtqueue_size/cmd_per_lun. It also makes it easier to control how much
memory we preallocate. For cases, where perf is not as important and
we can use the current defaults (1 vq and 128 cmds per vq) memory use
from preallocate cmds is cut in half. For cases, where we are willing
to use more memory for higher perf, cmd mem use will now increase as
the num queues and queue depth increases.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/1604986403-4931-3-git-send-email-michael.christie@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Maurizio Lombardi <mlombard@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc/whatever driver fixes for 5.10-rc4.
Nothing huge, lots of small fixes for reported issues:
- habanalabs driver fixes
- speakup driver fixes
- uio driver fixes
- virtio driver fix
- other tiny driver fixes
Full details are in the shortlog.
All of these have been in linux-next for a full week with no reported
issues"
* tag 'char-misc-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
uio: Fix use-after-free in uio_unregister_device()
firmware: xilinx: fix out-of-bounds access
nitro_enclaves: Fixup type and simplify logic of the poll mask setup
speakup ttyio: Do not schedule() in ttyio_in_nowait
speakup: Fix clearing selection in safe context
speakup: Fix var_id_t values and thus keymap
virtio: virtio_console: fix DMA memory allocation for rproc serial
habanalabs/gaudi: mask WDT error in QMAN
habanalabs/gaudi: move coresight mmu config
habanalabs: fix kernel pointer type
mei: protect mei_cl_mtu from null dereference
Pull USB and Thunderbolt fixes from Greg KH:
"Here are some small Thunderbolt and USB driver fixes for 5.10-rc4 to
solve some reported issues.
Nothing huge in here, just small things:
- thunderbolt memory leaks fixed and new device ids added
- revert of problem patch for the musb driver
- new quirks added for USB devices
- typec power supply fixes to resolve much reported problems about
charging notifications not working anymore
All except the cdc-acm driver quirk addition have been in linux-next
with no reported issues (the quirk patch was applied on Friday, and is
self-contained)"
* tag 'usb-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: cdc-acm: Add DISABLE_ECHO for Renesas USB Download mode
MAINTAINERS: add usb raw gadget entry
usb: typec: ucsi: Report power supply changes
xhci: hisilicon: fix refercence leak in xhci_histb_probe
Revert "usb: musb: convert to devm_platform_ioremap_resource_byname"
thunderbolt: Add support for Intel Tiger Lake-H
thunderbolt: Only configure USB4 wake for lane 0 adapters
thunderbolt: Add uaccess dependency to debugfs interface
thunderbolt: Fix memory leak if ida_simple_get() fails in enumerate_services()
thunderbolt: Add the missed ida_simple_remove() in ring_request_msix()
Pull kvm fixes from Paolo Bonzini:
"Fixes for ARM and x86, the latter especially for old processors
without two-dimensional paging (EPT/NPT)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: mmu: fix is_tdp_mmu_check when the TDP MMU is not in use
KVM: SVM: Update cr3_lm_rsvd_bits for AMD SEV guests
KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch
KVM: x86: clflushopt should be treated as a no-op by emulation
KVM: arm64: Handle SCXTNUM_ELx traps
KVM: arm64: Unify trap handlers injecting an UNDEF
KVM: arm64: Allow setting of ID_AA64PFR0_EL1.CSV2 from userspace
Pull x86 fixes from Thomas Gleixner:
"A small set of fixes for x86:
- Cure the fallout from the MSI irqdomain overhaul which missed that
the Intel IOMMU does not register virtual function devices and
therefore never reaches the point where the MSI interrupt domain is
assigned. This made the VF devices use the non-remapped MSI domain
which is trapped by the IOMMU/remap unit
- Remove an extra space in the SGI_UV architecture type procfs output
for UV5
- Remove a unused function which was missed when removing the UV BAU
TLB shootdown handler"
* tag 'x86-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
iommu/vt-d: Cure VF irqdomain hickup
x86/platform/uv: Fix copied UV5 output archtype
x86/platform/uv: Drop last traces of uv_flush_tlb_others
Pull perf fixes from Thomas Gleixner:
"A set of fixes for perf:
- A set of commits which reduce the stack usage of various perf
event handling functions which allocated large data structs on
stack causing stack overflows in the worst case
- Use the proper mechanism for detecting soft interrupts in the
recursion protection
- Make the resursion protection simpler and more robust
- Simplify the scheduling of event groups to make the code more
robust and prepare for fixing the issues vs. scheduling of
exclusive event groups
- Prevent event multiplexing and rotation for exclusive event groups
- Correct the perf event attribute exclusive semantics to take
pinned events, e.g. the PMU watchdog, into account
- Make the anythread filtering conditional for Intel's generic PMU
counters as it is not longer guaranteed to be supported on newer
CPUs. Check the corresponding CPUID leaf to make sure
- Fixup a duplicate initialization in an array which was probably
caused by the usual 'copy & paste - forgot to edit' mishap"
* tag 'perf-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Fix Add BW copypasta
perf/x86/intel: Make anythread filter support conditional
perf: Tweak perf_event_attr::exclusive semantics
perf: Fix event multiplexing for exclusive groups
perf: Simplify group_sched_in()
perf: Simplify group_sched_out()
perf/x86: Make dummy_iregs static
perf/arch: Remove perf_sample_data::regs_user_copy
perf: Optimize get_recursion_context()
perf: Fix get_recursion_context()
perf/x86: Reduce stack usage for x86_pmu::drain_pebs()
perf: Reduce stack usage of perf_output_begin()
Pull scheduler fixes from Thomas Gleixner:
"A set of scheduler fixes:
- Address a load balancer regression by making the load balancer use
the same logic as the wakeup path to spread tasks in the LLC domain
- Prefer the CPU on which a task run last over the local CPU in the
fast wakeup path for asymmetric CPU capacity systems to align with
the symmetric case. This ensures more locality and prevents massive
migration overhead on those asymetric systems
- Fix a memory corruption bug in the scheduler debug code caused by
handing a modified buffer pointer to kfree()"
* tag 'sched-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/debug: Fix memory corruption caused by multiple small reads of flags
sched/fair: Prefer prev cpu in asymmetric wakeup path
sched/fair: Ensure tasks spreading in LLC during LB
There might be some requests pending in the buffer when the interface close
sequence occurs. In some devices, these pending requests might lead to the
module not shutting down properly when m_can_clk_stop() is called.
Therefore, move the device to init state before potentially powering it down.
Fixes: e0d1f4816f ("can: m_can: add Bosch M_CAN controller support")
Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
Acked-by: Dan Murphy <dmurphy@ti.com>
Link: https://lore.kernel.org/r/20200825055442.16994-1-faiz_abbas@ti.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Pull locking fixes from Thomas Gleixner:
"Two fixes for the locking subsystem:
- Prevent an unconditional interrupt enable in a futex helper
function which can be called from contexts which expect interrupts
to stay disabled across the call
- Don't modify lockdep chain keys in the validation process as that
causes chain inconsistency"
* tag 'locking-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep: Avoid to modify chain keys in validate_chain()
futex: Don't enable IRQs unconditionally in put_pi_state()
m_can_handle_state_change() is called with the new_state as an argument.
In the switch statements for CAN_STATE_ERROR_ACTIVE, the comment and the
following code indicate that a CAN_STATE_ERROR_WARNING is handled.
This patch fixes this problem by changing the case to CAN_STATE_ERROR_WARNING.
Signed-off-by: Wu Bo <wubo.oduw@gmail.com>
Link: http://lore.kernel.org/r/20200129022330.21248-2-wubo.oduw@gmail.com
Cc: Dan Murphy <dmurphy@ti.com>
Fixes: e0d1f4816f ("can: m_can: add Bosch M_CAN controller support")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
regmap is a library function that gets selected by drivers that need it. No
driver modules should depend on it. Instead depends on SPI and select
REGMAP_SPI. Depending on REGMAP_SPI makes this driver only build if another
driver already selected REGMAP_SPI, as the symbol can't be selected through the
menu kernel configuration.
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Link: http://lore.kernel.org/r/20200413141013.506613-1-enric.balletbo@collabora.com
Reviewed-by: Dan Murphy <dmurphy@ti.com>
Fixes: 5443c226ba ("can: tcan4x5x: Add tcan4x5x driver to the kernel")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
pm_runtime_get_sync() will increment pm usage at first and it will resume the
device later. If runtime of the device has error or device is in inaccessible
state(or other error state), resume operation will fail. If we do not call put
operation to decrease the reference, it will result in reference leak in the
two functions flexcan_get_berr_counter() and flexcan_open().
Moreover, this device cannot enter the idle state and always stay busy or other
non-idle state later. So we should fix it through adding
pm_runtime_put_noidle().
Fixes: ca10989632 ("can: flexcan: implement can Runtime PM")
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Link: https://lore.kernel.org/r/20201108083000.2599705-1-zhangqilong3@huawei.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
In the patch
d9b081e3fc ("can: flexcan: remove ack_grp and ack_bit handling from driver")
the unused ack_grp and ack_bit were removed from the driver. However in the
comment, the "req_bit" was accidentally removed, too.
This patch adds back the "req_bit" bit.
Fixes: d9b081e3fc ("can: flexcan: remove ack_grp and ack_bit handling from driver")
Reported-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: http://lore.kernel.org/r/20201014114810.2911135-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The left shift of int 32 bit integer constant 1 is evaluated using 32 bit
arithmetic and then assigned to a signed 64 bit variable. In the case where
time_ref->adapter->ts_used_bits is 32 or more this can lead to an oveflow.
Avoid this by shifting using the BIT_ULL macro instead.
Fixes: bb4785551f ("can: usb: PEAK-System Technik USB adapters driver core")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201105112427.40688-1-colin.king@canonical.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
In canfd_rcv(), cfd->len is uninitialized when skb->len = 0, and this
uninitialized cfd->len is accessed nonetheless by pr_warn_once().
Fix this uninitialized variable access by checking cfd->len's validity
condition (cfd->len > CANFD_MAX_DLEN) separately after the skb->len's
condition is checked, and appropriately modify the log messages that
are generated as well.
In case either of the required conditions fail, the skb is freed and
NET_RX_DROP is returned, same as before.
Fixes: d468984688 ("can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once")
Reported-by: syzbot+9bcb0c9409066696d3aa@syzkaller.appspotmail.com
Tested-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
Link: https://lore.kernel.org/r/20201103213906.24219-3-anant.thazhemadam@gmail.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
In can_rcv(), cfd->len is uninitialized when skb->len = 0, and this
uninitialized cfd->len is accessed nonetheless by pr_warn_once().
Fix this uninitialized variable access by checking cfd->len's validity
condition (cfd->len > CAN_MAX_DLEN) separately after the skb->len's
condition is checked, and appropriately modify the log messages that
are generated as well.
In case either of the required conditions fail, the skb is freed and
NET_RX_DROP is returned, same as before.
Fixes: 8cb68751c1 ("can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once")
Reported-by: syzbot+9bcb0c9409066696d3aa@syzkaller.appspotmail.com
Tested-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
Link: https://lore.kernel.org/r/20201103213906.24219-2-anant.thazhemadam@gmail.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Pull percpu fix and cleanup from Dennis Zhou:
"A fix for a Wshadow warning in the asm-generic percpu macros came in
and then I tacked on the removal of flexible array initializers in the
percpu allocator"
* 'for-5.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
percpu: convert flexible array initializers to use struct_size()
asm-generic: percpu: avoid Wshadow warning
In some cases where shadow paging is in use, the root page will
be either mmu->pae_root or vcpu->arch.mmu->lm_root. Then it will
not have an associated struct kvm_mmu_page, because it is allocated
with alloc_page instead of kvm_mmu_alloc_page.
Just return false quickly from is_tdp_mmu_root if the TDP MMU is
not in use, which also includes the case where shadow paging is
enabled.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If THIS_MODULE is not set, the module would be removed while debugfs is
being used.
It eventually makes kernel panic.
Fixes: c6c8fea297 ("net: Add batman-adv meshing protocol")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
On arm imx6, when opening the chip's netdev, the whole Linux
kernel intermittently hangs/freezes.
This is caused by a bug in the driver code which tests if pcie
interrupts are working correctly, using the software interrupt:
1. open: enable the software interrupt
2. open: tell the chip to assert the software interrupt
3. open: wait for flag
4. ISR: acknowledge s/w interrupt, set flag
5. open: notice flag, disable the s/w interrupt, continue
Unfortunately the ISR only acknowledges the s/w interrupt, but
does not disable it. This will re-trigger the ISR in a tight
loop.
On some (lucky) platforms, open proceeds to disable the s/w
interrupt even while the ISR is 'spinning'. On arm imx6,
the spinning ISR does not allow open to proceed, resulting
in a hung Linux kernel.
Fix minimally by disabling the s/w interrupt in the ISR, which
will prevent it from spinning. This won't break anything because
the s/w interrupt is used as a one-shot interrupt.
Note that this is a minimal fix, overlooking many possible
cleanups, e.g.:
- lan743x_intr_software_isr() is completely redundant and reads
INT_STS twice for no apparent reason
- disabling the s/w interrupt in lan743x_intr_test_isr() is now
redundant, but harmless
- waiting on software_isr_flag can be converted from a sleeping
poll loop to wait_event_timeout()
Fixes: 23f0703c12 ("lan743x: Add main source files for new lan743x driver")
Tested-by: Sven Van Asbroeck <thesven73@gmail.com> # arm imx6 lan7430
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Link: https://lore.kernel.org/r/20201112204741.12375-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When running this chip on arm imx6, we intermittently observe
the following kernel warning in the log, especially when the
system is under high load:
[ 50.119484] ------------[ cut here ]------------
[ 50.124377] WARNING: CPU: 0 PID: 303 at kernel/softirq.c:169 __local_bh_enable_ip+0x100/0x184
[ 50.132925] IRQs not enabled as expected
[ 50.159250] CPU: 0 PID: 303 Comm: rngd Not tainted 5.7.8 #1
[ 50.164837] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 50.171395] [<c0111a38>] (unwind_backtrace) from [<c010be28>] (show_stack+0x10/0x14)
[ 50.179162] [<c010be28>] (show_stack) from [<c05b9dec>] (dump_stack+0xac/0xd8)
[ 50.186408] [<c05b9dec>] (dump_stack) from [<c0122e40>] (__warn+0xd0/0x10c)
[ 50.193391] [<c0122e40>] (__warn) from [<c0123238>] (warn_slowpath_fmt+0x98/0xc4)
[ 50.200892] [<c0123238>] (warn_slowpath_fmt) from [<c012b010>] (__local_bh_enable_ip+0x100/0x184)
[ 50.209860] [<c012b010>] (__local_bh_enable_ip) from [<bf09ecbc>] (destroy_conntrack+0x48/0xd8 [nf_conntrack])
[ 50.220038] [<bf09ecbc>] (destroy_conntrack [nf_conntrack]) from [<c0ac9b58>] (nf_conntrack_destroy+0x94/0x168)
[ 50.230160] [<c0ac9b58>] (nf_conntrack_destroy) from [<c0a4aaa0>] (skb_release_head_state+0xa0/0xd0)
[ 50.239314] [<c0a4aaa0>] (skb_release_head_state) from [<c0a4aadc>] (skb_release_all+0xc/0x24)
[ 50.247946] [<c0a4aadc>] (skb_release_all) from [<c0a4b4cc>] (consume_skb+0x74/0x17c)
[ 50.255796] [<c0a4b4cc>] (consume_skb) from [<c081a2dc>] (lan743x_tx_release_desc+0x120/0x124)
[ 50.264428] [<c081a2dc>] (lan743x_tx_release_desc) from [<c081a98c>] (lan743x_tx_napi_poll+0x5c/0x18c)
[ 50.273755] [<c081a98c>] (lan743x_tx_napi_poll) from [<c0a6b050>] (net_rx_action+0x118/0x4a4)
[ 50.282306] [<c0a6b050>] (net_rx_action) from [<c0101364>] (__do_softirq+0x13c/0x53c)
[ 50.290157] [<c0101364>] (__do_softirq) from [<c012b29c>] (irq_exit+0x150/0x17c)
[ 50.297575] [<c012b29c>] (irq_exit) from [<c0196a08>] (__handle_domain_irq+0x60/0xb0)
[ 50.305423] [<c0196a08>] (__handle_domain_irq) from [<c05d44fc>] (gic_handle_irq+0x4c/0x90)
[ 50.313790] [<c05d44fc>] (gic_handle_irq) from [<c0100ed4>] (__irq_usr+0x54/0x80)
[ 50.321287] Exception stack(0xecd99fb0 to 0xecd99ff8)
[ 50.326355] 9fa0: 1cf1aa74 00000001 00000001 00000000
[ 50.334547] 9fc0: 00000001 00000000 00000000 00000000 00000000 00000000 00004097 b6d17d14
[ 50.342738] 9fe0: 00000001 b6d17c60 00000000 b6e71f94 800b0010 ffffffff
[ 50.349364] irq event stamp: 2525027
[ 50.352955] hardirqs last enabled at (2525026): [<c0a6afec>] net_rx_action+0xb4/0x4a4
[ 50.360892] hardirqs last disabled at (2525027): [<c0d6d2fc>] _raw_spin_lock_irqsave+0x1c/0x50
[ 50.369517] softirqs last enabled at (2524660): [<c01015b4>] __do_softirq+0x38c/0x53c
[ 50.377446] softirqs last disabled at (2524693): [<c012b29c>] irq_exit+0x150/0x17c
[ 50.385027] ---[ end trace c0b571db4bc8087d ]---
The driver is calling dev_kfree_skb() from code inside a spinlock,
where h/w interrupts are disabled. This is forbidden, as documented
in include/linux/netdevice.h. The correct function to use
dev_kfree_skb_irq(), or dev_kfree_skb_any().
Fix by using the correct dev_kfree_skb_xxx() functions:
in lan743x_tx_release_desc():
called by lan743x_tx_release_completed_descriptors()
called by in lan743x_tx_napi_poll()
which holds a spinlock
called by lan743x_tx_release_all_descriptors()
called by lan743x_tx_close()
which can-sleep
conclusion: use dev_kfree_skb_any()
in lan743x_tx_xmit_frame():
which holds a spinlock
conclusion: use dev_kfree_skb_irq()
in lan743x_tx_close():
which can-sleep
conclusion: use dev_kfree_skb()
in lan743x_rx_release_ring_element():
called by lan743x_rx_close()
which can-sleep
called by lan743x_rx_open()
which can-sleep
conclusion: use dev_kfree_skb()
Fixes: 23f0703c12 ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Link: https://lore.kernel.org/r/20201112185949.11315-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge fixes from Andrew Morton:
"14 patches.
Subsystems affected by this patch series: mm (migration, vmscan, slub,
gup, memcg, hugetlbfs), mailmap, kbuild, reboot, watchdog, panic, and
ocfs2"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
ocfs2: initialize ip_next_orphan
panic: don't dump stack twice on warn
hugetlbfs: fix anon huge page migration race
mm: memcontrol: fix missing wakeup polling thread
kernel/watchdog: fix watchdog_allowed_mask not used warning
reboot: fix overflow parsing reboot cpu number
Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint"
compiler.h: fix barrier_data() on clang
mm/gup: use unpin_user_pages() in __gup_longterm_locked()
mm/slub: fix panic in slab_alloc_node()
mailmap: fix entry for Dmitry Baryshkov/Eremin-Solenikov
mm/vmscan: fix NR_ISOLATED_FILE corruption on 64-bit
mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate
mm/compaction: count pages and stop correctly during page isolation
Pull clk fixes from Stephen Boyd:
"Two small clk driver fixes:
- Make to_clk_regmap() inline to avoid compiler annoyance
- Fix critical clks on i.MX imx8m SoCs"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: imx8m: fix bus critical clk registration
clk: define to_clk_regmap() as inline function
Pull SCSI fixes from James Bottomley:
"Three small fixes, all in the embedded ufs driver subsystem"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufshcd: Fix missing destroy_workqueue()
scsi: ufs: Try to save power mode change and UIC cmd completion timeout
scsi: ufs: Fix unbalanced scsi_block_reqs_cnt caused by ufshcd_hold()
Pull selinux fix from Paul Moore:
"One small SELinux patch to make sure we return an error code when an
allocation fails. It passes all of our tests, but given the nature of
the patch that isn't surprising"
* tag 'selinux-pr-20201113' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: Fix error return code in sel_ib_pkey_sid_slow()
A call trace was found in Hangbin's Codenomicon testing with debug kernel:
[ 2615.981988] ODEBUG: free active (active state 0) object type: timer_list hint: sctp_generate_proto_unreach_event+0x0/0x3a0 [sctp]
[ 2615.995050] WARNING: CPU: 17 PID: 0 at lib/debugobjects.c:328 debug_print_object+0x199/0x2b0
[ 2616.095934] RIP: 0010:debug_print_object+0x199/0x2b0
[ 2616.191533] Call Trace:
[ 2616.194265] <IRQ>
[ 2616.202068] debug_check_no_obj_freed+0x25e/0x3f0
[ 2616.207336] slab_free_freelist_hook+0xeb/0x140
[ 2616.220971] kfree+0xd6/0x2c0
[ 2616.224293] rcu_do_batch+0x3bd/0xc70
[ 2616.243096] rcu_core+0x8b9/0xd00
[ 2616.256065] __do_softirq+0x23d/0xacd
[ 2616.260166] irq_exit+0x236/0x2a0
[ 2616.263879] smp_apic_timer_interrupt+0x18d/0x620
[ 2616.269138] apic_timer_interrupt+0xf/0x20
[ 2616.273711] </IRQ>
This is because it holds asoc when transport->proto_unreach_timer starts
and puts asoc when the timer stops, and without holding transport the
transport could be freed when the timer is still running.
So fix it by holding/putting transport instead for proto_unreach_timer
in transport, just like other timers in transport.
v1->v2:
- Also use sctp_transport_put() for the "out_unlock:" path in
sctp_generate_proto_unreach_event(), as Marcelo noticed.
Fixes: 50b5d6ad63 ("sctp: Fix a race between ICMP protocol unreachable and connect()")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/102788809b554958b13b95d33440f5448113b8d6.1605331373.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull uml fix from Richard Weinberger:
"Call PMD destructor in __pmd_free_tlb()"
* tag 'for-linus-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Call pgtable_pmd_page_dtor() in __pmd_free_tlb()
When afs_write_end() is called with copied == 0, it tries to set the
dirty region, but there's no way to actually encode a 0-length region in
the encoding in page->private.
"0,0", for example, indicates a 1-byte region at offset 0. The maths
miscalculates this and sets it incorrectly.
Fix it to just do nothing but unlock and put the page in this case. We
don't actually need to mark the page dirty as nothing presumably
changed.
Fixes: 65dd2d6072 ("afs: Alter dirty range encoding in page->private")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Before commit c0cfa2d8a7 ("vsock: add multi-transports support"),
if a G2H transport was loaded (e.g. virtio transport), every packets
was forwarded to the host, regardless of the destination CID.
The H2G transports implemented until then (vhost-vsock, VMCI) always
responded with an error, if the destination CID was not
VMADDR_CID_HOST.
From that commit, we are using the remote CID to decide which
transport to use, so packets with remote CID > VMADDR_CID_HOST(2)
are sent only through H2G transport. If no H2G is available, packets
are discarded directly in the guest.
Some use cases (e.g. Nitro Enclaves [1]) rely on the old behaviour
to implement sibling VMs communication, so we restore the old
behavior when no H2G is registered.
It will be up to the host to discard packets if the destination is
not the right one. As it was already implemented before adding
multi-transport support.
Tested with nested QEMU/KVM by me and Nitro Enclaves by Andra.
[1] Documentation/virt/ne_overview.rst
Cc: Jorgen Hansen <jhansen@vmware.com>
Cc: Dexuan Cui <decui@microsoft.com>
Fixes: c0cfa2d8a7 ("vsock: add multi-transports support")
Reported-by: Andra Paraschiv <andraprs@amazon.com>
Tested-by: Andra Paraschiv <andraprs@amazon.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20201112133837.34183-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As soon as you add the second port to a VLAN, all other port
membership configuration is overwritten with zeroes. The HW interprets
this as all ports being "unmodified members" of the VLAN.
In the simple case when all ports belong to the same VLAN, switching
will still work. But using multiple VLANs or trying to set multiple
ports as tagged members will not work.
On the 6352, doing a VTU GetNext op, followed by an STU GetNext op
will leave you with both the member- and state- data in the VTU/STU
data registers. But on the 6097 (which uses the same implementation),
the STU GetNext will override the information gathered from the VTU
GetNext.
Separate the two stages, parsing the result of the VTU GetNext before
doing the STU GetNext.
We opt to update the existing implementation for all applicable chips,
as opposed to creating a separate callback for 6097, because although
the previous implementation did work for (at least) 6352, the
datasheet does not mention the masking behavior.
Fixes: ef6fcea37f ("net: dsa: mv88e6xxx: get STU entry on VTU GetNext")
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Link: https://lore.kernel.org/r/20201112114335.27371-1-tobias@waldekranz.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Though problem if found on a lower 4.1.12 kernel, I think upstream has
same issue.
In one node in the cluster, there is the following callback trace:
# cat /proc/21473/stack
__ocfs2_cluster_lock.isra.36+0x336/0x9e0 [ocfs2]
ocfs2_inode_lock_full_nested+0x121/0x520 [ocfs2]
ocfs2_evict_inode+0x152/0x820 [ocfs2]
evict+0xae/0x1a0
iput+0x1c6/0x230
ocfs2_orphan_filldir+0x5d/0x100 [ocfs2]
ocfs2_dir_foreach_blk+0x490/0x4f0 [ocfs2]
ocfs2_dir_foreach+0x29/0x30 [ocfs2]
ocfs2_recover_orphans+0x1b6/0x9a0 [ocfs2]
ocfs2_complete_recovery+0x1de/0x5c0 [ocfs2]
process_one_work+0x169/0x4a0
worker_thread+0x5b/0x560
kthread+0xcb/0xf0
ret_from_fork+0x61/0x90
The above stack is not reasonable, the final iput shouldn't happen in
ocfs2_orphan_filldir() function. Looking at the code,
2067 /* Skip inodes which are already added to recover list, since dio may
2068 * happen concurrently with unlink/rename */
2069 if (OCFS2_I(iter)->ip_next_orphan) {
2070 iput(iter);
2071 return 0;
2072 }
2073
The logic thinks the inode is already in recover list on seeing
ip_next_orphan is non-NULL, so it skip this inode after dropping a
reference which incremented in ocfs2_iget().
While, if the inode is already in recover list, it should have another
reference and the iput() at line 2070 should not be the final iput
(dropping the last reference). So I don't think the inode is really in
the recover list (no vmcore to confirm).
Note that ocfs2_queue_orphans(), though not shown up in the call back
trace, is holding cluster lock on the orphan directory when looking up
for unlinked inodes. The on disk inode eviction could involve a lot of
IOs which may need long time to finish. That means this node could hold
the cluster lock for very long time, that can lead to the lock requests
(from other nodes) to the orhpan directory hang for long time.
Looking at more on ip_next_orphan, I found it's not initialized when
allocating a new ocfs2_inode_info structure.
This causes te reflink operations from some nodes hang for very long
time waiting for the cluster lock on the orphan directory.
Fix: initialize ip_next_orphan as NULL.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201109171746.27884-1-wen.gang.wang@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qian Cai reported the following BUG in [1]
LTP: starting move_pages12
BUG: unable to handle page fault for address: ffffffffffffffe0
...
RIP: 0010:anon_vma_interval_tree_iter_first+0xa2/0x170 avc_start_pgoff at mm/interval_tree.c:63
Call Trace:
rmap_walk_anon+0x141/0xa30 rmap_walk_anon at mm/rmap.c:1864
try_to_unmap+0x209/0x2d0 try_to_unmap at mm/rmap.c:1763
migrate_pages+0x1005/0x1fb0
move_pages_and_store_status.isra.47+0xd7/0x1a0
__x64_sys_move_pages+0xa5c/0x1100
do_syscall_64+0x5f/0x310
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Hugh Dickins diagnosed this as a migration bug caused by code introduced
to use i_mmap_rwsem for pmd sharing synchronization. Specifically, the
routine unmap_and_move_huge_page() is always passing the TTU_RMAP_LOCKED
flag to try_to_unmap() while holding i_mmap_rwsem. This is wrong for
anon pages as the anon_vma_lock should be held in this case. Further
analysis suggested that i_mmap_rwsem was not required to he held at all
when calling try_to_unmap for anon pages as an anon page could never be
part of a shared pmd mapping.
Discussion also revealed that the hack in hugetlb_page_mapping_lock_write
to drop page lock and acquire i_mmap_rwsem is wrong. There is no way to
keep mapping valid while dropping page lock.
This patch does the following:
- Do not take i_mmap_rwsem and set TTU_RMAP_LOCKED for anon pages when
calling try_to_unmap.
- Remove the hacky code in hugetlb_page_mapping_lock_write. The routine
will now simply do a 'trylock' while still holding the page lock. If
the trylock fails, it will return NULL. This could impact the
callers:
- migration calling code will receive -EAGAIN and retry up to the
hard coded limit (10).
- memory error code will treat the page as BUSY. This will force
killing (SIGKILL) instead of SIGBUS any mapping tasks.
Do note that this change in behavior only happens when there is a
race. None of the standard kernel testing suites actually hit this
race, but it is possible.
[1] https://lore.kernel.org/lkml/20200708012044.GC992@lca.pw/
[2] https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/
Fixes: c0d0381ade ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Reported-by: Qian Cai <cai@lca.pw>
Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201105195058.78401-1-mike.kravetz@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "fix parsing of reboot= cmdline", v3.
The parsing of the reboot= cmdline has two major errors:
- a missing bound check can crash the system on reboot
- parsing of the cpu number only works if specified last
Fix both.
This patch (of 2):
This reverts commit 616feab753.
kstrtoint() and simple_strtoul() have a subtle difference which makes
them non interchangeable: if a non digit character is found amid the
parsing, the former will return an error, while the latter will just
stop parsing, e.g. simple_strtoul("123xyx") = 123.
The kernel cmdline reboot= argument allows to specify the CPU used for
rebooting, with the syntax `s####` among the other flags, e.g.
"reboot=warm,s31,force", so if this flag is not the last given, it's
silently ignored as well as the subsequent ones.
Fixes: 616feab753 ("kernel/reboot.c: convert simple_strtoul to kstrtoint")
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201103214025.116799-2-mcroce@linux.microsoft.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 815f0ddb34 ("include/linux/compiler*.h: make compiler-*.h
mutually exclusive") neglected to copy barrier_data() from
compiler-gcc.h into compiler-clang.h.
The definition in compiler-gcc.h was really to work around clang's more
aggressive optimization, so this broke barrier_data() on clang, and
consequently memzero_explicit() as well.
For example, this results in at least the memzero_explicit() call in
lib/crypto/sha256.c:sha256_transform() being optimized away by clang.
Fix this by moving the definition of barrier_data() into compiler.h.
Also move the gcc/clang definition of barrier() into compiler.h,
__memory_barrier() is icc-specific (and barrier() is already defined
using it in compiler-intel.h) and doesn't belong in compiler.h.
[rdunlap@infradead.org: fix ALPHA builds when SMP is not enabled]
Link: https://lkml.kernel.org/r/20201101231835.4589-1-rdunlap@infradead.org
Fixes: 815f0ddb34 ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201014212631.207844-1-nivedita@alum.mit.edu
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While doing memory hot-unplug operation on a PowerPC VM running 1024 CPUs
with 11TB of ram, I hit the following panic:
BUG: Kernel NULL pointer dereference on read at 0x00000007
Faulting instruction address: 0xc000000000456048
Oops: Kernel access of bad area, sig: 11 [#2]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS= 2048 NUMA pSeries
Modules linked in: rpadlpar_io rpaphp
CPU: 160 PID: 1 Comm: systemd Tainted: G D 5.9.0 #1
NIP: c000000000456048 LR: c000000000455fd4 CTR: c00000000047b350
REGS: c00006028d1b77a0 TRAP: 0300 Tainted: G D (5.9.0)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24004228 XER: 00000000
CFAR: c00000000000f1b0 DAR: 0000000000000007 DSISR: 40000000 IRQMASK: 0
GPR00: c000000000455fd4 c00006028d1b7a30 c000000001bec800 0000000000000000
GPR04: 0000000000000dc0 0000000000000000 00000000000374ef c00007c53df99320
GPR08: 000007c53c980000 0000000000000000 000007c53c980000 0000000000000000
GPR12: 0000000000004400 c00000001e8e4400 0000000000000000 0000000000000f6a
GPR16: 0000000000000000 c000000001c25930 c000000001d62528 00000000000000c1
GPR20: c000000001d62538 c00006be469e9000 0000000fffffffe0 c0000000003c0ff8
GPR24: 0000000000000018 0000000000000000 0000000000000dc0 0000000000000000
GPR28: c00007c513755700 c000000001c236a4 c00007bc4001f800 0000000000000001
NIP [c000000000456048] __kmalloc_node+0x108/0x790
LR [c000000000455fd4] __kmalloc_node+0x94/0x790
Call Trace:
kvmalloc_node+0x58/0x110
mem_cgroup_css_online+0x10c/0x270
online_css+0x48/0xd0
cgroup_apply_control_enable+0x2c4/0x470
cgroup_mkdir+0x408/0x5f0
kernfs_iop_mkdir+0x90/0x100
vfs_mkdir+0x138/0x250
do_mkdirat+0x154/0x1c0
system_call_exception+0xf8/0x200
system_call_common+0xf0/0x27c
Instruction dump:
e93e0000 e90d0030 39290008 7cc9402a e94d0030 e93e0000 7ce95214 7f89502a
2fbc0000 419e0018 41920230 e9270010 <89290007> 7f994800 419e0220 7ee6bb78
This pointing to the following code:
mm/slub.c:2851
if (unlikely(!object || !node_match(page, node))) {
c000000000456038: 00 00 bc 2f cmpdi cr7,r28,0
c00000000045603c: 18 00 9e 41 beq cr7,c000000000456054 <__kmalloc_node+0x114>
node_match():
mm/slub.c:2491
if (node != NUMA_NO_NODE && page_to_nid(page) != node)
c000000000456040: 30 02 92 41 beq cr4,c000000000456270 <__kmalloc_node+0x330>
page_to_nid():
include/linux/mm.h:1294
c000000000456044: 10 00 27 e9 ld r9,16(r7)
c000000000456048: 07 00 29 89 lbz r9,7(r9) <<<< r9 = NULL
node_match():
mm/slub.c:2491
c00000000045604c: 00 48 99 7f cmpw cr7,r25,r9
c000000000456050: 20 02 9e 41 beq cr7,c000000000456270 <__kmalloc_node+0x330>
The panic occurred in slab_alloc_node() when checking for the page's node:
object = c->freelist;
page = c->page;
if (unlikely(!object || !node_match(page, node))) {
object = __slab_alloc(s, gfpflags, node, addr, c);
stat(s, ALLOC_SLOWPATH);
The issue is that object is not NULL while page is NULL which is odd but
may happen if the cache flush happened after loading object but before
loading page. Thus checking for the page pointer is required too.
The cache flush is done through an inter processor interrupt when a
piece of memory is off-lined. That interrupt is triggered when a memory
hot-unplug operation is initiated and offline_pages() is calling the
slub's MEM_GOING_OFFLINE callback slab_mem_going_offline_callback()
which is calling flush_cpu_slab(). If that interrupt is caught between
the reading of c->freelist and the reading of c->page, this could lead
to such a situation. That situation is expected and the later call to
this_cpu_cmpxchg_double() will detect the change to c->freelist and redo
the whole operation.
In commit 6159d0f5c0 ("mm/slub.c: page is always non-NULL in
node_match()") check on the page pointer has been removed assuming that
page is always valid when it is called. It happens that this is not
true in that particular case, so check for page before calling
node_match() here.
Fixes: 6159d0f5c0 ("mm/slub.c: page is always non-NULL in node_match()")
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201027190406.33283-1-ldufour@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In isolate_migratepages_block, when cc->alloc_contig is true, we are
able to isolate compound pages. But nr_migratepages and nr_isolated did
not count compound pages correctly, causing us to isolate more pages
than we thought.
So count compound pages as the number of base pages they contain.
Otherwise, we might be trapped in too_many_isolated while loop, since
the actual isolated pages can go up to COMPACT_CLUSTER_MAX*512=16384,
where COMPACT_CLUSTER_MAX is 32, since we stop isolation after
cc->nr_migratepages reaches to COMPACT_CLUSTER_MAX.
In addition, after we fix the issue above, cc->nr_migratepages could
never be equal to COMPACT_CLUSTER_MAX if compound pages are isolated,
thus page isolation could not stop as we intended. Change the isolation
stop condition to '>='.
The issue can be triggered as follows:
In a system with 16GB memory and an 8GB CMA region reserved by
hugetlb_cma, if we first allocate 10GB THPs and mlock them (so some THPs
are allocated in the CMA region and mlocked), reserving 6 1GB hugetlb
pages via /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages will
get stuck (looping in too_many_isolated function) until we kill either
task. With the patch applied, oom will kill the application with 10GB
THPs and let hugetlb page reservation finish.
[ziy@nvidia.com: v3]
Link: https://lkml.kernel.org/r/20201030183809.3616803-1-zi.yan@sent.com
Fixes: 1da2f328fa ("cmm,thp,compaction,cma: allow THP migration for CMA allocations")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201029200435.3386066-1-zi.yan@sent.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
disk_get_part needs to be paired with a disk_put_part.
Cc: stable@vger.kernel.org
Fixes: ef45fe470e ("blk-cgroup: show global disk stats in root cgroup io.stat")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Some 360 degree hinges (yoga) style 2-in-1 devices use 2 KXCJ91008-s
to allow the OS to determine the angle between the display and the base
of the device, so that the OS can determine if the 2-in-1 is in laptop
or in tablet-mode.
On Windows both accelerometers are read by a special HingeAngleService
process; and this process calls a DSM (Device Specific Method) on the
ACPI KIOX010A device node for the sensor in the display, to let the
embedded-controller (EC) know about the mode so that it can disable the
kbd and touchpad to avoid spurious input while folded into tablet-mode.
This notifying of the EC is problematic because sometimes the EC comes up
thinking that device is in tablet-mode and the kbd and touchpad do not
work. This happens for example on Irbis NB111 devices after a suspend /
resume cycle (after a complete battery drain / hard reset without having
booted Windows at least once). Other 2-in-1s which are likely affected
too are e.g. the Teclast F5 and F6 series.
The kxcjk-1013 driver may seem like a strange place to deal with this,
but since it is *the* driver for the ACPI KIOX010A device, it is also
the driver which has access to the ACPI handle needed by the DSM.
Add support for calling the DSM and on probe unconditionally tell the
EC that the device is laptop mode, fixing the kbd and touchpad sometimes
not working.
Fixes: 7f6232e695 ("iio: accel: kxcjk1013: Add KIOX010A ACPI Hardware-ID")
Reported-and-tested-by: russianneuromancer <russianneuromancer@ya.ru>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Cc: <Stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20201110133835.129080-3-hdegoede@redhat.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Replace the boolean is_smo8500_device variable with an acpi_type enum.
For now this can be either ACPI_GENERIC or ACPI_SMO8500, this is a
preparation patch for adding special handling for the KIOX010A ACPI HID,
which will add a ACPI_KIOX010A acpi_type to the introduced enum.
For stable as needed as precursor for next patch.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Fixes: 7f6232e695 ("iio: accel: kxcjk1013: Add KIOX010A ACPI Hardware-ID")
Cc: <Stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20201110133835.129080-2-hdegoede@redhat.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Any attempt to do path resolution on /proc/self from an async worker will
yield -EOPNOTSUPP. We can safely do that resolution from the task itself,
and without blocking, so retry it from there.
Ideally io_uring would know this upfront and not have to go through the
worker thread to find out, but that doesn't currently seem feasible.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently verifier enforces return code checks for subprograms in the
same manner as it does for program entry points. This prevents returning
arbitrary scalar values from subprograms. Scalar type of returned values
is checked by btf_prepare_func_args() and hence it should be safe to
allow only scalars for now. Relax return code checks for subprograms and
allow any correct scalar values.
Fixes: 51c39bb1d5 (bpf: Introduce function-by-function verification)
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201113171756.90594-1-me@ubique.spb.ru
xa_destroy() frees only internal data. The caller is responsible for
freeing the exteranl objects referenced by an xarray.
Fixes: 1cf7a12e09 ("nvme: use an xarray to lookup the Commands Supported and Effects log")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Remove the struct used for tracking known command effects logs in a
list. This is now saved in an xarray that doesn't use these elements.
Instead, store the log directly instead of the wrapper struct.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
If Doorbell Buffer Config command fails even 'dev->dbbuf_dbs != NULL'
which means OACS indicates that NVME_CTRL_OACS_DBBUF_SUPP is set,
nvme_dbbuf_update_and_check_event() will check event even it's not been
successfully set.
This patch fixes mismatch among dbbuf for sq/cqs in case that dbbuf
command fails.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
It turns out that I forgot to go through and make sure that I converted all
encoder callbacks to use atomic_enable/atomic_disable(), so let's go and
actually do that.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Fixes: 09838c4efe ("drm/nouveau/kms: Search for encoders' connectors properly")
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Pre-NV50 chipsets don't currently use the MMU subsystem that later
chipsets use, and type_vram is negative here, leading to an OOB memory
access.
This was previously guarded by a chipset check, restore that.
Reported-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 5839172f09 ("drm/nouveau: explicitly specify caching to use")
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Pull fs freeze fix and cleanups from Darrick Wong:
"A single vfs fix for 5.10, along with two subsequent cleanups.
A very long time ago, a hack was added to the vfs fs freeze protection
code to work around lockdep complaints about XFS, which would try to
run a transaction (which requires intwrite protection) to finalize an
xfs freeze (by which time the vfs had already taken intwrite).
Fast forward a few years, and XFS fixed the recursive intwrite problem
on its own, and the hack became unnecessary. Fast forward almost a
decade, and latent bugs in the code converting this hack from freeze
flags to freeze locks combine with lockdep bugs to make this reproduce
frequently enough to notice page faults racing with freeze.
Since the hack is unnecessary and causes thread race errors, just get
rid of it completely. Making this kind of vfs change midway through a
cycle makes me nervous, but a large enough number of the usual
VFS/ext4/XFS/btrfs suspects have said this looks good and solves a
real problem vector.
And once that removal is done, __sb_start_write is now simple enough
that it becomes possible to refactor the function into smaller,
simpler static inline helpers in linux/fs.h. The cleanup is
straightforward.
Summary:
- Finally remove the "convert to trylock" weirdness in the fs freezer
code. It was necessary 10 years ago to deal with nested
transactions in XFS, but we've long since removed that; and now
this is causing subtle race conditions when lockdep goes offline
and sb_start_* aren't prepared to retry a trylock failure.
- Minor cleanups of the sb_start_* fs freeze helpers"
* tag 'vfs-5.10-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
vfs: move __sb_{start,end}_write* to fs.h
vfs: separate __sb_start_write into blocking and non-blocking helpers
vfs: remove lockdep bogosity in __sb_start_write
Pull xfs fixes from Darrick Wong:
- Fix a fairly serious problem where the reverse mapping btree key
comparison functions were silently ignoring parts of the keyspace
when doing comparisons
- Fix a thinko in the online refcount scrubber
- Fix a missing unlock in the pnfs code
* tag 'xfs-5.10-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix a missing unlock on error in xfs_fs_map_blocks
xfs: fix brainos in the refcount scrubber's rmap fragment processor
xfs: fix rmap key and record comparison functions
xfs: set the unwritten bit in rmap lookup flags in xchk_bmap_get_rmapextents
xfs: fix flags argument to rmap lookup when converting shared file rmaps
If this is attempted by a kthread, then return -EOPNOTSUPP as we don't
currently support that. Once we can get task_pid_ptr() doing the right
thing, then this can go away again.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We always have to update the value of ret, otherwise the
error value may be the previous one.
Fixes: f6bd59526c ("net: ethernet: ti: introduce am654 common platform time sync driver")
Signed-off-by: Wang Qing <wangqing@vivo.com>
[grygorii.strashko@ti.com: fix build warn, subj add fixes tag]
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20201112164541.3223-1-grygorii.strashko@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull block fixes from Jens Axboe:
"A few small fixes:
- NVMe pull request from Christoph:
- don't clear the read-only bit on a revalidate (Sagi Grimberg)
- nbd error case refcount leak (Christoph)
- loop/generic uevent fix (Christoph, Petr)"
* tag 'block-5.10-2020-11-13' of git://git.kernel.dk/linux-block:
loop: Fix occasional uevent drop
block: add a return value to set_capacity_revalidate_and_notify
nbd: fix a block_device refcount leak in nbd_release
nvme: fix incorrect behavior when BLKROSET is called by the user
Pull io_uring fix from Jens Axboe:
"A single fix in here, for a missed rounding case at setup time, which
caused an otherwise legitimate setup case to return -EINVAL if used
with unaligned ring size values"
* tag 'io_uring-5.10-2020-11-13' of git://git.kernel.dk/linux-block:
io_uring: round-up cq size before comparing with rounded sq size
Commit 58956317c8 ("neighbor: Improve garbage collection")
guarantees neighbour table entries a five-second lifetime. Processes
which make heavy use of multicast can fill the neighour table with
multicast addresses in five seconds. At that point, neighbour entries
can't be GC-ed because they aren't five seconds old yet, the kernel
log starts to fill up with "neighbor table overflow!" messages, and
sends start to fail.
This patch allows multicast addresses to be thrown out before they've
lived out their five seconds. This makes room for non-multicast
addresses and makes messages to all addresses more reliable in these
circumstances.
Fixes: 58956317c8 ("neighbor: Improve garbage collection")
Signed-off-by: Jeff Dike <jdike@akamai.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20201113015815.31397-1-jdike@akamai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Depending on the SoC/platform the CPSW can completely lose context after a
suspend/resume cycle, including CPSW wrapper (WR) which will cause reset of
WR_C0_MISC_EN register, so CPTS IRQ will became disabled.
Fix it by moving CPTS IRQ enabling in cpsw_ndo_open() where CPTS is
actually started.
Fixes: 84ea9c0a95 ("net: ethernet: ti: cpsw: enable cpts irq")
Reported-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/r/20201112111546.20343-1-grygorii.strashko@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For avoiding use-after-free on flush request, we call its .end_io() from
both timeout code path and __blk_mq_end_request().
When flush request's ref doesn't drop to zero, it is still used, we
can't mark it as IDLE, so fix it by marking IDLE when its refcount drops
to zero really.
Fixes: 65ff5cd045 ("blk-mq: mark flush request as IDLE in flush_end_io()")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Cc: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[BUG]
When running the following script, btrfs will trigger an ASSERT():
#/bin/bash
mkfs.btrfs -f $dev
mount $dev $mnt
xfs_io -f -c "pwrite 0 1G" $mnt/file
sync
btrfs quota enable $mnt
btrfs quota rescan -w $mnt
# Manually set the limit below current usage
btrfs qgroup limit 512M $mnt $mnt
# Crash happens
touch $mnt/file
The dmesg looks like this:
assertion failed: refcount_read(&trans->use_count) == 1, in fs/btrfs/transaction.c:2022
------------[ cut here ]------------
kernel BUG at fs/btrfs/ctree.h:3230!
invalid opcode: 0000 [#1] SMP PTI
RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
btrfs_commit_transaction.cold+0x11/0x5d [btrfs]
try_flush_qgroup+0x67/0x100 [btrfs]
__btrfs_qgroup_reserve_meta+0x3a/0x60 [btrfs]
btrfs_delayed_update_inode+0xaa/0x350 [btrfs]
btrfs_update_inode+0x9d/0x110 [btrfs]
btrfs_dirty_inode+0x5d/0xd0 [btrfs]
touch_atime+0xb5/0x100
iterate_dir+0xf1/0x1b0
__x64_sys_getdents64+0x78/0x110
do_syscall_64+0x33/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fb5afe588db
[CAUSE]
In try_flush_qgroup(), we assume we don't hold a transaction handle at
all. This is true for data reservation and mostly true for metadata.
Since data space reservation always happens before we start a
transaction, and for most metadata operation we reserve space in
start_transaction().
But there is an exception, btrfs_delayed_inode_reserve_metadata().
It holds a transaction handle, while still trying to reserve extra
metadata space.
When we hit EDQUOT inside btrfs_delayed_inode_reserve_metadata(), we
will join current transaction and commit, while we still have
transaction handle from qgroup code.
[FIX]
Let's check current->journal before we join the transaction.
If current->journal is unset or BTRFS_SEND_TRANS_STUB, it means
we are not holding a transaction, thus are able to join and then commit
transaction.
If current->journal is a valid transaction handle, we avoid committing
transaction and just end it
This is less effective than committing current transaction, as it won't
free metadata reserved space, but we may still free some data space
before new data writes.
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1178634
Fixes: c53e965360 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When doing a buffered write, through one of the write family syscalls, we
look for ranges which currently don't have allocated extents and set the
'delalloc new' bit on them, so that we can report a correct number of used
blocks to the stat(2) syscall until delalloc is flushed and ordered extents
complete.
However there are a few other places where we can do a buffered write
against a range that is mapped to a hole (no extent allocated) and where
we do not set the 'new delalloc' bit. Those places are:
- Doing a memory mapped write against a hole;
- Cloning an inline extent into a hole starting at file offset 0;
- Calling btrfs_cont_expand() when the i_size of the file is not aligned
to the sector size and is located in a hole. For example when cloning
to a destination offset beyond EOF.
So after such cases, until the corresponding delalloc range is flushed and
the respective ordered extents complete, we can report an incorrect number
of blocks used through the stat(2) syscall.
In some cases we can end up reporting 0 used blocks to stat(2), which is a
particular bad value to report as it may mislead tools to think a file is
completely sparse when its i_size is not zero, making them skip reading
any data, an undesired consequence for tools such as archivers and other
backup tools, as reported a long time ago in the following thread (and
other past threads):
https://lists.gnu.org/archive/html/bug-tar/2016-07/msg00001.html
Example reproducer:
$ cat reproducer.sh
#!/bin/bash
MNT=/mnt/sdi
DEV=/dev/sdi
mkfs.btrfs -f $DEV > /dev/null
# mkfs.xfs -f $DEV > /dev/null
# mkfs.ext4 -F $DEV > /dev/null
# mkfs.f2fs -f $DEV > /dev/null
mount $DEV $MNT
xfs_io -f -c "truncate 64K" \
-c "mmap -w 0 64K" \
-c "mwrite -S 0xab 0 64K" \
-c "munmap" \
$MNT/foo
blocks_used=$(stat -c %b $MNT/foo)
echo "blocks used: $blocks_used"
if [ $blocks_used -eq 0 ]; then
echo "ERROR: blocks used is 0"
fi
umount $DEV
$ ./reproducer.sh
blocks used: 0
ERROR: blocks used is 0
So move the logic that decides to set the 'delalloc bit' bit into the
function btrfs_set_extent_delalloc(), since that is what we use for all
those missing cases as well as for the cases that currently work well.
This change is also preparatory work for an upcoming patch that fixes
other problems related to tracking and reporting the number of bytes used
by an inode.
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull devicetree fixes from Rob Herring:
- fix Flexcan binding schema errors introduced in rc3
- fix an of_node ref counting error in of_dma_is_coherent
* tag 'devicetree-fixes-for-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: clock: imx5: fix example
dt-bindings: can: fsl,flexcan.yaml: fix compatible for i.MX35 and i.MX53
dt-bindings: can: fsl,flexcan.yaml: fix fsl,stop-mode
of/address: Fix of_node memory leak in of_dma_is_coherent
Johannes Berg says:
====================
A handful of fixes:
* a use-after-free fix in rfkill
* a memory leak fix in the mac80211 TX status path
* some rate scaling fixes
* a fix for the often-reported (by syzbot) sleeping
in atomic issue with mac80211's station removal
* tag 'mac80211-for-net-2020-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211:
mac80211: free sta in sta_info_insert_finish() on errors
mac80211: minstrel: fix tx status processing corner case
mac80211: minstrel: remove deferred sampling code
mac80211: fix memory leak on filtered powersave frames
rfkill: Fix use-after-free in rfkill_resume()
====================
Link: https://lore.kernel.org/r/20201113093421.24025-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull pin control fixes from Linus Walleij:
"A bunch of pin control fixes for the v5.10 kernel series.
Nothing in particular to say about it, because they are all driver
fixes.
I'm happy that some AMD driver fixes are appearing, it's been an
undermaintained driver, and laptops have suffered.
Summary:
- Two fixes to the Intel pin controller drivers: fixing pull
resistance bias.
- Fix some invalid SSI pins on the Ingenic pin controller.
- Make sure the clock is enabled when requesting interrupts from the
Rockchip GPIO controller.
- Make sure IRQs are mapped when looking up the IRQ for a GPIO line
on the Rockchip GPIO Write.
- Two regmap initialization fixes for the MCP23s08.
- Fix a GPI-only prefix function problem on the Aspeed pin
controller.
- Disable the debounce filter correctly on the AMD pin controller.
- Correct the timer clock setting for the AMD debounce timer.
- Make the Qualcomm pin controller more cautious around the handling
of PDC-related GPIO interrupts.
- Fix the interrupt map in the Qualcomm SM8250 pin controller"
* tag 'pinctrl-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: qcom: sm8250: Specify PDC map
pinctrl: qcom: Move clearing pending IRQ to .irq_request_resources callback
pinctrl: amd: use higher precision for 512 RtcClk
pinctrl: amd: fix incorrect way to disable debounce filter
pinctrl: aspeed: Fix GPI only function problem.
pinctrl: mcp23s08: Print error message when regmap init fails
pinctrl: mcp23s08: Use full chunk of memory for regmap configuration
pinctrl: rockchip: create irq mapping in gpio_to_irq
pinctrl: rockchip: enable gpio pclk for rockchip_gpio_to_irq
pinctrl: ingenic: Fix invalid SSI pins
pinctrl: intel: Set default bias in case no particular value given
pinctrl: intel: Fix 2 kOhm bias which is 833 Ohm
Pull GPIO fixes from Linus Walleij:
"Some GPIO fixes I've collected with the help of Bartosz.
Nothing special about them: all are driver and kbuild fixes + some
documentation fixes:
- Tidy up a missed function call in the designware driver when
converting to gpiolib irqchip
- Fix some bitmasks in the Aspeed driver
- Fix some kerneldoc warnings and minor bugs in the improved
userspace API documentation
- Revert the revert of the OMAP fix for lost edge wakeup interrupts:
the fix needs to stay in
- Fix a compile error when deselecting the character device
- A bunch of IRQ fixes on the idio GPIO drivers
- Fix an off-by-one error in the SiFive GPIO driver"
* tag 'gpio-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: sifive: Fix SiFive gpio probe
gpio: pcie-idio-24: Enable PEX8311 interrupts
gpio: pcie-idio-24: Fix IRQ Enable Register value
gpio: pcie-idio-24: Fix irq mask when masking
gpiolib: fix sysfs when cdev is not selected
Revert "Revert "gpio: omap: Fix lost edge wake-up interrupts""
gpio: uapi: clarify the meaning of 'empty' char arrays
gpio: uapi: remove whitespace
gpio: uapi: kernel-doc formatting improvements
gpio: uapi: comment consistency
gpio: uapi: fix kernel-doc warnings
gpio: aspeed: fix ast2600 bank properties
gpio: dwapb: Fix missing conversion to GPIO-lib-based IRQ-chip
Clang warns:
drivers/spi/spi-bcm2835aux.c:532:50: warning: variable 'err' is
uninitialized when used here [-Wuninitialized]
dev_err(&pdev->dev, "could not get clk: %d\n", err);
^~~
./include/linux/dev_printk.h:112:32: note: expanded from macro 'dev_err'
_dev_err(dev, dev_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~~
drivers/spi/spi-bcm2835aux.c:495:9: note: initialize the variable 'err'
to silence this warning
int err;
^
= 0
1 warning generated.
Restore the assignment so that the error value can be used in the
dev_err statement and there is no uninitialized memory being leaked.
Fixes: e13ee6cc47 ("spi: bcm2835aux: Fix use-after-free on unbind")
Link: https://github.com/ClangBuiltLinux/linux/issues/1199
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://lore.kernel.org/r/20201113180701.455541-1-natechancellor@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull MMC fixes from Ulf Hansson:
- tmio: Fixup support for reset
- sdhci-of-esdhc: Extend erratum for pulse width to more broken HWs
- renesas_sdhi: Fix re-binding of drivers
* tag 'mmc-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
Revert "mmc: renesas_sdhi: workaround a regression when reinserting SD cards"
mmc: tmio: bring tuning HW to a sane state with MMC_POWER_OFF
mmc: tmio: when resetting, reset DMA controller, too
mmc: sdhci-of-esdhc: Handle pulse width detection erratum for more SoCs
mmc: renesas_sdhi_core: Add missing tmio_mmc_host_free() at remove
Pull drm fixes from Dave Airlie:
"Nearly didn't send you a PR this week at all, but a few things
trickled in over the day, not a huge amount here, some i915, amdgpu
and a bunch of misc fixes. I have a couple of nouveau fixes
outstanding due to the PR having the wrong base, I'll figure it out
next week.
amdgpu:
- Pageflip fix for DCN3
- Declare TA firmware for green sardine
- Headless navi fix
i915:
- Pull phys pread/pwrite implementations to the backend
- Correctly set SFC capability for video engines
bridge:
- cdns Kconfig fix
hyperv_fb:
- fix missing include
gma500:
- oob access fix
mcde:
- unbalanced regulator fix"
* tag 'drm-fixes-2020-11-13' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: enable DCN for navi10 headless SKU
drm/amdgpu: add ta firmware load for green-sardine
drm/i915: Correctly set SFC capability for video engines
drm/i915/gem: Pull phys pread/pwrite implementations to the backend
drm/i915/gem: Allow backends to override pread implementation
drm/mcde: Fix unbalanced regulator
drm/gma500: Fix out-of-bounds access to struct drm_device.vblank[]
video: hyperv_fb: include vmalloc.h
drm: bridge: cdns: Kconfig: Switch over dependency to ARCH_K3
drm/amd/display: Add missing pflip irq
The spin_lock/unlock_irq() functions cannot be nested. The problem is
that presumably we would want the IRQs to be re-enabled on the second
call the spin_unlock_irq() but instead it will be enabled at the first
call so IRQs will be enabled earlier than expected.
In this situation the copy_resp_to_buf() function is only called from
one function and it is called with IRQs disabled. We can just use
the regular spin_lock/unlock() functions.
Fixes: 555e8a8f7f ("ALSA: fireworks: Add command/response functionality into hwdep interface")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20201113101241.GB168908@mwanda
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull bootconfig fix from Steven Rostedt:
"Fix alignment of bootconfig
GRUB may align the init ramdisk size to 4 bytes, the magic number at
the end of the init ramdisk that denotes bootconfig is attached may
not be at the exact end of the ramdisk. The kernel needs to check back
at least 4 bytes"
* tag 'trace-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
bootconfig: Extend the magic check range to the preceding 3 bytes
Pull ARM fix from Russell King:
"Just one bug fix: avoid a fortify panic when copying optprobe template"
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9019/1: kprobes: Avoid fortify_panic() when copying optprobe template
Pull arm64 fixes from Will Deacon:
- Spectre/Meltdown safelisting for some Qualcomm KRYO cores
- Fix RCU splat when failing to online a CPU due to a feature mismatch
- Fix a recently introduced sparse warning in kexec()
- Fix handling of CPU erratum 1418040 for late CPUs
- Ensure hot-added memory falls within linear-mapped region
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: cpu_errata: Apply Erratum 845719 to KRYO2XX Silver
arm64: proton-pack: Add KRYO2XX silver CPUs to spectre-v2 safe-list
arm64: kpti: Add KRYO2XX gold/silver CPU cores to kpti safelist
arm64: Add MIDR value for KRYO2XX gold/silver CPU cores
arm64/mm: Validate hotplug range before creating linear mapping
arm64: smp: Tell RCU about CPUs that fail to come online
arm64: psci: Avoid printing in cpu_psci_cpu_die()
arm64: kexec_file: Fix sparse warning
arm64: errata: Fix handling of 1418040 with late CPU onlining
Pull ext4 fixes from Ted Ts'o:
"Two ext4 bug fixes, one being a revert of a commit sent during the
merge window"
* tag 'ext4_for_linus_bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
Revert "ext4: fix superblock checksum calculation race"
ext4: handle dax mount option collision
Since commit:
0e030a373d ("can: flexcan: fix endianess detection")
the fsl,imx53-flexcan isn't compatible with the fsl,p1010-flexcan any more. As
the former accesses the IP core in Little Endian mode and the latter uses Big
Endian mode.
With the conversion of the flexcan DT bindings to yaml, the dt_binding_check
this throws the following error:
Documentation/devicetree/bindings/clock/imx5-clock.example.dt.yaml: can@53fc8000: compatible: 'oneOf' conditional failed, one must be fixed:
['fsl,imx53-flexcan', 'fsl,p1010-flexcan'] is too long
Additional items are not allowed ('fsl,p1010-flexcan' was unexpected)
'fsl,imx53-flexcan' is not one of ['fsl,imx7d-flexcan', 'fsl,imx6ul-flexcan', 'fsl,imx6sx-flexcan']
'fsl,imx53-flexcan' is not one of ['fsl,ls1028ar1-flexcan']
'fsl,imx6q-flexcan' was expected
'fsl,lx2160ar1-flexcan' was expected
From schema: /builds/robherring/linux-dt-bindings/Documentation/devicetree/bindings/net/can/fsl,flexcan.yaml
The error is fixed by replacing the "fsl,p1010-flexcan" compatible
(which turned out the be incompatible) with "fsl,imx25-flexcan" in the
binding example.
Reported-by: Rob Herring <robh+dt@kernel.org>
Cc: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://lore.kernel.org/r/20201111213548.1621094-1-mkl@pengutronix.de
[robh: Add "fsl,imx25-flexcan" as fallback]
Signed-off-by: Rob Herring <robh@kernel.org>
As both the i.MX35 and i.MX53 flexcan IP cores are compatible to the i.MX25,
they are listed as:
compatible = "fsl,imx35-flexcan", "fsl,imx25-flexcan";
and:
compatible = "fsl,imx53-flexcan", "fsl,imx25-flexcan";
in the SoC device trees.
This patch fixes the following errors, which shows up during a dtbs_check:
arch/arm/boot/dts/imx53-ard.dt.yaml: can@53fc8000: compatible: 'oneOf' conditional failed, one must be fixed:
['fsl,imx53-flexcan', 'fsl,imx25-flexcan'] is too long
Additional items are not allowed ('fsl,imx25-flexcan' was unexpected)
'fsl,imx53-flexcan' is not one of ['fsl,imx7d-flexcan', 'fsl,imx6ul-flexcan', 'fsl,imx6sx-flexcan']
'fsl,imx53-flexcan' is not one of ['fsl,ls1028ar1-flexcan']
'fsl,imx6q-flexcan' was expected
'fsl,lx2160ar1-flexcan' was expected
From schema: Documentation/devicetree/bindings/net/can/fsl,flexcan.yaml
Fixes: e5ab9aa7e4 ("dt-bindings: can: flexcan: convert fsl,*flexcan bindings to yaml")
Reported-by: Rob Herring <robh+dt@kernel.org>
Cc: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://lore.kernel.org/r/20201111130507.1560881-4-mkl@pengutronix.de
[robh: drop singular fsl,imx53-flexcan and fsl,imx35-flexcan]
Signed-off-by: Rob Herring <robh@kernel.org>
pm_runtime_get_sync() will increment pm usage at first and it
will resume the device later. We should decrease the usage count
whetever it succeeded or failed(maybe runtime of the device has
error, or device is in inaccessible state, or other error state).
If we do not call put operation to decrease the reference, it will
result in reference leak in xhci_histb_probe. Moreover, this
device cannot enter the idle state and always stay busy or other
non-idle state later. So we fixed it by jumping to error handling
branch.
Fixes: c508f41da0 ("xhci: hisilicon: support HiSilicon STB xHCI host controller")
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Link: https://lore.kernel.org/r/20201106122221.2304528-1-zhangqilong3@huawei.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 2d30e408a2.
On Beaglebone Black, where each interface has 2 children:
musb-dsps 47401c00.usb: can't request region for resource [mem 0x47401800-0x474019ff]
musb-hdrc musb-hdrc.1: musb_init_controller failed with status -16
musb-hdrc: probe of musb-hdrc.1 failed with error -16
musb-dsps 47401400.usb: can't request region for resource [mem 0x47401000-0x474011ff]
musb-hdrc musb-hdrc.0: musb_init_controller failed with status -16
musb-hdrc: probe of musb-hdrc.0 failed with error -16
Before, devm_ioremap_resource() was called on "dev" ("musb-hdrc.0" or
"musb-hdrc.1"), after it is called on "&pdev->dev" ("47401400.usb" or
"47401c00.usb"), leading to a duplicate region request, which fails.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Fixes: 2d30e408a2 ("usb: musb: convert to devm_platform_ioremap_resource_byname")
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20201112135900.3822599-1-geert+renesas@glider.be
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
AMD-TEE driver bug fixes
AMD-TEE driver keeps track of shared memory buffers and their
corresponding buffer id's in a global linked list. These buffers are
used to share data between x86 and AMD Secure Processor. This pull
request fixes issues related to maintaining mapped buffers in a shared
linked list.
* tag 'amdtee-fixes-for-5.10' of git://git.linaro.org:/people/jens.wiklander/linux-tee:
tee: amdtee: synchronize access to shm list
tee: amdtee: fix memory leak due to reset of global shm list
Link: https://lore.kernel.org/r/20201109080809.GA3862873@jade
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This reverts commit eaf2d2f689.
The commit eaf2d2f689 ("ARM: dts: exynos: add input clock to CMU in
Exynos4412 Odroid") breaks probing of usb3503 USB hub on Odroid U3.
It changes the order of clock drivers probe: the clkout (Exynos PMU)
driver is probed before the main clk-exynos4 driver. The clkout driver
on Exynos4412 depends on clk-exynos4 but it does not support deferred
probe, therefore this dependency and changed probe order causes probe
failure.
The usb3503 USB hub on Odroid U3 on the other hand requires clkout
clock. This can be seen in logs:
[ 5.007442] usb3503 0-0008: unable to request refclk (-517)
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20200921174818.15525-1-krzk@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Samsung changes for v5.10
Remove Kamil Debski, Kyungmin Park and Jeongtae Park from maintainers of
several drivers. There were no maintenance activities from them for
some time. While at touching credits, this also cleans up white spaces.
* tag 'samsung-fixes-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
CREDITS: remove trailing white spaces
MAINTAINERS: remove Jeongtae Park from Samsung MFC entry
MAINTAINERS: move Kyungmin Park to credits
MAINTAINERS: move Kamil Debski to credits
Link: https://lore.kernel.org/r/20201031120237.8542-1-krzk@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
i.MX fixes for 5.10, 3rd round:
- A series from Krzysztof Kozlowski to fix missing PMIC's interrupt
line pull-up for i.MX8MM and i.MX8MN boards.
- Set Bluetooth chip max-speed to 4000000 on imx8mm-beacon-som board
to fix the choppy Bluetooth audio sound.
- Remove non-existent OTG2, usbphynop2, and the usbmisc2 from i.MX8MN
device tree.
- Fix the endianness setting of RCPM node on Layerscape SoCs.
- Add the missing dma-coherent property for qoriq-fman device to improve
the performance.
- Fix the Ethernet PHY address on imx6q-prti6q board.
* tag 'imx-fixes-5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: dts: imx6q-prti6q: fix PHY address
arm64: dts imx8mn: Remove non-existent USB OTG2
arm64: dts: imx8mm-beacon-som: Fix Choppy BT audio
arm64: dts: fsl: DPAA FMan DMA operations are coherent
arm64: dts: fsl: fix endianness issue of rcpm
arm64: dts: imx8mn-evk: fix missing PMIC's interrupt line pull-up
arm64: dts: imx8mn-ddr4-evk: fix missing PMIC's interrupt line pull-up
arm64: dts: imx8mn-var-som: fix missing PMIC's interrupt line pull-up
arm64: dts: imx8mm-evk: fix missing PMIC's interrupt line pull-up
arm64: dts: imx8mm-beacon-som: fix missing PMIC's interrupt line pull-up
arm64: dts: imx8mm-var-som: fix missing PMIC's interrupt line pull-up
Link: https://lore.kernel.org/r/20201030151821.GA28266@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
For AMD SEV guests, update the cr3_lm_rsvd_bits to mask
the memory encryption bit in reserved bits.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <160521948301.32054.5783800787423231162.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
SEV guests fail to boot on a system that supports the PCID feature.
While emulating the RSM instruction, KVM reads the guest CR3
and calls kvm_set_cr3(). If the vCPU is in the long mode,
kvm_set_cr3() does a sanity check for the CR3 value. In this case,
it validates whether the value has any reserved bits set. The
reserved bit range is 63:cpuid_maxphysaddr(). When AMD memory
encryption is enabled, the memory encryption bit is set in the CR3
value. The memory encryption bit may fall within the KVM reserved
bit range, causing the KVM emulation failure.
Introduce a new field cr3_lm_rsvd_bits in kvm_vcpu_arch which will
cache the reserved bits in the CR3 value. This will be initialized
to rsvd_bits(cpuid_maxphyaddr(vcpu), 63).
If the architecture has any special bits(like AMD SEV encryption bit)
that needs to be masked from the reserved bits, should be cleared
in vendor specific kvm_x86_ops.vcpu_after_set_cpuid handler.
Fixes: a780a3ea62 ("KVM: X86: Fix reserved bits check for MOV to CR3")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <160521947657.32054.3264016688005356563.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The recent changes to store the MSI irqdomain pointer in struct device
missed that Intel DMAR does not register virtual function devices. Due to
that a VF device gets the plain PCI-MSI domain assigned and then issues
compat MSI messages which get caught by the interrupt remapping unit.
Cure that by inheriting the irq domain from the physical function
device.
Ideally the irqdomain would be associated to the bus, but DMAR can have
multiple units and therefore irqdomains on a single bus. The VF 'bus' could
of course inherit the domain from the PF, but that'd be yet another x86
oddity.
Fixes: 85a8dfc57a ("iommm/vt-d: Store irq domain in struct device")
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Link: https://lore.kernel.org/r/draft-87eekymlpz.fsf@nanos.tec.linutronix.de
When processing request to add/replace user-defined element set, check
of given element identifier and decision of numeric identifier is done
in "__snd_ctl_add_replace()" helper function. When the result of check
is wrong, the helper function returns error code. The error code shall
be returned to userspace application.
Current implementation includes bug to return zero to userspace application
regardless of the result. This commit fixes the bug.
Cc: <stable@vger.kernel.org>
Fixes: e1a7bfe380 ("ALSA: control: Fix race between adding and removing a user element")
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20201113092043.16148-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
There is a NULL pointer crash when DCN disabled on headless SKU.
On normal SKU, the variable adev->ddev.mode_config.funcs is
initialized in dm_hw_init(), and it is fine to access it in
amdgpu_device_resume(). But on headless SKU, DCN is disabled,
the funcs variable is not initialized, then crash arises.
Enable DCN to fix this issue.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The x25_disconnect function in x25_subr.c would decrease the refcount of
"x25->neighbour" (struct x25_neigh) and reset this pointer to NULL.
However, the x25_rx_call_request function in af_x25.c, which is called
when we receive a connection request, does not increase the refcount when
it assigns the pointer.
Fix this issue by increasing the refcount of "struct x25_neigh" in
x25_rx_call_request.
This patch fixes frequent kernel crashes when using AF_X25 sockets.
Fixes: 4becb7ee5b ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect")
Cc: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Link: https://lore.kernel.org/r/20201112103506.5875-1-xie.he.0141@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix to return a negative error code from the error handling case
instead of 0 in function sel_ib_pkey_sid_slow(), as done elsewhere
in this function.
Cc: stable@vger.kernel.org
Fixes: 409dcf3153 ("selinux: Add a cache for quicker retreival of PKey SIDs")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
If a user unbinds and re-binds a NC-SI aware driver the kernel will
attempt to register the netlink interface at runtime. The structure is
marked __ro_after_init so registration fails spectacularly at this point.
# echo 1e660000.ethernet > /sys/bus/platform/drivers/ftgmac100/unbind
# echo 1e660000.ethernet > /sys/bus/platform/drivers/ftgmac100/bind
ftgmac100 1e660000.ethernet: Read MAC address 52:54:00:12:34:56 from chip
ftgmac100 1e660000.ethernet: Using NCSI interface
8<--- cut here ---
Unable to handle kernel paging request at virtual address 80a8f858
pgd = 8c768dd6
[80a8f858] *pgd=80a0841e(bad)
Internal error: Oops: 80d [#1] SMP ARM
CPU: 0 PID: 116 Comm: sh Not tainted 5.10.0-rc3-next-20201111-00003-gdd25b227ec1e #51
Hardware name: Generic DT based system
PC is at genl_register_family+0x1f8/0x6d4
LR is at 0xff26ffff
pc : [<8073f930>] lr : [<ff26ffff>] psr: 20000153
sp : 8553bc80 ip : 81406244 fp : 8553bd04
r10: 8085d12c r9 : 80a8f73c r8 : 85739000
r7 : 00000017 r6 : 80a8f860 r5 : 80c8ab98 r4 : 80a8f858
r3 : 00000000 r2 : 00000000 r1 : 81406130 r0 : 00000017
Flags: nzCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment none
Control: 00c5387d Table: 85524008 DAC: 00000051
Process sh (pid: 116, stack limit = 0x1f1988d6)
...
Backtrace:
[<8073f738>] (genl_register_family) from [<80860ac0>] (ncsi_init_netlink+0x20/0x48)
r10:8085d12c r9:80c8fb0c r8:85739000 r7:00000000 r6:81218000 r5:85739000
r4:8121c000
[<80860aa0>] (ncsi_init_netlink) from [<8085d740>] (ncsi_register_dev+0x1b0/0x210)
r5:8121c400 r4:8121c000
[<8085d590>] (ncsi_register_dev) from [<805a8060>] (ftgmac100_probe+0x6e0/0x778)
r10:00000004 r9:80950228 r8:8115bc10 r7:8115ab00 r6:9eae2c24 r5:813b6f88
r4:85739000
[<805a7980>] (ftgmac100_probe) from [<805355ec>] (platform_drv_probe+0x58/0xa8)
r9:80c76bb0 r8:00000000 r7:80cd4974 r6:80c76bb0 r5:8115bc10 r4:00000000
[<80535594>] (platform_drv_probe) from [<80532d58>] (really_probe+0x204/0x514)
r7:80cd4974 r6:00000000 r5:80cd4868 r4:8115bc10
Jakub pointed out that ncsi_register_dev is obviously broken, because
there is only one family so it would never work if there was more than
one ncsi netdev.
Fix the crash by registering the netlink family once on boot, and drop
the code to unregister it.
Fixes: 955dc68cb9 ("net/ncsi: Add generic netlink family")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Link: https://lore.kernel.org/r/20201112061210.914621-1-joel@jms.id.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull fscrypt fix from Eric Biggers:
"Fix a regression where new files weren't using inline encryption when
they should be"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fscrypt: fix inline encryption not used on new files
Pull gfs2 fixes from Andreas Gruenbacher:
"Fix jdata data corruption and glock reference leak"
* tag 'gfs2-v5.10-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fix case in which ail writes are done to jdata holes
Revert "gfs2: Ignore journal log writes for jdata holes"
gfs2: fix possible reference leak in gfs2_check_blk_type
A test shows that the output contains a space:
# cat /proc/sgi_uv/archtype
NSGI4 U/UVX
Remove that embedded space by copying the "trimmed" buffer instead of the
untrimmed input character list. Use sizeof to remove size dependency on
copy out length. Increase output buffer size by one character just in case
BIOS sends an 8 character string for archtype.
Fixes: 1e61f5a95f ("Add and decode Arch Type in UVsystab")
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20201111010418.82133-1-mike.travis@hpe.com
Pull networking fixes from Jakub Kicinski:
"Current release - regressions:
- arm64: dts: fsl-ls1028a-kontron-sl28: specify in-band mode for
ENETC
Current release - bugs in new features:
- mptcp: provide rmem[0] limit offset to fix oops
Previous release - regressions:
- IPv6: Set SIT tunnel hard_header_len to zero to fix path MTU
calculations
- lan743x: correctly handle chips with internal PHY
- bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE
- mlx5e: Fix VXLAN port table synchronization after function reload
Previous release - always broken:
- bpf: Zero-fill re-used per-cpu map element
- fix out-of-order UDP packets when forwarding with UDP GSO fraglists
turned on:
- fix UDP header access on Fast/frag0 UDP GRO
- fix IP header access and skb lookup on Fast/frag0 UDP GRO
- ethtool: netlink: add missing netdev_features_change() call
- net: Update window_clamp if SOCK_RCVBUF is set
- igc: Fix returning wrong statistics
- ch_ktls: fix multiple leaks and corner cases in Chelsio TLS offload
- tunnels: Fix off-by-one in lower MTU bounds for ICMP/ICMPv6 replies
- r8169: disable hw csum for short packets on all chip versions
- vrf: Fix fast path output packet handling with async Netfilter
rules"
* tag 'net-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
lan743x: fix use of uninitialized variable
net: udp: fix IP header access and skb lookup on Fast/frag0 UDP GRO
net: udp: fix UDP header access on Fast/frag0 UDP GRO
devlink: Avoid overwriting port attributes of registered port
vrf: Fix fast path output packet handling with async Netfilter rules
cosa: Add missing kfree in error path of cosa_write
net: switch to the kernel.org patchwork instance
ch_ktls: stop the txq if reaches threshold
ch_ktls: tcb update fails sometimes
ch_ktls/cxgb4: handle partial tag alone SKBs
ch_ktls: don't free skb before sending FIN
ch_ktls: packet handling prior to start marker
ch_ktls: Correction in middle record handling
ch_ktls: missing handling of header alone
ch_ktls: Correction in trimmed_len calculation
cxgb4/ch_ktls: creating skbs causes panic
ch_ktls: Update cheksum information
ch_ktls: Correction in finding correct length
cxgb4/ch_ktls: decrypted bit is not enough
net/x25: Fix null-ptr-deref in x25_connect
...
Pull NFS client bugfixes from Anna Schumaker:
"Stable fixes:
- Fix failure to unregister shrinker
Other fixes:
- Fix unnecessary locking to clear up some contention
- Fix listxattr receive buffer size
- Fix default mount options for nfsroot"
* tag 'nfs-for-5.10-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Remove unnecessary inode lock in nfs_fsync_dir()
NFS: Remove unnecessary inode locking in nfs_llseek_dir()
NFS: Fix listxattr receive buffer size
NFSv4.2: fix failure to unregister shrinker
nfsroot: Default mount option should ask for built-in NFS version
As the kernel never sets HCR_EL2.EnSCXT, accesses to SCXTNUM_ELx
will trap to EL2. Let's handle that as gracefully as possible
by injecting an UNDEF exception into the guest. This is consistent
with the guest's view of ID_AA64PFR0_EL1.CSV2 being at most 1.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201110141308.451654-4-maz@kernel.org
A large number of system register trap handlers only inject an
UNDEF exeption, and yet each class of sysreg seems to provide its
own, identical function.
Let's unify them all, saving us introducing yet another one later.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201110141308.451654-3-maz@kernel.org
We now expose ID_AA64PFR0_EL1.CSV2=1 to guests running on hosts
that are immune to Spectre-v2, but that don't have this field set,
most likely because they predate the specification.
However, this prevents the migration of guests that have started on
a host the doesn't fake this CSV2 setting to one that does, as KVM
rejects the write to ID_AA64PFR0_EL2 on the grounds that it isn't
what is already there.
In order to fix this, allow userspace to set this field as long as
this doesn't result in a promising more than what is already there
(setting CSV2 to 0 is acceptable, but setting it to 1 when it is
already set to 0 isn't).
Fixes: e1026237f9 ("KVM: arm64: Set CSV2 for guests on hardware unaffected by Spectre-v2")
Reported-by: Peng Liang <liangpeng10@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201110141308.451654-2-maz@kernel.org
It has been observed that resetting force in the detect function can
result in the PHY being powered down in response to hot-plug detect
being asserted, even when the HDMI connector is forced on.
Enabling debug messages and adding a call to dump_stack() in
dw_hdmi_phy_power_off() shows the following in dmesg:
[ 160.637413] dwhdmi-rockchip ff940000.hdmi: EVENT=plugin
[ 160.637433] dwhdmi-rockchip ff940000.hdmi: PHY powered down in 0 iterations
Call trace:
dw_hdmi_phy_power_off
dw_hdmi_phy_disable
dw_hdmi_update_power
dw_hdmi_detect
dw_hdmi_connector_detect
drm_helper_probe_detect_ctx
drm_helper_hpd_irq_event
dw_hdmi_irq
irq_thread_fn
irq_thread
kthread
ret_from_fork
Fixes: 381f05a7a8 ("drm: bridge/dw_hdmi: add connector mode forcing")
Signed-off-by: Jonathan Liu <net147@gmail.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20201031081747.372599-1-net147@gmail.com
Commit 716ad0986c ("loop: Switch to set_capacity_revalidate_and_notify")
causes an occasional drop of loop device uevent, which are no longer
triggered in loop_set_size() but in a different part of code.
Bug is reproducible with LTP test uevent01 [1]:
i=0; while true; do
i=$((i+1)); echo "== $i =="
lsmod |grep -q loop && rmmod -f loop
./uevent01 || break
done
Put back triggering through code called in loop_set_size().
Fix required to add yet another parameter to
set_capacity_revalidate_and_notify().
[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/uevents/uevent01.c
[hch: rebased on a different change to the prototype of
set_capacity_revalidate_and_notify]
Cc: stable@vger.kernel.org # v5.9
Fixes: 716ad0986c ("loop: Switch to set_capacity_revalidate_and_notify")
Reported-by: <ltp@lists.linux.it>
Signed-off-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Backchannel rpc tasks don't have task->tk_client set, so it's necessary
to check it for NULL before dereferencing.
Fixes: c509f15a58 ("SUNRPC: Split the xdr_buf event class")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
To bring in the change made in this cset:
4d6ffa27b8 ("x86/lib: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S")
6dcc5627f6 ("x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_*")
I needed to define SYM_FUNC_START_LOCAL() as SYM_L_GLOBAL as
mem{cpy,set}_{orig,erms} are used by 'perf bench'.
This silences these perf tools build warnings:
Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S'
diff -u tools/arch/x86/lib/memcpy_64.S arch/x86/lib/memcpy_64.S
Warning: Kernel ABI header at 'tools/arch/x86/lib/memset_64.S' differs from latest version at 'arch/x86/lib/memset_64.S'
diff -u tools/arch/x86/lib/memset_64.S arch/x86/lib/memset_64.S
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Fangrui Song <maskray@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When execute command "perf lock report", it hits failure and outputs log
as follows:
perf: builtin-lock.c:623: report_lock_release_event: Assertion `!(seq->read_count < 0)' failed.
Aborted
This is an imbalance issue. The locking sequence structure
"lock_seq_stat" contains the reader counter and it is used to check if
the locking sequence is balance or not between acquiring and releasing.
If the tool wrongly frees "lock_seq_stat" when "read_count" isn't zero,
the "read_count" will be reset to zero when allocate a new structure at
the next time; thus it causes the wrong counting for reader and finally
results in imbalance issue.
To fix this issue, if detects "read_count" is not zero (means still have
read user in the locking sequence), goto the "end" tag to skip freeing
structure "lock_seq_stat".
Fixes: e4cef1f650 ("perf lock: Fix state machine to recognize lock sequence")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201104094229.17509-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The tracepoint "lock:lock_acquire" contains field "flags" but not
"flag". Current code wrongly retrieves value from field "flag" and it
always gets zero for the value, thus "perf lock" doesn't report the
correct result.
This patch replaces the field name "flag" with "flags", so can read out
the correct flags for locking.
Fixes: e4cef1f650 ("perf lock: Fix state machine to recognize lock sequence")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201104094229.17509-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Here's my proposal to fix the use-after-free bugs reported by
Sascha Hauer and Florian Fainelli:
I scrutinized all SPI drivers in the v5.10 tree:
* There are 9 drivers with a use-after-free in the ->remove() hook
caused by accessing driver private data after spi_unregister_controller().
* There are 8 drivers which leak the spi_controller in the ->probe()
error path because of a missing spi_controller_put().
I'm introducing devm_spi_alloc_master/slave() which automatically
calls spi_controller_put() on ->remove(). This fixes both classes
of bugs while at the same time reducing code amount and complexity
in the ->probe() hook.
I propose that spi_controller_unregister() should no longer release
a reference on the spi_controller. Instead, drivers need to either
do it themselves or use one of the devm functions introduced herein.
The vast majority of drivers can be converted to the devm functions.
See the commit message of patch [1/4] for the rationale and details.
Enclosed are patches for 3 Broadcom drivers.
Patches for the other drivers are on this branch:
https://github.com/l1k/linux/commits/spi_fixes
@Florian Fainelli: Could you verify that there are no KASAN splats or
leaks with these patches? Unfortunately I do not have any SPI-capable
hardware at my disposal right now, so can only compile-test. You may
want to augment spi_controller_release() with a printk() to log when
the spi_controller is freed.
@Mark Brown: Patches [2/4] to [4/4] reference the SHA-1 of patch [1/4]
in their stable tags. Because the hash is unknown to me until you apply
the patch, I've used "123456789abc" as a placeholder. You'll have to
replace the hash if/when applying. Alternatively, only apply patch [1/4]
and I'll repost the other patches with the hash fixed up.
Thanks!
Lukas Wunner (4):
spi: Introduce device-managed SPI controller allocation
spi: bcm2835: Fix use-after-free on unbind
spi: bcm2835aux: Fix use-after-free on unbind
spi: bcm-qspi: Fix use-after-free on unbind
drivers/spi/spi-bcm-qspi.c | 34 ++++++++-------------
drivers/spi/spi-bcm2835.c | 24 +++++----------
drivers/spi/spi-bcm2835aux.c | 21 +++++--------
drivers/spi/spi.c | 58 +++++++++++++++++++++++++++++++++++-
include/linux/spi/spi.h | 19 ++++++++++++
5 files changed, 103 insertions(+), 53 deletions(-)
--
2.28.0
Pull ACPI fixes from Rafael Wysocki:
"These are mostly docmentation fixes and janitorial changes plus some
new device IDs and a new quirk.
Specifics:
- Fix documentation regarding GPIO properties (Andy Shevchenko)
- Fix spelling mistakes in ACPI documentation (Flavio Suligoi)
- Fix white space inconsistencies in ACPI code (Maximilian Luz)
- Fix string formatting in the ACPI Generic Event Device (GED) driver
(Nick Desaulniers)
- Add Intel Alder Lake device IDs to the ACPI drivers used by the
Dynamic Platform and Thermal Framework (Srinivas Pandruvada)
- Add lid-related DMI quirk for Medion Akoya E2228T to the ACPI
button driver (Hans de Goede)"
* tag 'acpi-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: DPTF: Support Alder Lake
Documentation: ACPI: fix spelling mistakes
ACPI: button: Add DMI quirk for Medion Akoya E2228T
ACPI: GED: fix -Wformat
ACPI: Fix whitespace inconsistencies
ACPI: scan: Fix acpi_dma_configure_id() kerneldoc name
Documentation: firmware-guide: gpio-properties: Clarify initial output state
Documentation: firmware-guide: gpio-properties: active_low only for GpioIo()
Documentation: firmware-guide: gpio-properties: Fix factual mistakes
Pull power management fixes from Rafael Wysocki:
"Make the intel_pstate driver behave as expected when it operates in
the passive mode with HWP enabled and the 'powersave' governor on top
of it"
* tag 'pm-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Take CPUFREQ_GOV_STRICT_TARGET into account
cpufreq: Add strict_target to struct cpufreq_policy
cpufreq: Introduce CPUFREQ_GOV_STRICT_TARGET
cpufreq: Introduce governor flags
Normally the last reference on an spi_controller is released by
spi_unregister_controller(). In the case of the i.MX lpspi driver,
the spi_controller is registered with devm_spi_register_controller(),
so spi_unregister_controller() is invoked automatically after the driver
has unbound.
However the driver already releases the last reference in
fsl_lpspi_remove() through a gratuitous call to spi_master_put(),
causing a use-after-free when spi_unregister_controller() is
subsequently invoked by the devres framework.
Fix by dropping the superfluous spi_master_put().
Fixes: 944c01a889 ("spi: lpspi: enable runtime pm for lpspi")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: <stable@vger.kernel.org> # v5.2+
Cc: Han Xu <han.xu@nxp.com>
Link: https://lore.kernel.org/r/ab3c0b18bd820501a12c85e440006e09ec0e275f.1604874488.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Alexander Lobakin says:
====================
net: udp: fix Fast/frag0 UDP GRO
While testing UDP GSO fraglists forwarding through driver that uses
Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order
iperf packets:
[ ID] Interval Transfer Bitrate Jitter
[SUM] 0.0-40.0 sec 12106 datagrams received out-of-order
Simple switch to napi_gro_receive() or any other method without frag0
shortcut completely resolved them.
I've found two incorrect header accesses in GRO receive callback(s):
- udp_hdr() (instead of udp_gro_udphdr()) that always points to junk
in "fast" mode and could probably do this in "regular".
This was the actual bug that caused all out-of-order delivers;
- udp{4,6}_lib_lookup_skb() -> ip{,v6}_hdr() (instead of
skb_gro_network_header()) that potentionally might return odd
pointers in both modes.
Each patch addresses one of these two issues.
This doesn't cover a support for nested tunnels as it's out of the
subject and requires more invasive changes. It will be handled
separately in net-next series.
Credits:
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Since v4 [0]:
- split the fix into two logical ones (Willem);
- replace ternaries with plain ifs to beautify the code (Jakub);
- drop p->data part to reintroduce it later in abovementioned set.
Since v3 [1]:
- restore the original {,__}udp{4,6}_lib_lookup_skb() and use
private versions of them inside GRO code (Willem).
Since v2 [2]:
- dropped redundant check introduced in v2 as it's performed right
before (thanks to Eric);
- udp_hdr() switched to data + off for skbs from list (also Eric);
- fixed possible malfunction of {,__}udp{4,6}_lib_lookup_skb() with
Fast/frag0 due to ip{,v6}_hdr() usage (Willem).
Since v1 [3]:
- added a NULL pointer check for "uh" as suggested by Willem.
[0] https://lore.kernel.org/netdev/Ha2hou5eJPcblo4abjAqxZRzIl1RaLs2Hy0oOAgFs@cp4-web-036.plabs.ch
[1] https://lore.kernel.org/netdev/MgZce9htmEtCtHg7pmWxXXfdhmQ6AHrnltXC41zOoo@cp7-web-042.plabs.ch
[2] https://lore.kernel.org/netdev/0eaG8xtbtKY1dEKCTKUBubGiC9QawGgB3tVZtNqVdY@cp4-web-030.plabs.ch
[3] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch
====================
Link: https://lore.kernel.org/r/hjGOh0iCOYyo1FPiZh6TMXcx3YCgNs1T1eGKLrDz8@cp4-web-037.plabs.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
udp{4,6}_lib_lookup_skb() use ip{,v6}_hdr() to get IP header of the
packet. While it's probably OK for non-frag0 paths, this helpers
will also point to junk on Fast/frag0 GRO when all headers are
located in frags. As a result, sk/skb lookup may fail or give wrong
results. To support both GRO modes, skb_gro_network_header() might
be used. To not modify original functions, add private versions of
udp{4,6}_lib_lookup_skb() only to perform correct sk lookups on GRO.
Present since the introduction of "application-level" UDP GRO
in 4.7-rc1.
Misc: replace totally unneeded ternaries with plain ifs.
Fixes: a6024562ff ("udp: Add GRO functions to UDP socket")
Suggested-by: Willem de Bruijn <willemb@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
UDP GRO uses udp_hdr(skb) in its .gro_receive() callback. While it's
probably OK for non-frag0 paths (when all headers or even the entire
frame are already in skb head), this inline points to junk when
using Fast GRO (napi_gro_frags() or napi_gro_receive() with only
Ethernet header in skb head and all the rest in the frags) and breaks
GRO packet compilation and the packet flow itself.
To support both modes, skb_gro_header_fast() + skb_gro_header_slow()
are typically used. UDP even has an inline helper that makes use of
them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr()
to get rid of the out-of-order delivers.
Present since the introduction of plain UDP GRO in 5.0-rc1.
Fixes: e20cf8d3f1 ("udp: implement GRO for plain UDP sockets.")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Patch b2a846dbef ("gfs2: Ignore journal log writes for jdata holes")
tried (unsuccessfully) to fix a case in which writes were done to jdata
blocks, the blocks are sent to the ail list, then a punch_hole or truncate
operation caused the blocks to be freed. In other words, the ail items
are for jdata holes. Before b2a846dbef, the jdata hole caused function
gfs2_block_map to return -EIO, which was eventually interpreted as an
IO error to the journal, and then withdraw.
This patch changes function gfs2_get_block_noalloc, which is only used
for jdata writes, so it returns -ENODATA rather than -EIO, and when
-ENODATA is returned to gfs2_ail1_start_one, the error is ignored.
We can safely ignore it because gfs2_ail1_start_one is only called
when the jdata pages have already been written and truncated, so the
ail1 content no longer applies.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
This reverts commit b2a846dbef.
That commit changed the behavior of function gfs2_block_map to return
-ENODATA in cases where a hole (IOMAP_HOLE) is encountered and create is
false. While that fixed the intended problem for jdata, it also broke
other callers of gfs2_block_map such as some jdata block reads. Before
the patch, an encountered hole would be skipped and the buffer seen as
unmapped by the caller. The patch changed the behavior to return
-ENODATA, which is interpreted as an error by the caller.
The -ENODATA return code should be restricted to the specific case where
jdata holes are encountered during ail1 writes. That will be done in a
later patch.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
dma_virt_ops requires that all pages have a kernel virtual address.
Introduce a INFINIBAND_VIRT_DMA Kconfig symbol that depends on !HIGHMEM
and make all three drivers depend on the new symbol.
Also remove the ARCH_DMA_ADDR_T_64BIT dependency, which has been obsolete
since commit 4965a68780 ("arch: define the ARCH_DMA_ADDR_T_64BIT config
symbol in lib/Kconfig")
Fixes: 551199aca1 ("lib/dma-virt: Add dma_virt_ops")
Link: https://lore.kernel.org/r/20201106181941.1878556-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2020-11-10
This series contains updates to i40e and igc drivers and the MAINTAINERS
file.
Slawomir fixes updating VF MAC addresses to fix various issues related
to reporting and setting of these addresses for i40e.
Dan Carpenter fixes a possible used before being initialized issue for
i40e.
Vinicius fixes reporting of netdev stats for igc.
Tony updates repositories for Intel Ethernet Drivers.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
MAINTAINERS: Update repositories for Intel Ethernet Drivers
igc: Fix returning wrong statistics
i40e, xsk: uninitialized variable in i40e_clean_rx_irq_zc()
i40e: Fix MAC address setting for a VF via Host/VM
====================
Link: https://lore.kernel.org/r/20201111001955.533210-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The xarray is never mutated from an IRQ handler, only from work queues
under a spinlock_irq. Thus there is no reason for it be an IRQ type
xarray.
This was copied over from the original IDR code, but the recent rework put
the xarray inside another spinlock_irq which will unbalance the unlocking.
Fixes: c206f8bad1 ("RDMA/cm: Make it clearer how concurrency works in cm_req_handler()")
Link: https://lore.kernel.org/r/0-v1-808b6da3bd3f+1857-cm_xarray_no_irq_jgg@nvidia.com
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
VRF devices use an optimized direct path on output if a default qdisc
is involved, calling Netfilter hooks directly. This path, however, does
not consider Netfilter rules completing asynchronously, such as with
NFQUEUE. The Netfilter okfn() is called for asynchronously accepted
packets, but the VRF never passes that packet down the stack to send
it out over the slave device. Using the slower redirect path for this
seems not feasible, as we do not know beforehand if a Netfilter hook
has asynchronously completing rules.
Fix the use of asynchronously completing Netfilter rules in OUTPUT and
POSTROUTING by using a special completion function that additionally
calls dst_output() to pass the packet down the stack. Also, slightly
adjust the use of nf_reset_ct() so that is called in the asynchronous
case, too.
Fixes: dcdd43c41e ("net: vrf: performance improvements for IPv4")
Fixes: a9ec54d1b0 ("net: vrf: performance improvements for IPv6")
Signed-off-by: Martin Willi <martin@strongswan.org>
Link: https://lore.kernel.org/r/20201106073030.3974927-1-martin@strongswan.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove the contentious inode lock, and instead provide thread safety
using the file->f_lock spinlock.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Certain NFSv4.2/RDMA tests fail with v5.9-rc1.
rpcrdma_convert_kvec() runs off the end of the rl_segments array
because rq_rcv_buf.tail[0].iov_len holds a very large positive
value. The resultant kernel memory corruption is enough to crash
the client system.
Callers of rpc_prepare_reply_pages() must reserve an extra XDR_UNIT
in the maximum decode size for a possible XDR pad of the contents
of the xdr_buf's pages. That guarantees the allocated receive buffer
will be large enough to accommodate the usual contents plus that XDR
pad word.
encode_op_hdr() cannot add that extra word. If it does,
xdr_inline_pages() underruns the length of the tail iovec.
Fixes: 3e1f02123f ("NFSv4.2: add client side XDR handling for extended attributes")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
We forgot to unregister the nfs4_xattr_large_entry_shrinker.
That leaves the global list of shrinkers corrupted after unload of the
nfs module, after which possibly unrelated code that calls
register_shrinker() or unregister_shrinker() gets a BUG() with
"supervisor write access in kernel mode".
And similarly for the nfs4_xattr_large_entry_lru.
Reported-by: Kris Karas <bugs-a17@moonlit-rail.com>
Tested-By: Kris Karas <bugs-a17@moonlit-rail.com>
Fixes: 95ad37f90c "NFSv4.2: add client side xattr caching."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
CC: stable@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
bcm_qspi_remove() calls spi_unregister_master() even though
bcm_qspi_probe() calls devm_spi_register_master(). The spi_master is
therefore unregistered and freed twice on unbind.
Moreover, since commit 0392727c26 ("spi: bcm-qspi: Handle clock probe
deferral"), bcm_qspi_probe() leaks the spi_master allocation if the call
to devm_clk_get_optional() fails.
Fix by switching over to the new devm_spi_alloc_master() helper which
keeps the private data accessible until the driver has unbound and also
avoids the spi_master leak on probe.
While at it, fix an ordering issue in bcm_qspi_remove() wherein
spi_unregister_master() is called after uninitializing the hardware,
disabling the clock and freeing an IRQ data structure. The correct
order is to call spi_unregister_master() *before* those teardown steps
because bus accesses may still be ongoing until that function returns.
Fixes: fa236a7ef2 ("spi: bcm-qspi: Add Broadcom MSPI driver")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: <stable@vger.kernel.org> # v4.9+: 123456789abc: spi: Introduce device-managed SPI controller allocation
Cc: <stable@vger.kernel.org> # v4.9+
Cc: Kamal Dasu <kdasu.kdev@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/5e31a9a59fd1c0d0b795b2fe219f25e5ee855f9d.1605121038.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
SPI driver probing currently comprises two steps, whereas removal
comprises only one step:
spi_alloc_master()
spi_register_controller()
spi_unregister_controller()
That's because spi_unregister_controller() calls device_unregister()
instead of device_del(), thereby releasing the reference on the
spi_controller which was obtained by spi_alloc_master().
An SPI driver's private data is contained in the same memory allocation
as the spi_controller struct. Thus, once spi_unregister_controller()
has been called, the private data is inaccessible. But some drivers
need to access it after spi_unregister_controller() to perform further
teardown steps.
Introduce devm_spi_alloc_master() and devm_spi_alloc_slave(), which
release a reference on the spi_controller struct only after the driver
has unbound, thereby keeping the memory allocation accessible. Change
spi_unregister_controller() to not release a reference if the
spi_controller was allocated by one of these new devm functions.
The present commit is small enough to be backportable to stable.
It allows fixing drivers which use the private data in their ->remove()
hook after it's been freed. It also allows fixing drivers which neglect
to release a reference on the spi_controller in the probe error path.
Long-term, most SPI drivers shall be moved over to the devm functions
introduced herein. The few that can't shall be changed in a treewide
commit to explicitly release the last reference on the controller.
That commit shall amend spi_unregister_controller() to no longer release
a reference, thereby completing the migration.
As a result, the behaviour will be less surprising and more consistent
with subsystems such as IIO, which also includes the private data in the
allocation of the generic iio_dev struct, but calls device_del() in
iio_device_unregister().
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: https://lore.kernel.org/r/272bae2ef08abd21388c98e23729886663d19192.1605121038.git.lukas@wunner.de
Signed-off-by: Mark Brown <broonie@kernel.org>
The battery status is also being reported by the logitech-hidpp driver,
so ignore the standard HID battery status to avoid reporting the same
info twice.
Note the logitech-hidpp battery driver provides more info, such as properly
differentiating between charging and discharging. Also the standard HID
battery info seems to be wrong, reporting a capacity of just 26% after
fully charging the device.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Like the MX5000 and MX5500 quad/bluetooth keyboards the Dinovo Edge also
needs the HIDPP_CONSUMER_VENDOR_KEYS quirk for some special keys to work.
Specifically without this the "Phone" and the 'A' - 'D' Smart Keys do not
send any events.
In addition to fixing these keys not sending any events, adding the
Bluetooth match, so that hid-logitech-hidpp is used instead of the
generic HID driver, also adds battery monitoring support when the
keyboard is connected over Bluetooth.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Commit fff2d0f701 ("hwmon: (applesmc) avoid overlong udelay()")
introduced an issue whereby communication with the SMC became
unreliable with write errors like :
[ 120.378614] applesmc: send_byte(0x00, 0x0300) fail: 0x40
[ 120.378621] applesmc: LKSB: write data fail
[ 120.512782] applesmc: send_byte(0x00, 0x0300) fail: 0x40
[ 120.512787] applesmc: LKSB: write data fail
The original code appeared to be timing sensitive and was not reliable
with the timing changes in the aforementioned commit.
This patch re-factors the SMC communication to remove the timing
dependencies and restore function with the changes previously
committed.
Tested on : MacbookAir6,2 MacBookPro11,1 iMac12,2, MacBookAir1,1,
MacBookAir3,1
Fixes: fff2d0f701 ("hwmon: (applesmc) avoid overlong udelay()")
Reported-by: Andreas Kemnade <andreas@kemnade.info>
Tested-by: Andreas Kemnade <andreas@kemnade.info> # MacBookAir6,2
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Brad Campbell <brad@fnarfbargle.com>
Signed-off-by: Henrik Rydberg <rydberg@bitmath.org>
Link: https://lore.kernel.org/r/194a7d71-a781-765a-d177-c962ef296b90@fnarfbargle.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
To convert the number of pulses counted into an RPM estimation, we need
to divide by the width of our measurement interval instead of
multiplying by it. If the width of the measurement interval is zero we
don't update the RPM value to avoid dividing by zero.
We also don't need to do 64-bit division, with 32-bits we can handle a
fan running at over 4 million RPM.
Signed-off-by: Paul Barker <pbarker@konsulko.com>
Link: https://lore.kernel.org/r/20201111164643.7087-1-pbarker@konsulko.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Some quad/bluetooth keyboards, such as the Dinovo Edge (Y-RAY81) have a
builtin touchpad. In this case when asking the receiver for paired devices,
we get only 1 paired device with a device_type of REPORT_TYPE_KEYBOARD.
This means that we do not instantiate a second dj_hiddev for the mouse
(as we normally would) and thus there is no place for us to forward the
mouse input reports to, causing the touchpad part of the keyboard to not
work.
There is no way for us to detect these keyboards, so this commit adds
an array with device-ids for such keyboards and when a keyboard is on
this list it adds STD_MOUSE to the reports_supported bitmap for the
dj_hiddev created for the keyboard fixing the touchpad not working.
Using a list of device-ids for this is not ideal, but there are only
very few such keyboards so this should be fine. Besides the Dinovo Edge,
other known wireless Logitech keyboards with a builtin touchpad are:
* Dinovo Mini (TODO add its device-id to the list)
* K400 (uses a unifying receiver so is not affected)
* K600 (uses a unifying receiver so is not affected)
Cc: stable@vger.kernel.org
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1811424
Fixes: f2113c3020 ("HID: logitech-dj: add support for Logitech Bluetooth Mini-Receiver")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
In the fail path of gfs2_check_blk_type, forgetting to call
gfs2_glock_dq_uninit will result in rgd_gh reference leak.
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
It has been observed that on OMAP4430 (ES2.0, ES2.1 and ES2.3) the enabled
notifier causes errors on the DTEMP readout values:
ti-soc-thermal 4a002260.bandgap: in range ADC val: 52
ti-soc-thermal 4a002260.bandgap: in range ADC val: 64
ti-soc-thermal 4a002260.bandgap: in range ADC val: 64
ti-soc-thermal 4a002260.bandgap: out of range ADC val: 0
thermal thermal_zone0: failed to read out thermal zone (-5)
ti-soc-thermal 4a002260.bandgap: out of range ADC val: 0
thermal thermal_zone0: failed to read out thermal zone (-5)
ti-soc-thermal 4a002260.bandgap: out of range ADC val: 4
thermal thermal_zone0: failed to read out thermal zone (-5)
ti-soc-thermal 4a002260.bandgap: in range ADC val: 100
raw 100 translates to 133 Celsius on omap4-sdp, triggering shutdown due to
critical temperature.
When the notifier is disable for OMAP4430 the DTEMP values are stable:
ti-soc-thermal 4a002260.bandgap: in range ADC val: 56
ti-soc-thermal 4a002260.bandgap: in range ADC val: 56
ti-soc-thermal 4a002260.bandgap: in range ADC val: 57
ti-soc-thermal 4a002260.bandgap: in range ADC val: 57
ti-soc-thermal 4a002260.bandgap: in range ADC val: 56
Fixes: 5093402e5b ("thermal: ti-soc-thermal: Enable addition power management")
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Acked-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201029100335.27665-1-peter.ujfalusi@ti.com
This file is installed by the s390 CPU Measurement sampling
facility device driver to export supported minimum and
maximum sample buffer sizes.
This file is read by lscpumf tool to display the details
of the device driver capabilities. The lscpumf tool might
be invoked by a non-root user. In this case it does not
print anything because the file contents can not be read.
Fix this by allowing read access for all users. Reading
the file contents is ok, changing the file contents is
left to the root user only.
For further reference and details see:
[1] https://github.com/ibm-s390-tools/s390-tools/issues/97
Fixes: 69f239ed33 ("s390/cpum_sf: Dynamically extend the sampling buffer if overflows occur")
Cc: <stable@vger.kernel.org> # 3.14
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Some drivers fill the status rate list without setting the rate index after
the final rate to -1. minstrel_ht already deals with this, but minstrel
doesn't, which causes it to get stuck at the lowest rate on these drivers.
Fix this by checking the count as well.
Cc: stable@vger.kernel.org
Fixes: cccf129f82 ("mac80211: add the 'minstrel' rate control algorithm")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20201111183359.43528-3-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Deferring sampling attempts to the second stage has some bad interactions
with drivers that process the rate table in hardware and use the probe flag
to indicate probing packets (e.g. most mt76 drivers). On affected drivers
it can lead to probing not working at all.
If the link conditions turn worse, it might not be such a good idea to
do a lot of sampling for lower rates in this case.
Fix this by simply skipping the sample attempt instead of deferring it,
but keep the checks that would allow it to be sampled if it was skipped
too often, but only if it has less than 95% success probability.
Also ensure that IEEE80211_TX_CTL_RATE_CTRL_PROBE is set for all probing
packets.
Cc: stable@vger.kernel.org
Fixes: cccf129f82 ("mac80211: add the 'minstrel' rate control algorithm")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20201111183359.43528-2-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Speakup has only one speakup_tty variable to store the tty it is managing. This
makes sense since its codebase currently assumes that there is only one user who
controls the screen reading.
That however means that we have to forbid using the line discipline several
times, otherwise the second closure would try to free a NULL ldisc_data, leading to
general protection fault: 0000 [#1] SMP KASAN PTI
RIP: 0010:spk_ttyio_ldisc_close+0x2c/0x60
Call Trace:
tty_ldisc_release+0xa2/0x340
tty_release_struct+0x17/0xd0
tty_release+0x9d9/0xcc0
__fput+0x231/0x740
task_work_run+0x12c/0x1a0
do_exit+0x9b5/0x2230
? release_task+0x1240/0x1240
? __do_page_fault+0x562/0xa30
do_group_exit+0xd5/0x2a0
__x64_sys_exit_group+0x35/0x40
do_syscall_64+0x89/0x2b0
? page_fault+0x8/0x30
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Cc: stable@vger.kernel.org
Reported-by: 秦世松 <qinshisong1205@gmail.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Tested-by: Shisong Qin <qinshisong1205@gmail.com>
Link: https://lore.kernel.org/r/20201110183541.fzgnlwhjpgqzjeth@function
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Enabling the lock dependency validator has revealed
that the way spinlocks are used in the IMX serial
port could result in a deadlock.
Specifically, imx_uart_int() acquires a spinlock
without disabling the interrupts, meaning that another
interrupt could come along and try to acquire the same
spinlock, potentially causing the two to wait for each
other indefinitely.
Use spin_lock_irqsave() instead to disable interrupts
upon acquisition of the spinlock.
Fixes: c974991d26 ("tty:serial:imx: use spin_lock instead of spin_lock_irqsave in isr")
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sam Nobs <samuel.nobs@taitradio.com>
Link: https://lore.kernel.org/r/1604955006-9363-1-git-send-email-samuel.nobs@taitradio.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a device is getting removed or reprobed during resume, use-after-free
might happen. For example, h5_btrtl_resume() schedules a work queue for
device reprobing, which of course requires removal first.
If the removal happens in parallel with the device_resume() and wins the
race to acquire device_lock(), removal may remove the device from the PM
lists and all, but device_resume() is already running and will continue
when the lock can be acquired, thus calling rfkill_resume().
During this, if rfkill_set_block() is then called after the corresponding
*_unregister() and kfree() are called, there will be an use-after-free
in hci_rfkill_set_block():
BUG: KASAN: use-after-free in hci_rfkill_set_block+0x58/0xc0 [bluetooth]
...
Call trace:
dump_backtrace+0x0/0x154
show_stack+0x20/0x2c
dump_stack+0xbc/0x12c
print_address_description+0x88/0x4b0
__kasan_report+0x144/0x168
kasan_report+0x10/0x18
check_memory_region+0x19c/0x1ac
__kasan_check_write+0x18/0x24
hci_rfkill_set_block+0x58/0xc0 [bluetooth]
rfkill_set_block+0x9c/0x120
rfkill_resume+0x34/0x70
dpm_run_callback+0xf0/0x1f4
device_resume+0x210/0x22c
Fix this by checking rfkill->registered in rfkill_resume(). device_del()
in rfkill_unregister() requires device_lock() and the whole rfkill_resume()
is also protected by the same lock via device_resume(), we can make sure
either the rfkill->registered is false before rfkill_resume() starts or the
rfkill device won't be unregistered before rfkill_resume() returns.
As async_resume() holds a reference to the device, at this level there can
be no use-after-free; only in the user that doesn't expect this scenario.
Fixes: 8589086f4e ("Bluetooth: hci_h5: Turn off RTL8723BS on suspend, reprobe on resume")
Signed-off-by: Claire Chang <tientzu@chromium.org>
Link: https://lore.kernel.org/r/20201110084908.219088-1-tientzu@chromium.org
[edit commit message for clarity and add more info provided later]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Specification says the bit7 of the DPCD MAX_LANE_COUNT (offset 0x02) must
be set to 1 when comes to the displayport version 1.2. This patch respects
the definition.
W/o this patch, guest i915 driver can only set the resolution to 1024*768,
and complains about the unsuccessful link training:
[ 5.692193] i915 0000:00:02.0: [drm] *ERROR* index 0, lane_count 1 Link Training Unsuccessful
Fixes: e2e02cbb5b ("drm/i915/gvt: make dpcd_fix_data supports DP1.2")
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200921065807.247847-1-tina.zhang@intel.com
The new helper function fscrypt_prepare_new_inode() runs before
S_ENCRYPTED has been set on the new inode. This accidentally made
fscrypt_select_encryption_impl() never enable inline encryption on newly
created files, due to its use of fscrypt_needs_contents_encryption()
which only returns true when S_ENCRYPTED is set.
Fix this by using S_ISREG() directly instead of
fscrypt_needs_contents_encryption(), analogous to what
select_encryption_mode() does.
I didn't notice this earlier because by design, the user-visible
behavior is the same (other than performance, potentially) regardless of
whether inline encryption is used or not.
Fixes: a992b20cd4 ("fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context()")
Reviewed-by: Satya Tangirala <satyat@google.com>
Link: https://lore.kernel.org/r/20201111015224.303073-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
When TOUCHSCREEN_ADC is enabled and IIO_BUFFER is disabled, it results
in the following Kbuild warning:
WARNING: unmet direct dependencies detected for IIO_BUFFER_CB
Depends on [n]: IIO [=y] && IIO_BUFFER [=n]
Selected by [y]:
- TOUCHSCREEN_ADC [=y] && !UML && INPUT [=y] && INPUT_TOUCHSCREEN [=y] && IIO [=y]
The reason is that TOUCHSCREEN_ADC selects IIO_BUFFER_CB without depending
on or selecting IIO_BUFFER while IIO_BUFFER_CB depends on IIO_BUFFER. This
can also fail building the kernel.
Honor the kconfig dependency to remove unmet direct dependency warnings
and avoid any potential build failures.
Fixes: aa132ffb6b ("input: touchscreen: resistive-adc-touch: add generic resistive ADC touchscreen")
Signed-off-by: Necip Fazil Yildiran <fazilyildiran@gmail.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20201102221504.541279-1-fazilyildiran@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Rohit Maheshwari says:
====================
cxgb4/ch_ktls: Fixes in nic tls code
This series helps in fixing multiple nic ktls issues. Series is broken
into 12 patches.
Patch 1 avoids deciding tls packet based on decrypted bit. If its a
retransmit packet which has tls handshake and finish (for encryption),
decrypted bit won't be set there, and so we can't rely on decrypted
bit.
Patch 2 helps supporting linear skb. SKBs were assumed non-linear.
Corrected the length extraction.
Patch 3 fixes the checksum offload update in WR.
Patch 4 fixes kernel panic happening due to creating new skb for each
record. As part of fix driver will use same skb to send out one tls
record (partial data) of the same SKB.
Patch 5 fixes the problem of skb data length smaller than remaining data
of the record.
Patch 6 fixes the handling of SKBs which has tls header alone pkt, but
not starting from beginning.
Patch 7 avoids sending extra data which is used to make a record 16 byte
aligned. We don't need to retransmit those extra few bytes.
Patch 8 handles the cases where retransmit packet has tls starting
exchanges which are prior to tls start marker.
Patch 9 fixes the problem os skb free before HW knows about tcp FIN.
Patch 10 handles the small packet case which has partial TAG bytes only.
HW can't handle those, hence using sw crypto for such pkts.
Patch 11 corrects the potential tcb update problem.
Patch 12 stops the queue if queue reaches threshold value.
v1->v2:
- Corrected fixes tag issue.
- Marked chcr_ktls_sw_fallback() static.
v2->v3:
- Replaced GFP_KERNEL with GFP_ATOMIC.
- Removed mixed fixes.
v3->v4:
- Corrected fixes tag issue.
v4->v5:
- Separated mixed fixes from patch 4.
v5-v6:
- Fixes tag should be at the end.
====================
Link: https://lore.kernel.org/r/20201109105142.15398-1-rohitm@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stop the queue and ask for the credits if queue reaches to
threashold.
Fixes: 5a4b9fe7fe ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
context id and port id should be filled while sending tcb update.
Fixes: 5a4b9fe7fe ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If TCP congestion caused a very small packets which only has some
part fo the TAG, and that too is not till the end. HW can't handle
such case, so falling back to sw crypto in such cases.
v1->v2:
- Marked chcr_ktls_sw_fallback() static.
Fixes: dc05f3df8f ("chcr: Handle first or middle part of record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If its a last packet and fin is set. Make sure FIN is informed
to HW before skb gets freed.
Fixes: 429765a149 ("chcr: handle partial end part of a record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There could be a case where ACK for tls exchanges prior to start
marker is missed out, and by the time tls is offloaded. This pkt
should not be discarded and handled carefully. It could be
plaintext alone or plaintext + finish as well.
Fixes: 5a4b9fe7fe ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If a record starts in middle, reset TCB UNA so that we could
avoid sending out extra packet which is needed to make it 16
byte aligned to start AES CTR.
Check also considers prev_seq, which should be what is
actually sent, not the skb data length.
Avoid updating partial TAG to HW at any point of time, that's
why we need to check if remaining part is smaller than TAG
size, then reset TX_MAX to be TAG starting sequence number.
Fixes: 5a4b9fe7fe ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If an skb has only header part which doesn't start from
beginning, is not being handled properly.
Fixes: dc05f3df8f ("chcr: Handle first or middle part of record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
trimmed length calculation goes wrong if skb has only tag part
to send. It should be zero if there is no data bytes apart from
TAG.
Fixes: dc05f3df8f ("chcr: Handle first or middle part of record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Creating SKB per tls record and freeing the original one causes
panic. There will be race if connection reset is requested. By
freeing original skb, refcnt will be decremented and that means,
there is no pending record to send, and so tls_dev_del will be
requested in control path while SKB of related connection is in
queue.
Better approach is to use same SKB to send one record (partial
data) at a time. We still have to create a new SKB when partial
last part of a record is requested.
This fix introduces new API cxgb4_write_partial_sgl() to send
partial part of skb. Present cxgb4_write_sgl can only provide
feasibility to start from an offset which limits to header only
and it can write sgls for the whole skb len. But this new API
will help in both. It can start from any offset and can end
writing in middle of the skb.
v4->v5:
- Removed extra changes.
Fixes: 429765a149 ("chcr: handle partial end part of a record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Checksum update was missing in the WR.
Fixes: 429765a149 ("chcr: handle partial end part of a record")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a possibility of linear skbs coming in. Correcting
the length extraction logic.
v2->v3:
- Separated un-related changes from this patch.
Fixes: 5a4b9fe7fe ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If skb has retransmit data starting before start marker, e.g. ccs,
decrypted bit won't be set for that, and if it has some data to
encrypt, then it must be given to crypto ULD. So in place of
decrypted, check if socket is tls offloaded. Also, unless skb has
some data to encrypt, no need to give it for tls offload handling.
v2->v3:
- Removed ifdef.
Fixes: 5a4b9fe7fe ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The fsl,stop-mode property is a phandle-array and should consist of one phandle
and two 32 bit integers, e.g.:
fsl,stop-mode = <&gpr 0x34 28>;
This patch fixes the following errors, which shows up during a dtbs_check:
arch/arm/boot/dts/imx6dl-apf6dev.dt.yaml: can@2090000: fsl,stop-mode: [[1, 52, 28]] is too short
From schema: Documentation/devicetree/bindings/net/can/fsl,flexcan.yaml
Fixes: e5ab9aa7e4 ("dt-bindings: can: flexcan: convert fsl,*flexcan bindings to yaml")
Reported-by: Rob Herring <robh+dt@kernel.org>
Cc: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://lore.kernel.org/r/20201111130507.1560881-5-mkl@pengutronix.de
Signed-off-by: Rob Herring <robh@kernel.org>
Commit dabf6b36b8 ("of: Add OF_DMA_DEFAULT_COHERENT & select it on
powerpc") added a check to of_dma_is_coherent which returns early
if OF_DMA_DEFAULT_COHERENT is enabled. This results in the of_node_put()
being skipped causing a memory leak. Moved the of_node_get() below this
check so we now we only get the node if OF_DMA_DEFAULT_COHERENT is not
enabled.
Fixes: dabf6b36b8 ("of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc")
Signed-off-by: Evan Nimmo <evan.nimmo@alliedtelesis.co.nz>
Link: https://lore.kernel.org/r/20201110022825.30895-1-evan.nimmo@alliedtelesis.co.nz
Signed-off-by: Rob Herring <robh@kernel.org>
This fixes a regression for blocking connects introduced by commit
4becb7ee5b ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect").
The x25->neighbour is already set to "NULL" by x25_disconnect() now,
while a blocking connect is waiting in
x25_wait_for_connection_establishment(). Therefore x25->neighbour must
not be accessed here again and x25->state is also already set to
X25_STATE_0 by x25_disconnect().
Fixes: 4becb7ee5b ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect")
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Reviewed-by: Xie He <xie.he.0141@gmail.com>
Link: https://lore.kernel.org/r/20201109065449.9014-1-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull swiotlb fixes from Konrad Rzeszutek Wilk:
"Two tiny fixes for issues that make drivers under Xen unhappy under
certain conditions"
* 'stable/for-linus-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
swiotlb: remove the tbl_dma_addr argument to swiotlb_tbl_map_single
swiotlb: fix "x86: Don't panic if can not alloc buffer for swiotlb"
If an application specifies IORING_SETUP_CQSIZE to set the CQ ring size
to a specific size, we ensure that the CQ size is at least that of the
SQ ring size. But in doing so, we compare the already rounded up to power
of two SQ size to the as-of yet unrounded CQ size. This means that if an
application passes in non power of two sizes, we can return -EINVAL when
the final value would've been fine. As an example, an application passing
in 100/100 for sq/cq size should end up with 128 for both. But since we
round the SQ size first, we compare the CQ size of 100 to 128, and return
-EINVAL as that is too small.
Cc: stable@vger.kernel.org
Fixes: 33a107f0a1 ("io_uring: allow application controlled CQ ring size")
Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We also need to drop the iolock when invalidate_inode_pages2 fails, not
only on all other error or successful cases.
Fixes: 527851124d ("xfs: implement pNFS export operations")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
In the case that the SPI mux isn't set, the transfer_one_message
function returns without finalizing the message. This means that
the transfer never completes, resulting in hung tasks and an
eventual kernel panic. Fix it by finalizing the transfer in this
case.
Fixes: 9211a441e6 ("spi: fsi: Check mux status before transfers")
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Link: https://lore.kernel.org/r/20201110214736.25718-1-eajames@linux.ibm.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Commit f3186dd876 ("spi: Optionally use GPIO descriptors for CS GPIOs")
introduced the optional use of GPIO descriptors for chip selects.
A side-effect of this change: when a SPI bus uses GPIO descriptors,
all its client devices have SPI_CS_HIGH set in spi->mode. This flag is
required for the SPI bus to operate correctly.
This unfortunately breaks many client drivers, which use the following
pattern to configure their underlying SPI bus:
static int client_device_probe(struct spi_device *spi)
{
...
spi->mode = SPI_MODE_0;
spi->bits_per_word = 8;
err = spi_setup(spi);
..
}
In short, many client drivers overwrite the SPI_CS_HIGH bit in
spi->mode, and break the underlying SPI bus driver.
This is especially true for Freescale/NXP imx ecspi, where large
numbers of spi client drivers now no longer work.
Proposed fix:
-------------
When using gpio descriptors, depend on gpiolib to handle CS polarity.
Existing quirks in gpiolib ensure that this is handled correctly.
Existing gpiolib behaviour will force the polarity of any chip-select
gpiod to active-high (if 'spi-active-high' devicetree prop present) or
active-low (if 'spi-active-high' absent). Irrespective of whether
the gpio is marked GPIO_ACTIVE_[HIGH|LOW] in the devicetree.
Loose ends:
-----------
If this fix is applied:
- is commit 138c9c32f0
("spi: spidev: Fix CS polarity if GPIO descriptors are used")
still necessary / correct ?
Fixes: f3186dd876 ("spi: Optionally use GPIO descriptors for CS GPIOs")
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20201106150706.29089-1-TheSven73@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Limit the fsl,pfuze-support-disable-sw to the pfuze100 and pfuze200
variants.
When enabling fsl,pfuze-support-disable-sw and using a pfuze3000 or
pfuze3001, the driver would choose pfuze100_sw_disable_regulator_ops
instead of the newly introduced and correct pfuze3000_sw_regulator_ops.
Signed-off-by: Sean Nyekjaer <sean@geanix.com>
Fixes: 6f1cf5257a ("regualtor: pfuze100: correct sw1a/sw2 on pfuze3000")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201110174113.2066534-1-sean@geanix.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Commit 39297dde73 ("x86/platform/uv: Remove UV BAU TLB Shootdown
Handler") removed uv_flush_tlb_others. Its declaration was removed also
from asm/uv/uv.h. But only for the CONFIG_X86_UV=y case. The inline
definition (!X86_UV case) is still in place.
So remove this implementation with everything what was added to support
uv_flush_tlb_others:
* include of asm/tlbflush.h
* forward declarations of struct cpumask, mm_struct, and flush_tlb_info
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mike Travis <mike.travis@hpe.com>
Acked-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20201109093653.2042-1-jslaby@suse.cz
When invoking kexec() on a Linux guest running on a Hyper-V host, the
kernel panics.
RIP: 0010:cpuhp_issue_call+0x137/0x140
Call Trace:
__cpuhp_remove_state_cpuslocked+0x99/0x100
__cpuhp_remove_state+0x1c/0x30
hv_kexec_handler+0x23/0x30 [hv_vmbus]
hv_machine_shutdown+0x1e/0x30
machine_shutdown+0x10/0x20
kernel_kexec+0x6d/0x96
__do_sys_reboot+0x1ef/0x230
__x64_sys_reboot+0x1d/0x20
do_syscall_64+0x6b/0x3d8
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This was due to hv_synic_cleanup() callback returning -EBUSY to
cpuhp_issue_call() when tearing down the VMBUS_CONNECT_CPU, even
if the vmbus_connection.conn_state = DISCONNECTED. hv_synic_cleanup()
should succeed in the case where vmbus_connection.conn_state
is DISCONNECTED.
Fix is to add an extra condition to test for
vmbus_connection.conn_state == CONNECTED on the VMBUS_CONNECT_CPU and
only return early if true. This way the kexec() path can still shut
everything down while preserving the initial behavior of preventing
CPU offlining on the VMBUS_CONNECT_CPU while the VM is running.
Fixes: 8a857c5542 ("Drivers: hv: vmbus: Always handle the VMBus messages on CPU0")
Signed-off-by: Chris Co <chrco@microsoft.com>
Reviewed-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201110190118.15596-1-chrco@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
We can only have protected guest pages after a successful set secure
parameters call as only then the UV allows imports and unpacks.
By moving the test we can now also check for it in s390_reset_acc()
and do an early return if it is 0.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Fixes: 29b40f105e ("KVM: s390: protvirt: Add initial vm and cpu lifecycle handling")
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Julian Wiedmann says:
====================
net/iucv: fixes 2020-11-09
One fix in the shutdown path for af_iucv sockets. This is relevant for
stable as well.
Also sending along an update for the Maintainers file.
v1 -> v2: use the correct Fixes tag in patch 1 (Jakub)
====================
Link: https://lore.kernel.org/r/20201109075706.56573-1-jwi@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I am retiring soon. Thus this patch removes myself from the
MAINTAINERS file (s390 network).
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
[jwi: fix up the subject]
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot reported the following KASAN finding:
BUG: KASAN: nullptr-dereference in iucv_send_ctrl+0x390/0x3f0 net/iucv/af_iucv.c:385
Read of size 2 at addr 000000000000021e by task syz-executor907/519
CPU: 0 PID: 519 Comm: syz-executor907 Not tainted 5.9.0-syzkaller-07043-gbcf9877ad213 #0
Hardware name: IBM 3906 M04 701 (KVM/Linux)
Call Trace:
[<00000000c576af60>] unwind_start arch/s390/include/asm/unwind.h:65 [inline]
[<00000000c576af60>] show_stack+0x180/0x228 arch/s390/kernel/dumpstack.c:135
[<00000000c9dcd1f8>] __dump_stack lib/dump_stack.c:77 [inline]
[<00000000c9dcd1f8>] dump_stack+0x268/0x2f0 lib/dump_stack.c:118
[<00000000c5fed016>] print_address_description.constprop.0+0x5e/0x218 mm/kasan/report.c:383
[<00000000c5fec82a>] __kasan_report mm/kasan/report.c:517 [inline]
[<00000000c5fec82a>] kasan_report+0x11a/0x168 mm/kasan/report.c:534
[<00000000c98b5b60>] iucv_send_ctrl+0x390/0x3f0 net/iucv/af_iucv.c:385
[<00000000c98b6262>] iucv_sock_shutdown+0x44a/0x4c0 net/iucv/af_iucv.c:1457
[<00000000c89d3a54>] __sys_shutdown+0x12c/0x1c8 net/socket.c:2204
[<00000000c89d3b70>] __do_sys_shutdown net/socket.c:2212 [inline]
[<00000000c89d3b70>] __s390x_sys_shutdown+0x38/0x48 net/socket.c:2210
[<00000000c9e36eac>] system_call+0xe0/0x28c arch/s390/kernel/entry.S:415
There is nothing to shutdown if a connection has never been established.
Besides that iucv->hs_dev is not yet initialized if a socket is in
IUCV_OPEN state and iucv->path is not yet initialized if socket is in
IUCV_BOUND state.
So, just skip the shutdown calls for a socket in these states.
Fixes: eac3731bd0 ("[S390]: Add AF_IUCV socket support")
Fixes: 82492a355f ("af_iucv: add shutdown for HS transport")
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
[jwi: correct one Fixes tag]
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the net core, the struct net_device_ops -> ndo_set_rx_mode()
callback is called with the dev->addr_list_lock spinlock held.
However, this driver's ndo_set_rx_mode callback eventually calls
lan743x_dp_write(), which acquires a mutex. Mutex acquisition
may sleep, and this is not allowed when holding a spinlock.
Fix by removing the dp_lock mutex entirely. Its purpose is to
prevent concurrent accesses to the data port. No concurrent
accesses are possible, because the dev->addr_list_lock
spinlock in the core only lets through one thread at a time.
Fixes: 23f0703c12 ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Link: https://lore.kernel.org/r/20201109203828.5115-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When net.ipv4.tcp_syncookies=1 and syn flood is happened,
cookie_v4_check or cookie_v6_check tries to redo what
tcp_v4_send_synack or tcp_v6_send_synack did,
rsk_window_clamp will be changed if SOCK_RCVBUF is set,
which will make rcv_wscale is different, the client
still operates with initial window scale and can overshot
granted window, the client use the initial scale but local
server use new scale to advertise window value, and session
work abnormally.
Fixes: e88c64f0a4 ("tcp: allow effective reduction of TCP's rcv-buffer via setsockopt")
Signed-off-by: Mao Wenan <wenan.mao@linux.alibaba.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1604967391-123737-1-git-send-email-wenan.mao@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The try_invoke_on_locked_down_task() function requires that
interrupts be enabled, but it is called with interrupts disabled from
rcu_print_task_stall(), resulting in an "IRQs not enabled as expected"
diagnostic. This commit therefore updates rcu_print_task_stall()
to accumulate a list of the first few tasks while holding the current
leaf rcu_node structure's ->lock, then releases that lock and only then
uses try_invoke_on_locked_down_task() to attempt to obtain per-task
detailed information. Of course, as soon as ->lock is released, the
task might exit, so the get_task_struct() function is used to prevent
the task structure from going away in the meantime.
Link: https://lore.kernel.org/lkml/000000000000903d5805ab908fc4@google.com/
Fixes: 5bef8da66a ("rcu: Add per-task state to RCU CPU stall warnings")
Reported-by: syzbot+cb3b69ae80afd6535b0e@syzkaller.appspotmail.com
Reported-by: syzbot+f04854e1c5c9e913cc27@syzkaller.appspotmail.com
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Now that we've straightened out the callers, move these three functions
to fs.h since they're fairly trivial.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Break this function into two helpers so that it's obvious that the
trylock versions return a value that must be checked, and the blocking
versions don't require that. While we're at it, clean up the return
type mismatch.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
__sb_start_write has some weird looking lockdep code that claims to
exist to handle nested freeze locking requests from xfs. The code as
written seems broken -- if we think we hold a read lock on any of the
higher freeze levels (e.g. we hold SB_FREEZE_WRITE and are trying to
lock SB_FREEZE_PAGEFAULT), it converts a blocking lock attempt into a
trylock.
However, it's not correct to downgrade a blocking lock attempt to a
trylock unless the downgrading code or the callers are prepared to deal
with that situation. Neither __sb_start_write nor its callers handle
this at all. For example:
sb_start_pagefault ignores the return value completely, with the result
that if xfs_filemap_fault loses a race with a different thread trying to
fsfreeze, it will proceed without pagefault freeze protection (thereby
breaking locking rules) and then unlocks the pagefault freeze lock that
it doesn't own on its way out (thereby corrupting the lock state), which
leads to a system hang shortly afterwards.
Normally, this won't happen because our ownership of a read lock on a
higher freeze protection level blocks fsfreeze from grabbing a write
lock on that higher level. *However*, if lockdep is offline,
lock_is_held_type unconditionally returns 1, which means that
percpu_rwsem_is_held returns 1, which means that __sb_start_write
unconditionally converts blocking freeze lock attempts into trylocks,
even when we *don't* hold anything that would block a fsfreeze.
Apparently this all held together until 5.10-rc1, when bugs in lockdep
caused lockdep to shut itself off early in an fstests run, and once
fstests gets to the "race writes with freezer" tests, kaboom. This
might explain the long trail of vanishingly infrequent livelocks in
fstests after lockdep goes offline that I've never been able to
diagnose.
We could fix it by spinning on the trylock if wait==true, but AFAICT the
locking works fine if lockdep is not built at all (and I didn't see any
complaints running fstests overnight), so remove this snippet entirely.
NOTE: Commit f4b554af99 in 2015 created the current weird logic (which
used to exist in a different form in commit 5accdf82ba from 2012) in
__sb_start_write. XFS solved this whole problem in the late 2.6 era by
creating a variant of transactions (XFS_TRANS_NO_WRITECOUNT) that don't
grab intwrite freeze protection, thus making lockdep's solution
unnecessary. The commit claims that Dave Chinner explained that the
trylock hack + comment could be removed, but nobody ever did.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Fix some serious WTF in the reference count scrubber's rmap fragment
processing. The code comment says that this loop is supposed to move
all fragment records starting at or before bno onto the worklist, but
there's no obvious reason why nr (the number of items added) should
increment starting from 1, and breaking the loop when we've added the
target number seems dubious since we could have more rmap fragments that
should have been added to the worklist.
This seems to manifest in xfs/411 when adding one to the refcount field.
Fixes: dbde19da96 ("xfs: cross-reference the rmapbt data with the refcountbt")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Keys for extent interval records in the reverse mapping btree are
supposed to be computed as follows:
(physical block, owner, fork, is_btree, is_unwritten, offset)
This provides users the ability to look up a reverse mapping from a bmbt
record -- start with the physical block; then if there are multiple
records for the same block, move on to the owner; then the inode fork
type; and so on to the file offset.
However, the key comparison functions incorrectly remove the
fork/btree/unwritten information that's encoded in the on-disk offset.
This means that lookup comparisons are only done with:
(physical block, owner, offset)
This means that queries can return incorrect results. On consistent
filesystems this hasn't been an issue because blocks are never shared
between forks or with bmbt blocks; and are never unwritten. However,
this bug means that online repair cannot always detect corruption in the
key information in internal rmapbt nodes.
Found by fuzzing keys[1].attrfork = ones on xfs/371.
Fixes: 4b8ed67794 ("xfs: add rmap btree operations")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
When the bmbt scrubber is looking up rmap extents, we need to set the
extent flags from the bmbt record fully. This will matter once we fix
the rmap btree comparison functions to check those flags correctly.
Fixes: d852657ccf ("xfs: cross-reference reverse-mapping btree")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Pass the same oldext argument (which contains the existing rmapping's
unwritten state) to xfs_rmap_lookup_le_range at the start of
xfs_rmap_convert_shared. At this point in the code, flags is zero,
which means that we perform lookups using the wrong key.
Fixes: 3f165b334e ("xfs: convert unwritten status of reverse mappings for shared files")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The RTL8401-internal PHY identifies as RTL8201CP, and the init
sequence in r8169, copied from vendor driver r8168, uses paged
operations. Therefore set the same paged operation callbacks as
for the other Realtek PHY's.
Fixes: cdafdc29ef ("r8169: sync support for RTL8401 with vendor driver")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/69882f7a-ca2f-e0c7-ae83-c9b6937282cd@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 6f197fb638 ("lan743x: Added fixed link and RGMII support")
assumes that chips with an internal PHY will never have a devicetree
entry. This is incorrect: even for these chips, a devicetree entry
can be useful e.g. to pass the mac address from bootloader to chip:
&pcie {
status = "okay";
host@0 {
reg = <0 0 0 0 0>;
#address-cells = <3>;
#size-cells = <2>;
lan7430: ethernet@0 {
/* LAN7430 with internal PHY */
compatible = "microchip,lan743x";
status = "okay";
reg = <0 0 0 0 0>;
/* filled in by bootloader */
local-mac-address = [00 00 00 00 00 00];
};
};
};
If a devicetree entry is present, the driver will not attach the chip
to its internal phy, and the chip will be non-operational.
Fix by tweaking the phy connection algorithm:
- first try to connect to a phy specified in the devicetree
(could be 'real' phy, or just a 'fixed-link')
- if that doesn't succeed, try to connect to an internal phy, even
if the chip has a devnode
Tested on a LAN7430 with internal PHY. I cannot test a device using
fixed-link, as I do not have access to one.
Fixes: 6f197fb638 ("lan743x: Added fixed link and RGMII support")
Tested-by: Sven Van Asbroeck <thesven73@gmail.com> # lan7430
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Link: https://lore.kernel.org/r/20201108171224.23829-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current NetLabel code doesn't correctly keep track of the netlink
dump state in some cases, in particular when multiple interfaces with
large configurations are loaded. The problem manifests itself by not
reporting the full configuration to userspace, even though it is
loaded and active in the kernel. This patch fixes this by ensuring
that the dump state is properly reset when necessary inside the
netlbl_unlabel_staticlist() function.
Fixes: 8cc44579d1 ("NetLabel: Introduce static network labels for unlabeled connections")
Signed-off-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/160484450633.3752.16512718263560813473.stgit@sifl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
'igc_update_stats()' was not updating 'netdev->stats', so the returned
statistics, for example, requested by:
$ ip -s link show dev enp3s0
were not being updated and were always zero.
Fix by returning a set of statistics that are actually being
updated (adapter->stats64).
Fixes: c9a11c23ce ("igc: Add netdev")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix MAC setting flow for the PF driver.
Update the unicast VF's MAC address in VF structure if it is
a new setting in i40e_vc_add_mac_addr_msg.
When unicast MAC address gets deleted, record that and
set the new unicast MAC address that is already waiting in the filter
list. This logic is based on the order of messages arriving to
the PF driver.
Without this change the MAC address setting was interpreted
incorrectly in the following use cases:
1) Print incorrect VF MAC or zero MAC
ip link show dev $pf
2) Don't preserve MAC between driver reload
rmmod iavf; modprobe iavf
3) Update VF MAC when macvlan was set
ip link add link $vf address $mac $vf.1 type macvlan
4) Failed to update mac address when VF was trusted
ip link set dev $vf address $mac
This includes all other configurations including above commands.
Fixes: f657a6e131 ("i40e: Fix VF driver MAC address configuration")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently the following expectation
KUNIT_EXPECT_STREQ(test, "hi", "bye");
will produce:
Expected "hi" == "bye", but
"hi" == 1625079497
"bye" == 1625079500
After this patch:
Expected "hi" == "bye", but
"hi" == hi
"bye" == bye
KUNIT_INIT_BINARY_STR_ASSERT_STRUCT() was written but just mistakenly
not actually used by KUNIT_EXPECT_STREQ() and friends.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
For simplcity, strip all trailing whitespace from parsed output.
I imagine no one is printing out meaningful trailing whitespace via
KUNIT_FAIL() or similar, and that if they are, they really shouldn't.
`isolate_kunit_output()` yielded liens with trailing \n, which results
in artifacty output like this:
$ ./tools/testing/kunit/kunit.py run
[16:16:46] [FAILED] example_simple_test
[16:16:46] # example_simple_test: EXPECTATION FAILED at lib/kunit/kunit-example-test.c:29
[16:16:46] Expected 1 + 1 == 3, but
[16:16:46] 1 + 1 == 2
[16:16:46] 3 == 3
[16:16:46] not ok 1 - example_simple_test
[16:16:46]
After this change:
[16:16:46] # example_simple_test: EXPECTATION FAILED at lib/kunit/kunit-example-test.c:29
[16:16:46] Expected 1 + 1 == 3, but
[16:16:46] 1 + 1 == 2
[16:16:46] 3 == 3
[16:16:46] not ok 1 - example_simple_test
[16:16:46]
We should *not* be expecting lines to end with \n in kunit_tool_test.py
for this reason.
Do the same for `raw_output()` as well which suffers from the same
issue.
This is a followup to [1], but rebased onto kunit-fixes to pick up the
other raw_output() fix and fixes for kunit_tool_test.py.
[1] https://lore.kernel.org/linux-kselftest/20201020233219.4146059-1-dlatypov@google.com/
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Tested-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Currently the tool redirects make stdout + stderr, and only shows them
if the make command fails.
This means build warnings aren't shown to the user.
This change prints the contents of stderr even if make succeeds, under
the assumption these are only build warnings or other messages the user
likely wants to see.
We drop stdout from the raised exception since we can no longer easily
collate stdout and stderr and just showing the stderr seems fine.
Example with a warning:
[14:56:35] Building KUnit Kernel ...
../lib/kunit/kunit-test.c: In function ‘kunit_test_successful_try’:
../lib/kunit/kunit-test.c:19:6: warning: unused variable ‘unused’ [-Wunused-variable]
19 | int unused;
| ^~~~~~
[14:56:40] Starting KUnit Kernel ...
Note the stderr has a trailing \n, and since we use print, we add
another, but it helps separate make and kunit.py output.
Example with a build error:
[15:02:45] Building KUnit Kernel ...
ERROR:root:../lib/kunit/kunit-test.c: In function ‘kunit_test_successful_try’:
../lib/kunit/kunit-test.c:19:2: error: unknown type name ‘invalid_type’
19 | invalid_type *test = data;
| ^~~~~~~~~~~~
...
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The code uses annotations, but they aren't accurate.
Note that type checking in python is a separate process, running
`kunit.py run` will not check and complain about invalid types at
runtime.
Fix pre-existing issues found by running a type checker
$ mypy *.py
All but one of these were returning `None` without denoting this
properly (via `Optional[Type]`).
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When JSON support was added in [1], the KunitParseRequest tuple was
updated to contain a 'build_dir' field, but kunit.py parse doesn't
accept --build_dir as an option. The code nevertheless tried to access
it, resulting in this error:
AttributeError: 'Namespace' object has no attribute 'build_dir'
Given that the parser only uses the build_dir variable to set the
'build_environment' json field, we set it to None (which gives the JSON
'null') for now. Ultimately, we probably do want to be able to set this,
but since it's new functionality which (for the parse subcommand) never
worked, this is the quickest way of getting it back up and running.
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git/commit/?h=kunit-fixes&id=21a6d1780d5bbfca0ce9b8104ca6233502fcbf86
Fixes: 21a6d1780d ("kunit: tool: allow generating test results in JSON")
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The tools/testing/kunit/test_data/ directory was marked as binary
because some of the test_data files cause checkpatch warnings. Fix this
by dropping the .gitattributes file.
Fixes: afc63da64f ("kunit: kunit_parser: make parser more robust")
Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Pull core dump fix from Al Viro:
"Fix for multithreaded coredump playing fast and loose with getting
registers of secondary threads; if a secondary gets caught in the
middle of exit(2), the conditition it will be stopped in for dumper to
examine might be unusual enough for things to go wrong.
Quite a few architectures are fine with that, but some are not."
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
don't dump the threads that had been already exiting when zapped.
Commit
d9e9a64180 ("x86/mm/pti: Allocate a separate user PGD")
changed the PGD allocation to allocate PGD_ALLOCATION_ORDER pages, so in
the error path it should be freed using free_pages() rather than
free_page().
Commit
06ace26f4e ("x86/efi: Free efi_pgd with free_pages()")
fixed one instance of this, but missed another.
Move the freeing out-of-line to avoid code duplication and fix this bug.
Fixes: d9e9a64180 ("x86/mm/pti: Allocate a separate user PGD")
Link: https://lore.kernel.org/r/20201110163919.1134431-1-nivedita@alum.mit.edu
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Pull btrfs fixes from David Sterba:
"A handful of minor fixes and updates:
- handle missing device replace item on mount (syzbot report)
- fix space reservation calculation when finishing relocation
- fix memory leak on error path in ref-verify (debugging feature)
- fix potential overflow during defrag on 32bit arches
- minor code update to silence smatch warning
- minor error message updates"
* tag 'for-5.10-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: ref-verify: fix memory leak in btrfs_ref_tree_mod
btrfs: dev-replace: fail mount if we don't have replace item with target device
btrfs: scrub: update message regarding read-only status
btrfs: clean up NULL checks in qgroup_unreserve_range()
btrfs: fix min reserved size calculation in merge_reloc_root
btrfs: print the block rsv type when we fail our reservation
btrfs: fix potential overflow in cluster_pages_for_defrag on 32bit arch
Pull fscrypt fix from Eric Biggers:
"Fix a regression where a new WARN_ON() was reachable when using
FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32 on ext4, causing xfstest
generic/602 to sometimes fail on ext4"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fscrypt: remove reachable WARN in fscrypt_setup_iv_ino_lblk_32_key()
Pull turbostat updates from Len Brown:
"Update update to version 20.09.30, one kernel side fix"
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: update version number
powercap: restrict energy meter to root access
tools/power turbostat: harden against cpu hotplug
tools/power turbostat: adjust for temperature offset
tools/power turbostat: Build with _FILE_OFFSET_BITS=64
tools/power turbostat: Support AMD Family 19h
tools/power turbostat: Remove empty columns for Jacobsville
tools/power turbostat: Add a new GFXAMHz column that exposes gt_act_freq_mhz.
tools/power x86_energy_perf_policy: Input/output error in a VM
tools/power turbostat: Skip pc8, pc9, pc10 columns, if they are disabled
tools/power turbostat: Support additional CPU model numbers
tools/power turbostat: Fix output formatting for ACPI CST enumeration
tools/power turbostat: Replace HTTP links with HTTPS ones: TURBOSTAT UTILITY
tools/power turbostat: Use sched_getcpu() instead of hardcoded cpu 0
tools/power turbostat: Enable accumulate RAPL display
tools/power turbostat: Introduce functions to accumulate RAPL consumption
tools/power turbostat: Make the energy variable to be 64 bit
tools/power turbostat: Always print idle in the system configuration header
tools/power turbostat: Print /dev/cpu_dma_latency
Reading /proc/sys/kernel/sched_domain/cpu*/domain0/flags mutliple times
with small reads causes oopses with slub corruption issues because the kfree is
free'ing an offset from a previous allocation. Fix this by adding in a new
pointer 'buf' for the allocation and kfree and use the temporary pointer tmp
to handle memory copies of the buf offsets.
Fixes: 5b9f8ff7b3 ("sched/debug: Output SD flag names rather than their values")
Reported-by: Jeff Bastian <jbastian@redhat.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20201029151103.373410-1-colin.king@canonical.com
During fast wakeup path, scheduler always check whether local or prev
cpus are good candidates for the task before looking for other cpus in
the domain. With commit b7a331615d ("sched/fair: Add asymmetric CPU
capacity wakeup scan") the heterogenous system gains a dedicated path
but doesn't try to reuse prev cpu whenever possible. If the previous
cpu is idle and belong to the LLC domain, we should check it 1st
before looking for another cpu because it stays one of the best
candidate and this also stabilizes task placement on the system.
This change aligns asymmetric path behavior with symmetric one and reduces
cases where the task migrates across all cpus of the sd_asym_cpucapacity
domains at wakeup.
This change does not impact normal EAS mode but only the overloaded case or
when EAS is not used.
- On hikey960 with performance governor (EAS disable)
./perf bench sched pipe -T -l 50000
mainline w/ patch
# migrations 999364 0
ops/sec 149313(+/-0.28%) 182587(+/- 0.40) +22%
- On hikey with performance governor
./perf bench sched pipe -T -l 50000
mainline w/ patch
# migrations 0 0
ops/sec 47721(+/-0.76%) 47899(+/- 0.56) +0.4%
According to test on hikey, the patch doesn't impact symmetric system
compared to current implementation (only tested on arm64)
Also read the uclamped value of task's utilization at most twice instead
instead each time we compare task's utilization with cpu's capacity.
Fixes: b7a331615d ("sched/fair: Add asymmetric CPU capacity wakeup scan")
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20201029161824.26389-1-vincent.guittot@linaro.org
schbench shows latency increase for 95 percentile above since:
commit 0b0695f2b3 ("sched/fair: Rework load_balance()")
Align the behavior of the load balancer with the wake up path, which tries
to select an idle CPU which belongs to the LLC for a waking task.
calculate_imbalance() will use nr_running instead of the spare
capacity when CPUs share resources (ie cache) at the domain level. This
will ensure a better spread of tasks on idle CPUs.
Running schbench on a hikey (8cores arm64) shows the problem:
tip/sched/core :
schbench -m 2 -t 4 -s 10000 -c 1000000 -r 10
Latency percentiles (usec)
50.0th: 33
75.0th: 45
90.0th: 51
95.0th: 4152
*99.0th: 14288
99.5th: 14288
99.9th: 14288
min=0, max=14276
tip/sched/core + patch :
schbench -m 2 -t 4 -s 10000 -c 1000000 -r 10
Latency percentiles (usec)
50.0th: 34
75.0th: 47
90.0th: 52
95.0th: 78
*99.0th: 94
99.5th: 94
99.9th: 94
min=0, max=94
Fixes: 0b0695f2b3 ("sched/fair: Rework load_balance()")
Reported-by: Chris Mason <clm@fb.com>
Suggested-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Tested-by: Rik van Riel <riel@surriel.com>
Link: https://lkml.kernel.org/r/20201102102457.28808-1-vincent.guittot@linaro.org
gcc -Wextra points out a duplicate initialization of one array
member:
arch/x86/events/intel/uncore_snb.c:478:37: warning: initialized field overwritten [-Woverride-init]
478 | [SNB_PCI_UNCORE_IMC_DATA_READS] = { SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE,
The only sensible explanation is that a duplicate 'READS' was used
instead of the correct 'WRITES', so change it back.
Fixes: 24633d901e ("perf/x86/intel/uncore: Add BW counters for GT, IA and IO breakdown")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201026215203.3893972-1-arnd@kernel.org
Chris Wilson reported a problem spotted by check_chain_key(): a chain
key got changed in validate_chain() because we modify the ->read in
validate_chain() to skip checks for dependency adding, and ->read is
taken into calculation for chain key since commit f611e8cf98
("lockdep: Take read/write status in consideration when generate
chainkey").
Fix this by avoiding to modify ->read in validate_chain() based on two
facts: a) since we now support recursive read lock detection, there is
no need to skip checks for dependency adding for recursive readers, b)
since we have a), there is only one case left (nest_lock) where we want
to skip checks in validate_chain(), we simply remove the modification
for ->read and rely on the return value of check_deadlock() to skip the
dependency adding.
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201102053743.450459-1-boqun.feng@gmail.com
Make intel_pstate take the new CPUFREQ_GOV_STRICT_TARGET governor
flag into account when it operates in the passive mode with HWP
enabled, so as to fix the "powersave" governor behavior in that
case (currently, HWP is allowed to scale the performance all the
way up to the policy max limit when the "powersave" governor is
used, but it should be constrained to the policy min limit then).
Fixes: f6ebbcf08f ("cpufreq: intel_pstate: Implement passive mode with HWP enabled")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: 9a2a9ebc0a cpufreq: Introduce governor flags
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: 218f668701 cpufreq: Introduce CPUFREQ_GOV_STRICT_TARGET
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: ea9364bbad cpufreq: Add strict_target to struct cpufreq_policy
Add a new field to be set when the CPUFREQ_GOV_STRICT_TARGET flag is
set for the current governor to struct cpufreq_policy, so that the
drivers needing to check CPUFREQ_GOV_STRICT_TARGET do not have to
access the governor object during every frequency transition.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Introduce a new governor flag, CPUFREQ_GOV_STRICT_TARGET, for the
governors that want the target frequency to be set exactly to the
given value without leaving any room for adjustments on the hardware
side and set this flag for the powersave and performance governors.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
A new cpufreq governor flag will be added subsequently, so replace
the bool dynamic_switching fleid in struct cpufreq_governor with a
flags field and introduce CPUFREQ_GOV_DYNAMIC_SWITCHING to set for
the "dynamic switching" governors instead of it.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Remove non-privileged user access to power data contained in
/sys/class/powercap/intel-rapl*/*/energy_uj
Non-privileged users currently have read access to power data and can
use this data to form a security attack. Some privileged
drivers/applications need read access to this data, but don't expose it
to non-privileged users.
For example, thermald uses this data to ensure that power management
works correctly. Thus removing non-privileged access is preferred over
completely disabling this power reporting capability with
CONFIG_INTEL_RAPL=n.
Fixes: 95677a9a38 ("PowerCap: Fix mode for energy counter")
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: stable@vger.kernel.org
bdget_disk needs to be paired with bdput to not leak a reference
on the block device inode.
Fixes: 08ba91ee6e ("nbd: Add the nbd NBD_DISCONNECT_ON_CLOSE config flag.")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull NVMe fix from Christoph:
"nvme fixes for 5.10:
- don't clear the read-only bit on a revalidate (Sagi Grimberg)"
* tag 'nvme-5.10-2020-11-10' of git://git.infradead.org/nvme:
nvme: fix incorrect behavior when BLKROSET is called by the user
intel-pinctrl for v5.10-2
* Respect bias setting when comes from ACPI
The following is an automated git shortlog grouped by driver:
intel:
- Set default bias in case no particular value given
- Fix 2 kOhm bias which is 833 Ohm
When GPIOs that are routed to PDC are used as output they can still latch
the IRQ pending at GIC. As a result the spurious IRQ was handled when the
client driver change the direction to input to starts using it as IRQ.
Currently such erroneous latched IRQ are cleared with .irq_enable callback
however if the driver continue to use GPIO as interrupt and invokes
disable_irq() followed by enable_irq() then everytime during enable_irq()
previously latched interrupt gets cleared.
This can make edge IRQs not seen after enable_irq() if they had arrived
after the driver has invoked disable_irq() and were pending at GIC.
Move clearing erroneous IRQ to .irq_request_resources callback as this is
the place where GPIO direction is changed as input and its locked as IRQ.
While at this add a missing check to invoke msm_gpio_irq_clear_unmask()
from .irq_enable callback only when GPIO is not routed to PDC.
Fixes: e35a6ae0eb ("pinctrl/msm: Setup GPIO chip in hierarchy")
Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
Link: https://lore.kernel.org/r/1604561884-10166-1-git-send-email-mkshah@codeaurora.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
gpio fixes for v5.10-rc3
- fix missing conversion to gpiolib irqchip in gpio-dwapb
- fix bank properties for ast2600 variant in gpio-aspeed
- make sysfs work again when the character device is disabled
- fix interrupt handling in gpio-pcie-idio-24
Commit ce3d31ad3c ("arm64/smp: Move rcu_cpu_starting() earlier") ensured
that RCU is informed early about incoming CPUs that might end up calling
into printk() before they are online. However, if such a CPU fails the
early CPU feature compatibility checks in check_local_cpu_capabilities(),
then it will be powered off or parked without informing RCU, leading to
an endless stream of stalls:
| rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
| rcu: 2-O...: (0 ticks this GP) idle=002/1/0x4000000000000000 softirq=0/0 fqs=2593
| (detected by 0, t=5252 jiffies, g=9317, q=136)
| Task dump for CPU 2:
| task:swapper/2 state:R running task stack: 0 pid: 0 ppid: 1 flags:0x00000028
| Call trace:
| ret_from_fork+0x0/0x30
Ensure that the dying CPU invokes rcu_report_dead() prior to being powered
off or parked.
Cc: Qian Cai <cai@redhat.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Suggested-by: Qian Cai <cai@redhat.com>
Link: https://lore.kernel.org/r/20201105222242.GA8842@willie-the-truck
Link: https://lore.kernel.org/r/20201106103602.9849-3-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Sparse gets cross about us returning 0 from image_load(), which has a
return type of 'void *':
>> arch/arm64/kernel/kexec_image.c:130:16: sparse: sparse: Using plain integer as NULL pointer
Return NULL instead, as we don't use the return value for anything if it
does not indicate an error.
Cc: Benjamin Gwin <bgwin@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 108aa50365 ("arm64: kexec_file: try more regions if loading segments fails")
Link: https://lore.kernel.org/r/202011091736.T0zH8kaC-lkp@intel.com
Signed-off-by: Will Deacon <will@kernel.org>
In a surprising turn of events, it transpires that CPU capabilities
configured as ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE are never set as the
result of late-onlining. Therefore our handling of erratum 1418040 does
not get activated if it is not required by any of the boot CPUs, even
though we allow late-onlining of an affected CPU.
In order to get things working again, replace the cpus_have_const_cap()
invocation with an explicit check for the current CPU using
this_cpu_has_cap().
Cc: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201106114952.10032-1-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
We now use cpu_pm for saving and restoring device context for deeper SoC
idle states. But for omap3, we must also block idle if SDMA is busy.
If we don't block idle when SDMA is busy, we eventually end up saving and
restoring SDMA register state on PER domain idle while SDMA is active and
that causes at least audio playback to fail.
Fixes: 4c74ecf792 ("dmaengine: ti: omap-dma: Add device tree match data and use it for cpu_pm")
Reported-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Tested-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Link: https://lore.kernel.org/r/20201109154013.11950-1-tony@atomide.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Mika writes:
thunderbolt: Fixes for v5.10-rc4
This includes two fixes for resource leaks that have been around for a
while. Then two fixes for code that was added during v5.10 merge window
and PCI IDs for Intel Tiger Lake-H.
All these have been in linux-next without reported issues.
* tag 'thunderbolt-for-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
thunderbolt: Add support for Intel Tiger Lake-H
thunderbolt: Only configure USB4 wake for lane 0 adapters
thunderbolt: Add uaccess dependency to debugfs interface
thunderbolt: Fix memory leak if ida_simple_get() fails in enumerate_services()
thunderbolt: Add the missed ida_simple_remove() in ring_request_msix()
The SPI chip selects are represented as:
cs-gpios = <&gpio4 11 GPIO_ACTIVE_LOW>, <&gpio4 13 GPIO_ACTIVE_LOW>;
, which means that they are used in GPIO function instead of native
SPI mode.
Fix the IOMUX for the chip select 1 to use GPIO4_13 instead of
the native CSPI_SSI function.
Fixes: c605cbf5e1 ("ARM: dts: imx: add device tree support for Freescale imx50evk board")
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The datasheet for both the industrial and consumer variant of the
SoC lists a typical voltage of 0.95V for the 1.6GHz CPU operating
point.
Fixes: e85c9d0faa (arm64: dts: imx8mm: Add cpufreq properties)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
After updating userspace Ethtool from 5.7 to 5.9, I noticed that
NETDEV_FEAT_CHANGE is no more raised when changing netdev features
through Ethtool.
That's because the old Ethtool ioctl interface always calls
netdev_features_change() at the end of user request processing to
inform the kernel that our netdevice has some features changed, but
the new Netlink interface does not. Instead, it just notifies itself
with ETHTOOL_MSG_FEATURES_NTF.
Replace this ethtool_notify() call with netdev_features_change(), so
the kernel will be aware of any features changes, just like in case
with the ioctl interface. This does not omit Ethtool notifications,
as Ethtool itself listens to NETDEV_FEAT_CHANGE and drops
ETHTOOL_MSG_FEATURES_NTF on it
(net/ethtool/netlink.c:ethnl_netdev_event()).
From v1 [1]:
- dropped extra new line as advised by Jakub;
- no functional changes.
[1] https://lore.kernel.org/netdev/AlZXQ2o5uuTVHCfNGOiGgJ8vJ3KgO5YIWAnQjH0cDE@cp3-web-009.plabs.ch
Fixes: 0980bfcd69 ("ethtool: set netdev features with FEATURES_SET request")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/r/ahA2YWXYICz5rbUSQqNG4roJ8OlJzzYQX7PTiG80@cp4-web-028.plabs.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jianlin reports that a bridged IPv6 VXLAN endpoint, carrying IPv6
packets over a link with a PMTU estimation of exactly 1350 bytes,
won't trigger ICMPv6 Packet Too Big replies when the encapsulated
datagrams exceed said PMTU value. VXLAN over IPv6 adds 70 bytes of
overhead, so an ICMPv6 reply indicating 1280 bytes as inner MTU
would be legitimate and expected.
This comes from an off-by-one error I introduced in checks added
as part of commit 4cb47a8644 ("tunnels: PMTU discovery support
for directly bridged IP packets"), whose purpose was to prevent
sending ICMPv6 Packet Too Big messages with an MTU lower than the
smallest permissible IPv6 link MTU, i.e. 1280 bytes.
In iptunnel_pmtud_check_icmpv6(), avoid triggering a reply only if
the advertised MTU would be less than, and not equal to, 1280 bytes.
Also fix the analogous comparison for IPv4, that is, skip the ICMP
reply only if the resulting MTU is strictly less than 576 bytes.
This becomes apparent while running the net/pmtu.sh bridged VXLAN
or GENEVE selftests with adjusted lower-link MTU values. Using
e.g. GENEVE, setting ll_mtu to the values reported below, in the
test_pmtu_ipvX_over_bridged_vxlanY_or_geneveY_exception() test
function, we can see failures on the following tests:
test | ll_mtu
-------------------------------|--------
pmtu_ipv4_br_geneve4_exception | 626
pmtu_ipv6_br_geneve4_exception | 1330
pmtu_ipv6_br_geneve6_exception | 1350
owing to the different tunneling overheads implied by the
corresponding configurations.
Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: 4cb47a8644 ("tunnels: PMTU discovery support for directly bridged IP packets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Link: https://lore.kernel.org/r/4f5fc2f33bfdf8409549fafd4f952b008bf04d63.1604681709.git.sbrivio@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Due to the legacy usage of hard_header_len for SIT tunnels while
already using infrastructure from net/ipv4/ip_tunnel.c the
calculation of the path MTU in tnl_update_pmtu is incorrect.
This leads to unnecessary creation of MTU exceptions for any
flow going over a SIT tunnel.
As SIT tunnels do not have a header themsevles other than their
transport (L3, L2) headers we're leaving hard_header_len set to zero
as tnl_update_pmtu is already taking care of the transport headers
sizes.
This will also help avoiding unnecessary IPv6 GC runs and spinlock
contention seen when using SIT tunnels and for more than
net.ipv6.route.gc_thresh flows.
Fixes: c544193214 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20201103104133.GA1573211@tws
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull kvm fixes from Paolo Bonzini:
"ARM:
- fix compilation error when PMD and PUD are folded
- fix regression in reads-as-zero behaviour of ID_AA64ZFR0_EL1
- add aarch64 get-reg-list test
x86:
- fix semantic conflict between two series merged for 5.10
- fix (and test) enforcement of paravirtual cpuid features
selftests:
- various cleanups to memory management selftests
- new selftests testcase for performance of dirty logging"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (30 commits)
KVM: selftests: allow two iterations of dirty_log_perf_test
KVM: selftests: Introduce the dirty log perf test
KVM: selftests: Make the number of vcpus global
KVM: selftests: Make the per vcpu memory size global
KVM: selftests: Drop pointless vm_create wrapper
KVM: selftests: Add wrfract to common guest code
KVM: selftests: Simplify demand_paging_test with timespec_diff_now
KVM: selftests: Remove address rounding in guest code
KVM: selftests: Factor code out of demand_paging_test
KVM: selftests: Use a single binary for dirty/clear log test
KVM: selftests: Always clear dirty bitmap after iteration
KVM: selftests: Add blessed SVE registers to get-reg-list
KVM: selftests: Add aarch64 get-reg-list test
selftests: kvm: test enforcement of paravirtual cpuid features
selftests: kvm: Add exception handling to selftests
selftests: kvm: Clear uc so UCALL_NONE is being properly reported
selftests: kvm: Fix the segment descriptor layout to match the actual layout
KVM: x86: handle MSR_IA32_DEBUGCTLMSR with report_ignored_msrs
kvm: x86: request masterclock update any time guest uses different msr
kvm: x86: ensure pv_cpuid.features is initialized when enabling cap
...
If BPF code contains unused BPF subprogram and there are no other subprogram
calls (which can realistically happen in real-world applications given
sufficiently smart Clang code optimizations), libbpf will erroneously assume
that subprograms are entry-point programs and will attempt to load them with
UNSPEC program type.
Fix by not relying on subcall instructions and rather detect it based on the
structure of BPF object's sections.
Fixes: 9a94f277c4 ("tools: libbpf: restore the ability to load programs from .text section")
Reported-by: Dmitrii Banshchikov <dbanschikov@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201107000251.256821-1-andrii@kernel.org
Pull nfsd fixes from Bruce Fields:
"This is mainly server-to-server copy and fallout from Chuck's 5.10 rpc
refactoring"
* tag 'nfsd-5.10-1' of git://linux-nfs.org/~bfields/linux:
net/sunrpc: fix useless comparison in proc_do_xprt()
net/sunrpc: return 0 on attempt to write to "transports"
NFSD: fix missing refcount in nfsd4_copy by nfsd4_do_async_copy
NFSD: Fix use-after-free warning when doing inter-server copy
NFSD: MKNOD should return NFSERR_BADTYPE instead of NFSERR_INVAL
SUNRPC: Fix general protection fault in trace_rpc_xdr_overflow()
NFSD: NFSv3 PATHCONF Reply is improperly formed
Pull ext4 fixes and cleanups from Ted Ts'o:
"More fixes and cleanups for the new fast_commit features, but also a
few other miscellaneous bug fixes and a cleanup for the MAINTAINERS
file"
* tag 'ext4_for_linus_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (28 commits)
jbd2: fix up sparse warnings in checkpoint code
ext4: fix sparse warnings in fast_commit code
ext4: cleanup fast commit mount options
jbd2: don't start fast commit on aborted journal
ext4: make s_mount_flags modifications atomic
ext4: issue fsdev cache flush before starting fast commit
ext4: disable fast commit with data journalling
ext4: fix inode dirty check in case of fast commits
ext4: remove unnecessary fast commit calls from ext4_file_mmap
ext4: mark buf dirty before submitting fast commit buffer
ext4: fix code documentatioon
ext4: dedpulicate the code to wait on inode that's being committed
jbd2: don't read journal->j_commit_sequence without taking a lock
jbd2: don't touch buffer state until it is filled
jbd2: add todo for a fast commit performance optimization
jbd2: don't pass tid to jbd2_fc_end_commit_fallback()
jbd2: don't use state lock during commit path
jbd2: drop jbd2_fc_init documentation
ext4: clean up the JBD2 API that initializes fast commits
jbd2: rename j_maxlen to j_total_len and add jbd2_journal_max_txn_bufs
...
Pull erofs fixes from Gao Xiang:
"A week ago, Vladimir reported an issue that the kernel log would
become polluted if the page allocation debug option is enabled. I also
found this when I cleaned up magical page->mapping and originally
planned to submit these all for 5.11 but it seems the impact can be
noticed so submit the fix in advance.
In addition, nl6720 also reported that atime is empty although EROFS
has the only one on-disk timestamp as a practical consideration for
now but it's better to derive it as what we did for the other
timestamps.
Summary:
- fix setting up pcluster improperly for temporary pages
- derive atime instead of leaving it empty"
* tag 'erofs-for-5.10-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix setting up pcluster for temporary pages
erofs: derive atime instead of leaving it empty
The Medion Akoya E2228T's ACPI _LID implementation is quite broken,
it has the same issues as the one from the Medion Akoya E2215T:
1. For notifications it uses an ActiveLow Edge GpioInt, rather then
an ActiveBoth one, meaning that the device is only notified when the
lid is closed, not when it is opened.
2. Matching with this its _LID method simply always returns 0 (closed)
In order for the Linux LID code to work properly with this implementation,
the lid_init_state selection needs to be set to ACPI_BUTTON_LID_INIT_OPEN,
add a DMI quirk for this.
While working on this I also found out that the MD60### part of the model
number differs per country/batch while all of the E2215T and E2228T models
have this issue, so also remove the " MD60198" part from the E2215T quirk.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Clang is more aggressive about -Wformat warnings when the format flag
specifies a type smaller than the parameter. It turns out that gsi is an
int. Fixes:
drivers/acpi/evged.c:105:48: warning: format specifies type 'unsigned
char' but the argument has type 'unsigned int' [-Wformat]
trigger == ACPI_EDGE_SENSITIVE ? 'E' : 'L', gsi);
^~~
Link: https://github.com/ClangBuiltLinux/linux/issues/378
Fixes: ea6f3af4c5 ("ACPI: GED: add support for _Exx / _Lxx handler methods")
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Replaces spaces with tabs where spaces have been (inconsistently) used
for indentation and removes trailing whitespaces.
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
For some reason building with W=1 doesn't pick up on this, but the
kerneldoc name for acpi_dma_configure_id() is not right, so fix it up.
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
GpioIo() doesn't provide an explicit state for an output pin.
Linux tries to be smart and uses a common sense based on other
parameters. Document how it looks like in the code.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix factual mistakes and style issues in GPIO properties document.
This converts IoRestriction from InputOnly to OutputOnly as pins
in the example are used as outputs.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The zynqmp_pm_set_suspend_mode() and zynqmp_pm_get_trustzone_version()
functions pass values as api_id into zynqmp_pm_invoke_fn
that are beyond PM_API_MAX, resulting in an out-of-bounds access:
drivers/firmware/xilinx/zynqmp.c: In function 'zynqmp_pm_set_suspend_mode':
drivers/firmware/xilinx/zynqmp.c:150:24: warning: array subscript 2562 is above array bounds of 'u32[64]' {aka 'unsigned int[64]'} [-Warray-bounds]
150 | if (zynqmp_pm_features[api_id] != PM_FEATURE_UNCHECKED)
| ~~~~~~~~~~~~~~~~~~^~~~~~~~
drivers/firmware/xilinx/zynqmp.c:28:12: note: while referencing 'zynqmp_pm_features'
28 | static u32 zynqmp_pm_features[PM_API_MAX];
| ^~~~~~~~~~~~~~~~~~
Replace the resulting undefined behavior with an error return.
This may break some things that happen to work at the moment
but seems better than randomly overwriting kernel data.
I assume we need additional fixes for the two functions that now
return an error.
Fixes: 76582671eb ("firmware: xilinx: Add Zynqmp firmware driver")
Fixes: e178df31cf ("firmware: xilinx: Implement ZynqMP power management APIs")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20201026155449.3703142-1-arnd@kernel.org
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With the ltlk and spkout drivers, the index read function, i.e.
in_nowait, is getting called from the read_all_doc mechanism, from
the timer softirq:
Call Trace:
<IRQ>
dump_stack+0x71/0x98
dequeue_task_idle+0x1f/0x28
__schedule+0x167/0x5d6
? trace_hardirqs_on+0x2e/0x3a
? usleep_range+0x7f/0x7f
schedule+0x8a/0xae
schedule_timeout+0xb1/0xea
? del_timer_sync+0x31/0x31
do_wait_for_common+0xba/0x12b
? wake_up_q+0x45/0x45
wait_for_common+0x37/0x50
ttyio_in+0x2a/0x6b
spk_ttyio_in_nowait+0xc/0x13
spk_get_index_count+0x20/0x93
cursor_done+0x1c6/0x4c6
? read_all_doc+0xb1/0xb1
call_timer_fn+0x89/0x140
run_timer_softirq+0x164/0x1a5
? read_all_doc+0xb1/0xb1
? hrtimer_forward+0x7b/0x87
? timerqueue_add+0x62/0x68
? enqueue_hrtimer+0x95/0x9f
__do_softirq+0x181/0x31f
irq_exit+0x6a/0x86
smp_apic_timer_interrupt+0x15e/0x183
apic_timer_interrupt+0xf/0x20
</IRQ>
We thus should not schedule() at all, even with timeout == 0, this
crashes the kernel. We can however use try_wait_for_completion()
instead of wait_for_completion_timeout(0).
Cc: stable@vger.kernel.org
Reported-by: John Covici <covici@ccs.covici.com>
Tested-by: John Covici <covici@ccs.covici.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Link: https://lore.kernel.org/r/20201108131233.tadycr73sxlvodgo@function
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d97a9d7aea ("staging/speakup: Add inflection synth parameter")
introduced a new "inflection" speakup parameter next to "pitch", but
the values of the var_id_t enum are actually used by the keymap tables
so we must not renumber them. The effect was that notably the volume
control shortcut (speakup-1 or 2) was actually changing the inflection.
This moves the INFLECTION value at the end of the var_id_t enum to
fix back the enum values. This also adds a warning about it.
Fixes: d97a9d7aea ("staging/speakup: Add inflection synth parameter")
Cc: stable@vger.kernel.org
Reported-by: Kirk Reiser <kirk@reisers.ca>
Reported-by: Gregory Nowak <greg@gregn.net>
Tested-by: Gregory Nowak <greg@gregn.net>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Link: https://lore.kernel.org/r/20201012160646.qmdo4eqtj24hpch4@function
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Starting with Arch Perfmon v5, the anythread filter on generic counters may be
deprecated. The current kernel was exporting the any filter without checking.
On Icelake, it means you could do cpu/event=0x3c,any/ even though the filter
does not exist. This patch corrects the problem by relying on the CPUID 0xa leaf
function to determine if anythread is supported or not as described in the
Intel SDM Vol3b 18.2.5.1 AnyThread Deprecation section.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201028194247.3160610-1-eranian@google.com
Currently perf_event_attr::exclusive can be used to ensure an
event(group) is the sole group scheduled on the PMU. One consequence
is that when you have a pinned event (say the watchdog) you can no
longer have regular exclusive event(group)s.
Inspired by the fact that !pinned events are considered less strict,
allow !pinned,exclusive events to share the PMU with pinned,!exclusive
events.
Pinned,exclusive is still fully exclusive.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201029162902.105962225@infradead.org
Commit 9e6302056f ("perf: Use hrtimers for event multiplexing")
placed the hrtimer (re)start call in the wrong place. Instead of
capturing all scheduling failures, it only considered the PMU failure.
The result is that groups using perf_event_attr::exclusive are no
longer rotated.
Fixes: 9e6302056f ("perf: Use hrtimers for event multiplexing")
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201029162902.038667689@infradead.org
Since event_sched_out() clears cpuctx->exclusive upon removal of an
exclusive event (and only group leaders can be exclusive), there is no
point in group_sched_out() trying to do it too. It is impossible for
cpuctx->exclusive to still be set here.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201029162901.904060564@infradead.org
Since commit 086d08725d ("remoteproc: create vdev subdevice with
specific dma memory pool"), every remoteproc has a DMA subdevice
("remoteprocX#vdevYbuffer") for each virtio device, which inherits
DMA capabilities from the corresponding platform device. This allowed
to associate different DMA pools with each vdev, and required from
virtio drivers to perform DMA operations with the parent device
(vdev->dev.parent) instead of grandparent (vdev->dev.parent->parent).
virtio_rpmsg_bus was already changed in the same merge cycle with
commit d999b622fc ("rpmsg: virtio: allocate buffer from parent"),
but virtio_console did not. In fact, operations using the grandparent
worked fine while the grandparent was the platform device, but since
commit c774ad0108 ("remoteproc: Fix and restore the parenting
hierarchy for vdev") this was changed, and now the grandparent device
is the remoteproc device without any DMA capabilities.
So, starting v5.8-rc1 the following warning is observed:
[ 2.483925] ------------[ cut here ]------------
[ 2.489148] WARNING: CPU: 3 PID: 101 at kernel/dma/mapping.c:427 0x80e7eee8
[ 2.489152] Modules linked in: virtio_console(+)
[ 2.503737] virtio_rpmsg_bus rpmsg_core
[ 2.508903]
[ 2.528898] <Other modules, stack and call trace here>
[ 2.913043]
[ 2.914907] ---[ end trace 93ac8746beab612c ]---
[ 2.920102] virtio-ports vport1p0: Error allocating inbufs
kernel/dma/mapping.c:427 is:
WARN_ON_ONCE(!dev->coherent_dma_mask);
obviously because the grandparent now is remoteproc dev without any
DMA caps:
[ 3.104943] Parent: remoteproc0#vdev1buffer, grandparent: remoteproc0
Fix this the same way as it was for virtio_rpmsg_bus, using just the
parent device (vdev->dev.parent, "remoteprocX#vdevYbuffer") for DMA
operations.
This also allows now to reserve DMA pools/buffers for rproc serial
via Device Tree.
Fixes: c774ad0108 ("remoteproc: Fix and restore the parenting hierarchy for vdev")
Cc: stable@vger.kernel.org # 5.1+
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Date: Thu, 5 Nov 2020 11:10:24 +0800
Link: https://lore.kernel.org/r/AOKowLclCbOCKxyiJ71WeNyuAAj2q8EUtxrXbyky5E@cp7-web-042.plabs.ch
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The offending commit breaks BLKROSET ioctl because a device
revalidation will blindly override BLKROSET setting. Hence,
we remove the disk rw setting in case NVME_NS_ATTR_RO is cleared
from by the controller.
Fixes: 1293477f4f ("nvme: set gendisk read only based on nsattr")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Even though one iteration is not enough for the dirty log performance
test (due to the cost of building page tables, zeroing memory etc.)
two is okay and it is the default. Without this patch,
"./dirty_log_perf_test" without any further arguments fails.
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The gma500 driver expects 3 pipelines in several it's IRQ functions.
Accessing struct drm_device.vblank[], this fails with devices that only
have 2 pipelines. An example KASAN report is shown below.
[ 62.267688] ==================================================================
[ 62.268856] BUG: KASAN: slab-out-of-bounds in psb_irq_postinstall+0x250/0x3c0 [gma500_gfx]
[ 62.269450] Read of size 1 at addr ffff8880012bc6d0 by task systemd-udevd/285
[ 62.269949]
[ 62.270192] CPU: 0 PID: 285 Comm: systemd-udevd Tainted: G E 5.10.0-rc1-1-default+ #572
[ 62.270807] Hardware name: /DN2800MT, BIOS MTCDT10N.86A.0164.2012.1213.1024 12/13/2012
[ 62.271366] Call Trace:
[ 62.271705] dump_stack+0xae/0xe5
[ 62.272180] print_address_description.constprop.0+0x17/0xf0
[ 62.272987] ? psb_irq_postinstall+0x250/0x3c0 [gma500_gfx]
[ 62.273474] __kasan_report.cold+0x20/0x38
[ 62.273989] ? psb_irq_postinstall+0x250/0x3c0 [gma500_gfx]
[ 62.274460] kasan_report+0x3a/0x50
[ 62.274891] psb_irq_postinstall+0x250/0x3c0 [gma500_gfx]
[ 62.275380] drm_irq_install+0x131/0x1f0
<...>
[ 62.300751] Allocated by task 285:
[ 62.301223] kasan_save_stack+0x1b/0x40
[ 62.301731] __kasan_kmalloc.constprop.0+0xbf/0xd0
[ 62.302293] drmm_kmalloc+0x55/0x100
[ 62.302773] drm_vblank_init+0x77/0x210
Resolve the issue by only handling vblank entries up to the number of
CRTCs.
I'm adding a Fixes tag for reference, although the bug has been present
since the driver's initial commit.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Fixes: 5c49fd3aa0 ("gma500: Add the core DRM files and headers")
Cc: Alan Cox <alan@linux.intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: stable@vger.kernel.org#v3.3+
Link: https://patchwork.freedesktop.org/patch/msgid/20201105190256.3893-1-tzimmermann@suse.de
The victim inode's parent and name info is required when an event
needs to be delivered to a group interested in filename info OR
when the inode's parent is interested in an event on its children.
Let us call the first condition 'parent_needed' and the second
condition 'parent_interested'.
In fsnotify_parent(), the condition where the inode's parent is
interested in some events on its children, but not necessarily
interested the specific event is called 'parent_watched'.
fsnotify_parent() tests the condition (!parent_watched && !parent_needed)
for sending the event without parent and name info, which is correct.
It then wrongly assumes that parent_watched implies !parent_needed
and tests the condition (parent_watched && !parent_interested)
for sending the event without parent and name info, which is wrong,
because parent may still be needed by some group.
For example, after initializing a group with FAN_REPORT_DFID_NAME and
adding a FAN_MARK_MOUNT with FAN_OPEN mask, open events on non-directory
children of "testdir" are delivered with file name info.
After adding another mark to the same group on the parent "testdir"
with FAN_CLOSE|FAN_EVENT_ON_CHILD mask, open events on non-directory
children of "testdir" are no longer delivered with file name info.
Fix the logic and use auxiliary variables to clarify the conditions.
Fixes: 9b93f33105 ("fsnotify: send event with parent/name info to sb/mount/non-dir marks")
Cc: stable@vger.kernel.org#v5.9
Link: https://lore.kernel.org/r/20201108105906.8493-1-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
When booting a hyperthreaded system with the kernel parameter
'mitigations=auto,nosmt', the following warning occurs:
WARNING: CPU: 0 PID: 1 at drivers/xen/events/events_base.c:1112 unbind_from_irqhandler+0x4e/0x60
...
Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
...
Call Trace:
xen_uninit_lock_cpu+0x28/0x62
xen_hvm_cpu_die+0x21/0x30
takedown_cpu+0x9c/0xe0
? trace_suspend_resume+0x60/0x60
cpuhp_invoke_callback+0x9a/0x530
_cpu_up+0x11a/0x130
cpu_up+0x7e/0xc0
bringup_nonboot_cpus+0x48/0x50
smp_init+0x26/0x79
kernel_init_freeable+0xea/0x229
? rest_init+0xaa/0xaa
kernel_init+0xa/0x106
ret_from_fork+0x35/0x40
The secondary CPUs are not activated with the nosmt mitigations and only
the primary thread on each CPU core is used. In this situation,
xen_hvm_smp_prepare_cpus(), and more importantly xen_init_lock_cpu(), is
not called, so the lock_kicker_irq is not initialized for the secondary
CPUs. Let's fix this by exiting early in xen_uninit_lock_cpu() if the
irq is not set to avoid the warning from above for each secondary CPU.
Signed-off-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20201107011119.631442-1-bmasney@redhat.com
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Kernel 5.4 introduces HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE, devices need to
be set explicitly with this flag.
Signed-off-by: Chris Ye <lzye@google.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The system call exit path is running with interrupts enabled while
checking for TIF/PIF/CIF bits which require special handling. If all
bits have been checked interrupts are disabled and the kernel exits to
user space.
The problem is that after checking all bits and before interrupts are
disabled bits can be set already again, due to interrupt handling.
This means that the kernel can exit to user space with some
TIF/PIF/CIF bits set, which should never happen. E.g. TIF_NEED_RESCHED
might be set, which might lead to additional latencies, since that bit
will only be recognized with next exit to user space.
Fix this by checking the corresponding bits only when interrupts are
disabled.
Fixes: 0b0ed657fe ("s390: remove critical section cleanup from entry.S")
Cc: <stable@vger.kernel.org> # 5.8
Acked-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Synchronize access to shm or shared memory buffer list to prevent
race conditions due to concurrent updates to shared shm list by
multiple threads.
Fixes: 757cc3e9ff ("tee: add AMD-TEE driver")
Reviewed-by: Devaraj Rangasamy <Devaraj.Rangasamy@amd.com>
Signed-off-by: Rijo Thomas <Rijo-john.Thomas@amd.com>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
The driver maintains a list of shared memory buffers along with their
mapped buffer id's in a global linked list. These buffers need to be
unmapped after use by the user-space client.
The global shared memory list is initialized to zero entries in the
function amdtee_open(). This clearing of list entries can be a source
for memory leak on secure side if the global linked list previously
held some mapped buffer entries allocated from another TEE context.
Fix potential memory leak issue by moving global shared memory list
to AMD-TEE driver context data structure.
Fixes: 757cc3e9ff ("tee: add AMD-TEE driver")
Reviewed-by: Devaraj Rangasamy <Devaraj.Rangasamy@amd.com>
Signed-off-by: Rijo Thomas <Rijo-john.Thomas@amd.com>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
In the original code, the "if (*lenp < 0)" check didn't work because
"*lenp" is unsigned. Fortunately, the memory_read_from_buffer() call
will never fail in this context so it doesn't affect runtime.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull driver core documentation fixes from Greg KH:
"Some small Documentation fixes that were fallout from the larger
documentation update we did in 5.10-rc2.
Nothing major here at all, but all of these have been in linux-next
and resolve build warnings when building the documentation files"
* tag 'driver-core-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Documentation: remove mic/index from misc-devices/index.rst
scripts: get_api.pl: Add sub-titles to ABI output
scripts: get_abi.pl: Don't let ABI files to create subtitles
docs: leds: index.rst: add a missing file
docs: ABI: sysfs-class-net: fix a typo
docs: ABI: sysfs-driver-dma-ioatdma: what starts with /sys
Pull tty/serial fixes from Greg KH:
"Here are a small number of small tty and serial fixes for some
reported problems for the tty core, vt code, and some serial drivers.
They include fixes for:
- a buggy and obsolete vt font ioctl removal
- 8250_mtk serial baudrate runtime warnings
- imx serial earlycon build configuration fix
- txx9 serial driver error path cleanup issues
- tty core fix in release_tty that can be triggered by trying to bind
an invalid serial port name to a speakup console device
Almost all of these have been in linux-next without any problems, the
only one that hasn't, just deletes code :)"
* tag 'tty-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
vt: Disable KD_FONT_OP_COPY
tty: fix crash in release_tty if tty->port is not set
serial: txx9: add missing platform_driver_unregister() on error in serial_txx9_init
tty: serial: imx: enable earlycon by default if IMX_SERIAL_CONSOLE is enabled
serial: 8250_mtk: Fix uart_get_baud_rate warning
Pull USB fixes from Greg KH:
"Here are some small USB fixes and new device ids:
- USB gadget fixes for some reported issues
- Fixes for the ever-troublesome apple fastcharge driver, hopefully
we finally have it right.
- More USB core quirks for odd devices
- USB serial driver fixes for some long-standing issues that were
recently found
- some new USB serial driver device ids
All have been in linux-next with no reported issues"
* tag 'usb-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: apple-mfi-fastcharge: fix reference leak in apple_mfi_fc_set_property
usb: mtu3: fix panic in mtu3_gadget_stop()
USB: serial: option: add Telit FN980 composition 0x1055
USB: serial: option: add LE910Cx compositions 0x1203, 0x1230, 0x1231
USB: serial: cyberjack: fix write-URB completion race
USB: Add NO_LPM quirk for Kingston flash drive
USB: serial: option: add Quectel EC200T module support
usb: raw-gadget: fix memory leak in gadget_setup
usb: dwc2: Avoid leaving the error_debugfs label unused
usb: dwc3: ep0: Fix delay status handling
usb: gadget: fsl: fix null pointer checking
usb: gadget: goku_udc: fix potential crashes in probe
usb: dwc3: pci: add support for the Intel Alder Lake-S
current->group_leader->exit_signal may change during copy_process() if
current->real_parent exits.
Move the assignment inside tasklist_lock to avoid the race.
Signed-off-by: Eddy Wu <eddy_wu@trendmicro.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's buggy:
On Fri, Nov 06, 2020 at 10:30:08PM +0800, Minh Yuan wrote:
> We recently discovered a slab-out-of-bounds read in fbcon in the latest
> kernel ( v5.10-rc2 for now ). The root cause of this vulnerability is that
> "fbcon_do_set_font" did not handle "vc->vc_font.data" and
> "vc->vc_font.height" correctly, and the patch
> <https://lkml.org/lkml/2020/9/27/223> for VT_RESIZEX can't handle this
> issue.
>
> Specifically, we use KD_FONT_OP_SET to set a small font.data for tty6, and
> use KD_FONT_OP_SET again to set a large font.height for tty1. After that,
> we use KD_FONT_OP_COPY to assign tty6's vc_font.data to tty1's vc_font.data
> in "fbcon_do_set_font", while tty1 retains the original larger
> height. Obviously, this will cause an out-of-bounds read, because we can
> access a smaller vc_font.data with a larger vc_font.height.
Further there was only one user ever.
- Android's loadfont, busybox and console-tools only ever use OP_GET
and OP_SET
- fbset documentation only mentions the kernel cmdline font: option,
not anything else.
- systemd used OP_COPY before release 232 published in Nov 2016
Now unfortunately the crucial report seems to have gone down with
gmane, and the commit message doesn't say much. But the pull request
hints at OP_COPY being broken
https://github.com/systemd/systemd/pull/3651
So in other words, this never worked, and the only project which
foolishly every tried to use it, realized that rather quickly too.
Instead of trying to fix security issues here on dead code by adding
missing checks, fix the entire thing by removing the functionality.
Note that systemd code using the OP_COPY function ignored the return
value, so it doesn't matter what we're doing here really - just in
case a lone server somewhere happens to be extremely unlucky and
running an affected old version of systemd. The relevant code from
font_copy_to_all_vcs() in systemd was:
/* copy font from active VT, where the font was uploaded to */
cfo.op = KD_FONT_OP_COPY;
cfo.height = vcs.v_active-1; /* tty1 == index 0 */
(void) ioctl(vcfd, KDFONTOP, &cfo);
Note this just disables the ioctl, garbage collecting the now unused
callbacks is left for -next.
v2: Tetsuo found the old mail, which allowed me to find it on another
archive. Add the link too.
Acked-by: Peilin Ye <yepeilin.cs@gmail.com>
Reported-by: Minh Yuan <yuanmingbuaa@gmail.com>
References: https://lists.freedesktop.org/archives/systemd-devel/2016-June/036935.html
References: https://github.com/systemd/systemd/pull/3651
Cc: Greg KH <greg@kroah.com>
Cc: Peilin Ye <yepeilin.cs@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://lore.kernel.org/r/20201108153806.3140315-1-daniel.vetter@ffwll.ch
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull xfs fixes from Darrick Wong:
- Fix an uninitialized struct problem
- Fix an iomap problem zeroing unwritten EOF blocks
- Fix some clumsy error handling when writeback fails on filesystems
with blocksize < pagesize
- Fix a retry loop not resetting loop variables properly
- Fix scrub flagging rtinherit inodes on a non-rt fs, since the kernel
actually does permit that combination
- Fix excessive page cache flushing when unsharing part of a file
* tag 'xfs-5.10-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: only flush the unshared range in xfs_reflink_unshare
xfs: fix scrub flagging rtinherit even if there is no rt device
xfs: fix missing CoW blocks writeback conversion retry
iomap: clean up writeback state logic on writepage error
iomap: support partial page discard on writeback block mapping failure
xfs: flush new eof page on truncate to avoid post-eof corruption
xfs: set xefi_discard when creating a deferred agfl free log intent item
Merge procfs splice read fixes from Christoph Hellwig:
"Greg reported a problem due to the fact that Android tests use procfs
files to test splice, which stopped working with the changes for
set_fs() removal.
This series adds read_iter support for seq_file, and uses those for
various proc files using seq_file to restore splice read support"
[ Side note: Christoph initially had a scripted "move everything over"
patch, which looks fine, but I personally would prefer us to actively
discourage splice() on random files. So this does just the minimal
basic core set of proc file op conversions.
For completeness, and in case people care, that script was
sed -i -e 's/\.proc_read\(\s*=\s*\)seq_read/\.proc_read_iter\1seq_read_iter/g'
but I'll wait and see if somebody has a strong argument for using
splice on random small /proc files before I'd run it on the whole
kernel. - Linus ]
* emailed patches from Christoph Hellwig <hch@lst.de>:
proc "seq files": switch to ->read_iter
proc "single files": switch to ->read_iter
proc/stat: switch to ->read_iter
proc/cpuinfo: switch to ->read_iter
proc: wire up generic_file_splice_read for iter ops
seq_file: add seq_read_iter
Pull x86 fixes from Thomas Gleixner:
"A set of x86 fixes:
- Use SYM_FUNC_START_WEAK in the mem* ASM functions instead of a
combination of .weak and SYM_FUNC_START_LOCAL which makes LLVMs
integrated assembler upset
- Correct the mitigation selection logic which prevented the related
prctl to work correctly
- Make the UV5 hubless system work correctly by fixing up the
malformed table entries and adding the missing ones"
* tag 'x86-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/uv: Recognize UV5 hubless system identifier
x86/platform/uv: Remove spaces from OEM IDs
x86/platform/uv: Fix missing OEM_TABLE_ID
x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP
x86/lib: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S
Pull perf fix from Thomas Gleixner:
"A single fix for the perf core plugging a memory leak in the address
filter parser"
* tag 'perf-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix a memory leak in perf_event_parse_addr_filter()
Pull futex fix from Thomas Gleixner:
"A single fix for the futex code where an intermediate state in the
underlying RT mutex was not handled correctly and triggering a BUG()
instead of treating it as another variant of retry condition"
* tag 'locking-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Handle transient "ownerless" rtmutex state correctly
Pull irq fixes from Thomas Gleixner:
"A set of fixes for interrupt chip drivers:
- Fix the fallout of the IPI as interrupt conversion in Kconfig and
the BCM2836 interrupt chip driver
- Fixes for interrupt affinity setting and the handling of
hierarchical irq domains in the SiFive PLIC driver
- Make the unmapped event handling in the TI SCI driver work
correctly
- A few minor fixes and cleanups in various chip drivers and Kconfig"
* tag 'irq-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
dt-bindings: irqchip: ti, sci-inta: Fix diagram indentation for unmapped events
irqchip/ti-sci-inta: Add support for unmapped event handling
dt-bindings: irqchip: ti, sci-inta: Update for unmapped event handling
irqchip/renesas-intc-irqpin: Merge irlm_bit and needs_irlm
irqchip/sifive-plic: Fix chip_data access within a hierarchy
irqchip/sifive-plic: Fix broken irq_set_affinity() callback
irqchip/stm32-exti: Add all LP timer exti direct events support
irqchip/bcm2836: Fix missing __init annotation
irqchip/mips: Drop selection of IRQ_DOMAIN_HIERARCHY
irqchip/mst: Make mst_intc_of_init static
irqchip/mst: MST_IRQ should depend on ARCH_MEDIATEK or ARCH_MSTARV7
genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHY
Pull entry code fix from Thomas Gleixner:
"A single fix for the generic entry code to correct the wrong
assumption that the lockdep interrupt state needs not to be
established before calling the RCU check"
* tag 'core-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry: Fix the incorrect ordering of lockdep and RCU check
Pull powerpc fixes from Michael Ellerman:
- fix miscompilation with GCC 4.9 by using asm_goto_volatile for put_user()
- fix for an RCU splat at boot caused by a recent lockdep change
- fix for a possible deadlock in our EEH debugfs code
- several fixes for handling of _PAGE_ACCESSED on 32-bit platforms
- build fix when CONFIG_NUMA=n
Thanks to Andreas Schwab, Christophe Leroy, Oliver O'Halloran, Qian Cai,
and Scott Cheloha.
* tag 'powerpc-5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/numa: Fix build when CONFIG_NUMA=n
powerpc/8xx: Manage _PAGE_ACCESSED through APG bits in L1 entry
powerpc/8xx: Always fault when _PAGE_ACCESSED is not set
powerpc/40x: Always fault when _PAGE_ACCESSED is not set
powerpc/603: Always fault when _PAGE_ACCESSED is not set
powerpc: Use asm_goto_volatile for put_user()
powerpc/smp: Call rcu_cpu_starting() earlier
powerpc/eeh_cache: Fix a possible debugfs deadlock
When VCNL4035 is enabled and IIO_BUFFER is disabled, it results in the
following Kbuild warning:
WARNING: unmet direct dependencies detected for IIO_TRIGGERED_BUFFER
Depends on [n]: IIO [=y] && IIO_BUFFER [=n]
Selected by [y]:
- VCNL4035 [=y] && IIO [=y] && I2C [=y]
The reason is that VCNL4035 selects IIO_TRIGGERED_BUFFER without depending
on or selecting IIO_BUFFER while IIO_TRIGGERED_BUFFER depends on
IIO_BUFFER. This can also fail building the kernel.
Honor the kconfig dependency to remove unmet direct dependency warnings
and avoid any potential build failures.
Fixes: 55707294c4 ("iio: light: Add support for vishay vcnl4035")
Signed-off-by: Necip Fazil Yildiran <fazilyildiran@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209883
Link: https://lore.kernel.org/r/20201102223523.572461-1-fazilyildiran@gmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
When the command feature of the ADC is used, it is possible to program
the ADC, and specify at each step what input should be processed, and in
comparison to what reference.
This broke the AUX and battery readings when the touchscreen was
enabled, most likely because the CMD feature would change the VREF all
the time.
Now, when AUX or battery are read, we temporarily disable the CMD
feature, which means that we won't get touchscreen readings in that time
frame. But it now gives correct values for AUX / battery, and the
touchscreen isn't disabled for long enough to be an actual issue.
Fixes: b96952f498 ("IIO: Ingenic JZ47xx: Add touchscreen mode.")
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Acked-by: Artur Rojek <contact@artur-rojek.eu>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20201103201238.161083-1-paul@crapouillou.net
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
The dirty log perf test will time verious dirty logging operations
(enabling dirty logging, dirtying memory, getting the dirty log,
clearing the dirty log, and disabling dirty logging) in order to
quantify dirty logging performance. This test can be used to inform
future performance improvements to KVM's dirty logging infrastructure.
This series was tested by running the following invocations on an Intel
Skylake machine:
dirty_log_perf_test -b 20m -i 100 -v 64
dirty_log_perf_test -b 20g -i 5 -v 4
dirty_log_perf_test -b 4g -i 5 -v 32
demand_paging_test -b 20m -v 64
demand_paging_test -b 20g -v 4
demand_paging_test -b 4g -v 32
All behaved as expected.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201027233733.1484855-6-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Wrfract will be used by the dirty logging perf test introduced later in
this series to dirty memory sparsely.
This series was tested by running the following invocations on an Intel
Skylake machine:
dirty_log_perf_test -b 20m -i 100 -v 64
dirty_log_perf_test -b 20g -i 5 -v 4
dirty_log_perf_test -b 4g -i 5 -v 32
demand_paging_test -b 20m -v 64
demand_paging_test -b 20g -v 4
demand_paging_test -b 4g -v 32
All behaved as expected.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201027233733.1484855-5-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a helper function to get the current time and return the time since
a given start time. Use that function to simplify the timekeeping in the
demand paging test.
This series was tested by running the following invocations on an Intel
Skylake machine:
dirty_log_perf_test -b 20m -i 100 -v 64
dirty_log_perf_test -b 20g -i 5 -v 4
dirty_log_perf_test -b 4g -i 5 -v 32
demand_paging_test -b 20m -v 64
demand_paging_test -b 20g -v 4
demand_paging_test -b 4g -v 32
All behaved as expected.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201027233733.1484855-4-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rounding the address the guest writes to a host page boundary
will only have an effect if the host page size is larger than the guest
page size, but in that case the guest write would still go to the same
host page. There's no reason to round the address down, so remove the
rounding to simplify the demand paging test.
This series was tested by running the following invocations on an Intel
Skylake machine:
dirty_log_perf_test -b 20m -i 100 -v 64
dirty_log_perf_test -b 20g -i 5 -v 4
dirty_log_perf_test -b 4g -i 5 -v 32
demand_paging_test -b 20m -v 64
demand_paging_test -b 20g -v 4
demand_paging_test -b 4g -v 32
All behaved as expected.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201027233733.1484855-3-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Much of the code in demand_paging_test can be reused by other, similar
multi-vCPU-memory-touching-perfromance-tests. Factor that common code
out for reuse.
No functional change expected.
This series was tested by running the following invocations on an Intel
Skylake machine:
dirty_log_perf_test -b 20m -i 100 -v 64
dirty_log_perf_test -b 20g -i 5 -v 4
dirty_log_perf_test -b 4g -i 5 -v 32
demand_paging_test -b 20m -v 64
demand_paging_test -b 20g -v 4
demand_paging_test -b 4g -v 32
All behaved as expected.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201027233733.1484855-2-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove the clear_dirty_log test, instead merge it into the existing
dirty_log_test. It should be cleaner to use this single binary to do
both tests, also it's a preparation for the upcoming dirty ring test.
The default behavior will run all the modes in sequence.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201001012233.6013-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We used not to clear the dirty bitmap before because KVM_GET_DIRTY_LOG
would overwrite it the next time it copies the dirty log onto it.
In the upcoming dirty ring tests we'll start to fetch dirty pages from
a ring buffer, so no one is going to clear the dirty bitmap for us.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201001012228.5916-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Check for KVM_GET_REG_LIST regressions. The blessed list was
created by running on v4.15 with the --core-reg-fixup option.
The following script was also used in order to annotate system
registers with their names when possible. When new system
registers are added the names can just be added manually using
the same grep.
while read reg; do
if [[ ! $reg =~ ARM64_SYS_REG ]]; then
printf "\t$reg\n"
continue
fi
encoding=$(echo "$reg" | sed "s/ARM64_SYS_REG(//;s/),//")
if ! name=$(grep "$encoding" ../../../../arch/arm64/include/asm/sysreg.h); then
printf "\t$reg\n"
continue
fi
name=$(echo "$name" | sed "s/.*SYS_//;s/[\t ]*sys_reg($encoding)$//")
printf "\t$reg\t/* $name */\n"
done < <(aarch64/get-reg-list --core-reg-fixup --list)
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201029201703.102716-3-drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ensure the out value 'uc' in get_ucall() is properly reporting
UCALL_NONE if the call fails. The return value will be correctly
reported, however, the out parameter 'uc' will not be. Clear the struct
to ensure the correct value is being reported in the out parameter.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Message-Id: <20201012194716.3950330-3-aaronlewis@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix the layout of 'struct desc64' to match the layout described in the
SDM Vol 3, Chapter 3 "Protected-Mode Memory Management", section 3.4.5
"Segment Descriptors", Figure 3-8 "Segment Descriptor". The test added
later in this series relies on this and crashes if this layout is not
correct.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Message-Id: <20201012194716.3950330-2-aaronlewis@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Windows2016 guest tries to enable LBR by setting the corresponding bits
in MSR_IA32_DEBUGCTLMSR. KVM does not emulate MSR_IA32_DEBUGCTLMSR and
spams the host kernel logs with error messages like:
kvm [...]: vcpu1, guest rIP: 0xfffff800a8b687d3 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x1, nop"
This patch fixes this by enabling error logging only with
'report_ignored_msrs=1'.
Signed-off-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Message-Id: <20201105153932.24316-1-pankaj.gupta.linux@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 5b9bb0ebbc ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME)
emulation in helper fn", 2020-10-21) subtly changed the behavior of guest
writes to MSR_KVM_SYSTEM_TIME(_NEW). Restore the previous behavior; update
the masterclock any time the guest uses a different msr than before.
Fixes: 5b9bb0ebbc ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME) emulation in helper fn", 2020-10-21)
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20201027231044.655110-6-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make the paravirtual cpuid enforcement mechanism idempotent to ioctl()
ordering by updating pv_cpuid.features whenever userspace requests the
capability. Extract this update out of kvm_update_cpuid_runtime() into a
new helper function and move its other call site into
kvm_vcpu_after_set_cpuid() where it more likely belongs.
Fixes: 66570e966d ("kvm: x86: only provide PV features if enabled in guest's CPUID")
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20201027231044.655110-5-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
commit 66570e966d ("kvm: x86: only provide PV features if enabled in
guest's CPUID") only protects against disallowed guest writes to KVM
paravirtual msrs, leaving msr reads unchecked. Fix this by enforcing
KVM_CPUID_FEATURES for msr reads as well.
Fixes: 66570e966d ("kvm: x86: only provide PV features if enabled in guest's CPUID")
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20201027231044.655110-4-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Recent introduction of the userspace msr filtering added code that uses
negative error codes for cases that result in either #GP delivery to
the guest, or handled by the userspace msr filtering.
This breaks an assumption that a negative error code returned from the
msr emulation code is a semi-fatal error which should be returned
to userspace via KVM_RUN ioctl and usually kill the guest.
Fix this by reusing the already existing KVM_MSR_RET_INVALID error code,
and by adding a new KVM_MSR_RET_FILTERED error code for the
userspace filtered msrs.
Fixes: 291f35fb2c1d1 ("KVM: x86: report negative values from wrmsr emulation to userspace")
Reported-by: Qian Cai <cai@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201101115523.115780-1-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull block fixes from Jens Axboe:
- NVMe pull request from Christoph:
- revert a nvme_queue size optimization (Keith Bush)
- fabrics timeout races fixes (Chao Leng and Sagi Grimberg)"
- null_blk zone locking fix (Damien)
* tag 'block-5.10-2020-11-07' of git://git.kernel.dk/linux-block:
null_blk: Fix scheduling in atomic with zoned mode
nvme-tcp: avoid repeated request completion
nvme-rdma: avoid repeated request completion
nvme-tcp: avoid race between time out and tear down
nvme-rdma: avoid race between time out and tear down
nvme: introduce nvme_sync_io_queues
Revert "nvme-pci: remove last_sq_tail"
Pull io_uring fixes from Jens Axboe:
"A set of fixes for io_uring:
- SQPOLL cancelation fixes
- Two fixes for the io_identity COW
- Cancelation overflow fix (Pavel)
- Drain request cancelation fix (Pavel)
- Link timeout race fix (Pavel)"
* tag 'io_uring-5.10-2020-11-07' of git://git.kernel.dk/linux-block:
io_uring: fix link lookup racing with link timeout
io_uring: use correct pointer for io_uring_show_cred()
io_uring: don't forget to task-cancel drained reqs
io_uring: fix overflowed cancel w/ linked ->files
io_uring: drop req/tctx io_identity separately
io_uring: ensure consistent view of original task ->mm from SQPOLL
io_uring: properly handle SQPOLL request cancelations
io-wq: cancel request if it's asking for files and we don't have them
Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner().
This is one possible chain of events leading to this:
Task Prio Operation
T1 120 lock(F)
T2 120 lock(F) -> blocks (top waiter)
T3 50 (RT) lock(F) -> boosts T1 and blocks (new top waiter)
XX timeout/ -> wakes T2
signal
T1 50 unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set)
T2 120 cleanup -> try_to_take_mutex() fails because T3 is the top waiter
and the lower priority T2 cannot steal the lock.
-> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON()
The comment states that this is invalid and rt_mutex_real_owner() must
return a non NULL owner when the trylock failed, but in case of a queued
and woken up waiter rt_mutex_real_owner() == NULL is a valid transient
state. The higher priority waiter has simply not yet managed to take over
the rtmutex.
The BUG_ON() is therefore wrong and this is just another retry condition in
fixup_pi_state_owner().
Drop the locks, so that T3 can make progress, and then try the fixup again.
Gratian provided a great analysis, traces and a reproducer. The analysis is
to the point, but it confused the hell out of that tglx dude who had to
page in all the futex horrors again. Condensed version is above.
[ tglx: Wrote comment and changelog ]
Fixes: c1e2f0eaf0 ("futex: Avoid violating the 10th rule of futex")
Reported-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87a6w6x7bb.fsf@ni.com
Link: https://lore.kernel.org/r/87sg9pkvf7.fsf@nanos.tec.linutronix.de
With CONFIG_BRIDGE=m the compilation fails:
ld: drivers/net/ethernet/marvell/prestera/prestera_switchdev.o: in function `prestera_bridge_port_event':
prestera_switchdev.c:(.text+0x2ebd): undefined reference to `br_vlan_enabled'
in case the driver is statically enabled.
Fix it by adding 'BRIDGE || BRIDGE=n' dependency.
Fixes: e1189d9a5f ("net: marvell: prestera: Add Switchdev driver implementation")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/r/20201106161128.24069-1-vadym.kochan@plvision.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Saeed Mahameed says:
====================
mlx5 fixes 2020-11-03
v1->v2:
- Fix fixes line tag in patch #1
- Toss ktls refcount leak fix, Maxim will look further into the root
cause.
- Toss eswitch chain 0 prio patch, until we determine if it is needed
for -rc and net.
* tag 'mlx5-fixes-2020-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Fix incorrect access of RCU-protected xdp_prog
net/mlx5e: Fix VXLAN synchronization after function reload
net/mlx5: E-switch, Avoid extack error log for disabled vport
net/mlx5: Fix deletion of duplicate rules
net/mlx5e: Use spin_lock_bh for async_icosq_lock
net/mlx5e: Protect encap route dev from concurrent release
net/mlx5e: Fix modify header actions memory leak
====================
Link: https://lore.kernel.org/r/20201105202129.23644-1-saeedm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
RTL8125B has same or similar short packet hw padding bug as RTL8168evl.
The main workaround has been extended accordingly, however we have to
disable also hw checksumming for short packets on affected new chip
versions. Instead of checking for an affected chip version let's
simply disable hw checksumming for short packets in general.
v2:
- remove the version checks and disable short packet hw csum in general
- reflect this in commit title and message
Fixes: 0439297be9 ("r8169: add support for RTL8125B")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/7fbb35f0-e244-ef65-aa55-3872d7d38698@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The caller of rtl8169_tso_csum_v2() frees the skb if false is returned.
eth_skb_pad() internally frees the skb on error what would result in a
double free. Therefore use __skb_put_padto() directly and instruct it
to not free the skb on error.
Fixes: b423e9ae49 ("r8169: fix offloaded tx checksum for small packets.")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/f7e68191-acff-9ded-4263-c016428a8762@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull i2c fixes from Wolfram Sang:
"Driver bugfixes for I2C.
Most of them are for the new mlxbf driver which got more exposure
after rc1. The sh_mobile patch should already have reached you during
the merge window, but I accidently dropped it. However, since it fixes
a problem with rebooting, it is still fine for rc3"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: designware: slave should do WRITE_REQUESTED before WRITE_RECEIVED
i2c: designware: call i2c_dw_read_clear_intrbits_slave() once
i2c: mlxbf: I2C_MLXBF should depend on MELLANOX_PLATFORM
i2c: mlxbf: Update author and maintainer email info
i2c: mlxbf: Update reference clock frequency
i2c: mlxbf: Remove unecessary wrapper functions
i2c: mlxbf: Fix resrticted cast warning of sparse
i2c: mlxbf: Add CONFIG_ACPI to guard ACPI function call
i2c: sh_mobile: implement atomic transfers
i2c: mediatek: move dma reset before i2c reset
Pull RISC-V fixes from Palmer Dabbelt:
- SPDX comment style fix
- ignore memory that is unusable
- avoid setting a kernel text offset for the !MMU kernels, where
skipping the first page of memory is both unnecessary and costly
- avoid passing the flag bits in satp to pfn_to_virt()
- fix __put_kernel_nofault, where we had the arguments to
__put_user_nocheck reversed
- workaround for a bug in the FU540 to avoid triggering PMP issues
during early boot
- change to how we pull symbols out of the vDSO. The old mechanism was
removed from binutils-2.35 (and has been backported to Debian's 2.34)
* tag 'riscv-for-linus-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Fix the VDSO symbol generaton for binutils-2.35+
RISC-V: Use non-PGD mappings for early DTB access
riscv: uaccess: fix __put_kernel_nofault()
riscv: fix pfn_to_virt err in do_page_fault().
riscv: Set text_offset correctly for M-Mode
RISC-V: Remove any memblock representing unusable memory area
risc-v: kernel: ftrace: Fixes improper SPDX comment style
Johan writes:
USB-serial fixes for 5.10-rc3
Here's a fix for a long-standing issue with the cyberjack driver and
some new device ids.
All have been in linux-next with no reported issues.
* tag 'usb-serial-5.10-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add Telit FN980 composition 0x1055
USB: serial: option: add LE910Cx compositions 0x1203, 0x1230, 0x1231
USB: serial: cyberjack: fix write-URB completion race
USB: serial: option: add Quectel EC200T module support
As shown through runtime testing, the "filename" allocation is not
always freed in perf_event_parse_addr_filter().
There are three possible ways that this could happen:
- It could be allocated twice on subsequent iterations through the loop,
- or leaked on the success path,
- or on the failure path.
Clean up the code flow to make it obvious that 'filename' is always
freed in the reallocation path and in the two return paths as well.
We rely on the fact that kfree(NULL) is NOP and filename is initialized
with NULL.
This fixes the leak. No other side effects expected.
[ Dan Carpenter: cleaned up the code flow & added a changelog. ]
[ Ingo Molnar: updated the changelog some more. ]
Fixes: 375637bc52 ("perf/core: Introduce address range filtering")
Signed-off-by: "kiyin(尹亮)" <kiyin@tencent.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "Srivatsa S. Bhat" <srivatsa@csail.mit.edu>
Cc: Anthony Liguori <aliguori@amazon.com>
--
kernel/events/core.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
Testing shows that trailing spaces caused problems with the OEM_ID and
the OEM_TABLE_ID. One being that the OEM_ID would not string compare
correctly. Another the OEM_ID and OEM_TABLE_ID would be concatenated
in the printout. Remove any trailing spaces.
Fixes: 1e61f5a95f ("Add and decode Arch Type in UVsystab")
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201105222741.157029-3-mike.travis@hpe.com
Add missing __acquires() and __releases() annotations. Also, in an
"this should never happen" WARN_ON check, if it *does* actually
happen, we need to release j_state_lock since this function is always
supposed to release that lock. Otherwise, things will quickly grind
to a halt after the WARN_ON trips.
Fixes: 96f1e09745 ("jbd2: avoid long hold times of j_state_lock...")
Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add missing __acquire() and __releases() annotations, and make
fc_ineligible_reasons[] static, as it is not used outside of
fs/ext4/fast_commit.c.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This patch removes jbd2_fc_init() API and its related functions to
simplify enabling fast commits. With this change, the number of fast
commit blocks to use is solely determined by the JBD2 layer. So, we
move the default value for minimum number of fast commit blocks from
ext4/fast_commit.h to include/linux/jbd2.h. However, whether or not to
use fast commits is determined by the file system. The file system
just sets the fast commit feature using
jbd2_journal_set_features(). JBD2 layer then determines how many
blocks to use for fast commits (based on the value found in the JBD2
superblock).
Note that the JBD2 feature flag of fast commits is just an indication
that there are fast commit blocks present on disk. It doesn't tell
JBD2 layer about the intent of the file system of whether to it wants
to use fast commit or not. That's why, we blindly clear the fast
commit flag in journal_reset() after the recovery is done.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201106035911.1942128-7-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The on-disk superblock field sb->s_maxlen represents the total size of
the journal including the fast commit area and is no more the max
number of blocks available for a transaction. The maximum number of
blocks available to a transaction is reduced by the number of fast
commit blocks. So, this patch renames j_maxlen to j_total_len to
better represent its intent. Also, it adds a function to calculate max
number of bufs available for a transaction.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201106035911.1942128-6-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Firstly, pass handle to all ext4_fc_track_* functions and use
transaction id found in handle->h_transaction->h_tid for tracking fast
commit updates. Secondly, don't pass inode to
ext4_fc_track_link/create/unlink functions. inode can be found inside
these functions as d_inode(dentry). However, rename path is an
exeception. That's because in that case, we need inode that's not same
as d_inode(dentry). To handle that, add a couple of low-level wrapper
functions that take inode and dentry as arguments.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201106035911.1942128-5-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If inode gets evicted due to memory pressure, we have to remove it
from the fast commit list. However, that inode may have uncommitted
changes that fast commits will lose. So, just fall back to full
commits in this case. Also, rename the fast commit ineligiblity reason
from "EXT4_FC_REASON_MEM" to "EXT4_FC_REASON_MEM_NOMEM" for better
expression.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201106035911.1942128-3-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Smatch complains that "i" can be uninitialized if we don't enter the
loop. I don't know if it's possible but we may as well silence this
warning.
[ Initialize i to sb->s_blocksize instead of 0. The only way the for
loop could be skipped entirely is the in-memory data structures, in
particular the bh->b_data for the on-disk superblock has gotten
corrupted enough that calculated value of group is >= to
ext4_get_groups_count(sb). In that case, we want to exit
immediately without allocating a block. -- TYT ]
Fixes: 8016e29f43 ("ext4: fast commit recovery path")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20201030114620.GB3251003@mwanda
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
The macro MOPT_Q is used to indicates the mount option is related to
quota stuff and is defined to be MOPT_NOSUPPORT when CONFIG_QUOTA is
disabled. Normally the quota options are handled explicitly, so it
didn't matter that the MOPT_STRING flag was missing, even though the
usrjquota and grpjquota mount options take a string argument. It's
important that's present in the !CONFIG_QUOTA case, since without
MOPT_STRING, the mount option matcher will match usrjquota= followed
by an integer, and will otherwise skip the table entry, and so "mount
option not supported" error message is never reported.
[ Fixed up the commit description to better explain why the fix
works. --TYT ]
Fixes: 26092bf524 ("ext4: use a table-driven handler for mount options")
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Link: https://lore.kernel.org/r/1603986396-28917-1-git-send-email-kaixuxia@tencent.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Alexei Starovoitov says:
====================
pull-request: bpf 2020-11-06
1) Pre-allocated per-cpu hashmap needs to zero-fill reused element, from David.
2) Tighten bpf_lsm function check, from KP.
3) Fix bpftool attaching to flow dissector, from Lorenz.
4) Use -fno-gcse for the whole kernel/bpf/core.c instead of function attribute, from Ard.
* git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Update verification logic for LSM programs
bpf: Zero-fill re-used per-cpu map element
bpf: BPF_PRELOAD depends on BPF_SYSCALL
tools/bpftool: Fix attaching flow dissector
libbpf: Fix possible use after free in xsk_socket__delete
libbpf: Fix null dereference in xsk_socket__delete
libbpf, hashmap: Fix undefined behavior in hash_bits
bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE
tools, bpftool: Remove two unused variables.
tools, bpftool: Avoid array index warnings.
xsk: Fix possible memory leak at socket close
bpf: Add struct bpf_redir_neigh forward declaration to BPF helper defs
samples/bpf: Set rlimit for memlock to infinity in all samples
bpf: Fix -Wshadow warnings
selftest/bpf: Fix profiler test using CO-RE relocation for enums
====================
Link: https://lore.kernel.org/r/20201106221759.24143-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull ceph fix from Ilya Dryomov:
"A fix for a potential stall on umount caused by the MDS dropping our
REQUEST_CLOSE message. The code that handled this case was
inadvertently disabled in 5.9, this patch removes it entirely and
fixes the problem in a way that is consistent with ceph-fuse"
* tag 'ceph-for-5.10-rc3' of git://github.com/ceph/ceph-client:
ceph: check session state after bumping session->s_seq
Pull Kselftest fixes from Shuah Khan:
"Fixes to the ftrace test and several fixes from Tommi Rantala for
various other tests"
* tag 'linux-kselftest-fixes-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: binderfs: use SKIP instead of XFAIL
selftests: clone3: use SKIP instead of XFAIL
selftests: core: use SKIP instead of XFAIL in close_range_test.c
selftests: proc: fix warning: _GNU_SOURCE redefined
selftests: pidfd: drop needless linux/kcmp.h inclusion in pidfd_setns_test.c
selftests: pidfd: add CONFIG_CHECKPOINT_RESTORE=y to config
selftests: pidfd: skip test on kcmp() ENOSYS
selftests: pidfd: use ksft_test_result_skip() when skipping test
selftests/harness: prettify SKIP message whitespace again
selftests: pidfd: fix compilation errors due to wait.h
selftests: filter kselftest headers from command in lib.mk
selftests/ftrace: check for do_sys_openat2 in user-memory test
selftests/ftrace: Use $FUNCTION_FORK to reference kernel fork function
Pull SCSI fixes from James Bottomley:
"Three driver fixes. Two (alua and hpsa) are in hard to trigger
attach/detach situations but the mp3sas one involves a polled to
interrupt switch over that could trigger in any high IOPS situation"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: mpt3sas: Fix timeouts observed while reenabling IRQ
scsi: scsi_dh_alua: Avoid crash during alua_bus_detach()
scsi: hpsa: Fix memory leak in hpsa_init_one()
The current logic checks if the name of the BTF type passed in
attach_btf_id starts with "bpf_lsm_", this is not sufficient as it also
allows attachment to non-LSM hooks like the very function that performs
this check, i.e. bpf_lsm_verify_prog.
In order to ensure that this verification logic allows attachment to
only LSM hooks, the LSM_HOOK definitions in lsm_hook_defs.h are used to
generate a BTF_ID set. Upon verification, the attach_btf_id of the
program being attached is checked for presence in this set.
Fixes: 9e4e01dfd3 ("bpf: lsm: Implement attach, detach and execution")
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201105230651.2621917-1-kpsingh@chromium.org
Pull mtd fixes from Miquel Raynal.
* 'mtd/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: stm32_fmc2: fix broken ECC
mtd: spi-nor: Fix address width on flash chips > 16MB
mtd: spi-nor: Don't copy self-pointing struct around
mtd: rawnand: ifc: Move the ECC engine initialization to the right place
mtd: rawnand: mxc: Move the ECC engine initialization to the right place
Pull spi fix from Mark Brown:
"This is an additional fix on top of 5e31ba0c05 ('spi: bcm2835: fix
gpio cs level inversion') - when sending my prior pull request I had
misremembred the status of that patch, apologies for the noise here"
* tag 'spi-fix-v5.10-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: bcm2835: remove use of uninitialized gpio flags variable
Pull sound fixes from Takashi Iwai:
"Quite a bunch of small fixes that have been gathered since the last
pull, including changes like below:
- HD-audio runtime PM fixes and refactoring
- HD-audio and USB-audio quirks
- SOF warning fix
- Various ASoC device-specific fixes for Intel, Qualcomm, etc"
* tag 'sound-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (26 commits)
ALSA: usb-audio: Add implicit feedback quirk for Qu-16
ASoC: mchp-spdiftx: Do not set Validity bit(s)
ALSA: usb-audio: Add implicit feedback quirk for MODX
ALSA: usb-audio: add usb vendor id as DSD-capable for Khadas devices
ALSA: hda/realtek - Enable headphone for ASUS TM420
ALSA: hda: prevent undefined shift in snd_hdac_ext_bus_get_link()
ASoC: qcom: lpass-cpu: Fix clock disable failure
ASoC: qcom: lpass-sc7180: Fix MI2S bitwidth field bit positions
ASoC: codecs: wcd9335: Set digital gain range correctly
ASoC: codecs: wcd934x: Set digital gain range correctly
ALSA: hda: Reinstate runtime_allow() for all hda controllers
ALSA: hda: Separate runtime and system suspend
ALSA: hda: Refactor codec PM to use direct-complete optimization
ALSA: hda/realtek - Fixed HP headset Mic can't be detected
ALSA: usb-audio: Add implicit feedback quirk for Zoom UAC-2
ALSA: make snd_kcontrol_new name a normal string
ALSA: fix kernel-doc markups
ASoC: SOF: loader: handle all SOF_IPC_EXT types
ASoC: cs42l51: manage mclk shutdown delay
ASoC: qcom: sdm845: set driver name correctly
...
You can't write to this file because the permissions are 0444. But
it sort of looked like you could do a write and it would result in
a read. Then it looked like proc_sys_call_handler() just ignored
it. Which is confusing. It's more clear if the "write" just
returns zero.
Also, the "lenp" pointer is never NULL so that check can be removed.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull drm fixes from Dave Airlie:
"It's Friday here so that means another installment of drm fixes to
distract you from the counting process.
Changes all over the place, the amdgpu changes contain support for a
new GPU that is close to current one already in the tree (Green
Sardine) so it shouldn't have much side effects.
Otherwise imx has a few cleanup patches and fixes, amdgpu and i915
have around the usual smattering of fixes, fonts got constified, and
vc4/panfrost has some minor fixes. All in all a fairly regular rc3.
We have an outstanding nouveau regression, but the author is looking
into the fix, so should be here next week.
I now return you to counting.
fonts:
- constify font structures.
MAINTAINERS:
- Fix path for amdgpu power management
amdgpu:
- Add support for more navi1x SKUs
- Fix for suspend on CI dGPUs
- VCN DPG fix for Picasso
- Sienna Cichlid fixes
- Polaris DPM fix
- Add support for Green Sardine
amdkfd:
- Fix an allocation failure check
i915:
- Fix set domain's cache coherency
- Fixes around breadcrumbs
- Fix encoder lookup during PSR atomic
- Hold onto an explicit ref to i915_vma_work.pinned
- gvt: HWSP reset handling fix
- gvt: flush workaround
- gvt: vGPU context pin/unpin
- gvt: mmio cmd access fix for bxt/apl
imx:
- drop unused functions and callbacks
- reuse imx_drm_encoder_parse_of
- spinlock rework
- memory leak fix
- minor cleanups
vc4:
- resource cleanup fix
panfrost:
- madvise/shrinker fix"
* tag 'drm-fixes-2020-11-06-1' of git://anongit.freedesktop.org/drm/drm: (55 commits)
drm/amdgpu/display: remove DRM_AMD_DC_GREEN_SARDINE
drm/amd/display: Add green_sardine support to DM
drm/amd/display: Add green_sardine support to DC
drm/amdgpu: enable vcn support for green_sardine (v2)
drm/amdgpu: enable green_sardine_asd.bin loading (v2)
drm/amdgpu/sdma: add sdma engine support for green_sardine (v2)
drm/amdgpu: add gfx support for green_sardine (v2)
drm/amdgpu: add soc15 common ip block support for green_sardine (v3)
drm/amdgpu: add green_sardine support for gpu_info and ip block setting (v2)
drm/amdgpu: add Green_Sardine APU flag
drm/amdgpu: resolved ASD loading issue on sienna
amdkfd: Check kvmalloc return before memcpy
drm/amdgpu: update golden setting for sienna_cichlid
amd/amdgpu: Disable VCN DPG mode for Picasso
drm/amdgpu/swsmu: remove duplicate call to smu_set_default_dpm_table
drm/i915: Hold onto an explicit ref to i915_vma_work.pinned
drm/i915/gt: Flush xcs before tgl breadcrumbs
drm/i915/gt: Expose more parameters for emitting writes into the ring
drm/i915: Fix encoder lookup during PSR atomic check
drm/i915/gt: Use the local HWSP offset during submission
...
Pull tpm fixes from Jarkko Sakkinen:
"Two critical tpm driver bug fixes"
* tag 'tpmdd-next-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: efi: Don't create binary_bios_measurements file for an empty log
tpm_tis: Disable interrupts on ThinkPad T490s
Pull iommu fixes from Joerg Roedel:
- Fix a NULL-ptr dereference in the Intel VT-d driver
- Two fixes for Intel SVM support
- Increase IRQ remapping table size in the AMD IOMMU driver. The old
number of 128 turned out to be too low for some recent devices.
- Fix a mask check in generic IOMMU code
* tag 'iommu-fixes-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Fix a check in iommu_check_bind_data()
iommu/vt-d: Fix a bug for PDP check in prq_event_thread
iommu/vt-d: Fix sid not set issue in intel_svm_bind_gpasid()
iommu/vt-d: Fix kernel NULL pointer dereference in find_domain()
iommu/amd: Increase interrupt remapping table limit to 512 entries
Pull VFIO fixes from Alex Williamson:
- Remove code by using existing helper (Zenghui Yu)
- fsl-mc copy-user return and underflow fixes (Dan Carpenter)
- fsl-mc static function declaration (Diana Craciun)
- Fix ioeventfd sleeping under spinlock (Alex Williamson)
- Fix pm reference count leak in vfio-platform (Zhang Qilong)
- Allow opening IGD device w/o OpRegion support (Fred Gao)
* tag 'vfio-v5.10-rc3' of git://github.com/awilliam/linux-vfio:
vfio/pci: Bypass IGD init in case of -ENODEV
vfio: platform: fix reference leak in vfio_platform_open
vfio/pci: Implement ioeventfd thread handler for contended memory lock
vfio/fsl-mc: Make vfio_fsl_mc_irqs_allocate static
vfio/fsl-mc: prevent underflow in vfio_fsl_mc_mmap()
vfio/fsl-mc: return -EFAULT if copy_to_user() fails
vfio/type1: Use the new helper to find vfio_group
Pull arm64 fixes from Will Deacon:
"Here's the weekly batch of fixes for arm64. Not an awful lot here, but
there are still a few unresolved issues relating to CPU hotplug, RCU
and IRQ tracing that I hope to queue fixes for next week.
Summary:
- Fix early use of kprobes
- Fix kernel placement in kexec_file_load()
- Bump maximum number of NUMA nodes"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kexec_file: try more regions if loading segments fails
arm64: kprobes: Use BRK instead of single-step when executing instructions out-of-line
arm64: NUMA: Kconfig: Increase NODES_SHIFT to 4
Pull ARC fixes from Vineet Gupta:
- Unbork HSDKv1 platform (won't boot) due to memory map issue
- Prevent stack unwinder from infinite looping
* tag 'arc-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: [plat-hsdk] Remap CCMs super early in asm boot trampoline
ARC: stack unwinding: avoid indefinite looping
Pull s390 fixes from Heiko Carstens:
- fix reference counting for ap devices
- fix paes selftest
- fix pmd_deref()/pud_deref() so they can also handle large pages
- remove unused vdso file and defines
- update defconfigs
- call rcu_cpu_starting() early in smp init code to avoid lockdep
warnings
- fix hotplug of PCI function missing bus
* tag 's390-5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pci: fix hot-plug of PCI function missing bus
s390/smp: move rcu_cpu_starting() earlier
s390/pkey: fix paes selftest failure with paes and pkey static build
s390: update defconfigs
s390/vdso: remove unused constants
s390/vdso: remove empty unused file
s390/mm: make pmd/pud_deref() large page aware
s390/ap: fix ap devices reference counting
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.10-rc3, including fixes from wireless, can, and
netfilter subtrees.
Current merge window - bugs in new features:
- can: isotp: isotp_rcv_cf(): enable RX timeout handling in
listen-only mode
Previous releases - regressions:
- mac80211:
- don't require VHT elements for HE on 2.4 GHz
- fix regression where EAPOL frames were sent in plaintext
- netfilter:
- ipset: Update byte and packet counters regardless of whether
they match
- ip_tunnel: fix over-mtu packet send by allowing fragmenting even if
inner packet has IP_DF (don't fragment) set in its header (when
TUNNEL_DONT_FRAGMENT flag is not set on the tunnel dev)
- net: fec: fix MDIO probing for some FEC hardware blocks
- ip6_tunnel: set inner ipproto before ip6_tnl_encap to un-break gso
support
- sctp: Fix COMM_LOST/CANT_STR_ASSOC err reporting on big-endian
platforms, sparse-related fix used the wrong integer size
Previous releases - always broken:
- netfilter: use actual socket sk rather than skb sk when routing
harder
- r8169: work around short packet hw bug on RTL8125 by padding frames
- net: ethernet: ti: cpsw: disable PTPv1 hw timestamping
advertisement, the hardware does not support it
- chelsio/chtls: fix always leaking ctrl_skb and another leak caused
by a race condition
- fix drivers incorrectly writing into skbs on TX:
- cadence: force nonlinear buffers to be cloned
- gianfar: Account for Tx PTP timestamp in the skb headroom
- gianfar: Replace skb_realloc_headroom with skb_cow_head for PTP
- can: flexcan:
- remove FLEXCAN_QUIRK_DISABLE_MECR quirk for LS1021A
- add ECC initialization for VF610 and LX2160A
- flexcan_remove(): disable wakeup completely
- can: fix packet echo functionality:
- peak_canfd: fix echo management when loopback is on
- make sure skbs are not freed in IRQ context in case they need to
be dropped
- always clone the skbs to make sure they have a reference on the
socket, and prevent it from disappearing
- fix real payload length return value for RTR frames
- can: j1939: return failure on bind if netdev is down, rather than
waiting indefinitely
Misc:
- IPv6: reply ICMP error if the first fragment don't include all
headers to improve compliance with RFC 8200"
* tag 'net-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits)
ionic: check port ptr before use
r8169: work around short packet hw bug on RTL8125
net: openvswitch: silence suspicious RCU usage warning
chelsio/chtls: fix always leaking ctrl_skb
chelsio/chtls: fix memory leaks caused by a race
can: flexcan: flexcan_remove(): disable wakeup completely
can: flexcan: add ECC initialization for VF610
can: flexcan: add ECC initialization for LX2160A
can: flexcan: remove FLEXCAN_QUIRK_DISABLE_MECR quirk for LS1021A
can: mcp251xfd: remove unneeded break
can: mcp251xfd: mcp251xfd_regmap_nocrc_read(): fix semicolon.cocci warnings
can: mcp251xfd: mcp251xfd_regmap_crc_read(): increase severity of CRC read error messages
can: peak_canfd: pucan_handle_can_rx(): fix echo management when loopback is on
can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping
can: peak_usb: add range checking in decode operations
can: xilinx_can: handle failure cases of pm_runtime_get_sync
can: ti_hecc: ti_hecc_probe(): add missed clk_disable_unprepare() in error path
can: isotp: padlen(): make const array static, makes object smaller
can: isotp: isotp_rcv_cf(): enable RX timeout handling in listen-only mode
can: isotp: Explain PDU in CAN_ISOTP help text
...
Implement ->read_iter for all proc "single files" so that more bionic
tests cases can pass when they call splice() on other fun files like
/proc/version
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement ->read_iter so that the Android bionic test suite can use
this random proc file for its splice test case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wire up generic_file_splice_read for the iter based proxy ops, so
that splice reads from them work.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
iov_iter based variant for reading a seq_file. seq_read is
reimplemented on top of the iter variant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I_CREATING isn't actually set until the inode has been assigned an inode
number and inserted into the inode hash table. So the WARN_ON() in
fscrypt_setup_iv_ino_lblk_32_key() is wrong, and it can trigger when
creating an encrypted file on ext4. Remove it.
This was sometimes causing xfstest generic/602 to fail on ext4. I
didn't notice it before because due to a separate oversight, new inodes
that haven't been assigned an inode number yet don't necessarily have
i_ino == 0 as I had thought, so by chance I never saw the test fail.
Fixes: a992b20cd4 ("fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context()")
Reported-by: Theodore Y. Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20201031004556.87862-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Commit aa1c09cb65 ("null_blk: Fix locking in zoned mode") changed
zone locking to using the potentially sleeping wait_on_bit_io()
function. This is acceptable when memory backing is enabled as the
device queue is in that case marked as blocking, but this triggers a
scheduling while in atomic context with memory backing disabled.
Fix this by relying solely on the device zone spinlock for zone
information protection without temporarily releasing this lock around
null_process_cmd() execution in null_zone_write(). This is OK to do
since when memory backing is disabled, command processing does not
block and the memory backing lock nullb->lock is unused. This solution
avoids the overhead of having to mark a zoned null_blk device queue as
blocking when memory backing is unused.
This patch also adds comments to the zone locking code to explain the
unusual locking scheme.
Fixes: aa1c09cb65 ("null_blk: Fix locking in zoned mode")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 2ae0b31e0f ("tty: don't crash in tty_init_dev when missing
tty_port") didn't fully prevent the crash as the cleanup path in
tty_init_dev() calls release_tty() which dereferences tty->port
without checking it for non-null.
Add tty->port checks to release_tty to avoid the kernel crash.
Fixes: 2ae0b31e0f ("tty: don't crash in tty_init_dev when missing tty_port")
Signed-off-by: Matthias Reichl <hias@horus.com>
Link: https://lore.kernel.org/r/20201105123432.4448-1-hias@horus.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mediatek 8250 port supports speed higher than uartclk / 16. If the baud
rates in both the new and the old termios setting are higher than
uartclk / 16, the WARN_ON in uart_get_baud_rate() will be triggered.
Passing NULL as the old termios so uart_get_baud_rate() will use
uartclk / 16 - 1 as the new baud rate which will be replaced by the
original baud rate later by tty_termios_encode_baud_rate() in
mtk8250_set_termios().
Fixes: 551e553f0d ("serial: 8250_mtk: Fix high-speed baud rates clamping")
Signed-off-by: Claire Chang <tientzu@chromium.org>
Link: https://lore.kernel.org/r/20201102120749.374458-1-tientzu@chromium.org
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mimic the pre-existing ACPI and Device Tree event log behavior by not
creating the binary_bios_measurements file when the EFI TPM event log is
empty.
This fixes the following NULL pointer dereference that can occur when
reading /sys/kernel/security/tpm0/binary_bios_measurements after the
kernel received an empty event log from the firmware:
BUG: kernel NULL pointer dereference, address: 000000000000002c
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 2 PID: 3932 Comm: fwupdtpmevlog Not tainted 5.9.0-00003-g629990edad62 #17
Hardware name: LENOVO 20LCS03L00/20LCS03L00, BIOS N27ET38W (1.24 ) 11/28/2019
RIP: 0010:tpm2_bios_measurements_start+0x3a/0x550
Code: 54 53 48 83 ec 68 48 8b 57 70 48 8b 1e 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 8b 82 c0 06 00 00 48 8b 8a c8 06 00 00 <44> 8b 60 1c 48 89 4d a0 4c 89 e2 49 83 c4 20 48 83 fb 00 75 2a 49
RSP: 0018:ffffa9c901203db0 EFLAGS: 00010246
RAX: 0000000000000010 RBX: 0000000000000000 RCX: 0000000000000010
RDX: ffff8ba1eb99c000 RSI: ffff8ba1e4ce8280 RDI: ffff8ba1e4ce8258
RBP: ffffa9c901203e40 R08: ffffa9c901203dd8 R09: ffff8ba1ec443300
R10: ffffa9c901203e50 R11: 0000000000000000 R12: ffff8ba1e4ce8280
R13: ffffa9c901203ef0 R14: ffffa9c901203ef0 R15: ffff8ba1e4ce8258
FS: 00007f6595460880(0000) GS:ffff8ba1ef880000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000002c CR3: 00000007d8d18003 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
? __kmalloc_node+0x113/0x320
? kvmalloc_node+0x31/0x80
seq_read+0x94/0x420
vfs_read+0xa7/0x190
ksys_read+0xa7/0xe0
__x64_sys_read+0x1a/0x20
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
In this situation, the bios_event_log pointer in the tpm_bios_log struct
was not NULL but was equal to the ZERO_SIZE_PTR (0x10) value. This was
due to the following kmemdup() in tpm_read_log_efi():
int tpm_read_log_efi(struct tpm_chip *chip)
{
...
/* malloc EventLog space */
log->bios_event_log = kmemdup(log_tbl->log, log_size, GFP_KERNEL);
if (!log->bios_event_log) {
ret = -ENOMEM;
goto out;
}
...
}
When log_size is zero, due to an empty event log from firmware,
ZERO_SIZE_PTR is returned from kmemdup(). Upon a read of the
binary_bios_measurements file, the tpm2_bios_measurements_start()
function does not perform a ZERO_OR_NULL_PTR() check on the
bios_event_log pointer before dereferencing it.
Rather than add a ZERO_OR_NULL_PTR() check in functions that make use of
the bios_event_log pointer, simply avoid creating the
binary_bios_measurements_file as is done in other event log retrieval
backends.
Explicitly ignore all of the events in the final event log when the main
event log is empty. The list of events in the final event log cannot be
accurately parsed without referring to the first event in the main event
log (the event log header) so the final event log is useless in such a
situation.
Fixes: 58cc1e4faf ("tpm: parse TPM event logs based on EFI table")
Link: https://lore.kernel.org/linux-integrity/E1FDCCCB-CA51-4AEE-AC83-9CDE995EAE52@canonical.com/
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Reported-by: Mimi Zohar <zohar@linux.ibm.com>
Cc: Thiébaud Weksteen <tweek@google.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
There is a misconfiguration in the bios of the gpio pin used for the
interrupt in the T490s. When interrupts are enabled in the tpm_tis
driver code this results in an interrupt storm. This was initially
reported when we attempted to enable the interrupt code in the tpm_tis
driver, which previously wasn't setting a flag to enable it. Due to
the reports of the interrupt storm that code was reverted and we went back
to polling instead of using interrupts. Now that we know the T490s problem
is a firmware issue, add code to check if the system is a T490s and
disable interrupts if that is the case. This will allow us to enable
interrupts for everyone else. If the user has a fixed bios they can
force the enabling of interrupts with tpm_tis.interrupts=1 on the
kernel command line.
Cc: Peter Huewe <peterhuewe@gmx.de>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
The AA64ZFR0_EL1 accessors are just the general accessors with
its visibility function open-coded. It also skips the if-else
chain in read_id_reg, but there's no reason not to go there.
Indeed consolidating ID register accessors and removing lines
of code make it worthwhile.
Remove the AA64ZFR0_EL1 accessors, replacing them with the
general accessors for sanitized ID registers.
No functional change intended.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201105091022.15373-5-drjones@redhat.com
The instruction encodings of ID registers are preallocated. Until an
encoding is assigned a purpose the register is RAZ. KVM's general ID
register accessor functions already support both paths, RAZ or not.
If for each ID register we can determine if it's RAZ or not, then all
ID registers can build on the general functions. The register visibility
function allows us to check whether a register should be completely
hidden or not, extending it to also report when the register should
be RAZ or not allows us to use it for ID registers as well.
Check for RAZ visibility in the ID register accessor functions,
allowing the RAZ case to be handled in a generic way for all system
registers.
The new REG_RAZ flag will be used in a later patch. This patch has
no intended functional change.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201105091022.15373-4-drjones@redhat.com
REG_HIDDEN_GUEST and REG_HIDDEN_USER are always used together.
Consolidate them into a single REG_HIDDEN flag. We can always
add another flag later if some register needs to expose itself
differently to the guest than it does to userspace.
No functional change intended.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201105091022.15373-3-drjones@redhat.com
ID registers are RAZ until they've been allocated a purpose, but
that doesn't mean they should be removed from the KVM_GET_REG_LIST
list. So far we only have one register, SYS_ID_AA64ZFR0_EL1, that
is hidden from userspace when its function, SVE, is not present.
Expose SYS_ID_AA64ZFR0_EL1 to userspace as RAZ when SVE is not
implemented. Removing the userspace visibility checks is enough
to reexpose it, as it will already return zero to userspace when
SVE is not present. The register already behaves as RAZ for the
guest when SVE is not present.
Fixes: 73433762fc ("KVM: arm64/sve: System register context switch and access support")
Reported-by: 张东旭 <xu910121@sina.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org#v5.2+
Link: https://lore.kernel.org/r/20201105091022.15373-2-drjones@redhat.com
The PUD and PMD are folded into PGD when the following options are
enabled. In that case, PUD_SHIFT is equal to PMD_SHIFT and we fail
to build with the indicated errors:
CONFIG_ARM64_VA_BITS_42=y
CONFIG_ARM64_PAGE_SHIFT=16
CONFIG_PGTABLE_LEVELS=3
arch/arm64/kvm/mmu.c: In function ‘user_mem_abort’:
arch/arm64/kvm/mmu.c:798:2: error: duplicate case value
case PMD_SHIFT:
^~~~
arch/arm64/kvm/mmu.c:791:2: note: previously used here
case PUD_SHIFT:
^~~~
This fixes the issue by skipping the check on PUD huge page when PUD
and PMD are folded into PGD.
Fixes: 2f40c46021 ("KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes")
Reported-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201103003009.32955-1-gshan@redhat.com
Sometimes we would get the following flow when doing an i2cset:
0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x514 : INTR_STAT=0x4
I2C_SLAVE_WRITE_RECEIVED
0x1 STATUS SLAVE_ACTIVITY=0x0 : RAW_INTR_STAT=0x714 : INTR_STAT=0x204
I2C_SLAVE_WRITE_REQUESTED
I2C_SLAVE_WRITE_RECEIVED
Documentation/i2c/slave-interface.rst says that I2C_SLAVE_WRITE_REQUESTED,
which is mandatory, should be sent while the data did not arrive yet. It
means in a write-request I2C_SLAVE_WRITE_REQUESTED should be reported
before any I2C_SLAVE_WRITE_RECEIVED.
By the way, I2C_SLAVE_STOP didn't be reported in the above case because
DW_IC_INTR_STAT was not 0x200.
dev->status can be used to record the current state, especially Designware
I2C controller has no interrupts to identify a write-request. This patch
makes not only I2C_SLAVE_WRITE_REQUESTED been reported first when
IC_INTR_RX_FULL is rising and dev->status isn't STATUS_WRITE_IN_PROGRESS
but also I2C_SLAVE_STOP been reported when a STOP condition is received.
Signed-off-by: Michael Wu <michael.wu@vatics.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
If some bits were cleared by i2c_dw_read_clear_intrbits_slave() in
i2c_dw_isr_slave() and not handled immediately, those cleared bits would
not be shown again by later i2c_dw_read_clear_intrbits_slave(). They
therefore were forgotten to be handled.
i2c_dw_read_clear_intrbits_slave() should be called once in an ISR and take
its returned state for all later handlings.
Signed-off-by: Michael Wu <michael.wu@vatics.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
The Mellanox BlueField I2C controller is only present on Mellanox
BlueField SoCs. Hence add a dependency on MELLANOX_PLATFORM, to prevent
asking the user about this driver when configuring a kernel without
Mellanox platform support.
Fixes: b5b5b32081 ("i2c: mlxbf: I2C SMBus driver for Mellanox BlueField SoC")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Correct the email addresses of the author and the maintainer
of the Mellanox BlueField I2C driver.
Fixes: b5b5b32081 ("i2c: mlxbf: I2C SMBus driver for Mellanox BlueField SoC")
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Khalil Blaiech <kblaiech@nvidia.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
The reference clock frequency remains the same across Bluefield
products. Thus, update the frequency and rename the macro.
Fixes: b5b5b32081 ("i2c: mlxbf: I2C SMBus driver for Mellanox BlueField SoC")
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Khalil Blaiech <kblaiech@nvidia.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Few wrapper functions are useless and can be inlined. So
delete mlxbf_i2c_read() and mlxbf_i2c_write() and replace
them with readl() and writel(), respectively. Also delete
mlxbf_i2c_read_data() and mlxbf_i2c_write() and replace
them with ioread32be() and iowrite32be(), respectively.
Fixes: b5b5b32081 ("i2c: mlxbf: I2C SMBus driver for Mellanox BlueField SoC")
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Khalil Blaiech <kblaiech@nvidia.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
The build fails with "implicit declaration of function
'acpi_device_uid'" error. Thus, protect ACPI function calls
from being called when CONFIG_ACPI is disabled.
Fixes: b5b5b32081 ("i2c: mlxbf: I2C SMBus driver for Mellanox BlueField SoC")
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Khalil Blaiech <kblaiech@nvidia.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
This enables the PEX8311 internal PCI wire interrupt and the PEX8311
local interrupt input so the local interrupts are forwarded to the PCI.
Fixes: 5855620466 ("gpio: Add GPIO support for the ACCES PCIe-IDIO-24 family")
Cc: stable@vger.kernel.org
Signed-off-by: Arnaud de Turckheim <quarium@gmail.com>
Reviewed-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Intel Tiger Lake-H has the same Thunderbolt/USB4 controller as Tiger
Lake-LP. Add the Tiger Lake-H PCI IDs to the driver list of supported
devices.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
pm_runtime_get_sync() will increment pm usage at first and it
will resume the device later. If runtime of the device has
error or device is in inaccessible state(or other error state),
resume operation will fail. If we do not call put operation to
decrease the reference, the result is that this device cannot
enter the idle state and always stay busy or other non-idle
state.
Fixes: 249fa8217b ("USB: Add driver to control USB fast charge for iOS devices")
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Link: https://lore.kernel.org/r/20201102022650.67115-1-zhangqilong3@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When DMA_RALINK is enabled and DMADEVICES is disabled, it results in the
following Kbuild warnings:
WARNING: unmet direct dependencies detected for DMA_ENGINE
Depends on [n]: DMADEVICES [=n]
Selected by [y]:
- DMA_RALINK [=y] && STAGING [=y] && RALINK [=y] && !SOC_RT288X [=n]
WARNING: unmet direct dependencies detected for DMA_VIRTUAL_CHANNELS
Depends on [n]: DMADEVICES [=n]
Selected by [y]:
- DMA_RALINK [=y] && STAGING [=y] && RALINK [=y] && !SOC_RT288X [=n]
The reason is that DMA_RALINK selects DMA_ENGINE and DMA_VIRTUAL_CHANNELS
without depending on or selecting DMADEVICES while DMA_ENGINE and
DMA_VIRTUAL_CHANNELS are subordinate to DMADEVICES. This can also fail
building the kernel as demonstrated in a bug report.
Honor the kconfig dependency to remove unmet direct dependency warnings
and avoid any potential build failures.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=210055
Signed-off-by: Necip Fazil Yildiran <fazilyildiran@gmail.com>
Link: https://lore.kernel.org/r/20201104181522.43567-1-fazilyildiran@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After upgrading kernel to version 5.9.x the driver was not
working anymore showing the following kernel trace:
...
mt7621-pci 1e140000.pcie: resource collision:
[mem 0x60000000-0x6fffffff] conflicts with pcie@1e140000 [mem 0x60000000-0x6fffffff]
------------[ cut here ]------------
WARNING: CPU: 2 PID: 73 at kernel/resource.c:1400
devm_request_resource+0xfc/0x10c
Modules linked in:
CPU: 2 PID: 73 Comm: kworker/2:1 Not tainted 5.9.2 #0
Workqueue: events deferred_probe_work_func
Stack : 00000000 81590000 807d0a1c 808a0000 8fd49080
807d0000 00000009 808ac820
00000001 808338d0 7fff0001 800839dc 00000049
00000001 8fe51b00 367204ab
00000000 00000000 807d0a1c 807c0000 00000001
80082358 8fe50000 00559000
00000000 8fe519f1 ffffffff 00000005 00000000
00000001 00000000 807d0000
00000009 808ac820 00000001 808338d0 00000001
803bf1b0 00000008 81390008
Call Trace:
[<8000d018>] show_stack+0x30/0x100
[<8032e66c>] dump_stack+0xa4/0xd4
[<8002db1c>] __warn+0xc0/0x134
[<8002dbec>] warn_slowpath_fmt+0x5c/0xac
[<80033b34>] devm_request_resource+0xfc/0x10c
[<80365ff8>] devm_request_pci_bus_resources+0x58/0xdc
[<8048e13c>] mt7621_pci_probe+0x8dc/0xe48
[<803d2140>] platform_drv_probe+0x40/0x94
[<803cfd94>] really_probe+0x108/0x4ec
[<803cd958>] bus_for_each_drv+0x70/0xb0
[<803d0388>] __device_attach+0xec/0x164
[<803cec8c>] bus_probe_device+0xa4/0xc0
[<803cf1c4>] deferred_probe_work_func+0x80/0xc4
[<80048444>] process_one_work+0x260/0x510
[<80048a4c>] worker_thread+0x358/0x5cc
[<8004f7d0>] kthread+0x134/0x13c
[<80007478>] ret_from_kernel_thread+0x14/0x1c
---[ end trace a9dd2e37537510d3 ]---
mt7621-pci 1e140000.pcie: Error requesting resources
mt7621-pci: probe of 1e140000.pcie failed with error -16
...
With commit 669cbc7081 ("PCI: Move DT resource setup into
devm_pci_alloc_host_bridge()"), the DT 'ranges' is parsed and populated
into resources when the host bridge is allocated. The resources are
requested as well, but that happens a 2nd time for this driver in
mt7621_pcie_request_resources(). Hence we should avoid this second
request.
Also, the bus ranges was also populated by default, so we can remove
it from mt7621_pcie_request_resources() to avoid the following trace
if we don't avoid it:
pci_bus 0000:00: busn_res: can not insert [bus 00-ff]
under domain [bus 00-ff] (conflicts with (null) [bus 00-ff])
Function 'mt7621_pcie_request_resources' has been renamed into
'mt7621_pcie_add_resources' which now is a more accurate name
for this function.
Cc: stable@vger.kernel.org #5.9.x-
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Link: https://lore.kernel.org/r/20201102202515.19073-1-sergio.paracuellos@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
make clang-analyzer on x86_64 defconfig caught my attention with:
kernel/printk/printk_ringbuffer.c:885:3: warning:
Value stored to 'desc' is never read [clang-analyzer-deadcode.DeadStores]
desc = to_desc(desc_ring, head_id);
^
Commit b6cf8b3f33 ("printk: add lockless ringbuffer") introduced
desc_reserve() with this unneeded dead-store assignment.
As discussed with John Ogness privately, this is probably just some minor
left-over from previous iterations of the ringbuffer implementation. So,
simply remove this unneeded dead assignment to make clang-analyzer happy.
As compilers will detect this unneeded assignment and optimize this anyway,
the resulting object code is identical before and after this change.
No functional change. No change to object code.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20201106034005.18822-1-lukas.bulwahn@gmail.com
We were relying on GNU ld's ability to re-link executable files in order
to extract our VDSO symbols. This behavior was deemed a bug as of
binutils-2.35 (specifically the binutils-gdb commit a87e1817a4 ("Have
the linker fail if any attempt to link in an executable is made."), but
as that has been backported to at least Debian's binutils-2.34 in may
manifest in other places.
The previous version of this was a bit of a mess: we were linking a
static executable version of the VDSO, containing only a subset of the
input symbols, which we then linked into the kernel. This worked, but
certainly wasn't a supported path through the toolchain. Instead this
new version parses the textual output of nm to produce a symbol table.
Both rely on near-zero addresses being linkable, but as we rely on weak
undefined symbols being linkable elsewhere I don't view this as a major
issue.
Fixes: e2c0cdfba7 ("RISC-V: User-facing API")
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Currently, we use PGD mappings for early DTB mapping in early_pgd
but this breaks Linux kernel on SiFive Unleashed because on SiFive
Unleashed PMP checks don't work correctly for PGD mappings.
To fix early DTB mappings on SiFive Unleashed, we use non-PGD
mappings (i.e. PMD) for early DTB access.
Fixes: 8f3a2b4a96 ("RISC-V: Move DT mapping outof fixmap")
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Tested-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
The copy_from_kernel_nofault() is broken on riscv because the 'dst' and
'src' are mistakenly reversed in __put_kernel_nofault() macro.
copy_to_kernel_nofault:
...
0xffffffe0003159b8 <+30>: sd a4,0(a1) # a1 aka 'src'
Fixes: d464118cdc ("riscv: implement __get_kernel_nofault and __put_user_nofault")
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Zero-fill element values for all other cpus than current, just as
when not using prealloc. This is the only way the bpf program can
ensure known initial values for all cpus ('onallcpus' cannot be
set when coming from the bpf program).
The scenario is: bpf program inserts some elements in a per-cpu
map, then deletes some (or userspace does). When later adding
new elements using bpf_map_update_elem(), the bpf program can
only set the value of the new elements for the current cpu.
When prealloc is enabled, previously deleted elements are re-used.
Without the fix, values for other cpus remain whatever they were
when the re-used entry was previously freed.
A selftest is added to validate correct operation in above
scenario as well as in case of LRU per-cpu map element re-use.
Fixes: 6c90598174 ("bpf: pre-allocate hash map elements")
Signed-off-by: David Verbeiren <david.verbeiren@tessares.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201104112332.15191-1-david.verbeiren@tessares.net
M-Mode Linux is loaded at the start of RAM, not 2MB later. Perhaps this
should be calculated based on PAGE_OFFSET somehow? Even better would be to
deprecate text_offset and instead introduce something absolute.
Signed-off-by: Sean Anderson <seanga2@gmail.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
drm/imx: fixes and cleanups
Remove unused functions and empty callbacks, let the dw_hdmi-imx driver
reuse imx_drm_encoder_parse_of() instead of reimplementing it, replace
the custom register spinlock with the regmap default spinlock and remove
redundant tracking of enabled state in imx-tve, drop the explicit
drm_mode_config_cleanup() call in imx-drm-core, reduce the scope of edid
length variables that are not otherwise used in imx-ldb and
parallel-display, fix a memory leak in the parallel-display bind error
path, and drop an extraneous type qualifier from of_get_tve_mode().
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/7e4af582027bbec269364b95f6978d061b48271a.camel@pengutronix.de
We can't just go over linked requests because it may race with linked
timeouts. Take ctx->completion_lock in that case.
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Need to initialize nfsd4_copy's refcount to 1 to avoid use-after-free
warning when nfs4_put_copy is called from nfsd4_cb_offload_release.
Fixes: ce0887ac96 ("NFSD add nfs4 inter ssc to nfsd4_copy")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The source file nfsd_file is not constructed the same as other
nfsd_file's via nfsd_file_alloc. nfsd_file_put should not be
called to free the object; nfsd_file_put is not the inverse of
kzalloc, instead kfree is called by nfsd4_do_async_copy when done.
Fixes: ce0887ac96 ("NFSD add nfs4 inter ssc to nfsd4_copy")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
A late paragraph of RFC 1813 Section 3.3.11 states:
| ... if the server does not support the target type or the
| target type is illegal, the error, NFS3ERR_BADTYPE, should
| be returned. Note that NF3REG, NF3DIR, and NF3LNK are
| illegal types for MKNOD.
The Linux NFS server incorrectly returns NFSERR_INVAL in these
cases.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The TP_fast_assign() section is careful enough not to dereference
xdr->rqst if it's NULL. The TP_STRUCT__entry section is not.
Fixes: 5582863f45 ("SUNRPC: Add XDR overflow trace event")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Commit cc028a10a4 ("NFSD: Hoist status code encoding into XDR
encoder functions") missed a spot.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
It's possible that the first region picked for the new kernel will make
it impossible to fit the other segments in the required 32GB window,
especially if we have a very large initrd.
Instead of giving up, we can keep testing other regions for the kernel
until we find one that works.
Suggested-by: Ryan O'Leary <ryanoleary@google.com>
Signed-off-by: Benjamin Gwin <bgwin@google.com>
Link: https://lore.kernel.org/r/20201103201106.2397844-1-bgwin@google.com
Signed-off-by: Will Deacon <will@kernel.org>
On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON,
STIBP is set to on and
spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED
At the same time, IBPB can be set to conditional.
However, this leads to the case where it's impossible to turn on IBPB
for a process because in the PR_SPEC_DISABLE case in ib_prctl_set() the
spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED
condition leads to a return before the task flag is set. Similarly,
ib_prctl_get() will return PR_SPEC_DISABLE even though IBPB is set to
conditional.
More generally, the following cases are possible:
1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb
2. STIBP = on && IBPB = conditional for AMD CPUs with
X86_FEATURE_AMD_STIBP_ALWAYS_ON
The first case functions correctly today, but only because
spectre_v2_user_ibpb isn't updated to reflect the IBPB mode.
At a high level, this change does one thing. If either STIBP or IBPB
is set to conditional, allow the prctl to change the task flag.
Also, reflect that capability when querying the state. This isn't
perfect since it doesn't take into account if only STIBP or IBPB is
unconditionally on. But it allows the conditional feature to work as
expected, without affecting the unconditional one.
[ bp: Massage commit message and comment; space out statements for
better readability. ]
Fixes: 21998a3515 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.")
Signed-off-by: Anand K Mistry <amistry@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20201105163246.v2.1.Ifd7243cd3e2c2206a893ad0a5b9a4f19549e22c6@changeid
rq->xdp_prog is RCU-protected and should be accessed only with
rcu_access_pointer for the NULL check in mlx5e_poll_rx_cq.
rq->xdp_prog may change on the fly only from one non-NULL value to
another non-NULL value, so the checks in mlx5e_xdp_handle and
mlx5e_poll_rx_cq will have the same result during one NAPI cycle,
meaning that no additional synchronization is needed.
Fixes: fe45386a20 ("net/mlx5e: Use RCU to protect rq->xdp_prog")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
During driver reload, perform firmware tear-down which results in
firmware losing the configured VXLAN ports. These ports are still
available in the driver's database. Fix this by cleaning up driver's
VXLAN database in the nic unload flow, before firmware tear-down. With
that, minimize mlx5_vxlan_destroy() to remove only what was added in
mlx5_vxlan_create() and warn on leftover UDP ports.
Fixes: 18a2b7f969 ("net/mlx5: convert to new udp_tunnel infrastructure")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When E-switch vport is disabled, querying its hardware address is
unsupported.
Avoid setting extack error log message in such case.
Fixes: f099fde16d ("net/mlx5: E-switch, Support querying port function mac address")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When a rule is duplicated, the refcount of the rule is increased so only
the second deletion of the rule should cause destruction of the FTE.
Currently, the FTE will be destroyed in the first deletion of rule since
the modify_mask will be 0.
Fix it and call to destroy FTE only if all the rules (FTE's children)
have been removed.
Fixes: 718ce4d601 ("net/mlx5: Consolidate update FTE for all removal changes")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
async_icosq_lock may be taken from softirq and non-softirq contexts. It
requires protection with spin_lock_bh, otherwise a softirq may be
triggered in the middle of the critical section, and it may deadlock if
it tries to take the same lock. This patch fixes such a scenario by
using spin_lock_bh to disable softirqs on that CPU while inside the
critical section.
Fixes: 8d94b590f1 ("net/mlx5e: Turn XSK ICOSQ into a general asynchronous one")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In functions mlx5e_route_lookup_ipv{4|6}() route_dev can be arbitrary net
device and not necessary mlx5 eswitch port representor. As such, in order
to ensure that route_dev is not destroyed concurrent the code needs either
explicitly take reference to the device before releasing reference to
rtable instance or ensure that caller holds rtnl lock. First approach is
chosen as a fix since rtnl lock dependency was intentionally removed from
mlx5 TC layer.
To prevent unprotected usage of route_dev in encap code take a reference to
the device before releasing rt. Don't save direct pointer to the device in
mlx5_encap_entry structure and use ifindex instead. Modify users of
route_dev pointer to properly obtain the net device instance from its
ifindex.
Fixes: 61086f3910 ("net/mlx5e: Protect encap hash table with mutex")
Fixes: 6707f74be8 ("net/mlx5e: Update hw flows when encap source mac changed")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Modify header actions are allocated during parse tc actions and only
freed during the flow creation, however, on error flow the allocated
memory is wrongly unfreed.
Fix this by calling dealloc_mod_hdr_actions in __mlx5e_add_fdb_flow
and mlx5e_add_nic_flow error flow.
Fixes: d7e75a325c ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
Fixes: 2f4fe4cab0 ("net/mlx5e: Add offloading of NIC TC pedit (header re-write) actions")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Pull Kunit fixes from Shuah Khan:
"Several kunit_tool and documentation fixes"
* tag 'linux-kselftest-kunit-fixes-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: tools: fix kunit_tool tests for parsing test plans
Documentation: kunit: Update Kconfig parts for KUNIT's module support
kunit: test: fix remaining kernel-doc warnings
kunit: Don't fail test suites if one of them is empty
kunit: Fix kunit.py --raw_output option
Pull tracing fixes from Steven Rostedt:
- Fix off-by-one error in retrieving the context buffer for
trace_printk()
- Fix off-by-one error in stack nesting limit
- Fix recursion to not make all NMI code false positive as recursing
- Stop losing events in function tracing when transitioning between irq
context
- Stop losing events in ring buffer when transitioning between irq
context
- Fix return code of error pointer in parse_synth_field() to prevent
NULL pointer dereference.
- Fix false positive of NMI recursion in kprobe event handling
* tag 'trace-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
kprobes: Tell lockdep about kprobe nesting
tracing: Make -ENOMEM the default error for parse_synth_field()
ring-buffer: Fix recursion protection transitions between interrupt context
tracing: Fix the checking of stackidx in __ftrace_trace_stack
ftrace: Handle tracing when switching between context
ftrace: Fix recursion check for NMI test
tracing: Fix out of bounds write in get_trace_buf
Pull hyperv fixes from Wei Liu:
- clarify a comment (Michael Kelley)
- change a pr_warn() to pr_info() (Olaf Hering)
* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Clarify comment on x2apic mode
hv_balloon: disable warning when floor reached
Pull rdma fixes from Jason Gunthorpe:
"A few more merge window regressions that didn't make rc1:
- New validation in the DMA layer triggers wrong use of the DMA layer
in rxe, siw and rdmavt
- Accidental change of a hypervisor facing ABI when widening the port
speed u8 to u16 in vmw_pvrdma
- Memory leak on error unwind in SRP target"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/srpt: Fix typo in srpt_unregister_mad_agent docstring
RDMA/vmw_pvrdma: Fix the active_speed and phys_state value
IB/srpt: Fix memory leak in srpt_add_one
RDMA: Fix software RDMA drivers for dma mapping error
Pull spi fixes from Mark Brown:
"A small collection of driver specific fixes that have come in since
the merge window, nothing too major here but all good to have"
* tag 'spi-fix-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: fsl-dspi: fix wrong pointer in suspend/resume
spi: bcm2835: fix gpio cs level inversion
spi: imx: fix runtime pm support for !CONFIG_PM
Pull regulator fixes from Mark Brown:
"An addition to MAINTAINERS plus a fix for a nasty bootstrapping
problem which caused problems when we need to read the voltage of a
regulator that is not yet available during initialization, we were not
correctly distinguishing between this case and the case where a
regulator is put into a bypass mode"
* tag 'regulator-fix-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: defer probe when trying to get voltage from unresolved supply
MAINTAINERS: Add entry for Qualcomm IPQ4019 VQMMC regulator
Pull power management fixes from Rafael Wysocki:
"These fix the device links support in runtime PM, correct mistakes in
the cpuidle documentation, fix the handling of policy limits changes
in the schedutil cpufreq governor, fix assorted issues in the OPP
(operating performance points) framework and make one janitorial
change.
Specifics:
- Unify the handling of managed and stateless device links in the
runtime PM framework and prevent runtime PM references to devices
from being leaked after device link removal (Rafael Wysocki).
- Fix two mistakes in the cpuidle documentation (Julia Lawall).
- Prevent the schedutil cpufreq governor from missing policy limits
updates in some cases (Viresh Kumar).
- Prevent static OPPs from being dropped by mistake (Viresh Kumar).
- Prevent helper function in the OPP framework from returning
prematurely (Viresh Kumar).
- Prevent opp_table_lock from being held too long during removal of
OPP tables with no more active references (Viresh Kumar).
- Drop redundant semicolon from the Intel RAPL power capping driver
(Tom Rix)"
* tag 'pm-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: runtime: Resume the device earlier in __device_release_driver()
PM: runtime: Drop pm_runtime_clean_up_links()
PM: runtime: Drop runtime PM references to supplier on link removal
powercap/intel_rapl: remove unneeded semicolon
Documentation: PM: cpuidle: correct path name
Documentation: PM: cpuidle: correct typo
cpufreq: schedutil: Don't skip freq update if need_freq_update is set
opp: Reduce the size of critical section in _opp_table_kref_release()
opp: Fix early exit from dev_pm_opp_register_set_opp_helper()
opp: Don't always remove static OPPs in _of_add_opp_table_v1()
Pull highmem initialization fix from Mike Rapoport:
"Fix highmem initialization on arm and xtensa
Recent refactoring of memblock iterators has broken initialization of
highmem on arm and xtensa because it changed the way beginning and end
of memory regions are rounded to PFNs. This fix restores the original
behaviour"
* tag 'fixes-2020-11-05' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
ARM, xtensa: highmem: avoid clobbering non-page aligned memory reservations
Pull gfs2 fixes from Andreas Gruenbacher:
"Various gfs2 fixes"
* tag 'gfs2-v5.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Wake up when sd_glock_disposal becomes zero
gfs2: Don't call cancel_delayed_work_sync from within delete work function
gfs2: check for live vs. read-only file system in gfs2_fitrim
gfs2: don't initialize statfs_change inodes in spectator mode
gfs2: Split up gfs2_meta_sync into inode and rgrp versions
gfs2: init_journal's undo directive should also undo the statfs inodes
gfs2: Add missing truncate_inode_pages_final for sd_aspace
gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-free
Pull PCI fixes from Bjorn Helgaas:
- Fix ACS regression that broke device pass-through (Rajat Jain)
- Revert DesignWare ATU memory resource to use last entry to fix
Tegra194 regression (Rob Herring)
- Remove duplicate mvebu resource requests to fix regression on Turris
Omnia (Rob Herring)
* tag 'pci-v5.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: mvebu: Fix duplicate resource requests
PCI: dwc: Restore ATU memory resource setup to use last entry
PCI: Always enable ACS even if no ACS Capability
RISC-V limits the physical memory size by -PAGE_OFFSET. Any memory beyond
that size from DRAM start is unusable. Just remove any memblock pointing
to those memory region without worrying about computing the maximum size.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Previous commit changed how we index the registered credentials, but
neglected to update one spot that is used when the personalities are
iterated through ->show_fdinfo(). Ensure we use the right struct type
for the iteration.
Reported-by: syzbot+a6d494688cdb797bdfce@syzkaller.appspotmail.com
Fixes: 1e6fa5216a ("io_uring: COW io_identity on mismatch")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If there is a long-standing request of one task locking up execution of
deferred requests, and the defer list contains requests of another task
(all files-less), then a potential execution of __io_uring_task_cancel()
by that another task will sleep until that first long-standing request
completion, and that may take long.
E.g.
tsk1: req1/read(empty_pipe) -> tsk2: req(DRAIN)
Then __io_uring_task_cancel(tsk2) waits for req1 completion.
It seems we even can manufacture a complicated case with many tasks
sharing many rings that can lock them forever.
Cancel deferred requests for __io_uring_task_cancel() as well.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
htmldocs fails with:
drivers/infiniband/ulp/srpt/ib_srpt.c:630: warning: Function parameter or member 'port_cnt' not described in 'srpt_unregister_mad_agent'
Fixes: 372a178628 ("IB/srpt: Fix memory leak in srpt_add_one")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In gpiochip_setup_dev() the call to gpiolib_cdev_register() indirectly
calls device_add(). This is still required for the sysfs even when
CONFIG_GPIO_CDEV is not selected in the build.
Replace the stubbed functions in gpiolib-cdev.h with macros in gpiolib.c
that perform the required device_add() and device_del() when
CONFIG_GPIO_CDEV is not selected.
Fixes: d143493c01 (gpiolib: make cdev a build option)
Reported-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: Kent Gibson <warthog618@gmail.com>
Tested-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Pull NVMe fixes from Christoph:
"nvme fixes for 5.10:
- revert a nvme_queue size optimization (Keith Bush)
- fabrics timeout races fixes (Chao Leng and Sagi Grimberg)"
* tag 'nvme-5.10-2020-11-05' of git://git.infradead.org/nvme:
nvme-tcp: avoid repeated request completion
nvme-rdma: avoid repeated request completion
nvme-tcp: avoid race between time out and tear down
nvme-rdma: avoid race between time out and tear down
nvme: introduce nvme_sync_io_queues
Revert "nvme-pci: remove last_sq_tail"
Fix build warnings when CONFIG_PM is not set/enabled:
../drivers/media/platform/marvell-ccic/mmp-driver.c:324:12: warning: 'mmpcam_runtime_suspend' defined but not used [-Wunused-function]
324 | static int mmpcam_runtime_suspend(struct device *dev)
../drivers/media/platform/marvell-ccic/mmp-driver.c:310:12: warning: 'mmpcam_runtime_resume' defined but not used [-Wunused-function]
310 | static int mmpcam_runtime_resume(struct device *dev)
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
The addition of MT8183 support added a dependency on the SCP remoteproc
module. However the initial patch used the "select" Kconfig directive,
which may result in the SCP module to not be compiled if remoteproc was
disabled. In such a case, mtk-vcodec would try to link against
non-existent SCP symbols. "select" was clearly misused here as explained
in kconfig-language.txt.
Replace this by a "depends" directive on at least one of the VPU and
SCP modules, to allow the driver to be compiled as long as one of these
is enabled, and adapt the code to support this new scenario.
Also adapt the Kconfig text to explain the extra requirements for MT8173
and MT8183.
Reported-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: bf1d556ad4 ("media: mtk-vcodec: abstract firmware interface")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
mtk-vcodec supports two kinds of firmware, VPU and SCP. Both were
supported from the same source files, but this is clearly unclean and
makes it more difficult to disable support for one or the other.
Move these implementations into their own file, after adding the
necessary private interfaces.
[hverkuil: smatch fix: mtk_vcodec_fw_vpu_init() error: uninitialized symbol 'rst_id'.]
Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: bf1d556ad4 ("media: mtk-vcodec: abstract firmware interface")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
When _PAGE_ACCESSED is not set, a minor fault is expected.
To do this, TLB miss exception ANDs _PAGE_PRESENT and _PAGE_ACCESSED
into the L2 entry valid bit.
To simplify the processing and reduce the number of instructions in
TLB miss exceptions, manage it as an APG bit and get it next to
_PAGE_GUARDED bit to allow a copy in one go. Then declare the
corresponding groups as handling all accesses as user accesses.
As the PP bits always define user as No Access, it will generate
a fault.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/80f488db230c6b0e7b3b990d72bd94a8a069e93e.1602492856.git.christophe.leroy@csgroup.eu
The kernel expects pte_young() to work regardless of CONFIG_SWAP.
Make sure a minor fault is taken to set _PAGE_ACCESSED when it
is not already set, regardless of the selection of CONFIG_SWAP.
This adds at least 3 instructions to the TLB miss exception
handlers fast path. Following patch will reduce this overhead.
Also update the rotation instruction to the correct number of bits
to reflect all changes done to _PAGE_ACCESSED over time.
Fixes: d069cb4373 ("powerpc/8xx: Don't touch ACCESSED when no SWAP.")
Fixes: 5f356497c3 ("powerpc/8xx: remove unused _PAGE_WRITETHRU")
Fixes: e0a8e0d90a ("powerpc/8xx: Handle PAGE_USER via APG bits")
Fixes: 5b2753fc3e ("powerpc/8xx: Implementation of PAGE_EXEC")
Fixes: a891c43b97 ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/af834e8a0f1fa97bfae65664950f0984a70c4750.1602492856.git.christophe.leroy@csgroup.eu
If there is a device BTRFS_DEV_REPLACE_DEVID without the device replace
item, then it means the filesystem is inconsistent state. This is either
corruption or a crafted image. Fail the mount as this needs a closer
look what is actually wrong.
As of now if BTRFS_DEV_REPLACE_DEVID is present without the replace
item, in __btrfs_free_extra_devids() we determine that there is an
extra device, and free those extra devices but continue to mount the
device.
However, we were wrong in keeping tack of the rw_devices so the syzbot
testcase failed:
WARNING: CPU: 1 PID: 3612 at fs/btrfs/volumes.c:1166 close_fs_devices.part.0+0x607/0x800 fs/btrfs/volumes.c:1166
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 3612 Comm: syz-executor.2 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x198/0x1fd lib/dump_stack.c:118
panic+0x347/0x7c0 kernel/panic.c:231
__warn.cold+0x20/0x46 kernel/panic.c:600
report_bug+0x1bd/0x210 lib/bug.c:198
handle_bug+0x38/0x90 arch/x86/kernel/traps.c:234
exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:254
asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:536
RIP: 0010:close_fs_devices.part.0+0x607/0x800 fs/btrfs/volumes.c:1166
RSP: 0018:ffffc900091777e0 EFLAGS: 00010246
RAX: 0000000000040000 RBX: ffffffffffffffff RCX: ffffc9000c8b7000
RDX: 0000000000040000 RSI: ffffffff83097f47 RDI: 0000000000000007
RBP: dffffc0000000000 R08: 0000000000000001 R09: ffff8880988a187f
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88809593a130
R13: ffff88809593a1ec R14: ffff8880988a1908 R15: ffff88809593a050
close_fs_devices fs/btrfs/volumes.c:1193 [inline]
btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179
open_ctree+0x4984/0x4a2d fs/btrfs/disk-io.c:3434
btrfs_fill_super fs/btrfs/super.c:1316 [inline]
btrfs_mount_root.cold+0x14/0x165 fs/btrfs/super.c:1672
The fix here is, when we determine that there isn't a replace item
then fail the mount if there is a replace target device (devid 0).
CC: stable@vger.kernel.org # 4.19+
Reported-by: syzbot+4cfe71a4da060be47502@syzkaller.appspotmail.com
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Based on user feedback update the message printed when scrub fails to
start due to write requirements. To make a distinction add a device id
to the messages.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Smatch complains that this code dereferences "entry" before checking
whether it's NULL on the next line. Fortunately, rb_entry() will never
return NULL so it doesn't cause a problem. We can clean up the NULL
checking a bit to silence the warning and make the code more clear.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
To help with debugging, print the type of the block rsv when we fail to
use our target block rsv in btrfs_use_block_rsv.
This now produces:
[ 544.672035] BTRFS: block rsv 1 returned -28
which is still cryptic without consulting the enum in block-rsv.h but I
guess it's better than nothing.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add note from Nikolay ]
Signed-off-by: David Sterba <dsterba@suse.com>
On 32-bit systems, this shift will overflow for files larger than 4GB as
start_index is unsigned long while the calls to btrfs_delalloc_*_space
expect u64.
CC: stable@vger.kernel.org # 4.4+
Fixes: df480633b8 ("btrfs: extent-tree: Switch to new delalloc space reserve and release")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Sterba <dsterba@suse.com>
[ define the variable instead of repeating the shift ]
Signed-off-by: David Sterba <dsterba@suse.com>
Only USB4 lane 0 adapter has the USB4 port capability for wakes so only
program wakes on such adapters.
Fixes: b2911a593a ("thunderbolt: Enable wakes from system suspend")
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Some calls in the debugfs interface are made to the linux/uaccess.h header,
but the header is not referenced. So, for x86_64 architectures, this
dependency seems to be pulled in elsewhere, which leads to a successful
compilation. However, on arm/arm64 architectures, it was found to error out
on implicit declarations.
This change fixes the implicit declaration error by adding the
linux/uaccess.h header.
Fixes: 54e418106c ("thunderbolt: Add debugfs interface")
Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
The svc->key field is not released as it should be if ida_simple_get()
fails so fix that.
Fixes: 9aabb68568 ("thunderbolt: Fix to check return value of ida_simple_get")
Cc: stable@vger.kernel.org
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
The mcp2221 driver GPIO output handling has has several issues.
* A wrong value is used for the GPIO direction.
* Wrong offsets are calculated for some GPIO set value/set direction
operations, when offset is larger than 0.
This has been fixed by introducing proper manifest constants for the
direction encoding, and using 'offsetof' when calculating GPIO
register offsets.
The updated driver has been tested with the Sparx5 pcb134/pcb135
board, which has the mcp2221 device with several (output) GPIO's.
Fixes: 328de1c519 ("HID: mcp2221: add GPIO functionality support")
Reviewed-by: Rishi Gupta <gupt21@gmail.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Some HID devices don't use a report ID because they only have a single
report. In those cases, the report ID in struct hid_report will be zero
and the data for the report will start at the first byte, so don't skip
over the first byte.
Signed-off-by: Pablo Ceballos <pceballos@google.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Use the uic_cmd->cmd_active as a flag to track the lifecycle of an UIC cmd.
The flag is set before sending the UIC cmd and cleared in IRQ handler. When
a PMC or UIC cmd completion timeout happens, if the flag is not set,
instead of returning timeout error, we still treat it as a successful
operation. This is to deal with the scenario in which completion has been
raised but the one waiting for the completion cannot be awaken in time due
to kernel scheduling problem.
Link: https://lore.kernel.org/r/1604384682-15837-3-git-send-email-cang@codeaurora.org
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The scsi_block_reqs_cnt increased in ufshcd_hold() is supposed to be
decreased back in ufshcd_ungate_work() in a paired way. However, if
specific ufshcd_hold/release sequences are met, it is possible that
scsi_block_reqs_cnt is increased twice but only one ungate work is
queued. To make sure scsi_block_reqs_cnt is handled by ufshcd_hold() and
ufshcd_ungate_work() in a paired way, increase it only if queue_work()
returns true.
Link: https://lore.kernel.org/r/1604384682-15837-2-git-send-email-cang@codeaurora.org
Reviewed-by: Hongwu Su <hongwus@codeaurora.org>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There's no reason to flush an entire file when we're unsharing part of
a file. Therefore, only initiate writeback on the selected range.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
noc/axi/ahb are bus clk, not peripheral clk.
Since peripheral clk has a limitation that for peripheral clock slice,
IP clock slices must be stopped to change the clock source.
However if the bus clk is marked as critical clk peripheral, the
assigned clock parent operation will fail.
So we added CLK_SET_PARENT_GATE flag to avoid glitch.
And add imx8m_clk_hw_composite_bus_critical for bus critical clock usage
Fixes: 936c383673 ("clk: imx: fix composite peripheral flags")
Reviewed-by: Abel Vesa <abel.vesa@nxp.com>
Reported-by: Abel Vesa <abel.vesa@nxp.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Link: https://lore.kernel.org/r/1604229834-25594-1-git-send-email-peng.fan@nxp.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Commit f89c696e7f ("drm/mediatek: mtk_dpi: Convert to bridge driver")
introduced the following build warning with W=1
drivers/gpu/drm/mediatek/mtk_dpi.c:530:39: warning: unused variable 'mtk_dpi_encoder_funcs' [-Wunused-const-variable]
static const struct drm_encoder_funcs mtk_dpi_encoder_funcs = {
This struct is and the 'mtk_dpi_encoder_destroy()' are not needed
anymore, so remove them.
Fixes: f89c696e7f ("drm/mediatek: mtk_dpi: Convert to bridge driver")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Andreas reported that commit ee0a49a687 ("powerpc/uaccess: Switch
__put_user_size_allowed() to __put_user_asm_goto()") broke
CLONE_CHILD_SETTID.
Further inspection showed that the put_user() in schedule_tail() was
missing entirely, the store not emitted by the compiler.
<.schedule_tail>:
mflr r0
std r0,16(r1)
stdu r1,-112(r1)
bl <.finish_task_switch>
ld r9,2496(r3)
cmpdi cr7,r9,0
bne cr7,<.schedule_tail+0x60>
ld r3,392(r13)
ld r9,1392(r3)
cmpdi cr7,r9,0
beq cr7,<.schedule_tail+0x3c>
li r4,0
li r5,0
bl <.__task_pid_nr_ns>
nop
bl <.calculate_sigpending>
nop
addi r1,r1,112
ld r0,16(r1)
mtlr r0
blr
nop
nop
nop
bl <.__balance_callback>
b <.schedule_tail+0x1c>
Notice there are no stores other than to the stack. There should be a
stw in there for the store to current->set_child_tid.
This is only seen with GCC 4.9 era compilers (tested with 4.9.3 and
4.9.4), and only when CONFIG_PPC_KUAP is disabled.
When CONFIG_PPC_KUAP=y, the inline asm that's part of the isync()
and mtspr() inlined via allow_user_access() seems to be enough to
avoid the bug.
We already have a macro to work around this (or a similar bug), called
asm_volatile_goto which includes an empty asm block to tickle the
compiler into generating the right code. So use that.
With this applied the code generation looks more like it will work:
<.schedule_tail>:
mflr r0
std r31,-8(r1)
std r0,16(r1)
stdu r1,-144(r1)
std r3,112(r1)
bl <._mcount>
nop
ld r3,112(r1)
bl <.finish_task_switch>
ld r9,2624(r3)
cmpdi cr7,r9,0
bne cr7,<.schedule_tail+0xa0>
ld r3,2408(r13)
ld r31,1856(r3)
cmpdi cr7,r31,0
beq cr7,<.schedule_tail+0x80>
li r4,0
li r5,0
bl <.__task_pid_nr_ns>
nop
li r9,-1
clrldi r9,r9,12
cmpld cr7,r31,r9
bgt cr7,<.schedule_tail+0x80>
lis r9,16
rldicr r9,r9,32,31
subf r9,r31,r9
cmpldi cr7,r9,3
ble cr7,<.schedule_tail+0x80>
li r9,0
stw r3,0(r31) <-- stw
nop
bl <.calculate_sigpending>
nop
addi r1,r1,144
ld r0,16(r1)
ld r31,-8(r1)
mtlr r0
blr
nop
bl <.__balance_callback>
b <.schedule_tail+0x30>
Fixes: ee0a49a687 ("powerpc/uaccess: Switch __put_user_size_allowed() to __put_user_asm_goto()")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Andreas Schwab <schwab@linux-m68k.org>
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201104111742.672142-1-mpe@ellerman.id.au
Some messages sent by the MDS entail a session sequence number
increment, and the MDS will drop certain types of requests on the floor
when the sequence numbers don't match.
In particular, a REQUEST_CLOSE message can cross with one of the
sequence morphing messages from the MDS which can cause the client to
stall, waiting for a response that will never come.
Originally, this meant an up to 5s delay before the recurring workqueue
job kicked in and resent the request, but a recent change made it so
that the client would never resend, causing a 60s stall unmounting and
sometimes a blockisting event.
Add a new helper for incrementing the session sequence and then testing
to see whether a REQUEST_CLOSE needs to be resent, and move the handling
of CEPH_MDS_SESSION_CLOSING into that function. Change all of the
bare sequence counter increments to use the new helper.
Reorganize check_session_state with a switch statement. It should no
longer be called when the session is CLOSING, so throw a warning if it
ever is (but still handle that case sanely).
[ idryomov: whitespace, pr_err() call fixup ]
URL: https://tracker.ceph.com/issues/47563
Fixes: fa99677342 ("ceph: fix potential mdsc use-after-free crash")
Reported-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Prior to commit 0f71c60ffd ("PCI: dwc: Remove storing of PCI resources"),
the DWC driver was setting up the last memory resource rather than the
first memory resource. This doesn't matter for most platforms which only
have 1 memory resource, but it broke Tegra194 which has a 2nd
(prefetchable) memory region that requires an ATU entry. The first region
on Tegra194 relies on the default 1:1 pass-thru of outbound transactions
and doesn't need an ATU entry.
Fixes: 0f71c60ffd ("PCI: dwc: Remove storing of PCI resources")
Link: https://lore.kernel.org/r/20201026154852.221483-1-robh@kernel.org
Reported-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Jingoo Han <jingoohan1@gmail.com>
Cc: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Marc Kleine-Budde says:
====================
pull-request: can 2020-11-03
The first two patches are by Oleksij Rempel and they add a generic
can-controller Device Tree yaml binding and convert the text based binding
of the flexcan driver to a yaml based binding.
Zhang Changzhong's patch fixes a remove_proc_entry warning in the AF_CAN
core.
A patch by me fixes a kfree_skb() call from IRQ context in the rx-offload
helper.
Vincent Mailhol contributes a patch to prevent a call to kfree_skb() in
hard IRQ context in can_get_echo_skb().
Oliver Hartkopp's patch fixes the length calculation for RTR CAN frames
in the __can_get_echo_skb() helper.
Oleksij Rempel's patch fixes a use-after-free that shows up with j1939 in
can_create_echo_skb().
Yegor Yefremov contributes 4 patches to enhance the j1939 documentation.
Zhang Changzhong's patch fixes a hanging task problem in j1939_sk_bind()
if the netdev is down.
Then there are three patches for the newly added CAN_ISOTP protocol. Geert
Uytterhoeven enhances the kconfig help text. Oliver Hartkopp's patch adds
missing RX timeout handling in listen-only mode and Colin Ian King's patch
decreases the generated object code by 926 bytes.
Zhang Changzhong contributes a patch for the ti_hecc driver that fixes the
error path in the probe function.
Navid Emamdoost's patch for the xilinx_can driver fixes the error handling
in case of failing pm_runtime_get_sync().
There are two patches for the peak_usb driver. Dan Carpenter adds range
checking in decode operations and Stephane Grosjean's patch fixes
a timestamp wrapping problem.
Stephane Grosjean's patch for th peak_canfd driver fixes echo management if
loopback is on.
The next three patches all target the mcp251xfd driver. The first one is
by me and it increased the severity of CRC read error messages. The kernel
test robot removes an unneeded semicolon and Tom Rix removes unneeded
break in several switch-cases.
The last 4 patches are by Joakim Zhang and target the flexcan driver,
the first three fix ECC related device specific quirks for the LS1021A,
LX2160A and the VF610 SoC. The last patch disable wakeup completely upon
driver remove.
* tag 'linux-can-fixes-for-5.10-20201103' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: (27 commits)
can: flexcan: flexcan_remove(): disable wakeup completely
can: flexcan: add ECC initialization for VF610
can: flexcan: add ECC initialization for LX2160A
can: flexcan: remove FLEXCAN_QUIRK_DISABLE_MECR quirk for LS1021A
can: mcp251xfd: remove unneeded break
can: mcp251xfd: mcp251xfd_regmap_nocrc_read(): fix semicolon.cocci warnings
can: mcp251xfd: mcp251xfd_regmap_crc_read(): increase severity of CRC read error messages
can: peak_canfd: pucan_handle_can_rx(): fix echo management when loopback is on
can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping
can: peak_usb: add range checking in decode operations
can: xilinx_can: handle failure cases of pm_runtime_get_sync
can: ti_hecc: ti_hecc_probe(): add missed clk_disable_unprepare() in error path
can: isotp: padlen(): make const array static, makes object smaller
can: isotp: isotp_rcv_cf(): enable RX timeout handling in listen-only mode
can: isotp: Explain PDU in CAN_ISOTP help text
can: j1939: j1939_sk_bind(): return failure if netdev is down
can: j1939: use backquotes for code samples
can: j1939: swap addr and pgn in the send example
can: j1939: fix syntax and spelling
can: j1939: rename jacd tool
...
====================
Link: https://lore.kernel.org/r/<20201103220636.972106-1-mkl@pengutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit 530b5affc6 ("spi: fsl-dspi: fix use-after-free in
remove path"), this driver causes a "NULL pointer dereference"
in dspi_suspend/resume.
This is because since this commit, the drivers private data point to
"dspi" instead of "ctlr", the codes in suspend and resume func were
not modified correspondly.
Fixes: 530b5affc6 ("spi: fsl-dspi: fix use-after-free in remove path")
Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20201103020546.1822-1-qiang.zhao@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Current io_match_files() check in io_cqring_overflow_flush() is useless
because requests drop ->files before going to the overflow list, however
linked to it request do not, and we don't check them.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We can't bundle this into one operation, as the identity may not have
originated from the tctx to begin with. Drop one ref for each of them
separately, if they don't match the static assignment. If we don't, then
if the identity is a lookup from registered credentials, we could be
freeing that identity as we're dropping a reference assuming it came from
the tctx. syzbot reports this as a use-after-free, as the identity is
still referencable from idr lookup:
==================================================================
BUG: KASAN: use-after-free in instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
BUG: KASAN: use-after-free in atomic_fetch_add_relaxed include/asm-generic/atomic-instrumented.h:142 [inline]
BUG: KASAN: use-after-free in __refcount_add include/linux/refcount.h:193 [inline]
BUG: KASAN: use-after-free in __refcount_inc include/linux/refcount.h:250 [inline]
BUG: KASAN: use-after-free in refcount_inc include/linux/refcount.h:267 [inline]
BUG: KASAN: use-after-free in io_init_req fs/io_uring.c:6700 [inline]
BUG: KASAN: use-after-free in io_submit_sqes+0x15a9/0x25f0 fs/io_uring.c:6774
Write of size 4 at addr ffff888011e08e48 by task syz-executor165/8487
CPU: 1 PID: 8487 Comm: syz-executor165 Not tainted 5.10.0-rc1-next-20201102-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xae/0x4c8 mm/kasan/report.c:385
__kasan_report mm/kasan/report.c:545 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:562
check_memory_region_inline mm/kasan/generic.c:186 [inline]
check_memory_region+0x13d/0x180 mm/kasan/generic.c:192
instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
atomic_fetch_add_relaxed include/asm-generic/atomic-instrumented.h:142 [inline]
__refcount_add include/linux/refcount.h:193 [inline]
__refcount_inc include/linux/refcount.h:250 [inline]
refcount_inc include/linux/refcount.h:267 [inline]
io_init_req fs/io_uring.c:6700 [inline]
io_submit_sqes+0x15a9/0x25f0 fs/io_uring.c:6774
__do_sys_io_uring_enter+0xc8e/0x1b50 fs/io_uring.c:9159
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x440e19
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 0f fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff644ff178 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000440e19
RDX: 0000000000000000 RSI: 000000000000450c RDI: 0000000000000003
RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000022b4850
R13: 0000000000000010 R14: 0000000000000000 R15: 0000000000000000
Allocated by task 8487:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:461
kmalloc include/linux/slab.h:552 [inline]
io_register_personality fs/io_uring.c:9638 [inline]
__io_uring_register fs/io_uring.c:9874 [inline]
__do_sys_io_uring_register+0x10f0/0x40a0 fs/io_uring.c:9924
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 8487:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
__kasan_slab_free+0x102/0x140 mm/kasan/common.c:422
slab_free_hook mm/slub.c:1544 [inline]
slab_free_freelist_hook+0x5d/0x150 mm/slub.c:1577
slab_free mm/slub.c:3140 [inline]
kfree+0xdb/0x360 mm/slub.c:4122
io_identity_cow fs/io_uring.c:1380 [inline]
io_prep_async_work+0x903/0xbc0 fs/io_uring.c:1492
io_prep_async_link fs/io_uring.c:1505 [inline]
io_req_defer fs/io_uring.c:5999 [inline]
io_queue_sqe+0x212/0xed0 fs/io_uring.c:6448
io_submit_sqe fs/io_uring.c:6542 [inline]
io_submit_sqes+0x14f6/0x25f0 fs/io_uring.c:6784
__do_sys_io_uring_enter+0xc8e/0x1b50 fs/io_uring.c:9159
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff888011e08e00
which belongs to the cache kmalloc-96 of size 96
The buggy address is located 72 bytes inside of
96-byte region [ffff888011e08e00, ffff888011e08e60)
The buggy address belongs to the page:
page:00000000a7104751 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11e08
flags: 0xfff00000000200(slab)
raw: 00fff00000000200 ffffea00004f8540 0000001f00000002 ffff888010041780
raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888011e08d00: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
ffff888011e08d80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
> ffff888011e08e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
^
ffff888011e08e80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
ffff888011e08f00: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
==================================================================
Reported-by: syzbot+625ce3bb7835b63f7f3d@syzkaller.appspotmail.com
Fixes: 1e6fa5216a ("io_uring: COW io_identity on mismatch")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Track if a given task io_uring context contains SQPOLL instances, so we
can iterate those for cancelation (and request counts). This ensures that
we properly wait on SQPOLL contexts, and find everything that needs
canceling.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This can't currently happen, but will be possible shortly. Handle missing
files just like we do not being able to grab a needed mm, and mark the
request as needing cancelation.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When an exception/interrupt hits kernel space and the kernel is not
currently in the idle task then RCU must be watching.
irqentry_enter() validates this via rcu_irq_enter_check_tick(), which in
turn invokes lockdep when taking a lock. But at that point lockdep does not
yet know about the fact that interrupts have been disabled by the CPU,
which triggers a lockdep splat complaining about inconsistent state.
Invoking trace_hardirqs_off() before rcu_irq_enter_check_tick() defeats the
point of rcu_irq_enter_check_tick() because trace_hardirqs_off() uses RCU.
So use the same sequence as for the idle case and tell lockdep about the
irq state change first, invoke the RCU check and then do the lockdep and
tracer update.
Fixes: a5497bab5f ("entry: Provide generic interrupt entry/exit code")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87y2jhl19s.fsf@nanos.tec.linutronix.de
The kernel has always allowed directories to have the rtinherit flag
set, even if there is no rt device, so this check is wrong.
Fixes: 80e4e12688 ("xfs: scrub inodes")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
In commit 7588cbeec6, we tried to fix a race stemming from the lack of
coordination between higher level code that wants to allocate and remap
CoW fork extents into the data fork. Christoph cites as examples the
always_cow mode, and a directio write completion racing with writeback.
According to the comments before the goto retry, we want to restart the
lookup to catch the extent in the data fork, but we don't actually reset
whichfork or cow_fsb, which means the second try executes using stale
information. Up until now I think we've gotten lucky that either
there's something left in the CoW fork to cause cow_fsb to be reset, or
either data/cow fork sequence numbers have advanced enough to force a
fresh lookup from the data fork. However, if we reach the retry with an
empty stable CoW fork and a stable data fork, neither of those things
happens. The retry foolishly re-calls xfs_convert_blocks on the CoW
fork which fails again. This time, we toss the write.
I've recently been working on extending reflink to the realtime device.
When the realtime extent size is larger than a single block, we have to
force the page cache to CoW the entire rt extent if a write (or
fallocate) are not aligned with the rt extent size. The strategy I've
chosen to deal with this is derived from Dave's blocksize > pagesize
series: dirtying around the write range, and ensuring that writeback
always starts mapping on an rt extent boundary. This has brought this
race front and center, since generic/522 blows up immediately.
However, I'm pretty sure this is a bug outright, independent of that.
Fixes: 7588cbeec6 ("xfs: retry COW fork delalloc conversion when no extent was found")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The iomap writepage error handling logic is a mash of old and
slightly broken XFS writepage logic. When keepwrite writeback state
tracking was introduced in XFS in commit 0d085a529b ("xfs: ensure
WB_SYNC_ALL writeback handles partial pages correctly"), XFS had an
additional cluster writeback context that scanned ahead of
->writepage() to process dirty pages over the current ->writepage()
extent mapping. This context expected a dirty page and required
retention of the TOWRITE tag on partial page processing so the
higher level writeback context would revisit the page (in contrast
to ->writepage(), which passes a page with the dirty bit already
cleared).
The cluster writeback mechanism was eventually removed and some of
the error handling logic folded into the primary writeback path in
commit 150d5be09c ("xfs: remove xfs_cancel_ioend"). This patch
accidentally conflated the two contexts by using the keepwrite logic
in ->writepage() without accounting for the fact that the page is
not dirty. Further, the keepwrite logic has no practical effect on
the core ->writepage() caller (write_cache_pages()) because it never
revisits a page in the current function invocation.
Technically, the page should be redirtied for the keepwrite logic to
have any effect. Otherwise, write_cache_pages() may find the tagged
page but will skip it since it is clean. Even if the page was
redirtied, however, there is still no practical effect to keepwrite
since write_cache_pages() does not wrap around within a single
invocation of the function. Therefore, the dirty page would simply
end up retagged on the next writeback sequence over the associated
range.
All that being said, none of this really matters because redirtying
a partially processed page introduces a potential infinite redirty
-> writeback failure loop that deviates from the current design
principle of clearing the dirty state on writepage failure to avoid
building up too much dirty, unreclaimable memory on the system.
Therefore, drop the spurious keepwrite usage and dirty state
clearing logic from iomap_writepage_map(), treat the partially
processed page the same as a fully processed page, and let the
imminent ioend failure clean up the writeback state.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
iomap writeback mapping failure only calls into ->discard_page() if
the current page has not been added to the ioend. Accordingly, the
XFS callback assumes a full page discard and invalidation. This is
problematic for sub-page block size filesystems where some portion
of a page might have been mapped successfully before a failure to
map a delalloc block occurs. ->discard_page() is not called in that
error scenario and the bio is explicitly failed by iomap via the
error return from ->prepare_ioend(). As a result, the filesystem
leaks delalloc blocks and corrupts the filesystem block counters.
Since XFS is the only user of ->discard_page(), tweak the semantics
to invoke the callback unconditionally on mapping errors and provide
the file offset that failed to map. Update xfs_discard_page() to
discard the corresponding portion of the file and pass the range
along to iomap_invalidatepage(). The latter already properly handles
both full and sub-page scenarios by not changing any iomap or page
state on sub-page invalidations.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
It is possible to expose non-zeroed post-EOF data in XFS if the new
EOF page is dirty, backed by an unwritten block and the truncate
happens to race with writeback. iomap_truncate_page() will not zero
the post-EOF portion of the page if the underlying block is
unwritten. The subsequent call to truncate_setsize() will, but
doesn't dirty the page. Therefore, if writeback happens to complete
after iomap_truncate_page() (so it still sees the unwritten block)
but before truncate_setsize(), the cached page becomes inconsistent
with the on-disk block. A mapped read after the associated page is
reclaimed or invalidated exposes non-zero post-EOF data.
For example, consider the following sequence when run on a kernel
modified to explicitly flush the new EOF page within the race
window:
$ xfs_io -fc "falloc 0 4k" -c fsync /mnt/file
$ xfs_io -c "pwrite 0 4k" -c "truncate 1k" /mnt/file
...
$ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file
00000400: 00 00 00 00 00 00 00 00 ........
$ umount /mnt/; mount <dev> /mnt/
$ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file
00000400: cd cd cd cd cd cd cd cd ........
Update xfs_setattr_size() to explicitly flush the new EOF page prior
to the page truncate to ensure iomap has the latest state of the
underlying block.
Fixes: 68a9f5e700 ("xfs: implement iomap based buffered write path")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Steffen Klassert says:
====================
1) Fix packet receiving of standard IP tunnels when the xfrm_interface
module is installed. From Xin Long.
2) Fix a race condition between spi allocating and hash list
resizing. From zhuoliang zhang.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the kprobe handlers have protection that prohibits other handlers from
executing in other contexts (like if an NMI comes in while processing a
kprobe, and executes the same kprobe, it will get fail with a "busy"
return). Lockdep is unaware of this protection. Use lockdep's nesting api to
differentiate between locks taken in INT3 context and other context to
suppress the false warnings.
Link: https://lore.kernel.org/r/20201102160234.fa0ae70915ad9e2b21c08b85@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Enable Green_Sardine VCN support and VCN firmware loading
v2: use apu flags
Signed-off-by: Thong Thai <thong.thai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Commit
393f203f5f ("x86_64: kasan: add interceptors for memset/memmove/memcpy functions")
added .weak directives to arch/x86/lib/mem*_64.S instead of changing the
existing ENTRY macros to WEAK. This can lead to the assembly snippet
.weak memcpy
...
.globl memcpy
which will produce a STB_WEAK memcpy with GNU as but STB_GLOBAL memcpy
with LLVM's integrated assembler before LLVM 12. LLVM 12 (since
https://reviews.llvm.org/D90108) will error on such an overridden symbol
binding.
Commit
ef1e03152c ("x86/asm: Make some functions local")
changed ENTRY in arch/x86/lib/memcpy_64.S to SYM_FUNC_START_LOCAL, which
was ineffective due to the preceding .weak directive.
Use the appropriate SYM_FUNC_START_WEAK instead.
Fixes: 393f203f5f ("x86_64: kasan: add interceptors for memset/memmove/memcpy functions")
Fixes: ef1e03152c ("x86/asm: Make some functions local")
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201103012358.168682-1-maskray@google.com
The write-URB busy flag was being cleared before the completion handler
was done with the URB, something which could lead to corrupt transfers
due to a racing write request if the URB is resubmitted.
Fixes: 507ca9bc04 ("[PATCH] USB: add ability for usb-serial drivers to determine if their write urb is currently being used.")
Cc: stable <stable@vger.kernel.org> # 2.6.13
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Oded writes:
This tag contains the following fixes:
- Fix the kernel pointer type we are using across the driver to prevent
compiler warnings (from u64 to void*)
- Configure GAUDI's MMU coresight component in the correct location. The
current code had a bug where the configuration was not executed in some
cases
- Mask watchdog timeout errors in QMANs which can spam the kernel log
* tag 'misc-habanalabs-fixes-2020-11-04' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux:
habanalabs/gaudi: mask WDT error in QMAN
habanalabs/gaudi: move coresight mmu config
habanalabs: fix kernel pointer type
free_highpages() iterates over the free memblock regions in high
memory, and marks each page as available for the memory management
system.
Until commit cddb5ddf2b ("arm, xtensa: simplify initialization of
high memory pages") it rounded beginning of each region upwards and end of
each region downwards.
However, after that commit free_highmem() rounds the beginning and end of
each region downwards, and we may end up freeing a page that is
memblock_reserve()d, resulting in memory corruption.
Restore the original rounding of the region boundaries to avoid freeing
reserved pages.
Fixes: cddb5ddf2b ("arm, xtensa: simplify initialization of high memory pages")
Link: https://lore.kernel.org/r/20201029110334.4118-1-ardb@kernel.org/
Link: https://lore.kernel.org/r/20201031094345.6984-1-rppt@kernel.org
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Co-developed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
This interrupt cause is not relevant because of how the user use the
QMAN arbitration mechanism. We must mask it as the log explodes with it.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
We must relocate the coresight mmu configuration to the coresight
flow to make it work in case the first submission is to configure
the profiler.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
All throughout the driver, normal kernel pointers are
stored as 'u64' struct members, which is kind of silly
and requires casting through a uintptr_t to void* every
time they are used.
There is one line that missed the intermediate uintptr_t
case, which leads to a compiler warning:
drivers/misc/habanalabs/common/command_buffer.c: In function 'hl_cb_mmap':
drivers/misc/habanalabs/common/command_buffer.c:512:44: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
512 | rc = hdev->asic_funcs->cb_mmap(hdev, vma, (void *) cb->kernel_address,
Rather than adding one more cast, just fix the type and
remove all the other casts.
Fixes: 0db575350c ("habanalabs: make use of dma_mmap_coherent")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
As part of commit a919ba0697 ("hwmon: (pmbus) Stop caching register
values"), the update of the sensor value is now triggered directly by the
sensor attribute value being read from sysfs. This created (or at least
made much more likely) a locking issue, since nothing protected the device
page selection from being unexpectedly modified by concurrent reads. If
sensor values on different pages on the same device were being concurrently
read by multiple threads, this could cause spurious read errors due to the
page register not reading back the same value last written, or sensor
values being read from the incorrect page.
Add locking of the update_lock mutex in pmbus_show_sensor and
pmbus_show_samples so that these cannot result in concurrent reads from the
underlying device.
Fixes: a919ba0697 ("hwmon: (pmbus) Stop caching register values")
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Alex Qiu <xqiu@google.com>
Link: https://lore.kernel.org/r/20201103193315.3011800-1-robert.hancock@calian.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
For kernel 5.10, this function was called twice right next to each
other in the same function due to what looks like a mis-merge.
Remove one of them.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
pcluster should be only set up for all managed pages instead of
temporary pages. Since it currently uses page->mapping to identify,
the impact is minor for now.
[ Update: Vladimir reported the kernel log becomes polluted
because PAGE_FLAGS_CHECK_AT_FREE flag(s) set if the page
allocation debug option is enabled. ]
Link: https://lore.kernel.org/r/20201022145724.27284-1-hsiangkao@aol.com
Fixes: 5ddcee1f3a ("erofs: get rid of __stagingpage_alloc helper")
Cc: <stable@vger.kernel.org> # 5.5+
Tested-by: Vladimir Zapolskiy <vladimir@tuxera.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
EROFS has _only one_ ondisk timestamp (ctime is currently
documented and recorded, we might also record mtime instead
with a new compat feature if needed) for each extended inode
since EROFS isn't mainly for archival purposes so no need to
keep all timestamps on disk especially for Android scenarios
due to security concerns. Also, romfs/cramfs don't have their
own on-disk timestamp, and squashfs only records mtime instead.
Let's also derive access time from ondisk timestamp rather than
leaving it empty, and if mtime/atime for each file are really
needed for specific scenarios as well, we can also use xattrs
to record them then.
Link: https://lore.kernel.org/r/20201031195102.21221-1-hsiangkao@aol.com
[ Gao Xiang: It'd be better to backport for user-friendly concern. ]
Fixes: 431339ba90 ("staging: erofs: add inode operations")
Cc: stable <stable@vger.kernel.org> # 4.19+
Reported-by: nl6720 <nl6720@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
We wrap the timeline on construction of the next request, but there may
still be requests in flight that have not yet finalized the breadcrumb.
(The breadcrumb is delayed as we need engine-local offsets, and for the
virtual engine that is not known until execution.) As such, by the time
we write to the timeline's HWSP offset it may have changed, and we
should use the value we preserved in the request instead.
Though the window is small and infrequent (at full flow we can expect a
timeline's seqno to wrap once every 30 minutes), the impact of writing
the old seqno into the new HWSP is severe: the old requests are never
completed, and the new requests are completed before they are even
submitted.
Fixes: ebece75392 ("drm/i915: Keep timeline HWSP allocated until idle across the system")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.2+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201022064127.10159-1-chris@chris-wilson.co.uk
(cherry picked from commit c10f6019d0)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
race between user context and softirq causing memleak,
consider the call sequence scenario
chtls_setkey() //user context
chtls_peer_close()
chtls_abort_req_rss()
chtls_setkey() //user context
work request skb queued in chtls_setkey() won't be freed
because resources are already cleaned for this connection,
fix it by not queuing work request while socket is closing.
v1->v2:
- fix W=1 warning.
v2->v3:
- separate it out from another memleak fix.
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Link: https://lore.kernel.org/r/20201102173650.24754-1-vinay.yadav@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After double check with Layerscape CAN owner (Pankaj Bansal), confirm
that LX2160A indeed supports ECC feature, so correct the feature table.
For SoCs with ECC supported, even use FLEXCAN_QUIRK_DISABLE_MECR quirk to
disable non-correctable errors interrupt and freeze mode, had better use
FLEXCAN_QUIRK_SUPPORT_ECC quirk to initialize all memory.
Fixes: 2c19bb43e5 ("can: flexcan: add lx2160ar1 support")
Cc: Pankaj Bansal <pankaj.bansal@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20201020155402.30318-5-qiangqing.zhang@nxp.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Echo management is driven by PUCAN_MSG_LOOPED_BACK bit, while loopback
frames are identified with PUCAN_MSG_SELF_RECEIVE bit. Those bits are set
for each outgoing frame written to the IP core so that a copy of each one
will be placed into the rx path. Thus,
- when PUCAN_MSG_LOOPED_BACK is set then the rx frame is an echo of a
previously sent frame,
- when PUCAN_MSG_LOOPED_BACK+PUCAN_MSG_SELF_RECEIVE are set, then the rx
frame is an echo AND a loopback frame. Therefore, this frame must be
put into the socket rx path too.
This patch fixes how CAN frames are handled when these are sent while the
can interface is configured in "loopback on" mode.
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Link: https://lore.kernel.org/r/20201013153947.28012-1-s.grosjean@peak-system.com
Fixes: 8ac8321e4a ("can: peak: add support for PEAK PCAN-PCIe FD CAN-FD boards")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Fabian Inostroza <fabianinostrozap@gmail.com> has discovered a potential
problem in the hardware timestamp reporting from the PCAN-USB USB CAN interface
(only), related to the fact that a timestamp of an event may precede the
timestamp used for synchronization when both records are part of the same USB
packet. However, this case was used to detect the wrapping of the time counter.
This patch details and fixes the two identified cases where this problem can
occur.
Reported-by: Fabian Inostroza <fabianinostrozap@gmail.com>
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Link: https://lore.kernel.org/r/20201014085631.15128-1-s.grosjean@peak-system.com
Fixes: bb4785551f ("can: usb: PEAK-System Technik USB adapters driver core")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
These values come from skb->data so Smatch considers them untrusted. I
believe Smatch is correct but I don't have a way to test this.
The usb_if->dev[] array has 2 elements but the index is in the 0-15
range without checks. The cfd->len can be up to 255 but the maximum
valid size is CANFD_MAX_DLEN (64) so that could lead to memory
corruption.
Fixes: 0a25e1f4f1 ("can: peak_usb: add support for PEAK new CANFD USB adapters")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20200813140604.GA456946@mwanda
Acked-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Don't populate the const array plen on the stack but instead it static. Makes
the object code smaller by 926 bytes.
Before:
text data bss dec hex filename
26531 1943 64 28538 6f7a net/can/isotp.o
After:
text data bss dec hex filename
25509 2039 64 27612 6bdc net/can/isotp.o
(gcc version 10.2.0)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201020154203.54711-1-colin.king@canonical.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
When a netdev down event occurs after a successful call to
j1939_sk_bind(), j1939_netdev_notify() can handle it correctly.
But if the netdev already in down state before calling j1939_sk_bind(),
j1939_sk_release() will stay in wait_event_interruptible() blocked
forever. Because in this case, j1939_netdev_notify() won't be called and
j1939_tp_txtimer() won't call j1939_session_cancel() or other function
to clear session for ENETDOWN error, this lead to mismatch of
j1939_session_get/put() and jsk->skb_pending will never decrease to
zero.
To reproduce it use following commands:
1. ip link add dev vcan0 type vcan
2. j1939acd -r 100,80-120 1122334455667788 vcan0
3. presses ctrl-c and thread will be blocked forever
This patch adds check for ndev->flags in j1939_sk_bind() to avoid this
kind of situation and return with -ENETDOWN.
Fixes: 9d71dd0c70 ("can: add support of SAE J1939 protocol")
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1599460308-18770-1-git-send-email-zhangchangzhong@huawei.com
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
All user space generated SKBs are owned by a socket (unless injected into the
key via AF_PACKET). If a socket is closed, all associated skbs will be cleaned
up.
This leads to a problem when a CAN driver calls can_put_echo_skb() on a
unshared SKB. If the socket is closed prior to the TX complete handler,
can_get_echo_skb() and the subsequent delivering of the echo SKB to all
registered callbacks, a SKB with a refcount of 0 is delivered.
To avoid the problem, in can_get_echo_skb() the original SKB is now always
cloned, regardless of shared SKB or not. If the process exists it can now
safely discard its SKBs, without disturbing the delivery of the echo SKB.
The problem shows up in the j1939 stack, when it clones the incoming skb, which
detects the already 0 refcount.
We can easily reproduce this with following example:
testj1939 -B -r can0: &
cansend can0 1823ff40#0123
WARNING: CPU: 0 PID: 293 at lib/refcount.c:25 refcount_warn_saturate+0x108/0x174
refcount_t: addition on 0; use-after-free.
Modules linked in: coda_vpu imx_vdoa videobuf2_vmalloc dw_hdmi_ahb_audio vcan
CPU: 0 PID: 293 Comm: cansend Not tainted 5.5.0-rc6-00376-g9e20dcb7040d #1
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[<c010f570>] (dump_backtrace) from [<c010f90c>] (show_stack+0x20/0x24)
[<c010f8ec>] (show_stack) from [<c0c3e1a4>] (dump_stack+0x8c/0xa0)
[<c0c3e118>] (dump_stack) from [<c0127fec>] (__warn+0xe0/0x108)
[<c0127f0c>] (__warn) from [<c01283c8>] (warn_slowpath_fmt+0xa8/0xcc)
[<c0128324>] (warn_slowpath_fmt) from [<c0539c0c>] (refcount_warn_saturate+0x108/0x174)
[<c0539b04>] (refcount_warn_saturate) from [<c0ad2cac>] (j1939_can_recv+0x20c/0x210)
[<c0ad2aa0>] (j1939_can_recv) from [<c0ac9dc8>] (can_rcv_filter+0xb4/0x268)
[<c0ac9d14>] (can_rcv_filter) from [<c0aca2cc>] (can_receive+0xb0/0xe4)
[<c0aca21c>] (can_receive) from [<c0aca348>] (can_rcv+0x48/0x98)
[<c0aca300>] (can_rcv) from [<c09b1fdc>] (__netif_receive_skb_one_core+0x64/0x88)
[<c09b1f78>] (__netif_receive_skb_one_core) from [<c09b2070>] (__netif_receive_skb+0x38/0x94)
[<c09b2038>] (__netif_receive_skb) from [<c09b2130>] (netif_receive_skb_internal+0x64/0xf8)
[<c09b20cc>] (netif_receive_skb_internal) from [<c09b21f8>] (netif_receive_skb+0x34/0x19c)
[<c09b21c4>] (netif_receive_skb) from [<c0791278>] (can_rx_offload_napi_poll+0x58/0xb4)
Fixes: 0ae89beb28 ("can: add destructor for self generated skbs")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: http://lore.kernel.org/r/20200124132656.22156-1-o.rempel@pengutronix.de
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The can_get_echo_skb() function returns the number of received bytes to
be used for netdev statistics. In the case of RTR frames we get a valid
(potential non-zero) data length value which has to be passed for further
operations. But on the wire RTR frames have no payload length. Therefore
the value to be used in the statistics has to be zero for RTR frames.
Reported-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20201020064443.80164-1-socketcan@hartkopp.net
Fixes: cf5046b309 ("can: dev: let can_get_echo_skb() return dlc of CAN frame")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Pull perf tools fixes from Arnaldo Carvalho de Melo:
"Only fixes and a sync of the headers so that the perf build is silent:
- Fix visibility attribute in python module init code with newer gcc
- Fix DRAM_BW_Use 0 issue for CLX/SKX in intel JSON vendor event
files
- Fix the build on new fedora by removing LTO compiler options when
building perl support
- Remove broken __no_tail_call attribute
- Fix segfault when trying to trace events by cgroup
- Fix crash with non-jited BPF progs
- Increase buffer size in TUI browser, fixing format truncation
- Fix printing of build-id for objects lacking one
- Fix byte swapping for ino_generation field in MMAP2 perf.data
records
- Fix byte swapping for CGROUP perf.data records, for cross arch
analysis of perf.data files
- Fix the fast path of feature detection
- Update kernel header copies"
* tag 'perf-tools-for-v5.10-2020-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (23 commits)
tools feature: Fixup fast path feature detection
perf tools: Add missing swap for cgroup events
perf tools: Add missing swap for ino_generation
perf tools: Initialize output buffer in build_id__sprintf
perf hists browser: Increase size of 'buf' in perf_evsel__hists_browse()
tools include UAPI: Update linux/mount.h copy
tools headers UAPI: Update tools's copy of linux/perf_event.h
tools kvm headers: Update KVM headers from the kernel sources
tools UAPI: Update copy of linux/mman.h from the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
tools x86 headers: Update required-features.h header from the kernel
tools x86 headers: Update cpufeatures.h headers copies
tools headers UAPI: Update fscrypt.h copy
tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
tools headers UAPI: Sync prctl.h with the kernel sources
perf scripting python: Avoid declaring function pointers with a visibility attribute
perf tools: Remove broken __no_tail_call attribute
perf vendor events: Fix DRAM_BW_Use 0 issue for CLX/SKX
perf trace: Fix segfault when trying to trace events by cgroup
perf tools: Fix crash with non-jited bpf progs
...
If a driver calls can_get_echo_skb() during a hardware IRQ (which is often, but
not always, the case), the 'WARN_ON(in_irq)' in
net/core/skbuff.c#skb_release_head_state() might be triggered, under network
congestion circumstances, together with the potential risk of a NULL pointer
dereference.
The root cause of this issue is the call to kfree_skb() instead of
dev_kfree_skb_irq() in net/core/dev.c#enqueue_to_backlog().
This patch prevents the skb to be freed within the call to netif_rx() by
incrementing its reference count with skb_get(). The skb is finally freed by
one of the in-irq-context safe functions: dev_consume_skb_any() or
dev_kfree_skb_any(). The "any" version is used because some drivers might call
can_get_echo_skb() in a normal context.
The reason for this issue to occur is that initially, in the core network
stack, loopback skb were not supposed to be received in hardware IRQ context.
The CAN stack is an exeption.
This bug was previously reported back in 2017 in [1] but the proposed patch
never got accepted.
While [1] directly modifies net/core/dev.c, we try to propose here a
smoother modification local to CAN network stack (the assumption
behind is that only CAN devices are affected by this issue).
[1] http://lore.kernel.org/r/57a3ffb6-3309-3ad5-5a34-e93c3fe3614d@cetitec.com
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20201002154219.4887-2-mailhol.vincent@wanadoo.fr
Fixes: 39549eef35 ("can: CAN Network device driver and Netlink interface")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
A CAN driver, using the rx-offload infrastructure, is reading CAN frames
(usually in IRQ context) from the hardware and placing it into the rx-offload
queue to be delivered to the networking stack via NAPI.
In case the rx-offload queue is full, trying to add more skbs results in the
skbs being dropped using kfree_skb(). If done from hard-IRQ context this
results in the following warning:
[ 682.552693] ------------[ cut here ]------------
[ 682.557360] WARNING: CPU: 0 PID: 3057 at net/core/skbuff.c:650 skb_release_head_state+0x74/0x84
[ 682.566075] Modules linked in: can_raw can coda_vpu flexcan dw_hdmi_ahb_audio v4l2_jpeg imx_vdoa can_dev
[ 682.575597] CPU: 0 PID: 3057 Comm: cansend Tainted: G W 5.7.0+ #18
[ 682.583098] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 682.589657] [<c0112628>] (unwind_backtrace) from [<c010c1c4>] (show_stack+0x10/0x14)
[ 682.597423] [<c010c1c4>] (show_stack) from [<c06c481c>] (dump_stack+0xe0/0x114)
[ 682.604759] [<c06c481c>] (dump_stack) from [<c0128f10>] (__warn+0xc0/0x10c)
[ 682.611742] [<c0128f10>] (__warn) from [<c0129314>] (warn_slowpath_fmt+0x5c/0xc0)
[ 682.619248] [<c0129314>] (warn_slowpath_fmt) from [<c0b95dec>] (skb_release_head_state+0x74/0x84)
[ 682.628143] [<c0b95dec>] (skb_release_head_state) from [<c0b95e08>] (skb_release_all+0xc/0x24)
[ 682.636774] [<c0b95e08>] (skb_release_all) from [<c0b95eac>] (kfree_skb+0x74/0x1c8)
[ 682.644479] [<c0b95eac>] (kfree_skb) from [<bf001d1c>] (can_rx_offload_queue_sorted+0xe0/0xe8 [can_dev])
[ 682.654051] [<bf001d1c>] (can_rx_offload_queue_sorted [can_dev]) from [<bf001d6c>] (can_rx_offload_get_echo_skb+0x48/0x94 [can_dev])
[ 682.666007] [<bf001d6c>] (can_rx_offload_get_echo_skb [can_dev]) from [<bf01efe4>] (flexcan_irq+0x194/0x5dc [flexcan])
[ 682.676734] [<bf01efe4>] (flexcan_irq [flexcan]) from [<c019c1ec>] (__handle_irq_event_percpu+0x4c/0x3ec)
[ 682.686322] [<c019c1ec>] (__handle_irq_event_percpu) from [<c019c5b8>] (handle_irq_event_percpu+0x2c/0x88)
[ 682.695993] [<c019c5b8>] (handle_irq_event_percpu) from [<c019c64c>] (handle_irq_event+0x38/0x5c)
[ 682.704887] [<c019c64c>] (handle_irq_event) from [<c01a1058>] (handle_fasteoi_irq+0xc8/0x180)
[ 682.713432] [<c01a1058>] (handle_fasteoi_irq) from [<c019b2c0>] (generic_handle_irq+0x30/0x44)
[ 682.722063] [<c019b2c0>] (generic_handle_irq) from [<c019b8f8>] (__handle_domain_irq+0x64/0xdc)
[ 682.730783] [<c019b8f8>] (__handle_domain_irq) from [<c06df4a4>] (gic_handle_irq+0x48/0x9c)
[ 682.739158] [<c06df4a4>] (gic_handle_irq) from [<c0100b30>] (__irq_svc+0x70/0x98)
[ 682.746656] Exception stack(0xe80e9dd8 to 0xe80e9e20)
[ 682.751725] 9dc0: 00000001 e80e8000
[ 682.759922] 9de0: e820cf80 00000000 ffffe000 00000000 eaf08fe4 00000000 600d0013 00000000
[ 682.768117] 9e00: c1732e3c c16093a8 e820d4c0 e80e9e28 c018a57c c018b870 600d0013 ffffffff
[ 682.776315] [<c0100b30>] (__irq_svc) from [<c018b870>] (lock_acquire+0x108/0x4e8)
[ 682.783821] [<c018b870>] (lock_acquire) from [<c0e938e4>] (down_write+0x48/0xa8)
[ 682.791242] [<c0e938e4>] (down_write) from [<c02818dc>] (unlink_file_vma+0x24/0x40)
[ 682.798922] [<c02818dc>] (unlink_file_vma) from [<c027a258>] (free_pgtables+0x34/0xb8)
[ 682.806858] [<c027a258>] (free_pgtables) from [<c02835a4>] (exit_mmap+0xe4/0x170)
[ 682.814361] [<c02835a4>] (exit_mmap) from [<c01248e0>] (mmput+0x5c/0x110)
[ 682.821171] [<c01248e0>] (mmput) from [<c012e910>] (do_exit+0x374/0xbe4)
[ 682.827892] [<c012e910>] (do_exit) from [<c0130888>] (do_group_exit+0x38/0xb4)
[ 682.835132] [<c0130888>] (do_group_exit) from [<c0130914>] (__wake_up_parent+0x0/0x14)
[ 682.843063] irq event stamp: 1936
[ 682.846399] hardirqs last enabled at (1935): [<c02938b0>] rmqueue+0xf4/0xc64
[ 682.853553] hardirqs last disabled at (1936): [<c0100b20>] __irq_svc+0x60/0x98
[ 682.860799] softirqs last enabled at (1878): [<bf04cdcc>] raw_release+0x108/0x1f0 [can_raw]
[ 682.869256] softirqs last disabled at (1876): [<c0b8f478>] release_sock+0x18/0x98
[ 682.876753] ---[ end trace 7bca4751ce44c444 ]---
This patch fixes the problem by replacing the kfree_skb() by
dev_kfree_skb_any(), as rx-offload might be called from threaded IRQ handlers
as well.
Fixes: ca913f1ac0 ("can: rx-offload: can_rx_offload_queue_sorted(): fix error handling, avoid skb mem leak")
Fixes: 6caf8a6d65 ("can: rx-offload: can_rx_offload_queue_tail(): fix error handling, avoid skb mem leak")
Link: http://lore.kernel.org/r/20201019190524.1285319-3-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Pull documentation build warning fixes from Jonathan Corbet:
"This contains a series of warning fixes from Mauro; once applied, the
number of warnings from the once-noisy docs build process is nearly
zero.
Getting to this point has required a lot of work; once there,
hopefully we can keep things that way.
I have packaged this as a separate pull because it does a fair amount
of reaching outside of Documentation/. The changes are all in comments
and in code placement. It's all been in linux-next since last week"
* tag 'docs-5.10-warnings' of git://git.lwn.net/linux: (24 commits)
docs: SafeSetID: fix a warning
amdgpu: fix a few kernel-doc markup issues
selftests: kselftest_harness.h: fix kernel-doc markups
drm: amdgpu_dm: fix a typo
gpu: docs: amdgpu.rst: get rid of wrong kernel-doc markups
drm: amdgpu: kernel-doc: update some adev parameters
docs: fs: api-summary.rst: get rid of kernel-doc include
IB/srpt: docs: add a description for cq_size member
locking/refcount: move kernel-doc markups to the proper place
docs: lockdep-design: fix some warning issues
MAINTAINERS: fix broken doc refs due to yaml conversion
ice: docs fix a devlink info that broke a table
crypto: sun8x-ce*: update entries to its documentation
net: phy: remove kernel-doc duplication
mm: pagemap.h: fix two kernel-doc markups
blk-mq: docs: add kernel-doc description for a new struct member
docs: userspace-api: add iommu.rst to the index file
docs: hwmon: mp2975.rst: address some html build warnings
docs: net: statistics.rst: remove a duplicated kernel-doc
docs: kasan.rst: add two missing blank lines
...
The i2c driver default do dma reset after i2c reset, but sometimes
i2c reset will trigger dma tx2rx, then apdma write data to dram
which has been i2c_put_dma_safe_msg_buf(kfree). Move dma reset
before i2c reset in mtk_i2c_init_hw to fix it.
Signed-off-by: Qii Wang <qii.wang@mediatek.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Bypass the IGD initialization when -ENODEV returns,
that should be the case if opregion is not available for IGD
or within discrete graphics device's option ROM,
or host/lpc bridge is not found.
Then use of -ENODEV here means no special device resources found
which needs special care for VFIO, but we still allow other normal
device resource access.
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Xiong Zhang <xiong.y.zhang@intel.com>
Cc: Hang Yuan <hang.yuan@linux.intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Fred Gao <fred.gao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
pm_runtime_get_sync() will increment pm usage counter even it
failed. Forgetting to call pm_runtime_put will result in
reference leak in vfio_platform_open, so we should fix it.
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Acked-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Fixed compiler warning:
drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c:16:5: warning: no previous
prototype for function 'vfio_fsl_mc_irqs_allocate' [-Wmissing-prototypes]
^
drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c:16:1: note: declare 'static'
if the function is not intended to be used outside of this translation unit
int vfio_fsl_mc_irqs_allocate(struct vfio_fsl_mc_device *vdev)
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
My static analsysis tool complains that the "index" can be negative.
There are some checks in do_mmap() which try to prevent underflows but
I don't know if they are sufficient for this situation. Either way,
making "index" unsigned is harmless so let's do it just to be safe.
Fixes: 6724728968 ("vfio/fsl-mc: Allow userspace to MMAP fsl-mc device MMIO regions")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Pull documentation fixes from Jonathan Corbet:
"A small number of fixes, plus a build tweak to respect the desire for
silence in V=0 builds"
* tag 'docs-5.10-3' of git://git.lwn.net/linux:
docs: fix automarkup regression on Python 2
documentation: arm: sunxi: add Allwinner H6 documents
scripts: kernel-doc: split typedef complex regex
scripts: kernel-doc: fix typedef parsing
docs: Makefile: honor V=0 for docs building
Pull x86 SEV-ES fixes from Borislav Petkov:
"A couple of changes to the SEV-ES code to perform more stringent
hypervisor checks before enabling encryption (Joerg Roedel)"
* tag 'x86_seves_for_v5.10_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev-es: Do not support MMIO to/from encrypted memory
x86/head/64: Check SEV encryption before switching to kernel page-table
x86/boot/compressed/64: Check SEV encryption in 64-bit boot-path
x86/boot/compressed/64: Sanity-check CPUID results in the early #VC handler
x86/boot/compressed/64: Introduce sev_status
The cleanup for the yfs_store_opaque_acl2_operation calls the wrong
function to destroy the ACL content buffer. It's an afs_acl struct, not
a yfs_acl struct - and the free function for latter may pass invalid
pointers to kfree().
Fix this by using the afs_acl_put() function. The yfs_acl_put()
function is then no longer used and can be removed.
general protection fault, probably for non-canonical address 0x7ebde00000000: 0000 [#1] SMP PTI
...
RIP: 0010:compound_head+0x0/0x11
...
Call Trace:
virt_to_cache+0x8/0x51
kfree+0x5d/0x79
yfs_free_opaque_acl+0x16/0x29
afs_put_operation+0x60/0x114
__vfs_setxattr+0x67/0x72
__vfs_setxattr_noperm+0x66/0xe9
vfs_setxattr+0x67/0xce
setxattr+0x14e/0x184
__do_sys_fsetxattr+0x66/0x8f
do_syscall_64+0x2d/0x3a
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When using the afs.yfs.acl xattr to change an AuriStor ACL, a warning
can be generated when the request is marshalled because the buffer
pointer isn't increased after adding the last element, thereby
triggering the check at the end if the ACL wasn't empty. This just
causes something like the following warning, but doesn't stop the call
from happening successfully:
kAFS: YFS.StoreOpaqueACL2: Request buffer underflow (36<108)
Fix this simply by increasing the count prior to the check.
Fixes: f5e4546347 ("afs: Implement YFS ACL setting")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It turns out that the Python 2 re module lacks the ASCII flag, so don't try
to use it there.
Fixes: f66e47f98c ("docs: automarkup.py: Fix regexes to solve sphinx 3 warnings")
Reported-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Under some circumstances in particular with "Reconfigure I/O Path"
a zPCI function may first appear in Standby through a PCI event with
PEC 0x0302 which initially makes it visible to the zPCI subsystem,
Only after that is it configured with a zPCI event with PEC 0x0301.
If the zbus is still missing a PCI function zero (devfn == 0) when the
PCI event 0x0301 is handled zdev->zbus->bus is still NULL and gets
dereferenced in common code.
Check for this case and enable but don't scan the zPCI function.
This matches what would happen if we immediately got the 0x0301
configuration request or the function was included in CLP List PCI.
In all cases the PCI functions with devfn != 0 will be scanned once
function 0 appears.
Fixes: 3047766bc6 ("s390/pci: fix enabling a reserved PCI function")
Cc: <stable@vger.kernel.org> # 5.8
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The call to rcu_cpu_starting() in smp_init_secondary() is not early
enough in the CPU-hotplug onlining process, which results in lockdep
splats as follows:
WARNING: suspicious RCU usage
-----------------------------
kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/1/0.
Call Trace:
show_stack+0x158/0x1f0
dump_stack+0x1f2/0x238
__lock_acquire+0x2640/0x4dd0
lock_acquire+0x3a8/0xd08
_raw_spin_lock_irqsave+0xc0/0xf0
clockevents_register_device+0xa8/0x528
init_cpu_timer+0x33e/0x468
smp_init_secondary+0x11a/0x328
smp_start_secondary+0x82/0x88
This is avoided by moving the call to rcu_cpu_starting up near the
beginning of the smp_init_secondary() function. Note that the
raw_smp_processor_id() is required in order to avoid calling into
lockdep before RCU has declared the CPU to be watched for readers.
Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/
Signed-off-by: Qian Cai <cai@redhat.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
When both the paes and the pkey kernel module are statically build
into the kernel, the paes cipher selftests run before the pkey
kernel module is initialized. So a static variable set in the pkey
init function and used in the pkey_clr2protkey function is not
initialized when the paes cipher's selftests request to call pckmo for
transforming a clear key value into a protected key.
This patch moves the initial setup of the static variable into
the function pck_clr2protkey. So it's possible, to use the function
for transforming a clear to a protected key even before the pkey
init function has been called and the paes selftests may run
successful.
Reported-by: Alexander Egorenkov <Alexander.Egorenkov@ibm.com>
Cc: <stable@vger.kernel.org> # 4.20
Fixes: f822ad2c2c ("s390/pkey: move pckmo subfunction available checks away from module init")
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
pmd/pud_deref() assume that they will never operate on large pmd/pud
entries, and therefore only use the non-large _xxx_ENTRY_ORIGIN mask.
With commit 9ec8fa8dc3 ("s390/vmemmap: extend modify_pagetable()
to handle vmemmap"), that assumption is no longer true, at least for
pmd_deref().
In theory, we could end up with wrong addresses because some of the
non-address bits of a large entry would not be masked out.
In practice, this does not (yet) show any impact, because vmemmap_free()
is currently never used for s390.
Fix pmd/pud_deref() to check for the entry type and use the
_xxx_ENTRY_ORIGIN_LARGE mask for large entries.
While at it, also move pmd/pud_pfn() around, in order to avoid code
duplication, because they do the same thing.
Fixes: 9ec8fa8dc3 ("s390/vmemmap: extend modify_pagetable() to handle vmemmap")
Cc: <stable@vger.kernel.org> # 5.9
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
With the last rework of the AP bus scan function one get_device() is
missing causing the reference counter to be one instance too
low. Together with binding/unbinding device drivers to an ap device it
may end up in an segfault because the ap device is freed but a device
driver still assumes it's pointer to the ap device is valid:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
Fault in home space mode while using kernel ASCE.
Krnl PSW : 0404e00180000000 000000001472f3b6 (klist_next+0x7e/0x180)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Call Trace:
[<000000001472f3b6>] klist_next+0x7e/0x180
([<000000001472f36a>] klist_next+0x32/0x180)
[<00000000147c14de>] bus_for_each_dev+0x66/0xb8
[<0000000014aab0d4>] ap_scan_adapter+0xcc/0x6c0
[<0000000014aab74a>] ap_scan_bus+0x82/0x140
[<0000000013f3b654>] process_one_work+0x27c/0x478
[<0000000013f3b8b6>] worker_thread+0x66/0x368
[<0000000013f44e32>] kthread+0x17a/0x1a0
[<0000000014af23e4>] ret_from_fork+0x24/0x2c
Kernel panic - not syncing: Fatal exception: panic_on_oops
Fixed by adjusting the reference count with get_device() on the right
place. Also now the device drivers don't need to adjust the ap
device's reference counting any more. This is now done in the ap bus
probe and remove functions.
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Fixes: 4f2fcccdb5 ("s390/ap: add card/queue deconfig state")
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Commit 36dadef23f ("kprobes: Init kprobes in early_initcall") enabled
using kprobes from early_initcall. Unfortunately at this point the
hardware debug infrastructure is not operational. The OS lock may still
be locked, and the hardware watchpoints may have unknown values when
kprobe enables debug monitors to single-step instructions.
Rather than using hardware single-step, append a BRK instruction after
the instruction to be executed out-of-line.
Fixes: 36dadef23f ("kprobes: Init kprobes in early_initcall")
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20201103134900.337243-1-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
The "data->flags" variable is a u64 so if one of the high 32 bits is
set the original code will allow it, but it should be rejected. The
fix is to declare "mask" as a u64 instead of a u32.
Fixes: d90573812e ("iommu/uapi: Handle data and argsz filled by users")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20201103101623.GA1127762@mwanda
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Commit fc0e38dae6 ("GFS2: Fix glock deallocation race") fixed a
sd_glock_disposal accounting bug by adding a missing atomic_dec
statement, but it failed to wake up sd_glock_wait when that decrement
causes sd_glock_disposal to reach zero. As a consequence,
gfs2_gl_hash_clear can now run into a 10-minute timeout instead of
being woken up. Add the missing wakeup.
Fixes: fc0e38dae6 ("GFS2: Fix glock deallocation race")
Cc: stable@vger.kernel.org # v2.6.39+
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Certain device drivers allocate IO queues on a per-cpu basis.
On AMD EPYC platform, which can support up-to 256 cpu threads,
this can exceed the current MAX_IRQ_PER_TABLE limit of 256,
and result in the error message:
AMD-Vi: Failed to allocate IRTE
This has been observed with certain NVME devices.
AMD IOMMU hardware can actually support upto 512 interrupt
remapping table entries. Therefore, update the driver to
match the hardware limit.
Please note that this also increases the size of interrupt remapping
table to 8KB per device when using the 128-bit IRTE format.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20201015025002.87997-1-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
22dd1ac91a ("tools: Remove feature-libelf-mmap feature detection")
correctly simplified the this feature detection, but forgot to remove
the call to the removed function in the main() function for the
test-all.c fast path feature detection, making it fail and thus do all
the feature detection individually, fix it.
$ cat /tmp/build/perf/feature/test-all.make.output
test-all.c: In function ‘main’:
test-all.c:188:2: error: implicit declaration of function ‘main_test_libelf_mmap’; did you mean ‘main_test_libelf’? [-Werror=implicit-function-declaration]
188 | main_test_libelf_mmap();
| ^~~~~~~~~~~~~~~~~~~~~
| main_test_libelf
cc1: all warnings being treated as errors
$ vim tools/build/feature/test-all.c
$ rm -rf /tmp/build/perf ; mkdir -p /tmp/build/perf ;make V=1 -k O=/tmp/build/perf -C tools/perf install-bin ; perf test python
<SNIP>
$ cat /tmp/build/perf/feature/test-all.make.output
$
Fixes: 22dd1ac91a ("tools: Remove feature-libelf-mmap feature detection")
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Making perf with gcc-9.1.1 generates the following warning:
CC ui/browsers/hists.o
ui/browsers/hists.c: In function 'perf_evsel__hists_browse':
ui/browsers/hists.c:3078:61: error: '%d' directive output may be \
truncated writing between 1 and 11 bytes into a region of size \
between 2 and 12 [-Werror=format-truncation=]
3078 | "Max event group index to sort is %d (index from 0 to %d)",
| ^~
ui/browsers/hists.c:3078:7: note: directive argument in the range [-2147483648, 8]
3078 | "Max event group index to sort is %d (index from 0 to %d)",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/stdio.h:937,
from ui/browsers/hists.c:5:
IOW, the string in line 3078 might be too long for buf[] of 64 bytes.
Fix this by increasing the size of buf[] to 128.
Fixes: dbddf17474 ("perf report/top TUI: Support hotkeys to let user select any event for sorting")
Signed-off-by: Song Liu <songliubraving@fb.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: stable@vger.kernel.org # v5.7+
Link: http://lore.kernel.org/lkml/20201030235431.534417-1-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
dab741e0e0 ("Add a "nosymfollow" mount option.")
That ends up adding support for the new MS_NOSYMFOLLOW mount flag:
$ tools/perf/trace/beauty/mount_flags.sh > before
$ cp include/uapi/linux/mount.h tools/include/uapi/linux/mount.h
$ tools/perf/trace/beauty/mount_flags.sh > after
$ diff -u before after
--- before 2020-11-03 08:51:28.117997454 -0300
+++ after 2020-11-03 08:51:38.992218869 -0300
@@ -7,6 +7,7 @@
[32 ? (ilog2(32) + 1) : 0] = "REMOUNT",
[64 ? (ilog2(64) + 1) : 0] = "MANDLOCK",
[128 ? (ilog2(128) + 1) : 0] = "DIRSYNC",
+ [256 ? (ilog2(256) + 1) : 0] = "NOSYMFOLLOW",
[1024 ? (ilog2(1024) + 1) : 0] = "NOATIME",
[2048 ? (ilog2(2048) + 1) : 0] = "NODIRATIME",
[4096 ? (ilog2(4096) + 1) : 0] = "BIND",
$
So now one can use it in --filter expressions for tracepoints.
This silences this perf build warnings:
Warning: Kernel ABI header at 'tools/include/uapi/linux/mount.h' differs from latest version at 'include/uapi/linux/mount.h'
diff -u tools/include/uapi/linux/mount.h include/uapi/linux/mount.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mattias Nissler <mnissler@chromium.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The diff is just tabs versus spaces, trivial.
This silences this perf tools build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h'
diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some should cause changes in tooling, like the one adding LAST_EXCP, but
the way it is structured end up not making that happen.
The new SVM_EXIT_INVPCID should get used by arch/x86/util/kvm-stat.c,
in the svm_exit_reasons table.
The tools/perf/trace/beauty part has scripts to catch changes and
automagically create tables, like tools/perf/trace/beauty/kvm_ioctl.sh,
but changes are needed to make tools/perf/arch/x86/util/kvm-stat.c catch
those automatically.
These were handled by the existing scripts:
$ tools/perf/trace/beauty/kvm_ioctl.sh > before
$ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h
$ tools/perf/trace/beauty/kvm_ioctl.sh > after
$ diff -u before after
--- before 2020-11-03 08:43:52.910728608 -0300
+++ after 2020-11-03 08:44:04.273959984 -0300
@@ -89,6 +89,7 @@
[0xbf] = "SET_NESTED_STATE",
[0xc0] = "CLEAR_DIRTY_LOG",
[0xc1] = "GET_SUPPORTED_HV_CPUID",
+ [0xc6] = "X86_SET_MSR_FILTER",
[0xe0] = "CREATE_DEVICE",
[0xe1] = "SET_DEVICE_ATTR",
[0xe2] = "GET_DEVICE_ATTR",
$
$ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before
$ cp include/uapi/linux/vhost.h tools/include/uapi/linux/vhost.h
$
$ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after
$ diff -u before after
--- before 2020-11-03 08:45:55.522225198 -0300
+++ after 2020-11-03 08:46:12.881578666 -0300
@@ -37,4 +37,5 @@
[0x71] = "VDPA_GET_STATUS",
[0x73] = "VDPA_GET_CONFIG",
[0x76] = "VDPA_GET_VRING_NUM",
+ [0x78] = "VDPA_GET_IOVA_RANGE",
};
$
This addresses these perf build warnings:
Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
Warning: Kernel ABI header at 'tools/arch/s390/include/uapi/asm/sie.h' differs from latest version at 'arch/s390/include/uapi/asm/sie.h'
diff -u tools/arch/s390/include/uapi/asm/sie.h arch/s390/include/uapi/asm/sie.h
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/svm.h' differs from latest version at 'arch/x86/include/uapi/asm/svm.h'
diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h'
diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
e47168f3d1 ("powerpc/8xx: Support 16k hugepages with 4k pages")
That don't cause any changes in tooling, just addresses this perf build
warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/mman.h' differs from latest version at 'include/uapi/linux/mman.h'
diff -u tools/include/uapi/linux/mman.h include/uapi/linux/mman.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
ecac71816a ("x86/paravirt: Use CONFIG_PARAVIRT_XXL instead of CONFIG_PARAVIRT")
That don entail any changes in tooling, just addressing these perf tools
build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/required-features.h' differs from latest version at 'arch/x86/include/asm/required-features.h'
diff -u tools/arch/x86/include/asm/required-features.h arch/x86/include/asm/required-features.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
5866e9205b ("x86/cpu: Add hardware-enforced cache coherency as a CPUID feature")
ff4f82816d ("x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions")
360e7c5c4c ("x86/cpufeatures: Add SEV-ES CPU feature")
18ec63faef ("x86/cpufeatures: Enumerate TSX suspend load address tracking instructions")
e48cb1a3fb ("x86/resctrl: Enumerate per-thread MBA controls")
Which don't cause any changes in tooling, just addresses these build
warnings:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Cc: Kyung Min Park <kyung.min.park@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes from:
c7f0207b61 ("fscrypt: make "#define fscrypt_policy" user-only")
That don't cause any changes in tools/perf, only addresses this perf
tools build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/fscrypt.h' differs from latest version at 'include/uapi/linux/fscrypt.h'
diff -u tools/include/uapi/linux/fscrypt.h include/uapi/linux/fscrypt.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
13149e8baf ("drm/i915: add syncobj timeline support")
cda9edd024 ("drm/i915: introduce a mechanism to extend execbuf2")
That don't result in any changes in tooling, just silences this perf
build warning:
Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes in:
1c101da8b9 ("arm64: mte: Allow user control of the tag check mode via prctl()")
af5ce95282 ("arm64: mte: Allow user control of the generated random tags via prctl()")
Which don't cause any change in tooling, only addresses this perf build
warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/prctl.h' differs from latest version at 'include/uapi/linux/prctl.h'
diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To avoid this:
util/scripting-engines/trace-event-python.c: In function 'python_start_script':
util/scripting-engines/trace-event-python.c:1595:2: error: 'visibility' attribute ignored [-Werror=attributes]
1595 | PyMODINIT_FUNC (*initfunc)(void);
| ^~~~~~~~~~~~~~
That started breaking when building with PYTHON=python3 and these gcc
versions (I haven't checked with the clang ones, maybe it breaks there
as well):
# export PERF_TARBALL=http://192.168.86.5/perf/perf-5.9.0.tar.xz
# dm fedora:33 fedora:rawhide
1 107.80 fedora:33 : Ok gcc (GCC) 10.2.1 20201005 (Red Hat 10.2.1-5), clang version 11.0.0 (Fedora 11.0.0-1.fc33)
2 92.47 fedora:rawhide : Ok gcc (GCC) 10.2.1 20201016 (Red Hat 10.2.1-6), clang version 11.0.0 (Fedora 11.0.0-1.fc34)
#
Avoid that by ditching that 'initfunc' function pointer with its:
#define Py_EXPORTED_SYMBOL _attribute_ ((visibility ("default")))
#define PyMODINIT_FUNC Py_EXPORTED_SYMBOL PyObject*
And just call PyImport_AppendInittab() at the end of the ifdef python3
block with the functions that were being attributed to that initfunc.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian reports an issue that the metric DRAM_BW_Use often remains 0.
The metric expression for DRAM_BW_Use on CLX/SKX:
"( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time"
The counts of uncore_imc/cas_count_read/ and uncore_imc/cas_count_write/
are scaled up by 64, that is to turn a count of cache lines into bytes,
the count is then divided by 1000000000 to give GB.
However, the counts of uncore_imc/cas_count_read/ and
uncore_imc/cas_count_write/ have been scaled yet.
The scale values are from sysfs, such as
/sys/devices/uncore_imc_0/events/cas_count_read.scale.
It's 6.103515625e-5 (64 / 1024.0 / 1024.0).
So if we use original metric expression, the result is not correct.
But the difficulty is, for SKL client, the counts are not scaled.
The metric expression for DRAM_BW_Use on SKL:
"64 * ( arb@event\\=0x81\\,umask\\=0x1@ + arb@event\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000"
root@kbl-ppc:~# perf stat -M DRAM_BW_Use -a -- sleep 1
Performance counter stats for 'system wide':
190 arb/event=0x84,umask=0x1/ # 1.86 DRAM_BW_Use
29,093,178 arb/event=0x81,umask=0x1/
1,000,703,287 ns duration_time
1.000703287 seconds time elapsed
The result is expected.
So the easy way is just change the metric expression for CLX/SKX.
This patch changes the metric expression to:
"( ( ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) * 1048576 ) / 1000000000 ) / duration_time"
1048576 = 1024 * 1024.
Before (tested on CLX):
root@lkp-csl-2sp5 ~# perf stat -M DRAM_BW_Use -a -- sleep 1
Performance counter stats for 'system wide':
765.35 MiB uncore_imc/cas_count_read/ # 0.00 DRAM_BW_Use
5.42 MiB uncore_imc/cas_count_write/
1001515088 ns duration_time
1.001515088 seconds time elapsed
After:
root@lkp-csl-2sp5 ~# perf stat -M DRAM_BW_Use -a -- sleep 1
Performance counter stats for 'system wide':
767.95 MiB uncore_imc/cas_count_read/ # 0.80 DRAM_BW_Use
5.02 MiB uncore_imc/cas_count_write/
1001900010 ns duration_time
1.001900010 seconds time elapsed
Fixes: 038d3b53c2 ("perf vendor events intel: Update CascadelakeX events to v1.08")
Fixes: b5ff7f2799 ("perf vendor events: Update SkylakeX events to v1.21")
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20201023005334.7869-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The addr in PERF_RECORD_KSYMBOL events for non-jited bpf progs points to
the bpf interpreter, ie. within kernel text section. When processing the
unregister event, this causes unexpected removal of vmlinux_map,
crashing perf later in cleanup:
# perf record -- timeout --signal=INT 2s /usr/share/bcc/tools/execsnoop
PCOMM PID PPID RET ARGS
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.208 MB perf.data (5155 samples) ]
perf: tools/include/linux/refcount.h:131: refcount_sub_and_test: Assertion `!(new > val)' failed.
Aborted (core dumped)
# perf script -D|grep KSYM
0 0xa40 [0x48]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b530 len 0 type 1 flags 0x0 name bpf_prog_f958f6eb72ef5af6
0 0xab0 [0x48]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b530 len 0 type 1 flags 0x0 name bpf_prog_8c42dee26e8cd4c2
0 0xb20 [0x48]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b530 len 0 type 1 flags 0x0 name bpf_prog_f958f6eb72ef5af6
108563691893 0x33d98 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3b0 len 0 type 1 flags 0x0 name bpf_prog_bc5697a410556fc2_syscall__execve
108568518458 0x34098 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3f0 len 0 type 1 flags 0x0 name bpf_prog_45e2203c2928704d_do_ret_sys_execve
109301967895 0x34830 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3b0 len 0 type 1 flags 0x1 name bpf_prog_bc5697a410556fc2_syscall__execve
109302007356 0x348b0 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3f0 len 0 type 1 flags 0x1 name bpf_prog_45e2203c2928704d_do_ret_sys_execve
perf: tools/include/linux/refcount.h:131: refcount_sub_and_test: Assertion `!(new > val)' failed.
Here the addresses match the bpf interpreter:
# grep -e ffffffffa9b6b530 -e ffffffffa9b6b3b0 -e ffffffffa9b6b3f0 /proc/kallsyms
ffffffffa9b6b3b0 t __bpf_prog_run224
ffffffffa9b6b3f0 t __bpf_prog_run192
ffffffffa9b6b530 t __bpf_prog_run32
Fix by not allowing vmlinux_map to be removed by PERF_RECORD_KSYMBOL
unregister event.
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201016114718.54332-1-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
ecb8ac8b1f ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
That addresses these perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
85367030a6 ("libbpf: Centralize poisoning and poison reallocarray()")
7d9c71e10b ("libbpf: Extract generic string hashing function for reuse")
That don't entail any changes in tools/perf.
This addresses this perf build warning:
Warning: Kernel ABI header at 'tools/perf/util/hashmap.h' differs from latest version at 'tools/lib/bpf/hashmap.h'
diff -u tools/perf/util/hashmap.h tools/lib/bpf/hashmap.h
Not a kernel ABI, its just that this uses the mechanism in place for
checking kernel ABI files drift.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To avoid breaking the build by mixing files compiled with things coming
from distro specific compiler options for perl with the rest of perf,
i.e. to avoid this:
`.gnu.debuglto_.debug_macro' referenced in section `.gnu.debuglto_.debug_macro' of /tmp/build/perf/util/scripting-engines/perf-in.o: defined in discarded section `.gnu.debuglto_.debug_macro[wm4.stdcpredef.h.19.8dc41bed5d9037ff9622e015fb5f0ce3]' of /tmp/build/perf/util/scripting-engines/perf-in.o
Noticed on Fedora 33.
Signed-off-by: Justin M. Forbes <jforbes@fedoraproject.org>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1593431
Cc: Jiri Olsa <jolsa@redhat.com>
Link: 589a32b62f
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 6735b4632d ("Fonts: Support FONT_EXTRA_WORDS macros for built-in
fonts") introduced the following error when building rpc_defconfig (only
this build appears to be affected):
`acorndata_8x8' referenced in section `.text' of arch/arm/boot/compressed/ll_char_wr.o:
defined in discarded section `.data' of arch/arm/boot/compressed/font.o
`acorndata_8x8' referenced in section `.data.rel.ro' of arch/arm/boot/compressed/font.o:
defined in discarded section `.data' of arch/arm/boot/compressed/font.o
make[3]: *** [/scratch/linux/arch/arm/boot/compressed/Makefile:191: arch/arm/boot/compressed/vmlinux] Error 1
make[2]: *** [/scratch/linux/arch/arm/boot/Makefile:61: arch/arm/boot/compressed/vmlinux] Error 2
make[1]: *** [/scratch/linux/arch/arm/Makefile:317: zImage] Error 2
The .data section is discarded at link time. Reinstating acorndata_8x8 as
const ensures it is still available after linking. Do the same for the
other 12 built-in fonts as well, for consistency purposes.
Cc: <stable@vger.kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: 6735b4632d ("Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts")
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Co-developed-by: Peilin Ye <yepeilin.cs@gmail.com>
Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20201102183242.2031659-1-yepeilin.cs@gmail.com
The request may be executed asynchronously, and rq->state may be
changed to IDLE. To avoid repeated request completion, only
MQ_RQ_COMPLETE of rq->state is checked in nvme_tcp_complete_timed_out.
It is not safe, so need adding check IDLE for rq->state.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The request may be executed asynchronously, and rq->state may be
changed to IDLE. To avoid repeated request completion, only
MQ_RQ_COMPLETE of rq->state is checked in nvme_rdma_complete_timed_out.
It is not safe, so need adding check IDLE for rq->state.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Now use teardown_lock to serialize for time out and tear down. This may
cause abnormal: first cancel all request in tear down, then time out may
complete the request again, but the request may already be freed or
restarted.
To avoid race between time out and tear down, in tear down process,
first we quiesce the queue, and then delete the timer and cancel
the time out work for the queue. At the same time we need to delete
teardown_lock.
Signed-off-by: Chao Leng <lengchao@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Now use teardown_lock to serialize for time out and tear down. This may
cause abnormal: first cancel all request in tear down, then time out may
complete the request again, but the request may already be freed or
restarted.
To avoid race between time out and tear down, in tear down process,
first we quiesce the queue, and then delete the timer and cancel
the time out work for the queue. At the same time we need to delete
teardown_lock.
Signed-off-by: Chao Leng <lengchao@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Introduce sync io queues for some scenarios which just only need sync
io queues not sync all queues.
Signed-off-by: Chao Leng <lengchao@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When unloading the call to pm_runtime_put_sync_suspend() will attempt to
turn the GPU cores off, however panfrost_device_fini() will have turned
the clocks off. This leads to the hardware locking up.
Instead don't call pm_runtime_put_sync_suspend() and instead simply mark
the device as suspended using pm_runtime_set_suspended(). And also
include this on the error path in panfrost_probe().
Fixes: aebe8c22a9 ("drm/panfrost: Fix possible suspend in panfrost_remove")
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201030145833.29006-1-steven.price@arm.com
Currently, LOG_BUF_SHIFT defaults to 17, which is 2 ^ 17 bytes = 128 KB,
and LOG_CPU_MAX_BUF_SHIFT defaults to 12, which is 2 ^ 12 bytes = 4 KB.
Half of 128 KB is 64 KB, so more than 16 CPUs are required for the value
to be used, as then the sum of contributions is greater than 64 KB for
the first time. My guess is, that the description was written with the
configuration values used in the SUSE in mind.
Fixes: 23b2899f7f ("printk: allow increasing the ring buffer depending on the number of CPUs")
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200811092924.6256-1-pmenzel@molgen.mpg.de
Commit 5a18e1e0c1 introduced the 'failover_pending' state to track
the "failover pending window" - where we wait for the partner to become
ready (after a transport event) before actually attempting to failover.
i.e window is between following two events:
a. we get a transport event due to a FAILOVER
b. later, we get CRQ_INITIALIZED indicating the partner is
ready at which point we schedule a FAILOVER reset.
and ->failover_pending is true during this window.
If during this window, we attempt to open (or close) a device, we pretend
that the operation succeded and let the FAILOVER reset path complete the
operation.
This is fine, except if the transport event ("a" above) occurs during the
open and after open has already checked whether a failover is pending. If
that happens, we fail the open, which can cause the boot scripts to leave
the interface down requiring administrator to manually bring up the device.
This fix "extends" the failover pending window till we are _actually_
ready to perform the failover reset (i.e until after we get the RTNL
lock). Since open() holds the RTNL lock, we can be sure that we either
finish the open or if the open() fails due to the failover pending window,
we can again pretend that open is done and let the failover complete it.
We could try and block the open until failover is completed but a) that
could still timeout the application and b) Existing code "pretends" that
failover occurred "just after" open succeeded, so marks the open successful
and lets the failover complete the open. So, mark the open successful even
if the transport event occurs before we actually start the open.
Fixes: 5a18e1e0c1 ("ibmvnic: Fix failover case for non-redundant configuration")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Acked-by: Dany Madden <drt@linux.ibm.com>
Link: https://lore.kernel.org/r/20201030170711.1562994-1-sukadev@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While reenabling the IRQ after irq poll there may be small time window
where HBA firmware has posted some replies and raise the interrupts but
driver has not received the interrupts. So we may observe I/O timeouts as
the driver has not processed the replies as interrupts got missed while
reenabling the IRQ.
To fix this issue the driver has to go for one more round of processing the
reply descriptors from reply descriptor post queue after enabling the IRQ.
Link: https://lore.kernel.org/r/20201102072746.27410-1-sreekanth.reddy@broadcom.com
Reported-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 978aa04741 ("sctp: fix some type cast warnings introduced since
very beginning")' broke err reading from sctp_arg, because it reads the
value as 32-bit integer, although the value is stored as 16-bit integer.
Later this value is passed to the userspace in 16-bit variable, thus the
user always gets 0 on big-endian platforms. Fix it by reading the __u16
field of sctp_arg union, as reading err field would produce a sparse
warning.
Fixes: 978aa04741 ("sctp: fix some type cast warnings introduced since very beginning")
Signed-off-by: Petr Malat <oss@malat.biz>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/20201030132633.7045-1-oss@malat.biz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some of the font tty ioctl's always used the current foreground VC for
their operations. Don't do that then.
This fixes a data race on fg_console.
Side note: both Michael Ellerman and Jiri Slaby point out that all these
ioctls are deprecated, and should probably have been removed long ago,
and everything seems to be using the KDFONTOP ioctl instead.
In fact, Michael points out that it looks like busybox's loadfont
program seems to have switched over to using KDFONTOP exactly _because_
of this bug (ahem.. 12 years ago ;-).
Reported-by: Minh Yuan <yuanmingbuaa@gmail.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Jiri Slaby <jirislaby@kernel.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge misc fixes from Andrew Morton:
"Subsystems affected by this patch series: mm (memremap, memcg,
slab-generic, kasan, mempolicy, pagecache, oom-kill, pagemap),
kthread, signals, lib, epoll, and core-kernel"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kernel/hung_task.c: make type annotations consistent
epoll: add a selftest for epoll timeout race
mm: always have io_remap_pfn_range() set pgprot_decrypted()
mm, oom: keep oom_adj under or at upper limit when printing
kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled
mm/truncate.c: make __invalidate_mapping_pages() static
lib/crc32test: remove extra local_irq_disable/enable
ptrace: fix task_join_group_stop() for the case when current is traced
mm: mempolicy: fix potential pte_unmap_unlock pte error
kasan: adopt KUNIT tests to SW_TAGS mode
mm: memcg: link page counters to root if use_hierarchy is false
mm: memcontrol: correct the NR_ANON_THPS counter of hierarchical memcg
hugetlb_cgroup: fix reservation accounting
mm/mremap_pages: fix static key devmap_managed_key updates
If bits is 0, the case when the map is empty, then the >> is the size of
the register which is undefined behavior - on x86 it is the same as a
shift by 0.
Fix by handling the 0 case explicitly and guarding calls to hash_bits for
empty maps in hashmap__for_each_key_entry and hashmap__for_each_entry_safe.
Fixes: e3b9242240 ("libbpf: add resizable non-thread safe internal hashmap")
Suggested-by: Andrii Nakryiko <andriin@fb.com>,
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201029223707.494059-1-irogers@google.com
The TI CPTS does not natively support PTPv1, only PTPv2. But, as it
happens, the CPTS can provide HW timestamp for PTPv1 Sync messages, because
CPTS HW parser looks for PTP messageType id in PTP message octet 0 which
value is 0 for PTPv1. As result, CPTS HW can detect Sync messages for PTPv1
and PTPv2 (Sync messageType = 0 for both), but it fails for any other PTPv1
messages (Delay_req/resp) and will return PTP messageType id 0 for them.
The commit e9523a5a32 ("net: ethernet: ti: cpsw: enable
HWTSTAMP_FILTER_PTP_V1_L4_EVENT filter") added PTPv1 hw timestamping
advertisement by mistake, only to make Linux Kernel "timestamping" utility
work, and this causes issues with only PTPv1 compatible HW/SW - Sync HW
timestamped, but Delay_req/resp are not.
Hence, fix it disabling PTPv1 hw timestamping advertisement, so only PTPv1
compatible HW/SW can properly roll back to SW timestamping.
Fixes: e9523a5a32 ("net: ethernet: ti: cpsw: enable HWTSTAMP_FILTER_PTP_V1_L4_EVENT filter")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20201029190910.30789-1-grygorii.strashko@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The copy_to_user() function returns the number of bytes remaining to be
copied, but this code should return -EFAULT.
Fixes: df747bcd5b ("vfio/fsl-mc: Implement VFIO_DEVICE_GET_REGION_INFO ioctl call")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
When attaching a new group to the container, let's use the new helper
vfio_iommu_find_iommu_group() to check if it's already attached. There
is no functional change.
Also take this chance to add a missing blank line.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
parse_synth_field() returns a pointer and requires that errors get
surrounded by ERR_PTR(). The ret variable is initialized to zero, but should
never be used as zero, and if it is, it could cause a false return code and
produce a NULL pointer dereference. It makes no sense to set ret to zero.
Set ret to -ENOMEM (the most common error case), and have any other errors
set it to something else. This removes the need to initialize ret on *every*
error branch.
Fixes: 761a8c58db ("tracing, synthetic events: Replace buggy strcat() with seq_buf operations")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The recursion protection of the ring buffer depends on preempt_count() to be
correct. But it is possible that the ring buffer gets called after an
interrupt comes in but before it updates the preempt_count(). This will
trigger a false positive in the recursion code.
Use the same trick from the ftrace function callback recursion code which
uses a "transition" bit that gets set, to allow for a single recursion for
to handle transitions between contexts.
Cc: stable@vger.kernel.org
Fixes: 567cd4da54 ("ring-buffer: User context bit recursion checking")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Right now, we can end up calling cancel_delayed_work_sync from within
delete_work_func via gfs2_lookup_by_inum -> gfs2_inode_lookup ->
gfs2_cancel_delete_work. When that happens, it will result in a
deadlock. Instead, gfs2_inode_lookup should skip the call to
gfs2_cancel_delete_work when called from delete_work_func (blktype ==
GFS2_BLKST_UNLINKED).
Reported-by: Alexander Ahring Oder Aring <aahringo@redhat.com>
Fixes: a0e3cc65fa ("gfs2: Turn gl_delete into a delayed work")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Commit 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
removed various __user annotations from function signatures as part of
its refactoring.
It also removed the __user annotation for proc_dohung_task_timeout_secs()
at its declaration in sched/sysctl.h, but not at its definition in
kernel/hung_task.c.
Hence, sparse complains:
kernel/hung_task.c:271:5: error: symbol 'proc_dohung_task_timeout_secs' redeclared with different type (incompatible argument 3 (different address spaces))
Adjust the annotation at the definition fitting to that refactoring to make
sparse happy again, which also resolves this warning from sparse:
kernel/hung_task.c:277:52: warning: incorrect type in argument 3 (different address spaces)
kernel/hung_task.c:277:52: expected void *
kernel/hung_task.c:277:52: got void [noderef] __user *buffer
No functional change. No change in object code.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrey Ignatov <rdna@fb.com>
Link: https://lkml.kernel.org/r/20201028130541.20320-1-lukas.bulwahn@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This testcase
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <pthread.h>
#include <assert.h>
void *tf(void *arg)
{
return NULL;
}
int main(void)
{
int pid = fork();
if (!pid) {
kill(getpid(), SIGSTOP);
pthread_t th;
pthread_create(&th, NULL, tf, NULL);
return 0;
}
waitpid(pid, NULL, WSTOPPED);
ptrace(PTRACE_SEIZE, pid, 0, PTRACE_O_TRACECLONE);
waitpid(pid, NULL, 0);
ptrace(PTRACE_CONT, pid, 0,0);
waitpid(pid, NULL, 0);
int status;
int thread = waitpid(-1, &status, 0);
assert(thread > 0 && thread != pid);
assert(status == 0x80137f);
return 0;
}
fails and triggers WARN_ON_ONCE(!signr) in do_jobctl_trap().
This is because task_join_group_stop() has 2 problems when current is traced:
1. We can't rely on the "JOBCTL_STOP_PENDING" check, a stopped tracee
can be woken up by debugger and it can clone another thread which
should join the group-stop.
We need to check group_stop_count || SIGNAL_STOP_STOPPED.
2. If SIGNAL_STOP_STOPPED is already set, we should not increment
sig->group_stop_count and add JOBCTL_STOP_CONSUME. The new thread
should stop without another do_notify_parent_cldstop() report.
To clarify, the problem is very old and we should blame
ptrace_init_task(). But now that we have task_join_group_stop() it makes
more sense to fix this helper to avoid the code duplication.
Reported-by: syzbot+3485e3773f7da290eecc@syzkaller.appspotmail.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christian Brauner <christian@brauner.io>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201019134237.GA18810@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When flags in queue_pages_pte_range don't have MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL bits, code breaks and passing origin pte - 1 to
pte_unmap_unlock seems like not a good idea.
queue_pages_pte_range can run in MPOL_MF_MOVE_ALL mode which doesn't
migrate misplaced pages but returns with EIO when encountering such a
page. Since commit a7f40cfe3b ("mm: mempolicy: make mbind() return
-EIO when MPOL_MF_STRICT is specified") and early break on the first pte
in the range results in pte_unmap_unlock on an underflow pte. This can
lead to lockups later on when somebody tries to lock the pte resp.
page_table_lock again..
Fixes: a7f40cfe3b ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified")
Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Feilong Lin <linfeilong@huawei.com>
Cc: Shijie Luo <luoshijie1@huawei.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201019074853.50856-1-luoshijie1@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that we have KASAN-KUNIT tests integration, it's easy to see that
some KASAN tests are not adopted to the SW_TAGS mode and are failing.
Adjust the allocation size for kasan_memchr() and kasan_memcmp() by
roung it up to OOB_TAG_OFF so the bad access ends up in a separate
memory granule.
Add a new kmalloc_uaf_16() tests that relies on UAF, and a new
kasan_bitops_tags() test that is tailored to tag-based mode, as it's
hard to adopt the existing kmalloc_oob_16() and kasan_bitops_generic()
(renamed from kasan_bitops()) without losing the precision.
Add new kmalloc_uaf_16() and kasan_bitops_uaf() tests that rely on UAFs,
as it's hard to adopt the existing kmalloc_oob_16() and
kasan_bitops_oob() (rename from kasan_bitops()) without losing the
precision.
Disable kasan_global_oob() and kasan_alloca_oob_left/right() as SW_TAGS
mode doesn't instrument globals nor dynamic allocas.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: David Gow <davidgow@google.com>
Link: https://lkml.kernel.org/r/76eee17b6531ca8b3ca92b240cb2fd23204aaff7.1603129942.git.andreyknvl@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard reported a warning which can be reproduced by running the LTP
madvise6 test (cgroup v1 in the non-hierarchical mode should be used):
WARNING: CPU: 0 PID: 12 at mm/page_counter.c:57 page_counter_uncharge (mm/page_counter.c:57 mm/page_counter.c:50 mm/page_counter.c:156)
Modules linked in:
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.9.0-rc7-22-default #77
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812d-rebuilt.opensuse.org 04/01/2014
Workqueue: events drain_local_stock
RIP: 0010:page_counter_uncharge (mm/page_counter.c:57 mm/page_counter.c:50 mm/page_counter.c:156)
Call Trace:
__memcg_kmem_uncharge (mm/memcontrol.c:3022)
drain_obj_stock (./include/linux/rcupdate.h:689 mm/memcontrol.c:3114)
drain_local_stock (mm/memcontrol.c:2255)
process_one_work (./arch/x86/include/asm/jump_label.h:25 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2274)
worker_thread (./include/linux/list.h:282 kernel/workqueue.c:2416)
kthread (kernel/kthread.c:292)
ret_from_fork (arch/x86/entry/entry_64.S:300)
The problem occurs because in the non-hierarchical mode non-root page
counters are not linked to root page counters, so the charge is not
propagated to the root memory cgroup.
After the removal of the original memory cgroup and reparenting of the
object cgroup, the root cgroup might be uncharged by draining a objcg
stock, for example. It leads to an eventual underflow of the charge and
triggers a warning.
Fix it by linking all page counters to corresponding root page counters
in the non-hierarchical mode.
Please note, that in the non-hierarchical mode all objcgs are always
reparented to the root memory cgroup, even if the hierarchy has more
than 1 level. This patch doesn't change it.
The patch also doesn't affect how the hierarchical mode is working,
which is the only sane and truly supported mode now.
Thanks to Richard for reporting, debugging and providing an alternative
version of the fix!
Fixes: bf4f059954 ("mm: memcg/slab: obj_cgroup API")
Reported-by: <ltp@lists.linux.it>
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201026231326.3212225-1-guro@fb.com
Debugged-by: Richard Palethorpe <rpalethorpe@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Privoznik was using "free page reporting" in QEMU/virtio-balloon
with hugetlbfs and hit the warning below. QEMU with free page hinting
uses fallocate(FALLOC_FL_PUNCH_HOLE) to discard pages that are reported
as free by a VM. The reporting granularity is in pageblock granularity.
So when the guest reports 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE)
one huge page in QEMU.
WARNING: CPU: 7 PID: 6636 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x50
Modules linked in: ...
CPU: 7 PID: 6636 Comm: qemu-system-x86 Not tainted 5.9.0 #137
Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F21 07/31/2020
RIP: 0010:page_counter_uncharge+0x4b/0x50
...
Call Trace:
hugetlb_cgroup_uncharge_file_region+0x4b/0x80
region_del+0x1d3/0x300
hugetlb_unreserve_pages+0x39/0xb0
remove_inode_hugepages+0x1a8/0x3d0
hugetlbfs_fallocate+0x3c4/0x5c0
vfs_fallocate+0x146/0x290
__x64_sys_fallocate+0x3e/0x70
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Investigation of the issue uncovered bugs in hugetlb cgroup reservation
accounting. This patch addresses the found issues.
Fixes: 075a61d07a ("hugetlb_cgroup: add accounting for shared mappings")
Reported-by: Michal Privoznik <mprivozn@redhat.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Link: https://lkml.kernel.org/r/20201021204426.36069-1-mike.kravetz@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit 6f42193fd8 ("memremap: don't use a separate devm action for
devmap_managed_enable_get") changed the static key updates such that we
now call devmap_managed_enable_put() without doing the equivalent
devmap_managed_enable_get().
devmap_managed_enable_get() is only called for MEMORY_DEVICE_PRIVATE and
MEMORY_DEVICE_FS_DAX, But memunmap_pages() get called for other pgmap
types too. This results in the below warning when switching between
system-ram and devdax mode for devdax namespace.
jump label: negative count!
WARNING: CPU: 52 PID: 1335 at kernel/jump_label.c:235 static_key_slow_try_dec+0x88/0xa0
Modules linked in:
....
NIP static_key_slow_try_dec+0x88/0xa0
LR static_key_slow_try_dec+0x84/0xa0
Call Trace:
static_key_slow_try_dec+0x84/0xa0
__static_key_slow_dec_cpuslocked+0x34/0xd0
static_key_slow_dec+0x54/0xf0
memunmap_pages+0x36c/0x500
devm_action_release+0x30/0x50
release_nodes+0x2f4/0x3e0
device_release_driver_internal+0x17c/0x280
bus_remove_device+0x124/0x210
device_del+0x1d4/0x530
unregister_dev_dax+0x48/0xe0
devm_action_release+0x30/0x50
release_nodes+0x2f4/0x3e0
device_release_driver_internal+0x17c/0x280
unbind_store+0x130/0x170
drv_attr_store+0x40/0x60
sysfs_kf_write+0x6c/0xb0
kernfs_fop_write+0x118/0x280
vfs_write+0xe8/0x2a0
ksys_write+0x84/0x140
system_call_exception+0x120/0x270
system_call_common+0xf0/0x27c
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Link: https://lkml.kernel.org/r/20201023183222.13186-1-rcampbell@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ARC HSDK platform stopped booting on released v5.10-rc1, getting stuck
in startup of non master SMP cores.
This was bisected to upstream commit 7fef431be9
"(mm/page_alloc: place pages to tail in __free_pages_core())"
That commit itself is harmless, it just exposed a subtle assumption in
our platform code (hence CC'ing linux-mm just as FYI in case some other
arches / platforms trip on it).
The upstream commit is semantically disruptive as it reverses the order
of page allocations (actually it can be good test for hardware
verification to exercise different memory patterns altogether).
For ARC HSDK platform that meant a remapped memory region (pertaining to
unused Closely Coupled Memory) started getting used early for dynamice
allocations, while not effectively remapped on all the cores, triggering
memory error exception on those cores.
The fix is to move the CCM remapping from early platform code to to early core
boot code. And while it is undesirable to riddle common boot code with
platform quirks, there is no other way to do this since the faltering code
involves setting up stack itself so even function calls are not allowed at
that point.
If anyone is interested, all the gory details can be found at Link below.
Link: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/32
Cc: David Hildenbrand <david@redhat.com>
Cc: linux-mm@kvack.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Failure in srpt_refresh_port() for the second port will leave MAD
registered for the first one, however, the srpt_add_one() will be marked
as "failed" and SRPT will leak resources for that registered but not used
and released first port.
Unregister the MAD agent for all ports in case of failure.
Fixes: a42d985bd5 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Link: https://lore.kernel.org/r/20201028065051.112430-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Camelia Groza says:
====================
dpaa_eth: buffer layout fixes
The patches are related to the software workaround for the A050385 erratum.
The first patch ensures optimal buffer usage for non-erratum scenarios. The
second patch fixes a currently inconsequential discrepancy between the
FMan and Ethernet drivers.
Changes in v3:
- refactor defines for clarity in 1/2
- add more details on the user impact in 1/2
- remove unnecessary inline identifier in 2/2
Changes in v2:
- make the returned value for TX ports explicit in 2/2
- simplify the buf_layout reference in 2/2
====================
Link: https://lore.kernel.org/r/cover.1604339942.git.camelia.groza@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The headroom reserved for received frames needs to be aligned to an
RX specific value. There is currently a discrepancy between the values
used in the Ethernet driver and the values passed to the FMan.
Coincidentally, the resulting aligned values are identical.
Fixes: 3c68b8fffb ("dpaa_eth: FMan erratum A050385 workaround")
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Impose a larger RX private data area only when the A050385 erratum is
present on the hardware. A smaller buffer size is sufficient in all
other scenarios. This enables a wider range of linear Jumbo frame
sizes in non-erratum scenarios, instead of turning to multi
buffer Scatter/Gather frames. The maximum linear frame size is
increased by 128 bytes for non-erratum arm64 platforms.
Cleanup the hardware annotations header defines in the process.
Fixes: 3c68b8fffb ("dpaa_eth: FMan erratum A050385 workaround")
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When we read device memory, we lock a spinlock, write the address we
want to read from the device and then spin in a loop reading the data
in 32-bit quantities from another register.
As the description makes clear, this is rather inefficient, incurring
a PCIe bus transaction for every read. In a typical device today, we
want to read 786k SMEM if it crashes, leading to 192k register reads.
Occasionally, we've seen the whole loop take over 20 seconds and then
triggering the soft lockup detector.
Clearly, it is unreasonable to spin here for such extended periods of
time.
To fix this, break the loop down into an outer and an inner loop, and
break out of the inner loop if more than half a second elapsed. To
avoid too much overhead, check for that only every 128 reads, though
there's no particular reason for that number. Then, unlock and relock
to obtain NIC access again, reprogram the start address and continue.
This will keep (interrupt) latencies on the CPU down to a reasonable
time.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20201022165103.45878a7e49aa.I3b9b9c5a10002915072312ce75b68ed5b3dc6e14@changeid
The clang build reports this warning
fw.c:1485:21: warning: address of array 'rtwdev->chip->fw_fifo_addr'
will always evaluate to 'true'
if (!rtwdev->chip->fw_fifo_addr) {
fw_fifo_addr is an array in rtw_chip_info so it is always
nonzero. A better check is if the first element of the array is
nonzero. In the cases where fw_fifo_addr is initialized by rtw88b
and rtw88c, the first array element is 0x780.
Fixes: 0fbc2f0f34 ("rtw88: add dump firmware fifo support")
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Tzu-En Huang <tehuang@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20201011155438.15892-1-trix@redhat.com
Johannes Berg says:
====================
A couple of fixes, for
* HE on 2.4 GHz
* a few issues syzbot found, but we have many more reports :-(
* a regression in nl80211-transported EAPOL frames which had
affected a number of users, from Mathy
* kernel-doc markings in mac80211, from Mauro
* a format argument in reg.c, from Ye Bin
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The QSPI flash node needs to have the required "jedec,spi-nor"
in the compatible string.
Fixes: 0cb140d07f ("arm64: dts: stratix10: Add QSPI support for Stratix10")
Cc: stable@vger.kernel.org
Suggested-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
There is no need to specify a "ULL" suffix for "all bits set": "~0" is
sufficient, and works regardless of type. In fact adding the suffix
makes the code more fragile.
Fixes: 48ab6d5d1f ("dma-mapping: fix 32-bit overflow with CONFIG_ARM_LPAE=n")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since the device is resumed from runtime-suspend in
__device_release_driver() anyway, it is better to do that before
looking for busy managed device links from it to consumers, because
if there are any, device_links_unbind_consumers() will be called
and it will cause the consumer devices' drivers to unbind, so the
consumer devices will be runtime-resumed. In turn, resuming each
consumer device will cause the supplier to be resumed and when the
runtime PM references from the given consumer to it are dropped, it
may be suspended. Then, the runtime-resume of the next consumer
will cause the supplier to resume again and so on.
Update the code accordingly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fixes: 9ed9895370 ("driver core: Functional dependencies tracking support")
Cc: All applicable <stable@vger.kernel.org> # All applicable
Tested-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After commit d12544fb2a ("PM: runtime: Remove link state checks in
rpm_get/put_supplier()") nothing prevents the consumer device's
runtime PM from acquiring additional references to the supplier
device after pm_runtime_clean_up_links() has run (or even while it
is running), so calling this function from __device_release_driver()
may be pointless (or even harmful).
Moreover, it ignores stateless device links, so the runtime PM
handling of managed and stateless device links is inconsistent
because of it, so better get rid of it entirely.
Fixes: d12544fb2a ("PM: runtime: Remove link state checks in rpm_get/put_supplier()")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
Tested-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While removing a device link, drop the supplier device's runtime PM
usage counter as many times as needed to drop all of the runtime PM
references to it from the consumer in addition to dropping the
consumer's link count.
Fixes: baa8809f60 ("PM / runtime: Optimize the use of device links")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
Tested-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The cpufreq policy's frequency limits (min/max) can get changed at any
point of time, while schedutil is trying to update the next frequency.
Though the schedutil governor has necessary locking and support in place
to make sure we don't miss any of those updates, there is a corner case
where the governor will find that the CPU is already running at the
desired frequency and so may skip an update.
For example, consider that the CPU can run at 1 GHz, 1.2 GHz and 1.4 GHz
and is running at 1 GHz currently. Schedutil tries to update the
frequency to 1.2 GHz, during this time the policy limits get changed as
policy->min = 1.4 GHz. As schedutil (and cpufreq core) does clamp the
frequency at various instances, we will eventually set the frequency to
1.4 GHz, while we will save 1.2 GHz in sg_policy->next_freq.
Now lets say the policy limits get changed back at this time with
policy->min as 1 GHz. The next time schedutil is invoked by the
scheduler, we will reevaluate the next frequency (because
need_freq_update will get set due to limits change event) and lets say
we want to set the frequency to 1.2 GHz again. At this point
sugov_update_next_freq() will find the next_freq == current_freq and
will abort the update, while the CPU actually runs at 1.4 GHz.
Until now need_freq_update was used as a flag to indicate that the
policy's frequency limits have changed, and that we should consider the
new limits while reevaluating the next frequency.
This patch fixes the above mentioned issue by extending the purpose of
the need_freq_update flag. If this flag is set now, the schedutil
governor will not try to abort a frequency change even if next_freq ==
current_freq.
As similar behavior is required in the case of
CPUFREQ_NEED_UPDATE_LIMITS flag as well, need_freq_update will never be
set to false if that flag is set for the driver.
We also don't need to consider the need_freq_update flag in
sugov_update_single() anymore to handle the special case of busy CPU, as
we won't abort a frequency update anymore.
Reported-by: zhuguangqing <zhuguangqing@xiaomi.com>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Rearrange code to avoid a branch ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Change the nfsroot default mount option to ask for NFSv2 only *if* the
kernel was built with NFSv2 support.
If not, default to NFSv3 or as last choice to NFSv4, depending on actual
kernel config.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The tbl_dma_addr argument is used to check the DMA boundary for the
allocations, and thus needs to be a dma_addr_t. swiotlb-xen instead
passed a physical address, which could lead to incorrect results for
strange offsets. Fix this by removing the parameter entirely and hard
code the DMA address for io_tlb_start instead.
Fixes: 91ffe4ad53 ("swiotlb-xen: introduce phys_to_dma/dma_to_phys translations")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
kernel/dma/swiotlb.c:swiotlb_init gets called first and tries to
allocate a buffer for the swiotlb. It does so by calling
memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
If the allocation must fail, no_iotlb_memory is set.
Later during initialization swiotlb-xen comes in
(drivers/xen/swiotlb-xen.c:xen_swiotlb_init) and given that io_tlb_start
is != 0, it thinks the memory is ready to use when actually it is not.
When the swiotlb is actually needed, swiotlb_tbl_map_single gets called
and since no_iotlb_memory is set the kernel panics.
Instead, if swiotlb-xen.c:xen_swiotlb_init knew the swiotlb hadn't been
initialized, it would do the initialization itself, which might still
succeed.
Fix the panic by setting io_tlb_start to 0 on swiotlb initialization
failure, and also by setting no_iotlb_memory to false on swiotlb
initialization success.
Fixes: ac2cbab21f ("x86: Don't panic if can not alloc buffer for swiotlb")
Reported-by: Elliott Mitchell <ehem+xen@m5p.com>
Tested-by: Elliott Mitchell <ehem+xen@m5p.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When an interrupt or NMI comes in and switches the context, there's a delay
from when the preempt_count() shows the update. As the preempt_count() is
used to detect recursion having each context have its own bit get set when
tracing starts, and if that bit is already set, it is considered a recursion
and the function exits. But if this happens in that section where context
has changed but preempt_count() has not been updated, this will be
incorrectly flagged as a recursion.
To handle this case, create another bit call TRANSITION and test it if the
current context bit is already set. Flag the call as a recursion if the
TRANSITION bit is already set, and if not, set it and continue. The
TRANSITION bit will be cleared normally on the return of the function that
set it, or if the current context bit is clear, set it and clear the
TRANSITION bit to allow for another transition between the current context
and an even higher one.
Cc: stable@vger.kernel.org
Fixes: edc15cafcb ("tracing: Avoid unnecessary multiple recursion checks")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The code that checks recursion will work to only do the recursion check once
if there's nested checks. The top one will do the check, the other nested
checks will see recursion was already checked and return zero for its "bit".
On the return side, nothing will be done if the "bit" is zero.
The problem is that zero is returned for the "good" bit when in NMI context.
This will set the bit for NMIs making it look like *all* NMI tracing is
recursing, and prevent tracing of anything in NMI context!
The simple fix is to return "bit + 1" and subtract that bit on the end to
get the real bit.
Cc: stable@vger.kernel.org
Fixes: edc15cafcb ("tracing: Avoid unnecessary multiple recursion checks")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The nesting count of trace_printk allows for 4 levels of nesting. The
nesting counter starts at zero and is incremented before being used to
retrieve the current context's buffer. But the index to the buffer uses the
nesting counter after it was incremented, and not its original number,
which in needs to do.
Link: https://lkml.kernel.org/r/20201029161905.4269-1-hqjagain@gmail.com
Cc: stable@vger.kernel.org
Fixes: 3d9622c12c ("tracing: Add barrier to trace_printk() buffer nesting modification")
Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The ReST output should only contain documentation titles
automatically created by the script.
There are two reasons for that:
1) Consistency.
just a handful ABI docs define titles
2) To avoid critical errors.
Docutils (which is the basis for Sphinx) allows a free
assign of documentation title markups. So, one document
could be doing things like:
Level 1
=======
Level 2
-------
While another one could do the reverse:
Level 1
-------
Level 2
=======
But the same document can't mix.
As the output of get_abi.pl will join contents from multiple
files, if they don't define the levels on a consistent errors,
errors like this can happen:
Sphinx parallel build error:
docutils.utils.SystemMessage: /home/rdunlap/lnx/lnx-510-rc2/Documentation/ABI/testing/sysfs-bus-rapidio:2: (SEVERE/4) Title level inconsistent:
Attributes Common for All RapidIO Devices
-----------------------------------------
Which cause some versions of Sphinx to go into an endless
loop.
It should be noticed that an alternative to that would
be to replace all title occurrences by a single markup,
but that will make the parser more complex, and, due to
(1) it would generate an inconsistent output.
So, better to just remove the titles defined at the ABI
files from the output.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/6c62ef5c01d39dee8d891f8390c816d2a889670a.1604312590.git.mchehab+huawei@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Felipe writes:
USB: fixes for v5.10-rc2
Nothing major as of yet, we're adding support for Intel Alder Lake-S
in dwc3, together with a few fixes that are quite important (memory
leak in raw-gadget, probe crashes in goku_udc, and so on).
Signed-off-by: Felipe Balbi <balbi@kernel.org>
* tag 'fixes-for-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb:
usb: raw-gadget: fix memory leak in gadget_setup
usb: dwc2: Avoid leaving the error_debugfs label unused
usb: dwc3: ep0: Fix delay status handling
usb: gadget: fsl: fix null pointer checking
usb: gadget: goku_udc: fix potential crashes in probe
usb: dwc3: pci: add support for the Intel Alder Lake-S
After merging the drm-misc tree, linux-next build (arm
multi_v7_defconfig) failed like this:
In file included from drivers/gpu/drm/nouveau/nouveau_ttm.c:26:
include/linux/swiotlb.h: In function 'swiotlb_max_mapping_size':
include/linux/swiotlb.h:99:9: error: 'SIZE_MAX' undeclared (first use in this function)
99 | return SIZE_MAX;
| ^~~~~~~~
include/linux/swiotlb.h:7:1: note: 'SIZE_MAX' is defined in header '<stdint.h>'; did you forget to '#include <stdint.h>'?
6 | #include <linux/init.h>
+++ |+#include <stdint.h>
7 | #include <linux/types.h>
include/linux/swiotlb.h:99:9: note: each undeclared identifier is reported only once for each function it appears in
99 | return SIZE_MAX;
| ^~~~~~~~
Caused by commit
abe420bfae ("swiotlb: Introduce swiotlb_max_mapping_size()")
but only exposed by commit "drm/nouveu: fix swiotlb include"
Fix it by including linux/limits.h as appropriate.
Fixes: abe420bfae ("swiotlb: Introduce swiotlb_max_mapping_size()")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/r/20201102124327.2f82b2a7@canb.auug.org.au
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Building 5.10-rc1 in a setgid directory failed with the following
error:
dpkg-deb: error: control directory has bad permissions 2755 (must be
>=0755 and <=0775)
When building with fakeroot, the earlier chown call would have removed
the setgid bits, but in a rootless build they remain.
Fixes: 3e85418036 ("builddeb: Enable rootless builds")
Cc: Guillem Jover <guillem@hadrons.org>
Signed-off-by: Sven Joachim <svenjoac@gmx.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
The "size" tool has been solely used by s390 to enforce .bss section usage
restrictions in early startup code. Since commit 980d5f9ab3 ("s390/boot:
enable .bss section for compressed kernel") and commit 2e83e0eb85
("s390: clean .bss before running uncompressed kernel") these restrictions
have been lifted for the decompressor and uncompressed kernel and the
size tool is now unused.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
The call to rcu_cpu_starting() in start_secondary() is not early
enough in the CPU-hotplug onlining process, which results in lockdep
splats as follows (with CONFIG_PROVE_RCU_LIST=y):
WARNING: suspicious RCU usage
-----------------------------
kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/1/0.
Call Trace:
dump_stack+0xec/0x144 (unreliable)
lockdep_rcu_suspicious+0x128/0x14c
__lock_acquire+0x1060/0x1c60
lock_acquire+0x140/0x5f0
_raw_spin_lock_irqsave+0x64/0xb0
clockevents_register_device+0x74/0x270
register_decrementer_clockevent+0x94/0x110
start_secondary+0x134/0x800
start_secondary_prolog+0x10/0x14
This is avoided by adding a call to rcu_cpu_starting() near the
beginning of the start_secondary() function. Note that the
raw_smp_processor_id() is required in order to avoid calling into
lockdep before RCU has declared the CPU to be watched for readers.
It's safe to call rcu_cpu_starting() in the arch code as well as later
in generic code, as explained by Paul:
It uses a per-CPU variable so that RCU pays attention only to the
first call to rcu_cpu_starting() if there is more than one of them.
This is even intentional, due to there being a generic
arch-independent call to rcu_cpu_starting() in
notify_cpu_starting().
So multiple calls to rcu_cpu_starting() are fine by design.
Fixes: 4d004099a6 ("lockdep: Fix lockdep recursion")
Signed-off-by: Qian Cai <cai@redhat.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
[mpe: Add Fixes tag, reword slightly & expand change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028182334.13466-1-cai@redhat.com
Lockdep complains that a possible deadlock below in
eeh_addr_cache_show() because it is acquiring a lock with IRQ enabled,
but eeh_addr_cache_insert_dev() needs to acquire the same lock with IRQ
disabled. Let's just make eeh_addr_cache_show() acquire the lock with
IRQ disabled as well.
CPU0 CPU1
---- ----
lock(&pci_io_addr_cache_root.piar_lock);
local_irq_disable();
lock(&tp->lock);
lock(&pci_io_addr_cache_root.piar_lock);
<Interrupt>
lock(&tp->lock);
*** DEADLOCK ***
lock_acquire+0x140/0x5f0
_raw_spin_lock_irqsave+0x64/0xb0
eeh_addr_cache_insert_dev+0x48/0x390
eeh_probe_device+0xb8/0x1a0
pnv_pcibios_bus_add_device+0x3c/0x80
pcibios_bus_add_device+0x118/0x290
pci_bus_add_device+0x28/0xe0
pci_bus_add_devices+0x54/0xb0
pcibios_init+0xc4/0x124
do_one_initcall+0xac/0x528
kernel_init_freeable+0x35c/0x3fc
kernel_init+0x24/0x148
ret_from_kernel_thread+0x5c/0x80
lock_acquire+0x140/0x5f0
_raw_spin_lock+0x4c/0x70
eeh_addr_cache_show+0x38/0x110
seq_read+0x1a0/0x660
vfs_read+0xc8/0x1f0
ksys_read+0x74/0x130
system_call_exception+0xf8/0x1d0
system_call_common+0xe8/0x218
Fixes: 5ca85ae631 ("powerpc/eeh_cache: Add a way to dump the EEH address cache")
Signed-off-by: Qian Cai <cai@redhat.com>
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028152717.8967-1-cai@redhat.com
Due to bug in the bootloader, the PHY has floating address and may
randomly change on each PHY reset. To avoid it, the updated bootloader
with the following patch[0] should be used:
| ARM: protonic: disable on-die termination to fix PHY bootstrapping
|
| If on-die termination is enabled, the RXC pin of iMX6 will be pulled
| high. Since we already have an 10K pull-down on board, the RXC level on
| PHY reset will be ~800mV, which is mostly interpreted as 1. On some
| reboots we get 0 instead and kernel can't detect the PHY properly.
|
| Since the default 0x020e07ac value is 0, it is sufficient to remove this
| entry from the affected imxcfg files.
|
| Since we get stable 0 on pin PHYADDR[2], the PHY address is changed from
| 4 to 0.
With latest bootloader update, the PHY address will be fixed to "0".
[0] https://git.pengutronix.de/cgit/barebox/commit/?id=93f7dcf631edfcda19e7757b28d66017ea274b81
Fixes: 0d446a5055 ("ARM: dts: add Protonic PRTI6Q board")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The ZII devel B board has two generations of Marvell Switches. The
mv88e6352 supports an MDIO clock of 12MHz. However the older 88e6185
does not like 12MHz, and often fails to probe.
Reduce the clock speed to 5MHz, which seems to work reliably.
Cc: Chris Healy <cphealy@gmail.com>
Fixes: b955387667 ("ARM: dts: ZII: update MDIO speed and preamble")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Chris Healy <cphealy@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Set 10ms as minimum i2c slave configuration timeout since st_lsm6dsx
relies on accel ODR for i2c master clock and at high sample rates
(e.g. 833Hz or 416Hz) the slave sensor occasionally may need more cycles
than i2c master timeout (2s/833Hz + 1 ~ 3ms) to apply the configuration
resulting in an uncomplete slave configuration and a constant reading
from the i2c slave connected to st_lsm6dsx i2c master.
Fixes: 8f9a5249e3 ("iio: imu: st_lsm6dsx: enable 833Hz sample frequency for tagged sensors")
Fixes: c91c1c844e ("iio: imu: st_lsm6dsx: add i2c embedded controller support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Cc: <Stable@vger.kernel.org>
Link: https://lore.kernel.org/r/a69c8236bf16a1569966815ed71710af2722ed7d.1604247274.git.lorenzo@kernel.org
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Pull x86 fixes from Thomas Gleixner:
"Three fixes all related to #DB:
- Handle the BTF bit correctly so it doesn't get lost due to a kernel
#DB
- Only clear and set the virtual DR6 value used by ptrace on user
space triggered #DB. A kernel #DB must leave it alone to ensure
data consistency for ptrace.
- Make the bitmasking of the virtual DR6 storage correct so it does
not lose DR_STEP"
* tag 'x86-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/debug: Fix DR_STEP vs ptrace_get_debugreg(6)
x86/debug: Only clear/set ->virtual_dr6 for userspace #DB
x86/debug: Fix BTF handling
Pull timer fixes from Thomas Gleixner:
"A few fixes for timers/timekeeping:
- Prevent undefined behaviour in the timespec64_to_ns() conversion
which is used for converting user supplied time input to
nanoseconds. It lacked overflow protection.
- Mark sched_clock_read_begin/retry() to prevent recursion in the
tracer
- Remove unused debug functions in the hrtimer and timerlist code"
* tag 'timers-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Prevent undefined behaviour in timespec64_to_ns()
timers: Remove unused inline funtion debug_timer_free()
hrtimer: Remove unused inline function debug_hrtimer_free()
time/sched_clock: Mark sched_clock_read_begin/retry() as notrace
Pull smp fix from Thomas Gleixner:
"A single fix for stop machine.
Mark functions no trace to prevent a crash caused by recursion when
enabling or disabling a tracer on RISC-V (probably all architectures
which patch through stop machine)"
* tag 'smp-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
stop_machine, rcu: Mark functions as notrace
Pull locking fixes from Thomas Gleixner:
"A couple of locking fixes:
- Fix incorrect failure injection handling in the fuxtex code
- Prevent a preemption warning in lockdep when tracking
local_irq_enable() and interrupts are already enabled
- Remove more raw_cpu_read() usage from lockdep which causes state
corruption on !X86 architectures.
- Make the nr_unused_locks accounting in lockdep correct again"
* tag 'locking-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep: Fix nr_unused_locks accounting
locking/lockdep: Remove more raw_cpu_read() usage
futex: Fix incorrect should_fail_futex() handling
lockdep: Fix preemption WARN for spurious IRQ-enable
Pull char/misc fixes/removals from Greg KH:
"Here's some small fixes for 5.10-rc2 and a big driver removal.
The fixes are for some reported issues in the interconnect and
coresight drivers, nothing major.
The "big" driver removal is the MIC drivers have been asked to be
removed as the hardware never shipped and Intel no longer wants to
maintain something that no one can use. This is welcomed by many as
the DMA usage of these drivers was "interesting" and the security
people were starting to question some issues that were starting to be
found in the codebase.
Note, one of the subsystems for this driver, the "VOP" code, will
probably come back in future kernel versions as it was looking to
potentially solve some PCIe virtualization issues that a number of
other vendors were wanting to solve. But as-is, this codebase didn't
work for anyone else so no actual functionality is being removed.
All of these have been in linux-next with no reported issues"
* tag 'char-misc-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
coresight: cti: Initialize dynamic sysfs attributes
coresight: Fix uninitialised pointer bug in etm_setup_aux()
coresight: add module license
misc: mic: remove the MIC drivers
interconnect: qcom: use icc_sync state for sm8[12]50
interconnect: qcom: Ensure that the floor bandwidth value is enforced
interconnect: qcom: sc7180: Init BCMs before creating the nodes
interconnect: qcom: sdm845: Init BCMs before creating the nodes
interconnect: Aggregate before setting initial bandwidth
interconnect: qcom: sdm845: Enable keepalive for the MM1 BCM
Pull driver core and documentation fixes from Greg KH:
"Here is one tiny debugfs change to fix up an API where the last user
was successfully fixed up in 5.10-rc1 (so it couldn't be merged
earlier), and a much larger Documentation/ABI/ update to the files so
they can be automatically parsed by our tools.
The Documentation/ABI/ updates are just formatting issues, small ones
to bring the files into parsable format, and have been acked by
numerous subsystem maintainers and the documentation maintainer. I
figured it was good to get this into 5.10-rc2 to help wih the merge
issues that would arise if these were to stick in linux-next until
5.11-rc1.
The debugfs change has been in linux-next for a long time, and the
Documentation updates only for the last linux-next release"
* tag 'driver-core-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (40 commits)
scripts: get_abi.pl: assume ReST format by default
docs: ABI: sysfs-class-led-trigger-pattern: remove hw_pattern duplication
docs: ABI: sysfs-class-backlight: unify ABI documentation
docs: ABI: sysfs-c2port: remove a duplicated entry
docs: ABI: sysfs-class-power: unify duplicated properties
docs: ABI: unify /sys/class/leds/<led>/brightness documentation
docs: ABI: stable: remove a duplicated documentation
docs: ABI: change read/write attributes
docs: ABI: cleanup several ABI documents
docs: ABI: sysfs-bus-nvdimm: use the right format for ABI
docs: ABI: vdso: use the right format for ABI
docs: ABI: fix syntax to be parsed using ReST notation
docs: ABI: convert testing/configfs-acpi to ReST
docs: Kconfig/Makefile: add a check for broken ABI files
docs: abi-testing.rst: enable --rst-sources when building docs
docs: ABI: don't escape ReST-incompatible chars from obsolete and removed
docs: ABI: create a 2-depth index for ABI
docs: ABI: make it parse ABI/stable as ReST-compatible files
docs: ABI: sysfs-uevent: make it compatible with ReST output
docs: ABI: testing: make the files compatible with ReST output
...
Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for issues that have been
reported in 5.10-rc1:
- octeon driver fixes
- wfx driver fixes
- memory leak fix in vchiq driver
- fieldbus driver bugfix
- comedi driver bugfix
All of these have been in linux-next with no reported issues"
* tag 'staging-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: fieldbus: anybuss: jump to correct label in an error path
staging: wfx: fix test on return value of gpiod_get_value()
staging: wfx: fix use of uninitialized pointer
staging: mmal-vchiq: Fix memory leak for vchiq_instance
staging: comedi: cb_pcidas: Allow 2-channel commands for AO subdevice
staging: octeon: Drop on uncorrectable alignment or FCS error
staging: octeon: repair "fixed-link" support
Pull tty/serial fixes from Greg KH:
"Here are some small TTY and Serial driver fixes for reported issues
for 5.10-rc2. They include:
- vt ioctl bugfix for reported problems
- fsl_lpuart serial driver fix
- 21285 serial driver bugfix
All have been in linux-next with no reported issues"
* tag 'tty-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
vt_ioctl: fix GIO_UNIMAP regression
vt: keyboard, extend func_buf_lock to readers
vt: keyboard, simplify vt_kdgkbsent
tty: serial: fsl_lpuart: LS1021A has a FIFO size of 16 words, like LS1028A
tty: serial: 21285: fix lockup on open
Pull USB driver fixes from Greg KH:
"Here are a number of small bugfixes for reported issues in some USB
drivers. They include:
- typec bugfixes
- xhci bugfixes and lockdep warning fixes
- cdc-acm driver regression fix
- kernel doc fixes
- cdns3 driver bugfixes for a bunch of reported issues
- other tiny USB driver fixes
All have been in linux-next with no reported issues"
* tag 'usb-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: cdns3: gadget: own the lock wrongly at the suspend routine
usb: cdns3: Fix on-chip memory overflow issue
usb: cdns3: gadget: suspicious implicit sign extension
xhci: Don't create stream debugfs files with spinlock held.
usb: xhci: Workaround for S3 issue on AMD SNPS 3.0 xHC
xhci: Fix sizeof() mismatch
usb: typec: stusb160x: fix signedness comparison issue with enum variables
usb: typec: add missing MODULE_DEVICE_TABLE() to stusb160x
USB: apple-mfi-fastcharge: don't probe unhandled devices
usbcore: Check both id_table and match() when both available
usb: host: ehci-tegra: Fix error handling in tegra_ehci_probe()
usb: typec: stusb160x: fix an IS_ERR() vs NULL check in probe
usb: typec: tcpm: reset hard_reset_count for any disconnect
usb: cdc-acm: fix cooldown mechanism
usb: host: fsl-mph-dr-of: check return of dma_set_mask()
usb: fix kernel-doc markups
usb: typec: stusb160x: fix some signedness bugs
usb: cdns3: Variable 'length' set but not used
Pull kvm fixes from Paolo Bonzini:
"ARM:
- selftest fix
- force PTE mapping on device pages provided via VFIO
- fix detection of cacheable mapping at S2
- fallback to PMD/PTE mappings for composite huge pages
- fix accounting of Stage-2 PGD allocation
- fix AArch32 handling of some of the debug registers
- simplify host HYP entry
- fix stray pointer conversion on nVHE TLB invalidation
- fix initialization of the nVHE code
- simplify handling of capabilities exposed to HYP
- nuke VCPUs caught using a forbidden AArch32 EL0
x86:
- new nested virtualization selftest
- miscellaneous fixes
- make W=1 fixes
- reserve new CPUID bit in the KVM leaves"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: vmx: remove unused variable
KVM: selftests: Don't require THP to run tests
KVM: VMX: eVMCS: make evmcs_sanitize_exec_ctrls() work again
KVM: selftests: test behavior of unmapped L2 APIC-access address
KVM: x86: Fix NULL dereference at kvm_msr_ignored_check()
KVM: x86: replace static const variables with macros
KVM: arm64: Handle Asymmetric AArch32 systems
arm64: cpufeature: upgrade hyp caps to final
arm64: cpufeature: reorder cpus_have_{const, final}_cap()
KVM: arm64: Factor out is_{vhe,nvhe}_hyp_code()
KVM: arm64: Force PTE mapping on fault resulting in a device mapping
KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes
KVM: arm64: Fix masks in stage2_pte_cacheable()
KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCR
KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT
KVM: arm64: Drop useless PAN setting on host EL1 to EL2 transition
KVM: arm64: Remove leftover kern_hyp_va() in nVHE TLB invalidation
KVM: arm64: Don't corrupt tpidr_el2 on failed HVC call
x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID
Since overrun interrupt support has been added, there's a regression when
two ADCs are used at the same time, with:
- an ADC configured to use IRQs. EOCIE bit is set. The handler is normally
called in this case.
- an ADC configured to use DMA. EOCIE bit isn't set. EOC triggers the DMA
request. It's then automatically cleared by DMA read. But the handler
gets called due to status bit is temporarily set (IRQ triggered by the
other ADC).
This is a regression as similar issue had been fixed earlier by
commit dcb1092017 ("iio: adc: stm32-adc:
fix a race when using several adcs with dma and irq").
Issue is that stm32_adc_eoc_enabled() returns non-zero value (always)
since OVR bit has been added and enabled for both DMA and IRQ case.
Remove OVR mask in IER register, and rely only on CSR status for overrun.
To avoid subsequent calls to interrupt routine on overrun, CSR OVR bit has
to be cleared. CSR OVR bit cannot be cleared directly by software.
To do this ADC must be stopped first, and OVR bit in ADC ISR has
to be cleared.
Also add a check in ADC IRQ handler to report spurious IRQs.
Fixes: cc06e67d8f ("iio: adc: stm32-adc: Add check on overrun interrupt")
Signed-off-by: Olivier Moysan <olivier.moysan@st.com>
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Cc: <Stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20201021085313.5335-1-olivier.moysan@st.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Pull irqchip fixes from Marc Zyngier:
- A couple of fixes after the IPI as IRQ patches (Kconfig, bcm2836)
- Two SiFive PLIC fixes (irq_set_affinity, hierarchy handling)
- "unmapped events" handling for the ti-sci-inta controller
- Tidying up for the irq-mst driver (static functions, Kconfig)
- Small cleanup in the Renesas irqpin driver
- STM32 exti can now handle LP timer events
The DMA (BCDMA/PKTDMA and their rings/flows) events are under the INTA's
supervision as unmapped events in AM64.
In order to keep the current SW stack working, the INTA driver must replace
the dev_id with it's own when a request comes for BCDMA or PKTDMA
resources.
Implement parsing of the optional "ti,unmapped-event-sources" phandle array
to get the sci-dev-ids of the devices where the unmapped events originate.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201020073243.19255-3-peter.ujfalusi@ti.com
The new DMA architecture introduced with AM64 introduced new event types:
unampped events.
These events are mapped within INTA in contrast to other K3 devices where
the events with similar function was originating from the UDMAP or ringacc.
The ti,unmapped-event-sources should contain phandle array to the devices
in the system (typically DMA controllers) from where the unmapped events
originate.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20201020073243.19255-2-peter.ujfalusi@ti.com
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Incorrect netlink report logic in flowtable and genID.
2) Add a selftest to check that wireguard passes the right sk
to ip_route_me_harder, from Jason A. Donenfeld.
3) Pass the actual sk to ip_route_me_harder(), also from Jason.
4) Missing expression validation of updates via nft --check.
5) Update byte and packet counters regardless of whether they
match, from Stefano Brivio.
====================
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The tunnel device such as vxlan, bareudp and geneve in the lwt mode set
the outer df only based TUNNEL_DONT_FRAGMENT.
And this was also the behavior for gre device before switching to use
ip_md_tunnel_xmit in commit 962924fa2b ("ip_gre: Refactor collect
metatdata mode tunnel xmit to ip_md_tunnel_xmit")
When the ip_gre in lwt mode xmit with ip_md_tunnel_xmi changed the rule and
make the discrepancy between handling of DF by different tunnels. So in the
ip_md_tunnel_xmit should follow the same rule like other tunnels.
Fixes: cfc7381b30 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Link: https://lore.kernel.org/r/1604028728-31100-1-git-send-email-wenxu@ucloud.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In my test setup, I had a SAMA5D27 device configured with ip forwarding, and
second device with usb ethernet (r8152) sending ICMP packets. If the packet
was larger than about 220 bytes, the SAMA5 device would "oops" with the
following trace:
kernel BUG at net/core/skbuff.c:1863!
Internal error: Oops - BUG: 0 [#1] ARM
Modules linked in: xt_MASQUERADE ppp_async ppp_generic slhc iptable_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 can_raw can bridge stp llc ipt_REJECT nf_reject_ipv4 sd_mod cdc_ether usbnet usb_storage r8152 scsi_mod mii o
ption usb_wwan usbserial micrel macb at91_sama5d2_adc phylink gpio_sama5d2_piobu m_can_platform m_can industrialio_triggered_buffer kfifo_buf of_mdio can_dev fixed_phy sdhci_of_at91 sdhci_pltfm libphy sdhci mmc_core ohci_at91 ehci_atmel o
hci_hcd iio_rescale industrialio sch_fq_codel spidev prox2_hal(O)
CPU: 0 PID: 0 Comm: swapper Tainted: G O 5.9.1-prox2+ #1
Hardware name: Atmel SAMA5
PC is at skb_put+0x3c/0x50
LR is at macb_start_xmit+0x134/0xad0 [macb]
pc : [<c05258cc>] lr : [<bf0ea5b8>] psr: 20070113
sp : c0d01a60 ip : c07232c0 fp : c4250000
r10: c0d03cc8 r9 : 00000000 r8 : c0d038c0
r7 : 00000000 r6 : 00000008 r5 : c59b66c0 r4 : 0000002a
r3 : 8f659eff r2 : c59e9eea r1 : 00000001 r0 : c59b66c0
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c53c7d Table: 2640c059 DAC: 00000051
Process swapper (pid: 0, stack limit = 0x75002d81)
<snipped stack>
[<c05258cc>] (skb_put) from [<bf0ea5b8>] (macb_start_xmit+0x134/0xad0 [macb])
[<bf0ea5b8>] (macb_start_xmit [macb]) from [<c053e504>] (dev_hard_start_xmit+0x90/0x11c)
[<c053e504>] (dev_hard_start_xmit) from [<c0571180>] (sch_direct_xmit+0x124/0x260)
[<c0571180>] (sch_direct_xmit) from [<c053eae4>] (__dev_queue_xmit+0x4b0/0x6d0)
[<c053eae4>] (__dev_queue_xmit) from [<c05a5650>] (ip_finish_output2+0x350/0x580)
[<c05a5650>] (ip_finish_output2) from [<c05a7e24>] (ip_output+0xb4/0x13c)
[<c05a7e24>] (ip_output) from [<c05a39d0>] (ip_forward+0x474/0x500)
[<c05a39d0>] (ip_forward) from [<c05a13d8>] (ip_sublist_rcv_finish+0x3c/0x50)
[<c05a13d8>] (ip_sublist_rcv_finish) from [<c05a19b8>] (ip_sublist_rcv+0x11c/0x188)
[<c05a19b8>] (ip_sublist_rcv) from [<c05a2494>] (ip_list_rcv+0xf8/0x124)
[<c05a2494>] (ip_list_rcv) from [<c05403c4>] (__netif_receive_skb_list_core+0x1a0/0x20c)
[<c05403c4>] (__netif_receive_skb_list_core) from [<c05405c4>] (netif_receive_skb_list_internal+0x194/0x230)
[<c05405c4>] (netif_receive_skb_list_internal) from [<c0540684>] (gro_normal_list.part.0+0x14/0x28)
[<c0540684>] (gro_normal_list.part.0) from [<c0541280>] (napi_complete_done+0x16c/0x210)
[<c0541280>] (napi_complete_done) from [<bf14c1c0>] (r8152_poll+0x684/0x708 [r8152])
[<bf14c1c0>] (r8152_poll [r8152]) from [<c0541424>] (net_rx_action+0x100/0x328)
[<c0541424>] (net_rx_action) from [<c01012ec>] (__do_softirq+0xec/0x274)
[<c01012ec>] (__do_softirq) from [<c012d6d4>] (irq_exit+0xcc/0xd0)
[<c012d6d4>] (irq_exit) from [<c0160960>] (__handle_domain_irq+0x58/0xa4)
[<c0160960>] (__handle_domain_irq) from [<c0100b0c>] (__irq_svc+0x6c/0x90)
Exception stack(0xc0d01ef0 to 0xc0d01f38)
1ee0: 00000000 0000003d 0c31f383 c0d0fa00
1f00: c0d2eb80 00000000 c0d2e630 4dad8c49 4da967b0 0000003d 0000003d 00000000
1f20: fffffff5 c0d01f40 c04e0f88 c04e0f8c 30070013 ffffffff
[<c0100b0c>] (__irq_svc) from [<c04e0f8c>] (cpuidle_enter_state+0x7c/0x378)
[<c04e0f8c>] (cpuidle_enter_state) from [<c04e12c4>] (cpuidle_enter+0x28/0x38)
[<c04e12c4>] (cpuidle_enter) from [<c014f710>] (do_idle+0x194/0x214)
[<c014f710>] (do_idle) from [<c014fa50>] (cpu_startup_entry+0xc/0x14)
[<c014fa50>] (cpu_startup_entry) from [<c0a00dc8>] (start_kernel+0x46c/0x4a0)
Code: e580c054 8a000002 e1a00002 e8bd8070 (e7f001f2)
---[ end trace 146c8a334115490c ]---
The solution was to force nonlinear buffers to be cloned. This was previously
reported by Klaus Doth (https://www.spinics.net/lists/netdev/msg556937.html)
but never formally submitted as a patch.
This is the third revision, hopefully the formatting is correct this time!
Suggested-by: Klaus Doth <krnl@doth.eu>
Fixes: 653e92a917 ("net: macb: add support for padding and fcs computation")
Signed-off-by: Mark Deneen <mdeneen@saucontech.com>
Link: https://lore.kernel.org/r/20201030155814.622831-1-mdeneen@saucontech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull vhost fixes from Michael Tsirkin:
"Fixes all over the place.
A new UAPI is borderline: can also be considered a new feature but
also seems to be the only way we could come up with to fix addressing
for userspace - and it seems important to switch to it now before
userspace making assumptions about addressing ability of devices is
set in stone"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpasim: allow to assign a MAC address
vdpasim: fix MAC address configuration
vdpa: handle irq bypass register failure case
vdpa_sim: Fix DMA mask
Revert "vhost-vdpa: fix page pinning leakage in error path"
vdpa/mlx5: Fix error return in map_direct_mr()
vhost_vdpa: Return -EFAULT if copy_from_user() fails
vdpa_sim: implement get_iova_range()
vhost: vdpa: report iova range
vdpa: introduce config op to get valid iova range
Pull more flexible-array member conversions from Gustavo A. R. Silva:
"Replace zero-length arrays with flexible-array members"
* tag 'flexible-array-conversions-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
printk: ringbuffer: Replace zero-length array with flexible-array member
net/smc: Replace zero-length array with flexible-array member
net/mlx5: Replace zero-length array with flexible-array member
mei: hw: Replace zero-length array with flexible-array member
gve: Replace zero-length array with flexible-array member
Bluetooth: btintel: Replace zero-length array with flexible-array member
scsi: target: tcmu: Replace zero-length array with flexible-array member
ima: Replace zero-length array with flexible-array member
enetc: Replace zero-length array with flexible-array member
fs: Replace zero-length array with flexible-array member
Bluetooth: Replace zero-length array with flexible-array member
params: Replace zero-length array with flexible-array member
tracepoint: Replace zero-length array with flexible-array member
platform/chrome: cros_ec_proto: Replace zero-length array with flexible-array member
platform/chrome: cros_ec_commands: Replace zero-length array with flexible-array member
mailbox: zynqmp-ipi-message: Replace zero-length array with flexible-array member
dmaengine: ti-cppi5: Replace zero-length array with flexible-array member
Hangbin Liu says:
====================
IPv6: reply ICMP error if fragment doesn't contain all headers
When our Engineer run latest IPv6 Core Conformance test, test v6LC.1.3.6:
First Fragment Doesn’t Contain All Headers[1] failed. The test purpose is to
verify that the node (Linux for example) should properly process IPv6 packets
that don’t include all the headers through the Upper-Layer header.
Based on RFC 8200, Section 4.5 Fragment Header
- If the first fragment does not include all headers through an
Upper-Layer header, then that fragment should be discarded and
an ICMP Parameter Problem, Code 3, message should be sent to
the source of the fragment, with the Pointer field set to zero.
The first patch add a definition for ICMPv6 Parameter Problem, code 3.
The second patch add a check for the 1st fragment packet to make sure
Upper-Layer header exist.
[1] Page 68, v6LC.1.3.6: First Fragment Doesn’t Contain All Headers part A, B,
C and D at https://ipv6ready.org/docs/Core_Conformance_5_0_0.pdf
[2] My reproducer:
import sys, os
from scapy.all import *
def send_frag_dst_opt(src_ip6, dst_ip6):
ip6 = IPv6(src = src_ip6, dst = dst_ip6, nh = 44)
frag_1 = IPv6ExtHdrFragment(nh = 60, m = 1)
dst_opt = IPv6ExtHdrDestOpt(nh = 58)
frag_2 = IPv6ExtHdrFragment(nh = 58, offset = 4, m = 1)
icmp_echo = ICMPv6EchoRequest(seq = 1)
pkt_1 = ip6/frag_1/dst_opt
pkt_2 = ip6/frag_2/icmp_echo
send(pkt_1)
send(pkt_2)
def send_frag_route_opt(src_ip6, dst_ip6):
ip6 = IPv6(src = src_ip6, dst = dst_ip6, nh = 44)
frag_1 = IPv6ExtHdrFragment(nh = 43, m = 1)
route_opt = IPv6ExtHdrRouting(nh = 58)
frag_2 = IPv6ExtHdrFragment(nh = 58, offset = 4, m = 1)
icmp_echo = ICMPv6EchoRequest(seq = 2)
pkt_1 = ip6/frag_1/route_opt
pkt_2 = ip6/frag_2/icmp_echo
send(pkt_1)
send(pkt_2)
if __name__ == '__main__':
src = sys.argv[1]
dst = sys.argv[2]
conf.iface = sys.argv[3]
send_frag_dst_opt(src, dst)
send_frag_route_opt(src, dst)
====================
Link: https://lore.kernel.org/r/20201027123313.3717941-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Based on RFC 8200, Section 4.5 Fragment Header:
- If the first fragment does not include all headers through an
Upper-Layer header, then that fragment should be discarded and
an ICMP Parameter Problem, Code 3, message should be sent to
the source of the fragment, with the Pointer field set to zero.
Checking each packet header in IPv6 fast path will have performance impact,
so I put the checking in ipv6_frag_rcv().
As the packet may be any kind of L4 protocol, I only checked some common
protocols' header length and handle others by (offset + 1) > skb->len.
Also use !(frag_off & htons(IP6_OFFSET)) to catch atomic fragments
(fragmented packet with only one fragment).
When send ICMP error message, if the 1st truncated fragment is ICMP message,
icmp6_send() will break as is_ineligible() return true. So I added a check
in is_ineligible() to let fragment packet with nexthdr ICMP but no ICMP header
return false.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Based on RFC7112, Section 6:
IANA has added the following "Type 4 - Parameter Problem" message to
the "Internet Control Message Protocol version 6 (ICMPv6) Parameters"
registry:
CODE NAME/DESCRIPTION
3 IPv6 First Fragment has incomplete IPv6 Header Chain
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The position index in leq_seq_next is not updated when the next
entry is fetched an no more entries are available. This causes
seq_file to report the following error:
"seq_file: buggy .next function lec_seq_next [lec] did not update
position index"
Fix this by always updating the position index.
[ Note: this is an ancient 2002 bug, the sha is from the
tglx/history repo ]
Fixes 4aea2cbff417 ("[ATM]: Move lan seq_file ops to lec.c [1/3]")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201027114925.21843-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull dma-mapping fix from Christoph Hellwig:
"Fix an integer overflow on 32-bit platforms in the new DMA range code
(Geert Uytterhoeven)"
* tag 'dma-mapping-5.10-2' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: fix 32-bit overflow with CONFIG_ARM_LPAE=n
Pull SCSI fixes from James Bottomley:
"Four driver fixes and one core fix.
The core fix closes a race window where we could kick off a second
asynchronous scan because the test and set of the variable preventing
it isn't atomic"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: hisi_sas: Stop using queue #0 always for v2 hw
scsi: ibmvscsi: Fix potential race after loss of transport
scsi: mptfusion: Fix null pointer dereferences in mptscsih_remove()
scsi: qla2xxx: Return EBUSY on fcport deletion
scsi: core: Don't start concurrent async scan on same host
Unless we want to test with THP, then we shouldn't require it to be
configured by the host kernel. Unfortunately, even advising with
MADV_NOHUGEPAGE does require it, so check for THP first in order
to avoid madvise failing with EINVAL.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201029201703.102716-2-drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It was noticed that evmcs_sanitize_exec_ctrls() is not being executed
nowadays despite the code checking 'enable_evmcs' static key looking
correct. Turns out, static key magic doesn't work in '__init' section
(and it is unclear when things changed) but setup_vmcs_config() is called
only once per CPU so we don't really need it to. Switch to checking
'enlightened_vmcs' instead, it is supposed to be in sync with
'enable_evmcs'.
Opportunistically make evmcs_sanitize_exec_ctrls '__init' and drop unneeded
extra newline from it.
Reported-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20201014143346.2430936-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a regression test for commit 671ddc700f ("KVM: nVMX: Don't leak
L1 MMIO regions to L2").
First, check to see that an L2 guest can be launched with a valid
APIC-access address that is backed by a page of L1 physical memory.
Next, set the APIC-access address to a (valid) L1 physical address
that is not backed by memory. KVM can't handle this situation, so
resuming L2 should result in a KVM exit for internal error
(emulation).
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20201026180922.3120555-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In ip_set_match_extensions(), for sets with counters, we take care of
updating counters themselves by calling ip_set_update_counter(), and of
checking if the given comparison and values match, by calling
ip_set_match_counter() if needed.
However, if a given comparison on counters doesn't match the configured
values, that doesn't mean the set entry itself isn't matching.
This fix restores the behaviour we had before commit 4750005a85
("netfilter: ipset: Fix "don't update counters" mode when counters used
at the matching"), without reintroducing the issue fixed there: back
then, mtype_data_match() first updated counters in any case, and then
took care of matching on counters.
Now, if the IPSET_FLAG_SKIP_COUNTER_UPDATE flag is set,
ip_set_update_counter() will anyway skip counter updates if desired.
The issue observed is illustrated by this reproducer:
ipset create c hash:ip counters
ipset add c 192.0.2.1
iptables -I INPUT -m set --match-set c src --bytes-gt 800 -j DROP
if we now send packets from 192.0.2.1, bytes and packets counters
for the entry as shown by 'ipset list' are always zero, and, no
matter how many bytes we send, the rule will never match, because
counters themselves are not updated.
Reported-by: Mithil Mhatre <mmhatre@redhat.com>
Fixes: 4750005a85 ("netfilter: ipset: Fix "don't update counters" mode when counters used at the matching")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The
older style of one-element or zero-length arrays should no longer be
used[2].
Refactor the code according to the use of a flexible-array member in
struct gve_stats_report, instead of a zero-length array, and use the
struct_size() helper to calculate the size for the resource allocation.
[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Pull io_uring fixes from Jens Axboe:
- Fixes for linked timeouts (Pavel)
- Set IO_WQ_WORK_CONCURRENT early for async offload (Pavel)
- Two minor simplifications that make the code easier to read and
follow (Pavel)
* tag 'io_uring-5.10-2020-10-30' of git://git.kernel.dk/linux-block:
io_uring: use type appropriate io_kiocb handler for double poll
io_uring: simplify __io_queue_sqe()
io_uring: simplify nxt propagation in io_queue_sqe
io_uring: don't miss setting IO_WQ_WORK_CONCURRENT
io_uring: don't defer put of cancelled ltimeout
io_uring: always clear LINK_TIMEOUT after cancel
io_uring: don't adjust LINK_HEAD in cancel ltimeout
io_uring: remove opcode check on ltimeout kill
Pull libata fix from Jens Axboe:
"Single fix for an old regression with sata_nv"
* tag 'libata-5.10-2020-10-30' of git://git.kernel.dk/linux-block:
ata: sata_nv: Fix retrieving of active qcs
Some devices support ACS functionality even though they don't have a
spec-compliant ACS Capability; pci_enable_acs() has a quirk mechanism to
handle them.
We want to enable ACS whenever possible, but 52fbf5bdee ("PCI: Cache ACS
capability offset in device") inadvertently broke this by calling
pci_enable_acs() only if we find an ACS Capability.
This resulted in ACS not being enabled for these non-compliant devices,
which means devices can't be separated into different IOMMU groups, which
in turn means we may not be able to pass those devices through to VMs, as
reported by Boris V:
https://lore.kernel.org/r/74aeea93-8a46-5f5a-343c-790d4c655da3@bstnet.org
Fixes: 52fbf5bdee ("PCI: Cache ACS capability offset in device")
Link: https://lore.kernel.org/r/20201028231545.4116866-1-rajatja@google.com
Reported-by: Boris V <borisvk@bstnet.org>
Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Pull btrfs fixes from David Sterba:
- lockdep fixes:
- drop path locks before manipulating sysfs objects or qgroups
- preliminary fixes before tree locks get switched to rwsem
- use annotated seqlock
- build warning fixes (printk format)
- fix relocation vs fallocate race
- tree checker properly validates number of stripes and parity
- readahead vs device replace fixes
- iomap dio fix for unnecessary buffered io fallback
* tag 'for-5.10-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: convert data_seqcount to seqcount_mutex_t
btrfs: don't fallback to buffered read if we don't need to
btrfs: add a helper to read the tree_root commit root for backref lookup
btrfs: drop the path before adding qgroup items when enabling qgroups
btrfs: fix readahead hang and use-after-free after removing a device
btrfs: fix use-after-free on readahead extent after failure to create it
btrfs: tree-checker: validate number of chunk stripes and parity
btrfs: tree-checker: fix incorrect printk format
btrfs: drop the path before adding block group sysfs files
btrfs: fix relocation failure due to race with fallocate
Pull arm64 fixes from Will Deacon:
"The diffstat is a bit spread out thanks to an invasive CPU erratum
workaround which missed the merge window and also a bunch of fixes to
the recently added MTE selftests.
- Fixes to MTE kselftests
- Fix return code from KVM Spectre-v2 hypercall
- Build fixes for ld.lld and Clang's infamous integrated assembler
- Ensure RCU is up and running before we use printk()
- Workaround for Cortex-A77 erratum 1508412
- Fix linker warnings from unexpected ELF sections
- Ensure PE/COFF sections are 64k aligned"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Change .weak to SYM_FUNC_START_WEAK_PI for arch/arm64/lib/mem*.S
arm64/smp: Move rcu_cpu_starting() earlier
arm64: Add workaround for Arm Cortex-A77 erratum 1508412
arm64: Add part number for Arm Cortex-A77
arm64: mte: Document that user PSTATE.TCO is ignored by kernel uaccess
module: use hidden visibility for weak symbol references
arm64: efi: increase EFI PE/COFF header padding to 64 KB
arm64: vmlinux.lds: account for spurious empty .igot.plt sections
kselftest/arm64: Fix check_user_mem test
kselftest/arm64: Fix check_ksm_options test
kselftest/arm64: Fix check_mmap_options test
kselftest/arm64: Fix check_child_memory test
kselftest/arm64: Fix check_tags_inclusion test
kselftest/arm64: Fix check_buffer_fill test
arm64: avoid -Woverride-init warning
KVM: arm64: ARM_SMCCC_ARCH_WORKAROUND_1 doesn't return SMCCC_RET_NOT_REQUIRED
arm64: vdso32: Allow ld.lld to properly link the VDSO
Pull asm-generic fix from Arnd Bergmann:
"One small bugfix, fixing a build regression for RISC-V"
* tag 'asm-generic-fixes-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: mark __{get,put}_user_fn as __always_inline
Pull ARM SoC fixes from Arnd Bergmann:
"This is a fairly large set of bug fixes on top of -rc1, as most of
them were ready but didn't quite make it into the last-minute pull
requests for the merge window.
Allwinner:
- fix for incorrect CPU overtemperature limit
Amlogic:
- multiple smaller DT bugfixes, and missing device nodes
Marvell EBU:
- add missing aliases for ethernet switch ports on espressobin board
Marvell MMP:
- DTC warning fix
- bugfix for camera interface power-down
NXP i.MX:
- re-enable the GPIO driver on all defconfigs
ST STM32MP1:
- fix random crashes from incorrect voltage settings
Synaptics Berlin:
- enable the correct hardware timer driver
Texas Instruments K2G:
- fix a boot regression in the power domain code
TEE drivers:
- fix regression in TEE "login" method
SCMI drivers:
- multiple code fixes for corner cases in newly added code
MAINTAINERS file:
- move Kukjin Kim and Sangbeom Kim to credits (used to work on
Samsung Exynos)
- Masahiro Yamada is stepping down as Uniphier maintainer
I did not include a series of patches that work around a regression
caused by a bugfix in an ethernet phy driver that resulted in an
inadvertent DT binding change. This is still under discussion"
* tag 'arm-soc-fixes-v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (31 commits)
soc: ti: ti_sci_pm_domains: check for proper args count in xlate
ARM: dts: stm32: Describe Vin power supply on stm32mp157c-edx board
ARM: dts: stm32: Describe Vin power supply on stm32mp15xx-dkx board
ARM: multi_v5_defconfig: Select CONFIG_GPIO_MXC
ARM: imx_v4_v5_defconfig: Select CONFIG_GPIO_MXC
ARM: dts: mmp2-olpc-xo-1-75: Use plural form of "-gpios"
ARM: dts: mmp3: Add power domain for the camera
arm64: berlin: Select DW_APB_TIMER_OF
dt-bindings: sram: sunxi-sram: add V3s compatible string
MAINTAINERS: Move Sangbeom Kim to credits
MAINTAINERS: Move Kukjin Kim to credits
MAINTAINERS: step down as maintainer of UniPhier SoCs and Denali driver
ARM: multi_v7_defconfig: Build in CONFIG_GPIO_MXC by default
ARM: imx_v6_v7_defconfig: Build in CONFIG_GPIO_MXC by default
arm64: defconfig: Build in CONFIG_GPIO_MXC by default
arm64: dts: meson: odroid-n2 plus: fix vddcpu_a pwm
ARM: dts: meson8: remove two invalid interrupt lines from the GPU node
arm64: dts: amlogic: add missing ethernet reset ID
firmware: arm_scmi: Fix duplicate workqueue name
firmware: arm_scmi: Fix locking in notifications
...
Pull PNP fix from Rafael Wysocki:
"Make function names in kerneldoc comments match the actual names of
the functions that they correspond to (Mauro Carvalho Chehab)"
* tag 'pnp-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PNP: fix kernel-doc markups
Pull device properties framework fixes from Rafael Wysocki:
"Fix the secondary firmware node handling while manipulating the
primary firmware node for a given device (Andy Shevchenko)"
* tag 'devprop-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
device property: Don't clear secondary pointer for shared primary firmware node
device property: Keep secondary firmware node secondary by type
Pull ACPI fixes from Rafael Wysocki:
"These fix three assorted minor issues.
Specifics:
- Eliminate compiler warning emitted when building the ACPI dock
driver (Arnd Bergmann).
- Drop lid_init_state quirk for Acer SW5-012 that is not needed any
more after recent changes (Hans de Goede).
- Fix "missing minus" typo in the NFIT parsing code (Zhang Qilong)"
* tag 'acpi-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: button: Drop no longer necessary Acer SW5-012 lid_init_state quirk
ACPI: NFIT: Fix comparison to '-ENXIO'
ACPI: dock: fix enum-conversion warning
Pull power management fixes from Rafael Wysocki:
"These fix a few issues related to running intel_pstate in the passive
mode with HWP enabled, correct the handling of the max_cstate module
parameter in intel_idle and make a few janitorial changes.
Specifics:
- Modify Kconfig to prevent configuring either the "conservative" or
the "ondemand" governor as the default cpufreq governor if
intel_pstate is selected, in which case "schedutil" is the default
choice for the default governor setting (Rafael Wysocki).
- Modify the cpufreq core, intel_pstate and the schedutil governor to
avoid missing updates of the HWP max limit when intel_pstate
operates in the passive mode with HWP enabled (Rafael Wysocki).
- Fix max_cstate module parameter handling in intel_idle for
processor models with C-state tables coming from ACPI (Chen Yu).
- Clean up assorted pieces of power management code (Jackie Zamow,
Tom Rix, Zhang Qilong)"
* tag 'pm-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: schedutil: Always call driver if CPUFREQ_NEED_UPDATE_LIMITS is set
cpufreq: Introduce cpufreq_driver_test_flags()
cpufreq: speedstep: remove unneeded semicolon
PM: sleep: fix typo in kernel/power/process.c
intel_idle: Fix max_cstate for processor models without C-state tables
cpufreq: intel_pstate: Avoid missing HWP max updates in passive mode
cpufreq: Introduce CPUFREQ_NEED_UPDATE_LIMITS driver flag
cpufreq: Avoid configuring old governors as default with intel_pstate
cpufreq: e_powersaver: remove unreachable break
Pull MMC host fixes from Ulf Hansson:
- sdhci: Fix performance regression with auto CMD auto select
- sdhci-of-esdhc: Fix initialization for eMMC HS400 mode
- sdhci-of-esdhc: Fix timeout bug for tuning commands
* tag 'mmc-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-of-esdhc: make sure delay chain locked for HS400
mmc: sdhci-of-esdhc: set timeout to max before tuning
mmc: sdhci: Use Auto CMD Auto Select only when v4_mode is true
Pull drm fixes from Dave Airlie:
"A busier rc2 than normal, have larger sets of fixes for amdgpu +
nouveau, along with some i915, docs, core, panel, sun4i, v3d, vc4
fixes.
Nothing spooky though or pumpkin related.
docs:
- kernel doc fixes
core:
- fix shmem helpers dma-buf mmap bug
amdgpu:
- Add new navi1x PCI ID
- GPUVM reserved area fixes
- Misc display fixes
- Fix bad interactions between display code and CONFIG_KGDB
- Fixes for SMU manual fan control and i2c
nouveau:
- endian regression fix for old gpus
- buffer object refcount fix
- uapi start/end alignment fix
- display notifier fix
- display clock checking fixes
i915:
- Fix max memory region size calculation
- Restore ILK-M RPS support, restoring performance
- Reject 90/270 degreerotated initial fbs
panel:
- mantix reset fixes
sun4i:
- scalar fix
vc4:
- hdmi audio fixes
v3d:
- fix double free"
* tag 'drm-fixes-2020-10-30-1' of git://anongit.freedesktop.org/drm/drm: (42 commits)
drm/nouveau/kms/nv50-: Fix clock checking algorithm in nv50_dp_mode_valid()
drm/nouveau/kms/nv50-: Get rid of bogus nouveau_conn_mode_valid()
drm/nouveau/device: fix changing endianess code to work on older GPUs
drm/nouveau/gem: fix "refcount_t: underflow; use-after-free"
drm/nouveau/kms/nv50-: Program notifier offset before requesting disp caps
drm/nouveau/nouveau: fix the start/end range for migration
drm/i915: Reject 90/270 degree rotated initial fbs
drm/i915: Restore ILK-M RPS support
drm/i915/region: fix max size calculation
drm/vc4: Rework the structure conversion functions
drm/vc4: hdmi: Add a name to the codec DAI component
drm/shme-helpers: Fix dma_buf_mmap forwarding bug
drm/vc4: hdmi: Avoid sleeping in atomic context
drm/amdgpu/pm: fix the fan speed in fan1_input in manual mode for navi1x
drm/amd/pm: fix the wrong fan speed in fan1_input
drm/amdgpu/swsmu: drop smu i2c bus on navi1x
drm/vc4: drv: Add error handding for bind
drm: drm_print.h: fix kernel-doc markups
drm: kernel-doc: drm_dp_helper.h: fix a typo
drm: kernel-doc: add description for a new function parameter
...
The newly introduced kvm_msr_ignored_check() tries to print error or
debug messages via vcpu_*() macros, but those may cause Oops when NULL
vcpu is passed for KVM_GET_MSRS ioctl.
Fix it by replacing the print calls with kvm_*() macros.
(Note that this will leave vcpu argument completely unused in the
function, but I didn't touch it to make the fix as small as
possible. A clean up may be applied later.)
Fixes: 12bc2132b1 ("KVM: X86: Do the same ignore_msrs check for feature msrs")
BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1178280
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Message-Id: <20201030151414.20165-1-tiwai@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Even though the compiler is able to replace static const variables with
their value, it will warn about them being unused when Linux is built with W=1.
Use good old macros instead, this is not C++.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM/arm64 fixes for 5.10, take #1
- Force PTE mapping on device pages provided via VFIO
- Fix detection of cacheable mapping at S2
- Fallback to PMD/PTE mappings for composite huge pages
- Fix accounting of Stage-2 PGD allocation
- Fix AArch32 handling of some of the debug registers
- Simplify host HYP entry
- Fix stray pointer conversion on nVHE TLB invalidation
- Fix initialization of the nVHE code
- Simplify handling of capabilities exposed to HYP
- Nuke VCPUs caught using a forbidden AArch32 EL0
When PTP timestamping is enabled on Tx, the controller
inserts the Tx timestamp at the beginning of the frame
buffer, between SFD and the L2 frame header. This means
that the skb provided by the stack is required to have
enough headroom otherwise a new skb needs to be created
by the driver to accommodate the timestamp inserted by h/w.
Up until now the driver was relying on the second option,
using skb_realloc_headroom() to create a new skb to accommodate
PTP frames. Turns out that this method is not reliable, as
reallocation of skbs for PTP frames along with the required
overhead (skb_set_owner_w, consume_skb) is causing random
crashes in subsequent skb_*() calls, when multiple concurrent
TCP streams are run at the same time on the same device
(as seen in James' report).
Note that these crashes don't occur with a single TCP stream,
nor with multiple concurrent UDP streams, but only when multiple
TCP streams are run concurrently with the PTP packet flow
(doing skb reallocation).
This patch enforces the first method, by requesting enough
headroom from the stack to accommodate PTP frames, and so avoiding
skb_realloc_headroom() & co, and the crashes no longer occur.
There's no reason not to set needed_headroom to a large enough
value to accommodate PTP frames, so in this regard this patch
is a fix.
Reported-by: James Jurack <james.jurack@ametek.com>
Fixes: bee9e58c9e ("gianfar:don't add FCB length to hard_header_len")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20201020173605.1173-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When PTP timestamping is enabled on Tx, the controller
inserts the Tx timestamp at the beginning of the frame
buffer, between SFD and the L2 frame header. This means
that the skb provided by the stack is required to have
enough headroom otherwise a new skb needs to be created
by the driver to accommodate the timestamp inserted by h/w.
Up until now the driver was relying on skb_realloc_headroom()
to create new skbs to accommodate PTP frames. Turns out that
this method is not reliable in this context at least, as
skb_realloc_headroom() for PTP frames can cause random crashes,
mostly in subsequent skb_*() calls, when multiple concurrent
TCP streams are run at the same time with the PTP flow
on the same device (as seen in James' report). I also noticed
that when the system is loaded by sending multiple TCP streams,
the driver receives cloned skbs in large numbers.
skb_cow_head() instead proves to be stable in this scenario,
and not only handles cloned skbs too but it's also more efficient
and widely used in other drivers.
The commit introducing skb_realloc_headroom in the driver
goes back to 2009, commit 93c1285c5d
("gianfar: reallocate skb when headroom is not enough for fcb").
For practical purposes I'm referencing a newer commit (from 2012)
that brings the code to its current structure (and fixes the PTP
case).
Fixes: 9c4886e5e6 ("gianfar: Fix invalid TX frames returned on error queue when time stamping")
Reported-by: James Jurack <james.jurack@ametek.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20201029081057.8506-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Chris reported that commit 24d5a3bffef1 ("lockdep: Fix
usage_traceoverflow") breaks the nr_unused_locks validation code
triggered by /proc/lockdep_stats.
By fully splitting LOCK_USED and LOCK_USED_READ it becomes a bad
indicator for accounting nr_unused_locks; simplyfy by using any first
bit.
Fixes: 24d5a3bffef1 ("lockdep: Fix usage_traceoverflow")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://lkml.kernel.org/r/20201027124834.GL2628@hirez.programming.kicks-ass.net
I initially thought raw_cpu_read() was OK, since if it is !0 we have
IRQs disabled and can't get migrated, so if we get migrated both CPUs
must have 0 and it doesn't matter which 0 we read.
And while that is true; it isn't the whole store, on pretty much all
architectures (except x86) this can result in computing the address for
one CPU, getting migrated, the old CPU continuing execution with another
task (possibly setting recursion) and then the new CPU reading the value
of the old CPU, which is no longer 0.
Similer to:
baffd723e4 ("lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu variables"")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201026152256.GB2651@hirez.programming.kicks-ass.net
On a system without uniform support for AArch32 at EL0, it is possible
for the guest to force run AArch32 at EL0 and potentially cause an
illegal exception if running on a core without AArch32. Add an extra
check so that if we catch the guest doing that, then we prevent it from
running again by resetting vcpu->arch.target and return
ARM_EXCEPTION_IL.
We try to catch this misbehaviour as early as possible and not rely on
an illegal exception occuring to signal the problem. Attempting to run a
32bit app in the guest will produce an error from QEMU if the guest
exits while running in AArch32 EL0.
Tested on Juno by instrumenting the host to fake asym aarch32 and
instrumenting KVM to make the asymmetry visible to the guest.
[will: Incorporated feedback from Marc]
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201021104611.2744565-2-qais.yousef@arm.com
Link: https://lore.kernel.org/r/20201027215118.27003-2-will@kernel.org
Some (apparently older) versions of the FEC hardware block do not like
the MMFR register being cleared to avoid generation of MII events at
initialization time. The action of clearing this register results in no
future MII events being generated at all on the problem block. This means
the probing of the MDIO bus will find no PHYs.
Create a quirk that can be checked at the FECs MII init time so that
the right thing is done. The quirk is set as appropriate for the FEC
hardware blocks that are known to need this.
Fixes: f166f890c8 ("net: ethernet: fec: Replace interrupt driven MDIO with polled IO")
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
Acked-by: Fugang Duan <fugand.duan@nxp.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Clemens Gruber <clemens.gruber@pqgruber.com>
Link: https://lore.kernel.org/r/20201028052232.1315167-1-gerg@linux-m68k.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ip6_tnl_encap assigns to proto transport protocol which
encapsulates inner packet, but we must pass to set_inner_ipproto
protocol of that inner packet.
Calling set_inner_ipproto after ip6_tnl_encap might break gso.
For example, in case of encapsulating ipv6 packet in fou6 packet, inner_ipproto
would be set to IPPROTO_UDP instead of IPPROTO_IPV6. This would lead to
incorrect calling sequence of gso functions:
ipv6_gso_segment -> udp6_ufo_fragment -> skb_udp_tunnel_segment -> udp6_ufo_fragment
instead of:
ipv6_gso_segment -> udp6_ufo_fragment -> skb_udp_tunnel_segment -> ip6ip6_gso_segment
Fixes: 6c11fbf97e ("ip6_tunnel: add MPLS transmit support")
Signed-off-by: Alexander Ovechkin <ovov@yandex-team.ru>
Link: https://lore.kernel.org/r/20201029171012.20904-1-ovov@yandex-team.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mark flush request as IDLE in its .end_io(), aligning it with how normal
requests behave. The flush request stays in in-flight tags if we're not
using an IO scheduler, so we need to change its state into IDLE.
Otherwise, we will hang in blk_mq_tagset_wait_completed_request() during
error recovery because flush the request state is kept as COMPLETED.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Cc: Chao Leng <lengchao@huawei.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Georgi writes:
interconnect fixes for v5.10
This contains one core fix and a few driver fixes.
- Fix the core to perform also aggregation when setting the initial
bandwidth with sync_state.
- Fixes in some drivers to make sure the correct sequence is used for
initialization when we use sync_state.
- Fix in the sdm845 driver to prevent a board hang that was hit when
bandwidth scaling for display and multimedia was enabled.
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
* tag 'icc-5.10-rc2' of https://git.linaro.org/people/georgi.djakov/linux:
interconnect: qcom: use icc_sync state for sm8[12]50
interconnect: qcom: Ensure that the floor bandwidth value is enforced
interconnect: qcom: sc7180: Init BCMs before creating the nodes
interconnect: qcom: sdm845: Init BCMs before creating the nodes
interconnect: Aggregate before setting initial bandwidth
interconnect: qcom: sdm845: Enable keepalive for the MM1 BCM
The ABI files are supposed to be unique. Yet,
in the specific case of hw_pattern, there are some duplicated
entries as warned by scripts/get_abi.pl:
Warning: /sys/class/leds/<led>/hw_pattern is defined 3 times: Documentation/ABI/testing/sysfs-class-led-trigger-pattern:14 Documentation/ABI/testing/sysfs-class-led-driver-sc27xx:0 Documentation/ABI/testing/sysfs-class-led-driver-el15203000:0
Drop the duplication from the ABI files, moving the specific
definitions to files inside Documentation/leds.
Acked-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/038e57881550550b298e598f8f9b7f20515cbe15.1604042072.git.mchehab+huawei@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Both adp8860 and adp8870 define some extensions to the
backlight class. This causes warnings:
Warning: /sys/class/backlight/<backlight>/ambient_light_level is defined 2 times: /sys/class/backlight/<backlight>/ambient_light_level:8 /sys/class/backlight/<backlight>/ambient_light_level:30
Warning: /sys/class/backlight/<backlight>/ambient_light_zone is defined 2 times: /sys/class/backlight/<backlight>/ambient_light_zone:18 /sys/class/backlight/<backlight>/ambient_light_zone:40
As ABI definitions shouldn't be duplicated.
Unfortunately, the ABI is dependent on the specific device
features. As such, ambient_light_level range is somewhat
different among the supported devices.
The ambient_light_zone is even worse: the meanings of each
preset are different, and there's no ABI to retrieve
the supported types nor their meanins. Unfortunately,
it is too late to fix it without causing regressions,
as this has been used since Kernel v2.6.35.
Rewrite those ABI documentation using the current documentation
as a reference, and double-checking at the datasheets:
https://www.analog.com/media/en/technical-documentation/data-sheets/ADP8870.pdfhttps://www.analog.com/media/en/technical-documentation/data-sheets/ADP8860.pdf
in order to properly document the differences between those two
drivers.
Acked-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/342195ad5a819d9bcfcebc133c77ab69b4211672.1604042072.git.mchehab+huawei@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ABI is not supposed to have duplicated entries, as warned
by get_abi.pl:
$ ./scripts/get_abi.pl validate 2>&1|grep sysfs-class-power
Warning: /sys/class/power_supply/<supply_name>/current_avg is defined 2 times: Documentation/ABI/testing/sysfs-class-power:108 Documentation/ABI/testing/sysfs-class-power:391
Warning: /sys/class/power_supply/<supply_name>/current_max is defined 2 times: Documentation/ABI/testing/sysfs-class-power:121 Documentation/ABI/testing/sysfs-class-power:404
Warning: /sys/class/power_supply/<supply_name>/current_now is defined 2 times: Documentation/ABI/testing/sysfs-class-power:130 Documentation/ABI/testing/sysfs-class-power:414
Warning: /sys/class/power_supply/<supply_name>/temp is defined 2 times: Documentation/ABI/testing/sysfs-class-power:281 Documentation/ABI/testing/sysfs-class-power:493
Warning: /sys/class/power_supply/<supply_name>/temp_alert_max is defined 2 times: Documentation/ABI/testing/sysfs-class-power:291 Documentation/ABI/testing/sysfs-class-power:505
Warning: /sys/class/power_supply/<supply_name>/temp_alert_min is defined 2 times: Documentation/ABI/testing/sysfs-class-power:306 Documentation/ABI/testing/sysfs-class-power:521
Warning: /sys/class/power_supply/<supply_name>/temp_max is defined 2 times: Documentation/ABI/testing/sysfs-class-power:322 Documentation/ABI/testing/sysfs-class-power:537
Warning: /sys/class/power_supply/<supply_name>/temp_min is defined 2 times: Documentation/ABI/testing/sysfs-class-power:333 Documentation/ABI/testing/sysfs-class-power:547
Warning: /sys/class/power_supply/<supply_name>/voltage_max is defined 2 times: Documentation/ABI/testing/sysfs-class-power:356 Documentation/ABI/testing/sysfs-class-power:571
Warning: /sys/class/power_supply/<supply_name>/voltage_min is defined 2 times: Documentation/ABI/testing/sysfs-class-power:367 Documentation/ABI/testing/sysfs-class-power:581
Warning: /sys/class/power_supply/<supply_name>/voltage_now is defined 2 times: Documentation/ABI/testing/sysfs-class-power:378 Documentation/ABI/testing/sysfs-class-power:591
Yet, both USB and Battery share a common set of charging-related
properties.
Unify the entries for such properties in order to avoid
duplication, while preserving the battery and USB-specific
data properly documented.
Acked-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/bcdf5f76326ea48a990a7cac612af216c387537d.1604042072.git.mchehab+huawei@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The files under Documentation/ABI should follow the syntax
as defined at Documentation/ABI/README.
Allow checking if they're following the syntax by running
the ABI parser script on COMPILE_TEST.
With that, when there's a problem with a file under
Documentation/ABI, it would produce a warning like:
Warning: file ./Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats#14:
What '/sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_cor' doesn't have a description
Warning: file ./Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats#21:
What '/sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_fatal' doesn't have a description
Acked-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/57a38de85cb4b548857207cf1fc1bf1ee08613c9.1604042072.git.mchehab+huawei@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Just like kernel-doc extension, we need to be able to identify
what part of an imported document has issues, as reporting them
as:
get_abi.pl rest --dir $srctree/Documentation/ABI/obsolete --rst-source:1689: ERROR: Unexpected indentation.
Makes a lot harder for someone to fix.
It should be noticed that it the line which will be reported is
the line where the "What:" definition is, and not the line
with actually has an error.
Acked-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/d6155ab16fb7631f2fa8e7a770eae72f24bf7cc5.1604042072.git.mchehab+huawei@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If userspace does not include the trailing end of batch message, then
nfnetlink aborts the transaction. This allows to check that ruleset
updates trigger no errors.
After this patch, invoking this command from the prerouting chain:
# nft -c add rule x y fib saddr . oif type local
fails since oif is not supported there.
This patch fixes the lack of rule validation from the abort/check path
to catch configuration errors such as the one above.
Fixes: a654de8fdc ("netfilter: nf_tables: fix chain dependency validation")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If netfilter changes the packet mark when mangling, the packet is
rerouted using the route_me_harder set of functions. Prior to this
commit, there's one big difference between route_me_harder and the
ordinary initial routing functions, described in the comment above
__ip_queue_xmit():
/* Note: skb->sk can be different from sk, in case of tunnels */
int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl,
That function goes on to correctly make use of sk->sk_bound_dev_if,
rather than skb->sk->sk_bound_dev_if. And indeed the comment is true: a
tunnel will receive a packet in ndo_start_xmit with an initial skb->sk.
It will make some transformations to that packet, and then it will send
the encapsulated packet out of a *new* socket. That new socket will
basically always have a different sk_bound_dev_if (otherwise there'd be
a routing loop). So for the purposes of routing the encapsulated packet,
the routing information as it pertains to the socket should come from
that socket's sk, rather than the packet's original skb->sk. For that
reason __ip_queue_xmit() and related functions all do the right thing.
One might argue that all tunnels should just call skb_orphan(skb) before
transmitting the encapsulated packet into the new socket. But tunnels do
*not* do this -- and this is wisely avoided in skb_scrub_packet() too --
because features like TSQ rely on skb->destructor() being called when
that buffer space is truely available again. Calling skb_orphan(skb) too
early would result in buffers filling up unnecessarily and accounting
info being all wrong. Instead, additional routing must take into account
the new sk, just as __ip_queue_xmit() notes.
So, this commit addresses the problem by fishing the correct sk out of
state->sk -- it's already set properly in the call to nf_hook() in
__ip_local_out(), which receives the sk as part of its normal
functionality. So we make sure to plumb state->sk through the various
route_me_harder functions, and then make correct use of it following the
example of __ip_queue_xmit().
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If netfilter changes the packet mark, the packet is rerouted. The
ip_route_me_harder family of functions fails to use the right sk, opting
to instead use skb->sk, resulting in a routing loop when used with
tunnels. With the next change fixing this issue in netfilter, test for
the relevant condition inside our test suite, since wireguard was where
the bug was discovered.
Reported-by: Chen Minqiang <ptpt52@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The netlink report should be sent regardless the available listeners.
Fixes: 84d7fce693 ("netfilter: nf_tables: export rule-set generation ID")
Fixes: 3b49e2e94e ("netfilter: nf_tables: add flow table netlink frontend")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Peter writes:
4 bug fixes for Cadence 3 driver.
The biggest fix is changing endpoint configuration method to
avoid FIFO overflow at multiple endpoints situation.
* tag 'usb-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/peter.chen/usb:
usb: cdns3: gadget: own the lock wrongly at the suspend routine
usb: cdns3: Fix on-chip memory overflow issue
usb: cdns3: gadget: suspicious implicit sign extension
usb: cdns3: Variable 'length' set but not used
When (for example) an IBSS station is pre-moved to AUTHORIZED
before it's inserted, and then the insertion fails, we don't
clean up the fast RX/TX states that might already have been
created, since we don't go through all the state transitions
again on the way down.
Do that, if it hasn't been done already, when the station is
freed. I considered only freeing the fast TX/RX state there,
but we might add more state so it's more robust to wind down
the state properly.
Note that we warn if the station was ever inserted, it should
have been properly cleaned up in that case, and the driver
will probably not like things happening out of order.
Reported-by: syzbot+2e293dbd67de2836ba42@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20201009141710.7223b322a955.I95bd08b9ad0e039c034927cce0b75beea38e059b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There's a race condition in the netdev registration in that
NETDEV_REGISTER actually happens after the netdev is available,
and so if we initialize things only there, we might get called
with an uninitialized wdev through nl80211 - not using a wdev
but using a netdev interface index.
I found this while looking into a syzbot report, but it doesn't
really seem to be related, and unfortunately there's no repro
for it (yet). I can't (yet) explain how it managed to get into
cfg80211_release_pmsr() from nl80211_netlink_notify() without
the wdev having been initialized, as the latter only iterates
the wdevs that are linked into the rdev, which even without the
change here happened after init.
However, looking at this, it seems fairly clear that the init
needs to be done earlier, otherwise we might even re-init on a
netns move, when data might still be pending.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20201009135821.fdcbba3aad65.Ie9201d91dbcb7da32318812effdc1561aeaf4cdc@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When ieee80211_skb_resize() is called from ieee80211_build_hdr()
the skb has no 802.11 header yet, in fact it consist only of the
payload as the ethernet frame is removed. As such, we're using
the payload data for ieee80211_is_mgmt(), which is of course
completely wrong. This didn't really hurt us because these are
always data frames, so we could only have added more tailroom
than we needed if we determined it was a management frame and
sdata->crypto_tx_tailroom_needed_cnt was false.
However, syzbot found that of course there need not be any payload,
so we're using at best uninitialized memory for the check.
Fix this to pass explicitly the kind of frame that we have instead
of checking there, by replacing the "bool may_encrypt" argument
with an argument that can carry the three possible states - it's
not going to be encrypted, it's a management frame, or it's a data
frame (and then we check sdata->crypto_tx_tailroom_needed_cnt).
Reported-by: syzbot+32fd1a1bfe355e93f1e2@syzkaller.appspotmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20201009132538.e1fd7f802947.I799b288466ea2815f9d4c84349fae697dca2f189@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When sending EAPOL frames via NL80211 they are treated as injected
frames in mac80211. Due to commit 1df2bdba52 ("mac80211: never drop
injected frames even if normally not allowed") these injected frames
were not assigned a sta context in the function ieee80211_tx_dequeue,
causing certain wireless network cards to always send EAPOL frames in
plaintext. This may cause compatibility issues with some clients or
APs, which for instance can cause the group key handshake to fail and
in turn would cause the station to get disconnected.
This commit fixes this regression by assigning a sta context in
ieee80211_tx_dequeue to injected frames as well.
Note that sending EAPOL frames in plaintext is not a security issue
since they contain their own encryption and authentication protection.
Cc: stable@vger.kernel.org
Fixes: 1df2bdba52 ("mac80211: never drop injected frames even if normally not allowed")
Reported-by: Thomas Deutschmann <whissi@gentoo.org>
Tested-by: Christian Hesse <list@eworm.de>
Tested-by: Thomas Deutschmann <whissi@gentoo.org>
Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be>
Link: https://lore.kernel.org/r/20201019160113.350912-1-Mathy.Vanhoef@kuleuven.be
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We finalize caps before initializing kvm hyp code, and any use of
cpus_have_const_cap() in kvm hyp code generates redundant and
potentially unsound code to read the cpu_hwcaps array.
A number of helper functions used in both hyp context and regular kernel
context use cpus_have_const_cap(), as some regular kernel code runs
before the capabilities are finalized. It's tedious and error-prone to
write separate copies of these for hyp and non-hyp code.
So that we can avoid the redundant code, let's automatically upgrade
cpus_have_const_cap() to cpus_have_final_cap() when used in hyp context.
With this change, there's never a reason to access to cpu_hwcaps array
from hyp code, and we don't need to create an NVHE alias for this.
This should have no effect on non-hyp code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: David Brazdil <dbrazdil@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201026134931.28246-4-mark.rutland@arm.com
Currently has_vhe() detects whether it is being compiled for VHE/NVHE
hyp code based on preprocessor definitions, and uses this knowledge to
avoid redundant runtime checks.
There are other cases where we'd like to use this knowledge, so let's
factor the preprocessor checks out into separate helpers.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: David Brazdil <dbrazdil@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201026134931.28246-2-mark.rutland@arm.com
According to the i.MX8MN TRM, there is only one OTG port. The
address for OTG2 is reserved on Nano.
This patch removes the non-existent OTG2, usbphynop2, and the usbmisc2
nodes.
Fixes: 6c3debcbae ("arm64: dts: freescale: Add i.MX8MN dtsi support")
Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The call to rcu_cpu_starting() in secondary_start_kernel() is not early
enough in the CPU-hotplug onlining process, which results in lockdep
splats as follows:
WARNING: suspicious RCU usage
-----------------------------
kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/1/0.
Call trace:
dump_backtrace+0x0/0x3c8
show_stack+0x14/0x60
dump_stack+0x14c/0x1c4
lockdep_rcu_suspicious+0x134/0x14c
__lock_acquire+0x1c30/0x2600
lock_acquire+0x274/0xc48
_raw_spin_lock+0xc8/0x140
vprintk_emit+0x90/0x3d0
vprintk_default+0x34/0x40
vprintk_func+0x378/0x590
printk+0xa8/0xd4
__cpuinfo_store_cpu+0x71c/0x868
cpuinfo_store_cpu+0x2c/0xc8
secondary_start_kernel+0x244/0x318
This is avoided by moving the call to rcu_cpu_starting up near the
beginning of the secondary_start_kernel() function.
Signed-off-by: Qian Cai <cai@redhat.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/
Link: https://lore.kernel.org/r/20201028182614.13655-1-cai@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
vdpa_sim generates a ramdom MAC address but it is never used by upper
layers because the VIRTIO_NET_F_MAC bit is not set in the features list.
Because of that, virtio-net always regenerates a random MAC address each
time it is loaded whereas the address should only change on vdpa_sim
load/unload.
Fix that by adding VIRTIO_NET_F_MAC in the features list of vdpa_sim.
Fixes: 2c53d0f64c ("vdpasim: vDPA device simulator")
Cc: jasowang@redhat.com
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Link: https://lore.kernel.org/r/20201029122050.776445-2-lvivier@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
LKP considered variable 'ret' in vhost_vdpa_setup_vq_irq() as
a unused variable, so suggest we remove it. Actually it stores
return value of irq_bypass_register_producer(), but we did not
check it, we should handle the failure case.
This commit will print a message if irq bypass register producer
fail, in this case, vqs still remain functional.
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20201023104046.404794-1-lingshan.zhu@intel.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
This reverts commit 7ed9e3d97c.
The patch creates a DoS risk since it can result in a high order memory
allocation.
Fixes: 7ed9e3d97c ("vhost-vdpa: fix page pinning leakage in error path")
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When streaming bluetooth audio, the sound is choppy due to the
fact that the default baud rate of the HCI interface is too slow
to handle 16-bit stereo at 48KHz.
The Bluetooth chip is capable of up to 4M baud on the serial port,
so this patch sets the max-speed to 4000000 in order to properly
stream audio over the Bluetooth.
Fixes: 593816fa2f ("arm64: dts: imx: Add Beacon i.MX8m-Mini development kit")
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Although the DPAA 1 FMan operations are coherent, the device tree
node for the FMan does not indicate that, resulting in a needless
loss of performance. Adding the missing dma-coherent property.
Fixes: 1ffbecdd83 ("arm64: dts: add DPAA FMan nodes")
Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Tested-by: Camelia Groza <camelia.groza@oss.nxp.com>
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
commit e098bc9612 ("drm/amd/pm: optimize the power related source code layout")
moved the directory, update the F: file pattern to match.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit f878122841 ("drm/amdgpu:
Fix bug where DPM is not enabled after hibernate and resume").
It was intended to fix Hawaii S4(hibernation) issue but break S3. As
ixFEATURE_STATUS is filled with garbage data on resume which can be
only cleared by reloading smc firmware(but that will involve many
changes). So, we will revert this S4 fix and seek a new way.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Tested-by: Sandeep Raghuraman <sandy.8925@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
One issue exposed after below commit with which the system will freeze
at suspend after vGPU is created (no need to activate the vGPU).
commit e6ba764802 ("drm/i915: Remove i915->kernel_context")
Old implementation pin the intel_context at setup_submission and
unpin it at clean_submission. So after some vGPU is created, the
intel_context is always pinned there although no workload using it.
It will then block i915 enter suspend state.
There is no need to pin it all the time. Pin/unpin it around workload
lifecycle is more reasonable. After GVT enabled suspend/resume, the
pinned intel_context will also get unpined when userspace put VM process
into suspend state since all workloads are retired, then it's safe to
unpin all intel_context for workloads created. So move the pin/unpin to
create_workload and destroy_workload, while still keep the
create/destroy in old place.
V2:
Rebase.
Fixes: e6ba764802 ("drm/i915: Remove i915->kernel_context")
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20201016054059.238371-1-colin.xu@intel.com
When the system goes to suspend, if the controller is at device mode with
cable connecting to host, the call stack is: cdns3_suspend->
cdns3_gadget_suspend -> cdns3_disconnect_gadget, after cdns3_disconnect_gadget
is called, it owns lock wrongly, it causes the system being deadlock after
resume due to at cdns3_device_thread_irq_handler, it tries to get the lock,
but can't get it forever.
To fix it, we delete the unlock-lock operations at cdns3_disconnect_gadget,
and do it at the caller.
Fixes: b1234e3b3b ("usb: cdns3: add runtime PM support")
Acked-by: Pawel Laszczak <pawell@cadence.com>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Patch fixes issue caused setting On-chip memory overflow bit in usb_sts
register. The issue occurred because EP_CFG register was set twice
before USB_STS.CFGSTS was set. Every write operation on EP_CFG.BUFFERING
causes that controller increases internal counter holding the number
of reserved on-chip buffers. First time this register was updated in
function cdns3_ep_config before delegating SET_CONFIGURATION request
to class driver and again it was updated when class wanted to enable
endpoint. This patch fixes this issue by configuring endpoints
enabled by class driver in cdns3_gadget_ep_enable and others just
before status stage.
Cc: stable@vger.kernel.org#v5.8+
Fixes: 7733f6c32e ("usb: cdns3: Add Cadence USB3 DRD Driver")
Reported-and-tested-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Pawel Laszczak <pawell@cadence.com>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Commit 3193c0836 ("bpf: Disable GCC -fgcse optimization for
___bpf_prog_run()") introduced a __no_fgcse macro that expands to a
function scope __attribute__((optimize("-fno-gcse"))), to disable a
GCC specific optimization that was causing trouble on x86 builds, and
was not expected to have any positive effect in the first place.
However, as the GCC manual documents, __attribute__((optimize))
is not for production use, and results in all other optimization
options to be forgotten for the function in question. This can
cause all kinds of trouble, but in one particular reported case,
it causes -fno-asynchronous-unwind-tables to be disregarded,
resulting in .eh_frame info to be emitted for the function.
This reverts commit 3193c0836, and instead, it disables the -fgcse
optimization for the entire source file, but only when building for
X86 using GCC with CONFIG_BPF_JIT_ALWAYS_ON disabled. Note that the
original commit states that CONFIG_RETPOLINE=n triggers the issue,
whereas CONFIG_RETPOLINE=y performs better without the optimization,
so it is kept disabled in both cases.
Fixes: 3193c0836f ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=GFHpzoj_hCoBQ@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20201028171506.15682-2-ardb@kernel.org
Add little-endian property to RCPM node (for ls1028a,ls1088a,ls208xa),
otherwise RCPM driver will program hardware with incorrect setting,
causing system (such as LS1028ARDB) failed to be waked by wakeup source.
Fixes: 791c88ca57 (“arm64: dts: ls1028a: Add ftm_alarm0 DT node”)
Fixes: f4fe3a8665 (“arm64: dts: layerscape: add ftm_alarm0 node”)
Signed-off-by: Biwen Li <biwen.li@nxp.com>
Signed-off-by: Ran Wang <ran.wang_1@nxp.com>
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The PMIC's interrupt is level low and should be pulled up. The PMIC's
device node had pinctrl-0 property but it lacked pinctrl-names which
is required to apply the pin configuration.
Fixes: 4153f7811a ("arm64: dts: imx8mn: correct interrupt flags")
Fixes: 6386156eb2 ("arm64: dts: imx8mn-evk: add pca9450 for i.mx8mn-evk board")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Robin Gong <yibin.gong@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The PMIC's interrupt is level low and should be pulled up. The PMIC's
device node had pinctrl-0 property but it lacked pinctrl-names which
is required to apply the pin configuration.
Fixes: 4153f7811a ("arm64: dts: imx8mn: correct interrupt flags")
Fixes: 3e44dd0973 ("arm64: dts: imx8mn-ddr4-evk: Add rohm,bd71847 PMIC support")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Robin Gong <yibin.gong@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The PMIC's interrupt is level low and should be pulled up. The PMIC's
device node had pinctrl-0 property but it lacked pinctrl-names which
is required to apply the pin configuration. The actual problem in DTS
was pointed out by Felix Radensky from Variscite.
Reported-by: Felix Radensky <felix.r@variscite.com>
Fixes: ade0176dd8 ("arm64: dts: imx8mn-var-som: Add Variscite VAR-SOM-MX8MN System on Module")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Robin Gong <yibin.gong@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The PMIC's interrupt is level low and should be pulled up. The PMIC's
device node had pinctrl-0 property but it lacked pinctrl-names which
is required to apply the pin configuration.
Fixes: 5f67317bd9 ("arm64: dts: imx8mm: correct interrupt flags")
Fixes: aa71d06483 ("arm64: dts: imx8mm: Split the imx8mm evk board dts to a common dtsi")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Robin Gong <yibin.gong@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The PMIC's interrupt is level low and should be pulled up. The PMIC's
device node had pinctrl-0 property but it lacked pinctrl-names which
is required to apply the pin configuration.
Fixes: 5f67317bd9 ("arm64: dts: imx8mm: correct interrupt flags")
Fixes: 593816fa2f ("arm64: dts: imx: Add Beacon i.MX8m-Mini development kit")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Tested-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Robin Gong <yibin.gong@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
The PMIC's interrupt is level low and should be pulled up. The PMIC's
device node had pinctrl-0 property but it lacked pinctrl-names which
is required to apply the pin configuration. The actual problem in DTS
was pointed out by Felix Radensky from Variscite.
Reported-by: Felix Radensky <felix.r@variscite.com>
Fixes: 5f67317bd9 ("arm64: dts: imx8mm: correct interrupt flags")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Robin Gong <yibin.gong@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
While I thought I had this correct (since it actually did reject modes
like I expected during testing), Ville Syrjala from Intel pointed out
that the logic here isn't correct. max_clock refers to the max data rate
supported by the DP encoder. So, limiting it to the output of ds_clock (which
refers to the maximum dotclock of the downstream DP device) doesn't make any
sense. Additionally, since we're using the connector's bpc as the canonical BPC
we should use this in mode_valid until we support dynamically setting the bpp
based on bandwidth constraints.
https://lists.freedesktop.org/archives/dri-devel/2020-September/280276.html
For more info.
So, let's rewrite this using Ville's advice.
v2:
* Ville pointed out I mixed up the dotclock and the link rate. So fix that...
* ...and also rename all the variables in this function to be more appropriately
labeled so I stop mixing them up.
* Reuse the bpp from the connector for now until we have dynamic bpp selection.
* Use use DIV_ROUND_UP for calculating the mode rate like i915 does, which we
should also have been doing from the start
Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: 409d38139b ("drm/nouveau/kms/nv50-: Use downstream DP clock limits for mode validation")
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ville also pointed out that I got a lot of the logic here wrong as well, whoops.
While I don't think anyone's likely using 3D output with nouveau, the next patch
will make nouveau_conn_mode_valid() make a lot less sense. So, let's just get
rid of it and open-code it like before, while taking care to move the 3D frame
packing calculations on the dot clock into the right place.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: d6a9efece7 ("drm/nouveau/kms/nv50-: Share DP SST mode_valid() handling with MST")
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.8+
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
With this we try to detect if the endianess switch works and assume LE if
not. Suggested by Ben.
Fixes: 51c05340e4 ("drm/nouveau/device: detect if changing endianness failed")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Cc: <stable@vger.kernel.org> # v5.8+
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
we can't use nouveau_bo_ref here as no ttm object was allocated and
nouveau_bo_ref mainly deals with that. Simply deallocate the object.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Not entirely sure why this never came up when I originally tested this
(maybe some BIOSes already have this setup?) but the ->caps_init vfunc
appears to cause the display engine to throw an exception on driver
init, at least on my ThinkPad P72:
nouveau 0000:01:00.0: disp: chid 0 mthd 008c data 00000000 0000508c 0000102b
This is magic nvidia speak for "You need to have the DMA notifier offset
programmed before you can call NV507D_GET_CAPABILITIES." So, let's fix
this by doing that, and also perform an update afterwards to prevent
racing with the GPU when reading capabilities.
v2:
* Don't just program the DMA notifier offset, make sure to actually
perform an update
v3:
* Don't call UPDATE()
* Actually read the correct notifier fields, as apparently the
CAPABILITIES_DONE field lives in a different location than the main
NV_DISP_CORE_NOTIFIER_1 field. As well, 907d+ use a different
CAPABILITIES_DONE field then pre-907d cards.
v4:
* Don't forget to check the return value of core507d_read_caps()
v5:
* Get rid of NV50_DISP_CAPS_NTFY[14], use NV50_DISP_CORE_NTFY
* Disable notifier after calling GetCapabilities()
Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: 4a2cb4181b ("drm/nouveau/kms/nv50-: Probe SOR and PIOR caps for DP interlacing support")
Cc: <stable@vger.kernel.org> # v5.8+
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The user level OpenCL code shouldn't have to align start and end
addresses to a page boundary. That is better handled in the nouveau
driver. The npages field is also redundant since it can be computed
from the start and end addresses.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Before this patch, gfs2_fitrim was not properly checking for a "live" file
system. If the file system had something to trim and the file system
was read-only (or spectator) it would start the trim, but when it starts
the transaction, gfs2_trans_begin returns -EROFS (read-only file system)
and it errors out. However, if the file system was already trimmed so
there's no work to do, it never called gfs2_trans_begin. That code is
bypassed so it never returns the error. Instead, it returns a good
return code with 0 work. All this makes for inconsistent behavior:
The same fstrim command can return -EROFS in one case and 0 in another.
This tripped up xfstests generic/537 which reports the error as:
+fstrim with unrecovered metadata just ate your filesystem
This patch adds a check for a "live" (iow, active journal, iow, RW)
file system, and if not, returns the error properly.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Before commit 97fd734ba1, the local statfs_changeX inode was never
initialized for spectator mounts. However, it still checks for
spectator mounts when unmounting everything. There's no good reason to
lookup the statfs_changeX files because spectators cannot perform recovery.
It still, however, needs the master statfs file for statfs calls.
This patch adds the check for spectator mounts to init_statfs.
Fixes: 97fd734ba1 ("gfs2: lookup local statfs inodes prior to journal recovery")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Before this patch, function gfs2_meta_sync called filemap_fdatawrite to write
the address space for the metadata being synced. That's great for inodes, but
resource groups all point to the same superblock-address space, sdp->sd_aspace.
Each rgrp has its own range of blocks on which it should operate. That meant
every time an rgrp's metadata was synced, it would write all of them instead
of just the range.
This patch eliminates function gfs2_meta_sync and tailors specific metasync
functions for inodes and rgrps.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Hi,
Before this patch, function init_journal's "undo" directive jumped to label
fail_jinode_gh. But now that it does statfs initialization, it needs to
jump to fail_statfs instead. Failure to do so means that mount failures
after init_journal is successful will neglect to let go of the proper
statfs information, stranding the statfs_changeX inodes. This makes it
impossible to free its glocks, and results in:
gfs2: fsid=sda.s: G: s:EX n:2/805f f:Dqob t:EX d:UN/603701000 a:0 v:0 r:4 m:200 p:1
gfs2: fsid=sda.s: H: s:EX f:H e:0 p:1397947 [(ended)] init_journal+0x548/0x890 [gfs2]
gfs2: fsid=sda.s: I: n:6/32863 t:8 f:0x00 d:0x00000201 s:24 p:0
gfs2: fsid=sda.s: G: s:SH n:5/805f f:Dqob t:SH d:UN/603712000 a:0 v:0 r:3 m:200 p:0
gfs2: fsid=sda.s: H: s:SH f:EH e:0 p:1397947 [(ended)] gfs2_inode_lookup+0x1fb/0x410 [gfs2]
VFS: Busy inodes after unmount of sda. Self-destruct in 5 seconds. Have a nice day...
The next time the file system is mounted, it then reuses the same glocks,
which ends in a kernel NULL pointer dereference when trying to dump the
reused glock.
This patch makes the "undo" function of init_journal jump to fail_statfs
so the statfs files are properly deconstructed upon failure.
Fixes: 97fd734ba1 ("gfs2: lookup local statfs inodes prior to journal recovery")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Gfs2 creates an address space for its rgrps called sd_aspace, but it never
called truncate_inode_pages_final on it. This confused vfs greatly which
tried to reference the address space after gfs2 had freed the superblock
that contained it.
This patch adds a call to truncate_inode_pages_final for sd_aspace, thus
avoiding the use-after-free.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function gfs2_clear_rgrpd calls kfree(rgd->rd_bits) before calling
return_all_reservations, but return_all_reservations still dereferences
rgd->rd_bits in __rs_deltree. Fix that by moving the call to kfree below the
call to return_all_reservations.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
K2G devices still only use single parameter for power-domains property,
so check for this properly in the driver. Without this, every peripheral
fails to probe resulting in boot failure.
Link: https://lore.kernel.org/r/20201029093337.21170-1-t-kristo@ti.com
Fixes: efa5c01cd7 ("soc: ti: ti_sci_pm_domains: switch to use multiple genpds instead of one")
Reported-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Acked-by: Nishanth Menon <nm@ti.com>
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
VFIO allows a device driver to resolve a fault by mapping a MMIO
range. This can be subsequently result in user_mem_abort() to
try and compute a huge mapping based on the MMIO pfn, which is
a sure recipe for things to go wrong.
Instead, force a PTE mapping when the pfn faulted in has a device
mapping.
Fixes: 6d674e28f6 ("KVM: arm/arm64: Properly handle faulting of device mappings")
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Santosh Shukla <sashukla@nvidia.com>
[maz: rewritten commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1603711447-11998-2-git-send-email-sashukla@nvidia.com
Although huge pages can be created out of multiple contiguous PMDs
or PTEs, the corresponding sizes are not supported at Stage-2 yet.
Instead of failing the mapping, fall back to the nearer supported
mapping size (CONT_PMD to PMD and CONT_PTE to PTE respectively).
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
[maz: rewritten commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201025230626.18501-1-gshan@redhat.com
Pull fallthrough fix from Gustavo A. R. Silva:
"This fixes a ton of fall-through warnings when building with Clang
12.0.0 and -Wimplicit-fallthrough"
* tag 'fallthrough-fixes-clang-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
include: jhash/signal: Fix fall-through warnings for Clang
Pull networking fixes from Jakub Kicinski:
"Current release regressions:
- r8169: fix forced threading conflicting with other shared
interrupts; we tried to fix the use of raise_softirq_irqoff from an
IRQ handler on RT by forcing hard irqs, but this driver shares
legacy PCI IRQs so drop the _irqoff() instead
- tipc: fix memory leak caused by a recent syzbot report fix to
tipc_buf_append()
Current release - bugs in new features:
- devlink: Unlock on error in dumpit() and fix some error codes
- net/smc: fix null pointer dereference in smc_listen_decline()
Previous release - regressions:
- tcp: Prevent low rmem stalls with SO_RCVLOWAT.
- net: protect tcf_block_unbind with block lock
- ibmveth: Fix use of ibmveth in a bridge; the self-imposed filtering
to only send legal frames to the hypervisor was too strict
- net: hns3: Clear the CMDQ registers before unmapping BAR region;
incorrect cleanup order was leading to a crash
- bnxt_en - handful of fixes to fixes:
- Send HWRM_FUNC_RESET fw command unconditionally, even if there
are PCIe errors being reported
- Check abort error state in bnxt_open_nic().
- Invoke cancel_delayed_work_sync() for PFs also.
- Fix regression in workqueue cleanup logic in bnxt_remove_one().
- mlxsw: Only advertise link modes supported by both driver and
device, after removal of 56G support from the driver 56G was not
cleared from advertised modes
- net/smc: fix suppressed return code
Previous release - always broken:
- netem: fix zero division in tabledist, caused by integer overflow
- bnxt_en: Re-write PCI BARs after PCI fatal error.
- cxgb4: set up filter action after rewrites
- net: ipa: command payloads already mapped
Misc:
- s390/ism: fix incorrect system EID, it's okay to change since it
was added in current release
- vsock: use ns_capable_noaudit() on socket create to suppress false
positive audit messages"
* tag 'net-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits)
r8169: fix issue with forced threading in combination with shared interrupts
netem: fix zero division in tabledist
ibmvnic: fix ibmvnic_set_mac
mptcp: add missing memory scheduling in the rx path
tipc: fix memory leak caused by tipc_buf_append()
gtp: fix an use-before-init in gtp_newlink()
net: protect tcf_block_unbind with block lock
ibmveth: Fix use of ibmveth in a bridge.
net/sched: act_mpls: Add softdep on mpls_gso.ko
ravb: Fix bit fields checking in ravb_hwtstamp_get()
devlink: Unlock on error in dumpit()
devlink: Fix some error codes
chelsio/chtls: fix memory leaks in CPL handlers
chelsio/chtls: fix deadlock issue
net: hns3: Clear the CMDQ registers before unmapping BAR region
bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally.
bnxt_en: Check abort error state in bnxt_open_nic().
bnxt_en: Re-write PCI BARs after PCI fatal error.
bnxt_en: Invoke cancel_delayed_work_sync() for PFs also.
bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one().
...
stage2_pte_cacheable() tries to figure out whether the mapping installed
in its 'pte' parameter is cacheable or not. Unfortunately, it fails
miserably because it extracts the memory attributes from the entry using
FIELD_GET(), which returns the attributes shifted down to bit 0, but then
compares this with the unshifted value generated by the PAGE_S2_MEMATTR()
macro.
A direct consequence of this bug is that cache maintenance is silently
skipped, which in turn causes 32-bit guests to crash early on when their
set/way maintenance is trapped but not emulated correctly.
Fix the broken masks by avoiding the use of FIELD_GET() altogether.
Fixes: 6d9d2115c4 ("KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table")
Reported-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20201029144716.30476-1-will@kernel.org
The DBGD{CCINT,SCRext} and DBGVCR register entries in the cp14 array
are missing their target register, resulting in all accesses being
targetted at the guard sysreg (indexed by __INVALID_SYSREG__).
Point the emulation code at the actual register entries.
Fixes: bdfb4b389c ("arm64: KVM: add trap handlers for AArch32 debug registers")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201029172409.2768336-1-maz@kernel.org
The new calling convention says that pointers coming from the SMCCC
interface are turned into their HYP version in the host HVC handler.
However, there is still a stray kern_hyp_va() in the TLB invalidation
code, which could result in a corrupted pointer.
Drop the spurious conversion.
Fixes: a071261d93 ("KVM: arm64: nVHE: Fix pointers during SMCCC convertion")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201026095116.72051-3-maz@kernel.org
The hyp-init code starts by stashing a register in TPIDR_EL2
in in order to free a register. This happens no matter if the
HVC call is legal or not.
Although nothing wrong seems to come out of it, it feels odd
to alter the EL2 state for something that eventually returns
an error.
Instead, use the fact that we know exactly which bits of the
__kvm_hyp_init call are non-zero to perform the check with
a series of EOR/ROR instructions, combined with a build-time
check that the value is the one we expect.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201026095116.72051-2-maz@kernel.org
Commit [bb1860efc8] changed the sink handling code introducing an
uninitialised pointer bug. This results in the default sink selection
failing.
Prior to commit:
static void etm_setup_aux(...)
<snip>
struct coresight_device *sink;
<snip>
/* First get the selected sink from user space. */
if (event->attr.config2) {
id = (u32)event->attr.config2;
sink = coresight_get_sink_by_id(id);
} else {
sink = coresight_get_enabled_sink(true);
}
<ctd>
*sink always initialised - possibly to NULL which triggers the
automatic sink selection.
After commit:
static void etm_setup_aux(...)
<snip>
struct coresight_device *sink;
<snip>
/* First get the selected sink from user space. */
if (event->attr.config2) {
id = (u32)event->attr.config2;
sink = coresight_get_sink_by_id(id);
}
<ctd>
*sink pointer uninitialised when not providing a sink on the perf command
line. This breaks later checks to enable automatic sink selection.
Fixes: bb1860efc8 ("coresight: etm: perf: Sink selection using sysfs is deprecated")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/20201029164559.1268531-3-mathieu.poirier@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull rdma fixes from Jason Gunthorpe:
"The good news is people are testing rc1 in the RDMA world - the bad
news is testing of the for-next area is not as good as I had hoped, as
we really should have caught at least the rdma_connect_locked() issue
before now.
Notable merge window regressions that didn't get caught/fixed in time
for rc1:
- Fix in kernel users of rxe, they were broken by the rapid fix to
undo the uABI breakage in rxe from another patch
- EFA userspace needs to read the GID table but was broken with the
new GID table logic
- Fix user triggerable deadlock in mlx5 using devlink reload
- Fix deadlock in several ULPs using rdma_connect from the CM handler
callbacks
- Memory leak in qedr"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/qedr: Fix memory leak in iWARP CM
RDMA: Add rdma_connect_locked()
RDMA/uverbs: Fix false error in query gid IOCTL
RDMA/mlx5: Fix devlink deadlock on net namespace deletion
RDMA/rxe: Fix small problem in network_type patch
Currently it is possible to craft a special netlink RTM_NEWQDISC
command that can result in jitter being equal to 0x80000000. It is
enough to set the 32 bit jitter to 0x02000000 (it will later be
multiplied by 2^6) or just set the 64 bit jitter via
TCA_NETEM_JITTER64. This causes an overflow during the generation of
uniformly distributed numbers in tabledist(), which in turn leads to
division by zero (sigma != 0, but sigma * 2 is 0).
The related fragment of code needs 32-bit division - see commit
9b0ed89 ("netem: remove unnecessary 64 bit modulus"), so switching to
64 bit is not an option.
Fix the issue by keeping the value of jitter within the range that can
be adequately handled by tabledist() - [0;INT_MAX]. As negative std
deviation makes no sense, take the absolute value of the passed value
and cap it at INT_MAX. Inside tabledist(), switch to unsigned 32 bit
arithmetic in order to prevent overflows.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Reported-by: syzbot+ec762a6342ad0d3c0d8f@syzkaller.appspotmail.com
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://lore.kernel.org/r/20201028170731.1383332-1-aleksandrnogikh@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski brought up a concern in ibmvnic_set_mac().
ibmvnic_set_mac() does this:
ether_addr_copy(adapter->mac_addr, addr->sa_data);
if (adapter->state != VNIC_PROBED)
rc = __ibmvnic_set_mac(netdev, addr->sa_data);
So if state == VNIC_PROBED, the user can assign an invalid address to
adapter->mac_addr, and ibmvnic_set_mac() will still return 0.
The fix is to validate ethernet address at the beginning of
ibmvnic_set_mac(), and move the ether_addr_copy to
the case of "adapter->state != VNIC_PROBED".
Fixes: c26eba03e4 ("ibmvnic: Update reset infrastructure to support tunable parameters")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/20201027220456.71450-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MMIO memory is usually not mapped encrypted, so there is no reason to
support emulated MMIO when it is mapped encrypted.
Prevent a possible hypervisor attack where a RAM page is mapped as
an MMIO page in the nested page-table, so that any guest access to it
will trigger a #VC exception and leak the data on that page to the
hypervisor via the GHCB (like with valid MMIO). On the read side this
attack would allow the HV to inject data into the guest.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20201028164659.27002-6-joro@8bytes.org
Restore RPS for ILK-M. We lost it when an extra HAS_RPS()
check appeared in intel_rps_enable().
Unfortunaltey this just makes the performance worse on my
ILK because intel_ips insists on limiting the GPU freq to
the minimum. If we don't do the RPS init then intel_ips will
not limit the frequency for whatever reason. Either it can't
get at some required information and thus makes wrong decisions,
or we mess up some weights/etc. and cause it to make the wrong
decisions when RPS init has been done, or the entire thing is
just wrong. Would require a bunch of reverse engineering to
figure out what's going on.
Cc: stable@vger.kernel.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: 9c878557b1 ("drm/i915/gt: Use the RPM config register to determine clk frequencies")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201021131443.25616-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit 2bf06370bc)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We are incorrectly limiting the max allocation size as per the mm
max_order, which is effectively the largest power-of-two that we can fit
in the region size. However, it's normal to setup the region or
allocator with a non-power-of-two size(for example 3G), which we should
already handle correctly, except it seems for the early too-big-check.
v2: make sure we also exercise the I915_BO_ALLOC_CONTIGUOUS path, which
is quite different, since for that we are actually limited by the
largest power-of-two that we can fit within the region size. (Chris)
Fixes: b908be543e ("drm/i915: support creating LMEM objects")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: CQ Tang <cq.tang@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201021103606.241395-1-matthew.auld@intel.com
(cherry picked from commit 83ebef47f8)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
In preparation to enable -Wimplicit-fallthrough for Clang, explicitly
add break statements instead of letting the code fall through to the
next case.
This patch adds four break statements that, together, fix almost 40,000
warnings when building Linux 5.10-rc1 with Clang 12.0.0 and this[1] change
reverted. Notice that in order to enable -Wimplicit-fallthrough for Clang,
such change[1] is meant to be reverted at some point. So, this patch helps
to move in that direction.
Something important to mention is that there is currently a discrepancy
between GCC and Clang when dealing with switch fall-through to empty case
statements or to cases that only contain a break/continue/return
statement[2][3][4].
Now that the -Wimplicit-fallthrough option has been globally enabled[5],
any compiler should really warn on missing either a fallthrough annotation
or any of the other case-terminating statements (break/continue/return/
goto) when falling through to the next case statement. Making exceptions
to this introduces variation in case handling which may continue to lead
to bugs, misunderstandings, and a general lack of robustness. The point
of enabling options like -Wimplicit-fallthrough is to prevent human error
and aid developers in spotting bugs before their code is even built/
submitted/committed, therefore eliminating classes of bugs. So, in order
to really accomplish this, we should, and can, move in the direction of
addressing any error-prone scenarios and get rid of the unintentional
fallthrough bug-class in the kernel, entirely, even if there is some minor
redundancy. Better to have explicit case-ending statements than continue to
have exceptions where one must guess as to the right result. The compiler
will eliminate any actual redundancy.
[1] commit e2079e93f5 ("kbuild: Do not enable -Wimplicit-fallthrough for clang for now")
[2] https://github.com/ClangBuiltLinux/linux/issues/636
[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91432
[4] https://godbolt.org/z/xgkvIh
[5] commit a035d552a9 ("Makefile: Globally enable fall-through warning")
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Pull AFS fixes from David Howells:
- Fix copy_file_range() to an afs file now returning EINVAL if the
splice_write file op isn't supplied.
- Fix a deref-before-check in afs_unuse_cell().
- Fix a use-after-free in afs_xattr_get_acl().
- Fix afs to not try to clear PG_writeback when laundering a page.
- Fix afs to take a ref on a page that it sets PG_private on and to
drop that ref when clearing PG_private. This is done through recently
added helpers.
- Fix a page leak if write_begin() fails.
- Fix afs_write_begin() to not alter the dirty region info stored in
page->private, but rather do this in afs_write_end() instead when we
know what we actually changed.
- Fix afs_invalidatepage() to alter the dirty region info on a page
when partial page invalidation occurs so that we don't inadvertantly
include a span of zeros that will get written back if a page gets
laundered due to a remote 3rd-party induced invalidation.
We mustn't, however, reduce the dirty region if the page has been
seen to be mapped (ie. we got called through the page_mkwrite vector)
as the page might still be mapped and we might lose data if the file
is extended again.
- Fix the dirty region info to have a lower resolution if the size of
the page is too large for this to be encoded (e.g. powerpc32 with 64K
pages).
Note that this might not be the ideal way to handle this, since it
may allow some leakage of undirtied zero bytes to the server's copy
in the case of a 3rd-party conflict.
To aid the last two fixes, two additional changes:
- Wrap the manipulations of the dirty region info stored in
page->private into helper functions.
- Alter the encoding of the dirty region so that the region bounds can
be stored with one fewer bit, making a bit available for the
indication of mappedness.
* tag 'afs-fixes-20201029' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix dirty-region encoding on ppc32 with 64K pages
afs: Fix afs_invalidatepage to adjust the dirty region
afs: Alter dirty range encoding in page->private
afs: Wrap page->private manipulations in inline functions
afs: Fix where page->private is set during write
afs: Fix page leak on afs_write_begin() failure
afs: Fix to take ref on page when PG_private is set
afs: Fix afs_launder_page to not clear PG_writeback
afs: Fix a use after free in afs_xattr_get_acl()
afs: Fix tracing deref-before-check
afs: Fix copy_file_range()
When SEV is enabled, the kernel requests the C-bit position again from
the hypervisor to build its own page-table. Since the hypervisor is an
untrusted source, the C-bit position needs to be verified before the
kernel page-table is used.
Call sev_verify_cbit() before writing the CR3.
[ bp: Massage. ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20201028164659.27002-5-joro@8bytes.org
Check whether the hypervisor reported the correct C-bit when running as
an SEV guest. Using a wrong C-bit position could be used to leak
sensitive data from the guest to the hypervisor.
The check function is in a separate file:
arch/x86/kernel/sev_verify_cbit.S
so that it can be re-used in the running kernel image.
[ bp: Massage. ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20201028164659.27002-4-joro@8bytes.org
Commit ed42989eab ("tipc: fix the skb_unshare() in tipc_buf_append()")
replaced skb_unshare() with skb_copy() to not reduce the data reference
counter of the original skb intentionally. This is not the correct
way to handle the cloned skb because it causes memory leak in 2
following cases:
1/ Sending multicast messages via broadcast link
The original skb list is cloned to the local skb list for local
destination. After that, the data reference counter of each skb
in the original list has the value of 2. This causes each skb not
to be freed after receiving ACK:
tipc_link_advance_transmq()
{
...
/* release skb */
__skb_unlink(skb, &l->transmq);
kfree_skb(skb); <-- memory exists after being freed
}
2/ Sending multicast messages via replicast link
Similar to the above case, each skb cannot be freed after purging
the skb list:
tipc_mcast_xmit()
{
...
__skb_queue_purge(pkts); <-- memory exists after being freed
}
This commit fixes this issue by using skb_unshare() instead. Besides,
to avoid use-after-free error reported by KASAN, the pointer to the
fragment is set to NULL before calling skb_unshare() to make sure that
the original skb is not freed after freeing the fragment 2 times in
case skb_unshare() returns NULL.
Fixes: ed42989eab ("tipc: fix the skb_unshare() in tipc_buf_append()")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Reported-by: Thang Hoang Ngo <thang.h.ngo@dektech.com.au>
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://lore.kernel.org/r/20201027032403.1823-1-tung.q.nguyen@dektech.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*_pdp_find() from gtp_encap_recv() would trigger a crash when a peer
sends GTP packets while creating new GTP device.
RIP: 0010:gtp1_pdp_find.isra.0+0x68/0x90 [gtp]
<SNIP>
Call Trace:
<IRQ>
gtp_encap_recv+0xc2/0x2e0 [gtp]
? gtp1_pdp_find.isra.0+0x90/0x90 [gtp]
udp_queue_rcv_one_skb+0x1fe/0x530
udp_queue_rcv_skb+0x40/0x1b0
udp_unicast_rcv_skb.isra.0+0x78/0x90
__udp4_lib_rcv+0x5af/0xc70
udp_rcv+0x1a/0x20
ip_protocol_deliver_rcu+0xc5/0x1b0
ip_local_deliver_finish+0x48/0x50
ip_local_deliver+0xe5/0xf0
? ip_protocol_deliver_rcu+0x1b0/0x1b0
gtp_encap_enable() should be called after gtp_hastable_new() otherwise
*_pdp_find() will access the uninitialized hash table.
Fixes: 1e3a3abd8b ("gtp: make GTP sockets in gtp_newlink optional")
Signed-off-by: Masahiro Fujiwara <fujiwara.masahiro@gmail.com>
Link: https://lore.kernel.org/r/20201027114846.3924-1-fujiwara.masahiro@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull ext4 fixes from Ted Ts'o:
"Bug fixes for the new ext4 fast commit feature, plus a fix for the
'data=journal' bug fix.
Also use the generic casefolding support which has now landed in
fs/libfs.c for 5.10"
* tag 'ext4_for_linus_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: indicate that fast_commit is available via /sys/fs/ext4/feature/...
ext4: use generic casefolding support
ext4: do not use extent after put_bh
ext4: use IS_ERR() for error checking of path
ext4: fix mmap write protection for data=journal mode
jbd2: fix a kernel-doc markup
ext4: use s_mount_flags instead of s_mount_state for fast commit state
ext4: make num of fast commit blocks configurable
ext4: properly check for dirty state in ext4_inode_datasync_dirty()
ext4: fix double locking in ext4_fc_commit_dentry_updates()
On r8a7791/koelsch and shmobile_defconfig, PCIe probing fails with:
rcar-pcie fe000000.pcie: Adjusted size 0x0 invalid
rcar-pcie: probe of fe000000.pcie failed with error -22
of_dma_get_range() returns the following map:
cpu_start 0x40000000 dma_start 0x40000000 size 0x080000000 offset 0
cpu_start 0x00000000 dma_start 0x00000000 size 0x100000000 offset 0
If CONFIG_ARM_LPAE=n, dma_addr_t is 32-bit. Hence when assigning
r->dma_start + r->size to dma_end, this value will be truncated to
32-bit, yielding zero when processing the second table entry.
Consequently, both dma_start and dma_end will be zero, leading to a zero
size.
Fix this by changing the dma_start and dma_end variables from dma_addr_t
to u64.
Fixes: e0d072782c ("dma-mapping: introduce DMA range map, supplanting dma_pfn_offset")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Make sure that we actually initialize xefi_discard when we're scheduling
a deferred free of an AGFL block. This was (eventually) found by the
UBSAN while I was banging on realtime rmap problems, but it exists in
the upstream codebase. While we're at it, rearrange the structure to
reduce the struct size from 64 to 56 bytes.
Fixes: fcb762f5de ("xfs: add bmapi nodiscard flag")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
sg_copy_buffer() returns a size_t with the number of bytes copied.
Return 0 instead of false if the copy is skipped.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use platform_get_resource() to fetch the memory resource and
platform_get_irq_optional() to get optional IRQ instead of
open-coded variants.
IRQ is not supposed to be changed at runtime, so there is
no functional change in ace_fsm_yieldirq().
On the other hand we now take first resources instead of last ones
to proceed. I can't imagine how broken should be firmware to have
a garbage in the first resource slots. But if it the case, it needs
to be documented.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fix a possible memory leak at xsk socket close that is caused by the
refcounting of the umem object being wrong. The reference count of the
umem was decremented only after the pool had been freed. Note that if
the buffer pool is destroyed, it is important that the umem is
destroyed after the pool, otherwise the umem would disappear while the
driver is still running. And as the buffer pool needs to be destroyed
in a work queue, the umem is also (if its refcount reaches zero)
destroyed after the buffer pool in that same work queue.
What was missing is that the refcount also needs to be decremented
when the pool is not freed and when the pool has not even been
created. The first case happens when the refcount of the pool is
higher than 1, i.e. it is still being used by some other socket using
the same device and queue id. In this case, it is safe to decrement
the refcount of the umem outside of the work queue as the umem will
never be freed because the refcount of the umem is always greater than
or equal to the refcount of the buffer pool. The second case is if the
buffer pool has not been created yet, i.e. the socket was closed
before it was bound but after the umem was created. In this case, it
is safe to destroy the umem outside of the work queue, since there is
no pool that can use it by definition.
Fixes: 1c1efc2af1 ("xsk: Create and free buffer pool independently from umem")
Reported-by: syzbot+eb71df123dc2be2c1456@syzkaller.appspotmail.com
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1603801921-2712-1-git-send-email-magnus.karlsson@gmail.com
The dirty region bounds stored in page->private on an afs page are 15 bits
on a 32-bit box and can, at most, represent a range of up to 32K within a
32K page with a resolution of 1 byte. This is a problem for powerpc32 with
64K pages enabled.
Further, transparent huge pages may get up to 2M, which will be a problem
for the afs filesystem on all 32-bit arches in the future.
Fix this by decreasing the resolution. For the moment, a 64K page will
have a resolution determined from PAGE_SIZE. In the future, the page will
need to be passed in to the helper functions so that the page size can be
assessed and the resolution determined dynamically.
Note that this might not be the ideal way to handle this, since it may
allow some leakage of undirtied zero bytes to the server's copy in the case
of a 3rd-party conflict. Fixing that would require a separately allocated
record and is a more complicated fix.
Fixes: 4343d00872 ("afs: Get rid of the afs_writeback record")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Fix afs_invalidatepage() to adjust the dirty region recorded in
page->private when truncating a page. If the dirty region is entirely
removed, then the private data is cleared and the page dirty state is
cleared.
Without this, if the page is truncated and then expanded again by truncate,
zeros from the expanded, but no-longer dirty region may get written back to
the server if the page gets laundered due to a conflicting 3rd-party write.
It mustn't, however, shorten the dirty region of the page if that page is
still mmapped and has been marked dirty by afs_page_mkwrite(), so a flag is
stored in page->private to record this.
Fixes: 4343d00872 ("afs: Get rid of the afs_writeback record")
Signed-off-by: David Howells <dhowells@redhat.com>
Currently, page->private on an afs page is used to store the range of
dirtied data within the page, where the range includes the lower bound, but
excludes the upper bound (e.g. 0-1 is a range covering a single byte).
This, however, requires a superfluous bit for the last-byte bound so that
on a 4KiB page, it can say 0-4096 to indicate the whole page, the idea
being that having both numbers the same would indicate an empty range.
This is unnecessary as the PG_private bit is clear if it's an empty range
(as is PG_dirty).
Alter the way the dirty range is encoded in page->private such that the
upper bound is reduced by 1 (e.g. 0-0 is then specified the same single
byte range mentioned above).
Applying this to both bounds frees up two bits, one of which can be used in
a future commit.
This allows the afs filesystem to be compiled on ppc32 with 64K pages;
without this, the following warnings are seen:
../fs/afs/internal.h: In function 'afs_page_dirty_to':
../fs/afs/internal.h:881:15: warning: right shift count >= width of type [-Wshift-count-overflow]
881 | return (priv >> __AFS_PAGE_PRIV_SHIFT) & __AFS_PAGE_PRIV_MASK;
| ^~
../fs/afs/internal.h: In function 'afs_page_dirty':
../fs/afs/internal.h:886:28: warning: left shift count >= width of type [-Wshift-count-overflow]
886 | return ((unsigned long)to << __AFS_PAGE_PRIV_SHIFT) | from;
| ^~
Fixes: 4343d00872 ("afs: Get rid of the afs_writeback record")
Signed-off-by: David Howells <dhowells@redhat.com>
The afs filesystem uses page->private to store the dirty range within a
page such that in the event of a conflicting 3rd-party write to the server,
we write back just the bits that got changed locally.
However, there are a couple of problems with this:
(1) I need a bit to note if the page might be mapped so that partial
invalidation doesn't shrink the range.
(2) There aren't necessarily sufficient bits to store the entire range of
data altered (say it's a 32-bit system with 64KiB pages or transparent
huge pages are in use).
So wrap the accesses in inline functions so that future commits can change
how this works.
Also move them out of the tracing header into the in-directory header.
There's not really any need for them to be in the tracing header.
Signed-off-by: David Howells <dhowells@redhat.com>
In afs, page->private is set to indicate the dirty region of a page. This
is done in afs_write_begin(), but that can't take account of whether the
copy into the page actually worked.
Fix this by moving the change of page->private into afs_write_end().
Fixes: 4343d00872 ("afs: Get rid of the afs_writeback record")
Signed-off-by: David Howells <dhowells@redhat.com>
Fix the leak of the target page in afs_write_begin() when it fails.
Fixes: 15b4650e55 ("afs: convert to new aops")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Nick Piggin <npiggin@gmail.com>
Fix afs to take a ref on a page when it sets PG_private on it and to drop
the ref when removing the flag.
Note that in afs_write_begin(), a lot of the time, PG_private is already
set on a page to which we're going to add some data. In such a case, we
leave the bit set and mustn't increment the page count.
As suggested by Matthew Wilcox, use attach/detach_page_private() where
possible.
Fixes: 31143d5d51 ("AFS: implement basic file write support")
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
When the zoned mode is enabled in null_blk, Serializing read, write
and zone management operations for each zone is necessary to protect
device level information for managing zone resources (zone open and
closed counters) as well as each zone condition and write pointer
position. Commit 35bc10b2ea ("null_blk: synchronization fix for
zoned device") introduced a spinlock to implement this serialization.
However, when memory backing is also enabled, GFP_NOIO memory
allocations are executed under the spinlock, resulting in might_sleep()
warnings. Furthermore, the zone_lock spinlock is locked/unlocked using
spin_lock_irq/spin_unlock_irq, similarly to the memory backing code with
the nullb->lock spinlock. This nested use of irq locks wrecks the irq
enabled/disabled state.
Fix all this by introducing a bitmap for per-zone lock, with locking
implemented using wait_on_bit_lock_io() and clear_and_wake_up_bit().
This locking mechanism allows keeping a zone locked while executing
null_process_cmd(), serializing all operations to the zone while
allowing to sleep during memory backing allocation with GFP_NOIO.
Device level zone resource management information is protected using
a spinlock which is not held while executing null_process_cmd();
Fixes: 35bc10b2ea ("null_blk: synchronization fix for zoned device")
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In the cae of the REQ_OP_ZONE_RESET_ALL operation, the command sector is
ignored and the operation is applied to all sequential zones. For these
commands, tracing the effect of the command using the command sector to
determine the target zone is thus incorrect.
Fix null_zone_mgmt() zone condition tracing in the case of
REQ_OP_ZONE_RESET_ALL to apply tracing to all sequential zones that are
not already empty.
Fixes: 766c3297d7 ("null_blk: add trace in null_blk_zoned.c")
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Mounted NBD device can be resized, one use case is rbd-nbd.
Fix the issue by setting up default block size, then not touch it
in nbd_size_update() any more. This kind of usage is aligned with loop
which has same use case too.
Cc: stable@vger.kernel.org
Fixes: c8a83a6b54 ("nbd: Use set_blocksize() to set device blocksize")
Reported-by: lining <lining2020x@163.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Jan Kara <jack@suse.cz>
Tested-by: lining <lining2020x@163.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Because sugov_update_next_freq() may skip a frequency update even if
the need_freq_update flag has been set for the policy at hand, policy
limits updates may not take effect as expected.
For example, if the intel_pstate driver operates in the passive mode
with HWP enabled, it needs to update the HWP min and max limits when
the policy min and max limits change, respectively, but that may not
happen if the target frequency does not change along with the limit
at hand. In particular, if the policy min is changed first, causing
the target frequency to be adjusted to it, and the policy max limit
is changed later to the same value, the HWP max limit will not be
updated to follow it as expected, because the target frequency is
still equal to the policy min limit and it will not change until
that limit is updated.
To address this issue, modify get_next_freq() to let the driver
callback run if the CPUFREQ_NEED_UPDATE_LIMITS cpufreq driver flag
is set regardless of whether or not the new frequency to set is
equal to the previous one.
Fixes: f6ebbcf08f ("cpufreq: intel_pstate: Implement passive mode with HWP enabled")
Reported-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: 1c534352f4 cpufreq: Introduce CPUFREQ_NEED_UPDATE_LIMITS ...
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: a62f68f5ca cpufreq: Introduce cpufreq_driver_test_flags()
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a helper function to test the flags of the cpufreq driver in use
againt a given flags mask.
In particular, this will be needed to test the
CPUFREQ_NEED_UPDATE_LIMITS cpufreq driver flag in the schedutil
governor.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The early #VC handler which doesn't have a GHCB can only handle CPUID
exit codes. It is needed by the early boot code to handle #VC exceptions
raised in verify_cpu() and to get the position of the C-bit.
But the CPUID information comes from the hypervisor which is untrusted
and might return results which trick the guest into the no-SEV boot path
with no C-bit set in the page-tables. All data written to memory would
then be unencrypted and could leak sensitive data to the hypervisor.
Add sanity checks to the early #VC handler to make sure the hypervisor
can not pretend that SEV is disabled.
[ bp: Massage a bit. ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20201028164659.27002-3-joro@8bytes.org
Disable MI2S bit clock from PAUSE/STOP/SUSPEND usecase instead of
shutdown time. Acheive this by invoking clk_disable API from
cpu daiops trigger instead of cpu daiops shutdown.
Change non-atomic API "clk_prepare_enable" to atomic API
"clk_enable" in trigger, as trigger is being called from atomic context.
Fixes: 7e6799d8f8 ("ASoC: qcom: lpass-cpu: Enable MI2S BCLK and LRCLK together")
Signed-off-by: V Sujith Kumar Reddy <vsujithk@codeaurora.org>
Signed-off-by: Srinivasa Rao Mandadapu <srivasam@codeaurora.org>
Link: https://lore.kernel.org/r/1603098363-9251-1-git-send-email-srivasam@codeaurora.org
Signed-off-by: Mark Brown <broonie@kernel.org>
The i2c-hid driver would quietly fail to probe the i2c-hid sensor-hub
with an ACPI device-id of SMO91D0 every other boot.
Specifically, the i2c_smbus_read_byte() "Make sure there is something at
this address" check would fail every other boot.
It seems that the BIOS does not properly reset/power-cycle the device
leaving it in a confused state where it refuses to respond to i2c-xfers.
On boots where probing the device failed, the driver-core puts the device
in D3 after the probe-failure, which causes the probe to succeed the next
boot.
Putting the device in D3 from the shutdown-handler fixes the sensors not
working every other boot.
This has been tested on both a Lenovo Miix 2-10 and a Dell Venue 8 Pro 5830
both of which use an i2c-hid sensor-hub with an ACPI id of SMO91D0.
Note that it is safe to call acpi_device_set_power() with a NULL pointer
as first argument, so on none ACPI enumerated devices this change is a
no-op.
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The code:
trb->length = cpu_to_le32(TRB_BURST_LEN(priv_ep->trb_burst_size)
| TRB_LEN(length));
TRB_BURST_LEN(priv_ep->trb_burst_size) may be overflow for int 32 if
priv_ep->trb_burst_size is equal or larger than 0x80;
Below is the Coverity warning:
sign_extension: Suspicious implicit sign extension: priv_ep->trb_burst_size
with type u8 (8 bits, unsigned) is promoted in priv_ep->trb_burst_size << 24
to type int (32 bits, signed), then sign-extended to type unsigned long
(64 bits, unsigned). If priv_ep->trb_burst_size << 24 is greater than 0x7FFFFFFF,
the upper bits of the result will all be 1.
To fix it, it needs to add an explicit cast to unsigned int type for ((p) << 24).
Reviewed-by: Jun Li <jun.li@nxp.com>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Most of the helpers to retrieve vc4 structures from the DRM base structures
rely on the fact that the first member of the vc4 structure is the DRM one
and just cast the pointers between them.
However, this is pretty fragile especially since there's no check to make
sure that the DRM structure is indeed at the offset 0 in the structure, so
let's use container_of to make it more robust.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201028123752.1733242-1-maxime@cerno.tech
Creating debugfs files while loding the spin_lock_irqsave(xhci->lock)
creates a lock dependecy that could possibly deadlock.
Lockdep warns:
=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
5.10.0-rc1pdx86+ #8 Not tainted
-----------------------------------------------------
systemd-udevd/386 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
ffffffffb1a94038 (pin_fs_lock){+.+.}-{2:2}, at: simple_pin_fs+0x22/0xa0
and this task is already holding:
ffff9e7b87fbc430 (&xhci->lock){-.-.}-{2:2}, at: xhci_alloc_streams+0x5f9/0x810
which would create a new lock dependency:
(&xhci->lock){-.-.}-{2:2} -> (pin_fs_lock){+.+.}-{2:2}
Create the files a bit later after lock is released.
Fixes: 673d746836 ("usb: xhci: add debugfs support for ep with stream")
CC: Li Jun <jun.li@nxp.com>
Reported-by: Hans de Goede <hdegoede@redhat.com>
Reported-by: Mike Galbraith <efault@gmx.de>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20201028203124.375344-4-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
chip->port_type and chip->pwr_opmode are enums and when GCC considers them
as unsigned, the conditions are never met.
This patch takes advantage of the ret variable and fixes the following
warnings:
drivers/usb/typec/stusb160x.c:548 stusb160x_get_fw_caps() warn: unsigned 'chip->port_type' is never less than zero.
drivers/usb/typec/stusb160x.c:570 stusb160x_get_fw_caps() warn: unsigned 'chip->pwr_opmode' is never less than zero.
Fixes: da0cb63100 ("usb: typec: add support for STUSB160x Type-C controller family")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Amelie Delaunay <amelie.delaunay@st.com>
Link: https://lore.kernel.org/r/20201028163309.12878-1-amelie.delaunay@st.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nesting container_of() causes warnings with W=2, which is
annoying if it happens in headers and fills the build log
like:
In file included from drivers/clk/qcom/clk-alpha-pll.c:6:
drivers/clk/qcom/clk-alpha-pll.c: In function 'clk_alpha_pll_hwfsm_enable':
include/linux/kernel.h:852:8: warning: declaration of '__mptr' shadows a previous local [-Wshadow]
852 | void *__mptr = (void *)(ptr); \
| ^~~~~~
drivers/clk/qcom/clk-alpha-pll.c:155:31: note: in expansion of macro 'container_of'
155 | #define to_clk_alpha_pll(_hw) container_of(to_clk_regmap(_hw), \
| ^~~~~~~~~~~~
drivers/clk/qcom/clk-regmap.h:27:28: note: in expansion of macro 'container_of'
27 | #define to_clk_regmap(_hw) container_of(_hw, struct clk_regmap, hw)
| ^~~~~~~~~~~~
drivers/clk/qcom/clk-alpha-pll.c:155:44: note: in expansion of macro 'to_clk_regmap'
155 | #define to_clk_alpha_pll(_hw) container_of(to_clk_regmap(_hw), \
| ^~~~~~~~~~~~~
drivers/clk/qcom/clk-alpha-pll.c:254:30: note: in expansion of macro 'to_clk_alpha_pll'
254 | struct clk_alpha_pll *pll = to_clk_alpha_pll(hw);
| ^~~~~~~~~~~~~~~~
include/linux/kernel.h:852:8: note: shadowed declaration is here
852 | void *__mptr = (void *)(ptr); \
| ^~~~~~
Redefine two copies of the to_clk_regmap() macro as inline functions
to avoid a lot of these.
Fixes: ea11dda9e0 ("clk: meson: add regmap clocks")
Fixes: 085d7a4554 ("clk: qcom: Add a regmap type clock struct")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20201026161411.3708639-1-arnd@kernel.org
Acked-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Kyungmin Park maintained and contributed to some of the upstreamed
S5Pv210 and Exynos4210 machines - as described in commit 10ffa96407
("MAINTAINERS: add maintainer of Samsung Mobile Machine support").
However the entry in maintainers got slightly twisted by
commit 004bbd3c01 ("MAINTAINERS: remove non existent files") -
the directory matching pattern was changed from specific machines to
the entire S5Pv210.
Anyway since long time, all S5Pv210 maintenance is covered by the
Samsung ARM architectures maintainer entry and Krzysztof Kozlowski, so
move Kyungmin Park to the CREDITS.
There was also no activity on LKML regarding other maintained drivers:
https://lore.kernel.org/lkml/?q=f%3A%22Kyungmin+Park%22
Dear Kyungmin Park, thank you for all the effort you put in to the
upstream Samsung support.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Cc: Sylwester Nawrocki <s.nawrocki@samsung.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Olof Johansson <olof@lixom.net>
Link: https://lore.kernel.org/r/20201016151528.7553-1-krzk@kernel.org
Coredump logics needs to report not only the registers of the dumping
thread, but (since 2.5.43) those of other threads getting killed.
Doing that might require extra state saved on the stack in asm glue at
kernel entry; signal delivery logics does that (we need to be able to
save sigcontext there, at the very least) and so does seccomp.
That covers all callers of do_coredump(). Secondary threads get hit with
SIGKILL and caught as soon as they reach exit_mm(), which normally happens
in signal delivery, so those are also fine most of the time. Unfortunately,
it is possible to end up with secondary zapped when it has already entered
exit(2) (or, worse yet, is oopsing). In those cases we reach exit_mm()
when mm->core_state is already set, but the stack contents is not what
we would have in signal delivery.
At least on two architectures (alpha and m68k) it leads to infoleaks - we
end up with a chunk of kernel stack written into coredump, with the contents
consisting of normal C stack frames of the call chain leading to exit_mm()
instead of the expected copy of userland registers. In case of alpha we
leak 312 bytes of stack. Other architectures (including the regset-using
ones) might have similar problems - the normal user of regsets is ptrace
and the state of tracee at the time of such calls is special in the same
way signal delivery is.
Note that had the zapper gotten to the exiting thread slightly later,
it wouldn't have been included into coredump anyway - we skip the threads
that have already cleared their ->mm. So let's pretend that zapper always
loses the race. IOW, have exit_mm() only insert into the dumper list if
we'd gotten there from handling a fatal signal[*]
As the result, the callers of do_exit() that have *not* gone through get_signal()
are not seen by coredump logics as secondary threads. Which excludes voluntary
exit()/oopsen/traps/etc. The dumper thread itself is unaffected by that,
so seccomp is fine.
[*] originally I intended to add a new flag in tsk->flags, but ebiederman pointed
out that PF_SIGNALED is already doing just what we need.
Cc: stable@vger.kernel.org
Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3")
History-tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull tracing fix from Steven Rostedt:
"Fix synthetic event "strcat" overrun
New synthetic event code used strcat() and miscalculated the ending,
causing the concatenation to write beyond the allocated memory.
Instead of using strncat(), the code is switched over to seq_buf which
has all the mechanisms in place to protect against writing more than
what is allocated, and cleans up the code a bit"
* tag 'trace-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing, synthetic events: Replace buggy strcat() with seq_buf operations
On exception entry, the kernel explicitly resets the PSTATE.TCO (tag
check override) so that any kernel memory accesses will be checked (the
bit is restored on exception return). This has the side-effect that the
uaccess routines will not honour the PSTATE.TCO that may have been set
by the user prior to a syscall.
There is no issue in practice since PSTATE.TCO is expected to be used
only for brief periods in specific routines (e.g. garbage collection).
To control the tag checking mode of the uaccess routines, the user will
have to invoke a corresponding prctl() call.
Document the kernel behaviour w.r.t. PSTATE.TCO accordingly.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: df9d7a22dd ("arm64: mte: Add Memory Tagging Extension documentation")
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Szabolcs Nagy <szabolcs.nagy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Mauro says:
This series contain the patches from a previous series I sent:
[PATCH v2 00/24] Documentation build fixes against next-20201013
Plus other patches I sent later, against other versions of linux-next between
20201013 and v5.10-rc1.
It fixes most of the remaining documentation build warnings.
There were some changes from v2, as I changed some patches due to the
feedback received, and added reviewed-by/acked-by to several of them.
After this series, there will be just 3 warnings at include/kunit/test.h, whose
fixes were already applied by Shuah via her tree at linux-next. Hopefully, she
will be sending it upstream anytime toon. So, I dropped the fix from my trees.
The vast majority of patches here are also on my linux-next tree, as my
original plan were to send them upstream by the end of the merge window.
I'll drop from it once they get merged.
As those patches are fixes, I guess it should be ok to get them merged for
-rc2 or -rc3.
[jc: removed DRM and JBD patches applied elsewhere]
Commit afb585a97f "ext4: data=journal: write-protect pages on
j_submit_inode_data_buffers()") added calls ext4_jbd2_inode_add_write()
to track inode ranges whose mappings need to get write-protected during
transaction commits. However the added calls use wrong start of a range
(0 instead of page offset) and so write protection is not necessarily
effective. Use correct range start to fix the problem.
Fixes: afb585a97f ("ext4: data=journal: write-protect pages on j_submit_inode_data_buffers()")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201027132751.29858-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
As reported by kernel-doc:
./drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c:1: warning: no structured comments found
./drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1: warning: no structured comments found
Those files only contain
/**
* DOC:
*/
markups, but they're included twice there: one to parse
such markup, and another one to parse internal functions.
In the case of amdgpu_xgmi.c, as it has just one such
markup, we can simply include the file once, and let it
parse the entire file without passing arguments to kernel-doc.
This should place everything altogether.
For amdgpu_ras.c, however, we need to remove the kernel-doc
with just internal. This should be re-introduced if this
file ever gets new non-DOC markups.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/bd070923591ae54f9587e7407b6291ac116952b2.1603791716.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
ext4_inode_datasync_dirty() needs to return 'true' if the inode is
dirty, 'false' otherwise, but the logic seems to be incorrectly changed
by commit aa75f4d3da ("ext4: main fast-commit commit path").
This introduces a problem with swap files that are always failing to be
activated, showing this error in dmesg:
[ 34.406479] swapon: file is not committed
Simple test case to reproduce the problem:
# fallocate -l 8G swapfile
# chmod 0600 swapfile
# mkswap swapfile
# swapon swapfile
Fix the logic to return the proper state of the inode.
Link: https://lore.kernel.org/lkml/20201024131333.GA32124@xps-13-7390
Fixes: 8016e29f43 ("ext4: fast commit recovery path")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201027044915.2553163-1-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Running "make htmldocs: produce lots of warnings on those files:
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'p_size' description in 'amdgpu_vram_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:211: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_fini'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'p_size' description in 'amdgpu_vram_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:211: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_fini'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'p_size' description in 'amdgpu_vram_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:211: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_fini'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'p_size' description in 'amdgpu_vram_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:211: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_fini'
./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:90: warning: Excess function parameter 'man' description in 'amdgpu_gtt_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:90: warning: Excess function parameter 'p_size' description in 'amdgpu_gtt_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:134: warning: Excess function parameter 'man' description in 'amdgpu_gtt_mgr_fini'
./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:90: warning: Excess function parameter 'man' description in 'amdgpu_gtt_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:90: warning: Excess function parameter 'p_size' description in 'amdgpu_gtt_mgr_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:134: warning: Excess function parameter 'man' description in 'amdgpu_gtt_mgr_fini'
./drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:675: warning: Excess function parameter 'dev' description in 'amdgpu_device_asic_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:675: warning: Excess function parameter 'dev' description in 'amdgpu_device_asic_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:675: warning: Excess function parameter 'dev' description in 'amdgpu_device_asic_init'
./drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:675: warning: Excess function parameter 'dev' description in 'amdgpu_device_asic_init'
They're related to the repacement of some parameters by adev,
and due to a few renamed parameters.
While here, uniform the name of the parameter for it to be
the same on all functions using a pointer to struct amdgpu_device.
Update the kernel-doc documentation accordingly.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/5755c2b361890b8ae5cea0f61dfd70b1c135eefe.1603791716.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The direct-io.c file used to have just two exported symbols:
- dio_end_io()
- __blockdev_direct_IO()
The first one was removed by changeset
c33fe275b5 ("fs: remove no longer used dio_end_io()")
And the last one is used on most places indirectly, via
the inline macro blockdev_direct_IO() provided by fs.h.
Yet, neither the macro or the function have kernel-doc
markups.
So, drop the inclusion of fs/direct-io.c at the docs.
Fixes: c33fe275b5 ("fs: remove no longer used dio_end_io()")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/d0a9fffedca102633c168adaf157f34288a4ea67.1603791716.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Changeset a435b9a143 ("locking/refcount: Provide __refcount API to obtain the old value")
added a set of functions starting with __ that have a new
parameter, adding a series of new warnings:
$ ./scripts/kernel-doc -none include/linux/refcount.h
include/linux/refcount.h:169: warning: Function parameter or member 'oldp' not described in '__refcount_add_not_zero'
include/linux/refcount.h:208: warning: Function parameter or member 'oldp' not described in '__refcount_add'
include/linux/refcount.h:239: warning: Function parameter or member 'oldp' not described in '__refcount_inc_not_zero'
include/linux/refcount.h:261: warning: Function parameter or member 'oldp' not described in '__refcount_inc'
include/linux/refcount.h:291: warning: Function parameter or member 'oldp' not described in '__refcount_sub_and_test'
include/linux/refcount.h:327: warning: Function parameter or member 'oldp' not described in '__refcount_dec_and_test'
include/linux/refcount.h:347: warning: Function parameter or member 'oldp' not described in '__refcount_dec'
The issue is that the kernel-doc markups are now misplaced,
as they should be added just before the functions.
So, move the kernel-doc markups to the proper places,
in order to drop the warnings.
It should be noticed that git show produces a crappy output,
for this patch without "--patience" flag.
Fixes: a435b9a143 ("locking/refcount: Provide __refcount API to obtain the old value")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/7985c31d1ace591bc5e1faa05c367f1295b78afd.1603791716.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
There are several warnings caused by a recent change
224ec489d3 ("lockdep/Documention: Recursive read lock detection reasoning")
Those are reported by htmldocs build:
Documentation/locking/lockdep-design.rst:429: WARNING: Definition list ends without a blank line; unexpected unindent.
Documentation/locking/lockdep-design.rst:452: WARNING: Block quote ends without a blank line; unexpected unindent.
Documentation/locking/lockdep-design.rst:453: WARNING: Unexpected indentation.
Documentation/locking/lockdep-design.rst:453: WARNING: Blank line required after table.
Documentation/locking/lockdep-design.rst:454: WARNING: Block quote ends without a blank line; unexpected unindent.
Documentation/locking/lockdep-design.rst:455: WARNING: Unexpected indentation.
Documentation/locking/lockdep-design.rst:455: WARNING: Blank line required after table.
Documentation/locking/lockdep-design.rst:456: WARNING: Block quote ends without a blank line; unexpected unindent.
Documentation/locking/lockdep-design.rst:457: WARNING: Unexpected indentation.
Documentation/locking/lockdep-design.rst:457: WARNING: Blank line required after table.
Besides the reported issues, there are some missing blank
lines that ended producing wrong html output, and some
literals are not properly identified.
Also, the symbols used at the irq enabled/disable table
are not displayed as expected, as they're not literals.
Also, on another table they're using a different notation.
Fixes: 224ec489d3 ("lockdep/Documention: Recursive read lock detection reasoning")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/3b9431ac5c01e38111cd59928a93e7259ab7db0f.1603791716.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Changeset 410d06879c ("ice: add the DDP Track ID to devlink info")
added description for a new devlink field, but forgot to add
one of its columns, causing it to break:
.../Documentation/networking/devlink/ice.rst:15: WARNING: Error parsing content block for the "list-table" directive: uniform two-level bullet list expected, but row 11 does not contain the same number of items as row 1 (3 vs 4).
.. list-table:: devlink info versions implemented
:widths: 5 5 5 90
...
* - ``fw.app.bundle_id``
- 0xc0000001
- Unique identifier for the DDP package loaded in the device. Also
referred to as the DDP Track ID. Can be used to uniquely identify
the specific DDP package.
Add the type field to the ``fw.app.bundle_id`` row.
Fixes: 410d06879c ("ice: add the DDP Track ID to devlink info")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/84ae28bda1987284033966b7b56a4b27ae40713b.1603791716.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Sphinx 3 now checks for duplicated function declarations:
.../Documentation/networking/kapi:143: ../include/linux/phy.h:163: WARNING: Duplicate C declaration, also defined in 'networking/kapi'.
Declaration is 'unsigned int phy_supported_speeds (struct phy_device *phy, unsigned int *speeds, unsigned int size)'.
.../Documentation/networking/kapi:143: ../include/linux/phy.h:1034: WARNING: Duplicate C declaration, also defined in 'networking/kapi'.
Declaration is 'int phy_read_mmd (struct phy_device *phydev, int devad, u32 regnum)'.
.../Documentation/networking/kapi:143: ../include/linux/phy.h:1076: WARNING: Duplicate C declaration, also defined in 'networking/kapi'.
Declaration is 'int __phy_read_mmd (struct phy_device *phydev, int devad, u32 regnum)'.
.../Documentation/networking/kapi:143: ../include/linux/phy.h:1088: WARNING: Duplicate C declaration, also defined in 'networking/kapi'.
Declaration is 'int phy_write_mmd (struct phy_device *phydev, int devad, u32 regnum, u16 val)'.
.../Documentation/networking/kapi:143: ../include/linux/phy.h:1100: WARNING: Duplicate C declaration, also defined in 'networking/kapi'.
Declaration is 'int __phy_write_mmd (struct phy_device *phydev, int devad, u32 regnum, u16 val)'.
It turns that both the C and the H files have the same
kernel-doc markup for the same functions. Let's drop the
at the header file, keeping the one closer to the code.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/75e9a357f9a716833d2094b04898754876365e68.1603791716.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
.../Documentation/hwmon/mp2975.rst:25: WARNING: Unexpected indentation.
.../Documentation/hwmon/mp2975.rst:27: WARNING: Block quote ends without a blank line; unexpected unindent.
.../Documentation/hwmon/mp2975.rst:69: WARNING: Unexpected indentation.
.../Documentation/hwmon/mp2975.rst:70: WARNING: Block quote ends without a blank line; unexpected unindent.
.../Documentation/hwmon/mp2975.rst:72: WARNING: Bullet list ends without a blank line; unexpected unindent.
.../Documentation/hwmon/mp2975.rst: WARNING: document isn't included in any toctree
List blocks should have blank lines before and after them,
in order to be properly parsed.
Fixes: 4beb7a028e9f ("hwmon: (pmbus) Add support for MPS Multi-phase mp2975 controller")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/b02f98d886ab1f5af233f8999c7a15529fc52cdc.1603791716.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
include/linux/ethtool.h is included twice with kernel-doc,
both to document ethtool_pause_stats(). The first one is
at statistics.rst, and the second one at ethtool-netlink.rst.
Replace one of the references to use the name of the
function. The automarkup.py extension should create the
cross-references.
Solves this warning:
../Documentation/networking/ethtool-netlink.rst: WARNING: Duplicate C declaration, also defined in 'networking/statistics'.
Declaration is 'ethtool_pause_stats'.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Link: https://lore.kernel.org/r/fdbf853bbdaf3bc1d38f32744b739d175c5c31f5.1603791716.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Sphinx C domain code after 3.2.1 will start complaning if :c:struct
would be used for an union type:
.../Documentation/gpu/drm-kms-helpers:352: ../drivers/video/hdmi.c:851: WARNING: C 'identifier' cross-reference uses wrong tag: reference name is 'union hdmi_infoframe' but found name is 'struct hdmi_infoframe'. Full reference name is 'union hdmi_infoframe'. Full found name is 'struct hdmi_infoframe'.
So, let's address this issue too in advance, in order to
avoid future issues.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/6e4ec3eec914df62389a299797a3880ae4490f35.1603791716.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The include/linux/genalloc.h file defined this typedef:
typedef unsigned long (*genpool_algo_t)(unsigned long *map,unsigned long size,unsigned long start,unsigned int nr,void *data, struct gen_pool *pool, unsigned long start_addr);
Because it has a type composite of two words (unsigned long),
the parser gets the typedef name wrong:
.. c:macro:: long
**Typedef**: Allocation callback function type definition
Fix the regex in order to accept composite types when
defining a typedef for a function pointer.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/328e8018041cc44f7a1684e57f8d111230761c4f.1603792384.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
If a flash chip has more than 16MB capacity but its BFPT reports
BFPT_DWORD1_ADDRESS_BYTES_3_OR_4, the spi-nor framework defaults to 3.
The check in spi_nor_set_addr_width() doesn't catch it because addr_width
did get set. This fixes that check.
Fixes: f9acd7fa80 ("mtd: spi-nor: sfdp: default to addr_width of 3 for configurable widths")
Signed-off-by: Bert Vermeulen <bert@biot.com>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Reviewed-by: Pratyush Yadav <p.yadav@ti.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Joel Stanley <joel@jms.id.au>
Tested-by: Cédric Le Goater <clg@kaod.org>
Link: https://lore.kernel.org/r/20201006132346.12652-1-bert@biot.com
The 7211a0 has a tca_drv_sel bit in the USB SETUP register that
should never be enabled. This feature is only used if there is a
USB Type-C PHY, and the 7211 does not have one. If the bit is
enabled, the VBUS signal will never be asserted. In the 7211a0,
the bit was incorrectly defaulted to on so the driver had to clear
the bit. In the 7211c0 the state was inverted so the driver should
no longer clear the bit. This hasn't been a problem because all
current 7211 boards don't use the VBUS signal, but there are some
future customer boards that may use it.
Signed-off-by: Al Cooper <alcooperx@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20201002190115.48017-1-alcooperx@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
STM32 DT fixes for v5.10, round 1
Highlights:
-----------
-On STM32MP157 DK & ED boards: Add Vin supply description to avoid
random kernel crash due to vref_ddr regulator issue.
* tag 'stm32-dt-for-v5.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32:
ARM: dts: stm32: Describe Vin power supply on stm32mp157c-edx board
ARM: dts: stm32: Describe Vin power supply on stm32mp15xx-dkx board
Link: https://lore.kernel.org/r/4ac236b3-b980-f653-f644-53e586570724@st.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
If should_futex_fail() returns true in futex_wake_pi(), then the 'ret'
variable is set to -EFAULT and then immediately overwritten. So the failure
injection is non-functional.
Fix it by actually leaving the function and returning -EFAULT.
The Fixes tag is kinda blury because the initial commit which introduced
failure injection was already sloppy, but the below mentioned commit broke
it completely.
[ tglx: Massaged changelog ]
Fixes: 6b4f4bc9cb ("locking/futex: Allow low-level atomic operations to return -EAGAIN")
Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200927000858.24219-1-mateusznosek0@gmail.com
Geert reports that commit be2881824a ("arm64/build: Assert for
unwanted sections") results in build errors on arm64 for configurations
that have CONFIG_MODULES disabled.
The commit in question added ASSERT()s to the arm64 linker script to
ensure that linker generated sections such as .got.plt etc are empty,
but as it turns out, there are corner cases where the linker does emit
content into those sections. More specifically, weak references to
function symbols (which can remain unsatisfied, and can therefore not
be emitted as relative references) will be emitted as GOT and PLT
entries when linking the kernel in PIE mode (which is the case when
CONFIG_RELOCATABLE is enabled, which is on by default).
What happens is that code such as
struct device *(*fn)(struct device *dev);
struct device *iommu_device;
fn = symbol_get(mdev_get_iommu_device);
if (fn) {
iommu_device = fn(dev);
essentially gets converted into the following when CONFIG_MODULES is off:
struct device *iommu_device;
if (&mdev_get_iommu_device) {
iommu_device = mdev_get_iommu_device(dev);
where mdev_get_iommu_device is emitted as a weak symbol reference into
the object file. The first reference is decorated with an ordinary
ABS64 data relocation (which yields 0x0 if the reference remains
unsatisfied). However, the indirect call is turned into a direct call
covered by a R_AARCH64_CALL26 relocation, which is converted into a
call via a PLT entry taking the target address from the associated
GOT entry.
Given that such GOT and PLT entries are unnecessary for fully linked
binaries such as the kernel, let's give these weak symbol references
hidden visibility, so that the linker knows that the weak reference
via R_AARCH64_CALL26 can simply remain unsatisfied.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Fangrui Song <maskray@google.com>
Acked-by: Jessica Yu <jeyu@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20201027151132.14066-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Add description for Vin power supply and for peripherals that
are supplied by Vin.
Signed-off-by: Pascal Paillet <p.paillet@st.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
Add description for Vin power supply and for peripherals that
are supplied by Vin.
Signed-off-by: Pascal Paillet <p.paillet@st.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
Commit 76085aff29 ("efi/libstub/arm64: align PE/COFF sections to segment
alignment") increased the PE/COFF section alignment to match the minimum
segment alignment of the kernel image, which ensures that the kernel does
not need to be moved around in memory by the EFI stub if it was built as
relocatable.
However, the first PE/COFF section starts at _stext, which is only 4 KB
aligned, and so the section layout is inconsistent. Existing EFI loaders
seem to care little about this, but it is better to clean this up.
So let's pad the header to 64 KB to match the PE/COFF section alignment.
Fixes: 76085aff29 ("efi/libstub/arm64: align PE/COFF sections to segment alignment")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20201027073209.2897-2-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
ata_qc_complete_multiple() has to be called with the tags physically
active, that is the hw tag is at bit 0. ap->qc_active has the same tag
at bit ATA_TAG_INTERNAL instead, so call ata_qc_get_active() to fix that
up. This is done in the vein of 8385d756e1 ("libata: Fix retrieving of
active qcs").
Fixes: 28361c4036 ("libata: add extra internal command")
Tested-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When the bio's size reaches max_append_sectors, bio_add_hw_page returns
0 then __bio_iov_append_get_pages returns -EINVAL. This is an expected
result of building a small enough bio not to be split in the IO path.
However, iov_iter is not advanced in this case, causing the same pages
are filled for the bio again and again.
Fix the case by properly advancing the iov_iter for already processed
pages.
Fixes: 0512a75b98 ("block: Introduce REQ_OP_ZONE_APPEND")
Cc: stable@vger.kernel.org # 5.8+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that we started making the linker warn about orphan sections
(input sections that are not explicitly consumed by an output section),
some configurations produce the following warning:
aarch64-linux-gnu-ld: warning: orphan section `.igot.plt' from
`arch/arm64/kernel/head.o' being placed in section `.igot.plt'
It could be any file that triggers this - head.o is simply the first
input file in the link - and the resulting .igot.plt section never
actually appears in vmlinux as it turns out to be empty.
So let's add .igot.plt to our collection of input sections to disregard
unless they are empty.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20201028133332.5571-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
The icache_policy_str[] definition causes a warning when extra
warning flags are enabled:
arch/arm64/kernel/cpuinfo.c:38:26: warning: initialized field overwritten [-Woverride-init]
38 | [ICACHE_POLICY_VIPT] = "VIPT",
| ^~~~~~
arch/arm64/kernel/cpuinfo.c:38:26: note: (near initialization for 'icache_policy_str[2]')
arch/arm64/kernel/cpuinfo.c:39:26: warning: initialized field overwritten [-Woverride-init]
39 | [ICACHE_POLICY_PIPT] = "PIPT",
| ^~~~~~
arch/arm64/kernel/cpuinfo.c:39:26: note: (near initialization for 'icache_policy_str[3]')
arch/arm64/kernel/cpuinfo.c:40:27: warning: initialized field overwritten [-Woverride-init]
40 | [ICACHE_POLICY_VPIPT] = "VPIPT",
| ^~~~~~~
arch/arm64/kernel/cpuinfo.c:40:27: note: (near initialization for 'icache_policy_str[0]')
There is no real need for the default initializer here, as printing a
NULL string is harmless. Rewrite the logic to have an explicit
reserved value for the only one that uses the default value.
This partially reverts the commit that removed ICACHE_POLICY_AIVIVT.
Fixes: 155433cb36 ("arm64: cache: Remove support for ASID-tagged VIVT I-caches")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20201026193807.3816388-1-arnd@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Commit 78a5b53e9f ("Input: soc_button_array - work around DSDTs which
modify the irqflags") adds a workaround for DSDTs with a _LID method
which play tricks with the irqflags, assuming that the OS is using
an irq-type of IRQ_TYPE_LEVEL_LOW.
Now that this workaround is in place, we no longer need to disable the
lid functionality on the Acer SW5-012.
Fixes: 78a5b53e9f ("Input: soc_button_array - work around DSDTs which modify the irqflags")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In commit 5ba1278787, we shuffled with the check of 'perm'. But my
brain somehow inverted the condition in 'do_unimap_ioctl' (I thought
it is ||, not &&), so GIO_UNIMAP stopped working completely.
Move the 'perm' checks back to do_unimap_ioctl and do them right again.
In fact, this reverts this part of code to the pre-5ba127878722 state.
Except 'perm' is now a bool.
Fixes: 5ba1278787 ("vt_ioctl: move perm checks level up")
Cc: stable@vger.kernel.org
Reported-by: Fabian Vogt <fvogt@suse.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Link: https://lore.kernel.org/r/20201026055419.30518-1-jslaby@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Both read-side users of func_table/func_buf need locking. Without that,
one can easily confuse the code by repeatedly setting altering strings
like:
while (1)
for (a = 0; a < 2; a++) {
struct kbsentry kbs = {};
strcpy((char *)kbs.kb_string, a ? ".\n" : "88888\n");
ioctl(fd, KDSKBSENT, &kbs);
}
When that program runs, one can get unexpected output by holding F1
(note the unxpected period on the last line):
.
88888
.8888
So protect all accesses to 'func_table' (and func_buf) by preexisting
'func_buf_lock'.
It is easy in 'k_fn' handler as 'puts_queue' is expected not to sleep.
On the other hand, KDGKBSENT needs a local (atomic) copy of the string
because copy_to_user can sleep. Use already allocated, but unused
'kbs->kb_string' for that purpose.
Note that the program above needs at least CAP_SYS_TTY_CONFIG.
This depends on the previous patch and on the func_buf_lock lock added
in commit 46ca3f735f (tty/vt: fix write/write race in ioctl(KDSKBSENT)
handler) in 5.2.
Likely fixes CVE-2020-25656.
Cc: <stable@vger.kernel.org>
Reported-by: Minh Yuan <yuanmingbuaa@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Link: https://lore.kernel.org/r/20201019085517.10176-2-jslaby@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use 'strlen' of the string, add one for NUL terminator and simply do
'copy_to_user' instead of the explicit 'for' loop. This makes the
KDGKBSENT case more compact.
The only thing we need to take care about is NULL 'func_table[i]'. Use
an empty string in that case.
The original check for overflow could never trigger as the func_buf
strings are always shorter or equal to 'struct kbsentry's.
Cc: <stable@vger.kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Link: https://lore.kernel.org/r/20201019085517.10176-1-jslaby@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Prior to the commit that this one fixes, the FIFO size was derived from
the read-only register LPUARTx_FIFO[TXFIFOSIZE] using the following
formula:
TX FIFO size = 2 ^ (LPUARTx_FIFO[TXFIFOSIZE] - 1)
The documentation for LS1021A is a mess. Under chapter 26.1.3 LS1021A
LPUART module special consideration, it mentions TXFIFO_SZ and RXFIFO_SZ
being equal to 4, and in the register description for LPUARTx_FIFO, it
shows the out-of-reset value of TXFIFOSIZE and RXFIFOSIZE fields as "011",
even though these registers read as "101" in reality.
And when LPUART on LS1021A was working, the "101" value did correspond
to "16 datawords", by applying the formula above, even though the
documentation is wrong again (!!!!) and says that "101" means 64 datawords
(hint: it doesn't).
So the "new" formula created by commit f77ebb241c has all the premises
of being wrong for LS1021A, because it relied only on false data and no
actual experimentation.
Interestingly, in commit c2f448cff2 ("tty: serial: fsl_lpuart: add
LS1028A support"), Michael Walle applied a workaround to this by manually
setting the FIFO widths for LS1028A. It looks like the same values are
used by LS1021A as well, in fact.
When the driver thinks that it has a deeper FIFO than it really has,
getty (user space) output gets truncated.
Many thanks to Michael for pointing out where to look.
Fixes: f77ebb241c ("tty: serial: fsl_lpuart: correct the FIFO depth size")
Suggested-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20201023013429.3551026-1-vladimir.oltean@nxp.com
Reviewed-by:Fugang Duan <fugang.duan@nxp.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 293f899594 ("tty: serial: 21285: stop using the unused[]
variable from struct uart_port") introduced a bug which stops the
transmit interrupt being disabled when there are no characters to
transmit - disabling the transmit interrupt at the interrupt controller
is the only way to stop an interrupt storm. If this interrupt is not
disabled when there are no transmit characters, we end up with an
interrupt storm which prevents the machine making forward progress.
Fixes: 293f899594 ("tty: serial: 21285: stop using the unused[] variable from struct uart_port")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/E1kU4GS-0006lE-OO@rmk-PC.armlinux.org.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Current tcpm_detach() only reset hard_reset_count if port->attached
is true, this may cause this counter clear is missed if the CC
disconnect event is generated after tcpm_port_reset() is done
by other events, e.g. VBUS off comes first before CC disconect for
a power sink, in that case the first tcpm_detach() will only clear
port->attached flag but leave hard_reset_count there because
tcpm_port_is_disconnected() is still false, then later tcpm_detach()
by CC disconnect will directly return due to port->attached is cleared,
finally this will result tcpm will not try hard reset or error recovery
for later attach.
ChiYuan reported this issue on his platform with below tcpm trace:
After power sink session setup after hard reset 2 times, detach
from the power source and then attach:
[ 4848.046358] VBUS off
[ 4848.046384] state change SNK_READY -> SNK_UNATTACHED
[ 4848.050908] Setting voltage/current limit 0 mV 0 mA
[ 4848.050936] polarity 0
[ 4848.052593] Requesting mux state 0, usb-role 0, orientation 0
[ 4848.053222] Start toggling
[ 4848.086500] state change SNK_UNATTACHED -> TOGGLING
[ 4848.089983] CC1: 0 -> 0, CC2: 3 -> 3 [state TOGGLING, polarity 0, connected]
[ 4848.089993] state change TOGGLING -> SNK_ATTACH_WAIT
[ 4848.090031] pending state change SNK_ATTACH_WAIT -> SNK_DEBOUNCED @200 ms
[ 4848.141162] CC1: 0 -> 0, CC2: 3 -> 0 [state SNK_ATTACH_WAIT, polarity 0, disconnected]
[ 4848.141170] state change SNK_ATTACH_WAIT -> SNK_ATTACH_WAIT
[ 4848.141184] pending state change SNK_ATTACH_WAIT -> SNK_UNATTACHED @20 ms
[ 4848.163156] state change SNK_ATTACH_WAIT -> SNK_UNATTACHED [delayed 20 ms]
[ 4848.163162] Start toggling
[ 4848.216918] CC1: 0 -> 0, CC2: 0 -> 3 [state TOGGLING, polarity 0, connected]
[ 4848.216954] state change TOGGLING -> SNK_ATTACH_WAIT
[ 4848.217080] pending state change SNK_ATTACH_WAIT -> SNK_DEBOUNCED @200 ms
[ 4848.231771] CC1: 0 -> 0, CC2: 3 -> 0 [state SNK_ATTACH_WAIT, polarity 0, disconnected]
[ 4848.231800] state change SNK_ATTACH_WAIT -> SNK_ATTACH_WAIT
[ 4848.231857] pending state change SNK_ATTACH_WAIT -> SNK_UNATTACHED @20 ms
[ 4848.256022] state change SNK_ATTACH_WAIT -> SNK_UNATTACHED [delayed20 ms]
[ 4848.256049] Start toggling
[ 4848.871148] VBUS on
[ 4848.885324] CC1: 0 -> 0, CC2: 0 -> 3 [state TOGGLING, polarity 0, connected]
[ 4848.885372] state change TOGGLING -> SNK_ATTACH_WAIT
[ 4848.885548] pending state change SNK_ATTACH_WAIT -> SNK_DEBOUNCED @200 ms
[ 4849.088240] state change SNK_ATTACH_WAIT -> SNK_DEBOUNCED [delayed200 ms]
[ 4849.088284] state change SNK_DEBOUNCED -> SNK_ATTACHED
[ 4849.088291] polarity 1
[ 4849.088769] Requesting mux state 1, usb-role 2, orientation 2
[ 4849.088895] state change SNK_ATTACHED -> SNK_STARTUP
[ 4849.088907] state change SNK_STARTUP -> SNK_DISCOVERY
[ 4849.088915] Setting voltage/current limit 5000 mV 0 mA
[ 4849.088927] vbus=0 charge:=1
[ 4849.090505] state change SNK_DISCOVERY -> SNK_WAIT_CAPABILITIES
[ 4849.090828] pending state change SNK_WAIT_CAPABILITIES -> SNK_READY @240 ms
[ 4849.335878] state change SNK_WAIT_CAPABILITIES -> SNK_READY [delayed240 ms]
this patch fix this issue by clear hard_reset_count at any cases
of cc disconnect, í.e. don't check port->attached flag.
Fixes: 4b4e02c831 ("typec: tcpm: Move out of staging")
Cc: stable@vger.kernel.org
Reported-and-tested-by: ChiYuan Huang <cy_huang@richtek.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Li Jun <jun.li@nxp.com>
Link: https://lore.kernel.org/r/1602500592-3817-1-git-send-email-jun.li@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit a4e7279cd1 ("cdc-acm: introduce a cool down") is causing
regression if there is some USB error, such as -EPROTO.
This has been reported on some samples of the Odroid-N2 using the Combee II
Zibgee USB dongle.
> struct acm *acm = container_of(work, struct acm, work)
is incorrect in case of a delayed work and causes warnings, usually from
the workqueue:
> WARNING: CPU: 0 PID: 0 at kernel/workqueue.c:1474 __queue_work+0x480/0x528.
When this happens, USB eventually stops working completely after a while.
Also the ACM_ERROR_DELAY bit is never set, so the cooldown mechanism
previously introduced cannot be triggered and acm_submit_read_urb() is
never called.
This changes makes the cdc-acm driver use a single delayed work, fixing the
pointer arithmetic in acm_softint() and set the ACM_ERROR_DELAY when the
cooldown mechanism appear to be needed.
Fixes: a4e7279cd1 ("cdc-acm: introduce a cool down")
Cc: Oliver Neukum <oneukum@suse.com>
Reported-by: Pascal Vizeli <pascal.vizeli@nabucasa.com>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20201019170702.150534-1-jbrunet@baylibre.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are two flows for handling RDMA_CM_EVENT_ROUTE_RESOLVED, either the
handler triggers a completion and another thread does rdma_connect() or
the handler directly calls rdma_connect().
In all cases rdma_connect() needs to hold the handler_mutex, but when
handler's are invoked this is already held by the core code. This causes
ULPs using the 2nd method to deadlock.
Provide a rdma_connect_locked() and have all ULPs call it from their
handlers.
Link: https://lore.kernel.org/r/0-v2-53c22d5c1405+33-rdma_connect_locking_jgg@nvidia.com
Reported-and-tested-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Fixes: 2a7cec5381 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state")
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
According to the SMCCC spec[1](7.5.2 Discovery) the
ARM_SMCCC_ARCH_WORKAROUND_1 function id only returns 0, 1, and
SMCCC_RET_NOT_SUPPORTED.
0 is "workaround required and safe to call this function"
1 is "workaround not required but safe to call this function"
SMCCC_RET_NOT_SUPPORTED is "might be vulnerable or might not be, who knows, I give up!"
SMCCC_RET_NOT_SUPPORTED might as well mean "workaround required, except
calling this function may not work because it isn't implemented in some
cases". Wonderful. We map this SMC call to
0 is SPECTRE_MITIGATED
1 is SPECTRE_UNAFFECTED
SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE
For KVM hypercalls (hvc), we've implemented this function id to return
SMCCC_RET_NOT_SUPPORTED, 0, and SMCCC_RET_NOT_REQUIRED. One of those
isn't supposed to be there. Per the code we call
arm64_get_spectre_v2_state() to figure out what to return for this
feature discovery call.
0 is SPECTRE_MITIGATED
SMCCC_RET_NOT_REQUIRED is SPECTRE_UNAFFECTED
SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE
Let's clean this up so that KVM tells the guest this mapping:
0 is SPECTRE_MITIGATED
1 is SPECTRE_UNAFFECTED
SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE
Note: SMCCC_RET_NOT_AFFECTED is 1 but isn't part of the SMCCC spec
Fixes: c118bbb527 ("arm64: KVM: Propagate full Spectre v2 workaround state to KVM guests")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://developer.arm.com/documentation/den0028/latest [1]
Link: https://lore.kernel.org/r/20201023154751.1973872-1-swboyd@chromium.org
Signed-off-by: Will Deacon <will@kernel.org>
Both pm_runtime_force_suspend() and pm_runtime_force_resume() have
some implicit checks, so it can make code flow more straightforward if
we separate runtime and system suspend callbacks.
High Definition Audio Specification, 4.5.9.3 Codec Wake From System S3
states that codec can wake the system up from S3 if WAKEEN is toggled.
Since HDA controller has different wakeup settings for runtime and
system susend, we also need to explicitly disable direct-complete which
can be enabled automatically by PCI core. In addition to that, avoid
waking up codec if runtime resume is for system suspend, to not break
direct-complete for codecs.
While at it, also remove AZX_DCAPS_SUSPEND_SPURIOUS_WAKEUP, as the
original bug commit a6630529ae ("ALSA: hda: Workaround for spurious
wakeups on some Intel platforms") solves doesn't happen with this
patch.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Link: https://lore.kernel.org/r/20201027130038.16463-3-kai.heng.feng@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Upon system resume, hda_codec_pm_resume() uses hda_codec_force_resume()
to resume the codec. However, pm_runtime_force_resume() won't really
resume the codec because of pm_runtime_need_not_resume() check.
Hence, hda_codec_force_resume() schedules a jackpoll work, which is to
really power up the codec.
Instead of doing that, we can use direct-complete to make the PM flow
more straightforward, and keep codec always suspended through system PM
flow if conditions are met.
On system suspend, PM core will decide what to do based on
hda_codec_pm_prepare():
- If codec is not runtime-suspended, PM core will suspend and resume the
device as normal.
- If codec is runtime-suspended, PM core will try to keep it suspended.
If it's still suspended after system resume, we use
hda_codec_pm_complete() to resume codec if it's needed.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Link: https://lore.kernel.org/r/20201027130038.16463-2-kai.heng.feng@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Patch removes not used variable 'length' from
cdns3_wa2_descmiss_copy_data function.
Fixes: 141e70fef4 ("usb: cdns3: gadget: need to handle sg case for workaround 2 case")
Acked-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Pawel Laszczak <pawell@cadence.com>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
gcc warns about a mismatch argument type when passing
'false' into a function that expects an enum:
drivers/dma/ti/k3-udma-private.c: In function 'xudma_tchan_get':
drivers/dma/ti/k3-udma-private.c:86:34: warning: implicit conversion from 'enum <anonymous>' to 'enum udma_tp_level' [-Wenum-conversion]
86 | return __udma_reserve_##res(ud, false, id); \
| ^~~~~
drivers/dma/ti/k3-udma-private.c:95:1: note: in expansion of macro 'XUDMA_GET_PUT_RESOURCE'
95 | XUDMA_GET_PUT_RESOURCE(tchan);
| ^~~~~~~~~~~~~~~~~~~~~~
In this case, false has the same numerical value as
UDMA_TP_NORMAL, so passing that is most likely the correct
way to avoid the warning without changing the behavior.
Fixes: d702419134 ("dmaengine: ti: k3-udma: Add glue layer for non DMAengine users")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Link: https://lore.kernel.org/r/20201026160123.3704531-1-arnd@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Parallel write,read,zone-mgmt operations accessing/altering zone state
and write-pointer may get into race. Avoid the situation by using a new
spinlock for zoned device.
Concurrent zone-appends (on a zone) returning same write-pointer issue
is also avoided using this lock.
Cc: stable@vger.kernel.org
Fixes: e0489ed5da ("null_blk: Support REQ_OP_ZONE_APPEND")
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The check for src mac address in ibmveth_is_packet_unsupported is wrong.
Commit 6f2275433a wanted to shut down messages for loopback packets,
but now suppresses bridged frames, which are accepted by the hypervisor
otherwise bridging won't work at all.
Fixes: 6f2275433a ("ibmveth: Detect unsupported packets before sending to the hypervisor")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Link: https://lore.kernel.org/r/20201026104221.26570-1-msuchanek@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the function ravb_hwtstamp_get() in ravb_main.c with the existing
values for RAVB_RXTSTAMP_TYPE_V2_L2_EVENT (0x2) and RAVB_RXTSTAMP_TYPE_ALL
(0x6)
if (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE_V2_L2_EVENT)
config.rx_filter = HWTSTAMP_FILTER_PTP_V2_L2_EVENT;
else if (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE_ALL)
config.rx_filter = HWTSTAMP_FILTER_ALL;
if the test on RAVB_RXTSTAMP_TYPE_ALL should be true,
it will never be reached.
This issue can be verified with 'hwtstamp_config' testing program
(tools/testing/selftests/net/hwtstamp_config.c). Setting filter type
to ALL and subsequent retrieving it gives incorrect value:
$ hwtstamp_config eth0 OFF ALL
flags = 0
tx_type = OFF
rx_filter = ALL
$ hwtstamp_config eth0
flags = 0
tx_type = OFF
rx_filter = PTP_V2_L2_EVENT
Correct this by converting if-else's to switch.
Fixes: c156633f13 ("Renesas Ethernet AVB driver proper")
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Link: https://lore.kernel.org/r/20201026102130.29368-1-andrew_gabbasov@mentor.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There's planned tests != run tests in pidfd_test when some test is
skipped:
$ ./pidfd_test
TAP version 13
1..8
[...]
# pidfd_send_signal signal recycled pid test: Skipping test
# Planned tests != run tests (8 != 7)
# Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0
Fix by using ksft_test_result_skip():
$ ./pidfd_test
TAP version 13
1..8
[...]
ok 8 # SKIP pidfd_send_signal signal recycled pid test: Unsharing pid namespace not permitted
# Totals: pass:7 fail:0 xfail:0 xpass:0 skip:1 error:0
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Drop unneeded <linux/wait.h> header inclusion to fix pidfd compilation
errors seen in Fedora 32:
In file included from pidfd_open_test.c:9:
../../../../usr/include/linux/wait.h:17:16: error: expected identifier before numeric constant
17 | #define P_ALL 0
| ^
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Commit 1056d3d2c9 ("selftests: enforce local header dependency in
lib.mk") added header dependency to the rule, but as the rule uses $^,
the headers are added to the compiler command line.
This can cause unexpected precompiled header files being generated when
compilation fails:
$ echo { >> openat2_test.c
$ make
gcc -Wall -O2 -g -fsanitize=address -fsanitize=undefined openat2_test.c
tools/testing/selftests/kselftest_harness.h tools/testing/selftests/kselftest.h helpers.c
-o tools/testing/selftests/openat2/openat2_test
openat2_test.c:313:1: error: expected identifier or ‘(’ before ‘{’ token
313 | {
| ^
make: *** [../lib.mk:140: tools/testing/selftests/openat2/openat2_test] Error 1
$ file openat2_test*
openat2_test: GCC precompiled header (version 014) for C
openat2_test.c: C source, ASCII text
Fix it by filtering out the headers, so that we'll only pass the actual
*.c files in the compiler command line.
Fixes: 1056d3d2c9 ("selftests: enforce local header dependency in lib.mk")
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
More recent libc implementations are now using openat/openat2 system
calls so also add do_sys_openat2 to the tracing so that the test
passes on these systems because do_sys_open may not be called.
Thanks to Masami Hiramatsu for the help on getting this fix to work
correctly.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Commit cad6967ac1 ("fork: introduce kernel_clone()") replaced "_do_fork()"
with "kernel_clone()". The ftrace selftests reference the fork function in
several of the tests. The rename will make the tests break, but if those
names are changed in the tests, they would then break on older kernels. The
same set of tests should pass older kernels if they have previously passed.
Obviously, a new test may not work on older kernels if the test was added
due to a bug or a new feature.
The setup of ftracetest will now create a $FUNCTION_FORK bash variable
that will contain "_do_fork" for older kernels and "kernel_clone" for newer
ones. It figures out the proper name by examining /proc/kallsyms.
Note, available_filter_functions could also be used, but because some tests
should be able to pass without function tracing enabled, it could not be
used.
Fixes: eea11285da ("tracing: switch to kernel_clone()")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Commit d53d9bc0cf ("x86/debug: Change thread.debugreg6 to
thread.virtual_dr6") changed the semantics of the variable from random
collection of bits, to exactly only those bits that ptrace() needs.
Unfortunately this lost DR_STEP for PTRACE_{BLOCK,SINGLE}STEP.
Furthermore, it turns out that userspace expects DR_STEP to be
unconditionally available, even for manual TF usage outside of
PTRACE_{BLOCK,SINGLE}_STEP.
Fixes: d53d9bc0cf ("x86/debug: Change thread.debugreg6 to thread.virtual_dr6")
Reported-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kyle Huey <me@kylehuey.com>
Link: https://lore.kernel.org/r/20201027183330.GM2628@hirez.programming.kicks-ass.net
The SDM states that #DB clears DEBUGCTLMSR_BTF, this means that when the
bit is set for userspace (TIF_BLOCKSTEP) and a kernel #DB happens first,
the BTF bit meant for userspace execution is lost.
Have the kernel #DB handler restore the BTF bit when it was requested
for userspace.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kyle Huey <me@kylehuey.com>
Link: https://lore.kernel.org/r/20201027093607.956147736@infradead.org
Fix afs_launder_page() to not clear PG_writeback on the page it is
laundering as the flag isn't set in this case.
Fixes: 4343d00872 ("afs: Get rid of the afs_writeback record")
Signed-off-by: David Howells <dhowells@redhat.com>
The patch dca54a7bbb: "afs: Add tracing for cell refcount and active user
count" from Oct 13, 2020, leads to the following Smatch complaint:
fs/afs/cell.c:596 afs_unuse_cell()
warn: variable dereferenced before check 'cell' (see line 592)
Fix this by moving the retrieval of the cell debug ID to after the check of
the validity of the cell pointer.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: dca54a7bbb ("afs: Add tracing for cell refcount and active user count")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Dan Carpenter <dan.carpenter@oracle.com>
The prevention of splice-write without explicit ops made the
copy_file_write() syscall to an afs file (as done by the generic/112
xfstest) fail with EINVAL.
Fix by using iter_file_splice_write() for afs.
Fixes: 36e2c7421f ("fs: don't allow splice read/write without explicit ops")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The memlock rlimit is a notorious source of failure for BPF programs. Most
of the samples just set it to infinity, but a few used a lower limit. The
problem with unconditionally setting a lower limit is that this will also
override the limit if the system-wide setting is *higher* than the limit
being set, which can lead to failures on systems that lock a lot of memory,
but set 'ulimit -l' to unlimited before running a sample.
One fix for this is to only conditionally set the limit if the current
limit is lower, but it is simpler to just unify all the samples and have
them all set the limit to infinity.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20201026233623.91728-1-toke@redhat.com
Pull x86 fixes from Thomas Gleixner:
"A couple of x86 fixes which missed rc1 due to my stupidity:
- Drop lazy TLB mode before switching to the temporary address space
for text patching.
text_poke() switches to the temporary mm which clears the lazy mode
and restores the original mm afterwards. Due to clearing lazy mode
this might restore a already dead mm if exit_mmap() runs in
parallel on another CPU.
- Document the x32 syscall design fail vs. syscall numbers 512-547
properly.
- Fix the ORC unwinder to handle the inactive task frame correctly.
This was unearthed due to the slightly different code generation of
gcc-10.
- Use an up to date screen_info for the boot params of kexec instead
of the possibly stale and invalid version which happened to be
valid when the kexec kernel was loaded"
* tag 'x86-urgent-2020-10-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/alternative: Don't call text_poke() in lazy TLB mode
x86/syscalls: Document the fact that syscalls 512-547 are a legacy mistake
x86/unwind/orc: Fix inactive tasks with stack pointer in %sp on GCC 10 compiled kernels
hyperv_fb: Update screen_info after removing old framebuffer
x86/kexec: Use up-to-dated screen_info copy to fill boot params
When running the trigger hook, ALSA by default will take a spinlock, and
thus will run the trigger hook in atomic context.
However, our HDMI driver will send the infoframes as part of the trigger
hook, and part of that process is to wait for a bit to be cleared for up to
100ms. To be nicer to the system, that wait has some usleep_range that
interact poorly with the atomic context.
There's several ways we can fix this, but the more obvious one is to make
ALSA take a mutex instead by setting the nonatomic flag on the DAI link.
That doesn't work though, since now the cyclic callback installed by the
dmaengine helpers in ALSA will take a mutex, while that callback is run by
dmaengine's virt-chan code in a tasklet where sleeping is not allowed
either.
Given the delay we need to poll the bit for, changing the usleep_range for
a udelay and keep running it from a context where interrupts are disabled
is not really a good option either.
However, we can move the infoframe setup code in the hw_params hook, like
is usually done in other HDMI controllers, that isn't protected by a
spinlock and thus where we can sleep. Infoframes will be sent on a regular
basis anyway, and since hw_params is where the audio parameters that end up
in the infoframes are setup, this also makes a bit more sense.
Fixes: bb7d785688 ("drm/vc4: Add HDMI audio support")
Suggested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20201027101558.427256-1-maxime@cerno.tech
Pull orphan section fixes from Kees Cook:
"A couple corner cases were found from the link-time orphan section
handling series:
- arm: handle .ARM.exidx and .ARM.extab sections (Nathan Chancellor)
- x86: collect .ctors.* with .ctors (Kees Cook)"
* tag 'orphan-handling-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
arm/build: Always handle .ARM.exidx and .ARM.extab sections
vmlinux.lds.h: Keep .ctors.* with .ctors
After turning on warnings for orphan section placement, enabling
CONFIG_UNWINDER_FRAME_POINTER instead of CONFIG_UNWINDER_ARM causes
thousands of warnings when clang + ld.lld are used:
$ scripts/config --file arch/arm/configs/multi_v7_defconfig \
-d CONFIG_UNWINDER_ARM \
-e CONFIG_UNWINDER_FRAME_POINTER
$ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- LLVM=1 defconfig zImage
ld.lld: warning: init/built-in.a(main.o):(.ARM.extab) is being placed in '.ARM.extab'
ld.lld: warning: init/built-in.a(main.o):(.ARM.extab.init.text) is being placed in '.ARM.extab.init.text'
ld.lld: warning: init/built-in.a(main.o):(.ARM.extab.ref.text) is being placed in '.ARM.extab.ref.text'
ld.lld: warning: init/built-in.a(do_mounts.o):(.ARM.extab.init.text) is being placed in '.ARM.extab.init.text'
ld.lld: warning: init/built-in.a(do_mounts.o):(.ARM.extab) is being placed in '.ARM.extab'
ld.lld: warning: init/built-in.a(do_mounts_rd.o):(.ARM.extab.init.text) is being placed in '.ARM.extab.init.text'
ld.lld: warning: init/built-in.a(do_mounts_rd.o):(.ARM.extab) is being placed in '.ARM.extab'
ld.lld: warning: init/built-in.a(do_mounts_initrd.o):(.ARM.extab.init.text) is being placed in '.ARM.extab.init.text'
ld.lld: warning: init/built-in.a(initramfs.o):(.ARM.extab.init.text) is being placed in '.ARM.extab.init.text'
ld.lld: warning: init/built-in.a(initramfs.o):(.ARM.extab) is being placed in '.ARM.extab'
ld.lld: warning: init/built-in.a(calibrate.o):(.ARM.extab.init.text) is being placed in '.ARM.extab.init.text'
ld.lld: warning: init/built-in.a(calibrate.o):(.ARM.extab) is being placed in '.ARM.extab'
These sections are handled by the ARM_UNWIND_SECTIONS define, which is
only added to the list of sections when CONFIG_ARM_UNWIND is set.
CONFIG_ARM_UNWIND is a hidden symbol that is only selected when
CONFIG_UNWINDER_ARM is set so CONFIG_UNWINDER_FRAME_POINTER never
handles these sections. According to the help text of
CONFIG_UNWINDER_ARM, these sections should be discarded so that the
kernel image size is not affected.
Fixes: 5a17850e25 ("arm/build: Warn on orphan section placement")
Link: https://github.com/ClangBuiltLinux/linux/issues/1152
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Review-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
[kees: Made the discard slightly more specific]
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200928224854.3224862-1-natechancellor@gmail.com
gcc points out a type mismatch:
drivers/acpi/dock.c: In function 'hot_remove_dock_devices':
drivers/acpi/dock.c:234:53: warning: implicit conversion from 'enum <anonymous>' to 'enum dock_callback_type' [-Wenum-conversion]
234 | dock_hotplug_event(dd, ACPI_NOTIFY_EJECT_REQUEST, false);
This is harmless because 'false' still has the correct numeric value,
but passing DOCK_CALL_HANDLER documents better what is going on
and avoids the warning.
Fixes: 37f908778f ("ACPI / dock: Walk list in reverse order during removal of devices")
Fixes: f09ce741a0 ("ACPI / dock / PCI: Drop ACPI dock notifier chain")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It sounds that there were function renames. Update the kernel-doc
markups accordingly.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It appears that firmware nodes can be shared between devices. In such case
when a (child) device is about to be deleted, its firmware node may be shared
and ACPI_COMPANION_SET(..., NULL) call for it breaks the secondary link
of the shared primary firmware node.
In order to prevent that, check, if the device has a parent and parent's
firmware node is shared with its child, and avoid crashing the link.
Fixes: c15e1bdda4 ("device property: Fix the secondary firmware node handling in set_primary_fwnode()")
Reported-by: Ferry Toth <fntoth@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Tested-by: Ferry Toth <fntoth@gmail.com>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Behind primary and secondary we understand the type of the nodes
which might define their ordering. However, if primary node gone,
we can't maintain the ordering by definition of the linked list.
Thus, by ordering secondary node becomes first in the list.
But in this case the meaning of it is still secondary (or auxiliary).
The type of the node is maintained by the secondary pointer in it:
secondary pointer Meaning
NULL or valid primary node
ERR_PTR(-ENODEV) secondary node
So, if by some reason we do the following sequence of calls
set_primary_fwnode(dev, NULL);
set_primary_fwnode(dev, primary);
we should preserve secondary node.
This concept is supported by the description of set_primary_fwnode()
along with implementation of set_secondary_fwnode(). Hence, fix
the commit c15e1bdda4 to follow this as well.
Fixes: c15e1bdda4 ("device property: Fix the secondary firmware node handling in set_primary_fwnode()")
Cc: Ferry Toth <fntoth@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Tested-by: Ferry Toth <fntoth@gmail.com>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Under some circumstances, the compiler generates .ctors.* sections. This
is seen doing a cross compile of x86_64 from a powerpc64el host:
x86_64-linux-gnu-ld: warning: orphan section `.ctors.65435' from `kernel/trace/trace_clock.o' being
placed in section `.ctors.65435'
x86_64-linux-gnu-ld: warning: orphan section `.ctors.65435' from `kernel/trace/ftrace.o' being
placed in section `.ctors.65435'
x86_64-linux-gnu-ld: warning: orphan section `.ctors.65435' from `kernel/trace/ring_buffer.o' being
placed in section `.ctors.65435'
Include these orphans along with the regular .ctors section.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 83109d5d5f ("x86/build: Warn on orphan section placement")
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20201005025720.2599682-1-keescook@chromium.org
Fix a typo in a comment in freeze_processes().
Signed-off-by: Jackie Zamow <jackie.zamow@gmail.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It has been confirmed that the SMU metrics table should always reflect
the current fan speed even in manual mode.
Fixes: 3033e9f1c2de ("drm/amdgpu/swsmu: handle manual fan readback on SMU11")
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
fix the wrong fan speed in fan1_input when the fan control mode is manual.
the fan speed value is not correct when we set manual mode to fan1_enalbe - 1.
since the fan speed in the metrics table always reflects the real fan speed,we
can fetch the fan speed for both auto and manual mode.
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Currently intel_idle driver gets the c-state information from ACPI
_CST if the processor model is not recognized by it. However the
c-state in _CST starts with index 1 which is different from the
index in intel_idle driver's internal c-state table.
While intel_idle_max_cstate_reached() was previously introduced to
deal with intel_idle driver's internal c-state table, re-using
this function directly on _CST is incorrect.
Fix this by subtracting 1 from the index when checking max_cstate
in the _CST case.
For example, append intel_idle.max_cstate=1 in boot command line,
Before the patch:
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
POLL
After the patch:
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1_ACPI
Fixes: 18734958e9 ("intel_idle: Use ACPI _CST for processor models without C-state tables")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Cc: 5.6+ <stable@vger.kernel.org> # 5.6+
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull operating performance points (OPP) framework fixes for 5.10-rc2
from Viresh Kumar:
"- Don't remove the static OPPs erroneously.
- Check for the right condition before making an early exit.
- Avoid a lockdep by reducing the size of the critical section."
* 'opp/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
opp: Reduce the size of critical section in _opp_table_kref_release()
opp: Fix early exit from dev_pm_opp_register_set_opp_helper()
opp: Don't always remove static OPPs in _of_add_opp_table_v1()
If the cpufreq policy max limit is changed when intel_pstate operates
in the passive mode with HWP enabled and the "powersave" governor is
used on top of it, the HWP max limit is not updated as appropriate.
Namely, in the "powersave" governor case, the target P-state
is always equal to the policy min limit, so if the latter does
not change, intel_cpufreq_adjust_hwp() is not invoked to update
the HWP Request MSR due to the "target_pstate != old_pstate" check
in intel_cpufreq_update_pstate(), so the HWP max limit is not
updated as a result.
Also, if the CPUFREQ_NEED_UPDATE_LIMITS flag is not set for the
driver and the target frequency does not change along with the
policy max limit, the "target_freq == policy->cur" check in
__cpufreq_driver_target() prevents the driver's ->target() callback
from being invoked at all, so the HWP max limit is not updated.
To prevent that occurring, set the CPUFREQ_NEED_UPDATE_LIMITS flag
in the intel_cpufreq driver structure if HWP is enabled and modify
intel_cpufreq_update_pstate() to do the "target_pstate != old_pstate"
check only in the non-HWP case and let intel_cpufreq_adjust_hwp()
always run in the HWP case (it will update HWP Request only if the
cached value of the register is different from the new one including
the limits, so if neither the target P-state value nor the max limit
changes, the register write will still be avoided).
Fixes: f6ebbcf08f ("cpufreq: intel_pstate: Implement passive mode with HWP enabled")
Reported-by: Zhang Rui <rui.zhang@intel.com>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: 1c534352f4 cpufreq: Introduce CPUFREQ_NEED_UPDATE_LIMITS ...
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Generally, a cpufreq driver may need to update some internal upper
and lower frequency boundaries on policy max and min changes,
respectively, but currently this does not work if the target
frequency does not change along with the policy limit.
Namely, if the target frequency does not change along with the
policy min or max, the "target_freq == policy->cur" check in
__cpufreq_driver_target() prevents driver callbacks from being
invoked and they do not even have a chance to update the
corresponding internal boundary.
This particularly affects the "powersave" and "performance"
governors that always set the target frequency to one of the
policy limits and it never changes when the other limit is updated.
To allow cpufreq the drivers needing to update internal frequency
boundaries on policy limits changes to avoid this issue, introduce
a new driver flag, CPUFREQ_NEED_UPDATE_LIMITS, that (when set) will
neutralize the check mentioned above.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Commit 33aa46f252 ("cpufreq: intel_pstate: Use passive mode by
default without HWP") was meant to cause intel_pstate to be used
in the passive mode with the schedutil governor on top of it, but
it missed the case in which either "ondemand" or "conservative"
was selected as the default governor in the existing kernel config,
in which case the previous old governor configuration would be used,
causing the default legacy governor to be used on top of intel_pstate
instead of schedutil.
Address this by preventing "ondemand" and "conservative" from being
configured as the default cpufreq governor in the case when schedutil
is the default choice for the default governor setting.
[Note that the default cpufreq governor can still be set via the
kernel command line if need be and that choice is not limited,
so if anyone really wants to use one of the legacy governors by
default, it can be achieved this way.]
Fixes: 33aa46f252 ("cpufreq: intel_pstate: Use passive mode by default without HWP")
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Cc: 5.8+ <stable@vger.kernel.org> # 5.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Pull devicetree fixes from Rob Herring:
- More binding additionalProperties/unevaluatedProperties additions
- More yamllint fixes on additions in the merge window
- CrOS embedded controller schema updates to fix warnings
- LEDs schema update adding ID_RGB
- A reserved-memory fix for regions starting at address 0x0
* tag 'devicetree-fixes-for-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: Another round of adding missing 'additionalProperties/unevalutatedProperties'
dt-bindings: Explicitly allow additional properties in board/SoC schemas
dt-bindings: More whitespace clean-ups in schema files
mfd: google,cros-ec: add missing properties
dt-bindings: input: convert cros-ec-keyb to json-schema
dt-bindings: i2c: convert i2c-cros-ec-tunnel to json-schema
of: Fix reserved-memory overlap detection
dt-bindings: mailbox: mtk-gce: fix incorrect mbox-cells value
dt-bindings: leds: Update devicetree documents for ID_RGB
The removal of compat_process_vm_{readv,writev} didn't change
process_vm_rw(), which always assumes it's not doing a compat syscall.
Instead of passing in 'false' unconditionally for 'compat', make it
conditional on in_compat_syscall().
[ Both Al and Christoph point out that trying to access a 64-bit process
from a 32-bit one cannot work anyway, and is likely better prohibited,
but that's a separate issue - Linus ]
Fixes: c3973b401e ("mm: remove compat_process_vm_{readv,writev}")
Reported-and-tested-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Without the explicit __always_inline, some RISC-V configs place the
functions out of line, triggering the BUILD_BUG_ON checks in the
function.
Fixes: 11129e8ed4 ("riscv: use memcpy based uaccess for nommu again")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
By doing so we can associate the sequence counter to the chunk_mutex
for lockdep purposes (compiled-out otherwise), the mutex is otherwise
used on the write side.
Also avoid explicitly disabling preemption around the write region as it
will now be done automatically by the seqcount machinery based on the
lock type.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since we switched to the iomap infrastructure in b5ff9f1a96e8f ("btrfs:
switch to iomap for direct IO") we're calling generic_file_buffered_read()
directly and not via generic_file_read_iter() anymore.
If the read could read everything there is no need to bother calling
generic_file_buffered_read(), like it is handled in
generic_file_read_iter().
If we call generic_file_buffered_read() in this case we can hit a
situation where we do an invalid readahead and cause this UBSAN splat
in fstest generic/091:
run fstests generic/091 at 2020-10-21 10:52:32
================================================================================
UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
shift exponent 64 is too large for 64-bit type 'long unsigned int'
CPU: 0 PID: 656 Comm: fsx Not tainted 5.9.0-rc7+ #821
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77
dump_stack+0x57/0x70 lib/dump_stack.c:118
ubsan_epilogue+0x5/0x40 lib/ubsan.c:148
__ubsan_handle_shift_out_of_bounds.cold+0x61/0xe9 lib/ubsan.c:395
__roundup_pow_of_two ./include/linux/log2.h:57
get_init_ra_size mm/readahead.c:318
ondemand_readahead.cold+0x16/0x2c mm/readahead.c:530
generic_file_buffered_read+0x3ac/0x840 mm/filemap.c:2199
call_read_iter ./include/linux/fs.h:1876
new_sync_read+0x102/0x180 fs/read_write.c:415
vfs_read+0x11c/0x1a0 fs/read_write.c:481
ksys_read+0x4f/0xc0 fs/read_write.c:615
do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9 arch/x86/entry/entry_64.S:118
RIP: 0033:0x7fe87fee992e
RSP: 002b:00007ffe01605278 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 000000000004f000 RCX: 00007fe87fee992e
RDX: 0000000000004000 RSI: 0000000001677000 RDI: 0000000000000003
RBP: 000000000004f000 R08: 0000000000004000 R09: 000000000004f000
R10: 0000000000053000 R11: 0000000000000246 R12: 0000000000004000
R13: 0000000000000000 R14: 000000000007a120 R15: 0000000000000000
================================================================================
BTRFS info (device nullb0): has skinny extents
BTRFS info (device nullb0): ZONED mode enabled, zone size 268435456 B
BTRFS info (device nullb0): enabling ssd optimizations
Fixes: f85781fb50 ("btrfs: switch to iomap for direct IO")
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In addition to the rest of Qcom interconnect drivers use icc_sync_state
for SM8150/SM8250 interconnect drivers to notify the interconnect
framework when all consumers are probed and there is no need to keep the
bandwidth set to maximum anymore.
Also move the BCM initialization before creating the nodes to set the
max bandwidth in hardware for the initialization/probing stage.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Fixes: 7d3b0b0d81 ("interconnect: qcom: Use icc_sync_state")
Link: https://lore.kernel.org/r/20201027133418.976687-1-dmitry.baryshkov@linaro.org
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
There was a memory corruption bug happening while running the synthetic
event selftests:
kmemleak: Cannot insert 0xffff8c196fa2afe5 into the object search tree (overlaps existing)
CPU: 5 PID: 6866 Comm: ftracetest Tainted: G W 5.9.0-rc5-test+ #577
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
Call Trace:
dump_stack+0x8d/0xc0
create_object.cold+0x3b/0x60
slab_post_alloc_hook+0x57/0x510
? tracing_map_init+0x178/0x340
__kmalloc+0x1b1/0x390
tracing_map_init+0x178/0x340
event_hist_trigger_func+0x523/0xa40
trigger_process_regex+0xc5/0x110
event_trigger_write+0x71/0xd0
vfs_write+0xca/0x210
ksys_write+0x70/0xf0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fef0a63a487
Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007fff76f18398 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000039 RCX: 00007fef0a63a487
RDX: 0000000000000039 RSI: 000055eb3b26d690 RDI: 0000000000000001
RBP: 000055eb3b26d690 R08: 000000000000000a R09: 0000000000000038
R10: 000055eb3b2cdb80 R11: 0000000000000246 R12: 0000000000000039
R13: 00007fef0a70b500 R14: 0000000000000039 R15: 00007fef0a70b700
kmemleak: Kernel memory leak detector disabled
kmemleak: Object 0xffff8c196fa2afe0 (size 8):
kmemleak: comm "ftracetest", pid 6866, jiffies 4295082531
kmemleak: min_count = 1
kmemleak: count = 0
kmemleak: flags = 0x1
kmemleak: checksum = 0
kmemleak: backtrace:
__kmalloc+0x1b1/0x390
tracing_map_init+0x1be/0x340
event_hist_trigger_func+0x523/0xa40
trigger_process_regex+0xc5/0x110
event_trigger_write+0x71/0xd0
vfs_write+0xca/0x210
ksys_write+0x70/0xf0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The cause came down to a use of strcat() that was adding an string that was
shorten, but the strcat() did not take that into account.
strcat() is extremely dangerous as it does not care how big the buffer is.
Replace it with seq_buf operations that prevent the buffer from being
overwritten if what is being written is bigger than the buffer.
Fixes: 10819e2579 ("tracing: Handle synthetic event array field type checking correctly")
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Tested-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The "cb_pcidas" driver supports asynchronous commands on the analog
output (AO) subdevice for those boards that have an AO FIFO. The code
(in `cb_pcidas_ao_check_chanlist()` and `cb_pcidas_ao_cmd()`) to
validate and set up the command supports output to a single channel or
to two channels simultaneously (the boards have two AO channels).
However, the code in `cb_pcidas_auto_attach()` that initializes the
subdevices neglects to initialize the AO subdevice's `len_chanlist`
member, leaving it set to 0, but the Comedi core will "correct" it to 1
if the driver neglected to set it. This limits commands to use a single
channel (either channel 0 or 1), but the limit should be two channels.
Set the AO subdevice's `len_chanlist` member to be the same value as the
`n_chan` member, which will be 2.
Cc: <stable@vger.kernel.org>
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20201021122142.81628-1-abbotti@mev.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is not possible to create cross-references for duplicated
symbols. While Sphinx always detected it, on Sphinx 3 it
generates warnings like this:
.../Documentation/gpu/drm-kms-helpers:326: ../drivers/gpu/drm/drm_edid.c:1626: WARNING: Duplicate C declaration, also defined in 'gpu/drm-kms-helpers'.
Declaration is 'bool drm_edid_are_equal (const struct edid *edid1, const struct edid *edid2)'.
So, get rid of the duplicated kernel-doc markup.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/9310f4074fa9d29cd3ad60684d86d0ace8dab7ae.1603791716.git.mchehab+huawei@kernel.org
i.MX fixes for 5.10, 2nd round:
A couple of more defconfig fixes to enable CONFIG_GPIO_MXC option for
i.MX ARMv4/v5 devices.
* tag 'imx-fixes-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: multi_v5_defconfig: Select CONFIG_GPIO_MXC
ARM: imx_v4_v5_defconfig: Select CONFIG_GPIO_MXC
Link: https://lore.kernel.org/r/20201026235230.GL9880@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The error_debugfs label is only used when either
CONFIG_USB_DWC2_PERIPHERAL or CONFIG_USB_DWC2_DUAL_ROLE is enabled. Add
the same #if to the error_debugfs label itself as the code which uses
this label already has.
This avoids the following compiler warning:
warning: label ‘error_debugfs’ defined but not used [-Wunused-label]
Fixes: e1c08cf231 ("usb: dwc2: Add missing cleanups when usb_add_gadget_udc() fails")
Acked-by: Minas Harutyunyan <Minas.Harutyunyan@synopsys.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Felipe Balbi <balbi@kernel.org>
If we want to send a control status on our own time (through
delayed_status), make sure to handle a case where we may queue the
delayed status before the host requesting for it (when XferNotReady
is generated). Otherwise, the driver won't send anything because it's
not EP0_STATUS_PHASE yet. To resolve this, regardless whether
dwc->ep0state is EP0_STATUS_PHASE, make sure to clear the
dwc->delayed_status flag if dwc3_ep0_send_delayed_status() is called.
The control status can be sent when the host requests it later.
Cc: <stable@vger.kernel.org>
Fixes: d97c78a190 ("usb: dwc3: gadget: END_TRANSFER before CLEAR_STALL command")
Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Signed-off-by: Felipe Balbi <balbi@kernel.org>
clang warns about functions returning a 'const int' result:
drivers/gpu/drm/imx/imx-tve.c:487:8: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
Remove the extraneous 'const' qualifier here. I would guess that the
function was intended to be marked __attribute__((const)) instead,
but that would also be wrong since it call other functions without
that attribute.
Fixes: fcbc51e54d ("staging: drm/imx: Add support for Television Encoder (TVEv2)")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
The edid_len variable is never used again. Use a local variable instead
of storing it in the device structure.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Remove leftover container_of helper, it has been replaced by
bridge_to_imxpd().
Fixes: fe141cedc4 ("drm/imx: pd: Use bus format/flags provided by the bridge when available")
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
The edid memory is only freed if the component.unbind() is called. This
is okay if the parallel-display was bound but if the bind() fails we
leak the memory.
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
[p.zabel@pengutronix.de: rebased, dropped now empty unbind()]
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
The edid_len variable is never used again. Use a local variable instead
of storing it in the device structure.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
__nvme_fc_terminate_io() is now called by only 1 place, in reset_work.
Consoldate and move the functionality of terminate_io into reset_work.
In reset_work, rather than calling the create_association directly,
schedule the connect work element to do its thing. After scheduling,
flush the connect work element to continue with semantic of not
returning until connect has been attempted at least once.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
nvme_fc_error_recovery() special cases handling when in CONNECTING state
and calls __nvme_fc_terminate_io(). __nvme_fc_terminate_io() itself
special cases CONNECTING state and calls the routine to abort outstanding
ios.
Simplify the sequence by putting the call to abort outstanding I/Os
directly in nvme_fc_error_recovery.
Move the location of __nvme_fc_abort_outstanding_ios(), and
nvme_fc_terminate_exchange() which is called by it, to avoid adding
function prototypes for nvme_fc_error_recovery().
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
err_work was created to handle errors (mainly I/O timeouts) while in
CONNECTING state. The flag for err_work_active is also unneeded.
Remove err_work_active and err_work. The actions to abort I/Os are moved
inline to nvme_error_recovery().
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Whenever there are errors during CONNECTING, the driver recovers by
aborting all outstanding ios and counts on the io completion to fail them
and thus the connection/association they are on. However, the connection
failure depends on a failure state from the core routines. Not all
commands that are issued by the core routine are guaranteed to cause a
failure of the core routine. They may be treated as a failure status and
the status is then ignored.
As such, whenever the transport enters error_recovery while CONNECTING,
it will set a new flag indicating an association failed. The
create_association routine which creates and initializes the controller,
will monitor the state of the flag as well as the core routine error
status and ensure the association fails if there was an error.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Receiving a zero length message leads to the following warnings because
the CQE is processed twice:
refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 0 at lib/refcount.c:28
RIP: 0010:refcount_warn_saturate+0xd9/0xe0
Call Trace:
<IRQ>
nvme_rdma_recv_done+0xf3/0x280 [nvme_rdma]
__ib_process_cq+0x76/0x150 [ib_core]
...
Sanity check the received data length, to avoids this.
Thanks to Chao Leng & Sagi for suggestions.
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Revalidating nvme zoned namespaces requires IO commands, and there are
controller states that prevent IO. For example, a sanitize in progress
is required to fail all IO, but we don't want to remove a namespace
we've previously added just because the controller is in such a state.
Suppress the error in this case.
Reported-by: Michael Nguyen <michael.nguyen@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
fsl_ep_fifo_status() should return error if _ep->desc is null.
Fixes: 75eaa498c9 (“usb: gadget: Correct NULL pointer checking in fsl gadget”)
Reviewed-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Ran Wang <ran.wang_1@nxp.com>
Signed-off-by: Felipe Balbi <balbi@kernel.org>
goku_probe() goes to error label "err" and invokes goku_remove()
in case of failures of pci_enable_device(), pci_resource_start()
and ioremap(). goku_remove() gets a device from
pci_get_drvdata(pdev) and works with it without any checks, in
particular it dereferences a corresponding pointer. But
goku_probe() did not set this device yet. So, one can expect
various crashes. The patch moves setting the device just after
allocation of memory for it.
Found by Linux Driver Verification project (linuxtesting.org).
Reported-by: Pavel Andrianov <andrianov@ispras.ru>
Signed-off-by: Evgeny Novikov <novikov@ispras.ru>
Signed-off-by: Felipe Balbi <balbi@kernel.org>
There is a lot of stuff here which can be done outside of the big
opp_table_lock, do that. This helps avoiding few circular dependency
lockdeps around debugfs and interconnects.
Reported-by: Rob Clark <robdclark@gmail.com>
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Based on more testing, commit 8ca5ee624b4c ("ARM: OMAP2+: Restore MPU
power domain if cpu_cluster_pm_enter() fails") is a poor fix for handling
cpu_cluster_pm_enter() returned errors.
We should not override the cpuidle states with a hardcoded PWRDM_POWER_ON
value. Instead, we should use a configured idle state that does not cause
the context to be lost. Otherwise we end up configuring a potentially
improper state for the MPUSS. We also want to update the returned state
index for the selected state.
Let's just select the highest power idle state C1 to ensure no context
loss is allowed on cpu_cluster_pm_enter() errors. With these changes we
can now unconditionally call omap4_enter_lowpower() for WFI like we did
earlier before commit 55be2f5033 ("ARM: OMAP2+: Handle errors for
cpu_pm"). And we can return the selected state index.
Fixes: 8f04aea048 ("ARM: OMAP2+: Restore MPU power domain if cpu_cluster_pm_enter() fails")
Fixes: 55be2f5033 ("ARM: OMAP2+: Handle errors for cpu_pm")
Signed-off-by: Tony Lindgren <tony@atomide.com>
The Zoom UAC-2 USB audio interface provides an async playback endpoint
("1 OUT (ASYNC)") and capture endpoint ("2 IN (ASYNC)"), both with
2-channel S32_LE in 44.1, 48, 88.2, 96, 176.4, or 192
kilosamples/s. The device provides explicit feedback to adjust the
host's playback rate, but the feedback appears unstable and biased
relative to the device's capture rate.
"alsaloop -t 1000" experiences playback underruns and tries to
resample the captured audio to match the varying playback
rate. Forcing the kernel to use implicit feedback appears to
produce more stable results. This causes the host to transmit one
playback sample for each capture sample received. (Zoom North America
has been notified of this change.)
Signed-off-by: Keith Winstein <keithw@cs.stanford.edu>
Tested-by: Keith Winstein <keithw@cs.stanford.edu>
Cc: <stable@vger.kernel.org>
BugLink: https://lore.kernel.org/r/20201027071841.GA164525@trolley.csail.mit.edu
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The patch missed returning 0 early in case of success and hence the
static OPPs got removed by mistake. Fix it.
Fixes: 90d46d71cc ("opp: Handle multiple calls for same OPP table in _of_add_opp_table_v1()")
Reported-by: Aisheng Dong <aisheng.dong@nxp.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dong Aisheng <aisheng.dong@nxp.com>
The i8042 module exports several symbols which may be used by other
modules.
Before this commit it would refuse to load (when built as a module itself)
on systems without an i8042 controller.
This is a problem specifically for the asus-nb-wmi module. Many Asus
laptops support the Asus WMI interface. Some of them have an i8042
controller and need to use i8042_install_filter() to filter some kbd
events. Other models do not have an i8042 controller (e.g. they use an
USB attached kbd).
Before this commit the asus-nb-wmi driver could not be loaded on Asus
models without an i8042 controller, when the i8042 code was built as
a module (as Arch Linux does) because the module_init function of the
i8042 module would fail with -ENODEV and thus the i8042_install_filter
symbol could not be loaded.
This commit fixes this by exiting from module_init with a return code
of 0 if no controller is found. It also adds a i8042_present bool to
make the module_exit function a no-op in this case and also adds a
check for i8042_present to the exported i8042_command function.
The latter i8042_present check should not really be necessary because
when builtin that function can already be used on systems without
an i8042 controller, but better safe then sorry.
Reported-and-tested-by: Marius Iacob <themariusus@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20201008112628.3979-2-hdegoede@redhat.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Michael Chan says:
====================
bnxt_en: Bug fixes.
These 5 bug fixes are all related to the firmware reset or AER recovery.
2 patches fix the cleanup logic for the workqueue used to handle firmware
reset and recovery. 1 patch ensures that the chip will have the proper
BAR addresses latched after fatal AER recovery. 1 patch fixes the
open path to check for firmware reset abort error. The last one
sends the fw reset command unconditionally to fix the AER reset logic.
====================
Link: https://lore.kernel.org/r/1603685901-17917-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the AER or firmware reset flow, if we are in fatal error state or
if pci_channel_offline() is true, we don't send any commands to the
firmware because the commands will likely not reach the firmware and
most commands don't matter much because the firmware is likely to be
reset imminently.
However, the HWRM_FUNC_RESET command is different and we should always
attempt to send it. In the AER flow for example, the .slot_reset()
call will trigger this fw command and we need to try to send it to
effect the proper reset.
Fixes: b340dc680e ("bnxt_en: Avoid sending firmware messages when AER error is detected.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When a PCIe fatal error occurs, the internal latched BAR addresses
in the chip get reset even though the BAR register values in config
space are retained.
pci_restore_state() will not rewrite the BAR addresses if the
BAR address values are valid, causing the chip's internal BAR addresses
to stay invalid. So we need to zero the BAR registers during PCIe fatal
error to force pci_restore_state() to restore the BAR addresses. These
write cycles to the BAR registers will cause the proper BAR addresses to
latch internally.
Fixes: 6316ea6db9 ("bnxt_en: Enable AER support.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As part of the commit b148bb238c
("bnxt_en: Fix possible crash in bnxt_fw_reset_task()."),
cancel_delayed_work_sync() is called only for VFs to fix a possible
crash by cancelling any pending delayed work items. It was assumed
by mistake that the flush_workqueue() call on the PF would flush
delayed work items as well.
As flush_workqueue() does not cancel the delayed workqueue, extend
the fix for PFs. This fix will avoid the system crash, if there are
any pending delayed work items in fw_reset_task() during driver's
.remove() call.
Unify the workqueue cleanup logic for both PF and VF by calling
cancel_work_sync() and cancel_delayed_work_sync() directly in
bnxt_remove_one().
Fixes: b148bb238c ("bnxt_en: Fix possible crash in bnxt_fw_reset_task().")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A recent patch has moved the workqueue cleanup logic before
calling unregister_netdev() in bnxt_remove_one(). This caused a
regression because the workqueue can be restarted if the device is
still open. Workqueue cleanup must be done after unregister_netdev().
The workqueue will not restart itself after the device is closed.
Call bnxt_cancel_sp_work() after unregister_netdev() and
call bnxt_dl_fw_reporters_destroy() after that. This fixes the
regession and the original NULL ptr dereference issue.
Fixes: b16939b59c ("bnxt_en: Fix NULL ptr dereference crash in bnxt_fw_reset_task()")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nesting macros that use the same local variable names causes
warnings when building with "make W=2":
include/asm-generic/percpu.h:117:14: warning: declaration of '__ret' shadows a previous local [-Wshadow]
include/asm-generic/percpu.h:126:14: warning: declaration of '__ret' shadows a previous local [-Wshadow]
These are fairly harmless, but since the warning comes from
a global header, the warning happens every time the headers
are included, which is fairly annoying.
Rename the variables to avoid shadowing and shut up the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Ido Schimmel says:
====================
mlxsw: Various fixes
This patch set contains various fixes for mlxsw.
Patch #1 ensures that only link modes that are supported by both the
device and the driver are advertised. When a link mode that is not
supported by the driver is negotiated by the device, it will be
presented as an unknown speed by ethtool, causing the bond driver to
wrongly assume that the link is down.
Patch #2 fixes a trivial memory leak upon module removal.
Patch #3 fixes a use-after-free that syzkaller was able to trigger once
on a slow emulator after a few months of fuzzing.
====================
Link: https://lore.kernel.org/r/20201024133733.2107509-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Each EMAD transaction stores the skb used to issue the EMAD request
('trans->tx_skb') so that the request could be retried in case of a
timeout. The skb can be freed when a corresponding response is received
or as part of the retry logic (e.g., failed retransmit, exceeded maximum
number of retries).
The two tasks (i.e., response processing and retransmits) are
synchronized by the atomic 'trans->active' field which ensures that
responses to inactive transactions are ignored.
In case of a failed retransmit the transaction is finished and all of
its resources are freed. However, the current code does not mark it as
inactive. Syzkaller was able to hit a race condition in which a
concurrent response is processed while the transaction's resources are
being freed, resulting in a use-after-free [1].
Fix the issue by making sure to mark the transaction as inactive after a
failed retransmit and free its resources only if a concurrent task did
not already do that.
[1]
BUG: KASAN: use-after-free in consume_skb+0x30/0x370
net/core/skbuff.c:833
Read of size 4 at addr ffff88804f570494 by task syz-executor.0/1004
CPU: 0 PID: 1004 Comm: syz-executor.0 Not tainted 5.8.0-rc7+ #68
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xf6/0x16e lib/dump_stack.c:118
print_address_description.constprop.0+0x1c/0x250
mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
check_memory_region_inline mm/kasan/generic.c:186 [inline]
check_memory_region+0x14e/0x1b0 mm/kasan/generic.c:192
instrument_atomic_read include/linux/instrumented.h:56 [inline]
atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
refcount_read include/linux/refcount.h:147 [inline]
skb_unref include/linux/skbuff.h:1044 [inline]
consume_skb+0x30/0x370 net/core/skbuff.c:833
mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592
mlxsw_emad_process_response drivers/net/ethernet/mellanox/mlxsw/core.c:651 [inline]
mlxsw_emad_rx_listener_func+0x5c9/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:672
mlxsw_core_skb_receive+0x4df/0x770 drivers/net/ethernet/mellanox/mlxsw/core.c:2063
mlxsw_pci_cqe_rdq_handle drivers/net/ethernet/mellanox/mlxsw/pci.c:595 [inline]
mlxsw_pci_cq_tasklet+0x12a6/0x2520 drivers/net/ethernet/mellanox/mlxsw/pci.c:651
tasklet_action_common.isra.0+0x13f/0x3e0 kernel/softirq.c:550
__do_softirq+0x223/0x964 kernel/softirq.c:292
asm_call_on_stack+0x12/0x20 arch/x86/entry/entry_64.S:711
Allocated by task 1006:
save_stack+0x1b/0x40 mm/kasan/common.c:48
set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc mm/kasan/common.c:494 [inline]
__kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467
slab_post_alloc_hook mm/slab.h:586 [inline]
slab_alloc_node mm/slub.c:2824 [inline]
slab_alloc mm/slub.c:2832 [inline]
kmem_cache_alloc+0xcd/0x2e0 mm/slub.c:2837
__build_skb+0x21/0x60 net/core/skbuff.c:311
__netdev_alloc_skb+0x1e2/0x360 net/core/skbuff.c:464
netdev_alloc_skb include/linux/skbuff.h:2810 [inline]
mlxsw_emad_alloc drivers/net/ethernet/mellanox/mlxsw/core.c:756 [inline]
mlxsw_emad_reg_access drivers/net/ethernet/mellanox/mlxsw/core.c:787 [inline]
mlxsw_core_reg_access_emad+0x1ab/0x1420 drivers/net/ethernet/mellanox/mlxsw/core.c:1817
mlxsw_reg_trans_query+0x39/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:1831
mlxsw_sp_sb_pm_occ_clear drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:260 [inline]
mlxsw_sp_sb_occ_max_clear+0xbff/0x10a0 drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:1365
mlxsw_devlink_sb_occ_max_clear+0x76/0xb0 drivers/net/ethernet/mellanox/mlxsw/core.c:1037
devlink_nl_cmd_sb_occ_max_clear_doit+0x1ec/0x280 net/core/devlink.c:1765
genl_family_rcv_msg_doit net/netlink/genetlink.c:669 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:714 [inline]
genl_rcv_msg+0x617/0x980 net/netlink/genetlink.c:731
netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2470
genl_rcv+0x24/0x40 net/netlink/genetlink.c:742
netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1330
netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg+0x150/0x190 net/socket.c:671
____sys_sendmsg+0x6d8/0x840 net/socket.c:2359
___sys_sendmsg+0xff/0x170 net/socket.c:2413
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2446
do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:384
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 73:
save_stack+0x1b/0x40 mm/kasan/common.c:48
set_track mm/kasan/common.c:56 [inline]
kasan_set_free_info mm/kasan/common.c:316 [inline]
__kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455
slab_free_hook mm/slub.c:1474 [inline]
slab_free_freelist_hook mm/slub.c:1507 [inline]
slab_free mm/slub.c:3072 [inline]
kmem_cache_free+0xbe/0x380 mm/slub.c:3088
kfree_skbmem net/core/skbuff.c:622 [inline]
kfree_skbmem+0xef/0x1b0 net/core/skbuff.c:616
__kfree_skb net/core/skbuff.c:679 [inline]
consume_skb net/core/skbuff.c:837 [inline]
consume_skb+0xe1/0x370 net/core/skbuff.c:831
mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592
mlxsw_emad_transmit_retry.isra.0+0x9d/0xc0 drivers/net/ethernet/mellanox/mlxsw/core.c:613
mlxsw_emad_trans_timeout_work+0x43/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:625
process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269
worker_thread+0x9e/0x1050 kernel/workqueue.c:2415
kthread+0x355/0x470 kernel/kthread.c:291
ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293
The buggy address belongs to the object at ffff88804f5703c0
which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 212 bytes inside of
224-byte region [ffff88804f5703c0, ffff88804f5704a0)
The buggy address belongs to the page:
page:ffffea00013d5c00 refcount:1 mapcount:0 mapping:0000000000000000
index:0x0
flags: 0x100000000000200(slab)
raw: 0100000000000200 dead000000000100 dead000000000122 ffff88806c625400
raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88804f570380: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
ffff88804f570400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88804f570480: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff88804f570500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88804f570580: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
Fixes: caf7297e7a ("mlxsw: core: Introduce support for asynchronous EMAD register access")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During port creation the driver instructs the device to advertise all
the supported link modes queried from the device.
Since cited commit not all the link modes supported by the device are
supported by the driver. This can result in the device negotiating a
link mode that is not recognized by the driver causing ethtool to show
an unsupported speed:
$ ethtool swp1
...
Speed: Unknown!
This is especially problematic when the netdev is enslaved to a bond, as
the bond driver uses unknown speed as an indication that the link is
down:
[13048.900895] net_ratelimit: 86 callbacks suppressed
[13048.900902] t_bond0: (slave swp52): failed to get link speed/duplex
[13048.912160] t_bond0: (slave swp49): failed to get link speed/duplex
Fix this by making sure that only link modes that are supported by both
the device and the driver are advertised.
Fixes: b97cd89126 ("mlxsw: Remove 56G speed support")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit 12d16b397c ("gpio: mxc: Support module build") the
CONFIG_GPIO_MXC option needs to be explicitly selected.
Select it to avoid boot issues on imx25/imx27 due to the lack of the
GPIO driver.
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Since commit 12d16b397c ("gpio: mxc: Support module build") the
CONFIG_GPIO_MXC option needs to be explicitly selected.
Select it to avoid boot issues on imx25/imx27 due to the lack of the
GPIO driver.
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Karsten Graul says:
====================
net/smc: fixes 2020-10-23
Patch 1 fixes a potential null pointer dereference. Patch 2 takes care
of a suppressed return code and patch 3 corrects the system EID in the
ISM driver.
====================
Link: https://lore.kernel.org/r/20201023184830.59548-1-kgraul@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The system EID that is defined by the ISM driver is not correct. Using
an incorrect system EID allows to communicate with remote Linux systems
that use the same incorrect system EID, but when it comes to
interoperability with other operating systems then the system EIDs do
never match which prevents SMC-Dv2 communication.
Using the correct system EID fixes this problem.
Fixes: 201091ebb2 ("net/smc: introduce System Enterprise ID (SEID)")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The patch that repaired the invalid return code in smcd_new_buf_create()
missed to take care of errno ENOSPC which has a special meaning that no
more DMBEs can be registered on the device. Fix that by keeping this
errno value during the translation of the return code.
Fixes: 6b1bbf94ab ("net/smc: fix invalid return code in smcd_new_buf_create()")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
smc_listen_work() calls smc_listen_decline() on label out_decl,
providing the ini pointer variable. But this pointer can still be null
when the label out_decl is reached.
Fix this by checking the ini variable in smc_listen_work() and call
smc_listen_decline() with the result directly.
Fixes: a7c9c5f4af ("net/smc: CLC accept / confirm V2")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current code sets up the filter action field before
rewrites are set up. When the action 'switch' is used
with rewrites, this may result in initial few packets
that get switched out don't have rewrites applied
on them.
So, make sure filter action is set up along with rewrites
or only after everything else is set up for rewrites.
Fixes: 12b276fbf6 ("cxgb4: add support to create hash filters")
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Link: https://lore.kernel.org/r/20201023115852.18262-1-rajur@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Smatch complains that "ret" might be uninitialized if we don't enter
the loop. We do always enter the loop so it's a false positive, but
it's cleaner to just return a literal zero and that silences the
warning as well.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20201023112212.GA282278@mwanda
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The code to try to shut up sparse warnings about questionable locking
didn't shut up sparse: it made the result not parse as valid C at all,
since the end result now has a label with no statement.
The proper fix is to just always lock the hardware, the same way Bart
did in commit 8ae178760b ("scsi: qla2xxx: Simplify the functions for
dumping firmware"). That avoids the whole problem with having locking
that is not statically obvious.
But in the meantime, just remove the incorrect attempt at trying to
avoid a sparse warning that just made things worse.
This was exposed by commit 3e6efab865 ("scsi: qla2xxx: Fix reset of
MPI firmware"), very similarly to how commit cbb01c2f2f ("scsi:
qla2xxx: Fix MPI failure AEN (8200) handling") exposed the same problem
in another place, and caused that commit 8ae178760b.
Please don't add code to just shut up sparse without actually fixing
what sparse complains about.
Reported-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Arun Easi <aeasi@marvell.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A couple of um files ended up not including the header file that defines
the __section() macro, and the simplest fix is to just revert the change
for those files.
Fixes: 33def8498f treewide: Convert macro and uses of __section(foo) to __section("foo")
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some drivers (such as EFA) have a GID table, but aren't IB/RoCE devices.
Remove the unnecessary rdma_ib_or_roce() check.
This fixes rdma-core failures for EFA when it uses the new ioctl interface
for querying the GID table.
Fixes: 9f85cbe50a ("RDMA/uverbs: Expose the new GID query API to user space")
Link: https://lore.kernel.org/r/20201026082621.32463-1-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
When a mlx5 core devlink instance is reloaded in different net namespace,
its associated IB device is deleted and recreated.
Example sequence is:
$ ip netns add foo
$ devlink dev reload pci/0000:00:08.0 netns foo
$ ip netns del foo
mlx5 IB device needs to attach and detach the netdevice to it through the
netdev notifier chain during load and unload sequence. A below call graph
of the unload flow.
cleanup_net()
down_read(&pernet_ops_rwsem); <- first sem acquired
ops_pre_exit_list()
pre_exit()
devlink_pernet_pre_exit()
devlink_reload()
mlx5_devlink_reload_down()
mlx5_unload_one()
[...]
mlx5_ib_remove()
mlx5_ib_unbind_slave_port()
mlx5_remove_netdev_notifier()
unregister_netdevice_notifier()
down_write(&pernet_ops_rwsem);<- recurrsive lock
Hence, when net namespace is deleted, mlx5 reload results in deadlock.
When deadlock occurs, devlink mutex is also held. This not only deadlocks
the mlx5 device under reload, but all the processes which attempt to
access unrelated devlink devices are deadlocked.
Hence, fix this by mlx5 ib driver to register for per net netdev notifier
instead of global one, which operats on the net namespace without holding
the pernet_ops_rwsem.
Fixes: 4383cfcc65 ("net/mlx5: Add devlink reload")
Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The patch referenced below has a typo that results in using the wrong L2
header size for outbound traffic. (V4 <-> V6).
It also breaks kernel-side RC traffic because they use AVs that use
RDMA_NETWORK_XXX enums instead of RXE_NETWORK_TYPE_XXX enums. Fix this by
transcoding between these enum types.
Fixes: e0d696d201 ("RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI")
Link: https://lore.kernel.org/r/20201016211343.22906-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In commit 8d98416a55 ("scsi: hisi_sas: Switch v3 hw to MQ"), the dispatch
function was changed to choose the delivery queue based on the request tag
HW queue index.
This heavily degrades performance for v2 hw, since the HW queues are not
exposed there, and, as such, HW queue #0 is used for every command.
Revert to previous behaviour for when nr_hw_queues is not set, that being
to choose the HW queue based on target device index.
Link: https://lore.kernel.org/r/1602750425-240341-1-git-send-email-john.garry@huawei.com
Fixes: 8d98416a55 ("scsi: hisi_sas: Switch v3 hw to MQ")
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch simplifies the ASSERT*() and BREAK_TO_DEBUGGER() macros:
- Move the dependency check of CONFIG_KGDB into Kconfig
- Unify the kgdb_breakpoint() call
- Drop the non-existing CONFIG_HAVE_KGDB
Also align the behavior of ASSERT() macro in both cases with and
without CONFIG_DEBUG_KERNEL_DC.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ASSERT_CRITICAL() invokes kgdb_breakpoint() whenever either
CONFIG_KGDB or CONFIG_HAVE_KGDB is set. This, however, may lead to a
kernel panic when no kdb stuff is attached, since the
kgdb_breakpoint() call issues INT3. It's nothing but a surprise for
normal end-users.
For avoiding the pitfall, make the kgdb_breakpoint() call only when
CONFIG_DEBUG_KERNEL_DC is set.
https://bugzilla.opensuse.org/show_bug.cgi?id=1177973
Cc: <stable@vger.kernel.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
After a loss of transport due to an adapter migration or crash/disconnect
from the host partner there is a tiny window where we can race adjusting
the request_limit of the adapter. The request limit is atomically
increased/decreased to track the number of inflight requests against the
allowed limit of our VIOS partner.
After a transport loss we set the request_limit to zero to reflect this
state. However, there is a window where the adapter may attempt to queue a
command because the transport loss event hasn't been fully processed yet
and request_limit is still greater than zero. The hypercall to send the
event will fail and the error path will increment the request_limit as a
result. If the adapter processes the transport event prior to this
increment the request_limit becomes out of sync with the adapter state and
can result in SCSI commands being submitted on the now reset connection
prior to an SRP Login resulting in a protocol violation.
Fix this race by protecting request_limit with the host lock when changing
the value via atomic_set() to indicate no transport.
Link: https://lore.kernel.org/r/20201025001355.4527-1-tyreld@linux.ibm.com
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Another round of wack-a-mole. The json-schema default is additional
unknown properties are allowed, but for DT all properties should be
defined.
Signed-off-by: Rob Herring <robh@kernel.org>
[why]
get_pixel_clk_frequency_100hz is undefined in clock_source_funcs.
[how]
set function pointer: ".get_pixel_clk_frequency_100hz = get_pixel_clk_frequency_100hz"
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Reviewed-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The mptscsih_remove() function triggers a kernel oops if the Scsi_Host
pointer (ioc->sh) is NULL, as can be seen in this syslog:
ioc0: LSI53C1030 B2: Capabilities={Initiator,Target}
Begin: Waiting for root file system ...
scsi host2: error handler thread failed to spawn, error = -4
mptspi: ioc0: WARNING - Unable to register controller with SCSI subsystem
Backtrace:
[<000000001045b7cc>] mptspi_probe+0x248/0x3d0 [mptspi]
[<0000000040946470>] pci_device_probe+0x1ac/0x2d8
[<0000000040add668>] really_probe+0x1bc/0x988
[<0000000040ade704>] driver_probe_device+0x160/0x218
[<0000000040adee24>] device_driver_attach+0x160/0x188
[<0000000040adef90>] __driver_attach+0x144/0x320
[<0000000040ad7c78>] bus_for_each_dev+0xd4/0x158
[<0000000040adc138>] driver_attach+0x4c/0x80
[<0000000040adb3ec>] bus_add_driver+0x3e0/0x498
[<0000000040ae0130>] driver_register+0xf4/0x298
[<00000000409450c4>] __pci_register_driver+0x78/0xa8
[<000000000007d248>] mptspi_init+0x18c/0x1c4 [mptspi]
This patch adds the necessary NULL-pointer checks. Successfully tested on
a HP C8000 parisc workstation with buggy SCSI drives.
Link: https://lore.kernel.org/r/20201022090005.GA9000@ls3530.fritz.box
Cc: <stable@vger.kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current scanning mechanism is supposed to fall back to a synchronous
host scan if an asynchronous scan is in progress. However, this rule isn't
strictly respected, scsi_prep_async_scan() doesn't hold scan_mutex when
checking shost->async_scan. When scsi_scan_host() is called concurrently,
two async scans on same host can be started and a hang in do_scan_async()
is observed.
Fixes this issue by checking & setting shost->async_scan atomically with
shost->scan_mutex.
Link: https://lore.kernel.org/r/20201010032539.426615-1-ming.lei@redhat.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When building with W=2, there are lots of warnings about the
snd_kcontrol_new name field being an array of 'unsigned char'
but initialized to a string:
include/sound/soc.h:93:48: warning: pointer targets in initialization of 'const unsigned char *' from 'char *' differ in signedness [-Wpointer-sign]
Make it a regular 'char *' to avoid flooding the build log with this.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20201026165715.3723704-1-arnd@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Some tests logs for kunit_tool tests are missing their test plans
causing their tests to fail; fix this by adding the test plans.
Fixes: 45dcbb6f5e ("kunit: test: add test plan to KUnit TAP format")
Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
If 'CONFIG_KUNIT=m', letting kunit tests that do not support loadable
module build depends on 'KUNIT' instead of 'KUNIT=y' result in compile
errors. This commit updates the document for this.
Fixes: 9fe124bf1b ("kunit: allow kunit to be loaded as a module")
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
test.h still produce three warnings:
include/kunit/test.h:282: warning: Function parameter or member '__suites' not described in 'kunit_test_suites_for_module'
include/kunit/test.h:282: warning: Excess function parameter 'suites_list' description in 'kunit_test_suites_for_module'
include/kunit/test.h:314: warning: Excess function parameter 'suites' description in 'kunit_test_suites'
They're all due to errors at kernel-doc markups. Update them.
It should be noticed that this patch moved a kernel-doc
markup that were located at the wrong place, and using a wrong
name. Kernel-doc only supports kaving the markup just before the
function/macro declaration. Placing it elsewhere will make it do
wrong assumptions.
Fixes: aac35468ca ("kunit: test: create a single centralized executor for all tests")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Due to the raw_output() function on kunit_parser.py actually being a
generator, it only runs if something reads the lines it returns. Since
we no-longer do that (parsing doesn't actually happen if raw_output is
enabled), it was not printing anything.
Fixes: 45ba7a893a ("kunit: kunit_tool: Separate out config/build/exec/parse")
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
No ECC initialization should happen during the host controller probe.
In fact, we need the probe function to call nand_scan() in order to:
- identify the device, its capabilities and constraints (nand_scan_ident())
- configure the ECC engine accordingly (->attach_chip())
- scan its content and prepare the core (nand_scan_tail())
Moving these lines to fsl_ifc_attach_chip() fixes a regression caused by
a previous commit supposed to clarify these steps.
Based on a fix done for the mxc_nand driver by Miquel Raynal.
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Reported-by: Han Xu <xhnjupt@gmail.com>
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Tested-by: Han Xu <xhnjupt@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201016132626.30112-1-festevam@gmail.com
No ECC initialization should happen during the host controller probe.
In fact, we need the probe function to call nand_scan() in order to:
- identify the device, its capabilities and constraints (nand_scan_ident())
- configure the ECC engine accordingly (->attach_chip())
- scan its content and prepare the core (nand_scan_tail())
Moving these lines to mxcnd_attach_chip() fixes a regression caused by
a previous commit supposed to clarify these steps.
When moving the ECC initialization from probe() to attach(), get rid
of the pdata usage to determine the engine type and let the core decide
instead.
Tested on a imx27-pdk board.
Fixes: d7157ff49a ("mtd: rawnand: Use the ECC framework user input parsing bits")
Reported-by: Fabio Estevam <festevam@gmail.com>
Co-developed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Tested-by: Sascha Hauer <s.hauer@pengutronix.de>
Tested-by: Martin Kaiser <martin@kaiser.cx>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20201016213613.1450-1-festevam@gmail.com
Pull crypto fix from Herbert Xu:
"This fixes a regression in x86/poly1305"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: x86/poly1305 - add back a needed assignment
Pull s390 fix from Heiko Carstens:
"Fix s390 compile breakage caused by commit 33def8498f ("treewide:
Convert macro and uses of __section(foo) to __section("foo")")"
* tag 's390-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: correct __bootdata / __bootdata_preserved macros
The comment about Hyper-V accessors is unclear regarding their
potential use in x2apic mode, as is the associated commit message
in e211288b72. Clarify that while the architectural and
synthetic MSRs are equivalent in x2apic mode, the full set of xapic
accessors cannot be used because of register layout differences.
Fixes: e211288b72 ("x86/hyperv: Make vapic support x2apic mode")
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1603723972-81303-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The snprintf() function returns the number of characters which would
have been printed if there were enough space, but the scnprintf()
returns the number of characters which were actually printed. If the
buffer is not large enough, then using snprintf() would result in a
read overflow and an information leak.
Fixes: 8910c0bd53 ("hwmon: (pmbus/max20730) add device monitoring via debugfs")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20201022070824.GC2817762@mwanda
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
GPIO_T is mapped to the most significant byte of input/output mask, and
the byte in "output" mask should be 0 because GPIO_T is input only. All
the other bits need to be 1 because GPIO_Q/R/S support both input and
output modes.
Fixes: ab4a85534c ("gpio: aspeed: Add in ast2600 details to Aspeed driver")
Signed-off-by: Billy Tsai <billy_tsai@aspeedtech.com>
Reviewed-by: Tao Ren <rentao.bupt@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Commit 0ea683931a ("gpio: dwapb: Convert driver to using the
GPIO-lib-based IRQ-chip") missed the case in dwapb_irq_set_wake().
Without this fix, probing the dwapb gpio driver will hit a error:
"address between user and kernel address ranges" on a Ampere armv8a
server and cause a panic.
Fixes: 0ea683931a ("gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip")
Signed-off-by: Jia He <justin.he@arm.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
i.MX fixes for 5.10:
With commit 12d16b397c ("gpio: mxc: Support module build") in place,
GPIO_MXC has no 'def_bool y' anymore, and needs to be enabled by
defconfig. It updates the defconfig files to explicitly enable the
option for fixing boot failure on i.MX platform.
* tag 'imx-fixes-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: multi_v7_defconfig: Build in CONFIG_GPIO_MXC by default
ARM: imx_v6_v7_defconfig: Build in CONFIG_GPIO_MXC by default
arm64: defconfig: Build in CONFIG_GPIO_MXC by default
Link: https://lore.kernel.org/r/20201026135601.GA32675@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
I got the following lockdep splat with tree locks converted to rwsem
patches on btrfs/104:
======================================================
WARNING: possible circular locking dependency detected
5.9.0+ #102 Not tainted
------------------------------------------------------
btrfs-cleaner/903 is trying to acquire lock:
ffff8e7fab6ffe30 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170
but task is already holding lock:
ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&fs_info->commit_root_sem){++++}-{3:3}:
down_read+0x40/0x130
caching_thread+0x53/0x5a0
btrfs_work_helper+0xfa/0x520
process_one_work+0x238/0x540
worker_thread+0x55/0x3c0
kthread+0x13a/0x150
ret_from_fork+0x1f/0x30
-> #2 (&caching_ctl->mutex){+.+.}-{3:3}:
__mutex_lock+0x7e/0x7b0
btrfs_cache_block_group+0x1e0/0x510
find_free_extent+0xb6e/0x12f0
btrfs_reserve_extent+0xb3/0x1b0
btrfs_alloc_tree_block+0xb1/0x330
alloc_tree_block_no_bg_flush+0x4f/0x60
__btrfs_cow_block+0x11d/0x580
btrfs_cow_block+0x10c/0x220
commit_cowonly_roots+0x47/0x2e0
btrfs_commit_transaction+0x595/0xbd0
sync_filesystem+0x74/0x90
generic_shutdown_super+0x22/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20
deactivate_locked_super+0x36/0xa0
cleanup_mnt+0x12d/0x190
task_work_run+0x5c/0xa0
exit_to_user_mode_prepare+0x1df/0x200
syscall_exit_to_user_mode+0x54/0x280
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #1 (&space_info->groups_sem){++++}-{3:3}:
down_read+0x40/0x130
find_free_extent+0x2ed/0x12f0
btrfs_reserve_extent+0xb3/0x1b0
btrfs_alloc_tree_block+0xb1/0x330
alloc_tree_block_no_bg_flush+0x4f/0x60
__btrfs_cow_block+0x11d/0x580
btrfs_cow_block+0x10c/0x220
commit_cowonly_roots+0x47/0x2e0
btrfs_commit_transaction+0x595/0xbd0
sync_filesystem+0x74/0x90
generic_shutdown_super+0x22/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20
deactivate_locked_super+0x36/0xa0
cleanup_mnt+0x12d/0x190
task_work_run+0x5c/0xa0
exit_to_user_mode_prepare+0x1df/0x200
syscall_exit_to_user_mode+0x54/0x280
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #0 (btrfs-root-00){++++}-{3:3}:
__lock_acquire+0x1167/0x2150
lock_acquire+0xb9/0x3d0
down_read_nested+0x43/0x130
__btrfs_tree_read_lock+0x32/0x170
__btrfs_read_lock_root_node+0x3a/0x50
btrfs_search_slot+0x614/0x9d0
btrfs_find_root+0x35/0x1b0
btrfs_read_tree_root+0x61/0x120
btrfs_get_root_ref+0x14b/0x600
find_parent_nodes+0x3e6/0x1b30
btrfs_find_all_roots_safe+0xb4/0x130
btrfs_find_all_roots+0x60/0x80
btrfs_qgroup_trace_extent_post+0x27/0x40
btrfs_add_delayed_data_ref+0x3fd/0x460
btrfs_free_extent+0x42/0x100
__btrfs_mod_ref+0x1d7/0x2f0
walk_up_proc+0x11c/0x400
walk_up_tree+0xf0/0x180
btrfs_drop_snapshot+0x1c7/0x780
btrfs_clean_one_deleted_snapshot+0xfb/0x110
cleaner_kthread+0xd4/0x140
kthread+0x13a/0x150
ret_from_fork+0x1f/0x30
other info that might help us debug this:
Chain exists of:
btrfs-root-00 --> &caching_ctl->mutex --> &fs_info->commit_root_sem
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&fs_info->commit_root_sem);
lock(&caching_ctl->mutex);
lock(&fs_info->commit_root_sem);
lock(btrfs-root-00);
*** DEADLOCK ***
3 locks held by btrfs-cleaner/903:
#0: ffff8e7fab628838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
#1: ffff8e7faadac640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
#2: ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
stack backtrace:
CPU: 0 PID: 903 Comm: btrfs-cleaner Not tainted 5.9.0+ #102
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
dump_stack+0x8b/0xb0
check_noncircular+0xcf/0xf0
__lock_acquire+0x1167/0x2150
? __bfs+0x42/0x210
lock_acquire+0xb9/0x3d0
? __btrfs_tree_read_lock+0x32/0x170
down_read_nested+0x43/0x130
? __btrfs_tree_read_lock+0x32/0x170
__btrfs_tree_read_lock+0x32/0x170
__btrfs_read_lock_root_node+0x3a/0x50
btrfs_search_slot+0x614/0x9d0
? find_held_lock+0x2b/0x80
btrfs_find_root+0x35/0x1b0
? do_raw_spin_unlock+0x4b/0xa0
btrfs_read_tree_root+0x61/0x120
btrfs_get_root_ref+0x14b/0x600
find_parent_nodes+0x3e6/0x1b30
btrfs_find_all_roots_safe+0xb4/0x130
btrfs_find_all_roots+0x60/0x80
btrfs_qgroup_trace_extent_post+0x27/0x40
btrfs_add_delayed_data_ref+0x3fd/0x460
btrfs_free_extent+0x42/0x100
__btrfs_mod_ref+0x1d7/0x2f0
walk_up_proc+0x11c/0x400
walk_up_tree+0xf0/0x180
btrfs_drop_snapshot+0x1c7/0x780
? btrfs_clean_one_deleted_snapshot+0x73/0x110
btrfs_clean_one_deleted_snapshot+0xfb/0x110
cleaner_kthread+0xd4/0x140
? btrfs_alloc_root+0x50/0x50
kthread+0x13a/0x150
? kthread_create_worker_on_cpu+0x40/0x40
ret_from_fork+0x1f/0x30
BTRFS info (device sdb): disk space caching is enabled
BTRFS info (device sdb): has skinny extents
This happens because qgroups does a backref lookup when we create a
delayed ref. From here it may have to look up a root from an indirect
ref, which does a normal lookup on the tree_root, which takes the read
lock on the tree_root nodes.
To fix this we need to add a variant for looking up roots that searches
the commit root of the tree_root. Then when we do the backref search
using the commit root we are sure to not take any locks on the tree_root
nodes. This gets rid of the lockdep splat when running btrfs/104.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When enabling qgroups we walk the tree_root and then add a qgroup item
for every root that we have. This creates a lock dependency on the
tree_root and qgroup_root, which results in the following lockdep splat
(with tree locks using rwsem), eg. in tests btrfs/017 or btrfs/022:
======================================================
WARNING: possible circular locking dependency detected
5.9.0-default+ #1299 Not tainted
------------------------------------------------------
btrfs/24552 is trying to acquire lock:
ffff9142dfc5f630 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
but task is already holding lock:
ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (btrfs-root-00){++++}-{3:3}:
__lock_acquire+0x3fb/0x730
lock_acquire.part.0+0x6a/0x130
down_read_nested+0x46/0x130
__btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
__btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
btrfs_search_slot+0xc3/0x9f0 [btrfs]
btrfs_insert_item+0x6e/0x140 [btrfs]
btrfs_create_tree+0x1cb/0x240 [btrfs]
btrfs_quota_enable+0xcd/0x790 [btrfs]
btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
__x64_sys_ioctl+0x83/0xa0
do_syscall_64+0x2d/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #0 (btrfs-quota-00){++++}-{3:3}:
check_prev_add+0x91/0xc30
validate_chain+0x491/0x750
__lock_acquire+0x3fb/0x730
lock_acquire.part.0+0x6a/0x130
down_read_nested+0x46/0x130
__btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
__btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
btrfs_search_slot+0xc3/0x9f0 [btrfs]
btrfs_insert_empty_items+0x58/0xa0 [btrfs]
add_qgroup_item.part.0+0x72/0x210 [btrfs]
btrfs_quota_enable+0x3bb/0x790 [btrfs]
btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
__x64_sys_ioctl+0x83/0xa0
do_syscall_64+0x2d/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(btrfs-root-00);
lock(btrfs-quota-00);
lock(btrfs-root-00);
lock(btrfs-quota-00);
*** DEADLOCK ***
5 locks held by btrfs/24552:
#0: ffff9142df431478 (sb_writers#10){.+.+}-{0:0}, at: mnt_want_write_file+0x22/0xa0
#1: ffff9142f9b10cc0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl_quota_ctl+0x7b/0xe0 [btrfs]
#2: ffff9142f9b11a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0x790 [btrfs]
#3: ffff9142df431698 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x406/0x510 [btrfs]
#4: ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
stack backtrace:
CPU: 1 PID: 24552 Comm: btrfs Not tainted 5.9.0-default+ #1299
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
Call Trace:
dump_stack+0x77/0x97
check_noncircular+0xf3/0x110
check_prev_add+0x91/0xc30
validate_chain+0x491/0x750
__lock_acquire+0x3fb/0x730
lock_acquire.part.0+0x6a/0x130
? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
? lock_acquire+0xc4/0x140
? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
down_read_nested+0x46/0x130
? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
__btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
? btrfs_root_node+0xd9/0x200 [btrfs]
__btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
btrfs_search_slot+0xc3/0x9f0 [btrfs]
btrfs_insert_empty_items+0x58/0xa0 [btrfs]
add_qgroup_item.part.0+0x72/0x210 [btrfs]
btrfs_quota_enable+0x3bb/0x790 [btrfs]
btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
__x64_sys_ioctl+0x83/0xa0
do_syscall_64+0x2d/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fix this by dropping the path whenever we find a root item, add the
qgroup item, and then re-lookup the root item we found and continue
processing roots.
Reported-by: David Sterba <dsterba@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Very sporadically I had test case btrfs/069 from fstests hanging (for
years, it is not a recent regression), with the following traces in
dmesg/syslog:
[162301.160628] BTRFS info (device sdc): dev_replace from /dev/sdd (devid 2) to /dev/sdg started
[162301.181196] BTRFS info (device sdc): scrub: finished on devid 4 with status: 0
[162301.287162] BTRFS info (device sdc): dev_replace from /dev/sdd (devid 2) to /dev/sdg finished
[162513.513792] INFO: task btrfs-transacti:1356167 blocked for more than 120 seconds.
[162513.514318] Not tainted 5.9.0-rc6-btrfs-next-69 #1
[162513.514522] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[162513.514747] task:btrfs-transacti state:D stack: 0 pid:1356167 ppid: 2 flags:0x00004000
[162513.514751] Call Trace:
[162513.514761] __schedule+0x5ce/0xd00
[162513.514765] ? _raw_spin_unlock_irqrestore+0x3c/0x60
[162513.514771] schedule+0x46/0xf0
[162513.514844] wait_current_trans+0xde/0x140 [btrfs]
[162513.514850] ? finish_wait+0x90/0x90
[162513.514864] start_transaction+0x37c/0x5f0 [btrfs]
[162513.514879] transaction_kthread+0xa4/0x170 [btrfs]
[162513.514891] ? btrfs_cleanup_transaction+0x660/0x660 [btrfs]
[162513.514894] kthread+0x153/0x170
[162513.514897] ? kthread_stop+0x2c0/0x2c0
[162513.514902] ret_from_fork+0x22/0x30
[162513.514916] INFO: task fsstress:1356184 blocked for more than 120 seconds.
[162513.515192] Not tainted 5.9.0-rc6-btrfs-next-69 #1
[162513.515431] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[162513.515680] task:fsstress state:D stack: 0 pid:1356184 ppid:1356177 flags:0x00004000
[162513.515682] Call Trace:
[162513.515688] __schedule+0x5ce/0xd00
[162513.515691] ? _raw_spin_unlock_irqrestore+0x3c/0x60
[162513.515697] schedule+0x46/0xf0
[162513.515712] wait_current_trans+0xde/0x140 [btrfs]
[162513.515716] ? finish_wait+0x90/0x90
[162513.515729] start_transaction+0x37c/0x5f0 [btrfs]
[162513.515743] btrfs_attach_transaction_barrier+0x1f/0x50 [btrfs]
[162513.515753] btrfs_sync_fs+0x61/0x1c0 [btrfs]
[162513.515758] ? __ia32_sys_fdatasync+0x20/0x20
[162513.515761] iterate_supers+0x87/0xf0
[162513.515765] ksys_sync+0x60/0xb0
[162513.515768] __do_sys_sync+0xa/0x10
[162513.515771] do_syscall_64+0x33/0x80
[162513.515774] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[162513.515781] RIP: 0033:0x7f5238f50bd7
[162513.515782] Code: Bad RIP value.
[162513.515784] RSP: 002b:00007fff67b978e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2
[162513.515786] RAX: ffffffffffffffda RBX: 000055b1fad2c560 RCX: 00007f5238f50bd7
[162513.515788] RDX: 00000000ffffffff RSI: 000000000daf0e74 RDI: 000000000000003a
[162513.515789] RBP: 0000000000000032 R08: 000000000000000a R09: 00007f5239019be0
[162513.515791] R10: fffffffffffff24f R11: 0000000000000206 R12: 000000000000003a
[162513.515792] R13: 00007fff67b97950 R14: 00007fff67b97906 R15: 000055b1fad1a340
[162513.515804] INFO: task fsstress:1356185 blocked for more than 120 seconds.
[162513.516064] Not tainted 5.9.0-rc6-btrfs-next-69 #1
[162513.516329] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[162513.516617] task:fsstress state:D stack: 0 pid:1356185 ppid:1356177 flags:0x00000000
[162513.516620] Call Trace:
[162513.516625] __schedule+0x5ce/0xd00
[162513.516628] ? _raw_spin_unlock_irqrestore+0x3c/0x60
[162513.516634] schedule+0x46/0xf0
[162513.516647] wait_current_trans+0xde/0x140 [btrfs]
[162513.516650] ? finish_wait+0x90/0x90
[162513.516662] start_transaction+0x4d7/0x5f0 [btrfs]
[162513.516679] btrfs_setxattr_trans+0x3c/0x100 [btrfs]
[162513.516686] __vfs_setxattr+0x66/0x80
[162513.516691] __vfs_setxattr_noperm+0x70/0x200
[162513.516697] vfs_setxattr+0x6b/0x120
[162513.516703] setxattr+0x125/0x240
[162513.516709] ? lock_acquire+0xb1/0x480
[162513.516712] ? mnt_want_write+0x20/0x50
[162513.516721] ? rcu_read_lock_any_held+0x8e/0xb0
[162513.516723] ? preempt_count_add+0x49/0xa0
[162513.516725] ? __sb_start_write+0x19b/0x290
[162513.516727] ? preempt_count_add+0x49/0xa0
[162513.516732] path_setxattr+0xba/0xd0
[162513.516739] __x64_sys_setxattr+0x27/0x30
[162513.516741] do_syscall_64+0x33/0x80
[162513.516743] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[162513.516745] RIP: 0033:0x7f5238f56d5a
[162513.516746] Code: Bad RIP value.
[162513.516748] RSP: 002b:00007fff67b97868 EFLAGS: 00000202 ORIG_RAX: 00000000000000bc
[162513.516750] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f5238f56d5a
[162513.516751] RDX: 000055b1fbb0d5a0 RSI: 00007fff67b978a0 RDI: 000055b1fbb0d470
[162513.516753] RBP: 000055b1fbb0d5a0 R08: 0000000000000001 R09: 00007fff67b97700
[162513.516754] R10: 0000000000000004 R11: 0000000000000202 R12: 0000000000000004
[162513.516756] R13: 0000000000000024 R14: 0000000000000001 R15: 00007fff67b978a0
[162513.516767] INFO: task fsstress:1356196 blocked for more than 120 seconds.
[162513.517064] Not tainted 5.9.0-rc6-btrfs-next-69 #1
[162513.517365] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[162513.517763] task:fsstress state:D stack: 0 pid:1356196 ppid:1356177 flags:0x00004000
[162513.517780] Call Trace:
[162513.517786] __schedule+0x5ce/0xd00
[162513.517789] ? _raw_spin_unlock_irqrestore+0x3c/0x60
[162513.517796] schedule+0x46/0xf0
[162513.517810] wait_current_trans+0xde/0x140 [btrfs]
[162513.517814] ? finish_wait+0x90/0x90
[162513.517829] start_transaction+0x37c/0x5f0 [btrfs]
[162513.517845] btrfs_attach_transaction_barrier+0x1f/0x50 [btrfs]
[162513.517857] btrfs_sync_fs+0x61/0x1c0 [btrfs]
[162513.517862] ? __ia32_sys_fdatasync+0x20/0x20
[162513.517865] iterate_supers+0x87/0xf0
[162513.517869] ksys_sync+0x60/0xb0
[162513.517872] __do_sys_sync+0xa/0x10
[162513.517875] do_syscall_64+0x33/0x80
[162513.517878] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[162513.517881] RIP: 0033:0x7f5238f50bd7
[162513.517883] Code: Bad RIP value.
[162513.517885] RSP: 002b:00007fff67b978e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2
[162513.517887] RAX: ffffffffffffffda RBX: 000055b1fad2c560 RCX: 00007f5238f50bd7
[162513.517889] RDX: 0000000000000000 RSI: 000000007660add2 RDI: 0000000000000053
[162513.517891] RBP: 0000000000000032 R08: 0000000000000067 R09: 00007f5239019be0
[162513.517893] R10: fffffffffffff24f R11: 0000000000000206 R12: 0000000000000053
[162513.517895] R13: 00007fff67b97950 R14: 00007fff67b97906 R15: 000055b1fad1a340
[162513.517908] INFO: task fsstress:1356197 blocked for more than 120 seconds.
[162513.518298] Not tainted 5.9.0-rc6-btrfs-next-69 #1
[162513.518672] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[162513.519157] task:fsstress state:D stack: 0 pid:1356197 ppid:1356177 flags:0x00000000
[162513.519160] Call Trace:
[162513.519165] __schedule+0x5ce/0xd00
[162513.519168] ? _raw_spin_unlock_irqrestore+0x3c/0x60
[162513.519174] schedule+0x46/0xf0
[162513.519190] wait_current_trans+0xde/0x140 [btrfs]
[162513.519193] ? finish_wait+0x90/0x90
[162513.519206] start_transaction+0x4d7/0x5f0 [btrfs]
[162513.519222] btrfs_create+0x57/0x200 [btrfs]
[162513.519230] lookup_open+0x522/0x650
[162513.519246] path_openat+0x2b8/0xa50
[162513.519270] do_filp_open+0x91/0x100
[162513.519275] ? find_held_lock+0x32/0x90
[162513.519280] ? lock_acquired+0x33b/0x470
[162513.519285] ? do_raw_spin_unlock+0x4b/0xc0
[162513.519287] ? _raw_spin_unlock+0x29/0x40
[162513.519295] do_sys_openat2+0x20d/0x2d0
[162513.519300] do_sys_open+0x44/0x80
[162513.519304] do_syscall_64+0x33/0x80
[162513.519307] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[162513.519309] RIP: 0033:0x7f5238f4a903
[162513.519310] Code: Bad RIP value.
[162513.519312] RSP: 002b:00007fff67b97758 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
[162513.519314] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f5238f4a903
[162513.519316] RDX: 0000000000000000 RSI: 00000000000001b6 RDI: 000055b1fbb0d470
[162513.519317] RBP: 00007fff67b978c0 R08: 0000000000000001 R09: 0000000000000002
[162513.519319] R10: 00007fff67b974f7 R11: 0000000000000246 R12: 0000000000000013
[162513.519320] R13: 00000000000001b6 R14: 00007fff67b97906 R15: 000055b1fad1c620
[162513.519332] INFO: task btrfs:1356211 blocked for more than 120 seconds.
[162513.519727] Not tainted 5.9.0-rc6-btrfs-next-69 #1
[162513.520115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[162513.520508] task:btrfs state:D stack: 0 pid:1356211 ppid:1356178 flags:0x00004002
[162513.520511] Call Trace:
[162513.520516] __schedule+0x5ce/0xd00
[162513.520519] ? _raw_spin_unlock_irqrestore+0x3c/0x60
[162513.520525] schedule+0x46/0xf0
[162513.520544] btrfs_scrub_pause+0x11f/0x180 [btrfs]
[162513.520548] ? finish_wait+0x90/0x90
[162513.520562] btrfs_commit_transaction+0x45a/0xc30 [btrfs]
[162513.520574] ? start_transaction+0xe0/0x5f0 [btrfs]
[162513.520596] btrfs_dev_replace_finishing+0x6d8/0x711 [btrfs]
[162513.520619] btrfs_dev_replace_by_ioctl.cold+0x1cc/0x1fd [btrfs]
[162513.520639] btrfs_ioctl+0x2a25/0x36f0 [btrfs]
[162513.520643] ? do_sigaction+0xf3/0x240
[162513.520645] ? find_held_lock+0x32/0x90
[162513.520648] ? do_sigaction+0xf3/0x240
[162513.520651] ? lock_acquired+0x33b/0x470
[162513.520655] ? _raw_spin_unlock_irq+0x24/0x50
[162513.520657] ? lockdep_hardirqs_on+0x7d/0x100
[162513.520660] ? _raw_spin_unlock_irq+0x35/0x50
[162513.520662] ? do_sigaction+0xf3/0x240
[162513.520671] ? __x64_sys_ioctl+0x83/0xb0
[162513.520672] __x64_sys_ioctl+0x83/0xb0
[162513.520677] do_syscall_64+0x33/0x80
[162513.520679] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[162513.520681] RIP: 0033:0x7fc3cd307d87
[162513.520682] Code: Bad RIP value.
[162513.520684] RSP: 002b:00007ffe30a56bb8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[162513.520686] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fc3cd307d87
[162513.520687] RDX: 00007ffe30a57a30 RSI: 00000000ca289435 RDI: 0000000000000003
[162513.520689] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[162513.520690] R10: 0000000000000008 R11: 0000000000000202 R12: 0000000000000003
[162513.520692] R13: 0000557323a212e0 R14: 00007ffe30a5a520 R15: 0000000000000001
[162513.520703]
Showing all locks held in the system:
[162513.520712] 1 lock held by khungtaskd/54:
[162513.520713] #0: ffffffffb40a91a0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x15/0x197
[162513.520728] 1 lock held by in:imklog/596:
[162513.520729] #0: ffff8f3f0d781400 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x4d/0x60
[162513.520782] 1 lock held by btrfs-transacti/1356167:
[162513.520784] #0: ffff8f3d810cc848 (&fs_info->transaction_kthread_mutex){+.+.}-{3:3}, at: transaction_kthread+0x4a/0x170 [btrfs]
[162513.520798] 1 lock held by btrfs/1356190:
[162513.520800] #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write_file+0x22/0x60
[162513.520805] 1 lock held by fsstress/1356184:
[162513.520806] #0: ffff8f3d576440e8 (&type->s_umount_key#62){++++}-{3:3}, at: iterate_supers+0x6f/0xf0
[162513.520811] 3 locks held by fsstress/1356185:
[162513.520812] #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write+0x20/0x50
[162513.520815] #1: ffff8f3d80a650b8 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: vfs_setxattr+0x50/0x120
[162513.520820] #2: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs]
[162513.520833] 1 lock held by fsstress/1356196:
[162513.520834] #0: ffff8f3d576440e8 (&type->s_umount_key#62){++++}-{3:3}, at: iterate_supers+0x6f/0xf0
[162513.520838] 3 locks held by fsstress/1356197:
[162513.520839] #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write+0x20/0x50
[162513.520843] #1: ffff8f3d506465e8 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: path_openat+0x2a7/0xa50
[162513.520846] #2: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs]
[162513.520858] 2 locks held by btrfs/1356211:
[162513.520859] #0: ffff8f3d810cde30 (&fs_info->dev_replace.lock_finishing_cancel_unmount){+.+.}-{3:3}, at: btrfs_dev_replace_finishing+0x52/0x711 [btrfs]
[162513.520877] #1: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs]
This was weird because the stack traces show that a transaction commit,
triggered by a device replace operation, is blocking trying to pause any
running scrubs but there are no stack traces of blocked tasks doing a
scrub.
After poking around with drgn, I noticed there was a scrub task that was
constantly running and blocking for shorts periods of time:
>>> t = find_task(prog, 1356190)
>>> prog.stack_trace(t)
#0 __schedule+0x5ce/0xcfc
#1 schedule+0x46/0xe4
#2 schedule_timeout+0x1df/0x475
#3 btrfs_reada_wait+0xda/0x132
#4 scrub_stripe+0x2a8/0x112f
#5 scrub_chunk+0xcd/0x134
#6 scrub_enumerate_chunks+0x29e/0x5ee
#7 btrfs_scrub_dev+0x2d5/0x91b
#8 btrfs_ioctl+0x7f5/0x36e7
#9 __x64_sys_ioctl+0x83/0xb0
#10 do_syscall_64+0x33/0x77
#11 entry_SYSCALL_64+0x7c/0x156
Which corresponds to:
int btrfs_reada_wait(void *handle)
{
struct reada_control *rc = handle;
struct btrfs_fs_info *fs_info = rc->fs_info;
while (atomic_read(&rc->elems)) {
if (!atomic_read(&fs_info->reada_works_cnt))
reada_start_machine(fs_info);
wait_event_timeout(rc->wait, atomic_read(&rc->elems) == 0,
(HZ + 9) / 10);
}
(...)
So the counter "rc->elems" was set to 1 and never decreased to 0, causing
the scrub task to loop forever in that function. Then I used the following
script for drgn to check the readahead requests:
$ cat dump_reada.py
import sys
import drgn
from drgn import NULL, Object, cast, container_of, execscript, \
reinterpret, sizeof
from drgn.helpers.linux import *
mnt_path = b"/home/fdmanana/btrfs-tests/scratch_1"
mnt = None
for mnt in for_each_mount(prog, dst = mnt_path):
pass
if mnt is None:
sys.stderr.write(f'Error: mount point {mnt_path} not found\n')
sys.exit(1)
fs_info = cast('struct btrfs_fs_info *', mnt.mnt.mnt_sb.s_fs_info)
def dump_re(re):
nzones = re.nzones.value_()
print(f're at {hex(re.value_())}')
print(f'\t logical {re.logical.value_()}')
print(f'\t refcnt {re.refcnt.value_()}')
print(f'\t nzones {nzones}')
for i in range(nzones):
dev = re.zones[i].device
name = dev.name.str.string_()
print(f'\t\t dev id {dev.devid.value_()} name {name}')
print()
for _, e in radix_tree_for_each(fs_info.reada_tree):
re = cast('struct reada_extent *', e)
dump_re(re)
$ drgn dump_reada.py
re at 0xffff8f3da9d25ad8
logical 38928384
refcnt 1
nzones 1
dev id 0 name b'/dev/sdd'
$
So there was one readahead extent with a single zone corresponding to the
source device of that last device replace operation logged in dmesg/syslog.
Also the ID of that zone's device was 0 which is a special value set in
the source device of a device replace operation when the operation finishes
(constant BTRFS_DEV_REPLACE_DEVID set at btrfs_dev_replace_finishing()),
confirming again that device /dev/sdd was the source of a device replace
operation.
Normally there should be as many zones in the readahead extent as there are
devices, and I wasn't expecting the extent to be in a block group with a
'single' profile, so I went and confirmed with the following drgn script
that there weren't any single profile block groups:
$ cat dump_block_groups.py
import sys
import drgn
from drgn import NULL, Object, cast, container_of, execscript, \
reinterpret, sizeof
from drgn.helpers.linux import *
mnt_path = b"/home/fdmanana/btrfs-tests/scratch_1"
mnt = None
for mnt in for_each_mount(prog, dst = mnt_path):
pass
if mnt is None:
sys.stderr.write(f'Error: mount point {mnt_path} not found\n')
sys.exit(1)
fs_info = cast('struct btrfs_fs_info *', mnt.mnt.mnt_sb.s_fs_info)
BTRFS_BLOCK_GROUP_DATA = (1 << 0)
BTRFS_BLOCK_GROUP_SYSTEM = (1 << 1)
BTRFS_BLOCK_GROUP_METADATA = (1 << 2)
BTRFS_BLOCK_GROUP_RAID0 = (1 << 3)
BTRFS_BLOCK_GROUP_RAID1 = (1 << 4)
BTRFS_BLOCK_GROUP_DUP = (1 << 5)
BTRFS_BLOCK_GROUP_RAID10 = (1 << 6)
BTRFS_BLOCK_GROUP_RAID5 = (1 << 7)
BTRFS_BLOCK_GROUP_RAID6 = (1 << 8)
BTRFS_BLOCK_GROUP_RAID1C3 = (1 << 9)
BTRFS_BLOCK_GROUP_RAID1C4 = (1 << 10)
def bg_flags_string(bg):
flags = bg.flags.value_()
ret = ''
if flags & BTRFS_BLOCK_GROUP_DATA:
ret = 'data'
if flags & BTRFS_BLOCK_GROUP_METADATA:
if len(ret) > 0:
ret += '|'
ret += 'meta'
if flags & BTRFS_BLOCK_GROUP_SYSTEM:
if len(ret) > 0:
ret += '|'
ret += 'system'
if flags & BTRFS_BLOCK_GROUP_RAID0:
ret += ' raid0'
elif flags & BTRFS_BLOCK_GROUP_RAID1:
ret += ' raid1'
elif flags & BTRFS_BLOCK_GROUP_DUP:
ret += ' dup'
elif flags & BTRFS_BLOCK_GROUP_RAID10:
ret += ' raid10'
elif flags & BTRFS_BLOCK_GROUP_RAID5:
ret += ' raid5'
elif flags & BTRFS_BLOCK_GROUP_RAID6:
ret += ' raid6'
elif flags & BTRFS_BLOCK_GROUP_RAID1C3:
ret += ' raid1c3'
elif flags & BTRFS_BLOCK_GROUP_RAID1C4:
ret += ' raid1c4'
else:
ret += ' single'
return ret
def dump_bg(bg):
print()
print(f'block group at {hex(bg.value_())}')
print(f'\t start {bg.start.value_()} length {bg.length.value_()}')
print(f'\t flags {bg.flags.value_()} - {bg_flags_string(bg)}')
bg_root = fs_info.block_group_cache_tree.address_of_()
for bg in rbtree_inorder_for_each_entry('struct btrfs_block_group', bg_root, 'cache_node'):
dump_bg(bg)
$ drgn dump_block_groups.py
block group at 0xffff8f3d673b0400
start 22020096 length 16777216
flags 258 - system raid6
block group at 0xffff8f3d53ddb400
start 38797312 length 536870912
flags 260 - meta raid6
block group at 0xffff8f3d5f4d9c00
start 575668224 length 2147483648
flags 257 - data raid6
block group at 0xffff8f3d08189000
start 2723151872 length 67108864
flags 258 - system raid6
block group at 0xffff8f3db70ff000
start 2790260736 length 1073741824
flags 260 - meta raid6
block group at 0xffff8f3d5f4dd800
start 3864002560 length 67108864
flags 258 - system raid6
block group at 0xffff8f3d67037000
start 3931111424 length 2147483648
flags 257 - data raid6
$
So there were only 2 reasons left for having a readahead extent with a
single zone: reada_find_zone(), called when creating a readahead extent,
returned NULL either because we failed to find the corresponding block
group or because a memory allocation failed. With some additional and
custom tracing I figured out that on every further ocurrence of the
problem the block group had just been deleted when we were looping to
create the zones for the readahead extent (at reada_find_extent()), so we
ended up with only one zone in the readahead extent, corresponding to a
device that ends up getting replaced.
So after figuring that out it became obvious why the hang happens:
1) Task A starts a scrub on any device of the filesystem, except for
device /dev/sdd;
2) Task B starts a device replace with /dev/sdd as the source device;
3) Task A calls btrfs_reada_add() from scrub_stripe() and it is currently
starting to scrub a stripe from block group X. This call to
btrfs_reada_add() is the one for the extent tree. When btrfs_reada_add()
calls reada_add_block(), it passes the logical address of the extent
tree's root node as its 'logical' argument - a value of 38928384;
4) Task A then enters reada_find_extent(), called from reada_add_block().
It finds there isn't any existing readahead extent for the logical
address 38928384, so it proceeds to the path of creating a new one.
It calls btrfs_map_block() to find out which stripes exist for the block
group X. On the first iteration of the for loop that iterates over the
stripes, it finds the stripe for device /dev/sdd, so it creates one
zone for that device and adds it to the readahead extent. Before getting
into the second iteration of the loop, the cleanup kthread deletes block
group X because it was empty. So in the iterations for the remaining
stripes it does not add more zones to the readahead extent, because the
calls to reada_find_zone() returned NULL because they couldn't find
block group X anymore.
As a result the new readahead extent has a single zone, corresponding to
the device /dev/sdd;
4) Before task A returns to btrfs_reada_add() and queues the readahead job
for the readahead work queue, task B finishes the device replace and at
btrfs_dev_replace_finishing() swaps the device /dev/sdd with the new
device /dev/sdg;
5) Task A returns to reada_add_block(), which increments the counter
"->elems" of the reada_control structure allocated at btrfs_reada_add().
Then it returns back to btrfs_reada_add() and calls
reada_start_machine(). This queues a job in the readahead work queue to
run the function reada_start_machine_worker(), which calls
__reada_start_machine().
At __reada_start_machine() we take the device list mutex and for each
device found in the current device list, we call
reada_start_machine_dev() to start the readahead work. However at this
point the device /dev/sdd was already freed and is not in the device
list anymore.
This means the corresponding readahead for the extent at 38928384 is
never started, and therefore the "->elems" counter of the reada_control
structure allocated at btrfs_reada_add() never goes down to 0, causing
the call to btrfs_reada_wait(), done by the scrub task, to wait forever.
Note that the readahead request can be made either after the device replace
started or before it started, however in pratice it is very unlikely that a
device replace is able to start after a readahead request is made and is
able to complete before the readahead request completes - maybe only on a
very small and nearly empty filesystem.
This hang however is not the only problem we can have with readahead and
device removals. When the readahead extent has other zones other than the
one corresponding to the device that is being removed (either by a device
replace or a device remove operation), we risk having a use-after-free on
the device when dropping the last reference of the readahead extent.
For example if we create a readahead extent with two zones, one for the
device /dev/sdd and one for the device /dev/sde:
1) Before the readahead worker starts, the device /dev/sdd is removed,
and the corresponding btrfs_device structure is freed. However the
readahead extent still has the zone pointing to the device structure;
2) When the readahead worker starts, it only finds device /dev/sde in the
current device list of the filesystem;
3) It starts the readahead work, at reada_start_machine_dev(), using the
device /dev/sde;
4) Then when it finishes reading the extent from device /dev/sde, it calls
__readahead_hook() which ends up dropping the last reference on the
readahead extent through the last call to reada_extent_put();
5) At reada_extent_put() it iterates over each zone of the readahead extent
and attempts to delete an element from the device's 'reada_extents'
radix tree, resulting in a use-after-free, as the device pointer of the
zone for /dev/sdd is now stale. We can also access the device after
dropping the last reference of a zone, through reada_zone_release(),
also called by reada_extent_put().
And a device remove suffers the same problem, however since it shrinks the
device size down to zero before removing the device, it is very unlikely to
still have readahead requests not completed by the time we free the device,
the only possibility is if the device has a very little space allocated.
While the hang problem is exclusive to scrub, since it is currently the
only user of btrfs_reada_add() and btrfs_reada_wait(), the use-after-free
problem affects any path that triggers readhead, which includes
btree_readahead_hook() and __readahead_hook() (a readahead worker can
trigger readahed for the children of a node) for example - any path that
ends up calling reada_add_block() can trigger the use-after-free after a
device is removed.
So fix this by waiting for any readahead requests for a device to complete
before removing a device, ensuring that while waiting for existing ones no
new ones can be made.
This problem has been around for a very long time - the readahead code was
added in 2011, device remove exists since 2008 and device replace was
introduced in 2013, hard to pick a specific commit for a git Fixes tag.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we fail to find suitable zones for a new readahead extent, we end up
leaving a stale pointer in the global readahead extents radix tree
(fs_info->reada_tree), which can trigger the following trace later on:
[13367.696354] BUG: kernel NULL pointer dereference, address: 00000000000000b0
[13367.696802] #PF: supervisor read access in kernel mode
[13367.697249] #PF: error_code(0x0000) - not-present page
[13367.697721] PGD 0 P4D 0
[13367.698171] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
[13367.698632] CPU: 6 PID: 851214 Comm: btrfs Tainted: G W 5.9.0-rc6-btrfs-next-69 #1
[13367.699100] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[13367.700069] RIP: 0010:__lock_acquire+0x20a/0x3970
[13367.700562] Code: ff 1f 0f b7 c0 48 0f (...)
[13367.701609] RSP: 0018:ffffb14448f57790 EFLAGS: 00010046
[13367.702140] RAX: 0000000000000000 RBX: 29b935140c15e8cf RCX: 0000000000000000
[13367.702698] RDX: 0000000000000002 RSI: ffffffffb3d66bd0 RDI: 0000000000000046
[13367.703240] RBP: ffff8a52ba8ac040 R08: 00000c2866ad9288 R09: 0000000000000001
[13367.703783] R10: 0000000000000001 R11: 00000000b66d9b53 R12: ffff8a52ba8ac9b0
[13367.704330] R13: 0000000000000000 R14: ffff8a532b6333e8 R15: 0000000000000000
[13367.704880] FS: 00007fe1df6b5700(0000) GS:ffff8a5376600000(0000) knlGS:0000000000000000
[13367.705438] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13367.705995] CR2: 00000000000000b0 CR3: 000000022cca8004 CR4: 00000000003706e0
[13367.706565] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13367.707127] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[13367.707686] Call Trace:
[13367.708246] ? ___slab_alloc+0x395/0x740
[13367.708820] ? reada_add_block+0xae/0xee0 [btrfs]
[13367.709383] lock_acquire+0xb1/0x480
[13367.709955] ? reada_add_block+0xe0/0xee0 [btrfs]
[13367.710537] ? reada_add_block+0xae/0xee0 [btrfs]
[13367.711097] ? rcu_read_lock_sched_held+0x5d/0x90
[13367.711659] ? kmem_cache_alloc_trace+0x8d2/0x990
[13367.712221] ? lock_acquired+0x33b/0x470
[13367.712784] _raw_spin_lock+0x34/0x80
[13367.713356] ? reada_add_block+0xe0/0xee0 [btrfs]
[13367.713966] reada_add_block+0xe0/0xee0 [btrfs]
[13367.714529] ? btrfs_root_node+0x15/0x1f0 [btrfs]
[13367.715077] btrfs_reada_add+0x117/0x170 [btrfs]
[13367.715620] scrub_stripe+0x21e/0x10d0 [btrfs]
[13367.716141] ? kvm_sched_clock_read+0x5/0x10
[13367.716657] ? __lock_acquire+0x41e/0x3970
[13367.717184] ? scrub_chunk+0x60/0x140 [btrfs]
[13367.717697] ? find_held_lock+0x32/0x90
[13367.718254] ? scrub_chunk+0x60/0x140 [btrfs]
[13367.718773] ? lock_acquired+0x33b/0x470
[13367.719278] ? scrub_chunk+0xcd/0x140 [btrfs]
[13367.719786] scrub_chunk+0xcd/0x140 [btrfs]
[13367.720291] scrub_enumerate_chunks+0x270/0x5c0 [btrfs]
[13367.720787] ? finish_wait+0x90/0x90
[13367.721281] btrfs_scrub_dev+0x1ee/0x620 [btrfs]
[13367.721762] ? rcu_read_lock_any_held+0x8e/0xb0
[13367.722235] ? preempt_count_add+0x49/0xa0
[13367.722710] ? __sb_start_write+0x19b/0x290
[13367.723192] btrfs_ioctl+0x7f5/0x36f0 [btrfs]
[13367.723660] ? __fget_files+0x101/0x1d0
[13367.724118] ? find_held_lock+0x32/0x90
[13367.724559] ? __fget_files+0x101/0x1d0
[13367.724982] ? __x64_sys_ioctl+0x83/0xb0
[13367.725399] __x64_sys_ioctl+0x83/0xb0
[13367.725802] do_syscall_64+0x33/0x80
[13367.726188] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[13367.726574] RIP: 0033:0x7fe1df7add87
[13367.726948] Code: 00 00 00 48 8b 05 09 91 (...)
[13367.727763] RSP: 002b:00007fe1df6b4d48 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[13367.728179] RAX: ffffffffffffffda RBX: 000055ce1fb596a0 RCX: 00007fe1df7add87
[13367.728604] RDX: 000055ce1fb596a0 RSI: 00000000c400941b RDI: 0000000000000003
[13367.729021] RBP: 0000000000000000 R08: 00007fe1df6b5700 R09: 0000000000000000
[13367.729431] R10: 00007fe1df6b5700 R11: 0000000000000246 R12: 00007ffd922b07de
[13367.729842] R13: 00007ffd922b07df R14: 00007fe1df6b4e40 R15: 0000000000802000
[13367.730275] Modules linked in: btrfs blake2b_generic xor (...)
[13367.732638] CR2: 00000000000000b0
[13367.733166] ---[ end trace d298b6805556acd9 ]---
What happens is the following:
1) At reada_find_extent() we don't find any existing readahead extent for
the metadata extent starting at logical address X;
2) So we proceed to create a new one. We then call btrfs_map_block() to get
information about which stripes contain extent X;
3) After that we iterate over the stripes and create only one zone for the
readahead extent - only one because reada_find_zone() returned NULL for
all iterations except for one, either because a memory allocation failed
or it couldn't find the block group of the extent (it may have just been
deleted);
4) We then add the new readahead extent to the readahead extents radix
tree at fs_info->reada_tree;
5) Then we iterate over each zone of the new readahead extent, and find
that the device used for that zone no longer exists, because it was
removed or it was the source device of a device replace operation.
Since this left 'have_zone' set to 0, after finishing the loop we jump
to the 'error' label, call kfree() on the new readahead extent and
return without removing it from the radix tree at fs_info->reada_tree;
6) Any future call to reada_find_extent() for the logical address X will
find the stale pointer in the readahead extents radix tree, increment
its reference counter, which can trigger the use-after-free right
away or return it to the caller reada_add_block() that results in the
use-after-free of the example trace above.
So fix this by making sure we delete the readahead extent from the radix
tree if we fail to setup zones for it (when 'have_zone = 0').
Fixes: 3194502118 ("btrfs: reada: bypass adding extent when all zone failed")
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch addresses a compile warning:
fs/btrfs/extent-tree.c: In function '__btrfs_free_extent':
fs/btrfs/extent-tree.c:3187:4: warning: format '%lu' expects argument of type 'long unsigned int', but argument 8 has type 'unsigned int' [-Wformat=]
Fixes: 1c2a07f598 ("btrfs: extent-tree: kill BUG_ON() in __btrfs_free_extent()")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Pujin Shi <shipujin.t@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Similarly to commit 457e490f2b ("blkcg: allocate struct blkcg_gq
outside request queue spinlock"), blkg_create can also trigger
occasional -ENOMEM failures at the radix insertion because any
allocation inside blkg_create has to be non-blocking, making it more
likely to fail. This causes trouble for userspace tools trying to
configure io weights who need to deal with this condition.
This patch reduces the occurrence of -ENOMEMs on this path by preloading
the radix tree element on a GFP_KERNEL context, such that we guarantee
the later non-blocking insertion won't fail.
A similar solution exists in blkcg_init_queue for the same situation.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently s390 build is broken.
SECTCMP .boot.data
error: section .boot.data differs between vmlinux and arch/s390/boot/compressed/vmlinux
make[2]: *** [arch/s390/boot/section_cmp.boot.data] Error 1
SECTCMP .boot.preserved.data
error: section .boot.preserved.data differs between vmlinux and arch/s390/boot/compressed/vmlinux
make[2]: *** [arch/s390/boot/section_cmp.boot.preserved.data] Error 1
make[1]: *** [bzImage] Error 2
Commit 33def8498f ("treewide: Convert macro and uses of __section(foo)
to __section("foo")") converted all __section(foo) to __section("foo").
This is wrong for __bootdata / __bootdata_preserved macros which want
variable names to be a part of intermediate section names .boot.data.<var
name> and .boot.preserved.data.<var name>. Those sections are later
sorted by alignment + name and merged together into final .boot.data
/ .boot.preserved.data sections. Those sections must be identical in
the decompressor and the decompressed kernel (that is checked during
the build).
Fixes: 33def8498f ("treewide: Convert macro and uses of __section(foo) to __section("foo")")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
As it stands now, the vdso32 Makefile hardcodes the linker to ld.bfd
using -fuse-ld=bfd with $(CC). This was taken from the arm vDSO
Makefile, as the comment notes, done in commit d2b30cd4b7 ("ARM:
8384/1: VDSO: force use of BFD linker").
Commit fe00e50b2d ("ARM: 8858/1: vdso: use $(LD) instead of $(CC) to
link VDSO") changed that Makefile to use $(LD) directly instead of
through $(CC), which matches how the rest of the kernel operates. Since
then, LD=ld.lld means that the arm vDSO will be linked with ld.lld,
which has shown no problems so far.
Allow ld.lld to link this vDSO as we do the regular arm vDSO. To do
this, we need to do a few things:
* Add a LD_COMPAT variable, which defaults to $(CROSS_COMPILE_COMPAT)ld
with gcc and $(LD) if LLVM is 1, which will be ld.lld, or
$(CROSS_COMPILE_COMPAT)ld if not, which matches the logic of the main
Makefile. It is overrideable for further customization and avoiding
breakage.
* Eliminate cc32-ldoption, which matches commit 055efab312 ("kbuild:
drop support for cc-ldoption").
With those, we can use $(LD_COMPAT) in cmd_ldvdso and change the flags
from compiler linker flags to linker flags directly. We eliminate
-mfloat-abi=soft because it is not handled by the linker.
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1033
Link: https://lore.kernel.org/r/20201020011406.1818918-1-natechancellor@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
ARM SCMI fixes for v5.10
Handful of fixes, some of them fixing issues since SCMI was merged.
It indicates that it is getting used now and people are finding bugs.
Fixes include unblocking multiple thread access with SMC/HVC transport,
ARCH_COLD_RESET value, locking in notifications, avoiding workqueue
name duplication with multiple instances of SCMI and adding missing
Rx size initialisation when using buffers for repeat transmits.
* tag 'scmi-fixes-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
firmware: arm_scmi: Fix duplicate workqueue name
firmware: arm_scmi: Fix locking in notifications
firmware: arm_scmi: Add missing Rx size re-initialisation
firmware: arm_scmi: Expand SMC/HVC message pool to more than one
firmware: arm_scmi: Fix ARCH_COLD_RESET
Link: https://lore.kernel.org/r/20201015123518.GA3839@bogus
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
When GPIO library asks pin control to set the bias, it doesn't pass
any value of it and argument is considered boolean (and this is true
for ACPI GpioIo() / GpioInt() resources, by the way). Thus, individual
drivers must behave well, when they got the resistance value of 1 Ohm,
i.e. transforming it to sane default.
In case of Intel pin control hardware the 5 kOhm sounds plausible
because on one hand it's a minimum of resistors present in all
hardware generations and at the same time it's high enough to minimize
leakage current (will be only 200 uA with the above choice).
Fixes: e57725eabf ("pinctrl: intel: Add support for hardware debouncer")
Reported-by: Jamie McClymont <jamie@kwiius.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2 kOhm bias was never an option in Intel GPIO hardware, the available
matrix is:
000 none
001 1 kOhm (if available)
010 5 kOhm
100 20 kOhm
As easy to get the 3 resistors are gated separately and according to
parallel circuits calculations we may get combinations of the above where
the result is always strictly less than minimal resistance. Hence,
additional values can be:
011 ~833.3 Ohm
101 ~952.4 Ohm
110 ~4 kOhm
111 ~800 Ohm
That said, convert TERM definitions to be the bit masks to reflect the above.
While at it, enable the same setting for pull down case.
Fixes: 7981c0015a ("pinctrl: intel: Add Intel Sunrisepoint pin controller and GPIO support")
Cc: Jamie McClymont <jamie@kwiius.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
The Ethernet PHY on the Bananapi M64 has the RX and TX delays
enabled on the PHY, using pull-ups on the RXDLY and TXDLY pins.
Fix the phy-mode description to correct reflect this so that the
implementation doesn't reconfigure the delays incorrectly. This
happened with commit bbc4d71d63 ("net: phy: realtek: fix rtl8211e
rx/tx delay config").
Fixes: e729549990 ("arm64: allwinner: bananapi-m64: Enable dwmac-sun8i")
Fixes: 94f4428867 ("arm64: dts: allwinner: A64: Restore EMAC changes")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Jernej Skrabec <jernej.skrabec@siol.net>
Link: https://lore.kernel.org/r/20201024162515.30032-10-wens@kernel.org
The Ethernet PHY on the Libre Computer ALL-H5-CC has the RX and TX
delays enabled on the PHY, using pull-ups on the RXDLY and TXDLY pins.
Fix the phy-mode description to correct reflect this so that the
implementation doesn't reconfigure the delays incorrectly. This
happened with commit bbc4d71d63 ("net: phy: realtek: fix rtl8211e
rx/tx delay config").
Fixes: 60d0426d76 ("arm64: dts: allwinner: h5: Add Libre Computer ALL-H5-CC H5 board")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Acked-by: Jernej Skrabec <jernej.skrabec@siol.net>
Link: https://lore.kernel.org/r/20201024162515.30032-9-wens@kernel.org
The Ethernet PHY on the Bananapi M2+ has the RX and TX delays
enabled on the PHY, using pull-ups on the RXDLY and TXDLY pins.
Fix the phy-mode description to correct reflect this so that the
implementation doesn't reconfigure the delays incorrectly. This
happened with commit bbc4d71d63 ("net: phy: realtek: fix rtl8211e
rx/tx delay config").
Fixes: 8c7ba536e7 ("ARM: sun8i: bananapi-m2-plus: Enable dwmac-sun8i")
Fixes: 4904337fe3 ("ARM: dts: sunxi: Restore EMAC changes (boards)")
Fixes: aa8fee415f ("ARM: dts: sun8i: h3: Split out non-SoC-specific parts of Bananapi M2 Plus")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Tested-by: Jernej Skrabec <jernej.skrabec@siol.net>
Acked-by: Jernej Skrabec <jernej.skrabec@siol.net>
Link: https://lore.kernel.org/r/20201024162515.30032-8-wens@kernel.org
The Ethernet PHY on the Cubieboard 4 and A80 Optimus have the RX
and TX delays enabled on the PHY, using pull-ups on the RXDLY and
TXDLY pins.
Fix the phy-mode description to correct reflect this so that the
implementation doesn't reconfigure the delays incorrectly. This
happened with commit bbc4d71d63 ("net: phy: realtek: fix rtl8211e
rx/tx delay config").
Fixes: 98048143b7 ("ARM: dts: sun9i: cubieboard4: Enable GMAC")
Fixes: bc9bd03a44 ("ARM: dts: sun9i: a80-optimus: Enable GMAC")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Acked-by: Jernej Skrabec <jernej.skrabec@siol.net>
Link: https://lore.kernel.org/r/20201024162515.30032-7-wens@kernel.org
The Ethernet PHY on the Bananapi M3 and Cubietruck Plus have the RX
and TX delays enabled on the PHY, using pull-ups on the RXDLY and
TXDLY pins.
Fix the phy-mode description to correct reflect this so that the
implementation doesn't reconfigure the delays incorrectly. This
happened with commit bbc4d71d63 ("net: phy: realtek: fix rtl8211e
rx/tx delay config").
Fixes: 039359948a ("ARM: dts: sun8i: a83t: Enable Ethernet on two boards")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Acked-by: Jernej Skrabec <jernej.skrabec@siol.net>
Link: https://lore.kernel.org/r/20201024162515.30032-6-wens@kernel.org
The Ethernet PHY on the Orange Pi Plus 2E has the RX and TX delays
enabled on the PHY, using pull-ups on the RXDLY and TXDLY pins.
Fix the phy-mode description to correct reflect this so that the
implementation doesn't reconfigure the delays incorrectly. This
happened with commit bbc4d71d63 ("net: phy: realtek: fix rtl8211e
rx/tx delay config").
Fixes: 4904337fe3 ("ARM: dts: sunxi: Restore EMAC changes (boards)")
Fixes: 7a78ef92cd ("ARM: sun8i: h3: Enable EMAC with external PHY on Orange Pi Plus 2E")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Tested-by: Jernej Skrabec <jernej.skrabec@siol.net>
Acked-by: Jernej Skrabec <jernej.skrabec@siol.net>
Link: https://lore.kernel.org/r/20201024162515.30032-5-wens@kernel.org
The Ethernet PHY on the Bananapi M1+ has the RX and TX delays
enabled on the PHY, using pull-ups on the RXDLY and TXDLY pins.
Fix the phy-mode description to correct reflect this so that the
implementation doesn't reconfigure the delays incorrectly. This
happened with commit bbc4d71d63 ("net: phy: realtek: fix rtl8211e
rx/tx delay config").
Fixes: 04c85ecad3 ("ARM: dts: sun7i: Add dts file for Bananapi M1 Plus board")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Acked-by: Jernej Skrabec <jernej.skrabec@siol.net>
Link: https://lore.kernel.org/r/20201024162515.30032-4-wens@kernel.org
The Ethernet PHY on the Cubietruck has the RX and TX delays
enabled on the PHY, using pull-ups on the RXDLY and TXDLY pins.
Fix the phy-mode description to correct reflect this so that the
implementation doesn't reconfigure the delays incorrectly. This
happened with commit bbc4d71d63 ("net: phy: realtek: fix rtl8211e
rx/tx delay config").
Fixes: 67073d9767 ("ARM: dts: sun7i: cubietruck: Enable the GMAC")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Tested-by: Emilio López <emilio@elopez.com.ar>
Reviewed-by: Emilio López <emilio@elopez.com.ar>
Link: https://lore.kernel.org/r/20201024162515.30032-3-wens@kernel.org
The Ethernet PHY on the A31 Hummingbird has the RX and TX delays
enabled on the PHY, using pull-ups on the RXDLY and TXDLY pins.
Fix the phy-mode description to correct reflect this so that the
implementation doesn't reconfigure the delays incorrectly. This
happened with commit bbc4d71d63 ("net: phy: realtek: fix rtl8211e
rx/tx delay config").
Fixes: c220aec2bb ("ARM: dts: sun6i: Add Merrii A31 Hummingbird support")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Acked-by: Jernej Skrabec <jernej.skrabec@siol.net>
Link: https://lore.kernel.org/r/20201024162515.30032-2-wens@kernel.org
According to board schematic, PHY provides both, RX and TX delays.
However, according to "fix" Realtek provided for this board, only TX
delay should be provided by PHY.
Tests show that both variants work but TX only PHY delay works
slightly better.
Update ethernet node to reflect the fact that PHY provides TX delay.
Fixes: 94f4428867 ("arm64: dts: allwinner: A64: Restore EMAC changes")
Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20201022211301.3548422-1-jernej.skrabec@siol.net
Before the commit bbc4d71d63 ("net: phy: realtek: fix rtl8211e rx/tx
delay config"), the software overwrite for RX/TX delays of the RTL8211e
were not working properly and the Beelink GS1 had both RX/TX delay of RGMII
interface set using pull-up on the TXDLY and RXDLY pins.
Now that these delays are working properly they overwrite the HW
config and set this to 'rgmii' meaning no delay on both RX/TX.
This makes the ethernet of this board not working anymore.
Set the phy-mode to 'rgmii-id' meaning RGMII with RX/TX delays
in the device-tree to keep the correct configuration.
Fixes: 089bee8dd1 ("arm64: dts: allwinner: h6: Introduce Beelink GS1 board")
Signed-off-by: Clément Péron <peron.clem@gmail.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Acked-by: Chen-Yu Tsai <wens@csie.org>
Link: https://lore.kernel.org/r/20201018172409.1754775-1-peron.clem@gmail.com
UBSAN reports:
Undefined behaviour in ./include/linux/time64.h:127:27
signed integer overflow:
17179869187 * 1000000000 cannot be represented in type 'long long int'
Call Trace:
timespec64_to_ns include/linux/time64.h:127 [inline]
set_cpu_itimer+0x65c/0x880 kernel/time/itimer.c:180
do_setitimer+0x8e/0x740 kernel/time/itimer.c:245
__x64_sys_setitimer+0x14c/0x2c0 kernel/time/itimer.c:336
do_syscall_64+0xa1/0x540 arch/x86/entry/common.c:295
Commit bd40a17576 ("y2038: itimer: change implementation to timespec64")
replaced the original conversion which handled time clamping correctly with
timespec64_to_ns() which has no overflow protection.
Fix it in timespec64_to_ns() as this is not necessarily limited to the
usage in itimers.
[ tglx: Added comment and adjusted the fixes tag ]
Fixes: 361a3bf005 ("time64: Add time64.h header and define struct timespec64")
Signed-off-by: Zeng Tao <prime.zeng@hisilicon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1598952616-6416-1-git-send-email-prime.zeng@hisilicon.com
When using the scaler on the A10-like frontend with single-planar formats,
the current code will setup the channel 0 filter (used for the R or Y
component) with a different phase parameter than the channel 1 filter (used
for the G/B or U/V components).
This creates a bleed out that keeps repeating on of the last line of the
RGB plane across the rest of the display. The Allwinner BSP either applies
the same phase parameter over both channels or use a separate one, the
condition being whether the input format is YUV420 or not.
Since YUV420 is both subsampled and multi-planar, and since YUYV is
subsampled but single-planar, we can rule out the subsampling and assume
that the condition is actually whether the format is single or
multi-planar. And it looks like applying the same phase parameter over both
channels for single-planar formats fixes our issue, while we keep the
multi-planar formats working properly.
Reported-by: Taras Galchenko <tpgalchenko@gmail.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Acked-by: Jernej Skrabec <jernej.skrabec@siol.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20201015093642.261440-2-maxime@cerno.tech
The scaler filter phase setup in the allwinner kernel has two different
cases for setting up the scaler filter, the first one using different phase
parameters for the two channels, and the second one reusing the first
channel parameters on the second channel.
The allwinner kernel has a third option where the horizontal phase of the
second channel will be set to a different value than the vertical one (and
seems like it's the same value than one used on the first channel).
However, that code path seems to never be taken, so we can ignore it for
now, and it's essentially what we're doing so far as well.
Since we will have always the same values across each components of the
filter setup for a given channel, we can simplify a bit our frontend
structure by only storing the phase value we want to apply to a given
channel.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Acked-by: Jernej Skrabec <jernej.skrabec@siol.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20201015093642.261440-1-maxime@cerno.tech
We have a few leftovers from the merge window period in
drm-misc-next-fixes, let's bring them into drm-misc-fixes
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Since sched_clock_read_begin() and sched_clock_read_retry() are called
by notrace function sched_clock(), they shouldn't be traceable either,
or else ftrace_graph_caller will run into a dead loop on the path
as below (arm for instance):
ftrace_graph_caller()
prepare_ftrace_return()
function_graph_enter()
ftrace_push_return_trace()
trace_clock_local()
sched_clock()
sched_clock_read_begin/retry()
Fixes: 1b86abc1c6 ("sched_clock: Expose struct clock_read_data")
Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200929082027.16787-1-quanyang.wang@windriver.com
This has not been required since commit 75229eca56 ("drm: Make
drm_encoder_helper_funcs optional").
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reenable kernel login method for kernel TEE client API
The kernel TEE login method was accidentally disabled previously when
enabling a few other login methods, so fix that here.
* tag 'tee-fix-for-v5.10' of git://git.linaro.org:/people/jens.wiklander/linux-tee:
tee: client UUID: Skip REE kernel login method as well
Link: https://lore.kernel.org/r/20201013070918.GA3328976@jade
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The camera interfaces on MMP3 are on a separate power island that needs
to be turned on for them to operate and, ideally, turned off when the
cameras are not in use.
This hooks the power island with the camera interfaces in the device
tree.
Link: https://lore.kernel.org/r/20200925234805.228251-2-lkundrak@v3.sk
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Use drmm_mode_config_init() and drop the explicit calls to
drm_mode_config_cleanup().
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is the same code and comment that is already shared by imx-ldb,
imx-tve, and parallel-display in imx_drm_encoder_parse_of().
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
ipu_mbus_code_to_colorspace, ipu_stride_to_bytes, and
ipu_pixelformat_is_planar are unused. Remove them.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
mvebu fixes for 5.9 (part 1)
- Allow to use correct MAC address for particular DSA slaves /
ethernet ports on Espressobin (Armada 3720)
- Remove incorrect check in ll_get_coherency_base() used for Armada
370/XP SoCs.
* tag 'mvebu-fixes-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu:
ARM: mvebu: drop pointless check for coherency_base
arm64: dts: marvell: espressobin: Add ethernet switch aliases
Link: https://lore.kernel.org/r/87y2kkesj5.fsf@BL-laptop
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
I am leaving Socionext. Orphan the UniPhier platform and Denali NAND
driver until somebody takes the role.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Bail out early from sysc_wait_softreset() just like we do in sysc_reset()
if there's no sysstatus srst_shift to fix a bogus resetdone warning on
enable as suggested by Grygorii Strashko <grygorii.strashko@ti.com>.
We do not currently handle resets for modules that need writing to the
sysstatus register. If we at some point add that, we also need to add
SYSS_QUIRK_RESETDONE_INVERTED flag for cpsw as the sysstatus bit is low
when reset is done as described in the am335x TRM "Table 14-202
SOFT_RESET Register Field Descriptions"
Fixes: d46f9fbec7 ("bus: ti-sysc: Use optional clocks on for enable and wait for softreset bit")
Suggested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Commit d46f9fbec7 ("bus: ti-sysc: Use optional clocks on for enable and
wait for softreset bit") started showing a "OCP softreset timed out"
warning on enable if the interconnect target module is not out of reset.
This caused the warning to be often triggered for i2c and hdq while the
devices are working properly.
Turns out that some interconnect target modules seem to have an unusable
reset status bits unless the module specific reset quirks are activated.
Let's just skip the reset status check for those modules as we only want
to activate the reset quirks when doing a reset, and not on enable. This
way we don't see the bogus "OCP softreset timed out" warnings during boot.
Fixes: d46f9fbec7 ("bus: ti-sysc: Use optional clocks on for enable and wait for softreset bit")
Signed-off-by: Tony Lindgren <tony@atomide.com>
We should select also PM_GENERIC_DOMAINS_OF in addition to
PM_GENERIC_DOMAINS to have device tree based PM domain providers
enabled.
Fixes: 58cbff023b ("soc: ti: omap-prm: Add basic power domain support")
Signed-off-by: Tony Lindgren <tony@atomide.com>
i.MX SoC GPIO driver provides the basic functions of GPIO pin operations
and IRQ operations, it is now changed from "def_bool y" to "tristate", so
it should be explicitly enabled to make sure all consumers work normally.
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
i.MX SoC GPIO driver provides the basic functions of GPIO pin operations
and IRQ operations, it is now changed from "def_bool y" to "tristate", so
it should be explicitly enabled to make sure all consumers work normally.
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
i.MX SoC GPIO driver provides the basic functions of GPIO pin operations
and IRQ operations, it is now changed from "def_bool y" to "tristate", so
it should be explicitly enabled to make sure all consumers work normally.
Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
I accidentally misplaced select PM_GENERIC_DOMAINS, it should be
selected for all the SoCs instead.
Fixes: 58cbff023b ("soc: ti: omap-prm: Add basic power domain support")
Signed-off-by: Tony Lindgren <tony@atomide.com>
Commit
db227c19e6 ("ARM: 8985/1: efi/decompressor: deal with HYP mode boot gracefully")
updated the EFI entry code to permit firmware to invoke the EFI stub
loader in HYP mode, with the MMU either enabled or disabled, neither
of which is permitted by the EFI spec, but which does happen in the
field.
In the MMU on case, we remain in HYP mode as configured by the firmware,
and rely on the fact that any HVC instruction issued in this mode will
be dispatched via the SVC slot in the HYP vector table. However, this
slot will point to a Thumb2 symbol if the kernel is built in Thumb2
mode, and so we have to configure HSCTLR to ensure that the exception
handlers are invoked in Thumb2 mode as well.
Fixes: db227c19e6 ("ARM: 8985/1: efi/decompressor: deal with HYP mode boot gracefully")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The reserved-memory overlap detection code fails to detect overlaps if
either of the regions starts at address 0x0. The code explicitly checks
for and ignores such regions, apparently in order to ignore dynamically
allocated regions which have an address of 0x0 at this point. These
dynamically allocated regions also have a size of 0x0 at this point, so
fix this by removing the check and sorting the dynamically allocated
regions ahead of any static regions at address 0x0.
For example, there are two overlaps in this case but they are not
currently reported:
foo@0 {
reg = <0x0 0x2000>;
};
bar@0 {
reg = <0x0 0x1000>;
};
baz@1000 {
reg = <0x1000 0x1000>;
};
quux {
size = <0x1000>;
};
but they are after this patch:
OF: reserved mem: OVERLAP DETECTED!
bar@0 (0x00000000--0x00001000) overlaps with foo@0 (0x00000000--0x00002000)
OF: reserved mem: OVERLAP DETECTED!
foo@0 (0x00000000--0x00002000) overlaps with baz@1000 (0x00001000--0x00002000)
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Link: https://lore.kernel.org/r/ded6fd6b47b58741aabdcc6967f73eca6a3f311e.1603273666.git-series.vincent.whitchurch@axis.com
Signed-off-by: Rob Herring <robh@kernel.org>
io_poll_double_wake() is called for both request types - both pure poll
requests, and internal polls. This means that we should be using the
right handler based on the request type. Use the one that the original
caller already assigned for the waitqueue handling, that will always
match the correct type.
Cc: stable@vger.kernel.org # v5.8+
Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
An interrupt submitted to an affinity change will always be left enabled
after plic_set_affinity() has been called, while the expectation is that
it should stay in whatever state it was before the call.
Preserving the configuration fixes a PWM hang issue on the Unleashed
board.
[ 919.015783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 919.020922] rcu: 0-...0: (0 ticks this GP)
idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=105807
[ 919.030295] (detected by 1, t=225825 jiffies, g=1561, q=3496)
[ 919.036109] Task dump for CPU 0:
[ 919.039321] kworker/0:1 R running task 0 30 2 0x00000008
[ 919.046359] Workqueue: events set_brightness_delayed
[ 919.051302] Call Trace:
[ 919.053738] [<ffffffe000930d92>] __schedule+0x194/0x4de
[ 982.035783] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 982.040923] rcu: 0-...0: (0 ticks this GP)
idle=7d2/1/0x4000000000000002 softirq=1424/1424 fqs=113325
[ 982.050294] (detected by 1, t=241580 jiffies, g=1561, q=3509)
[ 982.056108] Task dump for CPU 0:
[ 982.059321] kworker/0:1 R running task 0 30 2 0x00000008
[ 982.066359] Workqueue: events set_brightness_delayed
[ 982.071302] Call Trace:
[ 982.073739] [<ffffffe000930d92>] __schedule+0x194/0x4de
[..]
Fixes: bb0fed1c60 ("irqchip/sifive-plic: Switch to fasteoi flow")
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
[maz: tidy-up commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20201020081532.2377-1-greentime.hu@sifive.com
bcm2836_arm_irqchip_smp_init() calls set_smp_ipi_range(), which has
an __init annotation. Make sure the caller has the same annotation.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
With SO_RCVLOWAT, under memory pressure,
it is possible to enter a state where:
1. We have not received enough bytes to satisfy SO_RCVLOWAT.
2. We have not entered buffer pressure (see tcp_rmem_pressure()).
3. But, we do not have enough buffer space to accept more packets.
In this case, we advertise 0 rwnd (due to #3) but the application does
not drain the receive queue (no wakeup because of #1 and #2) so the
flow stalls.
Modify the heuristic for SO_RCVLOWAT so that, if we are advertising
rwnd<=rcv_mss, force a wakeup to prevent a stall.
Without this patch, setting tcp_rmem to 6143 and disabling TCP
autotune causes a stalled flow. With this patch, no stall occurs. This
is with RPC-style traffic with large messages.
Fixes: 03f45c883c ("tcp: avoid extra wakeups for SO_RCVLOWAT users")
Signed-off-by: Arjun Roy <arjunroy@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201023184709.217614-1-arjunroy.kdev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Clang warns about the extra parentheses in this comparison:
drivers/net/ethernet/freescale/ucc_geth.c:1361:28:
warning: equality comparison with extraneous parentheses
if ((ugeth->phy_interface == PHY_INTERFACE_MODE_SGMII))
~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
It seems clear the intent here is to do a comparison not an
assignment, so drop the extra parentheses to avoid any confusion.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201023033236.3296988-1-mpe@ellerman.id.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The sentinel descriptor entry was getting missed in the
traverse of the ring from head to tail, so change to a
loop of 0 to the end.
Fixes: f1d2e894f1 ("ionic: use index not pointer for queue tracking")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kmemleak pointed out to us that ionic_rx_flush() is sending
skbs into napi_gro_XXX with a disabled napi context, and these
end up getting lost and leaked. We can safely remove the flush.
Fixes: 0f3154e6bc ("ionic: Add Tx and Rx handling")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The sparse complaints around the static_asserts were obscuring
more useful complaints. So, don't check the static_asserts,
and fix the remaining sparse complaints.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
chtls_pt_recvmsg() receives a skb with tls header and subsequent
skb with data, need to finalize the data copy whenever next skb
with tls header is available. but here current tls header is
overwritten by next available tls header, ends up corrupting
user buffer data. fixing it by finalizing current record whenever
next skb contains tls header.
v1->v2:
- Improved commit message.
Fixes: 17a7d24aa8 ("crypto: chtls - generic handling of data and hdr")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Link: https://lore.kernel.org/r/20201022190556.21308-1-vinay.yadav@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
IPA transactions describe actions to be performed by the IPA
hardware. Three cases use IPA transactions: transmitting a socket
buffer; providing a page to receive packet data; and issuing an IPA
immediate command. An IPA transaction contains a scatter/gather
list (SGL) to hold the set of actions to be performed.
We map buffers in the SGL for DMA at the time they are added to the
transaction. For skb TX transactions, we fill the SGL with a call
to skb_to_sgvec(). Page RX transactions involve a single page
pointer, and that is recorded in the SGL with sg_set_page(). In
both of these cases we then map the SGL for DMA with a call to
dma_map_sg().
Immediate commands are different. The payload for an immediate
command comes from a region of coherent DMA memory, which must
*not* be mapped for DMA. For that reason, gsi_trans_cmd_add()
sort of hand-crafts each SGL entry added to a command transaction.
This patch fixes a problem with the code that crafts the SGL entry
for an immediate command. Previously a portion of the SGL entry was
updated using sg_set_buf(). However this is not valid because it
includes a call to virt_to_page() on the buffer, but the command
buffer pointer is not a linear address.
Since we never actually map the SGL for command transactions, there
are very few fields in the SGL we need to fill. Specifically, we
only need to record the DMA address and the length, so they can be
used by __gsi_trans_commit() to fill a TRE. We additionally need to
preserve the SGL flags so for_each_sg() still works. For that we
can simply assign a null page pointer for command SGL entries.
Fixes: 9dd441e4ed ("soc: qcom: ipa: GSI transactions")
Reported-by: Stephen Boyd <swboyd@chromium.org>
Tested-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20201022010029.11877-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
One of the assignments that was removed by commit 4a0c1de64b ("crypto:
x86/poly1305 - Remove assignments with no effect") is actually needed,
since it affects the return value.
This fixes the following crypto self-test failure:
alg: shash: poly1305-simd test failed (wrong result) on test vector 2, cfg="init+update+final aligned buffer"
Fixes: 4a0c1de64b ("crypto: x86/poly1305 - Remove assignments with no effect")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
turbostat tends to get confused when CPUs are added and removed
while it is running.
There are races, such as checking the current cpu, and then
reading a sysfs file that depends on that cpu number.
Close the two issues that seem to come up the most.
First, there is an infinite reset loop detector --
change that to allow more resets before giving up.
Secondly, one of those file reads didn't really need
to exit the program on failure...
Signed-off-by: Len Brown <len.brown@intel.com>
cpu1: MSR_IA32_TEMPERATURE_TARGET: 0x05640000 (95 C) (100 default - 5 offset)
Account for the new "offset" field in MSR_TEMPERATURE_TARGET.
While this field is usually zero, ignoring it results in over-stating
the current temperature, both per-core and per-package.
Signed-off-by: Len Brown <len.brown@intel.com>
This device needs HID_QUIRK_MULTI_INPUT in order to be presented to userspace
in a consistent way.
Reported-and-tested-by: David Gámiz Jiménez <david.gamiz@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Restructure __io_queue_sqe() so it follows simple if/else if/else
control flow. It's more readable and removes extra goto/labels.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Don't overuse goto's, complex control flow doesn't make compilers happy
and makes code harder to read.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Set IO_WQ_WORK_CONCURRENT for all REQ_F_FORCE_ASYNC requests, do that in
that is also looks better.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Inline io_link_cancel_timeout() and __io_kill_linked_timeout() into
io_kill_linked_timeout(). That allows to easily move a put of a cancelled
linked timeout out of completion_lock and to not deferring it. It is also
much more readable when not scattered across three different functions.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move REQ_F_LINK_TIMEOUT clearing out of __io_kill_linked_timeout()
because it might return early and leave the flag set. It's not a
problem, but may be confusing.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
An armed linked timeout can never be a head of a link, so we don't need
to clear REQ_F_LINK_HEAD for it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
__io_kill_linked_timeout() already checks for REQ_F_LTIMEOUT_ACTIVE and
it's set only for linked timeouts. No need to verify next request's
opcode.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
On the odroid N2 plus, cpufreq is not available due to an error on the cpu
regulators. vddcpu a and b get the same PWM. The one provided to vddcpu A
is incorrect. Because vddcpu B PWM is busy the regulator cannot register:
> pwm-regulator regulator-vddcpu-b: Failed to get PWM: -16
Like on the odroid n2, use PWM A out of GPIOE_2 for vddcpu A to fix the
problem
Fixes: 98d24896ee ("arm64: dts: meson: add support for the ODROID-N2+")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
Link: https://lore.kernel.org/r/20201023094139.809379-1-jbrunet@baylibre.com
The 3.10 vendor kernel defines the following GPU 20 interrupt lines:
#define INT_MALI_GP AM_IRQ(160)
#define INT_MALI_GP_MMU AM_IRQ(161)
#define INT_MALI_PP AM_IRQ(162)
#define INT_MALI_PMU AM_IRQ(163)
#define INT_MALI_PP0 AM_IRQ(164)
#define INT_MALI_PP0_MMU AM_IRQ(165)
#define INT_MALI_PP1 AM_IRQ(166)
#define INT_MALI_PP1_MMU AM_IRQ(167)
#define INT_MALI_PP2 AM_IRQ(168)
#define INT_MALI_PP2_MMU AM_IRQ(169)
#define INT_MALI_PP3 AM_IRQ(170)
#define INT_MALI_PP3_MMU AM_IRQ(171)
#define INT_MALI_PP4 AM_IRQ(172)
#define INT_MALI_PP4_MMU AM_IRQ(173)
#define INT_MALI_PP5 AM_IRQ(174)
#define INT_MALI_PP5_MMU AM_IRQ(175)
#define INT_MALI_PP6 AM_IRQ(176)
#define INT_MALI_PP6_MMU AM_IRQ(177)
#define INT_MALI_PP7 AM_IRQ(178)
#define INT_MALI_PP7_MMU AM_IRQ(179)
However, the driver from the 3.10 vendor kernel does not use the
following four interrupt lines:
- INT_MALI_PP3
- INT_MALI_PP3_MMU
- INT_MALI_PP7
- INT_MALI_PP7_MMU
Drop the "pp3" and "ppmmu3" interrupt lines. This is also important
because there is no matching entry in interrupt-names for it (meaning
the "pp2" interrupt is actually assigned to the "pp3" interrupt line).
Fixes: 7d3f6b536e ("ARM: dts: meson8: add the Mali-450 MP6 GPU")
Reported-by: Thomas Graichen <thomas.graichen@gmail.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: thomas graichen <thomas.graichen@gmail.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://lore.kernel.org/r/20200815181957.408649-1-martin.blumenstingl@googlemail.com
525c9e5a32 introduced pm_runtime support for the i.MX SPI driver. With
this pm_runtime is used to bring up the clocks initially. When CONFIG_PM
is disabled the clocks are no longer enabled and the driver doesn't work
anymore. Fix this by enabling the clocks in the probe function and
telling pm_runtime that the device is active using
pm_runtime_set_active().
Fixes: 525c9e5a32 spi: imx: enable runtime pm support
Tested-by: Christian Eggers <ceggers@arri.de> [tested for !CONFIG_PM only]
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Link: https://lore.kernel.org/r/20201021104513.21560-1-s.hauer@pengutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
A delay must be introduced before the shutdown down of the mclk,
as stated in CS42L51 datasheet. Otherwise the codec may
produce some noise after the end of DAPM power down sequence.
The delay between DAC and CLOCK_SUPPLY widgets is too short.
Add a delay in mclk shutdown request to manage the shutdown delay
explicitly. From experiments, at least 10ms delay is necessary.
Set delay to 20ms as recommended in Documentation/timers/timers-howto.rst
when using msleep().
Signed-off-by: Olivier Moysan <olivier.moysan@st.com>
Link: https://lore.kernel.org/r/20201020150109.482-1-olivier.moysan@st.com
Signed-off-by: Mark Brown <broonie@kernel.org>
With the current state of code, we would endup with something like
below in /proc/asound/cards for 2 machines based on this driver.
Machine 1:
0 [DB845c ]: DB845c - DB845c
DB845c
Machine 2:
0 [LenovoYOGAC6301]: Lenovo-YOGA-C63 - Lenovo-YOGA-C630-13Q50
LENOVO-81JL-LenovoYOGAC630_13Q50-LNVNB161216
This is not very UCM friendly both w.r.t to common up configs and
card identification, and UCM2 became totally not usefull with just
one ucm sdm845.conf for two machines which have different setups
w.r.t HDMI and other dais.
Reasons for such thing is partly because Qualcomm machine drivers never
cared to set driver_name.
This patch sets up driver name for the this driver to sort out the
UCM integration issues!
after this patch contents of /proc/asound/cards:
Machine 1:
0 [DB845c ]: sdm845 - DB845c
DB845c
Machine 2:
0 [LenovoYOGAC6301]: sdm845 - Lenovo-YOGA-C630-13Q50
LENOVO-81JL-LenovoYOGAC630_13Q50-LNVNB161216
with this its possible to align with what UCM2 expects and we can have
sdm845/DB845.conf
sdm845/LENOVO-81JL-LenovoYOGAC630_13Q50-LNVNB161216.conf
... for board variants. This should scale much better!
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20201023095849.22894-1-srinivas.kandagatla@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
This patch introduces a new ioctl for vhost-vdpa device that can
report the iova range by the device.
For device that implements get_iova_range() method, we fetch it from
the vDPA device. If device doesn't implement get_iova_range() but
depends on platform IOMMU, we will query via DOMAIN_ATTR_GEOMETRY,
otherwise [0, ULLONG_MAX] is assumed.
For safety, this patch also rules out the map request which is not in
the valid range.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20201023090043.14430-3-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
On rare occations there is the following error:
mmc0: Tuning timeout, falling back to fixed sampling clock
There are SD cards which takes a significant longer time to reply to the
first CMD19 command. The eSDHC takes the data timeout value into account
during the tuning period. The SDHCI core doesn't explicitly set this
timeout for the tuning procedure. Thus on the slow cards, there might be
a spurious "Buffer Read Ready" interrupt, which in turn triggers a wrong
sequence of events. In the end this will lead to an unsuccessful tuning
procedure and to the above error.
To workaround this, set the timeout to the maximum value (which is the
best we can do) and the SDHCI core will take care of the proper timeout
handling.
Fixes: ba49cbd093 ("mmc: sdhci-of-esdhc: add tuning support")
Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201022222337.19857-1-michael@walle.cc
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The Varmilo VA104M Keyboard (04b4:07b1, reported as Varmilo Z104M)
exposes media control hotkeys as a USB HID consumer control device, but
these keys do not work in the current (5.8-rc1) kernel due to the
incorrect HID report descriptor. Fix the problem by modifying the
internal HID report descriptor.
More specifically, the keyboard report descriptor specifies the
logical boundary as 572~10754 (0x023c ~ 0x2a02) while the usage
boundary is specified as 0~10754 (0x00 ~ 0x2a02). This results in an
incorrect interpretation of input reports, causing inputs to be ignored.
By setting the Logical Minimum to zero, we align the logical boundary
with the Usage ID boundary.
Some notes:
* There seem to be multiple variants of the VA104M keyboard. This
patch specifically targets 04b4:07b1 variant.
* The device works out-of-the-box on Windows platform with the generic
consumer control device driver (hidserv.inf). This suggests that
Windows either ignores the Logical Minimum/Logical Maximum or
interprets the Usage ID assignment differently from the linux
implementation; Maybe there are other devices out there that only
works on Windows due to this problem?
Signed-off-by: Frank Yang <puilp0502@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
we found that the following race condition exists in
xfrm_alloc_userspi flow:
user thread state_hash_work thread
---- ----
xfrm_alloc_userspi()
__find_acq_core()
/*alloc new xfrm_state:x*/
xfrm_state_alloc()
/*schedule state_hash_work thread*/
xfrm_hash_grow_check() xfrm_hash_resize()
xfrm_alloc_spi /*hold lock*/
x->id.spi = htonl(spi) spin_lock_bh(&net->xfrm.xfrm_state_lock)
/*waiting lock release*/ xfrm_hash_transfer()
spin_lock_bh(&net->xfrm.xfrm_state_lock) /*add x into hlist:net->xfrm.state_byspi*/
hlist_add_head_rcu(&x->byspi)
spin_unlock_bh(&net->xfrm.xfrm_state_lock)
/*add x into hlist:net->xfrm.state_byspi 2 times*/
hlist_add_head_rcu(&x->byspi)
1. a new state x is alloced in xfrm_state_alloc() and added into the bydst hlist
in __find_acq_core() on the LHS;
2. on the RHS, state_hash_work thread travels the old bydst and tranfers every xfrm_state
(include x) into the new bydst hlist and new byspi hlist;
3. user thread on the LHS gets the lock and adds x into the new byspi hlist again.
So the same xfrm_state (x) is added into the same list_hash
(net->xfrm.state_byspi) 2 times that makes the list_hash become
an inifite loop.
To fix the race, x->id.spi = htonl(spi) in the xfrm_alloc_spi() is moved
to the back of spin_lock_bh, sothat state_hash_work thread no longer add x
which id.spi is zero into the hash_list.
Fixes: f034b5d4ef ("[XFRM]: Dynamic xfrm_state hash table sizing.")
Signed-off-by: zhuoliang zhang <zhuoliang.zhang@mediatek.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The usb-hid keyboard-dock for the Acer Switch 10 SW5-012 model declares
an application and hid-usage page of 0x0088 for the INPUT(4) report which
it sends. This reports contains 2 8-bit fields which are declared as
HID_MAIN_ITEM_VARIABLE.
The keyboard-touchpad combo never actually generates this report, except
when the touchpad is toggled on/off with the Fn + F7 hotkey combo. The
toggle on/off is handled inside the keyboard-dock, when the touchpad is
toggled off it simply stops sending events.
When the touchpad is toggled on/off an INPUT(4) report is generated with
the first content byte set to 120/121, before this commit the kernel
would report this as ABS_MISC 120/121 events.
Patch the descriptor to replace the HID_MAIN_ITEM_VARIABLE with
HID_MAIN_ITEM_RELATIVE (because no key-presss release events are send)
and add mappings for the 0x00880078 and 0x00880079 usages to generate
touchpad on/off key events when the touchpad is toggled on/off.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
When running in lazy TLB mode the currently active page tables might
be the ones of a previous process, e.g. when running a kernel thread.
This can be problematic in case kernel code is being modified via
text_poke() in a kernel thread, and on another processor exit_mmap()
is active for the process which was running on the first cpu before
the kernel thread.
As text_poke() is using a temporary address space and the former
address space (obtained via cpu_tlbstate.loaded_mm) is restored
afterwards, there is a race possible in case the cpu on which
exit_mmap() is running wants to make sure there are no stale
references to that address space on any cpu active (this e.g. is
required when running as a Xen PV guest, where this problem has been
observed and analyzed).
In order to avoid that, drop off TLB lazy mode before switching to the
temporary address space.
Fixes: cefa929c03 ("x86/mm: Introduce temporary mm structs")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201009144225.12019-1-jgross@suse.com
It is valid (albeit uncommon) to call local_irq_enable() without first
having called local_irq_disable(). In this case we enter
lockdep_hardirqs_on*() with IRQs enabled and trip a preemption warning
for using __this_cpu_read().
Use this_cpu_read() instead to avoid the warning.
Fixes: 4d004099a6 ("lockdep: Fix lockdep recursion")
Reported-by: syzbot+53f8ce8bbc07924b6417@syzkaller.appspotmail.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
It seems that the PID 0x4072 was missing from the list Logitech gave me
for this mouse, as I found one with it in the wild (with which I tested
this patch).
Fixes: 4435ff2f09 ("HID: logitech: Enable high-resolution scrolling on Logitech mice")
Signed-off-by: Harry Cutts <hcutts@chromium.org>
Acked-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The Trust Flex Design Tablet has an UGTizer USB ID and requires the same
initialization as the UGTizer GP0610 to be detected as a graphics tablet
instead of a mouse.
Signed-off-by: Martijn van de Streek <martijn@zeewinde.xyz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
If guest fills non-priv bb on ApolloLake/Broxton as Mesa i965 does in:
717e7539124d (i965: Use a WC map and memcpy for the batch instead of pw-)
Due to the missing flush of bb filled by VM vCPU, host GPU hangs on
executing these MI_BATCH_BUFFER.
Temporarily workaround this by setting SNOOP bit for PAT3 used by PPGTT
PML4 PTE: PAT(0) PCD(1) PWT(1).
The performance is still expected to be low, will need further improvement.
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20201012045231.226748-1-colin.xu@intel.com
Guest driver may reset HWSP to 0 as init value during D3->D0:
The full sequence is:
- Boot ->D0
- Update HWSP
- D0->D3
- ...In D3 state...
- D3->D0
- DMLR reset.
- Set engine HWSP to 0.
- Set engine ring mode to 0.
- Set engine HWSP to correct value.
- Set engine ring mode to correct value.
Ring mode is masked register so set 0 won't take effect.
However HWPS addr 0 is considered as invalid GGTT address which will
report error like:
gvt: vgpu 1: write invalid HWSP address, reg:0x2080, value:0x0
gvt: vgpu 1: fail to emulate MMIO write 00002080 len 4
Detected your guest driver doesn't support GVT-g.
Now vgpu 2 will enter failsafe mode.
Zero out HWSP addr is considered as a valid setting from device driver
so don't treat it as invalid HWSP addr.
V2:
Treat HWSP addr 0 as valid. (zhenyu)
V3:
Change patch title.
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200911065239.147789-1-colin.xu@intel.com
When doing a fallocate() we have a short time window, after reserving an
extent and before starting a transaction, where if relocation for the block
group containing the reserved extent happens, we can end up missing the
extent in the data relocation inode causing relocation to fail later.
This only happens when we don't pass a transaction to the internal
fallocate function __btrfs_prealloc_file_range(), which is for all the
cases where fallocate() is called from user space (the internal use cases
include space cache extent allocation and relocation).
When the race triggers the relocation failure, it produces a trace like
the following:
[200611.995995] ------------[ cut here ]------------
[200611.997084] BTRFS: Transaction aborted (error -2)
[200611.998208] WARNING: CPU: 3 PID: 235845 at fs/btrfs/ctree.c:1074 __btrfs_cow_block+0x3a0/0x5b0 [btrfs]
[200611.999042] Modules linked in: dm_thin_pool dm_persistent_data (...)
[200612.003287] CPU: 3 PID: 235845 Comm: btrfs Not tainted 5.9.0-rc6-btrfs-next-69 #1
[200612.004442] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[200612.006186] RIP: 0010:__btrfs_cow_block+0x3a0/0x5b0 [btrfs]
[200612.007110] Code: 1b 00 00 02 72 2a 83 f8 fb 0f 84 b8 01 (...)
[200612.007341] BTRFS warning (device sdb): Skipping commit of aborted transaction.
[200612.008959] RSP: 0018:ffffaee38550f918 EFLAGS: 00010286
[200612.009672] BTRFS: error (device sdb) in cleanup_transaction:1901: errno=-30 Readonly filesystem
[200612.010428] RAX: 0000000000000000 RBX: ffff9174d96f4000 RCX: 0000000000000000
[200612.011078] BTRFS info (device sdb): forced readonly
[200612.011862] RDX: 0000000000000001 RSI: ffffffffa8161978 RDI: 00000000ffffffff
[200612.013215] RBP: ffff9172569a0f80 R08: 0000000000000000 R09: 0000000000000000
[200612.014263] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9174b8403b88
[200612.015203] R13: ffff9174b8400a88 R14: ffff9174c90f1000 R15: ffff9174a5a60e08
[200612.016182] FS: 00007fa55cf878c0(0000) GS:ffff9174ece00000(0000) knlGS:0000000000000000
[200612.017174] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[200612.018418] CR2: 00007f8fb8048148 CR3: 0000000428a46003 CR4: 00000000003706e0
[200612.019510] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[200612.020648] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[200612.021520] Call Trace:
[200612.022434] btrfs_cow_block+0x10b/0x250 [btrfs]
[200612.023407] do_relocation+0x54e/0x7b0 [btrfs]
[200612.024343] ? do_raw_spin_unlock+0x4b/0xc0
[200612.025280] ? _raw_spin_unlock+0x29/0x40
[200612.026200] relocate_tree_blocks+0x3bc/0x6d0 [btrfs]
[200612.027088] relocate_block_group+0x2f3/0x600 [btrfs]
[200612.027961] btrfs_relocate_block_group+0x15e/0x340 [btrfs]
[200612.028896] btrfs_relocate_chunk+0x38/0x110 [btrfs]
[200612.029772] btrfs_balance+0xb22/0x1790 [btrfs]
[200612.030601] ? btrfs_ioctl_balance+0x253/0x380 [btrfs]
[200612.031414] btrfs_ioctl_balance+0x2cf/0x380 [btrfs]
[200612.032279] btrfs_ioctl+0x620/0x36f0 [btrfs]
[200612.033077] ? _raw_spin_unlock+0x29/0x40
[200612.033948] ? handle_mm_fault+0x116d/0x1ca0
[200612.034749] ? up_read+0x18/0x240
[200612.035542] ? __x64_sys_ioctl+0x83/0xb0
[200612.036244] __x64_sys_ioctl+0x83/0xb0
[200612.037269] do_syscall_64+0x33/0x80
[200612.038190] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[200612.038976] RIP: 0033:0x7fa55d07ed87
[200612.040127] Code: 00 00 00 48 8b 05 09 91 0c 00 64 c7 00 26 (...)
[200612.041669] RSP: 002b:00007ffd5ebf03e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[200612.042437] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fa55d07ed87
[200612.043511] RDX: 00007ffd5ebf0470 RSI: 00000000c4009420 RDI: 0000000000000003
[200612.044250] RBP: 0000000000000003 R08: 000055d8362642a0 R09: 00007fa55d148be0
[200612.044963] R10: fffffffffffff52e R11: 0000000000000206 R12: 00007ffd5ebf1614
[200612.045683] R13: 00007ffd5ebf0470 R14: 0000000000000002 R15: 00007ffd5ebf0470
[200612.046361] irq event stamp: 0
[200612.047040] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[200612.047725] hardirqs last disabled at (0): [<ffffffffa6eb5ab3>] copy_process+0x823/0x1bc0
[200612.048387] softirqs last enabled at (0): [<ffffffffa6eb5ab3>] copy_process+0x823/0x1bc0
[200612.049024] softirqs last disabled at (0): [<0000000000000000>] 0x0
[200612.049722] ---[ end trace 49006c6876e65227 ]---
The race happens like this:
1) Task A starts an fallocate() (plain or zero range) and it calls
__btrfs_prealloc_file_range() with the 'trans' parameter set to NULL;
2) Task A calls btrfs_reserve_extent() and gets an extent that belongs to
block group X;
3) Before task A gets into btrfs_replace_file_extents(), through the call
to insert_prealloc_file_extent(), task B starts relocation of block
group X;
4) Task B enters btrfs_relocate_block_group() and it sets block group X to
RO mode;
5) Task B enters relocate_block_group(), it calls prepare_to_relocate()
whichs joins/starts a transaction and then commits the transaction;
6) Task B then starts scanning the extent tree looking for extents that
belong to block group X - it does not find yet the extent reserved by
task A, since that extent was not yet added to the extent tree, as its
delayed reference was not even yet created at this point;
7) The data relocation inode ends up not having the extent reserved by
task A associated to it;
8) Task A then starts a transaction through btrfs_replace_file_extents(),
inserts a file extent item in the subvolume tree pointing to the
reserved extent and creates a delayed reference for it;
9) Task A finishes and returns success to user space;
10) Later on, while relocation is still in progress, the leaf where task A
inserted the new file extent item is COWed, so we end up at
__btrfs_cow_block(), which calls btrfs_reloc_cow_block(), and that in
turn calls relocation.c:replace_file_extents();
11) At relocation.c:replace_file_extents() we iterate over all the items in
the leaf and find the file extent item pointing to the extent that was
allocated by task A, and then call relocation.c:get_new_location(), to
find the new location for the extent;
12) However relocation.c:get_new_location() fails, returning -ENOENT,
because it couldn't find a corresponding file extent item associated
with the data relocation inode. This is because the extent was not seen
in the extent tree at step 6). The -ENOENT error is propagated to
__btrfs_cow_block(), which aborts the transaction.
So fix this simply by decrementing the block group's number of reservations
after calling insert_prealloc_file_extent(), as relocation waits for that
counter to go down to zero before calling prepare_to_relocate() and start
looking for extents in the extent tree.
This issue only started to happen recently as of commit 8fccebfa53
("btrfs: fix metadata reservation for fallocate that leads to transaction
aborts"), because now we can reserve an extent before starting/joining a
transaction, and previously we always did it after that, so relocation
ended up waiting for a concurrent fallocate() to finish because before
searching for the extents of the block group, it starts/joins a transaction
and then commits it (at prepare_to_relocate()), which made it wait for the
fallocate task to complete first.
Fixes: 8fccebfa53 ("btrfs: fix metadata reservation for fallocate that leads to transaction aborts")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that GENERIC_IRQ_IPI selects IRQ_DOMAIN_HIERARCHY, there is no
need to have this conditional select for IRQ_MIPS_CPU. Similarily,
MIPS_GIC only needs selecting GENERIC_IRQ_IPI.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <maz@kernel.org>
mst_intc_of_init has no external caller, so let's make it static.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
The MStar interrupt controller is only found on MStar, SigmaStar, and
Mediatek SoCs. Hence add dependencies on ARCH_MEDIATEK and
ARCH_MSTARV7, to prevent asking the user about the MStar interrupt
controller driver when configuring a kernel without support for MStar,
SigmaStar, and Mediatek SoCs.
Fixes: ad4c938c92 ("irqchip/irq-mst: Add MStar interrupt controller support")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Daniel Palmer <daniel@thingy.jp>
Link: https://lore.kernel.org/r/20201014131703.18021-1-geert+renesas@glider.be
sdhci-of-dwcmshc meets an eMMC read performance regression with below
command after commit 427b6514d0 ("mmc: sdhci: Add Auto CMD Auto
Select support"):
dd if=/dev/mmcblk0 of=/dev/null bs=8192 count=100000
Before the commit, the above command gives 120MB/s
After the commit, the above command gives 51.3 MB/s
So it looks like sdhci-of-dwcmshc expects Version 4 Mode for Auto
CMD Auto Select. Fix the performance degradation by ensuring v4_mode
is true to use Auto CMD Auto Select.
Fixes: 427b6514d0 ("mmc: sdhci: Add Auto CMD Auto Select support")
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201015174115.4cf2c19a@xhacker.debian
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
After enabling interconnect scaling for display on the db845c board,
in certain configurations the board hangs, while the following errors
are observed on the console:
Error sending AMC RPMH requests (-110)
qcom_rpmh TCS Busy, retrying RPMH message send: addr=0x50000
qcom_rpmh TCS Busy, retrying RPMH message send: addr=0x50000
qcom_rpmh TCS Busy, retrying RPMH message send: addr=0x50000
...
In this specific case, the above is related to one of the sequencers
being stuck, while client drivers are returning from probe and trying
to disable the currently unused clock and interconnect resources.
Generally we want to keep the multimedia NoC enabled like the rest of
the NoCs, so let's set the keepalive flag on it too.
Fixes: aae57773fb ("interconnect: qcom: sdm845: Split qnodes into their respective NoCs")
Reported-by: Amit Pundir <amit.pundir@linaro.org>
Reviewed-by: Mike Tipton <mdtipton@codeaurora.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Link: https://lore.kernel.org/r/20201012194034.26944-1-georgi.djakov@linaro.org
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
Stress tests show that DSP may occasionally be late with signaling WAIT
state when all pins are made use of simultaneously plus start/stop
(pause) gets involved. While this isn't tied to standard audio scenarios
where only System Pin (playback and capture) is involved, ensure user is
not hindered when playing with more advanced scenarios.
>From DSP perspective, clock acts as a resource: low clock equals less
resources, high clock more resources. Relax clock selection procedure so
only low -> high switch is allowed when awaiting WAIT signal times out.
Once active stream count decreases, DSP will have more time internally to
adjust thus low clock selection becomes possible again.
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://lore.kernel.org/r/20201012103221.30759-2-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
catpt_dai_pcm_new() invoked during new PCM runtime creation configures
SSP by sending IPC to DSP firmware. For that to succeed device needs to
be up and running. While components default probing behavior -
snd_soc_catpt causing machine board module to load just after it - needs
no changes, machine board's module may be unloaded and re-loaded at a
different time e.g.: when catpt is already asleep.
Wake device explicitly in catpt_dai_pcm_new() to ensure communication is
established before sending any IPCs, enabling those advanced scenarios
in the process.
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://lore.kernel.org/r/20201012103221.30759-1-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
SND_SST_IPC and its _PCI and _ACPI variants all target
sound/soc/intel/atom solution alone. SND_SST_IPC is the core component,
required for PCI and ACPI based atom platforms both. _PCI and _ACPI
target Merrifield/Edison and Baytrial/Cherrytrail platforms
respectively.
On top of that, there is an equivalent set of configs targeting the same
solution:
- SND_SST_ATOM_HIFI2_PLATFORM (core)
- SND_SST_ATOM_HIFI2_PLATFORM_PCI
- SND_SST_ATOM_HIFI2_PLATFORM_ACPI
As both sets do the same job - allow for granular platform selection -
remove the duplicate set and rely on SND_SST_ATOM_HIFI2_PLATOFRM_XXX
configs alone.
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20201012095005.29859-1-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
When more than a single SCMI device are present in the system, the
creation of the notification workqueue with the WQ_SYSFS flag will lead
to the following sysfs duplicate node warning:
sysfs: cannot create duplicate filename '/devices/virtual/workqueue/scmi_notify'
CPU: 0 PID: 20 Comm: kworker/0:1 Not tainted 5.9.0-gdf4dd84a3f7d #29
Hardware name: Broadcom STB (Flattened Device Tree)
Workqueue: events deferred_probe_work_func
Backtrace:
show_stack + 0x20/0x24
dump_stack + 0xbc/0xe0
sysfs_warn_dup + 0x70/0x80
sysfs_create_dir_ns + 0x15c/0x1a4
kobject_add_internal + 0x140/0x4d0
kobject_add + 0xc8/0x138
device_add + 0x1dc/0xc20
device_register + 0x24/0x28
workqueue_sysfs_register + 0xe4/0x1f0
alloc_workqueue + 0x448/0x6ac
scmi_notification_init + 0x78/0x1dc
scmi_probe + 0x268/0x4fc
platform_drv_probe + 0x70/0xc8
really_probe + 0x184/0x728
driver_probe_device + 0xa4/0x278
__device_attach_driver + 0xe8/0x148
bus_for_each_drv + 0x108/0x158
__device_attach + 0x190/0x234
device_initial_probe + 0x1c/0x20
bus_probe_device + 0xdc/0xec
deferred_probe_work_func + 0xd4/0x11c
process_one_work + 0x420/0x8f0
worker_thread + 0x4fc/0x91c
kthread + 0x21c/0x22c
ret_from_fork + 0x14/0x20
kobject_add_internal failed for scmi_notify with -EEXIST, don't try to
register things with the same name in the same directory.
arm-scmi brcm_scmi@1: SCMI Notifications - Initialization Failed.
arm-scmi brcm_scmi@1: SCMI Notifications NOT available.
arm-scmi brcm_scmi@1: SCMI Protocol v1.0 'brcm-scmi:' Firmware version 0x1
Fix this by using dev_name(handle->dev) which guarantees that the name is
unique and this also helps correlate which notification workqueue corresponds
to which SCMI device instance.
Link: https://lore.kernel.org/r/20201014021737.287340-1-f.fainelli@gmail.com
Fixes: bd31b24969 ("firmware: arm_scmi: Add notification dispatch and delivery")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
[sudeep.holla: trimmed backtrace to remove all unwanted hexcodes and timestamps]
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
When a protocol registers its events, the notification core takes care
to rescan the hashtable of pending event handlers and activate all the
possibly existent handlers referring to any of the events that are just
registered by the new protocol. When a pending handler becomes active
the core requests and enables the corresponding events in the SCMI
firmware.
If, for whatever reason, the enable fails, such invalid event handler
must be finally removed and freed. Let us ensure to use the
scmi_put_active_handler() helper which handles properly the needed
additional locking.
Failing to properly acquire all the needed mutexes exposes a race that
leads to the following splat being observed:
WARNING: CPU: 0 PID: 388 at lib/refcount.c:28 refcount_warn_saturate+0xf8/0x148
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development
Platform, BIOS EDK II Jun 30 2020
pstate: 40000005 (nZcv daif -PAN -UAO BTYPE=--)
pc : refcount_warn_saturate+0xf8/0x148
lr : refcount_warn_saturate+0xf8/0x148
Call trace:
refcount_warn_saturate+0xf8/0x148
scmi_put_handler_unlocked.isra.10+0x204/0x208
scmi_put_handler+0x50/0xa0
scmi_unregister_notifier+0x1bc/0x240
scmi_notify_tester_remove+0x4c/0x68 [dummy_scmi_consumer]
scmi_dev_remove+0x54/0x68
device_release_driver_internal+0x114/0x1e8
driver_detach+0x58/0xe8
bus_remove_driver+0x88/0xe0
driver_unregister+0x38/0x68
scmi_driver_unregister+0x1c/0x28
scmi_drv_exit+0x1c/0xae0 [dummy_scmi_consumer]
__arm64_sys_delete_module+0x1a4/0x268
el0_svc_common.constprop.3+0x94/0x178
do_el0_svc+0x2c/0x98
el0_sync_handler+0x148/0x1a8
el0_sync+0x158/0x180
Link: https://lore.kernel.org/r/20201013133109.49821-1-cristian.marussi@arm.com
Fixes: e7c215f358 ("firmware: arm_scmi: Add notification callbacks-registration")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
GCC 10 optimizes the scheduler code differently than its predecessors.
When CONFIG_DEBUG_SECTION_MISMATCH=y, the Makefile forces GCC not
to inline some functions (-fno-inline-functions-called-once). Before GCC
10, "no-inlined" __schedule() starts with the usual prologue:
push %bp
mov %sp, %bp
So the ORC unwinder simply picks stack pointer from %bp and
unwinds from __schedule() just perfectly:
$ cat /proc/1/stack
[<0>] ep_poll+0x3e9/0x450
[<0>] do_epoll_wait+0xaa/0xc0
[<0>] __x64_sys_epoll_wait+0x1a/0x20
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
But now, with GCC 10, there is no %bp prologue in __schedule():
$ cat /proc/1/stack
<nothing>
The ORC entry of the point in __schedule() is:
sp:sp+88 bp:last_sp-48 type:call end:0
In this case, nobody subtracts sizeof "struct inactive_task_frame" in
__unwind_start(). The struct is put on the stack by __switch_to_asm() and
only then __switch_to_asm() stores %sp to task->thread.sp. But we start
unwinding from a point in __schedule() (stored in frame->ret_addr by
'call') and not in __switch_to_asm().
So for these example values in __unwind_start():
sp=ffff94b50001fdc8 bp=ffff8e1f41d29340 ip=__schedule+0x1f0
The stack is:
ffff94b50001fdc8: ffff8e1f41578000 # struct inactive_task_frame
ffff94b50001fdd0: 0000000000000000
ffff94b50001fdd8: ffff8e1f41d29340
ffff94b50001fde0: ffff8e1f41611d40 # ...
ffff94b50001fde8: ffffffff93c41920 # bx
ffff94b50001fdf0: ffff8e1f41d29340 # bp
ffff94b50001fdf8: ffffffff9376cad0 # ret_addr (and end of the struct)
0xffffffff9376cad0 is __schedule+0x1f0 (after the call to
__switch_to_asm). Now follow those 88 bytes from the ORC entry (sp+88).
The entry is correct, __schedule() really pushes 48 bytes (8*7) + 32 bytes
via subq to store some local values (like 4U below). So to unwind, look
at the offset 88-sizeof(long) = 0x50 from here:
ffff94b50001fe00: ffff8e1f41578618
ffff94b50001fe08: 00000cc000000255
ffff94b50001fe10: 0000000500000004
ffff94b50001fe18: 7793fab6956b2d00 # NOTE (see below)
ffff94b50001fe20: ffff8e1f41578000
ffff94b50001fe28: ffff8e1f41578000
ffff94b50001fe30: ffff8e1f41578000
ffff94b50001fe38: ffff8e1f41578000
ffff94b50001fe40: ffff94b50001fed8
ffff94b50001fe48: ffff8e1f41577ff0
ffff94b50001fe50: ffffffff9376cf12
Here ^^^^^^^^^^^^^^^^ is the correct ret addr from
__schedule(). It translates to schedule+0x42 (insn after a call to
__schedule()).
BUT, unwind_next_frame() tries to take the address starting from
0xffff94b50001fdc8. That is exactly from thread.sp+88-sizeof(long) =
0xffff94b50001fdc8+88-8 = 0xffff94b50001fe18, which is garbage marked as
NOTE above. So this quits the unwinding as 7793fab6956b2d00 is obviously
not a kernel address.
There was a fix to skip 'struct inactive_task_frame' in
unwind_get_return_address_ptr in the following commit:
187b96db5c ("x86/unwind/orc: Fix unwind_get_return_address_ptr() for inactive tasks")
But we need to skip the struct already in the unwinder proper. So
subtract the size (increase the stack pointer) of the structure in
__unwind_start() directly. This allows for removal of the code added by
commit 187b96db5c completely, as the address is now at
'(unsigned long *)state->sp - 1', the same as in the generic case.
[ mingo: Cleaned up the changelog a bit, for better readability. ]
Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Bug: https://bugzilla.suse.com/show_bug.cgi?id=1176907
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20201014053051.24199-1-jslaby@suse.cz
kexec_file_load() currently reuses the old boot_params.screen_info,
but if drivers have change the hardware state, boot_param.screen_info
could contain invalid info.
For example, the video type might be no longer VGA, or the frame buffer
address might be changed. If the kexec kernel keeps using the old screen_info,
kexec'ed kernel may attempt to write to an invalid framebuffer
memory region.
There are two screen_info instances globally available, boot_params.screen_info
and screen_info. Later one is a copy, and is updated by drivers.
So let kexec_file_load use the updated copy.
[ mingo: Tidied up the changelog. ]
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20201014092429.1415040-2-kasong@redhat.com
Few commands provide the list of description partially and require
to be called consecutively until all the descriptors are fetched
completely. In such cases, we don't release the buffers and reuse
them for consecutive transmits.
However, currently we don't reset the Rx size which will be set as
per the response for the last transmit. This may result in incorrect
response size being interpretted as the firmware may repond with size
greater than the one set but we read only upto the size set by previous
response.
Let us reset the receive buffer size to max possible in such cases as
we don't know the exact size of the response.
Link: https://lore.kernel.org/r/20201012141746.32575-1-sudeep.holla@arm.com
Fixes: b6f20ff8bd ("firmware: arm_scmi: add common infrastructure and support for base protocol")
Reported-by: Etienne Carriere <etienne.carriere@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Since the addition of session's client UUID generation via commit [1],
login via REE kernel method was disallowed. So fix that via passing
nill UUID in case of TEE_IOCTL_LOGIN_REE_KERNEL method as well.
Fixes: e33bcbab16 ("tee: add support for session's client UUID generation") [1]
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
SMC/HVC can transmit only one message at the time as the shared memory
needs to be protected and the calls are synchronous.
However, in order to allow multiple threads to send SCMI messages
simultaneously, we need a larger poll of memory.
Let us just use value of 20 to keep it in sync mailbox transport
implementation. Any other value must work perfectly.
Link: https://lore.kernel.org/r/20201008143722.21888-4-etienne.carriere@linaro.org
Fixes: 1dc6558062 ("firmware: arm_scmi: Add smc/hvc transport")
Cc: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Etienne Carriere <etienne.carriere@linaro.org>
[sudeep.holla: reworded the commit message to indicate the practicality]
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
As Nicolas noticed in his case, when xfrm_interface module is installed
the standard IP tunnels will break in receiving packets.
This is caused by the IP tunnel handlers with a higher priority in xfrm
interface processing incoming packets by xfrm_input(), which would drop
the packets and return 0 instead when anything wrong happens.
Rather than changing xfrm_input(), this patch is to adjust the priority
for the IP tunnel handlers in xfrm interface, so that the packets would
go to xfrmi's later than the others', as the others' would not drop the
packets when the handlers couldn't process them.
Note that IPCOMP also defines its own IPIP tunnel handler and it calls
xfrm_input() as well, so we must make its priority lower than xfrmi's,
which means having xfrmi loaded would still break IPCOMP. We may seek
another way to fix it in xfrm_input() in the future.
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Fixes: da9bbf0598 ("xfrm: interface: support IPIP and IPIP6 tunnels processing with .cb_handler")
FIxes: d7b360c286 ("xfrm: interface: support IP6IP6 and IP6IP tunnels processing with .cb_handler")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
We have a dedicated "amlogic,meson-g12a-dwmac" compatible string for the
Ethernet controller since commit 3efdb92426 ("dt-bindings: net:
dwmac-meson: Add a compatible string for G12A onwards").
Using the AXG compatible string worked fine so far because the
dwmac-meson8b driver doesn't handle the newly introduced register bits
for G12A. However, once that changes the driver must be probed with the
correct compatible string to manage these new register bits.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
Link: https://lore.kernel.org/r/20200925211743.537496-1-martin.blumenstingl@googlemail.com
The MMU off code path in ll_get_coherency_base() attempts to decide
whether the coherency fabric is mapped by testing the value of
coherency_base, which carries its virtual address if its mapped, and
0x0 otherwise.
However, what the code actually does is take the virtual address of
the coherency_base symbol, and compare it with 0x0, which are never
equal, and so the branch is never taken. In fact, with the MMU off,
dereferencing the VA of coherency_base is not possible to begin with,
nor can its value be relied upon with the MMU off since it is not
cleaned to the Dcache as is done with coherency_phys_base in
armada_370_coherency_init().
Instead, the value of coherency_phys_base is returned, which results
in the correct behavior since it will be 0x0 as well if the coherency
fabric is not mapped, and it is accessible with the MMU off. So just
drop the comparison and the branch.
Fixes: 30cdef9710 ("ARM: mvebu: make the coherency_ll.S functions work with no coherency fabric")
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Gregory CLEMENT <gregory.clement@bootlin.com>
Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Espressobin boards have 3 ethernet ports and some of them got assigned more
then one MAC address. MAC addresses are stored in U-Boot environment.
Since commit a2c7023f70 ("net: dsa: read mac address from DT for slave
device") kernel can use MAC addresses from DT for particular DSA port.
Currently Espressobin DTS file contains alias just for ethernet0.
This patch defines additional ethernet aliases in Espressobin DTS files, so
bootloader can fill correct MAC address for DSA switch ports if more MAC
addresses were specified.
DT alias ethernet1 is used for wan port, DT aliases ethernet2 and ethernet3
are used for lan ports for both Espressobin revisions (V5 and V7).
Fixes: 5253cb8c00 ("arm64: dts: marvell: espressobin: add ethernet alias")
Cc: <stable@vger.kernel.org> # a2c7023f70: dsa: read mac address
Signed-off-by: Pali Rohár <pali@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andre Heider <a.heider@gmail.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
For compatibility reasons, Glibc off_t is a 32-bit type on 32-bit x86
unless _FILE_OFFSET_BITS=64 is defined. Add this define, as otherwise
reading MSRs with index 0x80000000 and above attempts a pread with a
negative offset, which fails.
Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Tested-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Family 19h processors have the same RAPL (Running average power limit)
hardware register interface as Family 17h processors.
Change the family checks to succeed for Family 17h and above to enable
core and package energy measurement on Family 19h machines.
Also update the TDP to the largest found at the bottom of the page at
amd.com->processors->servers->epyc->2nd-gen-epyc, i.e., the EPYC 7H12.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
Jacobsville doesn't have Package C2 and C6. Also
Core and DRAM RAPL are not available. Adjust output
accordingly.
Signed-off-by: Antti Laakso <antti.laakso@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The column already present called GFXMHz reads from gt_cur_freq_mhz,
which represents the GT frequency that was requested, but power
management might not be able to do that. So the new column will display
what the actual frequency GT is running at.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
I've encountered an issue with x86_energy_perf_policy. If I run it on a
machine that I'm told is a qemu-kvm virtual machine running inside a
privileged container, I get the following error:
x86_energy_perf_policy: /dev/cpu/0/msr offset 0x1ad read failed: Input/output error
I get the same error in a Digital Ocean droplet, so that might be a
similar environment.
I created the following patch which is intended to give a more
user-friendly message. It's based on a patch for turbostat from Prarit
Bhargava that was posted some time ago. The patch is "[v2] turbostat:
Running on virtual machine is not supported" [1].
Given my limited knowledge of the topic, I can't say with confidence
that this is the right solution, though (that's why this is not an
official patch submission). Also, I'm not sure what the convention with
exit codes is in this tool. Also, instead of the error message, perhaps
the tool should just not print anything in this case, which is how it
behaves in a "regular" VM?
[1] https://patchwork.kernel.org/patch/9868587/
Signed-off-by: Ondřej Lysoněk <olysonek@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Like we skip PC3 and PC6 columns when the package C-state limit
disables them, skip PC8/PC9/CP10 under analogous conditions.
Reported-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat formatting is broken with ACPI CST for enumeration. The
problem is that the CX_ACPI% is eight characters long which does not
work with tab formatting. One simple solution is to remove the underbar
from the state name such that C1_ACPI will be displayed as C1ACPI.
Signed-off-by: David Arcari <darcari@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.
Deterministic algorithm:
For each file:
If not .svg:
For each line:
If doesn't contain `\bxmlns\b`:
For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
If both the HTTP and HTTPS versions
return 200 OK and serve the same content:
Replace HTTP with HTTPS.
Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Disabling cpu 0 results in an error
turbostat: /sys/devices/system/cpu/cpu0/topology/thread_siblings: open failed: No such file or directory
Use sched_getcpu() instead of a hardcoded cpu 0 to get the max cpu number.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Since the RAPL Joule Counter is 32 bit, turbostat would
only print a *star* instead of printing the actual energy
consumed to indicate the overflow due to long duration.
This does not meet the requirement from servers as the
sampling time of turbostat is usually very long on servers.
So maintain a set of MSR buffer, and update them
periodically before the 32bit MSR register is wrapped round,
so as to avoid the overflow.
The idea is similar to the implementation of ktime_get():
Periodical MSR timer:
total_rapl_sum += (current_rapl_msr - last_rapl_msr);
Using get_msr_sum() to get the accumulated RAPL:
return (current_rapl_msr - last_rapl_msr) + total_rapl_sum;
The accumulated RAPL mechanism will be turned on in next patch.
Originally-by: Aaron Lu <aaron.lwe@gmail.com>
Reviewed-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Change the energy variable from 32bit to 64bit,
so that it can record long time duration.
After this conversion, adjust the DELTA_WRAP32() accordingly.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
If the --quiet option is not used, turbostat prints a useful system
configuration header during startup.
But inclusion of idle system configuration information in this header
is currently a function of inclusion in the columns chosen to be displayed.
Always list this idle system configuration.
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Len Brown <len.brown@intel.com>
Users are puzzled when they use tuned performance and all their
C-states vanish. Dump /dev/cpu_dma_latency and state
whether the value is default, or constraining,
to explain this situation.
Signed-off-by: Len Brown <len.brown@intel.com>
2020-09-03 13:48:07 -04:00
2159 changed files with 26843 additions and 40128 deletions
Example with: "compare_pulse_OC4REF_r_or_OC6REF_r"::
X
X X
X . . X
X . . X
X . . X
count X . . . . X
. . . .
. . . .
+---------------+
OC4REF|..|
+-+. . +-+
. +---+ .
OC6REF. | | .
+-------+ +-------+
+-+ +-+
TRGO2 | | | |
+-+ +---+ +---------+
What: /sys/bus/iio/devices/triggerX/master_mode
KernelVersion: 4.11
@@ -104,6 +122,7 @@ Description:
Configure the device counter enable modes, in all case
counting direction is set by in_count0_count_direction
attribute and the counter is clocked by the internal clock.
always:
Counter is always ON.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.